Nested cross-validation: How does cross_validate handle GridSearchCV as its input estimator?
The following code combines cross_validate with GridSearchCV to perform nested cross-validation for an SVC on the iris dataset. (It is a modified version of the example on this documentation page: https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py.)
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_validate, KFold
import numpy as np

np.set_printoptions(precision=2)

# Load the dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Set up possible values of parameters to optimize over
p_grid = {"C": [1, 10],
          "gamma": [.01, .1]}

# We will use a Support Vector Classifier with "rbf" kernel
svm = SVC(kernel="rbf")

# Choose techniques for the inner and outer loop of nested cross-validation
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# Perform nested cross-validation
# (note: the iid parameter was deprecated in scikit-learn 0.22 and later removed)
clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv, iid=False)
clf.fit(X_iris, y_iris)
best_estimator = clf.best_estimator_

cv_dic = cross_validate(clf, X_iris, y_iris, cv=outer_cv, scoring=['accuracy'],
                        return_estimator=False, return_train_score=True)
mean_val_score = cv_dic['test_accuracy'].mean()

print('nested_train_scores: ', cv_dic['train_accuracy'])
print('nested_val_scores: ', cv_dic['test_accuracy'])
print('mean score: {0:.2f}'.format(mean_val_score))
cross_validate splits the data set into a training and a test set in each fold. In each fold, the input estimator is then trained on the training set associated with that fold. The estimator passed in here is clf, a parameterized GridSearchCV estimator, i.e. an estimator that cross-validates itself again.
I have three questions about the whole thing:

1. If clf is used as the estimator for cross_validate, does it (in the course of the GridSearchCV cross-validation) split the above-mentioned training set into a sub-training set and a validation set in order to determine the best hyperparameter combination?

2. Out of all models tested via GridSearchCV, does cross_validate validate only the model stored in the best_estimator_ attribute?

3. Does cross_validate train a model at all (if so, why?), or is the model stored in best_estimator_ validated directly via the test set?
To make it clearer how the questions are meant, here is an illustration of how I currently imagine the double cross-validation.
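In pseudocode, that imagined flow looks roughly like this (a sketch of my mental model only, not of scikit-learn's internals; the fold counts match the code above):

# Outer loop (outer_cv): 4 folds over the full data set
#   -> split into outer training set and outer test set
#   Inner loop (GridSearchCV with inner_cv): 5 folds over the outer training set
#     -> split into sub-training set and validation set
#     -> evaluate every (C, gamma) combination from p_grid
#   Refit the best (C, gamma) on the entire outer training set
#   Score that refitted model on the outer test set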
python python-3.x scikit-learn nested cross-validation
asked Mar 6 at 18:48 by zwithouta (edited Mar 8 at 18:36)
1 Answer
If clf is used as the estimator for cross_validate, does it split the above-mentioned training set into a sub-training set and a validation set in order to determine the best hyperparameter combination?
Yes, as you can see here at line 230, the training set is again split into a sub-training set and a validation set (specifically at line 240).
Update: Yes, when you pass the GridSearchCV classifier into cross_validate, it will again split the training set into a train and a test set. Here is a link describing this in more detail. Your diagram and assumption are correct.
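To make this concrete, here is a minimal hand-written loop that is conceptually equivalent to what cross_validate does with your clf in each outer fold. It is a sketch for illustration, not scikit-learn's actual implementation, and it assumes the objects from your code (clf, outer_cv, X_iris, y_iris) are in scope:

from sklearn.base import clone

for train_idx, test_idx in outer_cv.split(X_iris):
    # outer training set and outer test set for this fold
    X_tr, X_te = X_iris[train_idx], X_iris[test_idx]
    y_tr, y_te = y_iris[train_idx], y_iris[test_idx]

    # start from a fresh, unfitted copy of the GridSearchCV estimator
    fold_clf = clone(clf)

    # the inner 5-fold search runs only on the outer training set;
    # with refit=True (the default) the best (C, gamma) combination
    # is then refitted on the whole outer training set
    fold_clf.fit(X_tr, y_tr)

    # the refitted best model is scored on the outer test set
    print(fold_clf.best_params_, fold_clf.score(X_te, y_te))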
Out of all models tested via GridSearchCV, does cross_validate train & validate only the model stored in the variable best_estimator_?
Yes, as you can see from the answers here and here, GridSearchCV returns the best_estimator_ in your case (since the refit parameter is True by default). However, this best estimator has to be trained again.
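If you want to see which model each outer fold actually selects, set return_estimator=True in your cross_validate call and inspect the fitted GridSearchCV object of each fold; the chosen hyperparameters can differ from fold to fold:

cv_dic = cross_validate(clf, X_iris, y_iris, cv=outer_cv,
                        scoring=['accuracy'], return_estimator=True,
                        return_train_score=True)

# one fitted GridSearchCV per outer fold, each with its own
# best_params_ and refitted best_estimator_
for i, est in enumerate(cv_dic['estimator']):
    print('fold', i, '->', est.best_params_)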
Does cross_validate train a model at all (if so, why?) or is the model stored in best_estimator_ validated directly via the test set?
As per your third and final question: yes, it trains an estimator and returns it if return_estimator is set to True (see this line). This makes sense, since how else would it be able to return scores without training an estimator in the first place?
Update
The reason the model is trained again is that the default use case for cross_validate does not assume that you pass in the best classifier with the optimum parameters. In this specific case, you are sending in a classifier from GridSearchCV, but if you send in any untrained classifier, it is supposed to be trained. What I mean to say is that, yes, in your case it shouldn't need to train it again, since you are already doing cross-validation with GridSearchCV and using the best estimator. However, there is no way for cross_validate to know this, so it assumes that you are sending in an un-optimized, or rather untrained, estimator; thus it has to train it again and return the scores for the same.
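You can observe this directly: cross_validate calls sklearn.base.clone on the estimator it receives, which returns an unfitted copy with the same parameters, so the fit you performed beforehand is simply discarded. A small demonstration, assuming the already-fitted clf from your code:

from sklearn.base import clone

print(hasattr(clf, 'best_params_'))    # True  - clf was fitted via clf.fit(...)
fresh = clone(clf)                     # this is what cross_validate works with
print(hasattr(fresh, 'best_params_'))  # False - the clone is unfitted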
answered Mar 6 at 19:17 by Mohammed Kashif (edited Mar 7 at 22:26)