10-fold cross validation


Solution 1

In short: Training is the process of providing feedback to the algorithm in order to adjust the predictive power of the classifier(s) it produces.

Testing is the process of determining the realistic accuracy of the classifier(s) which were produced by the algorithm. During testing, the classifier(s) are given never-before-seen instances of data to do a final confirmation that the classifier's accuracy is not drastically different from that during training.

However, you're missing a key step in the middle: the validation (which is what you're referring to in the 10-fold/k-fold cross validation).

Validation is (usually) performed after each training step in order to help determine whether the classifier is being overfitted. The validation step does not provide any feedback to the algorithm to adjust the classifier; it only checks whether overfitting is occurring and signals when training should be terminated.

Think about the process in the following manner:

1. Train on the training data set.
2. Validate on the validation data set.
if(change in validation accuracy > 0)
   3. repeat steps 1 and 2
else
   3. stop training
4. Test on the testing data set.
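
To make that loop concrete, here is a minimal sketch in Python, assuming scikit-learn (which is not mentioned in the answer); the SGDClassifier and the random data are just stand-ins for whatever classifier and data set you actually have:

    # Sketch of the train/validate/test loop above (assumes scikit-learn).
    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    # Stand-in data: 1000 instances, 20 features, 2 classes.
    rng = np.random.RandomState(0)
    X = rng.randn(1000, 20)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Split into training, validation and testing sets.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    clf = SGDClassifier(random_state=0)
    best_val_acc = 0.0
    for epoch in range(100):
        clf.partial_fit(X_train, y_train, classes=np.unique(y))  # 1. train
        val_acc = clf.score(X_val, y_val)                        # 2. validate
        if val_acc > best_val_acc:                               # accuracy still improving?
            best_val_acc = val_acc                               #    repeat steps 1 and 2
        else:
            break                                                #    stop training

    print("test accuracy:", clf.score(X_test, y_test))           # 4. test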

Solution 2

In the k-fold method, you divide the data into k segments: k-1 of them are used for training, while the remaining one is left out and used for testing. This is done k times: the first time, the first segment is used for testing and the rest are used for training; the second time, the second segment is used for testing and the rest are used for training; and so on. It is clear from your example of 10-fold, so it should be simple; read it again.
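
As a sketch of that rotation, assuming Python with scikit-learn's KFold (the toy arrays are placeholders for real data):

    # Sketch of the k-fold rotation described above (assumes scikit-learn).
    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(100).reshape(50, 2)   # toy data: 50 instances, 2 features
    y = np.arange(50) % 2               # toy labels

    for i, (train_idx, test_idx) in enumerate(KFold(n_splits=10).split(X), start=1):
        # k-1 segments are used for training, the remaining one for testing
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]
        print("fold %d: train on %d instances, test on %d" % (i, len(train_idx), len(test_idx)))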

Now about what training is and what testing is:

Training in classification is the part where a classification model is created using some algorithm; popular algorithms for creating such models are ID3, C4.5, etc.

Testing means evaluating the classification model by running it over the test data, then creating a confusion matrix and calculating the accuracy and error rate of the model.
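
A small sketch of that evaluation step, assuming Python with scikit-learn (its DecisionTreeClassifier is CART rather than ID3/C4.5, but it plays the same role here; the random data is a placeholder):

    # Sketch: test a trained classifier via a confusion matrix (assumes scikit-learn).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import confusion_matrix, accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.randn(300, 4)
    y = (X[:, 0] > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # training
    y_pred = model.predict(X_test)                                        # testing

    print(confusion_matrix(y_test, y_pred))        # rows: actual class, columns: predicted class
    acc = accuracy_score(y_test, y_pred)
    print("accuracy:", acc, "error rate:", 1 - acc)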

In the k-fold method, k models are created (as is clear from the description above) and the most accurate model for classification is selected.

Author by Nickool

Updated on July 09, 2022

Comments

  • Nickool almost 2 years

    In k-fold we have this: you divide the data into k subsets of (approximately) equal size. You train the net k times, each time leaving out one of the subsets from training, but using only the omitted subset to compute whatever error criterion interests you. If k equals the sample size, this is called "leave-one-out" cross-validation. "Leave-v-out" is a more elaborate and expensive version of cross-validation that involves leaving out all possible subsets of v cases.

    What do the terms training and testing mean? I can't understand.

    Would you please tell me some references where I can learn this algorithm with an example?

    Train classifier on folds: 2 3 4 5 6 7 8 9 10; Test against fold: 1
    Train classifier on folds: 1 3 4 5 6 7 8 9 10; Test against fold: 2
    Train classifier on folds: 1 2 4 5 6 7 8 9 10; Test against fold: 3
    Train classifier on folds: 1 2 3 5 6 7 8 9 10; Test against fold: 4
    Train classifier on folds: 1 2 3 4 6 7 8 9 10; Test against fold: 5
    Train classifier on folds: 1 2 3 4 5 7 8 9 10; Test against fold: 6
    Train classifier on folds: 1 2 3 4 5 6 8 9 10; Test against fold: 7
    Train classifier on folds: 1 2 3 4 5 6 7 9 10; Test against fold: 8
    Train classifier on folds: 1 2 3 4 5 6 7 8 10; Test against fold: 9
    Train classifier on folds: 1 2 3 4 5 6 7 8 9;  Test against fold: 10  
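
    A few lines of Python reproduce the schedule above, just to show the rotation mechanically:

        # Print the 10-fold train/test schedule listed above.
        for test_fold in range(1, 11):
            train_folds = [f for f in range(1, 11) if f != test_fold]
            print("Train classifier on folds:", train_folds, "; Test against fold:", test_fold)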
    
  • Nickool over 12 years
    Thank you SpeedBirdNine, both were perfect; I chose the earlier one.
  • Dr. Jekyll over 7 years
    "the most accurate model for classification is the selected". I disagree here. The purpose of the k-fold method is to test the performance of the model without the bias of dataset partition by computing the mean performance (accuracy or else) on all k partitions. If you select the best partition, you completely bias the results on your advantage and if you are writing a scientific paper (for example ...), your peers should not accept the paper for this reason.