Weka: How to prepare a test set in Weka


Solution 1

Steps to prepare the test set:

  1. Create a training set in CSV format.
  2. Create the test set in CSV format as well, with the same number of attributes and the same attribute types.
  3. Copy the test set rows, paste them at the end of the training set, and save the result as a new CSV file.
  4. Import the CSV file saved in step 3 using Weka>>Explorer>>Preprocess.
  5. Under Filter, click Choose and select filters>>unsupervised>>instance>>RemoveRange.
  6. Click the text field that reads RemoveRange -R first-last.
  7. Specify the range you want to remove. For example, if the training data had 100 instances, enter first-100 and apply the filter, so that only the test instances remain.
  8. Save the result as an ARFF file; this can be used as the test set.
  9. Then use this file as the test set. If you still get errors, reply to this post.
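
The row selection in steps 3 to 7 can be sketched outside Weka as well. The snippet below is a minimal plain-Python illustration, not Weka code: `remove_first_instances` and the three-column sample data are hypothetical names and values chosen for the example, and the sketch does not reproduce Weka's attribute-type inference on import.

```python
import csv
import io


def remove_first_instances(combined_csv: str, n_train: int) -> str:
    """Keep the header row and drop the first n_train data rows.

    Mirrors the effect of removing the range first-n_train from a file
    in which the test rows were pasted after the training rows.
    """
    rows = list(csv.reader(io.StringIO(combined_csv)))
    header, data = rows[0], rows[1:]
    if n_train > len(data):
        raise ValueError("n_train exceeds the number of data rows")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data[n_train:])  # only the pasted test rows remain
    return out.getvalue()


# Hypothetical combined file: two training rows followed by one test row.
combined = "mfe,GB,Outcome\n-24.3,1,Yes\n-24.8,2,No\n-24.4,1,?\n"
test_only = remove_first_instances(combined, n_train=2)
print(test_only)
```

The header row is kept so that the remaining rows still carry the same attribute names as the training data.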

Solution 2

If you want to avoid that hassle, you can write the test set directly, using exactly the same attribute names, data types, and value ranges as in your training set, and filling in the attribute values. The class attribute must still be declared, but its value in every instance should be a question mark (?). For instance, your training set can be converted to a test set as follows:

    @relation whatever-TEST

    @attribute mfe numeric
    @attribute GB numeric
    @attribute GTB numeric
    @attribute Seeds numeric
    @attribute ABP numeric
    @attribute AU_Seed numeric
    @attribute GC_Seed numeric
    @attribute GU_Seed numeric
    @attribute UP numeric
    @attribute AU numeric
    @attribute GC numeric
    @attribute GU numeric
    @attribute A-U_L numeric
    @attribute G-C_L numeric
    @attribute G-U_L numeric
    @attribute (G+C) numeric
    @attribute MFEi1 numeric
    @attribute MFEi2 numeric
    @attribute MFEi3 numeric
    @attribute MFEi4 numeric
    @attribute dG numeric
    @attribute dP numeric
    @attribute dQ numeric
    @attribute dD numeric
    @attribute Outcome {Yes,No}


    @data
    -24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,?
    -24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,?
    -24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,?
    -24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,?
    -24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,?

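
For larger files, the same rewrite can be scripted instead of edited by hand. The following is a rough sketch in plain Python, not part of Weka: `mask_class_values` is a hypothetical helper, and it assumes a dense (non-sparse) ARFF file in which the class is the last attribute and no quoted value contains a comma.

```python
def mask_class_values(arff_text: str, suffix: str = "-TEST") -> str:
    """Copy an ARFF file, replacing the last value of each instance with '?'.

    Assumptions (not checked): dense ARFF, class attribute last,
    no commas inside quoted values.
    """
    out = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if not in_data:
            if lower.startswith("@relation"):
                line = stripped + suffix  # e.g. "whatever" -> "whatever-TEST"
            elif lower == "@data":
                in_data = True
            out.append(line)  # header lines are otherwise kept unchanged
        elif stripped and not stripped.startswith("%"):
            # Split off the last field (the class label) and mask it.
            features, _, _label = stripped.rpartition(",")
            out.append(features + ",?")
        else:
            out.append(line)  # blank lines and comments in the data section
    return "\n".join(out) + "\n"


# Shortened, hypothetical two-attribute version of the training set.
train = (
    "@relation whatever\n\n"
    "@attribute mfe numeric\n"
    "@attribute Outcome {Yes,No}\n\n"
    "@data\n"
    "-24.3,Yes\n"
    "-24.8,No\n"
)
test_arff = mask_class_values(train)
print(test_arff)
```

The attribute declarations are copied verbatim, so the test header keeps the same names, types, and class values as the training header.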
Author by

ramko

Research Scholar

Updated on June 14, 2022

Comments

  • ramko
    ramko almost 2 years

    I have been using an SVM classifier with the following data:

    @relation whatever
    
    @attribute mfe numeric
    @attribute GB numeric
    @attribute GTB numeric
    @attribute Seeds numeric
    @attribute ABP numeric
    @attribute AU_Seed numeric
    @attribute GC_Seed numeric
    @attribute GU_Seed numeric
    @attribute UP numeric
    @attribute AU numeric
    @attribute GC numeric
    @attribute GU numeric
    @attribute A-U_L numeric
    @attribute G-C_L numeric
    @attribute G-U_L numeric
    @attribute (G+C) numeric
    @attribute MFEi1 numeric
    @attribute MFEi2 numeric
    @attribute MFEi3 numeric
    @attribute MFEi4 numeric
    @attribute dG numeric
    @attribute dP numeric
    @attribute dQ numeric
    @attribute dD numeric
    @attribute Outcome {Yes,No}
    
    
    @data
    -24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,Yes
    -24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,No
    -24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,Yes
    -24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,Yes
    -24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,Yes
    

    This is my training set, in which each instance is labeled as the Yes or No class. My test data comes from an unknown source, and I don't know which class each instance belongs to. Without the Outcome attribute, Weka reports a "Data mismatch" error. How should I prepare the test set so that the SVM can separate my instances into the Yes and No classes?

  • Admin
    Admin over 8 years
    Nice one :) very useful :)
  • UserK
    UserK about 8 years
    Can I create a test set with a different number of attributes?
  • adev
    adev over 6 years
    Hi Rushdi, as you wrote, I replaced the class value with ?, but the "Classifier Output" shows only zeros and NaN. What should I do?
  • Rushdi Shams
    Rushdi Shams about 6 years
    @adev, if you want to evaluate the model, you need to know the true labels of the data points. Otherwise you will see NaNs: the model still predicts a label, but the '?' in the data leaves no ground truth to compare the predictions against.
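
The point about NaNs can be illustrated with a small sketch. This is plain Python written for illustration, not Weka's evaluation code: `accuracy` is a hypothetical helper that scores only the instances whose true label is known.

```python
import math


def accuracy(predicted, actual, missing="?"):
    """Fraction of correct predictions among instances with a known label."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != missing]
    if not pairs:
        # Nothing to compare against: the metric is undefined.
        return math.nan
    return sum(p == a for p, a in pairs) / len(pairs)


preds = ["Yes", "No", "Yes", "Yes"]

labeled = accuracy(preds, ["Yes", "No", "No", "Yes"])
unlabeled = accuracy(preds, ["?"] * 4)

print(labeled)                 # 0.75
print(math.isnan(unlabeled))   # True
```

With every label set to '?', predictions are still produced, but there is no ground truth left to score them against.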