split training data and testing data

abdulaziz marie on 18 Jan 2018

Commented: Abhijit Bhattacharjee on 4 Mar 2023

Accepted Answer: Akira Agata

Hello, I have a 54000 x 10 matrix. I want to split it into 70% training and 30% testing. What's the easiest way to do that?

1 Comment

Delvan Mjomba on 6 Jun 2019

Use the randperm command to ensure random splitting. It's very easy.

For example:

If you have 150 items to split for training and testing, proceed as below:

% data is the variable that holds your 150-row data set
indices = randperm(150);

% first 105 shuffled rows (70%) for training
Trainingset = data(indices(1:105),:);

% remaining 45 rows (30%) for testing
Testingset = data(indices(106:end),:);
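The same randperm idea can be sketched for the 54000 x 10 matrix from the question. The random matrix is only a stand-in for the real data, and the variable names and fixed seed are assumptions made for illustration.

```matlab
% Sketch: 70/30 random split with randperm (base MATLAB, no toolbox needed)
rng(1);                          % optional: fixed seed for a repeatable split
data = rand(54000,10);           % stand-in for the real 54000 x 10 matrix
n = size(data,1);
nTrain = round(0.7*n);           % number of training rows
idx = randperm(n);               % random permutation of the row indices
dataTrain = data(idx(1:nTrain),:);       % first 70% of the shuffled rows
dataTest  = data(idx(nTrain+1:end),:);   % remaining 30%
```

Computing nTrain from the row count avoids hard-coding the split point when the data size changes.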

Accepted Answer

Akira Agata on 18 Jan 2018

Edited: the cyclist on 16 Aug 2022

I would recommend using cvpartition, like:

% Sample data (54000 x 10)
data = rand(54000,10);
% Cross validation (train: 70%, test: 30%)
cv = cvpartition(size(data,1),'HoldOut',0.3);
idx = cv.test;
% Separate to training and test data
dataTrain = data(~idx,:);
dataTest  = data(idx,:);
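A small variation on the code above, as a sketch: fixing the random seed makes the partition repeatable, and the training and test functions of the cvpartition object return the two logical index vectors directly. The seed choice and the size printout are additions for illustration only.

```matlab
% Sketch: repeatable hold-out split with cvpartition
% (cvpartition requires Statistics and Machine Learning Toolbox)
rng('default');                             % same partition on every run
data = rand(54000,10);                      % sample data, as in the answer
cv = cvpartition(size(data,1),'HoldOut',0.3);
dataTrain = data(training(cv),:);           % rows flagged for training
dataTest  = data(test(cv),:);               % rows flagged for testing
fprintf('train: %d rows, test: %d rows\n', cv.TrainSize, cv.TestSize);
```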

11 Comments

Rishikesh Shetty on 9 Jan 2023

Hi Akira,

Thank you for this straightforward approach.

After following these steps, I was able to predict my model accuracy as expected.

My next question is: how do I split my data for all possible combinations?

For example, I have a 13*2 array that will split 70/30 into 9*2 (training) and 4*2 (testing). I would like to repeat this split for all possible combinations (13C9) and then obtain an average of the model prediction accuracy.

Any advice is deeply appreciated.

Abhijit Bhattacharjee on 4 Mar 2023

Rishikesh,

The CVPARTITION function randomizes the selection of the training and test datasets, so to get a new random combination, just run it again. I am not sure it is advisable to try all combinatorial possibilities, as it is questionable whether that would return a much better model than you could get with considerably less effort. Just retrain with a new random partitioning a few times (say, 10 times) and average the results. This is repeated hold-out validation (sometimes called Monte Carlo cross-validation). It is related to, but not the same as, k-fold cross-validation, in which the data is divided into k disjoint folds and each fold is used once as the test set.

Best,

Abhijit
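Retraining on a few fresh random partitions and averaging the accuracy, as suggested above, might look like the sketch below. The 13 x 2 data and the threshold "model" are hypothetical placeholders and are not taken from the thread. Passing an integer to 'HoldOut' requests that many test observations, which gives the 9/4 split from the question.

```matlab
% Sketch: average test accuracy over repeated random 9/4 splits.
% The data and the threshold "model" are placeholders for illustration.
rng(0);                                      % fixed seed so the run is repeatable
data = [(1:13)', [zeros(6,1); ones(7,1)]];   % 13 x 2: one feature, one 0/1 label
numRepeats = 10;
acc = zeros(numRepeats,1);
for r = 1:numRepeats
    cv = cvpartition(size(data,1),'HoldOut',4);   % 4 test rows, 9 training rows
    trainData = data(training(cv),:);
    testData  = data(test(cv),:);
    % Placeholder model: threshold halfway between the two class means
    mu0 = mean(trainData(trainData(:,2)==0,1));
    mu1 = mean(trainData(trainData(:,2)==1,1));
    pred = testData(:,1) > (mu0 + mu1)/2;
    acc(r) = mean(pred == testData(:,2));    % fraction of correct predictions
end
meanAcc = mean(acc);

% Exhaustive alternative: every way to choose 9 training rows out of 13
combos = nchoosek(1:13,9);                   % one combination per row
```

If every combination really is needed, row k of combos gives the training rows and setdiff(1:13, combos(k,:)) gives the matching test rows.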

More Answers (4)

Gilbert Temgoua on 19 Apr 2022

Edited: Gilbert Temgoua on 20 Apr 2022

I find dividerand very straightforward, see below:

    % sample data (54000 x 10); replace with your own matrix
    x = rand(54000, 10);
    % randomly select indexes to split data into 70%
    % training set, 0% validation set and 30% test set.
    [train_idx, ~, test_idx] = dividerand(size(x, 1), 0.7, 0, 0.3);
    % slice training data with train indexes
    % (take training indexes in all 10 features)
    x_train = x(train_idx, :);
    % select test data
    x_test = x(test_idx, :);

1 Comment

uma on 28 Apr 2022

How do I split the data into trainx, trainy, testx and testy, so that trainx and trainy have the same first dimension and testx and testy also have the same first dimension? For example, I have a 1000*9 dataset. trainx should contain 1000*9, trainy should contain 1000*1, testx should contain 473*9 and testy should contain 473*1.
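One way to keep the first dimensions matched is to build a single partition and apply the same row index to both the features and the targets. The sketch below assumes the features and targets are stored as separate variables X and y with one row per observation. The names, sizes and 70/30 ratio are illustrative and do not reproduce the sizes quoted in the question.

```matlab
% Sketch: apply one partition to features and targets so their rows stay aligned.
% (cvpartition requires Statistics and Machine Learning Toolbox)
rng(0);                           % repeatable split
X = rand(1000,9);                 % hypothetical feature matrix
y = rand(1000,1);                 % hypothetical target vector
cv = cvpartition(size(X,1),'HoldOut',0.3);
trainIdx = training(cv);          % logical index of training rows
testIdx  = test(cv);              % logical index of test rows
trainX = X(trainIdx,:);  trainY = y(trainIdx);
testX  = X(testIdx,:);   testY  = y(testIdx);
```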

Vrushal Shah on 14 Mar 2019

If we want to split the data set into training and testing sets, what is the best option to do that?

Jere Thayo on 28 Oct 2022

What if both the training and testing data are already in files, i.e. X_train.mat, y_train.mat, x_test.mat and y_test.mat?
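If the split already exists on disk, no partitioning step is needed and the files can simply be loaded. A minimal sketch follows. The variable names stored inside the files are unknown, so it takes the first variable found in each file, which is an assumption. The stand-in files are written to a temporary folder only so the example runs on its own.

```matlab
% Sketch: load pre-split data from MAT-files.
% Stand-in files are created first; with real data, skip this part and
% point "folder" at the directory that holds the existing files.
folder = tempname;
mkdir(folder);
X_train = rand(70,9);  save(fullfile(folder,'X_train.mat'),'X_train');
y_train = rand(70,1);  save(fullfile(folder,'y_train.mat'),'y_train');

% load returns a struct with one field per saved variable.
% Assumption: each file holds a single variable, so take the first field.
firstField = @(c) c{1};
loadFirst  = @(f) firstField(struct2cell(load(f)));
XTrain = loadFirst(fullfile(folder,'X_train.mat'));
yTrain = loadFirst(fullfile(folder,'y_train.mat'));

% Sanity check: features and targets must have the same number of rows
assert(size(XTrain,1) == size(yTrain,1));
```

The test files can be read the same way, after which the model is trained on the training pair and evaluated on the test pair.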

Syed Iftikhar on 1 Jan 2023

I have an input variable named 's' in which I have data only in columns. The size is 1000000. I want to split off 20% of it as test data and save that data in another variable, because I am going to use the test data in a Python script. Any idea how to do this?
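A possible approach, sketched under the assumption that s is a column vector with 1000000 elements: shuffle the indices with randperm, keep 20% as the test set, and save it to a MAT-file. The random s, the output file name and the temporary folder are placeholders. The '-v7' flag is chosen here because that MAT-file version can be read in Python with scipy.io.loadmat.

```matlab
% Sketch: hold out 20% of a long column vector and save it for use elsewhere
rng(0);                                   % repeatable split
s = rand(1000000,1);                      % stand-in for the real variable s
n = numel(s);
nTest = round(0.2*n);                     % number of test samples
idx = randperm(n);                        % shuffled indices
sTest  = s(idx(1:nTest));                 % 20% test data
sTrain = s(idx(nTest+1:end));             % remaining 80%
outFile = fullfile(tempdir,'s_test.mat'); % hypothetical output location
save(outFile,'sTest','-v7');              % version 7 MAT-file
```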
