Remove Rows that are entirely NaN
Show older comments
Hi,
I'm used to machine learning in Python, and I'm trying to get used to data cleaning to prepare a dataset for that in Matlab. I'm using the inflation dataset attached from the World Bank.
In short, I am trying to drop all rows that are completely null, because I need to be able to impute those NaNs, and eventually be able to attach my predictions to the same rows in the original dataset.
My process so far is to read the csv in as a table. drop all the text columns except for the country, reserve the orignal dataset for joining it back later, then normalize the data between 0 an 1 and then impute the nulls.
I've tried the following, but, I keep getting Incorrect number or types of inputs or outputs for function 'isnan'. error, and I'm not sure what I'm doing wrong.
%drop rows that are entirely NaN
%testing
N = table2cell(N);
N(cellfun(@(cell) any(isnan(cell(:))),N))={''};
empties = cellfun('isempty',N);
N(empties) = {NaN};
N(all(isnan(N),2),:) = [];
indices = find(N(:,2)==0);
N(indices,:) = [];
%testing
% N = table2array(N);
% out = sum(N,2);
Original code, minus the removing rows that are NaNs
%read in inflation dataset from worldbank.org
N = readtable('inflation.csv','NumHeaderLines',5);
%drop cols 2-4. All text data.
N(:,[2,3,4]) = [];
%reserve the text data for joining later.
n = N;
n(:,1) = [];
%normalize the dataset for neural network
n = normalize(n, 'range');
%impute nulls with nearest neighbor method
%n = table2array(n);
%n(n=='NaN') = nan;
n = knnimpute(n);
Accepted Answer
More Answers (2)
the cyclist
on 23 Dec 2022
Edited: the cyclist
on 23 Dec 2022
% I am reading from the file you posted here, but you can of course read your local file
T = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1241637/Inflation.csv",'NumHeaderLines',5);
size(T)
R = rmmissing(T);
size(R)
3 Comments
Tiffany
on 23 Dec 2022
the cyclist
on 24 Dec 2022
Ah, sorry, I misread what you wanted to do
You can still use this command, if you use the Name-Value pair to specify the minimum number of missing elements needed to warrant row removal. For example,
T = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/1241637/Inflation.csv",'NumHeaderLines',5);
size(T)
R = rmmissing(T,'MinNumMissing',62);
size(R)
Image Analyst
on 23 Dec 2022
Try this:
filename = 'Inflation.csv'
data = readmatrix(filename)
rowsToDelete = all(isnan(data), 2) % Rows where all columns are nan.
data(rowsToDelete, :) = [] % Delete those rows.
3 Comments
Image Analyst
on 24 Dec 2022
Edited: Image Analyst
on 24 Dec 2022
You said in your question title "Remove Rows that are entirely NaN" so that's why I used all(). To most of use entirely means all.
If you now want "to remove any row with a null, rather than rows with all nulls" then you should use any() rather than all():
filename = 'Inflation.csv'
data = readmatrix(filename)
rowsToDelete = any(isnan(data), 2) % Rows where any columns are nan.
data(rowsToDelete, :) = [] % Delete those rows.
Tiffany
on 24 Dec 2022
Image Analyst
on 24 Dec 2022
The usual recommendation is to avoid cell arrays if at all possible, in favor of a table. In your case you can use a table. Here is my code adapted to read your data into a table and remove any rows with a nan in them:
filename = 'Inflation.csv'
t = readtable(filename); % Read into table.
data = table2array(t(:, 5:end));
rowsToDelete = any(isnan(data), 2) % Rows where any columns are nan.
t(rowsToDelete, :) = [] % Delete those rows.
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!