How do I exclude certain columns from rmmissing and rmoutliers?

Hi, I have a university assignment and I want to remove missing data and outliers, but there are some columns that I don't want to be affected by this. Is there any way to do that?

 Accepted Answer

I would treat those as two separate operations.
First, remove the missing data from the entire matrix, not only from selected columns. The reason is to keep the matrix column lengths the same, so that the rows that remain still line up across all columns.
Removing the outliers is similar.
I would instead use fillmissing for the missing data, and then selectively use filloutliers for the columns you want to process with it. This keeps the matrix structure intact.
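As a minimal sketch of that two-step approach, assuming a small made-up matrix in which only column 2 should be cleaned of outliers (the data, the variable names `B` and `cols`, and the default outlier method are illustrative assumptions, not part of the original question):

```matlab
% Hypothetical 6x3 matrix: one NaN in column 1, one obvious outlier in column 2.
% Column 3 stands in for a column that must not be modified.
A = [ 1    10    0
      2    11   15
      NaN  12   30
      4    500  45
      5    14    0
      6    15   15 ];

cols = 2;   % columns to clean for outliers (assumed for this sketch)

% Step 1: fill missing values in every column by linear interpolation.
B = fillmissing(A, 'linear');

% Step 2: replace outliers only in the selected columns.
% The default detection method (scaled MAD about the median) is used here.
B(:, cols) = filloutliers(B(:, cols), 'linear');

disp(B)
```

The result has the same size as `A`: the NaN is interpolated from its neighbours, the 500 in column 2 is replaced by an interpolated value, and column 3 is left unchanged.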
To select the columns, just choose the ones you want to process —
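A = randn(10, 7);                           % 10×7 matrix of standard normal values
A(randi(numel(A),1,10)) = 10*randn(1,10)    % overwrite up to 10 randomly chosen elements with larger values to mimic outliers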
A = 10×7
    0.4947    0.4777   -0.7853    1.4267    0.2754    0.8703    0.1128
   -1.1803   -0.4330   -6.5968    6.1281   -1.2801   -0.0351   -0.0035
   -0.8221   -0.1481    0.3624   -0.2828   -0.8384    0.4235    0.8282
    2.1646   -1.4000    0.8089    0.6031    3.2114    0.3854   -0.9425
    0.5259    0.4188    0.7603    0.6749    0.1443   -0.1284    1.4667
   -0.9490   -0.9427    0.0424    0.6979   30.1904   -0.0591   24.3082
   -0.1488   -0.2924    0.0388   -3.3811    1.9964   -0.1855    1.0776
   -0.5285    0.5173    0.1483    1.0972   -0.9505   -0.7633    1.3687
    0.5081   -0.2370   10.6356   -0.5512    0.0538    0.5488   -0.7221
   13.2377   -0.8052    0.2765   -1.1534   -1.3314   -6.0683   -0.4741
% Detect outliers in columns 2, 5 and 7 with Grubbs' test and replace them by linear interpolation
Afilloutliers = filloutliers(A(:,[2 5 7]), 'linear','grubbs')
Afilloutliers = 10×3
    0.4777    0.2754    0.1128
   -0.4330   -1.2801   -0.0035
   -0.1481   -0.8384    0.8282
   -1.4000    3.2114   -0.9425
    0.4188    0.1443    1.4667
   -0.9427    1.0704    1.2722
   -0.2924    1.9964    1.0776
    0.5173   -0.9505    1.3687
   -0.2370    0.0538   -0.7221
   -0.8052   -1.3314   -0.4741
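Aedited = A;                            % start from the original matrix
Aedited(:,[2 5 7]) = Afilloutliers;     % write the cleaned columns back; the other columns are unchanged
figure
plot(1:10, A)                           % original data, one line per column
grid
ylim([-50 50])
figure
plot(1:10, Aedited)                     % same axes limits, after filling outliers in columns 2, 5 and 7
grid
ylim([-50 50])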
EDIT — (18 May 2023 at 19:00)
Added example.

4 Comments

The problem is that I have an hour column and a minute column, both of which contain zeros. Those zeros don't mean the data is missing. I want to remove the rows where anything other than the hour and minute columns is an outlier or missing.
I do not have your matrix, so select only the columns other than the hour and minute columns if that is what you want to do.
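Since the actual matrix is not available, here is a hedged sketch with made-up data in which columns 1 and 2 are assumed to be hour and minute. It builds logical masks from the remaining columns only and then removes whole rows; `data`, `timeCols`, `checkCols`, `badRows` and `cleaned` are hypothetical names:

```matlab
% Hypothetical data: columns 1-2 are hour and minute, columns 3-4 are measurements.
data = [ 0   0   1.0   10
         0  30   1.1   11
         1   0   NaN   12
         1  30   0.9  500
         2   0   1.2   13
         2  30   1.0   12 ];

timeCols  = [1 2];                                  % columns excluded from the checks
checkCols = setdiff(1:size(data, 2), timeCols);     % every other column

% Logical masks computed only on the measurement columns.
missingMask = ismissing(data(:, checkCols));
outlierMask = isoutlier(data(:, checkCols));        % default method: scaled MAD about the median

% Flag a row if any checked column is missing or an outlier.
badRows = any(missingMask | outlierMask, 2);

% Remove whole rows, so the hour and minute columns stay aligned with the rest.
cleaned = data(~badRows, :);
disp(cleaned)
```

Because the zeros in the hour and minute columns are never passed to `ismissing` or `isoutlier`, they cannot cause a row to be dropped. A different detection method can be given to `isoutlier` if the default does not suit the data.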
Also, zero values are not ‘missing’ from the MATLAB perspective. Data that are missing are either NaN or NaT (Not a Time, for datetime arrays).
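This can be checked directly with `ismissing`. The vectors below are made-up examples, and the date is arbitrary:

```matlab
% Zeros are ordinary values; only NaN counts as missing in a numeric array.
x = [0 5 NaN 12 0];
tfMissing = ismissing(x);

% For datetime arrays the missing-value marker is NaT.
t = [datetime(2023, 5, 18) NaT];
tfMissingTime = ismissing(t);

disp(tfMissing)
disp(tfMissingTime)
```

Only the NaN and NaT positions are flagged, so zeros in an hour or minute column are not treated as missing.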
So if there are no NaN or NaT values in your data, then you only need to fill the outliers. I am not sure how filloutliers would handle hour rollovers (between days) and minute rollovers (between hours), so you may want to selectively exclude those columns. My code example will work for that.
It would likely be easier to convert the hour and minute (and other associated date or time fields if they exist) to datetime arrays anyway. That will likely make much of the rest of what you want to do easier.
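One possible way to do that conversion, sketched with made-up hour and minute vectors. The variable names are hypothetical, and the calendar date is an arbitrary assumption, since the question does not say whether date fields exist:

```matlab
% Hypothetical hour and minute columns.
hr = [0; 0; 1; 1; 23];
mn = [0; 30; 0; 30; 59];

% Combine them into a single time-of-day value (a duration array).
tod = hours(hr) + minutes(mn);

% If a calendar date is known (assumed here), adding it gives a datetime array.
t = datetime(2023, 5, 18) + tod;

disp(t)
```

With a single time variable, the numeric hour and minute columns no longer need to be excluded by hand, because outlier functions can then be applied to the measurement columns alone.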
Okay thank you! You've been super helpful.

Release

R2023a
