How to filter out useless data

Angelo Catania on 23 Oct 2015
Edited: Thorsten on 28 Oct 2015
Hi everyone, I need to clean a big dataset (more than 1.5 million observations) to exclude all meaningless or useless observations. Each observation comes with several variables (price, delta, implied volatility, etc.), and I need to get rid of any observation whose implied volatility is more than 100%. Moreover, for many observations the implied volatility is simply missing (I have a blank cell). So, for any value in the "implied volatility" column that is missing or > 1, I want MATLAB to remove the corresponding observation, that is, the entire row. How could I do that in a smart and quick way? (I am a beginner in MATLAB.) Thanks
  4 Comments
dpb on 23 Oct 2015
Start with the "Getting Started" section in the MATLAB documentation and spend a few minutes getting familiar with the basic concepts of array and cell notation, etc. It'll be time well spent, in that it'll be much quicker than waiting on answers here, particularly when you don't yet have the vocabulary to describe the problem accurately.
On that last point, what does
whos _yourvariablename_
return? That will tell us how the data is actually stored.
yourvariablename is, of course, whatever name you are using for the data (data, x, or anything else), not a literal string.
Nick Hobbs on 27 Oct 2015
Edited: Nick Hobbs on 27 Oct 2015
I understand you want to remove rows from your cell array based on information in your data. The following documentation link may help you with your goal.
The following link provides an example on how to remove a row from a cell array.


Answers (2)

Image Analyst on 27 Oct 2015
Check out the "ismissing()" function.
And to remove rows from your table with volatility more than 100, I think you can do this (untested):
badRows = mytable.volatility > 100; % use > 1 instead if volatility is stored as a fraction (1 = 100%)
mytable(badRows,:) = []; % delete the flagged rows
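
To connect the two suggestions in this answer, here is a minimal sketch that combines ismissing() with the threshold test in one logical mask. The table contents are made up for illustration, the column name volatility simply follows the snippet above, and the sketch assumes implied volatility is stored as a fraction (so 1 means 100%, as in the question).

```matlab
% Hypothetical sample data; only the column name "volatility" comes from the answer above.
price      = [10; 12; 9; 11];
volatility = [0.25; NaN; 1.40; 0.80];   % assumed to be fractions, so 1 means 100%
mytable    = table(price, volatility);

% Flag rows whose volatility is missing (NaN) or greater than 100%.
badRows = ismissing(mytable.volatility) | mytable.volatility > 1;

% Delete the flagged rows; the remaining rows keep their original order.
mytable(badRows, :) = [];
```

If the blank cells were imported as something other than NaN (for example empty strings in a cell column), the ismissing() test would need to be applied to that column type instead, so checking the output of whos or summary(mytable) first is probably worthwhile.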

Thorsten on 27 Oct 2015
Edited: Thorsten on 28 Oct 2015
iv = data(:,3); % implied volatility, assumed to be stored in column 3
idx = isnan(iv) | iv > 1; % logical array of indices
data(idx,:) = []; % remove all rows where idx is true
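
As a small worked illustration of the same idea, the sketch below builds a "keep" mask instead of deleting rows in place, which leaves the original matrix untouched. It assumes the data is a plain numeric matrix, that blank cells were imported as NaN, and that implied volatility sits in column 3 as above; the sample values and the names keep and cleaned are invented for the example.

```matlab
% Hypothetical 4-by-3 numeric matrix; column 3 is assumed to hold implied volatility.
data = [10 0.5 0.25;
        12 0.4 NaN;     % missing volatility
         9 0.6 1.40;    % volatility above 100%
        11 0.3 0.80];

iv = data(:,3);                 % implied volatility column

% Keep rows whose volatility is present and at most 1 (100%).
% Comparisons with NaN are always false, so iv <= 1 alone would already
% drop the missing rows; the isnan test just makes the intent explicit.
keep = ~isnan(iv) & iv <= 1;

cleaned = data(keep,:);         % new matrix containing only the valid rows
```

Logical indexing like this is vectorized, so it should scale to the 1.5 million rows mentioned in the question without an explicit loop.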
