Delete column and the row if there is a duplicate value in the other column

2 views (last 30 days)
I have the below data (around 50000 rows):
Symbol, Data, Time, Price, Volume
CLF7,01/03/1996,12:54:04,17.58,0
CLF7,01/03/1996,13:31:19,17.57,0
CLF7,01/03/1996,15:07:42,17.57,0
CLF7,01/03/1996,15:07:45,17.54,0
CLF7,01/03/1996,15:10:03,17.12,0
CLF7,01/03/1996,12:54:04,17.12,0
CLF7,01/03/1996,13:27:39,17.54,0
CLF7,01/03/1996,14:35:40,17.57,0
CLF7,02/03/1996,12:54:04,17.57,0
CLF7,02/03/1996,13:31:19,17.57,0
CLF7,02/03/1996,15:07:42,17.54,0
CLF7,02/03/1996,15:07:45,17.54,0
CLF7,02/03/1996,15:10:03,17.12,0
CLF7,02/03/1996,12:54:04,17.12,0
CLF7,02/03/1996,13:27:39,17.54,0
CLF7,02/03/1996,14:35:40,17.57,0
Could you give some advice for my require: Delete the column 5. I already tried the below code, but it is not efficient
for
data(:,5)=[];
end
In the column Price, if the line of succession has duplicated values, I want to keep the last row. It means I need the data as follows:
Symbol, Data, Time, Price, Volume
CLF7,01/03/1996,12:54:04,17.58
CLF7,01/03/1996,15:07:42,17.57
CLF7,01/03/1996,15:07:45,17.54
CLF7,01/03/1996,12:54:04,17.12
CLF7,01/03/1996,13:27:39,17.54
CLF7,02/03/1996,13:31:19,17.57
CLF7,02/03/1996,15:07:45,17.54
CLF7,02/03/1996,12:54:04,17.12
CLF7,02/03/1996,13:27:39,17.54
CLF7,02/03/1996,14:35:40,17.57
Thanks for your help.

Answers (2)

the cyclist
the cyclist on 5 Jun 2016
Here's one way:
% Some pretend data
R = round(rand(6,5));
% Remove 5th column -- There is no more efficient way, and this should be very fast
R(:,5) = [];
% Identify repeated values
repeatValueRowToKeep = [find(diff(R(:,4))~=0); size(R,1)];
% Keep only that last row
R = R(repeatValueRowToKeep,:);
  5 Comments
Image Analyst
Image Analyst on 6 Jun 2016
The comments on that are all messed up. You botched the adaption of the code. You're confusing volume and data. It should look like
% Try to remove the field called Volume, if there is one.
% First see if there is a field of "data" called "Volume".
hasField = isfield(data, 'Volume'); % Will be True or False.
% Now remove it if it's there.
if hasField
% The Volume field is there. Remove it.
data = rmfield(data, 'Volume');
else
% The Volume field is not there -- warn the user.
warningMessage = sprintf('Warning: the structure "data"\ndoes not have a field called "Volume."');
uiwait(warndlg(warningMessage));
end

Sign in to comment.


Image Analyst
Image Analyst on 5 Jun 2016
You say "Delete column and the row if there is a duplicate value in the other column" but I don't see in your example how any column was removed. There are still 5 columns.
Is your data a cell array or a table? Assuming it's a table, I think you can delete rows where columns match like this example
col2 = t{:, 2} % Extract column 2.
col3 = t{:, 3} % Extract column 3.
% Find where col 2 = col 3
duplicateRows = col2 == col3;
% Remove those rows where column 2 = column 3
t(duplicateRows, :) = []; % Setting whole row to null removes the row.
(Note, this is untested, just off the top of my head).
  1 Comment
Trung Hieu Le
Trung Hieu Le on 8 Jun 2016
Sorry. I just edit the output data. It shall be deleted the column 5. My data is an array. I just take care about the column 4. I expect to find the continuously duplicated price and keep the last value. I tried with below code, but it does not work:
dim = size(data);
data_3 = zeros(dim);
i = 1; j = 1;
data_3(i,:) = data(i,:);
for i=2:dim(1)
if data(i,4) != data(i-1, 4)
j = j+1;
data_3(j,:) = data(i,:);
end
end
data_4 = data_3([1:j], :);
Do you have any idea about my code?
Thanks

Sign in to comment.

Categories

Find more on Price and Analyze Financial Instruments in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!