Unique function is not working as expected

I have a problem with the unique-function and I can't find the reason why.
I import a table vom excel and afterwards I want to use [table,~] = unique(table,'rows');
It worked until recently. The only change I remember I did, was that I adjusted the timeformat.
Now I have doubled rows in my table, even after using [table,~] = unique(table,'rows');.
table = [];
for i=1:length(files_table_data)
path_table = fullfile(files_table_data(i).name);
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');

4 Comments

Doing the following will help us help you:
  • Upload the table in a MAT file, so we can work with it. (FYI, it's generally a bad idea to give a variable the same name as a MATLAB command.)
  • Tell us what you mean by "it's not working as expected". Be specific.
Thank you for your tips. I'm not sure, how to upload the table as a .mat file, because it's sensitive data. The import of the table works, but it has still repeating rows/rows with the same values/data.
An extraction from the table (output) would be:
Name Date Number
"378478409220" 2021-03-22 23:24 1
"378478409220" 2021-03-22 23:24 2
"378478409220" 2021-03-22 23:24 3
"378478409220" 2021-03-22 23:24 4
"378478409295" 2020-12-16 11:28 NaN
"378478409295" 2020-12-16 11:28 4
"378478409295" 2020-12-16 11:28 4
"378478409295" 2020-12-17 04:17 1
"378478409295" 2020-12-17 04:17 1
The last four rows are repeating. I would like to keep only one row per combination of values.
The last two rows -- as you have written them -- are obviously identical.
If I create a table using exactly what you have written, they will be identical, and the unique command will get rid of one of them. (Unless there is a bug in the unique command, which I doubt.)
This is why uploading the actual data, not just typing what it looks like, is so important to replicating the problem.
Can you create a very small segment of the data that is not sensitive to upload? Perhaps just a few rows, and only the relevant columns? Then test that segment to make sure it still exhibits the bug? Then upload everything we need to replicate that bug.
I think with the Code and the data from the excel sheet it should be visible... You might need to adjust the data_path, so the code can find the Test.xlsx.
I needed to change the original Time-Format, because I compare this data afterwards with different data from other sources, that have different time-formats.
Thank you!
function test = import_test()
files_test = dir('../data/not_anonymized/Test*');
%% read table
table = [];
for i=1:length(files_test)
path_table = fullfile('..','data','not_anonymized', files_test(i).name);
%fprintf('Starting file %d/%d: %s\n',i,length(files_test),files_test(i).name)
opts = detectImportOptions(path_table);%excel
opts.VariableNames(1) = {'Name'};
opts.VariableNames(2) = {'Date'};
opts.VariableNames(3) = {'Number'};
opts = setvartype(opts,{'Date'},'datetime');%datetime
opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
opts = setvartype(opts,{'Name'},'string');
opts = setvartype(opts,{'Number'},'double');
opts.SelectedVariableNames = {'Name','Date','Number'};
table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');
end

Sign in to comment.

 Accepted Answer

First, do not name your table variable table. That overwrites MATLAB's function for constructing tables.
When I look at your excel file, I see that times record milliseconds to three decimal places. With times, you can modify how it is displayed without losing this level of detail. You have set your display format to 'yyyy-MM-dd HH:mm' but the actual data is still saved as 'yyyy-MM-dd HH:mm:ss.SSS XXX'.
Therefore, the unique command is working just fine. The times are not identical when looking at the actual data.
If you only want to compare your times to the minute, then you must actually modify the data to only be minutes. Perhaps something like this.
% Create a time with seconds and milliseconds, but set display format to
% HH:mm
d=datetime(2021,01,02,3,4,5,123,'Format','yyyy-MM-dd HH:mm')
d = datetime
2021-01-02 03:04
% Change display format so seconds and milliseconds are shown
d.Format = 'dd-MMM-yyyy HH:mm:ss.SSS'
d = datetime
02-Jan-2021 03:04:05.123
% set second and milliseconds to 0.
d.Second = 0
d = datetime
02-Jan-2021 03:04:00.000
For a datetime variable in a table, use dot notation: tbl.Date.Second = 0

5 Comments

Thank you! The actual table is not called table, that was for the code I posted here. But it's definetly good to know! I renamed table to test.
I tried now to add test.Date.Second = 0; below my import and before the unique-call. It is still not changing the output, my rows are still doubled. I am not sure, whether I am still doing a mistake or what the problem is...
...
test = [test; readtable(path_test,opts)];
end
test.Date.Second=0;
[test,~] = unique(test,'rows');
"378478409295" 2021-02-24 04:32:00.000 +01:00 2
"378478409295" 2021-02-24 04:32:00.000 +01:00 2
This is part of the Output, even if I don't change the DatetimeFormat.
In my actual table I have 3 rows for numbers, sometimes there are NaN-Values (In the excel files this entries are empty). Can the problem be connected to this kind of entries?
Without your data, or at least an example to look at, I have to believe you are not inspecting your data carefully enough to spot the difference.
Here's a simplified version using the test data set you shared.
path_table = "Ariane_Test.xlsx";
opts = detectImportOptions(path_table);%excel
opts = setvartype(opts,'Date','datetime');%datetime
opts = setvaropts(opts,'Date','InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,'Date','DatetimeFormat','yyyy-MM-dd HH:mm.ss.SSS');
opts = setvartype(opts,'Name','string');
test = readtable(path_table,opts);
test.Date.Second = 0
test = 63×3 table
Name Date Number ______________ _______________________ ______ "338778418867" 2020-10-05 18:34.00.000 3 "338778418867" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 1 "338778419063" 2020-10-05 18:34.00.000 NaN "338778419063" 2020-10-05 18:34.00.000 3 "338778419063" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 1 "338778419204" 2020-10-05 18:34.00.000 3 "338778419204" 2020-10-05 18:34.00.000 4 "338045763939" 2020-10-05 17:50.00.000 1 "378045659745" 2020-10-05 05:00.00.000 2 "378045659745" 2020-10-05 05:00.00.000 3 "378045659745" 2020-10-05 05:00.00.000 4 "378045659760" 2020-10-05 05:00.00.000 1 "378045659760" 2020-10-05 05:00.00.000 2
testU = unique(test,'rows');
tail(testU)
ans = 8×3 table
Name Date Number ______________ _______________________ ______ "378045659760" 2020-10-05 05:00.00.000 3 "378045659760" 2020-10-05 05:00.00.000 4 "378049751076" 2020-10-04 23:44.00.000 1 "378049751076" 2020-10-04 23:44.00.000 2 "378049751076" 2020-10-04 23:44.00.000 3 "378049751076" 2020-10-04 23:44.00.000 4 "378049751076" 2020-10-04 23:44.00.000 5 "378049751076" 2020-10-04 23:44.00.000 6
The last 6 have the same Name and Date, but Number is different.
Thank you very much!
I checked my data again, googled and found out, that unique is indeed seeing NaN as NaN~=NaN. Therefore I added:
test.Number(isnan(test.Number),:)=inf(size(test.Number(isnan(test.Number))));
[test,~] = unique(test,'rows');
test.Number(isinf(test.Number),:)=NaN(size(test.Number(isinf(test.Number))));
to my code and now it works! :)
Thank you very much for your help!

Sign in to comment.

More Answers (0)

Products

Release

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!