MATLAB Answers

How do I erase string duplicates in one column of a cell array and add the strings' corresponding numbers in the second column?

Matlab User on 31 Jul 2020 at 22:50
Commented: dpb on 2 Aug 2020 at 16:30
I am working with a cell array. It consists of two columns. One has a name, and the second has a number that corresponds to/is associated with that name. This is a sample:
Sarah 12
Marie 3
Sam 5
However, there are many duplicate names in the first column, and I want to get rid of the duplicates, add their corresponding numbers, and send both the name and the new corresponding number to another array.
Below, I have code that, given any value of b, will output the truenumber corresponding to Name2.
However, I was wondering:
  1. How do I make this code run automatically for all values of b in the array?
  2. How do I send Name2 and truenumber to a new array?
fid = fopen('array.txt');
array = textscan(fid,'%s %s');
a = 1
b = 50
while a < 21677
    Name1 = array{1}{a}
    Name2 = array{1}{b}
    if isequal(Name1, Name2) == 1
        X = str2num(array{2}{a})
        Y = str2num(array{2}{b})
        truenumber = X + Y
        display('This is the duplicate ^')
    else
        display('No duplicate, no change.')
    end
    a = a + 1
end


dpb on 1 Aug 2020 at 2:15
It would still be easier if you would attach a .mat file, but...can get by. Altho the sample is pretty weak in that there's only one duplicated entry.


Answers (1)

dpb on 1 Aug 2020 at 12:01
Edited: dpb on 1 Aug 2020 at 19:31
>> c
c =
6×2 cell array
{'Sarah' } {[12]}
{'Marie' } {[ 3]}
{'Sam' } {[ 5]}
{'Rose' } {[ 7]}
{'Sam' } {[ 6]}
{'Edward'} {[ 9]}
>> [u,~,ic]=unique(c(:,1))
u =
5×1 cell array
{'Edward'}
{'Marie' }
{'Rose' }
{'Sam' }
{'Sarah' }
ic =
5
2
4
3
4
1
>> histc(ic,unique(ic))
ans =
1
1
1
2
1
The first answer is u, the unique names in the first column (sorted alphabetically). ic gives, for each row of the original, the index of that row's name in u, and the last shows which group (4 -- 'Sam') has more than one member.
Use ic to pick the rows from c(:,2) to add for the associated data values by iterating over it.
(A clever one-line solution didn't come to me last night on doing it other than the loop altho there probably is one.)
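For what it's worth, the loop over ic described above might look something like the sketch below. It assumes the second column already holds numeric scalars, as in the sample c; the names vals, totals and counts are made up for illustration. It also sums each group, since that is what the question asked for.

```matlab
% Sample data from above: names in column 1, numbers in column 2
c = {'Sarah',12; 'Marie',3; 'Sam',5; 'Rose',7; 'Sam',6; 'Edward',9};

[u,~,ic] = unique(c(:,1));      % u: unique names (sorted); ic: group index of each row of c
vals   = cell2mat(c(:,2));      % numeric column as a 6x1 double (assumes numeric scalars)
totals = zeros(numel(u),1);     % one running total per unique name
for k = 1:numel(u)
    totals(k) = sum(vals(ic == k));   % add every value whose row belongs to group k
end
counts = accumarray(ic,1);      % members per group, same counts as the histc call
```

Here totals lines up row for row with u, so [u num2cell(totals)] would give the two-column result.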
I was almost there last night; I just didn't do one thing right -- encapsulate the output of cat() in a cell.
[u,~,ic]=unique(c(:,1)); % optional--use 'stable' to return in original order
new=[u accumarray(ic,[c{:,2}],[],@(v) {cat(1,v)})];
>> new
new =
5×2 cell array
{'Edward'} {[ 9]}
{'Marie' } {[ 3]}
{'Rose' } {[ 7]}
{'Sam' } {2×1 double}
{'Sarah' } {[ 12]}
>> new{4,:}
ans =
ans =
shows that it does put the right values where wanted/needed...
NB: Above returns in sorted alphabetic order of the names; the 'stable' option on unique would retain the existing order if that were to be significant.
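If the goal is the sum per name rather than the collected values, a variation along these lines may be enough. This is only a sketch under the same assumption that column 2 is numeric; sums and summed are illustrative names. It relies on accumarray summing by default when no function is given, and uses 'stable' to keep the names in order of first appearance.

```matlab
c = {'Sarah',12; 'Marie',3; 'Sam',5; 'Rose',7; 'Sam',6; 'Edward',9};

[u,~,ic] = unique(c(:,1),'stable');          % 'stable' keeps first-appearance order
sums     = accumarray(ic, cell2mat(c(:,2))); % default accumulation is a sum per group
summed   = [u num2cell(sums)];               % back to a two-column cell array
```

Dropping 'stable' would return the same totals in sorted alphabetic order instead.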


dpb on 1 Aug 2020 at 23:54
Well, we don't know what array.txt contains but looks like should be ok...
What does
whos c
show?
Again, as cyclist said, would be easier if we had the input array so we could see...
But, taking a guess, textscan has a penchant to wrap everything in a cell so you probably have a cell array in a cell.
fid = fopen('array.txt');
c = textscan(fid,'%s %s','collectoutput',1);
first to dereference back to a cell array not embedded in a cell.
Whether that will work as is will depend on the form of the input file which again, we don't know.
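As a rough sketch of that dereferencing step, and purely as a guess at the file layout: the block below writes a small stand-in file (array_demo.txt is a made-up name, not the real array.txt), reads it back with textscan, and unwraps the outer cell. Because '%s %s' reads both columns as text, the numbers are converted with str2double afterwards.

```matlab
% Hypothetical stand-in for array.txt, whose real layout is unknown
fid = fopen('array_demo.txt','w');
fprintf(fid,'Sarah 12\nMarie 3\nSam 5\nSam 6\n');
fclose(fid);

fid = fopen('array_demo.txt');
raw = textscan(fid,'%s %s','CollectOutput',1);  % 1x1 cell wrapping an N-by-2 cell array
fclose(fid);
c    = raw{1};                   % dereference to the N-by-2 cell array of char vectors
nums = str2double(c(:,2));       % second column arrives as text; convert to double
```

From there the unique/accumarray steps would use c(:,1) for the names and nums for the values.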
A little easier perhaps, try
and see if joy ensues w/o the bother of a file handle, fopen/fclose, etc., ...but, again, we don't know the form of the input file...did I mention before we don't have the input file structure? :)
dpb on 2 Aug 2020 at 16:30
Ah well...that's why I made one very trivial and wrote it down on a sticky on the old Mathworks membership card they used lo! so many years ago! :)
I'm pretty sure there's a way to retrieve or reset on an original account, but I don't know offhand exactly how that would be--I can't even recall if the Answers forum uses the same as the login account or can make another just for it...
