Problems with intersect and cellfun functions

I'm using the script below to try to find all the matches between a column in each of two Excel files. The intersect function returns indices that are out of bounds for the first cell (patternsizeP is 41351 x 1, dataP is 2000 x 1). The first cell of "ic" is 41352, and for "id" it is 6001. The first cell of D is then blank. When I try to pass all of this on to the next line, everything blows up and I get the error 'Index exceeds matrix dimensions'. The script seemed to work fine with other files with different dimensions. I'm pretty much a novice with this, so any suggestions as to what is going wrong are greatly appreciated.
[numdataP,patternsizeP]=xlsread('state.xlsx');
[~,dataP]=xlsread('pinellas.xlsx');
[D,ic,id] = intersect(patternsizeP,dataP)
indexP=cellfun(@(x)find(ismember(dataP,x)==1),D,'uniformoutput',false)

6 Comments

You've already got the indices of the matching values in the two arrays: ic is the location in patternsizeP and id is the location in dataP of the values returned as matching in D.
If there's a blank cell in D, then there was a blank in both inputs, and that is the expected result.
What are you now trying to do?
Sometimes it helps to show a small subset of the data to help those here visualize; you know what you're looking at, we can't see your terminal/monitor from here...
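To make the meaning of the three intersect outputs concrete, here is a minimal sketch on toy data. The arrays A and B and their values are made up for illustration and are not taken from the attached files.

```matlab
% Toy stand-ins for the two columns (hypothetical values, not the real files).
A = {'PIN1'; 'PIN2'; 'PIN3'};                   % unique codes, like the 'state' column
B = {'PIN2'; 'PIN3'; 'PIN2'; 'PIN9'; 'PIN2'};   % repeated codes, like the 'pinellas' column

[D, ic, id] = intersect(A, B);

% D lists each common value once.
% ic indexes into A and id indexes into B, so A(ic) and B(id) both equal D.
disp(D)
disp(isequal(A(ic), D) && isequal(B(id), D))

% Only one row of B is reported per common value, even though
% 'PIN2' occurs three times in B.
disp(numel(id))
```

Because intersect reports one index per common value, it cannot by itself list every repeated row of B; that is why it is not enough when one code has many matches.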
jonas on 6 Aug 2018
Edited: jonas on 6 Aug 2018
He is trying to find duplicates in dataP, whereas intersect only finds unique matching cells.
I know this from his previous question
+1, attach data
Gotcha', looks like you've got it in hand if OP will only cooperate... :)
Attached are the sample files. What I’ve been trying to do is the following. The file ‘state’ has in the first column single entries of different codes (e.g. ‘PIN159614’). The file ‘pinellas’ can have multiple matches to these codes (e.g. 10 to 30 matches to ‘PIN159614’). The goal is to find all the matches and then output them in one column of a table or array, along with other information taken from other columns of the ‘state’ and ‘pinellas’ files. So, for example, all the matches for PIN159614 from column 1 in ‘state’ are put in the first column of the output, all of the corresponding entries from column 2 in ‘pinellas’ are put in the next column, and the single entry from column 16 in ‘state’ (which would be repeated for each match in ‘pinellas’) is put in column 3, etc. So the final output could be something like
PIN159614 SO5 358
PIN159614 SOA 358
PIN159614 SOD 358
.
.
.
I had a script which did this using while/for loops, but it took days or even weeks to run with the full files (which are on the order of 700,000 x 25). The function-based code is very fast, but I can’t get it to index properly and return what I need. Attached are examples of the data. I’m really sorry if I breached some protocol regarding the help that I was truly unaware of (i.e. the comment regarding OP cooperating). If so, let me know so there are no further breaches. I really appreciate the help. Thanks very much.
jonas on 7 Aug 2018
Edited: jonas on 7 Aug 2018
I am the same person who worked with you previously, so I know the background.
The script works fine for me, probably because you did not include the entire data set. For some reason, xlsread grabs an empty row at the end of the Excel sheet when importing state.xlsx. That's why you get an empty cell. You should make sure not to import this cell, or else remove it afterwards. As a first test, you can manually specify the range of cells to import from Excel. I think this may solve your problem.
In addition, I believe you are only using the first column of patternsizeP so make sure to REMOVE all other columns (or better yet, don't import them). Right now you are jamming a bunch of unnecessary data into the intersect function, which is likely to cause issues.
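The two clean-up steps suggested here (drop the blank trailing cell, keep only the column being matched) could look roughly like the following. This is only a sketch: txt is a made-up stand-in for the text output of xlsread, and the names codes and keep are hypothetical.

```matlab
% Hypothetical stand-in for the text output of xlsread: two columns,
% with a trailing row of empty cells like the one described above.
txt = {'PIN1', 'a'; 'PIN2', 'b'; 'PIN3', 'c'; '', ''};

codes = txt(:, 1);                    % keep only the column being matched
keep  = ~cellfun(@isempty, codes);    % true where the cell actually holds text
codes = codes(keep);                  % drop the blank entries

disp(numel(codes))
```

If rows are removed this way, find(keep) gives the original row numbers, which matters when other columns are looked up later. Restricting the import itself is another option, since xlsread accepts a sheet and range, for example xlsread('state.xlsx', 1, 'A1:A41351'); the sheet number and range in that call are placeholders, not values checked against the real file.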
If none of the above works, make sure to upload a segment of data for which the error is reproduced, so that we can find the bug.
I think dsb was referring to you cooperating by submitting data, don't worry about it :)
Thanks. Yes I realized that it's grabbing an empty cell at the end, but no matter what I do, it does so. The program seems to ignore whatever I import into the workspace and just goes for the files in the matlab folder. No matter what I do to these files or what I import, the empty cell still gets grabbed. I am using column 17 (and would like to use more) from patternsizeP, but I'll try removing a bunch of those columns to see what effect it has. Thanks again.


 Accepted Answer

>> [~,~,rawS] = xlsread('state.xlsx');
>> [~,~,rawP] = xlsread('pinellas.xlsx');
>> [idxP,idxS] = ismember(rawP(:,1),rawS(:,1));
>> out = rawS(idxS(idxP),[1,1,16]);
>> out(:,2) = rawP(idxP,2)
out =
'PIN159614' 'SOO' [271]
'PIN159614' 'SO1' [271]
'PIN159614' 'SOA' [271]
'PIN159614' 'SO0' [271]
'PIN159614' 'SO3' [271]
'PIN159614' 'SO2' [271]
'PIN159614' 'SO5' [271]
'PIN159614' 'SOE' [271]
'PIN159614' 'SOD' [271]
'PIN159614' 'SOG' [271]
'PIN159614' 'SO6' [271]
'PIN725789' 'SOA' [262]
'PIN725789' 'SOC' [262]
'PIN725789' 'SOB' [262]
'PIN725789' 'SOE' [262]
'PIN725789' 'SOD' [262]
'PIN725789' 'SOG' [262]
'PIN725789' 'SOL' [262]
'PIN725789' 'SOO' [262]
'PIN725789' 'SO1' [262]
'PIN725789' 'SO0' [262]
'PIN725789' 'SO3' [262]
'PIN725789' 'SO2' [262]
'PIN725789' 'SO5' [262]
'PIN725789' 'SO7' [262]
'PIN725789' 'SO6' [262]
'PIN725789' 'SO8' [262]
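For readers who want to see why the two-output form of ismember handles repeated codes, here is a small self-contained sketch of the same idea. S and P are toy stand-ins for rawS and rawP with made-up values and fewer columns, and the names tf and loc are chosen here for illustration.

```matlab
% Toy stand-ins for rawS and rawP (hypothetical values, fewer columns than the real files).
S = {'PIN1', 10; 'PIN2', 20; 'PIN3', 30};                   % one row per code
P = {'PIN2', 'x'; 'PIN9', 'y'; 'PIN2', 'z'; 'PIN1', 'w'};   % codes may repeat

% tf(k)  is true when P{k,1} occurs somewhere in S(:,1);
% loc(k) is the row of S where it occurs (0 when there is no match).
[tf, loc] = ismember(P(:, 1), S(:, 1));

% One output row per matching row of P: code, value from P, value from S.
out = [P(tf, 1), P(tf, 2), S(loc(tf), 2)];
disp(out)
```

The lookup runs from the file with repeats into the file with unique codes, so every repeated row of P gets its own entry in loc and no loop over codes is needed.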

5 Comments

Even more efficient, per usual :)
Thanks. I tried running the code and it gives me the error
Error: File: newscriptcode.m Line: 1 Column: 1 Unexpected MATLAB operator.
I tried checking the path and everything seems OK. I don't know why it fails for me.
@Mark Bodner: please upload your file by clicking the paperclip button.
Is newscriptcode the name of your file? How are you calling it? Please show the output of this command:
which newscriptcode -all
Yes that is the name. The output of the command is C:\Users\mbodner\Downloads\MathWorks\R2017b\newscriptcode.m
The files 'pinellas' and 'state' are in the same location. The file is open and I'm just hitting the green "run" arrow.
Solved the problem. Foolishly copied over >> from your code in each line. Removed it and everything works great.
