Import a non-rectangular string file using textscan
Hello all!
I'm currently working on social networks and have a text file of names that I have to import in order to create an adjacency matrix. The file is formatted like this:
Author1 Recipient1,Recipient2,Recipient3 "Date"
Author2 Recipient4,Recipient3 "Date"
Author3 Recipient5 "Date"
Author2 Recipient3,Recipient4,Recipient6 "Date"
etc
Using the code below, I have no problem importing the author or date lists.
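fid = fopen('txtfiles.txt');
% Columns: author, comma-separated recipients, quoted date; skip the rest of the line.
% 'CollectOutput' is a name-value option, so it needs an explicit value.
C = textscan(fid,'%s %s %q %*[^\n]','CollectOutput',true);
fclose(fid);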
The trouble is with the recipient lists, since I have no way of knowing in advance the maximum number of recipients on a line. I'd like to have a rectangular cell array (CA) so that I can read them straight away. What I get at the moment is a CA like this:
Recipient1,Recipient2,Recipient3
Recipient4,Recipient3
Recipient5
Recipient3,Recipient4,Recipient6
So I run textscan on that CA once more and get a nested CA like this:
3x1cell
2x1cell
'Recipient5'
3x1cell
At the moment, I search for the commas using:
findCommas = strfind(mentions,',');
emptyCell = cellfun(@isempty,findCommas);
commaPos = find(~emptyCell);
In addition, I use a for-loop to expand the nested cells described above. As you can imagine, all this takes forever when I have to process 2M entries. Is there anything I can do to get a CA of strings, rather than a CA of mixed-format data, for the recipients?
Thanking you in anticipation, Dinos
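For reference, here is a minimal sketch of one way the padding could be done without an explicit for-loop. It assumes that mentions is an N-by-1 cell array of comma-separated strings (the sample values are copied from the listing above), that no name contains a comma, and that empty strings are acceptable as padding. The names parts, nPerRow, allNames, starts, rowIdx, colIdx and recipients are made up for illustration. It has not been timed on 2M entries.

```matlab
% Sample recipient column, as returned by the first textscan pass.
mentions = {'Recipient1,Recipient2,Recipient3'; ...
            'Recipient4,Recipient3'; ...
            'Recipient5'; ...
            'Recipient3,Recipient4,Recipient6'};

% Split every row at the commas in one call: N-by-1 cell of 1-by-k cell rows.
parts    = regexp(mentions, ',', 'split');
nPerRow  = cellfun(@numel, parts);      % number of recipients on each line
allNames = [parts{:}];                  % all names in file order, 1-by-total

% Work out the (row, column) position of every name in the padded array.
total  = sum(nPerRow);
starts = cumsum([1; nPerRow(1:end-1)]); % index in allNames of each row's first name
marks  = zeros(total, 1);
marks(starts) = 1;
rowIdx = cumsum(marks);                 % source line of each name
colIdx = (1:total)' - starts(rowIdx) + 1;

% Pre-fill with empty strings, then place every name with one assignment.
recipients = repmat({''}, numel(mentions), max(nPerRow));
recipients(sub2ind(size(recipients), rowIdx, colIdx)) = allNames;
```

With the sample data, recipients is a 4-by-3 cell array of strings, with '' filling the unused slots. Since rowIdx already pairs each name with its source line, it may also be possible to skip the padding and pass it, together with the output of unique on allNames, to sparse when building the adjacency matrix.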
Answers (0)