Importing data with unequal number of columns

Hi all,
Firstly, I apologise if the title does not describe what I am going to ask here. Please feel free to change the title of this post.
I have the following dataset saved as a txt file and I intend to load it into MATLAB. The matrix is supposed to be [3 x 178]. How can I do this?
Hint:
The first row is started by '10864.0', the second row is by '10864.5' and the third is by '10865.0'.
I have attached the original data.txt to this post.

 Accepted Answer

Stephen23 on 20 Jan 2020
Edited: Stephen23 on 20 Jan 2020
This is very simple and efficient using fscanf:
[fid,msg] = fopen('Data.txt','rt');
assert(fid>=3,msg)
mat = fscanf(fid,'%f',[179,3]).';
fclose(fid);
or using fileread and sscanf:
str = fileread('Data.txt');
mat = sscanf(str,'%f',[179,3]).'
Giving:
>> size(mat)
ans =
3 179
>> mat(:,1:11) % Look at just the first few columns:
ans =
10864 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
10865 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
10865 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
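To illustrate why the line breaks inside each section do not matter to fscanf, here is a minimal sketch on a made-up miniature file. The file contents, the variable names (fname, nPerSection, ids, vals) and the section length are assumptions for illustration only, not the OP's data. It also uses [nPerSection, Inf] so that only the row length, not the number of sections, has to be known.

```matlab
% Hypothetical miniature of Data.txt: 2 sections, each a marker value
% followed by 4 data values that are broken across two lines.
fname = [tempname, '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '100.0\n1 2 3\n4\n100.5\n5 6 7\n8\n');
fclose(fid);

nPerSection = 1 + 4;                 % marker + data values (assumed known)
[fid, msg] = fopen(fname, 'rt');
assert(fid >= 3, msg)
% %f skips all whitespace, newlines included, so line breaks are irrelevant.
% fscanf fills column by column, hence the [nPerSection, Inf] size and the transpose.
mat = fscanf(fid, '%f', [nPerSection, Inf]).';
fclose(fid);
delete(fname);

ids  = mat(:, 1);                    % section markers: [100; 100.5]
vals = mat(:, 2:end);                % data only: [1 2 3 4; 5 6 7 8]
```

Keeping the markers in ids and the data in vals covers both readings of the question: vals alone is the data-only matrix, and mat is the matrix with the leading marker column.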

8 Comments

Adam Danz on 20 Jan 2020
Edited: Adam Danz on 20 Jan 2020
Simple and clean, but this doesn't isolate the 3 sections of data correctly (unless I've misunderstood the goal). Each matrix has 178 elements and there are 3 matrices identified by the '10864' values.
Stephen23 on 20 Jan 2020
Edited: Stephen23 on 20 Jan 2020
"but this doesn't isolate the 3 sections of data correctly..."
Each "section" is one row of the matrix. What does "isolate" mean in concrete terms related to MATLAB?
"Each matrix has 178 elements and there are 3 matrices identified by the '10864' values"
Yes, which is why the output matrix has size 3x179, corresponding to the size requested by the original question, plus one column for the "section" values. It is not clear what you imagine is missing from this.
It will certainly be much more efficient than any of the other answers, although more fragile because it relies on a fixed matrix size.
Part of the data file is shown below (just a few rows). I believe that the data file is arranged into 3 sections marked by the rows with a single value (e.g. 10864.0). Within each of those sections is a matrix with 178 elements. I'm considering a section to be one of the matrices.
10864.0
3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
3.3069 3.3069 3.3069 3.3069 3.3069 3.3069
3.3069 3.3069 3.3069 6.3816 3.3069 6.3507
3.3069 3.9303 3.3069 6.1992 6.1788 6.1649
6.1425 6.1251 6.1111 6.0697 6.0322 3.8311
5.9331 5.9033 5.8818 5.8497 5.8391 5.8281
5.7981 5.7882 5.7668 5.7424 3.6757 5.6934
5.6556 5.6257 5.6053 5.5690 5.5374 3.5514
"I'm considering a section to be one of the matrices."
That is clear, I also understood that definition... which is exactly why my answer places each of those "sections" onto one row of the output matrix, just as the question requests.
You seem to think that my answer does not work, but you have not given a single explanation of why. Please compare the output matrix with the original file: please tell me exactly where the problem lies.
" but this doesn't isolate the 3 sections of data correctly (unless I've misunderstood the goal)."
When I check it, my answer gives the same rows as yours (it just does it much more efficiently):
>> isequal(mat(:,2:end),M) % ignore first column of my answer
ans =
1
If my code "doesn't isolate the 3 sections of data correctly" then why am I getting the same data in the same order as your answer?
From the question: "The matrix is supposed to be [3 x 178]".
Yours is 3x179. After further inspection, all you'd need to do is
mat(:,1) = [];
to meet those requirements. The single-value rows (e.g. 10864.0) are merely separators and apparently should not belong in the output matrix.
Stephen23 on 20 Jan 2020
Edited: Stephen23 on 20 Jan 2020
@Adam Danz: Your original criticism was actually "...but this doesn't isolate the 3 sections of data correctly..." and yet you have failed to explain or justify what this means, or how my answer does not achieve it. Now you are talking about the leading column, which as the example in my answer clearly shows contains the "section" number, making the matrix 3x179: as you showed, it is trivial to ignore/remove that column if required.
Are you now implying that my code does isolate those "sections", even though you wrote that it didn't? I'm confused.
"and yet you have failed to explain or justify what this means"
I did expand on that criticism two comments later.
"it is trivial to ignore/remove that column if required"
Agreed.
"Are you now implying that my code does isolate those "sections","
Yes, provided the first column is removed, or, as dpb mentioned, the OP incorrectly defined the problem in the question.
I have accepted this answer because it gives me what I want. Thanks!


More Answers (2)

Adam Danz on 20 Jan 2020
Edited: Adam Danz on 20 Jan 2020
See the other two solutions for more efficient approaches.
The input to the code below is your data file Data.txt. The outputs are 1) M, a 3 x 178 matrix where each row is a block of values from your text file. 2) rowDefs, a 3 x 1 vector identifying the sections of each row of M.
See inline comments for details.
% Read in data as a char array, convert to cell array split by rows
Ch = fileread('Data.txt'); % char array
Cs = strsplit(Ch,newline); % Cell array of strings
Cs(cellfun(@isempty,Cs)) = []; % Remove empties
% Convert all elements to numeric
Cv = cellfun(@(c){str2double(strsplit(strtrim(c)))},Cs); % cell array of numeric vectors
% Detect the cell array elements with only 1 value (ie, 10864.0)
sectionIdx = find(cellfun(@numel, Cv) == 1); % numel(sectionIdx) shows that there are 3 sections
% Isolate each section into its own row of a matrix.
M = cell2mat(arrayfun(@(i,j){[Cv{i+1:j-1}]},sectionIdx,[sectionIdx(2:end),numel(Cv)+1])');
% Get row definitions
rowDefs = [Cv{sectionIdx}]';
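The same line-based idea can also be sketched with a plain loop, which some readers may find easier to follow than the nested cellfun/arrayfun calls. This is only an illustrative variant on made-up input: the text in txt and the names ids and blocks are hypothetical. It assumes that the first non-empty line is a section marker, that no data line holds exactly one value, and that all sections have the same length.

```matlab
% Hypothetical stand-in for fileread('Data.txt'): 2 sections of 5 values each.
txt = sprintf('100.0\n1 2 3\n4 5\n\n100.5\n6 7 8\n9 10\n');
lines = regexp(txt, '\r?\n', 'split');   % split on LF or CRLF

ids = zeros(0, 1);      % section markers
blocks = cell(0, 1);    % one row vector of data per section
for k = 1:numel(lines)
    v = sscanf(lines{k}, '%f').';        % numbers on this line, as a row
    if isempty(v)
        continue                         % skip blank lines
    elseif isscalar(v)
        ids(end+1, 1) = v;               % a single value starts a new section
        blocks{end+1, 1} = [];
    else
        blocks{end} = [blocks{end}, v];  % append to the current section
    end
end
M = vertcat(blocks{:});                  % requires equal-length sections
```

If the sections were ever of unequal length, vertcat would error, which at least makes the problem visible instead of silently misaligning rows.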

2 Comments

This answer works fine too if I add one extra line to merge the matrices.
out = [rowDefs M];
Ahh, in that case you did intend for the matrix to be 3x179.
Glad you found an efficient solution!


The problem is that the file has embedded \n in what should be unbroken records. Whether this came from the original creation of the file, or was introduced by looking at it in a text editor, is unknown. If you can't fix the problem at the source, then
b=reshape(textread('beedata.txt','%f'),[],3).';
The above presumes the number of records is known a priori and fixed.
I used the deprecated textread because it works for the purpose and doesn't need the extra nuisance of a file handle as do the other input functions.
The above yields for a portion of the file
>> b(:,1:10)
ans =
1.0e+04 *
1.0864 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003
1.0864 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003
1.0865 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003
>>
NB: The array size is actually 3x179 since there are 178 elements of the array after each leading value, which looks like a time stamp, maybe?
>> whos b
Name Size Bytes Class Attributes
b 3x179 4296 double
>>
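Because the reshape relies on the number of sections being known, a cheap guard can catch a file that does not match that assumption. The following is a minimal sketch on made-up text: str stands in for the file contents and nSections is an assumed, hard-coded count. It uses sscanf rather than textread only so that it runs without a file on disk.

```matlab
% Hypothetical stand-in for the file contents: 3 sections of 1 marker + 4 values.
str = sprintf('100.0\n1 2 3\n4\n100.5\n5 6 7\n8\n101.0\n9 10 11\n12\n');
nSections = 3;                       % assumed known a priori

v = sscanf(str, '%f');               % all numbers as one column vector
% Fail early if the values cannot be split evenly into nSections rows.
assert(mod(numel(v), nSections) == 0, ...
    'Found %d values, which is not a multiple of %d sections.', numel(v), nSections)
b = reshape(v, [], nSections).';     % one section per row, marker in column 1
```

The check only verifies divisibility; it would not detect sections of unequal length whose total happens to divide evenly.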

5 Comments

Adam Danz on 20 Jan 2020
Edited: Adam Danz on 20 Jan 2020
I have a different interpretation. Each matrix has 178 elements in that text file. There are 3 matrices each segmented by the larger values (eg 10865).
So the task, I believe, is to extract each matrix and convert it to a vector, then concatenate the 3 vectors into a matrix. That's what my answer does, at least.
Adam Danz wrote: "I have a different interpretation. Each matrix has 178 elements in that text file. There are 3 matrices each segmented by the larger values (eg 10865)."
"So the task, I believe, is to extract each matrix and convert it to a vector, then concatenate the 3 vectors into a matrix. That's what my answer does, at least."
Your output compared to dpb's:
>> isequal(b(:,2:end),M) % ignore first column of dpb's output
ans =
1
If dpb's answer gives the same output as yours (plus one column, as described), what exactly is the problem?
@Stephen Cobeldick,
Both yours and dpb's solutions are much cleaner and more efficient than mine as is stated in my first comment under your answer. The only problem is that the question asks for a 3x178 matrix and these two answers produce a 3x179 matrix. They include unwanted data in the first column. That's the problem. Is there something still unclear?
Nothing's unclear, no. The difference of opinion is that, in SC's and my interpretation, the OP's request is to augment the 178 elements with the section ID number, resulting in a [3x1 3x178] --> [3x179] output array in total.
If the OP doesn't really want the first column, it's trivial to elide it after the fact, but I'm betting it's wanted data, too, and the OP simply didn't account for it in the original question text.
Agreed, that could be the case.


Asked:

on 20 Jan 2020

Commented:

on 20 Jan 2020
