Error importing file using dataset

Asked by Ivan on 5 Sep 2013

I need to import a file into dataset, here's a file snippet:

User Starting time(up) Starting time(down) Duration(up) Duration(down) Bytes(up) Bytes(down) Packets(up) Packets(down)

41.224.9.214 1366669568254 1366669568254 195 94 488 502 10 8

I need the first column as a string, columns 2:5 as int, and the rest as double. I've tried

ds=dataset('File', cur_file, 'Format', 's%u64%u64%u64%u64%d%d%d%d', 'HeaderLines', 1, 'Delimiter', '\t');

but I get the error:

Error using dataset/readFile (line 165)
Variable lengths must all be the same. You may have specified the format string, delimiter, or number of header lines incorrectly.

Error in dataset (line 347)
a = readFile(a,fileArg,otherArgs);

I've also tried

ds=dataset('File', cur_file, 'Delimiter', '\t', 'ReadObsNames',true);

That gives:

Error using dataset/readFile (line 195)
The number of variable names read from ~/matlab/data/ContinuousUserServiceProfiles/ctu_cmp_download/147.32.86.92:443:6 does not match the number of data columns. You may have specified the format string, delimiter, or the number of header lines incorrectly.

Error in dataset (line 347)
a = readFile(a,fileArg,otherArgs);

What can I do?

1 Comment

Ivan on 5 Sep 2013

One more thing, the entries in the file are tab-separated, but I can change the format if needed, as long as I get the data imported.

2 Answers

Answer by Tom Lane on 11 Sep 2013

Would you explain some more? Your quoted line starts "41.224.9.214" but you say you want "4" as string, "1.22" as integer, and the rest as double, apparently including "4.9.214".

If I make a file that consists of a header line followed by multiple copies of the line you quoted, I can read it like this:

fmt = '%1s%5f%2f %f %f %f %f %f %f %f %f %f';
dataset('file','deleteme.txt','delimiter',' ','headerlines',1,'readvar',false,'format',fmt)

I hope this gives you an idea of how to proceed.

Answer by Peter Perkins on 12 Sep 2013

Like Tom, I'm a little unclear on what you're asking for. I'm going to assume from your format string that you want to read the tab-separated line

41.224.9.214 1366669568254 1366669568254 195 94 488 502 10 8

as

  • the string (IP address?) "41.224.9.214"
  • the integers 1366669568254, 1366669568254, 195, 94
  • the doubles 488, 502, 10, 8

So first, you want %f ("floating"), not %d ("decimal", I think). But that's not the problem. Based on the two error messages you're seeing, I'd guess that you either have stray tabs at the ends of some lines in your file, or some lines that are short. You might experiment with the first few lines of your file to get the format string working, and then look through the file to find the bad line.
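One way to hunt for such a line without the debugger is to count the tab-separated fields on every line and report the ones that don't have 9. This is only a rough sketch: it writes a small made-up test file (the name from tempname, the stray-tab line, and the short line are all invented for illustration) and then scans it. For the real data you would point fname at your own file and skip the part that writes the sample.

```matlab
% Sketch: flag lines whose tab-separated field count is not the expected 9.
% The sample file below is invented for illustration; replace fname with
% the real file and drop the fprintf calls that create it.
fname = tempname;
fid = fopen(fname, 'w');
fprintf(fid, 'User\tStarting time(up)\tStarting time(down)\tDuration(up)\tDuration(down)\tBytes(up)\tBytes(down)\tPackets(up)\tPackets(down)\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t195\t94\t488\t502\t10\t8\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t195\t94\t488\t502\t10\t8\t\n'); % stray trailing tab
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t195\n');                         % short line
fclose(fid);

expected = 9;             % number of columns in the header
tabChar  = sprintf('\t');
badLines = [];            % line numbers with the wrong field count

fid = fopen(fname, 'r');
lineNo = 0;
tline = fgetl(fid);
while ischar(tline)       % fgetl returns -1 (not char) at end of file
    lineNo = lineNo + 1;
    nFields = sum(tline == tabChar) + 1;   % fields = tabs + 1
    if nFields ~= expected
        badLines(end+1) = lineNo; %#ok<AGROW>
        fprintf('line %d has %d fields\n', lineNo, nFields);
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Any line number collected in badLines is a candidate for the stray tab or short line that trips up the import.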

If you're up for an adventure, you should be able to use MATLAB's debugger to figure out which line in your data file caused the problem, by setting a breakpoint at line 165 of dataset/readFile.m. The easiest way to get there is to click on "line 165" in the error message in the command window; it's a hyperlink. Set the breakpoint, run your ds=dataset(...) command, and then when execution stops at your breakpoint, take a look at the variable called "raw". It should be a 1x9 cell array, and the lengths of the contents of each cell will tell you how far the import got before it failed. Just type "raw" at the command line and you should see the contents' sizes.

1 Comment

Ivan on 12 Sep 2013

Thank you for the answer. Sorry if I was unclear in my description; your understanding of my format is correct. I have already solved the problem by moving away from the dataset type altogether, as it has awfully slow indexing. For importing, I've used the MATLAB import tool to generate a function for me.
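For anyone landing here with the same file layout, a textscan-based import might look roughly like the sketch below. This is not the generated function mentioned above (that code isn't shown in the thread); it is a guess at the general approach, using the format described in the question with %f for the double columns. The sample file and the variable names (fname, C, user, startUp, bytesUp) are made up for illustration.

```matlab
% Sketch: read the tab-separated file with textscan instead of dataset.
% The sample file is invented for illustration; use the real file name
% in place of fname and drop the lines that write it.
fname = tempname;
fid = fopen(fname, 'w');
fprintf(fid, 'User\tStarting time(up)\tStarting time(down)\tDuration(up)\tDuration(down)\tBytes(up)\tBytes(down)\tPackets(up)\tPackets(down)\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t195\t94\t488\t502\t10\t8\n');
fclose(fid);

% One string column, four uint64 columns, four double columns.
fmt = '%s%u64%u64%u64%u64%f%f%f%f';

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);
delete(fname);

user    = C{1};   % cell array of strings (the IP addresses)
startUp = C{2};   % uint64 column
bytesUp = C{6};   % double column
```

textscan returns one cell per column, so each column can then be indexed as an ordinary array.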
