Error importing file using dataset

Ivan on 5 Sep 2013
I need to import a file into a dataset. Here's a snippet of the file:
User Starting time(up) Starting time(down) Duration(up) Duration(down) Bytes(up) Bytes(down) Packets(up) Packets(down)
41.224.9.214 1366669568254 1366669568254 195 94 488 502 10 8
I need the first column as a string, columns 2:5 as int, and the rest as double. I've tried using
ds=dataset('File', cur_file, 'Format', 's%u64%u64%u64%u64%d%d%d%d', 'HeaderLines', 1, 'Delimiter', '\t');
but I get the error:
Error using dataset/readFile (line 165)
Variable lengths must all be the same. You may have specified the format string, delimiter, or number of header lines incorrectly.
Error in dataset (line 347)
a = readFile(a,fileArg,otherArgs);
I've also tried
ds=dataset('File', cur_file, 'Delimiter', '\t', 'ReadObsNames',true);
That gives:
Error using dataset/readFile (line 195)
The number of variable names read from ~/matlab/data/ContinuousUserServiceProfiles/ctu_cmp_download/147.32.86.92:443:6 does not match the number of data columns. You may have specified the format string, delimiter, or the number of header lines incorrectly.
Error in dataset (line 347)
a = readFile(a,fileArg,otherArgs);
What can I do?
  1 Comment
Ivan on 5 Sep 2013
One more thing, the entries in the file are tab-separated, but I can change the format if needed, as long as I get the data imported.


Answers (2)

Tom Lane on 11 Sep 2013
Would you explain some more? Your quoted line starts "41.224.9.214" but you say you want "4" as string, "1.22" as integer, and the rest as double including apparently "4.9.214".
If I make a file that consists of a header line followed by multiple copies of the line you quoted, I can read it like this:
fmt = '%1s%5f%2f %f %f %f %f %f %f %f %f %f';
dataset('file','deleteme.txt','delimiter',' ','headerlines',1,'readvar',false,'format',fmt)
I hope this gives you an idea of how to proceed.

Peter Perkins on 12 Sep 2013
Like Tom, I'm a little unclear on what you're asking for. I'm going to assume from your format string that you want to read the tab-separated line
41.224.9.214 1366669568254 1366669568254 195 94 488 502 10 8
as
  • the string (IP address?) "41.224.9.214"
  • the integers 1366669568254, 1366669568254, 195, 94
  • the doubles 488, 502, 10, 8
So first, you want %f ("floating"), not %d ("decimal", I think). But that's not the problem. Based on the two error messages you're seeing, I'd have to guess that you either have some stray tabs at the ends of some of the lines in your file, or some lines that are short. You might experiment with the first few lines of your file to get the format string working, and then look through your file to try to find the bad line.
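One way to look for such a line is to count the tab-separated fields on every line and report the ones that differ from the expected count. This is only a sketch: the sample file, the variable names, and the three-field layout are made up for illustration (the file in the question would use 9 fields).

```matlab
% Hypothetical sample file: a header, one good row, and one row with a stray trailing tab.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'User\tStart(up)\tStart(down)\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t\n');
fclose(fid);

expectedFields = 3;      % assumption for this sample; 9 for the file in the question
tab = sprintf('\t');
badLines = [];

fid = fopen(fname, 'r');
lineNo = 0;
while true
    txt = fgetl(fid);
    if ~ischar(txt), break; end          % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    nFields = sum(txt == tab) + 1;       % number of fields = number of tabs + 1
    if nFields ~= expectedFields
        badLines(end+1) = lineNo;        %#ok<AGROW>
        fprintf('Line %d has %d fields\n', lineNo, nFields);
    end
end
fclose(fid);
delete(fname);
```

Line numbers here count the header as line 1, so they can be matched directly against the file in a text editor.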
If you're up for an adventure, you should be able to use MATLAB's debugger to find the line in your data file that caused the problem by setting a breakpoint at line 165 of dataset/readFile.m. The easiest way to get there is to click on "line 165" in the error message in the command window; it's a hyperlink. Set the breakpoint, run your ds=dataset(...) command, and when execution stops at your breakpoint, take a look at the variable called "raw". It should be a 1x9 cell array, and the lengths of the contents of each cell will tell you how far the import got before it failed. Just type "raw" at the command line and you should see the contents' sizes.
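Once stopped at the breakpoint, the per-column lengths can also be pulled out directly with cellfun. The snippet below uses a made-up stand-in for "raw" (the sizes are invented) purely to show the idea; at the real breakpoint you would skip the first assignment and use the variable that is already in the workspace.

```matlab
% Made-up stand-in for the "raw" variable seen at the breakpoint (sizes are invented).
raw = {cell(120,1), zeros(120,1), zeros(120,1), zeros(120,1), zeros(120,1), ...
       zeros(120,1), zeros(119,1), zeros(119,1), zeros(119,1)};

lens = cellfun(@numel, raw);               % number of values read into each column
firstShort = find(lens < max(lens), 1);    % first column that came up short
fprintf('Column %d is the first short one (%d values vs %d)\n', ...
        firstShort, lens(firstShort), max(lens));
```

With these invented sizes, the output would suggest that reading stopped partway through data row 120, somewhere around column 7, which narrows down where to look in the file.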
  1 Comment
Ivan on 12 Sep 2013
Thank you for the answer. Sorry if I was unclear in my description; your understanding of my format is correct. I have already solved the problem by moving away from the dataset type altogether, as it has awfully slow indexing. For importing, I've used the MATLAB Import Tool to generate a function for me.
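For readers who land here with the same file layout, a dataset-free import can be sketched with textscan. This is not the function the Import Tool generated (that code was not posted); it is a minimal illustration, and the sample file and variable names are assumptions.

```matlab
% Hypothetical sample file with the header and data row from the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'User\tStarting time(up)\tStarting time(down)\tDuration(up)\tDuration(down)\tBytes(up)\tBytes(down)\tPackets(up)\tPackets(down)\n');
fprintf(fid, '41.224.9.214\t1366669568254\t1366669568254\t195\t94\t488\t502\t10\t8\n');
fclose(fid);

% One string column, four uint64 columns, four double columns.
fmt = '%s%u64%u64%u64%u64%f%f%f%f';
fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);
delete(fname);

user    = C{1};   % cell array of strings
startUp = C{2};   % uint64 column vector
bytesUp = C{6};   % double column vector
```

textscan returns one cell per column, so indexing into plain arrays afterwards avoids the dataset indexing overhead mentioned above.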
