
Problem with keeping track of an index

Asked by huda nawaf on 28 Jul 2012

Hi,

I have 17770 files. To make my code faster, I merged sets of them to create larger files; eventually I ended up with 2221 files of different lengths. I no longer remember how many old files went into each new file, but I do have the length of each old file.

The problem I face is that I have to increment an index, ind, each time one old file's worth of data has been read, and store array(k) = ind. For example, if the lengths of the old files are 12, 45, 10, 23, 4, 10, 11, 13, ... and the lengths of the new files are 57, 37, 21, then the first new file contains two old files (12+45), the second contains three (10+23+4), and so on.

What I need is: while reading a new file, when the counter reaches 12, do ind = ind + 1; when it reaches 12 + 45, do ind = ind + 1 again; and so on.

I cannot keep the counter thresholds tuned each time I read a new file. Note: these are just example numbers; my actual files are very long.

thanks in advance
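The bookkeeping described above can be sketched in MATLAB with cumsum, using the example lengths from the question (the variable names oldLens and boundaries are made up for illustration):

```matlab
% Hypothetical sketch: oldLens holds the lengths of the original files,
% in the order they were merged. boundaries(k) is then the last position
% in the merged data that belongs to old file k.
oldLens    = [12 45 10 23 4 10 11 13];   % example lengths from the question
boundaries = cumsum(oldLens);            % [12 57 67 90 94 104 115 128]

ind = 1;                                 % index of the current old file
for counter = 1:sum(oldLens)
    % ... process element "counter" of the merged data, tagging it with ind ...
    if counter == boundaries(ind)
        ind = ind + 1;                   % crossed into the next old file
    end
end
```

With the boundaries precomputed once per merged file, no per-file "tuning" of the counter is needed.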

3 Comments

per isakson on 28 Jul 2012

Questions:

  • Do you still have copies of all the old files?
  • What is the total size of all the old files? Less than 2 GB?
  • Are the files text files?

17770 files is not that bad. The approach you outline is error prone, I fear.

If the answer to all three is YES, I think you should create an appropriate set of binary files. That would make your code faster.

huda nawaf on 28 Jul 2012

Thanks, the answer to all three questions is yes. My problem with the 17770 files is a long story; I posted it in this forum and no one could solve it. I ran my laptop (Core i5) eight hours a day to collect 400 users out of 400,000, because my code searched for user IDs across all 17,770 files, so it is very slow. But I found that when I reduced the number of files to 2221, the code got faster. Unfortunately, I then hit the problem above. Also, I have no idea what you are suggesting regarding binary files.

per isakson on 28 Jul 2012

What version of Matlab do you run? 64bit?

Binary file alternatives:

  • MAT-file version 7.3; access it with the function matfile, which creates an object. A v7.3 MAT-file is an HDF5 file under the hood.
  • netCDF or HDF
  • plain binary, i.e. fwrite/fread
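The first alternative can be sketched as follows (the file name merged.mat and the variable data are made up for illustration):

```matlab
% Hypothetical sketch: save data to a v7.3 MAT-file and later read only
% the rows you need, without loading the whole file into memory.
data = rand(1000, 3);                    % stand-in for the merged data
save('merged.mat', 'data', '-v7.3');     % v7.3 is HDF5 under the hood

m    = matfile('merged.mat');            % object giving partial access
part = m.data(1:12, :);                  % reads only rows 1..12 from disk
```

Partial reads like m.data(1:12, :) are what makes v7.3 attractive for files too large to load at once.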

I'm not aware of what kind of data you are working with. Based on this question, I guess it might be a job for SQL. However, SQL comes with a learning curve. See e.g. SQLite, or better, a system someone near you uses.



1 Answer

Answer by Image Analyst on 28 Jul 2012

I doubt that, if you include the time to combine thousands of files into a single file and the time to break them apart again (if needed), it will be much less than just processing the thousands of files individually. What kind of times are you getting for the two approaches? Anyway, if you combine them, it's your responsibility to keep track of the sizes somehow, say in a second file that lists just the original file sizes as text, if you need that information later.

You say "I can not tune the index of counter each time read new file". Well, for one run you have just one new file (composed of the thousands of smaller files). The file pointer (what you called the index of the counter) does not need to be "tuned"; it starts at the beginning of the file, and the help for fread() says this:

A = fread(fileID, sizeA) reads sizeA elements into A and positions the file pointer after the last element read. sizeA can be an integer, or can have the form [m,n].

so the file pointer is left at the end of the last read location - no tuning necessary.
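A minimal sketch of this behavior, using the example lengths from the question (the file name merged.bin is made up for illustration):

```matlab
% Hypothetical sketch: fwrite/fread keep the file pointer for you.
fid = fopen('merged.bin', 'w');
fwrite(fid, rand(1, 57), 'double');      % stand-in for one merged file (12+45)
fclose(fid);

fid = fopen('merged.bin', 'r');
a = fread(fid, 12, 'double');            % reads elements 1..12
b = fread(fid, 45, 'double');            % continues at element 13 -- no tuning
fclose(fid);
```

Each fread call picks up exactly where the previous one left off, so reading the old files back out of a merged binary file only requires knowing their lengths.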

0 Comments
