The best way to load a huge amount of data.

Jonasz on 10 Aug 2013
Hello. I have about 12 thousand .txt files which I need to import into the workspace. What is the most efficient way to load them? Here is my code:
function spectrum = loading()
% Let the user pick a folder, then read every .txt file in it
PathName = uigetdir('*.txt');
cd(PathName);
names = dir('*.txt');
n = length(names);
spectrum = cell(n,1);                 % preallocate the output
h = waitbar(0,'Please wait...');
for i = 1:n
    fid = fopen(names(i).name);
    % Each file holds two columns of numbers; store them as a 2-by-N matrix
    spectrum{i} = cell2mat(textscan(fid,'%f %f'))';
    fclose(fid);
    waitbar(i/n)
end
close(h);
end
The problem is speed: the operation takes too much time. Thanks for any answers.

Answers (3)

Robert Cumming on 10 Aug 2013
Refreshing the waitbar 12,000 times will actually take quite a long time; try refreshing it only every 100 files.
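For example, inside the original loop it could look like this (a minimal sketch; h is the waitbar handle from the question):
if mod(i,100) == 0 || i == n     % update only every 100 files (and once at the end)
    waitbar(i/n, h);
end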
  3 Comments
Azzi Abdelmalek on 10 Aug 2013
[Jonasz commented]
The main problem is that the program runs pretty fast at the beginning but slows down when it reaches around 2500-3000 files read. Any ideas how to solve it?
Robert Cumming on 11 Aug 2013
"Dont you think" - I have no idea as I have no idea how much data is in each of your files....
You could use the "memory" command to check what memory is available when you start and then again at 1000 files, 2000 files etc...
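A rough sketch of such a check inside the loop (note that the memory function is only available on Windows, and the printed field is just one of the statistics it returns):
if mod(i,1000) == 0              % check every 1000 files
    m = memory;                  % struct of current memory statistics (Windows only)
    fprintf('File %d: %.0f MB available\n', i, m.MemAvailableAllArrays/1e6);
end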



Ken Atwell on 10 Aug 2013
Are the files all of consistent length? If so, you might be able to pull off a "trick" where you
  1. Concatenate all 12,000 into one large file
  2. Import that one file into MATLAB
  3. Break the file into 12,000 pieces if you need to
On Mac or Linux, you can concatenate the files easily from the command line (off the top of my head, I don't know how to do this in Windows). In MATLAB:
!cat *.txt > allfiles.big
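A rough sketch of steps 2 and 3 might look like the following, assuming every file has exactly the same number of rows (rowsPerFile below is a placeholder you would need to work out for your own data):
fid = fopen('allfiles.big');
data = cell2mat(textscan(fid,'%f %f'));              % all rows as one N-by-2 matrix
fclose(fid);
rowsPerFile = size(data,1)/12000;                    % only valid if the files are uniform
spectrum = mat2cell(data, repmat(rowsPerFile,12000,1), 2);   % split back into 12,000 pieces
Note that each piece comes out as rowsPerFile-by-2 rather than the transposed 2-by-N layout used in the question.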
It is worth a try. Tips in other replies are important to heed:
  • If you load the files individually, update the wait bar far less frequently. Better still, get rid of it entirely and switch to a parallel for-loop (parfor), which may give you a near-linear speedup -- if you have Parallel Computing Toolbox, of course (a minimal sketch follows after this list).
  • Keep your eye on free memory. If progress slows down 2,000 files in, could you be using all the memory in your computer?
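A minimal parfor version of the loop from the question (requires Parallel Computing Toolbox; the waitbar is dropped because updating it from inside parfor is not straightforward):
names = dir('*.txt');
n = numel(names);
spectrum = cell(n,1);
parfor i = 1:n
    fid = fopen(names(i).name);
    spectrum{i} = cell2mat(textscan(fid,'%f %f'))';  % same two-column read as before
    fclose(fid);
end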

Azzi Abdelmalek on 10 Aug 2013
Edited: Azzi Abdelmalek on 10 Aug 2013
Try
PathName = uigetdir('*.txt');
cd(PathName);
names = dir('*.txt');
n = length(names);
spectrum = cell(n,1);              % preallocate the output
for i = 1:n
    % dlmread parses the whole numeric file in one call
    spectrum{i} = dlmread(names(i).name)';
end
  3 Comments
Azzi Abdelmalek on 10 Aug 2013
Apparently not; I've made some tests, and it appears that textscan is faster.
Ken Atwell on 10 Aug 2013
textscan is faster than dlmread. Lower-level file reading (fread, fscanf) would be faster still, but requires more code.
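For instance, a rough sketch of a lower-level fscanf read of one of the two-column files from the question:
fid = fopen(names(i).name);
spectrum{i} = fscanf(fid,'%f %f',[2 Inf]);   % 2-by-N, same shape as the transposed textscan result
fclose(fid);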

