Appending to a very large file

Stefan Oline on 5 Jan 2021
Commented: Stefan Oline on 8 Mar 2021
I'm having trouble writing very large files to disk. I'm combining 64 smaller files (each ~1 GB) into a single giant matrix. I expect the output file to be ~64 GB, and I'm running into an "Out of memory" error during processing. Is there a more efficient way to do this that doesn't require loading all of the smaller files into memory before writing one monster file to disk? For example, can I load one file at a time, append it to the output file, clear memory, and then load the next?
Current code looks like this:
close all
% Import every channel in a loop
for i=1:64
    fprintf('i = %d\n', i);
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(i,'%02.f') '.ncs'],...
        [0 0 0 0 1], 1, 1, [] );
    %temp_1 = Samples';
    temp_2 = reshape(Samples,[],1)';
    if exist('signal_mat', 'var')
        signal_mat = vertcat(signal_mat,temp_2);
    else
        signal_mat = temp_2;
    end
    clear Samples Header temp_2
end
clear i
% Demedian the data
fprintf('Demedian data\n');
signal_med = median(signal_mat);
signal_mat_demed = signal_mat - signal_med;
%% Write to file for KS2
fprintf('Write data\n');
fid = fopen('myNewFile.dat', 'w');
fwrite(fid,signal_mat, 'int16');
fclose(fid);
fid = fopen('myNewFile_demed.dat', 'w');
fwrite(fid,signal_mat_demed, 'int16');
fclose(fid);

Accepted Answer

Jan on 7 Jan 2021
This line makes the problem worse:
signal_mat = vertcat(signal_mat,temp_2);
In the last step, for example, you concatenate a 63 GB array with a 1 GB array, and the result is copied into a new 64 GB array. This requires 63+64 GB of RAM.
Pre-allocation would avoid this problem. In your case it could work with 64 + X GB RAM, where X might be 8 or 20. But even then this is a huge signal. How much RAM do you have?
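A minimal sketch of what pre-allocation could look like here. The sizes are tiny placeholders, `channelData` is a stand-in for the reshaped output of `Nlx2MatCSC`, and storing the matrix as `int16` is an assumption based on the precision used in the `fwrite` calls in the question:

```matlab
% Illustrative sizes only; the real data has 64 channels and far more samples.
nChannels = 4;
nSamples  = 1000;

% Allocate the full matrix once. int16 is an assumption taken from the
% precision passed to fwrite in the question.
signal_mat = zeros(nChannels, nSamples, 'int16');

for k = 1:nChannels
    % Stand-in for: reshape(Samples, 1, []) after calling Nlx2MatCSC.
    channelData = int16(k) * ones(1, nSamples, 'int16');

    % Fill the existing row in place instead of growing the array with vertcat,
    % so no second copy of the full matrix is created.
    signal_mat(k, :) = channelData;
end
```

With this pattern the peak memory is roughly the full matrix plus one channel, rather than two nearly full-size copies.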
Stefan Oline on 8 Mar 2021
Hello, if I could ask a follow-up question: I'm having trouble writing the .dat file because it is so large (~64 GB). I attempted to follow the method here:
I'm having trouble doing two things.
  1. Denoising the 64 channels by finding the median across all 64 channels and subtracting that from each signal.
  2. Writing the output (a 64 x 407297536 matrix) to a .dat which will end up being ~64GB.
Is there an easy way to demedian the signals, and then to write them to disk as a giant .dat?
Thanks very much.
%% User inputs
channels = 1:64;
demed_flag = 1;
store_flag = 1;
% Choose a directory to store the files
outDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg';
writeDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg\write';
%% Setup
n_channels = length(channels);
% Check how many samples are in a single channel
[Sample_check, Header] = Nlx2MatCSC(['CSC' num2str(channels(1),'%02.f') '.ncs'],...
[0 0 0 0 1], 1, 1, [] );
temp_a = reshape(Sample_check,[],1)';
n_samples = length(temp_a);
clear Header Sample_check temp_a
%% Import .ncs data files from the channels list to individual .mat files
fprintf('*Importing data*\n')
for i=1:n_channels
    disp(['Importing channel ' num2str(i) ' of ' num2str(n_channels) ' (' ...
        num2str(i/n_channels*100,2) '%)'])
    %fprintf('i = %.0f\n', i )
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(channels(i),'%02.f') '.ncs'],...
        [0 0 0 0 1], 1, 1, [] );
    data = reshape(Samples,[],1)';
    % Choose a file name - ensure these progress in order
    fname = fullfile(outDir, sprintf('data_%05d.mat', channels(i)));
    % Save the data and increment counters
    save(fname, 'data', '-v7.3');
    clear Samples Header data fname
end
clear i
fprintf('*Importing data complete*\n');
%% Create a datastore from the files
% Read the data back in as a tall array. First create a datastore ...
fprintf('*Creating a datastore*\n');
ds = fileDatastore(fullfile(outDir, '*.mat'), ...
'ReadFcn', @(fname) getfield(load(fname), 'data'), ...
'UniformRead', true);
fprintf('*Creating a datastore complete*\n');
% ... and then a tall array
fprintf('*Storing the datastore in a tall array*\n');
tdata = tall(ds);
fprintf('*Storing the datastore in a tall array complete*\n');
%% Demedian the signals
% ???
%% Write to file for KS2
if store_flag == 1
    fprintf('*Writing data*\n');
    fid = fopen('myNewFile_zeros.dat', 'w');
    fwrite(fid,tdata, 'int16');
    fclose(fid);
    fprintf('*Writing data complete*\n');
end
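
One possible way to handle both tasks from the follow-up (demedian across channels, then write one large .dat) without a tall array is to process the recording in blocks of samples. The sketch below is not a tested solution for the real data. It assumes the per-channel files were saved with `-v7.3` as in the code above, so that `matfile` can read part of the `data` variable without loading the whole file. The folder, file sizes, `chunkLen`, and the output name `demo_demed.dat` are made up for illustration, and small synthetic files stand in for the real ones. Each block is written as an nChannels-by-nSamples matrix in column-major order, so the channels end up interleaved sample by sample; whether that is the layout KS2 expects should be checked against its documentation.

```matlab
% Sketch only: synthetic files stand in for the real data_%05d.mat files.
workDir = tempname;                 % hypothetical scratch folder
mkdir(workDir);
nChannels = 4;                      % illustrative; 64 in the question
nSamples  = 1000;                   % illustrative; far larger in the question
chunkLen  = 256;                    % samples per block, tune to available RAM

rng(0);
reference = randi([-1000 1000], nChannels, nSamples);  % kept only for checking
for c = 1:nChannels
    data = reference(c, :);
    save(fullfile(workDir, sprintf('data_%05d.mat', c)), 'data', '-v7.3');
end

% One matfile object per channel. Indexing into it reads only that range.
mf = cell(nChannels, 1);
for c = 1:nChannels
    mf{c} = matfile(fullfile(workDir, sprintf('data_%05d.mat', c)));
end

outFile = fullfile(workDir, 'demo_demed.dat');
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Could not open the output file.');

for first = 1:chunkLen:nSamples
    last  = min(first + chunkLen - 1, nSamples);
    block = zeros(nChannels, last - first + 1);

    % Gather the same sample range from every channel file.
    for c = 1:nChannels
        m = mf{c};
        block(c, :) = m.data(1, first:last);
    end

    % Subtract the across-channel median at each time point.
    block = block - median(block, 1);

    % Convert explicitly so rounding and saturation are well defined,
    % then append this block to the open file.
    fwrite(fid, int16(block), 'int16');
end
fclose(fid);
```

Because the file stays open across iterations, each `fwrite` call appends to what was written before, and only one block of `nChannels * chunkLen` values is held in memory at a time.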
