Unable to read a huge XML or text file

JFz on 24 Jul 2017
Commented: Santa Raghavan on 27 Jul 2017
Hi,
I have an XML file that is 2 GB in size. I keep getting a Java heap memory error when loading it, so I am thinking of reading it in as a text file and removing the many useless rows before saving the result to a new, smaller file.
How can I do that? I cannot even open it in TextPad. Thanks!

Answers (1)

Santa Raghavan on 26 Jul 2017
Edited: Santa Raghavan on 26 Jul 2017
You can increase the amount of Java heap memory available to MATLAB as follows.
In the MATLAB Desktop window:
For MATLAB R2010a and later, go to File -> Preferences -> General -> Java Heap Memory and move the slider to adjust the allocated heap memory.
For versions of MATLAB prior to R2010a, refer to the link below-
If that does not work, you can read the file in as text with the textscan function, specifying how many rows (the block size) to read at a time.
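fileID = fopen('bigfile.txt');
formatSpec = '%s %f %*f %*f %s';
Read a block of data from the file. The third argument, N, is the number of times textscan applies the format, that is, the block size in rows. Use the HeaderLines name-value pair argument to instruct textscan to skip two lines before reading data.
N = 10000; % block size in rows (example value)
D = textscan(fileID,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
Call textscan again with the same fileID (without HeaderLines) to read the next block, and close the file when you are done.
fclose(fileID);
For more information, see: Import large text files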
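The approach described in the question (read the file as text, drop the useless rows, save a smaller file) can also be sketched with a plain line-by-line loop, which keeps only one line in memory at a time. This is a minimal sketch, not a definitive solution: the file names and the dropPattern marker are hypothetical, and it assumes every unwanted row sits on its own line. The first few lines only create a tiny demo input so the sketch runs on its own.

```matlab
% Create a tiny demo input (hypothetical content; replace inFile with your own file).
inFile  = fullfile(tempdir,'demo_big.xml');
outFile = fullfile(tempdir,'demo_small.xml');
fid = fopen(inFile,'w');
fprintf(fid,'<root>\n<keep id="1"/>\n<junk id="2"/>\n<keep id="3"/>\n</root>\n');
fclose(fid);

dropPattern = '<junk';   % hypothetical marker that identifies a useless row

fin = fopen(inFile,'r');
if fin == -1
    error('Cannot open input file %s', inFile);
end
fout = fopen(outFile,'w');
if fout == -1
    fclose(fin);
    error('Cannot open output file %s', outFile);
end

% Copy the file one line at a time, skipping lines that contain the marker.
line = fgetl(fin);
while ischar(line)                      % fgetl returns -1 at end of file
    if isempty(strfind(line, dropPattern))
        fprintf(fout, '%s\n', line);
    end
    line = fgetl(fin);
end

fclose(fin);
fclose(fout);
```

Note that dropping lines from an XML file only leaves it well formed if each removed element starts and ends on the same line; if elements span several lines, the filter rule needs to account for that.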
  2 Comments
JFz on 27 Jul 2017
Thanks! I will try it. I have increased the Java heap memory to the maximum but still get the same error.
Santa Raghavan on 27 Jul 2017
You can also try the datastore function, which lets you read files that don't fit in memory.
ds = datastore('Myfile.xml', ...
    'TreatAsMissing','NA');
ds.ReadSize = 100; % number of lines to read at a time
read(ds) % reads the first 100 lines in the file
read(ds) % reads the next 100 lines in the file
Each subsequent read call on ds fetches data starting from where the previous read stopped.
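To work through the whole file rather than calling read by hand, the reads can be wrapped in a loop that runs while hasdata(ds) is true. The following is a minimal sketch under stated assumptions: it uses a small comma-delimited demo file (a hypothetical name and layout, created in the first lines so the sketch runs on its own), since datastore needs a tabular text layout it can parse, and it only counts rows where real code would filter or save each chunk.

```matlab
% Create a small comma-delimited demo file (hypothetical; use your own file instead).
demoFile = fullfile(tempdir,'demo_chunks.csv');
fid = fopen(demoFile,'w');
fprintf(fid,'a,b\n');
fprintf(fid,'%d,%d\n',[1:250; 251:500]);   % 250 data rows
fclose(fid);

ds = datastore(demoFile,'TreatAsMissing','NA');
ds.ReadSize = 100;                 % rows per chunk

totalRows = 0;
while hasdata(ds)
    T = read(ds);                  % table holding up to ReadSize rows
    % Process the chunk here, e.g. filter rows and append them to an output file.
    totalRows = totalRows + height(T);
end
```

Only one chunk is held in memory at a time, so ReadSize can be tuned to trade speed against memory use.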
