How can I read or use a txt file in the next step?

Hi
I have big data (1 million rows!) and I had to save it to a txt file, because MATLAB couldn't process it and gave an error like "Out of memory".
So I saved the output to a txt file. Part of the output is attached as an example (a_freq.txt).
Now I need to use this output in the next step. My code for the next step is (X = output):
X = ???;  % this is the part I don't know
param = 'Logarithmic';
[Y, w] = tfidf2(X, param);
The related function is (also attached):
function [Y, w] = tfidf2(X, param)
% FUNCTION applies TF-IDF weighting to a word count matrix.
%
% [Y, w] = tfidf2(X, param);
%
% INPUT :
% X     - word count vectors (one row = one term, one column = one document)
% param - 'Boolean' or 'Logarithmic' (selects the term-frequency subfunction)
%
% OUTPUT :
% Y - TF-IDF weighted document-term matrix
% w - IDF weights (useful to process other documents)
%
% get inverse document frequencies (same for both cases)
w = idf(X);
switch param
    case 'Boolean'
        % TF * IDF
        Y = tf1(X) .* repmat(w, 1, size(X,2));
    case 'Logarithmic'
        % TF * IDF
        Y = tf2(X) .* repmat(w, 1, size(X,2));
    otherwise
        error('tfidf2: unknown param "%s"', param);
end
end

function Y = tf1(X)
% SUBFUNCTION computes word frequencies (Boolean)
Y = X ./ repmat(sum(X,1), size(X,1), 1);
Y(isnan(Y)) = 0;   % empty documents give 0/0 = NaN
end

function Y = tf2(X)
% SUBFUNCTION computes word frequencies (Logarithmic)
% Y = log(1+X);
Y = X ./ repmat(sum(X,1), size(X,1), 1);
Y(isnan(Y)) = 0;   % empty documents give 0/0 = NaN
end

function I = idf(X)
% SUBFUNCTION computes inverse document frequencies
% count the number of documents each term appears in
nz = sum((X > 0), 2);
% compute idf for each term
I = log(size(X,2) ./ (nz(:) + 1));
end
So how can I read the txt file for the next step? It's very important for me.
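For reference, here is a minimal sketch of what tfidf2(X, 'Logarithmic') computes as posted (each count divided by its column total, times the IDF of its row), run on a tiny made-up 4-by-3 count matrix. It uses bsxfun instead of repmat, so no extra full-size copies of X are allocated; the matrix values are illustrative only.

```matlab
% Tiny made-up word count matrix: 4 terms (rows) x 3 documents (columns)
X = [2 0 1; 0 3 0; 1 1 1; 0 0 4];

nDocs = size(X, 2);
w = log(nDocs ./ (sum(X > 0, 2) + 1));   % same IDF formula as idf() in the question

colTotals = sum(X, 1);                    % total count in each document
colTotals(colTotals == 0) = 1;            % empty document: 0/1 = 0 instead of NaN

% TF * IDF with bsxfun expansion (no repmat copies of X)
Y = bsxfun(@times, bsxfun(@rdivide, X, colTotals), w);
```

With this IDF formula, a term that appears in every document gets a negative weight because of the +1 in the denominator (row 3 here).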

6 Comments

You've asked 11 questions and accepted only 2 answers. Is there any reason for that? There's very little incentive for people to help you if you ignore their answer and don't reward them with the credits of an accepted answer.
Isay on 27 Nov 2014 (edited 27 Nov 2014)
Because those answers didn't help me. If they had helped me, I would have accepted them! Three minutes ago I accepted your answer, because it was correct.
Isay on 27 Nov 2014 (edited 27 Nov 2014)
Can you help me with my new problem?
If an answer (including mine) does not satisfy you, then comment on it to say why, rather than leaving the answerer without any feedback.
As I said, there's no incentive for us to answer questions if we don't see the answers acknowledged in some way. You don't have to accept an answer if it's not what you want, but then do give feedback.
With regard to your new question, I've no idea what you're asking. It sounds like it's about file reading, but the code you've posted does not do any file reading.
Can you clarify what you want and remove stuff that is not relevant?
Please look at the attached txt file. It's my output.
a_freq.txt is X, OK?
Now, please look at the other function, "function [Y w] = tfidf2( X , param)".
How can I read X in MATLAB? (X is too big and I can't load it into MATLAB.)


Answers (3)

The memory used by the content of your file, when read as double, is only about 2 MB (and only about 250 KB as uint8). How can that be too big for MATLAB?
x = csvread('a_freq.txt');
should work. If you're really so constrained by memory that you can't read a 2 MB array, then most likely the rest of your code won't be able to cope either.
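The 2 MB figure follows from the size of the sample: a 49-by-5549 matrix (the dimensions the asker reports further down in the thread) has 271,901 elements, at 8 bytes each as double and 1 byte each as uint8. A quick sketch to check this arithmetic, cross-checked with whos on an array of the same size:

```matlab
% Size of the attached sample, as reported in the thread (49 x 5549)
nRows  = 49;
nCols  = 5549;
nElems = nRows * nCols;          % 271901 elements

bytesDouble = nElems * 8;        % double: 8 bytes per element
bytesUint8  = nElems * 1;        % uint8:  1 byte per element
fprintf('as double: %.2f MB, as uint8: %.0f KB\n', ...
        bytesDouble / 2^20, bytesUint8 / 2^10);

% Cross-check against what MATLAB reports for a real array of that size
A = zeros(nRows, nCols);
info = whos('A');                % info.bytes is the array's memory footprint
```

The same multiplication with the real row and column counts gives the size of the full double matrix before trying csvread or dlmread on it.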
However, if a file is too big to read all at once, you can use low-level functions to parse it yourself:
fid = fopen('a_freq.txt', 'rt');
tline = fgetl(fid); %read one line at a time
while ischar(tline) %fgetl returns -1 (not a char) at the end of the file
    xrow = str2num(tline); %convert one line of text into a row vector of numbers
    %...
    %do something with that row
    %...
    tline = fgetl(fid); %read next line
end
fclose(fid);
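Applied to the TF-IDF step, the line-by-line approach needs two passes over the file, because the TF part divides by column totals that are only known after every row has been read. Below is a minimal sketch under stated assumptions: the file is comma-separated (as csvread implies), each line is one term (row) and each column one document (as in tfidf2's help text), and one weight per row plus one total per column fit in memory. The file names demo_freq.txt and demo_tfidf.txt are hypothetical, and the first lines only create a small demo input so the sketch runs on its own.

```matlab
% Hypothetical file names; the demo input stands in for the real a_freq.txt
inFile  = 'demo_freq.txt';
outFile = 'demo_tfidf.txt';
dlmwrite(inFile, [2 0 1; 0 3 0; 1 1 1; 0 0 4]);   % small comma-separated demo

% Pass 1: column totals (for TF) and per-row document counts (for IDF)
colTotals = [];
nz = [];
fid = fopen(inFile, 'rt');
tline = fgetl(fid);
while ischar(tline)
    xrow = sscanf(tline, '%f,').';        % one line -> row vector (assumes commas)
    if isempty(colTotals)
        colTotals = zeros(1, numel(xrow));
    end
    colTotals = colTotals + xrow;
    nz(end+1, 1) = sum(xrow > 0);         % number of documents containing this term
    tline = fgetl(fid);
end
fclose(fid);

nDocs = numel(colTotals);
w = log(nDocs ./ (nz + 1));               % same IDF formula as idf() in the question
colTotals(colTotals == 0) = 1;            % empty documents give 0 instead of NaN

% Pass 2: weight each row and write it straight to the output file
fin  = fopen(inFile, 'rt');
fout = fopen(outFile, 'wt');
k = 0;
tline = fgetl(fin);
while ischar(tline)
    k = k + 1;
    xrow = sscanf(tline, '%f,').';
    yrow = (xrow ./ colTotals) * w(k);    % TF (column-normalised) * IDF of this term
    s = sprintf('%.6g,', yrow);
    fprintf(fout, '%s\n', s(1:end-1));    % drop the trailing comma
    tline = fgetl(fin);
end
fclose(fin);
fclose(fout);
```

Only the two vectors w and colTotals plus one line of text are held in memory at a time, so the full matrix never has to be loaded; the trade-off is reading the file twice.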

4 Comments

Isay on 28 Nov 2014 (edited 28 Nov 2014)
Thanks, but as I said, 'a_freq.txt' is only part of the real output! :(
The real output is 9 GB!
Guillaume on 28 Nov 2014 (edited 28 Nov 2014)
The second part of my answer shows you how to load one row at a time.
However, it looks like you're still using the same data structure as in your previous answers. As pointed out, it's extremely inefficient. Why don't you change that and save yourself a lot of headache?
Another option would be to use a sparse matrix when you generate the data, since your matrix appears to be mostly 0s.
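Building the sparse matrix directly from the file, without ever creating the full matrix, could look like the minimal sketch below. It collects the row index, column index and value of each nonzero while reading one line at a time, then calls sparse once at the end. It assumes comma-separated lines; the file name demo_freq.txt is hypothetical and the first two lines only create a small demo input.

```matlab
textFile = 'demo_freq.txt';                          % hypothetical file name
dlmwrite(textFile, [2 0 1; 0 3 0; 1 1 1; 0 0 4]);    % small demo input

I = {};  J = {};  V = {};     % per-line row indices, column indices, values
nRows = 0;
nCols = 0;
fid = fopen(textFile, 'rt');
tline = fgetl(fid);
while ischar(tline)
    xrow = sscanf(tline, '%f,').';                   % one line -> row vector
    nRows = nRows + 1;
    nCols = max(nCols, numel(xrow));
    nzCols = find(xrow);                             % columns holding nonzeros
    I{end+1, 1} = repmat(nRows, numel(nzCols), 1);
    J{end+1, 1} = nzCols(:);
    V{end+1, 1} = xrow(nzCols).';
    tline = fgetl(fid);
end
fclose(fid);

% Only the nonzeros are stored
X = sparse(vertcat(I{:}), vertcat(J{:}), vertcat(V{:}), nRows, nCols);
```

Each nonzero costs roughly 16 bytes in a sparse double matrix (value plus row index), compared with 8 bytes for every element of a full one, so this only pays off when most entries are zero.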
The sparse method didn't help me; the out-of-memory error happened again.
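One likely reason, not confirmed in the thread, is that tfidf2 and its subfunctions use repmat: idf returns a full vector w, so repmat(w, 1, size(X,2)) is a full matrix the size of X, and repmat(sum(X,1), size(X,1), 1) is a sparse matrix with almost no zeros. A minimal sketch of the same weighting that keeps everything sparse, scaling rows and columns with diagonal matrices from spdiags instead (tiny made-up X for illustration):

```matlab
% Tiny made-up sparse count matrix: 4 terms (rows) x 3 documents (columns)
X = sparse([2 0 1; 0 3 0; 1 1 1; 0 0 4]);
[m, n] = size(X);

w = log(n ./ (full(sum(X > 0, 2)) + 1));     % IDF per term, as in idf()
colTotals = full(sum(X, 1)).';               % total count per document
colTotals(colTotals == 0) = 1;               % empty documents stay 0, not NaN

% Y(i,j) = w(i) * X(i,j) / colTotals(j), computed with sparse diagonal scaling
Y = spdiags(w, 0, m, m) * X * spdiags(1 ./ colTotals, 0, n, n);
```

Y has at most as many nonzeros as X, so memory grows with the number of nonzeros rather than with rows times columns.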


X = dlmread('a_freq.txt');
[Y w] = tfidf2(X, 'Logarithmic');

1 Comment

Thanks, but as I said, 'a_freq.txt' is only part of the real output! :(
The real output is 9 GB!
I can't load it in MATLAB with dlmread.


2 Comments

I tried to use it, but I think I first need to create a memmapfile and then read it? Or can I use a_freq.txt directly?
Anyway, I used my output txt file (a_freq.txt).
My code:
X = memmapfile('a_freq.txt')
and the output was:
Filename: '...\a_freq.txt'
Writable: false
Offset: 0
Format: 'uint8'
Repeat: Inf
Data: 544269x1 uint8 array
(Why is Data 544269x1? a_freq.txt is 49x5549.)
So can I use X.Data in the other function? Is that right? I tried, but it failed.
Can you give an example?
Honestly, I haven't used it personally, but I know that this is its main purpose. It's for cases where you have a massive amount of data in a file and not enough RAM. That is its whole reason for existing. Sorry, I don't know the details of how to use it, though.
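memmapfile maps raw bytes, and its default Format is 'uint8' (as the output above shows), so mapping a text file gives one uint8 per character: 544269 is the number of bytes in a_freq.txt, not the number of matrix elements. One way to use it is sketched below, assuming comma-separated lines; the file names demo_freq.txt and demo_freq.bin are hypothetical, and the first lines only create a small demo input. The text file is converted once into raw doubles, row by row, and the binary file is then mapped so that each record is one row.

```matlab
% Hypothetical file names; the demo input stands in for the real a_freq.txt
textFile = 'demo_freq.txt';
binFile  = 'demo_freq.bin';
dlmwrite(textFile, [2 0 1; 0 3 0; 1 1 1; 0 0 4]);

% 1) Convert the text file to raw doubles, one row at a time
fin  = fopen(textFile, 'rt');
fout = fopen(binFile, 'w');
nRows = 0;
nCols = 0;
tline = fgetl(fin);
while ischar(tline)
    xrow = sscanf(tline, '%f,');     % values of this row (assumes commas)
    nCols = numel(xrow);
    fwrite(fout, xrow, 'double');    % rows are stored one after another
    nRows = nRows + 1;
    tline = fgetl(fin);
end
fclose(fin);
fclose(fout);

% 2) Map the binary file: each record is one row of nCols doubles
m = memmapfile(binFile, 'Format', {'double', [1 nCols], 'row'}, 'Repeat', nRows);
thirdRow = m.Data(3).row;            % access a single row by index
```

Each element takes 8 bytes in the binary file, so its size is nRows*nCols*8 bytes regardless of how the numbers were formatted in the text file.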



Asked: 27 Nov 2014

Commented: 29 Nov 2014
