Thread Subject:
problem in data import

Subject: problem in data import

From: Pap

Date: 11 Feb, 2011 22:17:04

Message: 1 of 10

Hello everyone,


I am a new MATLAB user. Can anyone help me import an ASCII file into MATLAB that is delimited by both spaces and tabs?


The file (intraday trades on Greek stocks) looks like:

Stock Date Time Price Volume Stock Category
ΕΤΕ 04/01/2010 10293955 18.34 500 Big Cap
ΕΤΕ 04/01/2010 10293955 18.34 70 Big Cap
ΕΤΕ 04/01/2010 10293955 18.34 430 Big Cap
ABC 04/01/2010 10293955 18.34 200 Big Cap
YYY 04/01/2010 10293955 18.34 100 Big Cap
ΕΤΕ 04/01/2010 10293955 18.34 40 Big Cap
ΕΤΕ 04/01/2010 10293955 18.34 215 Big Cap
 

However, it is a huge ASCII file, since it contains almost 9,900,000 rows.
My purpose is to examine almost 250 variables that appear across all the rows, along with their observations (columns 4 and 5).

When I try the Import Wizard, only one variable (a cell array) is created, with all six columns put into a single cell.

I think this is due to the different delimiters (space and tab) in this file, particularly between the 1st and 2nd columns.

Any hint?

Subject: problem in data import

From: Miroslav Balda

Date: 12 Feb, 2011 00:35:04

Message: 2 of 10

SNIP
>
> Stock Date Time Price Volume Stock Category
> ΕΤΕ 04/01/2010 10293955 18.34 500 Big Cap
> ΕΤΕ 04/01/2010 10293955 18.34 70 Big Cap
> ΕΤΕ 04/01/2010 10293955 18.34 430 Big Cap
> ABC 04/01/2010 10293955 18.34 200 Big Cap
> YYY 04/01/2010 10293955 18.34 100 Big Cap
> ΕΤΕ 04/01/2010 10293955 18.34 40 Big Cap
> ΕΤΕ 04/01/2010 10293955 18.34 215 Big Cap
>
>
> However, it is a huge ASCII file, since it contains almost 9,900,000 rows.
> My purpose is to examine almost 250 variables that appear across all the rows, along with their observations (columns 4 and 5).

Variables (strings), or values?

> Any hint?

You can use the following code to read only columns 4 and 5 from a file containing the data above:

% Pap 2011-02-12
fid = fopen('data.txt','r');
C = textscan(fid,'%*s%*s%*s%n%n%*[^\n]','headerLines',1);
fclose(fid);

C is a cell array, where C{1} contains the 4th column and C{2} the 5th. You can then select your 250 values from them.

Mira
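
To see how that format string behaves on a line that mixes a tab and spaces, here is a minimal, self-contained sketch. It assumes a small sample written to a temporary file (the file name and the two sample rows are made up for illustration, with Latin stock codes for simplicity); it is not run on the real 9,900,000-row file.

```matlab
% Write a tiny sample in the same layout as the data in this thread.
fname = [tempname '.txt'];          % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, 'Stock Date Time Price Volume Stock Category\n');
fprintf(fid, 'ETE 04/01/2010 10293955 18.34 500 Big Cap\n');
fprintf(fid, 'ABC\t04/01/2010 10293955 18.34 200 Big Cap\n');  % tab after the code
fclose(fid);

% %*s skips a text field, %n reads a number, %*[^\n] skips the rest of the line.
% By default textscan treats both spaces and tabs as white space.
fid = fopen(fname, 'r');
C = textscan(fid, '%*s%*s%*s%n%n%*[^\n]', 'HeaderLines', 1);
fclose(fid);
delete(fname);

price  = C{1};   % 4th column of the file
volume = C{2};   % 5th column of the file
```

The header line is skipped by 'HeaderLines', and the trailing "Big Cap" text never reaches the numeric fields because the rest of each line is discarded.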

Subject: problem in data import

From: Pap

Date: 12 Feb, 2011 10:37:02

Message: 3 of 10


Many thanks, Mira.
My question above refers to values.


Is there a way to delete rows based on the values in the Time column?

For instance:

Delete all rows with Time values from 10150000 to 10295959 (or extract these rows)?


Many thanks again, Mira

Subject: problem in data import

From: Miroslav Balda

Date: 12 Feb, 2011 15:57:03

Message: 4 of 10

Hi,

> Delete rows with time values from 10150000 to 10295959 for all rows (or extract these values) ?

To show how the code removes rows outside the required range, I have prepared a modified data file, "data2.txt":

Stock Date Time Price Volume Stock Category
ETE 04/01/2010 10145959 18.34 500 Big Cap
ETE 04/01/2010 10150000 18.34 70 Big Cap
ETE 04/01/2010 10170000 18.34 430 Big Cap
ABC 04/01/2010 10190000 18.34 200 Big Cap
YYY 04/01/2010 10200000 18.34 100 Big Cap
ETE 04/01/2010 10250000 18.34 40 Big Cap
ETE 04/01/2010 10295959 18.34 215 Big Cap
ETE 04/01/2010 10300000 18.34 500 Big Cap
ETE 04/01/2010 10320000 18.34 500 Big Cap

The following code does what you asked:

% Pap2 2011-02-12
% Function "inp.m" for manual input from keyboard:
% www.mathworks.com/matlabcentral/fileexchange/9033

file = inp('file name','data2.txt'); % Enter name of the file to be processed
fid = fopen(file,'r');
C = textscan(fid,'%*s%*s%n%n%n%*[^\n]','headerLines',1);
fclose(fid);

time1 = inp('time1',10150000,'%8d'); % lower boundary of time period
time2 = inp('time2',10295959); % upper boundary of time period

I = C{1}>=time1 & C{1}<=time2; % logical index of rows with accepted times
% D = [time price volume] for all rows of the file:
D = [C{:}]; % The result could be assigned back to C to save memory
D = D(I,2:3) % Keep accepted rows; file columns 4 & 5 are columns 2 & 3 of D

Here are the results of a run:

>> Pap2
          file name = data2.txt =>
          time1 = 10150000 =>
          time2 = 10295959 =>
D =
   18.3400 70.0000
   18.3400 430.0000
   18.3400 200.0000
   18.3400 100.0000
   18.3400 40.0000
   18.3400 215.0000

You see that only rows within the required times are present.

Best regards

Mira

PS:
The stock code may also be of interest. In that case you can either modify the format in textscan and process a bigger result, or read the same file a second time and extract only the first column. Afterwards, you can apply the same logical vector "I" to select the required rows.
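
A minimal sketch of the first option in the PS, changing the textscan format so the stock code is kept as text. It assumes a shortened copy of data2.txt written to a temporary file (a hypothetical name), and it uses fixed time bounds in place of the interactive inp calls so that it runs without the File Exchange function.

```matlab
% Shortened sample in the layout of data2.txt (temporary file is hypothetical).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Stock Date Time Price Volume Stock Category\n');
fprintf(fid, 'ETE 04/01/2010 10145959 18.34 500 Big Cap\n');
fprintf(fid, 'ETE 04/01/2010 10150000 18.34 70 Big Cap\n');
fprintf(fid, 'ABC 04/01/2010 10190000 18.34 200 Big Cap\n');
fprintf(fid, 'ETE 04/01/2010 10300000 18.34 500 Big Cap\n');
fclose(fid);

% Keep the stock code (%s), skip the date (%*s), read time, price, volume (%n).
fid = fopen(fname, 'r');
C = textscan(fid, '%s%*s%n%n%n%*[^\n]', 'HeaderLines', 1);
fclose(fid);
delete(fname);

time1 = 10150000;   % lower bound, fixed here instead of entered through inp
time2 = 10295959;   % upper bound

I = C{2} >= time1 & C{2} <= time2;   % rows inside the time window
stock = C{1}(I);                     % cell array of stock codes for those rows
D = [C{3}(I), C{4}(I)];              % [price volume] for the same rows
```

Because the first field is text, C can no longer be concatenated with [C{:}]; the numeric columns are combined explicitly, and the same logical vector I selects the matching stock codes.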

Subject: problem in data import

From: Pap

Date: 12 Feb, 2011 22:34:04

Message: 5 of 10





Thanks for the help.

I'll try to work it out this way.


Many thanks again, Mira

Subject: problem in textscan

From: Pap

Date: 15 Mar, 2011 18:54:07

Message: 6 of 10








Hello,

Can anyone help?

I am trying to run the code above.

1. After running the textscan function, how can I export the results (cell arrays) to a new text or Excel file?

2. When I run the code for selecting the lower and upper bounds to delete rows, I get the error message below:

> time1=('time1',10300000,'%8d');
??? time1=('time1',10300000,'%8d');
                  |
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.


Can anyone help please?

Pap

Subject: problem in textscan

From: Miroslav Balda

Date: 15 Mar, 2011 22:01:21

Message: 7 of 10

"Pap" wrote in message <ilockf$mkd$1@ginger.mathworks.com>...

SNIP

> Hello,
>
> Can anyone help?
>
> I am trying to run the above function.
>
> 1. After running the textscan function, how can I export the results (cell arrays) to a new text or Excel file?

help xlswrite
 
> 2. When I run the code for selecting the lower and upper bounds to delete rows, I get the error message below:
>
> > time1=('time1',10300000,'%8d');
> ??? time1=('time1',10300000,'%8d');
> |
> Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.
>
>
> Can anyone help please?

It is difficult to help you if a line is not copied exactly from the code above: the function name inp is missing after the equals sign. The wrong line
     time1=('time1',10300000,'%8d');
should be replaced by
     time1=inp('time1',10300000,'%8d');

Mira
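
For the text-file half of question 1, a minimal sketch using plain fprintf. It assumes D is the numeric [price volume] matrix produced by the earlier script; a small stand-in matrix and a temporary output file name (both hypothetical) are used here so the example runs on its own.

```matlab
D = [18.34 70; 18.34 430; 18.34 200];   % stand-in for the filtered matrix D

outfile = [tempname '.txt'];            % hypothetical output file name
fid = fopen(outfile, 'w');
fprintf(fid, 'Price\tVolume\n');        % header row
fprintf(fid, '%.2f\t%d\n', D.');        % D is transposed so each row of D fills one line
fclose(fid);

% Where xlswrite is able to write spreadsheets, the same matrix could be
% written with a call such as:  xlswrite('results.xls', D)
```

fprintf consumes its data column by column, which is why D is transposed: each output line then receives one price and one volume.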

Subject: problem in textscan

From: Pap

Date: 17 Mar, 2011 20:34:06

Message: 8 of 10






Many thanks Mira,

Do you know how I can export the cell arrays resulting from the above code to a text file (or Excel, etc.)?



Pap

Subject: problem in textscan

From: Miroslav Balda

Date: 18 Mar, 2011 06:29:05

Message: 9 of 10

"Pap" wrote in message <iltr7u$r79$1@ginger.mathworks.com>...

SNIP

> Do you know how I can export the cell arrays resulting from the above code to a text file (or Excel, etc.)?
>
> Pap

Hi Pap,
I do not understand what the problem is. Look at the code I sent on February 12. The result is in the real matrix D, which contains only the rows within the prescribed time range and only the required columns. Try running it with the data2.txt file. If you do not believe it, enter the command
     isreal(D)
after running the Pap2.m script.
Good luck.
Mira
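
The distinction between the cell array returned by textscan and the numeric matrix D can be checked directly. A minimal sketch, assuming a hand-built cell array with the same shape as the script's output (the values are taken from data2.txt):

```matlab
% Same shape as the textscan output in the script: {time, price, volume}.
C = {[10150000; 10170000], [18.34; 18.34], [70; 430]};
D = [C{:}];        % concatenate the cell contents side by side

iscell(C)          % C is a cell array
isnumeric(D)       % D is an ordinary numeric matrix
isreal(D)          % D holds real (non-complex) values
size(D)            % 2 rows, 3 columns
```

Since D is a plain matrix, it can be indexed, filtered, and written to a file like any other numeric array.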

Subject: problem in textscan

From: Pap

Date: 18 Mar, 2011 15:08:05

Message: 10 of 10




Thanks Mira,

Sorry for any inconvenience caused. As a new user, I am slightly confused even by the basics of the program.


Many thanks again for your help

Pap
