
Thread Subject:
Parsing huge text files?

Subject: Parsing huge text files?

From: Luna Moon

Date: 13 Apr, 2011 02:47:11

Message: 1 of 3

Hi all,

I have huge log files that need to be processed. Each log file is
about 500 MB.

These are unstructured mixtures of strings and numbers.

To give an example, the first row might contain the trader id, trade_id,
quantity, limit_price, etc.

And then the next row contains the acknowledgement returned by
the exchange after the trade is sent to it.

The next few rows may contain info related to earlier trades and the
status of the order book, etc.

And then the next row may contain the fills info for that trade sent
to the exchange. Of course, we use the "trade_id" to keep track of the
info.

Also, interspersed between these rows, the log file contains the status
of the order book (order depth, etc.).

We may want to run queries such as which trades a given trader made
and what the average prices of those trades were...

The most stupid way I can think of is to search all of the hundreds of
log files for a specified trader id, find the trade_ids,
and then search for the details of the trades for each of those
trade_ids...

So there would be a lot of searching and data extraction on
these huge files...

Is there a way to parse the huge log files conveniently in Matlab, or
read one line at a time to parse the text data?

Are there any tools that can help with processing such metadata?

Thanks a lot!

Subject: Parsing huge text files?

From: Ralph Schleicher

Date: 13 Apr, 2011 05:00:02

Message: 2 of 3

Luna Moon <lunamoonmoon@gmail.com> writes:

> I have huge log files that need to be processed. Each log file is
> about 500MB.

The 'textscan' command is your friend. If your data does not match a
regular pattern, preprocess the files with a Perl script and perform
the analysis in Matlab.
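A minimal sketch of the preprocessing step described above, written in Python instead of Perl (a swapped-in language, not the one suggested here). The log line layouts "ORDER trader=... trade_id=... qty=... price=..." and "FILL trade_id=... qty=... price=...", and the helper names parse_line and preprocess, are hypothetical assumptions for illustration; real logs will need their own patterns. The file is streamed one line at a time, so a 500 MB log never has to fit in memory.

```python
import csv
import re

# Hypothetical line layouts (assumptions, not taken from the real logs):
#   ORDER trader=T17 trade_id=1001 qty=500 price=25.10
#   FILL trade_id=1001 qty=200 price=25.08
ORDER_RE = re.compile(r"^ORDER trader=(\S+) trade_id=(\d+) qty=(\d+) price=([\d.]+)")
FILL_RE = re.compile(r"^FILL trade_id=(\d+) qty=(\d+) price=([\d.]+)")


def parse_line(line):
    """Classify one log line; return ("order" | "fill", fields) or None."""
    m = ORDER_RE.match(line)
    if m:
        return "order", m.groups()
    m = FILL_RE.match(line)
    if m:
        return "fill", m.groups()
    return None  # acknowledgements, order-book snapshots, anything unrecognised


def preprocess(log_path, orders_csv, fills_csv):
    """Stream one log file line by line and write two regular CSV files."""
    with open(log_path, encoding="utf-8", errors="replace") as log, \
         open(orders_csv, "w", newline="") as o, \
         open(fills_csv, "w", newline="") as f:
        writers = {"order": csv.writer(o), "fill": csv.writer(f)}
        for line in log:  # only the current line is held in memory
            parsed = parse_line(line)
            if parsed:
                kind, fields = parsed
                writers[kind].writerow(fields)
```

For example, preprocess('trades.log', 'orders.csv', 'fills.csv') (hypothetical file names) would produce two comma-separated files with a fixed column layout, which textscan could then read in MATLAB with a format such as '%s %f %f %f' together with the 'Delimiter', ',' option.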

--
Ralph Schleicher <http://ralph-schleicher.de>

Development * Consulting * Training
Mathematical Modeling and Simulation
Software Tools

Subject: Parsing huge text files?

From: Rune Allnor

Date: 13 Apr, 2011 05:49:22

Message: 3 of 3

On Apr 13, 4:47 am, Luna Moon <lunamoonm...@gmail.com> wrote:

> The most stupid way I can think of is to search all of the hundreds of
> log files for a specified trader id, find the trade_ids,
> and then search for the details of the trades for each of those
> trade_ids...

Maybe 'stupid', but that's the only way if all you
have are these log files.
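The scan described above does not have to be repeated per trade_id; it can be done in one streaming pass over the files. The sketch below is in Python, and the ORDER/FILL line layouts, the function name trader_avg_fill_price, and the assumption that an order line always appears before its fills are hypothetical. It answers the example query from the original post: which trades a trader made and the volume-weighted average fill price.

```python
import re

# Hypothetical line layouts (assumptions, not taken from the real logs):
#   ORDER trader=T17 trade_id=1001 qty=500 price=25.10
#   FILL trade_id=1001 qty=200 price=25.08
ORDER_RE = re.compile(r"^ORDER trader=(\S+) trade_id=(\d+)")
FILL_RE = re.compile(r"^FILL trade_id=(\d+) qty=(\d+) price=([\d.]+)")


def trader_avg_fill_price(lines, trader):
    """Return (trade_ids, volume-weighted average fill price) for one trader.

    Single pass: assumes each ORDER line appears before its FILL lines, so
    the trader's trade_ids are already known when the fills arrive.
    """
    trade_ids = set()
    total_qty = 0
    total_value = 0.0
    for line in lines:  # works on an open file: one line in memory at a time
        m = ORDER_RE.match(line)
        if m:
            if m.group(1) == trader:
                trade_ids.add(m.group(2))
            continue
        m = FILL_RE.match(line)
        if m and m.group(1) in trade_ids:
            qty = int(m.group(2))
            total_qty += qty
            total_value += qty * float(m.group(3))
    average = total_value / total_qty if total_qty else None
    return trade_ids, average
```

To cover hundreds of files, they could be chained in chronological order with the standard fileinput module, e.g. "with fileinput.input(files=paths) as lines: ids, avg = trader_avg_fill_price(lines, 'T17')", where paths is a hypothetical list of log file names. Each query still costs a full read of every file, which is the cost discussed next.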

> So there would be a lot of searching and data extraction on
> these huge files...
>
> Is there a way to parse the huge log files conveniently in Matlab, or
> read one line at a time to parse the text data?
>
> Are there any tools that can help with processing such metadata?

If you have hundreds of roughly gigabyte-sized files, expect to
spend a lot of time waiting for the results: hours or
even days every time you run the scan. I would suggest
setting up some sort of database to store the data, so
that you can easily find whatever information you need in the future.
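As a minimal sketch of the database suggestion, the following uses SQLite through Python's built-in sqlite3 module; SQLite is one possible choice, not one named in this thread. The table layout, column names, and the helpers open_db, load and trader_summary are hypothetical. The idea is to run the slow parse over the raw logs once, load the parsed order and fill rows, and then answer each query with an indexed join instead of a rescan.

```python
import sqlite3

# Hypothetical schema: one row per order, one row per (partial) fill.
SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (
    trade_id    INTEGER PRIMARY KEY,
    trader      TEXT NOT NULL,
    qty         INTEGER,
    limit_price REAL
);
CREATE TABLE IF NOT EXISTS fills (
    trade_id INTEGER REFERENCES orders(trade_id),
    qty      INTEGER,
    price    REAL
);
CREATE INDEX IF NOT EXISTS orders_trader ON orders(trader);
CREATE INDEX IF NOT EXISTS fills_trade ON fills(trade_id);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def load(con, orders, fills):
    """Bulk-insert parsed rows in one transaction.

    orders: (trade_id, trader, qty, limit_price) tuples
    fills:  (trade_id, qty, price) tuples
    """
    with con:  # commits on success, rolls back on error
        con.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", orders)
        con.executemany("INSERT INTO fills VALUES (?, ?, ?)", fills)


def trader_summary(con, trader):
    """Return (number of filled trades, volume-weighted average fill price)."""
    return con.execute(
        """SELECT COUNT(DISTINCT o.trade_id), SUM(f.qty * f.price) / SUM(f.qty)
           FROM orders o JOIN fills f ON f.trade_id = o.trade_id
           WHERE o.trader = ?""",
        (trader,),
    ).fetchone()
```

Passing a file path to open_db (for example open_db('trades.db'), a hypothetical name) instead of the default in-memory database keeps the loaded data between sessions, so the multi-hour scan only has to happen once per log file.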

Rune
