What is the fastest way to extract data from a huge text file?

I have a text file like this:
1.0 IONOSPHERE MAPS GNSS IONEX VERSION / TYPE
ADDNEQ2 V5.3 AIUB 03-JUL-14 20:57 PGM / RUN BY / DATE
CODE'S GLOBAL IONOSPHERE MAPS FOR DAY 180, 2014 COMMENT
Global ionosphere maps (GIM) are generated on a daily basis DESCRIPTION
(I don't want this part)
.
.
.
(skip 600 lines)
1 START OF TEC MAP
2014 6 29 0 0 0 EPOCH OF CURRENT MAP
87.5-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
154 154 155 155 155 156 156 156 156 155 155 155 154 154 153 153
152 151 150 149 148 147 146 145 145 144 143 142 141 140 139 139
138 138 137 137 137 137 136 136 137 137 137 137 137 138 138 139
139 139 140 140 141 142 142 143 143 144 145 145 146 147 147 148
149 149 150 151 152 152 153 153 154
85.0-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
160 161 162 163 164 164 165 165 165 164 164 163 163 162 161 159
158 157 155 153 151 149 147 145 143 141 139 138 136 134 133 132
131 130 130 129 129 129 130 130 131 131 132 133 134 135 136 136
137 138 139 139 140 140 141 142 142 143 144 145 146 146 148 149
150 151 153 154 155 157 158 159 160
.
.
.
I have to look up a specific value for a given latitude, longitude and time.
I have a function that uses fopen and fgetl to search for it. The data have fixed spacing, so I use strcmp and isequal to find the value I want. . . .
Let's say value = search(lat, lon, time), with
lat = 85.0; lon = -175; time (UT) = 0;
I first compare each line returned by fgetl with the string:
2014 6 29 0 0 0 EPOCH OF CURRENT MAP
If it matches, I search for 85.0 in the following lines returned by fgetl:
85.0-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
If it matches, I store all the related data in a vector:
160 161 162 163 164 164 165 165 165 164 164 163 163 162 161 159 158 157 155 153 151 149 147 145 143 141 139 138 136 134 133 132 131 130 130 129 129 129 130 130 131 131 132 133 134 135 136 136 137 138 139 139 140 140 141 142 142 143 144 145 146 146 148 149 150 151 153 154 155 157 158 159 160 (in vector form)
Then I get the value by vector index (the index corresponds to the longitude; index = 2 in this example).
. . .
But I have to call this search function 250,000 times, which will take over 24 hours!
What can I do? I cannot change my computer. Thanks, I need your help!
PS: the text file is about 12,000 rows by 80 columns.
  11 Comments
SO on 5 Feb 2015
  • The data in the text file only cover longitudes from -180 to 180 at 5-degree intervals.
  • An input longitude that is not evenly divisible by 5, or that is larger than 180 or smaller than -180, will not be accepted. This if condition is correct.
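The acceptance rule in these two bullets can be sketched as a one-line check. The name is_valid_lon is a hypothetical helper, not part of the original function, and the sketch assumes the grid runs from -180 to 180 in 5-degree steps as stated above.

```matlab
% Hypothetical helper (not from the original code): true only for
% longitudes on the 5-degree grid between -180 and 180 inclusive.
is_valid_lon = @(lon) abs(lon) <= 180 & mod(lon, 5) == 0;

ok_on_grid   = is_valid_lon(-175);   % on the grid
bad_off_grid = is_valid_lon(12.5);   % not a multiple of 5
bad_range    = is_valid_lon(185);    % outside -180..180
```

Because the check uses element-wise operators, it should also accept a whole vector of longitudes and return a logical mask.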
SO on 5 Feb 2015
There is a lot of unwanted data.
I want only this kind of data:
154 154 155 155 155 156 156 156 156 155 155 155 154 154 153 153
152 151 150 149 148 147 146 145 145 144 143 142 141 140 139 139
138 138 137 137 137 137 136 136 137 137 137 137 137 138 138 139
.
.
  • Creating a <73x70x11> (lon,lat,ut) array is a possible solution. But I have to ignore the unwanted data and reorganize the structure of the data (e.g. one row for all the data at the same latitude instead of 5 rows). Is there a faster way than fgetl to extract the wanted data and reorganize it?
Thanks for your advice!
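One way to get a single row per latitude, sketched here on the 85.0 block from the sample in the question: sscanf with '%f' treats line breaks as ordinary whitespace, so the five text rows come back as one 73-element vector. The variable names are illustrative only.

```matlab
% The five text rows that follow the 85.0 LAT/LON1/LON2/DLON/H line.
rows = { ...
    '160 161 162 163 164 164 165 165 165 164 164 163 163 162 161 159', ...
    '158 157 155 153 151 149 147 145 143 141 139 138 136 134 133 132', ...
    '131 130 130 129 129 129 130 130 131 131 132 133 134 135 136 136', ...
    '137 138 139 139 140 140 141 142 142 143 144 145 146 146 148 149', ...
    '150 151 153 154 155 157 158 159 160'};
block = sprintf('%s\n', rows{:});     % rebuild the multi-line text

% Line breaks count as whitespace, so this returns one 73x1 vector.
tec = sscanf(block, '%f');

% Longitude -175 is the second grid point (-180 is the first).
value = tec(2);
```

The same call works on a block cut out of a string that holds the whole file, so no line-by-line reading is needed for this step.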


Accepted Answer

per isakson on 5 Feb 2015
Edited: per isakson on 5 Feb 2015
Now I'm done:
  • less than a tenth of a second to read and parse the sample file (with the file in the system cache)
  • less than a tenth of a millisecond to retrieve one value
  • the array ION is half a MB. Make ION uint8 to save memory - if needed.
  • 62196 values retrieved from the sample file.
You add tests and comments!
>> tic,ION = cssm();toc
Elapsed time is 0.074765 seconds.
>> sum(not(isnan(ION(:))))
ans =
62196
>> whos ION
Name Size Bytes Class Attributes
ION 73x71x12 497568 double
>> ION(lon2ix(0),lat2ix(85),ut2ix(20))
ans =
164
>> tic,ION(lon2ix(0),lat2ix(85),ut2ix(20));toc
Elapsed time is 0.000067 seconds.
compared to
>> tic, [gim_tec] = sample_search_function( 20, 85, 0 ), toc
gim_tec =
164
Elapsed time is 0.265756 seconds.
where
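function ION = cssm()
    % Read the whole file once and parse every TEC map into ION(lon,lat,ut).
    str = fileread( 'c:\m\cssm\CODG1520.txt' );
    % One cell per map: the text between the START and END markers.
    ca1 = regexp( str, '(?<=START OF TEC MAP).+?(?=END OF TEC MAP)', 'match' );
    % 73 longitudes x 71 latitudes x 12 epochs, the size reported by whos above.
    ION = nan( 73, 71, 12 );
    lat2ix = @(lat) round((lat+87.5)/2.5)+1;
    lon2ix = @(lon) round((lon+180)/5.0)+1; %#ok<NASGU>
    ut2ix = @(ut) round(ut/2)+1;
    for jj = 1 : length( ca1 )
        buf = regexp( ca1{jj}, '\n', 'split', 'once' );   % drop the rest of the START line
        buf = regexp( buf{2} , '\n', 'split', 'once' );   % buf{1} is now the EPOCH line
        ut = textscan( buf{1}, '%*f%*f%*f%f%*[^\n]' );    % fourth field is the hour (UT)
        ut = ut{1};
        ca2 = regexp( buf{2}, 'LAT/LON1/LON2/DLON/H', 'split' );
        pos = ca2{1};                                     % header fields of the first latitude band
        for kk = 2 : length( ca2 )
            lat = textscan( pos,'%f%*[^\n]' );            % first field is the latitude
            lat = lat{1};
            % The last 60 characters of each piece belong to the following header line.
            num = sscanf( ca2{kk}(1:end-60), '%f' );
            pos = strtrim( ca2{kk}(end-60+1:end) );
            ION(:,lat2ix(lat),ut2ix(ut)) = num;
        end
    end
end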
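Since the index helpers are defined inside cssm, they have to be defined again at the command line before they can be used as in the session above. The sketch below does that and, as one possible way to handle many queries at once, converts whole vectors of (lon, lat, ut) to linear indices with sub2ind. The array here is a stand-in filled with its own linear indices, not real TEC data, and the query values are made up for illustration.

```matlab
% Same index helpers as inside cssm, redefined in the workspace.
lat2ix = @(lat) round((lat + 87.5)/2.5) + 1;
lon2ix = @(lon) round((lon + 180)/5.0) + 1;
ut2ix  = @(ut)  round(ut/2) + 1;

% Stand-in for the parsed array (NOT real data): each element holds its
% own linear index so the lookup is easy to check.
ION = reshape(1:73*71*12, 73, 71, 12);

% Made-up query vectors; in practice these would hold all the requests.
lons = [-180; 0; 180];
lats = [-87.5; 85; 87.5];
uts  = [0; 20; 22];

% One vectorized lookup instead of a loop of function calls.
idx  = sub2ind(size(ION), lon2ix(lons), lat2ix(lats), ut2ix(uts));
vals = ION(idx);
```

With the real array returned by cssm in place of the stand-in, vals would hold one TEC value per query row.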
  9 Comments
Dogan Deniz Karadeniz on 21 Jun 2019
@per isakson: Is it possible to make your cssm function read many files instead of only the given example (CODG1520)?
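One possible approach, assuming cssm is changed to take the file name as an input argument instead of the hard-coded path: list the files with dir and parse them in a loop. In this sketch parse_file is a hypothetical stand-in for such a modified cssm, and the two tiny files written to a temporary folder exist only so the loop has something to run on.

```matlab
% Hypothetical stand-in for a cssm that accepts a file name, e.g.
% function ION = cssm(filename). Here it only returns the character count.
parse_file = @(filename) numel(fileread(filename));

% Create two small placeholder files (illustration only, not IONEX data).
folder = tempname();
mkdir(folder);
names = {'day1.txt', 'day2.txt'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fprintf(fid, 'START OF TEC MAP\n');
    fclose(fid);
end

% List every matching file and parse each one into its own cell.
listing = dir(fullfile(folder, '*.txt'));
maps = cell(numel(listing), 1);
for k = 1:numel(listing)
    maps{k} = parse_file(fullfile(folder, listing(k).name));
end

% Remove the placeholder files again.
delete(fullfile(folder, '*.txt'));
rmdir(folder);
```

Keeping one cell per file leaves each day's array separate; if all files share the same grid, the arrays could instead be concatenated along the time dimension.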
