Read fixed file format with multiple new line characters?

Hello All,
I am trying to read in a file type supplied by a data vendor (IHS) that contains basic header data and oil and gas production data. It is a fixed file format in which each new line starts with a marker that indicates what type of data it is. A small version of this file is copied and pasted at the bottom.
Every well's data will start with "START_US_PROD" followed by an entity number.
++ at the start of a line tells us the entity number
+A tells us the state, unique well number, etc
+AT, +AR, +A# all give different properties of the well
+B, +C give information about who drilled and on what lease
+D! gives location information
+F tells us annual production
+G tells us monthly production
Every well will end with "END_US_PROD" followed by the entity number
I would like to read this information into an array or series of arrays, but I can't figure out where to start at all with this type of file.
IHS Inc. US PRODUCTION DATA 298 1.1 FIXED 2014/09/13 160
START_US_PROD 142020003009
++ 142020003009 Enerdeq
+A TX42037726255KARNES 006385 OI602EDRD 220
+AT02 HERNANDEZ ANDRES4 42255
+AR003009 015874L0066502
+A# EDWARDS
+B KOTARA REGINA ET AL CHEVRON U S A INCORPORATED
+C PANNA MARIA 196204197009 EDWARDS
+D 42255006650000 MULTI A0VI
+D! 28.98917 -97.92547SN
+F 1962 581 0 0
+G 19620131 0 0 0 0 0
+G 19620228 0 0 0 0 0
+G 19620331 0 0 0 0 0
+G 19620430 1279 0 0 1 0
+G 19620531 176 0 0 1 0
+G 19620630 678 0 0 1 0
+G 19620731 855 0 0 1 0
+G 19620831 908 0 0 1 0
+G 19620930 688 0 0 1 0
+G 19621031 630 0 0 1 0
+G 19621130 712 0 0 1 0
+G 19621231 701 0 0 1 0
+F 1963 7208 0 0
+G 19630131 700 0 0 1 0
+G 19630228 709 0 0 1 0
+G 19630331 746 0 0 1 0
+G 19630430 668 0 0 1 0
+G 19630531 638 0 0 1 0
+G 19630630 615 0 0 1 0
+G 19630731 641 0 0 1 0
+G 19630831 636 0 0 1 0
+G 19630930 610 0 0 1 0
+G 19631031 1425 0 0 1 0
+G 19631130 1374 0 0 1 0
+G 19631231 762 0 0 1 0
+L 20140131 ORUN MOBPL
END_US_PROD 142020003009
START_US_PROD 142020003277
++ 142020003277 Enerdeq
+A TX42038652255KARNES 240292 OI602EDRD 220
+AT02 S532 GARY ISAAC 124 42255
+AR003277 015874L0013302
+A# EDWARDS
+B MIKA MARY BLACKBRUSH OIL & GAS LLC
+C PERSON 196203200612 EDWARDS
+D 42255001330000 1 D0VI
+D! 29.01843 -97.85792SN 10875
+E 001 10875 34.0 11117 U 19700101
+E 002 10875 33.0 11117 U 19700201
+E 003 10875 28.0 11117 U 19700301
+E 004 10875 29.0 10655 U 19700601
+E 005 10875 26.0 10655 U 19700901
+E 006 10875 24.0 10655 U 19701001
+E 007 10875 20.0 14400 U 19701201
+E 008 10875 20.0 14400 U 19710101
+E 009 10875 24.0 12792 U 19710401
+E 010 10875 50.0 13380 U 19710601
+E 011 10875 33.0 13380 U 19711101
+E 012 10875 35.0 11886 U 19711201
+E 013 10875 35.0 11886 U 19720101
+E 014 10875 40.0 9850 U 19720301
+E 015 10875 34.0 11588 U 19720601
+E 016 10875 25.0 11588 U 19720701
+E 017 10875 72.0 20847 U 19720801
+E 018 10875 53.0 16075 U 19721201
+E 019 10875 53.0 16075 U 19730101
+E 020 10875 39.0 20000 U 19730601
+E 021 10875 37.0 26071 U 19730901
+E 022 10875 31.0 20161 U 19731201
+E 023 10875 31.0 20161 U 19740101
+E 024 10875 33.0 17030 U 19740301
+E 025 10875 32.0 17000 U 19740501
+E 026 10875 32.0 30 48.4 17000 U 19750101
+E 027 10875 31.0 30 49.2 17000 U 19750601
+E 028 10875 35.0 66 65.3 7142 F 19760426
+E 029 10875 55.2 45 44.9 2264 F 19770301
+E 030 10875 55.2 45 44.9 F 19770601
+E 031 10875 30.0 45 60.0 2233 F 19771001
+E 032 10875 28.8 60 67.6 2191 F 19780201
+E 033 10875 30.6 60 66.2 2156 F 19780427
+E 034 10875 29.7 28 48.5 1448 P 19790426
+E 035 10875 U 19800601
+E 036 10875 39.0 1 P 19800801
+E 037 10875 13.0 1 P 19810206
+E 038 10875 38.0 26 P 19811215
+E 039 10875 17.5 40 69.6 2285 P 19820206
+E 040 10875 24.6 61 71.3 3049 P 19830326
+E 041 10875 19.6 54 73.4 1990 P 19840309
+E 042 10875 54.0 74 57.8 1981 P 19850327
+E 043 10875 24.0 64 72.7 1833 P 19860211
+E 044 10875 12.0 65 84.4 1909 P 19870207
+E 045 10875 18.8 17 36 65.7 904 U 19910313
+E 046 10875 15.8 68 49 75.6 4303 P 19920323
+E 047 10875 15.0 61 80.3 3733 P 19930330
+E 048 10875 13.0 19 59.4 1615 P 19940327
+E 049 10875 15.0 70 10 4666 G 20050320
+D 42255001330000 MULTI A0VI
+D! 29.01843 -97.85792SN
+F 1962 0 0 0
+G 19620131 0 0 0 0 0
+G 19620228 0 0 0 0 0
+G 19620331 1147 0 0 1 0
+G 19620430 2273 0 0 1 0
+G 19620531 2464 0 0 1 0
+G 19620630 2200 0 0 1 0
+G 19620731 2178 0 0 1 0
+G 19620831 2280 0 0 1 0
+G 19620930 2278 0 0 1 0
+G 19621031 2285 0 0 1 0
+G 19621130 2275 0 0 1 0
+G 19621231 2271 0 0 1 0
+F 1963 21651 0 0
+G 19630131 2305 0 0 1 0
+G 19630228 2313 0 0 1 0
+G 19630331 2472 0 0 1 0
+G 19630430 2393 0 0 1 0
+G 19630531 2473 0 0 1 0
+G 19630630 2401 0 0 1 0
+G 19630731 2510 0 0 1 0
+G 19630831 2521 0 0 1 0
+G 19630930 2391 0 0 1 0
+G 19631031 2477 0 0 1 0
+G 19631130 2349 0 0 1 0
+G 19631231 2484 0 0 1 0
+L 20100131 CGRUN REGEF
+L 20100131 ORUN SHLTR
+L 20110131 CGRUN REGEF
+L 20110131 ORUN SHLTR
+L 20120131 CGRUN REGEF
+L 20120131 ORUN SHLTR
+L 20130131 CGRUN REGEF
+L 20130131 ORUN SHLTR
+L 20130831 ORUN UNKWN 37
+L 20140131 CGRUN REGEF
+L 20140131 ORUN SHLTR
END_US_PROD 142020003277

Answers (1)

per isakson on 14 Sep 2014
Edited: per isakson on 14 Sep 2014
MATLAB is good neither at reading fixed-format files nor at reading files with multiple blocks of data. It requires a little "program" to read your file.
Proposal:
  • Attach a sample file to the question. Copy and paste is error-prone.
  • Specify (and attach) a template for the result in the form of a struct array, e.g. UsOil, or the data structure you prefer.
UsOil.entity_number
UsOil.unique_well_number
UsOil.state
...
UsOil.annual_production <kx4 double>
UsOil.monthly_production <mx5 double>
...
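A minimal sketch of such a template, using the field names listed above. The value types (text for the identifiers, empty numeric matrices for the production data) and the number of wells are assumptions for illustration, not something the vendor format dictates.

```matlab
% Hypothetical template for one well; field names follow the list above.
template = struct( ...
    'entity_number',      '', ...           % kept as text (12 digits)
    'unique_well_number', '', ...
    'state',              '', ...
    'annual_production',  zeros(0, 4), ...  % one row per "+F" line
    'monthly_production', zeros(0, 5));     % one row per "+G" line

% Preallocate a struct array for n wells by replicating the template.
n = 2;                                      % assumed number of wells
UsOil = repmat(template, n, 1);
```

Preallocating this way means every element has the same fields, so an expression such as {UsOil.state} works across the whole array.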
"+E": what's that?
Assuming the entire file fits in memory I would use the approach
  • read the entire file as character, buffer
  • split buffer into blocks (an appropriate job for regexp (/strsplit))
  • loop over all blocks and assign data to the struct, UsOil
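The three steps above could be sketched roughly as follows. This is only an outline under several assumptions: the lines are tokenised on whitespace rather than by fixed column positions (the pasted sample does not preserve the column widths, and lines with blank fields would need real column offsets), the state is taken to be the first two characters of the first field on the "+A" line, and monthly_date is a hypothetical extra field that holds the date column of the "+G" lines. The temporary file only makes the sketch self-contained.

```matlab
% Write a short excerpt of the posted sample to a temporary file so that the
% sketch is self-contained. With real data, pass the vendor file to fileread.
sample = {'START_US_PROD 142020003009', '++ 142020003009 Enerdeq', ...
    '+A TX42037726255KARNES 006385 OI602EDRD 220', '+F 1962 581 0 0', ...
    '+G 19620430 1279 0 0 1 0', '+G 19620531 176 0 0 1 0', ...
    'END_US_PROD 142020003009', 'START_US_PROD 142020003277', ...
    '++ 142020003277 Enerdeq', ...
    '+A TX42038652255KARNES 240292 OI602EDRD 220', '+F 1962 0 0 0', ...
    '+G 19620331 1147 0 0 1 0', 'END_US_PROD 142020003277'};
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% 1. Read the entire file as one character vector.
buffer = fileread(fname);
delete(fname);

% 2. Split the buffer into one block per well, from START_US_PROD through
%    the end of the next END_US_PROD line.
blocks = regexp(buffer, '(?s)START_US_PROD.*?END_US_PROD[^\r\n]*', 'match');

% 3. Loop over the blocks and assign data to the struct array.
template = struct('entity_number', '', 'state', '', ...
    'annual_production', zeros(0, 4), ...
    'monthly_date', zeros(0, 1), ...        % hypothetical extra field
    'monthly_production', zeros(0, 5));
UsOil = repmat(template, numel(blocks), 1);
for k = 1:numel(blocks)
    rows = regexp(blocks{k}, '\r?\n', 'split');

    % Entity number: second token of the START_US_PROD line.
    tok = strsplit(strtrim(rows{1}));
    UsOil(k).entity_number = tok{2};

    % State: assumed to be the first two characters of the first field
    % on the "+A " line.
    rowA = rows(strncmp(rows, '+A ', 3));
    tokA = strsplit(strtrim(rowA{1}));
    UsOil(k).state = tokA{2}(1:2);

    % Annual production: four numbers per "+F" line, the year first.
    rowsF = rows(strncmp(rows, '+F ', 3));
    f = zeros(numel(rowsF), 4);
    for i = 1:numel(rowsF)
        f(i, :) = sscanf(rowsF{i}(4:end), '%f', [1 4]);
    end
    UsOil(k).annual_production = f;

    % Monthly production: date (yyyymmdd) plus five values per "+G" line.
    rowsG = rows(strncmp(rows, '+G ', 3));
    g = zeros(numel(rowsG), 6);
    for i = 1:numel(rowsG)
        g(i, :) = sscanf(rowsG{i}(4:end), '%f', [1 6]);
    end
    UsOil(k).monthly_date = g(:, 1);
    UsOil(k).monthly_production = g(:, 2:6);
end
```

Matching on the three-character prefix (for example '+A ' with the trailing space) keeps "+A" apart from "+AT", "+AR" and "+A#". The other record types would be handled the same way, one branch per prefix.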
Once the data is in such a struct array, strncmp({UsOil.state}, 'TX', 2 ) will spot all wells in Texas.
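As a small illustration of that query, here is a sketch with a made-up three-element struct array; the state values are invented for the example and do not come from the sample file.

```matlab
% Made-up struct array with three wells, for illustration only.
UsOil = struct('state', {'TX', 'OK', 'TX'});

% {UsOil.state} collects the state of every well into a cell array;
% strncmp then returns a logical mask with one entry per well.
isTexas = strncmp({UsOil.state}, 'TX', 2);

% Logical indexing keeps only the Texas wells.
texasWells = UsOil(isTexas);
```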
  1 Comment
Cameron Snow on 14 Sep 2014
I have updated the original post with an attachment. I will try using the buffer and split method suggested.
