MATLAB Answers

Parsing of financial data file that is very irregular in format

Mark Smith on 1 Aug 2020 at 19:42
Commented: dpb on 2 Aug 2020 at 21:01
Thanks in advance for any responses. I am a retired engineer with some programming experience and am working to use MATLAB for portfolio analysis. I am currently at the point of reading data from my investment company's data files. I can obtain data on my current holdings as four CSV files that have similar but not identical formats. I think they are intended to be printed as PDFs.
Numeric values sometimes have $ symbols, sometimes have commas, and sometimes have other formats. I have written some code, based on counting commas and performing IF tests, that successfully extracts Date, Symbol, Description, Quantity and Price from each of the files. There is more information in which I may be interested but for which I have not yet written code. I have attached the code file, which works for the full files, and samples of two files that differ in format, with personal data and most of the other data removed. I have not converted the attached DOCX files back to CSV to try my code on them.
I feel that what is shown in "importMLdata.m" is a brute-force approach, and I am looking for more elegant methods for attacking these files.
What I would like in responses is direction to advanced file and string parsing methods. If nested IF statements, as I have used them, are the best approach, I would appreciate affirmation of that too, so that I do not think MATLAB is hiding some strength from me in its documentation.
I appreciate your indulgence in evaluating this general question. The attached MATLAB code is a work in progress, as shown by the many commented-out lines, but it currently does work for multiple files in a directory. I would also appreciate your comments on programming style and on any other issues you see in the code.
Thank you
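As a minimal sketch of the numeric clean-up the question describes, the snippet below strips dollar signs and thousands separators with regexprep and converts the result with str2double. The sample values are hypothetical, since the post does not show the actual file contents, and other formats the files may contain are not covered.

```matlab
% Hypothetical sample values; the real files' formats are not shown in the post.
raw = ["$1,234.56", "1,000", "12.5", "$0.75"];

% Remove dollar signs and thousands separators in one pass.
% Inside a character class, $ is treated as a literal character.
cleaned = regexprep(raw, '[$,]', '');

% Convert to double; any text that is not a valid number becomes NaN.
values = str2double(cleaned);
```

Because str2double returns NaN instead of raising an error, unexpected formats can be found afterwards with isnan(values) rather than with a separate IF test per case.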

  6 Comments

Mark Smith on 2 Aug 2020 at 17:06
Thank you DB and CL for your insights. I will try the suggested approach and learn some more MATLAB. As for the garbage in the function signature, such as the input1 and input2 variables, I simply had not deleted the unnecessary parts since the code was still being developed. If I find your suggested method better, I will mark the question answered.
I really appreciate your help
dpb on 2 Aug 2020 at 18:54
Undoubtedly it will take some effort to implement the idea, particularly if (like me) you're not all that adept with regular expressions, but I have no doubt it will be much better and more easily maintained in the end to go that route.
I believe that route will lead to far more generic code, and once it's working it can serve as the basis for expanding to other brokerages much more easily than starting each one from scratch.
Factoring the code into smaller functions will be your friend here as well, as mentioned above...
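A rough sketch of what the regular-expression route might look like is below; it is not the code from the thread. The sample line, the field layout, and the names rowText and toNumber are all assumptions made for illustration, since the actual brokerage format is not shown here.

```matlab
% Hypothetical holdings line; the actual brokerage layout is an assumption.
rowText = 'ABC,"Example Corp Common","1,250","$12.34"';

% One named token per field. Quoted fields may contain commas, so each
% quoted field is matched as "everything up to the next double quote".
pattern = ['^(?<symbol>[A-Z]+),' ...
           '"(?<description>[^"]*)",' ...
           '"(?<quantity>[^"]*)",' ...
           '"(?<price>[^"]*)"$'];
rec = regexp(rowText, pattern, 'names', 'once');

if isempty(rec)
    error('Line does not match the expected layout: %s', rowText);
end

% Small helper (hypothetical name) factored out so every numeric field
% is cleaned the same way: drop $ and commas, then convert.
toNumber = @(s) str2double(regexprep(s, '[$,]', ''));

quantity = toNumber(rec.quantity);
price = toNumber(rec.price);
```

Keeping one pattern per line layout, plus a shared helper for numeric fields, means that supporting another file format would mostly involve adding a pattern rather than another branch of nested IF tests.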
dpb on 2 Aug 2020 at 21:01
BTW.
function [output1,output2] = importMLdata %(input1,input2,input3)
takes care of the spurious inputs...


Answers (0)


Release

R2020a