create csv files by extracting data from a starting csv file
You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Show older comments
0 votes
Hi everyone, I am writing to you as not being familiar with matlab I do not know the commands and I would need your help to create a script.
I have a large csv file that cannot be opened with excel, within this file there are many lines that are composed of both numbers and letters, within these lines there are codes formed by letters and numbers (eg UPRE 14234) that identify the element that interests me and to which they belong, in total there are 21 different UPRE codes.
What I would like is to divide the initial csv file into 21 csv files each composed with the lines concerning only one of those UPRE codes.
How can I do? Can anyone tell me what functions to use and how to write the script?
Accepted Answer
Walter Roberson
on 30 Oct 2020
0 votes
I suggest you work with lower level utilities:
- grep or egrep if you are using Mac or Linux
- findstr or select-string if you are using MS Windows
https://stackoverflow.com/questions/1416241/is-there-a-pattern-matching-utility-like-grep-in-windows
These will be much more efficient than the equivalent MATLAB code.
Another option is to make use of the fact that Perl is shipped with MATLAB.
18 Comments
Walter Roberson
on 30 Oct 2020
Example:
infile = 'exp_misure_prelievo.txt';
outdir = 'UPRE_files';
UPREs = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};
if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : length(UPREs)
this_UPRE = UPREs{K};
outfile = fullfile(outdir, [this_UPRE '.txt']);
cmd = sprintf('FINDSTR "|\"%s\"|" < "%s" > "%s"', this_UPRE, infile, outfile);
[status, result] = system(cmd);
if status ~= 0
fprintf('Problem processing %s, output was:\n', this_UPRE);
fprintf('%s\n', result);
end
end
fprintf('Done\n');
Giuseppe D'Amico
on 30 Oct 2020
it seems not to work, this is what it appears to me
Walter Roberson
on 30 Oct 2020
I was reading off the file name exp_misure_prelievo from the Excel image you posted, but I did not know the file extension.
When I look at the list of files in your current folder, I see the MATLAB source code file Untitled.m and I see the new directory just created, but the only other thing I see is Name which I cannot tell anything about.
You should change the line
infile = 'exp_misure_prelievo.txt';
to reflect your actual input file name -- fully qualified if it is not in the current directory.
For example if that Name file is your input data, then
infile = 'Name';
Giuseppe D'Amico
on 30 Oct 2020
ok now it created me two files but they are empty
Walter Roberson
on 30 Oct 2020
We will need an actual file to test with.
Note: I have not booted into Windows for a few months, so I am not sure when I would be able to test findstr
Giuseppe D'Amico
on 30 Oct 2020
what would you need?
Walter Roberson
on 30 Oct 2020
The data you posted an image of should be enough for testing.
Giuseppe D'Amico
on 31 Oct 2020
I don't know why the files it saves me come out empty
Giuseppe D'Amico
on 1 Nov 2020
sorry I saw that you had defined the input file as a vector, instead of defining it so I imported the data on matlab, now having the data on matlab how can I extract them and create the csv files?
Walter Roberson
on 1 Nov 2020
infile needs to hold the name of the file to be processed. Do not use importdata for it unless the file you are importing contains the name of the input file (and you are prepared for the fact that importdata will return a struct or table)
I am still waiting for the file of sample data.
The code I posted creates the csv files.
Giuseppe D'Amico
on 1 Nov 2020
I don't understand which file you want, the starting file?
Walter Roberson
on 1 Nov 2020
in https://www.mathworks.com/matlabcentral/answers/631569-create-csv-files-by-extracting-data-from-a-starting-csv-file#comment_1098684
you posted an image of the first 21 lines of your data file. Please extract those lines and attach them as a file. 21 lines should be enough to create a proof of concept.
Giuseppe D'Amico
on 1 Nov 2020
here's to you
Walter Roberson
on 1 Nov 2020
That looks like it will be fine.
I have been trying to fall asleep for the last 7 hours so it will be a few hours before I am at my desk to test with (and more hours to get Windows booted... I haven't booted it for a month or more)
Giuseppe D'Amico
on 1 Nov 2020
I tried it as is but the files it generates are empty.
Giuseppe D'Amico
on 2 Nov 2020
hello, sorry for the trouble, did you by chance have the opportunity to try to create the csv files?
Walter Roberson
on 2 Nov 2020
I managed to upgrade my Windows to the latest release and all the other update housekeeping tasks (except I didn't finish defragmenting the drive) . I managed to install Mac OS Catalina and use that to download all of the Mac and Windows install files and install R2020b on Catalina, which gives me the files I need to flip over to Windows to install R2020b. So I have everything ready.
... but by then it was close to 4am so I started playing a game that I had been waiting on for the last year, as it needed Catalina and I had always been too busy answering questions to install Catalina before.
Walter Roberson
on 3 Nov 2020
The below is tested.
Please note that you will need to add all strings that you want searched for into the UPREs cell array, and you must use exact matches.
The command that is created is quite sensitive to which quotes are used and how many are used. The rules for quoting strings in MS Windows are not well documented, and are quite different from Unix. The rules for handling double-quotes are particularly strange.
This code expects a text file, not a .xlsx file.
infile = 'exp_misure_prelievo.txt';
outdir = 'UPRE_files';
UPREs = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};
if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : length(UPREs)
this_UPRE = UPREs{K};
outfile = fullfile(outdir, [this_UPRE '.txt']);
cmd = sprintf('FINDSTR "|"""%s"""|" "%s" > "%s"', this_UPRE, infile, outfile);
[status, result] = system(cmd);
if status ~= 0
fprintf('Problem processing %s, output was:\n', this_UPRE);
fprintf('%s\n', result);
end
end
fprintf('Done\n');
I had to do a lot of work to get to the point of being able to debug this problem. I only boot Windows every few months, and it turned out to be a truly remarkable amount of work to get my Mac to share some files with Windows.
More Answers (2)
Jon
on 30 Oct 2020
0 votes
Without knowing all of the details of what is in your files it is hard to give a specific answer. However for general advice I would suggest using the matlab function readcell. This will read the entire file into a cell array. You can then work with the data in that cell array relatively easily indexing by rows and columns. Maybe you can do the processing you want on this overall cell array and it isn't even necessary to create many smaller csv files unless you need them for some other task.
1 Comment
Giuseppe D'Amico
on 30 Oct 2020
the first line looks like this: ANNO|"MESE"|"GIORNO_H"|"ID_ELEMENTO"|"TIPO_ELEMENTO"|"VERSIONE"|"DATA_VAL_SAPR"|"COD_IMPIANTO"|"CODICE_UP"|"EEA"|"EUA"|"EEI"|"EUI"|"EEC"|"EUC"
from the second on they are like this:
2017|1|29-GEN-17 19:45:00|"PVP_S14SPLO_901"|"PVP"|1|19-MAG-17 10:59:40|"S14SPLO"|"UPRE_S14SPLO_901"|0|1476|0|60|0|12
UPRE_S14SPLO_901 this is the code that interests me.
I would need the 21 files to be able to rework them later.
I am attaching an image of the starting csv file.
Giuseppe D'Amico
on 2 Nov 2020
0 votes
ok, I'll wait
Categories
Find more on Spreadsheets in Help Center and File Exchange
Products
Tags
See Also
on 30 Oct 2020
on 3 Nov 2020
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)