Loop through CSV (or XLS), compare values in rows and create new Cell Array based on results

7 views (last 30 days)
Attached is a sample xls and csv of the data I'm looking at. There are 5 columns: Vendor, Sku, Month, Day, Quantity. I want to loop through the imported data and create new cell arrays based on the 3 categorical variables: Vendor, Sku, Month and then store Day and Quantity in the appropriate place. So the desired output is multiple cell arrays which are broken down by Vendor, then Sku, and Month. (Ex.: One cell array for Vendor A, Sku B, and Month C. Then another for Vendor A, Sku B, and Month D.) This would continue as it iterates through all the rows which have the same Vendor, Sku, and Month until Month changes. It then creates a new cell array which would have the same Vendor and Sku as the last, but for a new month. For each row, it'll have to check and compare Vendor, then Sku and finally Month.
I'm fairly new working with this sort of data organization in Matlab and this is the way a senior dev recommended to do it. However, I appreciate any help with this problem as well as any addition recommendations such as the best format or data structure to use.
Thank you ahead of time for any and all help.
I am using Matlab 2018b and have the Deep Learning, Parallel Computing, Statistics and Machine Learning Toolbox.
  2 Comments
Guillaume
Guillaume on 10 May 2019
I'm not particularly clear on the desired output. Can you give an example using valid matlab syntax?
However it sounds like you want a cell array of cell arrays of cell arrays. This sounds like a nightmare. In old versions of matlab, that may have been a good way of storing your information but nowadays, that would be a big waste of time. What is the data going to be used for? It is more likely that leaving the data as it is already may be the most efficient for future processing.
In any case, it is unlikely that a loop is needed to produce whatever it is you want.
By the way, what's the point of the ; following the numeric data? All it does it prevent matlab recognising they're numbers. It can be worked around at the cost of a few extra lines of code, but it'd be better if it wasn't there in the first place.
Zac Lingen
Zac Lingen on 15 May 2019
This data is going to be used for a NN forecasting model. I want the data to be divided into it's own cell array based on Vender, Sku, and Month. So for one vendor and one sku, I would have 12 different cell arrays that are organized by month. For instance, the cell array for January would have all Quantities for every day in January from 2013 to 2019. This would continue for every month for this specific sku. Then, it would continue and create another Sku at the same retailer.
At the moment, my DB guy is getting me data breakdowns by retailer so all I would need is a breakdown via Sku and Month in the fashion I described above.
Regarding the semicolon, it was how I was originally getting my data. However, I am now getting data without the semicolon.
I appreciate the help and sorry for not getting back to you sooner.

Sign in to comment.

Answers (2)

Guillaume
Guillaume on 16 May 2019
As I commented, in my opinion what you're asking is not a good idea. Using cell arrays of cell arrays just make it more difficult to access the data. Instead, I would just keep the data as one big flat table and use the group processing functions of matlab to apply whatever processing you're going to do to all the groups at once. These group processing functions, principally rowfun and splitapply require a flat table.
But anyway, you'd use the same function to generate your cell array so I'll show you how to do it. Here, the function that is applied is make a cell array of this group (actually here, I'm making tables) but you can do the same with do NN on this group.
Using findgroup and splitapply:
%readtable works exactly the same with csv or excel files. So you can replace the filename by 'Sample.xls' and it will work just as well
%With ; after the number, readtable will read these entries as text. Without the ; it will correctly interpret them as numbers
sales = readtable('Sample.csv');
[groupid, vendor, sku, month] = findgroups(sales.VENDOR, sales.SKU, sales.MONTH); %indentify unique combinations of vendor, sku, month
groupedsales = splitapply(@(DAY, QUANTITY) {table(DAY, QUANTITY)}, sales(:, {'DAY', 'QUANTITY'}), groupid);
Using rowfun instead:
sales = readtable('Sample.csv');
groupedsales = rowfun(@(DAY, QUANTITY) table(DAY, QUANTITY), sales, 'GroupingVariables', {'VENDOR', 'SKU', 'MONTH'}, 'OutputFormat', 'cell'); %you may want a table output though

Sulaymon Eshkabilov
Sulaymon Eshkabilov on 10 May 2019
Here is the MATLAB generated function file attached (IM_data.m) that creates categorical cell array data. You can generate this type of MATLAB auto function by using [Import Data] from the menu window. Or edit this function w.r.t your needs easily.
Or create cell arrays with: >> importdata('Sample.csv');
Or create structure array variable with: >> uiimport('Sample.csv');

Categories

Find more on Data Type Identification in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!