Info

This question is closed. Reopen it to edit or answer.

How do I improve loop performance when going through a large dataset? Or use another method?

1 view (last 30 days)
I have a matrix with about a million rows, and I need to loop through it to find specific fields that match and create a newly organized table. I need to find unique combinations of the store (column 1) and the product (column 2) in inventory.textdata, and then pull the day (column 3) and the corresponding inventory in the inventory.data . Basically I want a list of inventory by day for each product for each store. I then need to analyze the change in inventory over time.
One complicating factor is there are missing days for some product location combinations, and I need to fill these in with zeros.
Here's a piece of the data in inventory.textdata:
'S0211' '000410013701328' '23-MAY-12'
'S0211' '000410013701212' '23-MAY-12'
'S0211' '000410013701250' '23-MAY-12'
'S0211' '000410013701168' '23-MAY-12'
'S1150' '000410013701304' '23-MAY-12'
'S1150' '000410013701236' '23-MAY-12'
'S0211' '000410013701311' '23-MAY-12'
'S0211' '000410013701199' '23-MAY-12'
'S0211' '000410013701120' '23-MAY-12'
'S0211' '000410013701175' '23-MAY-12'
'S1150' '000410013701243' '23-MAY-12'
And here's a piece of the data in inventory.data:
2
3
1
1
1
2
2
1
1
And I need to come up with something like:
'23-MAY-12' 1 1 1 1 2 4
'24-MAY-12' 1 1 1 1 2 4
'25-MAY-12' 1 1 1 1 2 4
'26-MAY-12' 1 1 1 1 2 4
'27-MAY-12' 1 1 1 1 2 4
'28-MAY-12' 1 1 0 1 2 4
Where the columns refer to different product inventory levels.
You can see I have yet to integrate different stores, I'm trying to get through a one store solution first.
Basically I'm creating an excel pivot table, and filling in some missing values. These takes a few seconds to breakout in excel, but although the below loop works, it would take hours. Right now I'm just looking at a specific store (the 6:6 part of the loop)
function [ sizetable] = SizeBreakdown(inventory, stores, product_id, sizetable)
for i = 6:6
for j = 1:numel(product_id)
for k = 1:numel(inventory.data)
if and(strcmp(inventory.textdata{k,1},stores{i}), strcmp(inventory.textdata{k,2},product_id{j}))
sizetable(find(sizetable == datenum(inventory.textdata(k,3)) ),j+1) = inventory.data(k);
%sizetable(k,:) = [datenum(inventory.textdata(k,3)) inventory.data(k)];
end
end
end
end
end
I've tried to be clear as possible since this is a bit confusing, but is there a better matrix function to accomplish this? Or am I stuck with a loop, and is there a faster way to loop? I tried parfor but ran into a number of issues.

Answers (1)

Babak
Babak on 27 Nov 2012
My understanding of your question is that, you need to find specific data inside a huge structure. Your structure called inventory is something that has been developed in time and stuff will be probably added to it (or removed) but in time there won't be a huge change on it. ( you won't just suddenly everyday move to another similar structure which is totally different)
So, to deal with this kind of huge structure, which has different columns and rows, I would have the whole structure be sorted based on different orders and save it in those sorted formats. you can sort your structure, based on "dates" and save it as inventory_dates, or sort it based on stores and save it as inventory_stores or products and save it as inventory_products. Then when you are looking for a specific store name you could just simply look at inventory_stores. You can write much faster algorithms that search for stores in inventory_stores instead of inventory. The only extra step you need to take, is to regularly (like everyday or week) resort the inventory structure and resave inventory_stores, inventory_products, inventory_date.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!