Info

This question is closed. Reopen it to edit or answer.

How do I use indexing to create discrete column vectors for a large data set?

1 view (last 30 days)
I have different parameters for about 600 patients in a single spreadsheet giving me about 400K rows of data with 30 columns. The data is organized such that the patients are stacked on top of each other and each identifier is in the first column. Thus, the first 700ish rows belong to patient 1, the second 600ish rose belong to patient 2 so and so forth. I would like to automate the separation of this data so that I have specific arrays and/or column vectors that belong to only 1 patient.
For example current data:
Patient 1 Time 1 Heart Rate 1
Patient 1 Time 2 Heart Rate 2
Patient 2 Time 1 Heart Rate 1
Patient 2 Time 2 Heart Rate 2
I would like to be able to refer to each variable for each patient in its own column vector. This will allow me to plot the different patient information. For example, variable "T1" returns all the time values for patient 1. Variable "HeartRate1" returns all the heart rate values for patient 1. I would then plot the time and heart rate for patient 1.
Thanks for any help.
Chris

Answers (2)

Walter Roberson
Walter Roberson on 2 Jun 2015
It sounds like perhaps you would like table() for some of what you are doing. Perhaps a vector of tables.
To do the grouping, read the data into a cell array, and then
[uniquepatients, ~, uidx] = unique(FullData(:,1), 'stable');
numpatients = length(uniquepatients);
PerPatientData = cell(numpatients, 1);
PatientIDs = cell(numpatients, 1);
for K = 1 : numpatients
PatientIDs = uniquepatients(K);
PerPatientData(K) = FullData(uidx==K, 2:end); %presuming cell array!
end
  3 Comments
Walter Roberson
Walter Roberson on 2 Jun 2015
Note: I edited your sample data to make the tabs obvious, as the forum runs tab-separated parts together.

Peter Perkins
Peter Perkins on 3 Jun 2015
Christopher, your sample data is different than your original example. I'm gonna go with the first (mostly because I don't really understand the sample data). If you have a spreadsheet file, read it into MATLAB using readtable. Then you could do one of two things.
  • If the vectors of times are the same for all the patients, use unstack to turn the (600-ish*700-ish)-by-3 table into a (700-ish)-by-(600-ish+1) table, with one row for each time, one (common) Time variable, and 600-ish HeartRate variables.
  • If the time vectors are not all the same, you can create one table for each patient using the following:
Some example data:
>> t = table(Patient,Time,HeartRate)
t =
Patient Time HeartRate
_______ ____ _________
111 1.1 0.83083
111 2.1 0.58526
111 3.1 0.54972
222 1.2 0.91719
222 2.2 0.28584
222 3.2 0.7572
333 1.3 0.75373
333 2.3 0.38045
333 3.3 0.56782
Now turn that into three tables, each in its own cell of a cell array (there are several ways to do this, here's one):
>> splitter = @(Time,HeartRate)table(Time,HeartRate);
>> patients = rowfun(splitter, t, ...
'GroupingVariable','Patient', 'OutputFormat','cell')
patients =
[3x2 table]
[3x2 table]
[3x2 table]
And then, for example,
plot(patients{1}.Time,patients{1}.HeartRate)
This assumes you have R2013b or later. Some tweaks you might consider:
  • Convert the patient variable in the table to a categorical vector. Of course, if you're just gonna throw that tall table away, may that's not needed.
  • If you have R2014b or later, use a datetime or duration vector for the Time variable.
Hope this helps.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!