How do I reshape 3 vectors (date, depth, and variables) into a matrix separated by day?

This is my first question, and I’m struggling for an eloquent way to do this. I have a large dataset of vectors separated by parameter, and I would like to reshape into a matrix in order to do 3D Matlab manipulations. For example, I get an error if I try to use pcolor with 3 vectors. I’ve attached a data sample.
I’d like to end up with a matrix where X = Datetime (separated by day), Y = water depth, and Z = variable (such as water temperature, salinity, etc.).
Here is a snapshot of the existing data format, where size of each vector = 7 x 1:
SampleDate = [01/28/2014 12:13:37; 01/28/2014 12:14:58; 01/28/2014 12:15:20; 02/06/2014 11:02:42; 02/06/2014 11:05:59; 02/06/2014 11:06:24; 02/06/2014 11:06:32];
Depth = [1.5; 5; 10; 1; 5; 10; 12];
SampleTemperature = [8.22; 8.29; 8.29; 7.72; 7.88; 7.9; 7.92];
I would like to transform to a matrix, so that I end up with instead a size of 4 x 2. For example,
SampleDate = 01/28/2014 12:13:37 02/06/2014 11:02:42
01/28/2014 12:14:58 02/06/2014 11:05:59
01/28/2014 12:15:20 02/06/2014 11:06:24
NaN 02/06/2014 11:06:32
SampleTemperature = 8.22 7.72
8.29 7.88
8.29 7.9
NaN 7.92
I recognize that I will have to have NaN values if some of the vectors are shorter than others, and the length of the matrix will be equal to the max of the number of samples per day, so I put this in the example.
I will be filtering the data first so that I use only the value of “Down” for the down cast results. Below is the import code that I have currently to load the attached file.
[file, pathname] = uigetfile({'*.*', 'All Files (*.*)'}, 'Load the CTD profile xls file');
fid = fopen(file);
headerline = fgetl(fid);
formatSpec = '%s%s%f%s%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%s%s';
data = textscan(fid,formatSpec,'Delimiter','\t');
fclose(fid);
%Pull out the row with the headers
headers = textscan(headerline,'%s','Delimiter','\t');
%Remove special characters from the header text
headers{1,1} = regexprep(headers{1,1},'[%!/()^, =]','');
%Convert 1st column into the date time format
data{2} = datetime(data{2}, 'Format', 'MM/dd/yyyy HH:mm:ss', 'InputFormat', 'M/d/yyyy h:mm:ss a');
%Create a structure array
datastr = struct();
for i=1:numel(headers{:})
datastr.(headers{1}{i}) = data{i};
end
%Next, index so looking at the down cast only
idxD = ~cellfun('isempty',strfind(datastr.Updown,'Down'));
%Pull out vectors of interest
SampleDate = datastr.Sampledate(idxD);
Depth = datastr.Depth(idxD);
SampleTemperature = datastr.SampleTemperatureFielddegC(idxD);
clear ans fid file formatSpec headerline k n tline i pathname data headers;

2 Comments

I do not understand the question. What is "a matrix separated by day" and a "vectors separated by parameter"?

Sign in to comment.

 Accepted Answer

clc; clear all;
SampleDate = [{'01/28/2014 12:13:37'}; {'01/28/2014 12:14:58'}; {'01/28/2014 12:15:20'}; {'02/06/2014 11:02:42'};
{'02/06/2014 11:05:59'}; {'02/06/2014 11:06:24'}; {'02/06/2014 11:06:32'}];
Depth = [1.5; 5; 10; 1; 5; 10; 12];
SampleTemperature = [8.22; 8.29; 8.29; 7.72; 7.88; 7.9; 7.92];
%%seperate dates by day
data = datevec(SampleDate) ;
days = data(:,2) ;
[c,ia,ib] = unique(days) ;
%%Get indices of repeated values
N = length(c) ;
idx = cell(N,1) ;
for i = 1:N
idx{i} = days==c(i) ;
end
%%seperate the given arrays
SD = cell(N,1) ; % sample date reshaped
D = cell(N,1) ; % sample depth reshaped
ST = cell(N,1) ; % sample temperature reshaped
for i = 1:N
SD{i} = SampleDate(idx{i}) ;
D{i} = Depth(idx{i}) ;
ST{i} = SampleTemperature(idx{i}) ;
end
Your data is reshaped into cells, you can access those using SD{1}, D{2} etc,.

3 Comments

Pardon the delay from the holidays. Thank you very much KSSV, this answers the 1st part of my question and I hadn't used "unique" function yet. The rest of my question is to figure out how to store this indexed data in a matrix (double array) rather than a cell array. So that I can use functions like pcolor. I can't use those on cell arrays with dimensions that don't agree.
So from the example, D would be a 4 x 2 double array, rather than a 2 x 1 cell array (with embedded 3x1 and 4x1 vectors). I'm working on this too but let me know if you get there first. Thanks.
Okay, on further inspection, I found a problem with using "unique" on the actual dataset based on the day. Some samples were collected on the same day num but on a different month. So they aren't separated yet. Attached the actual dataset, and the code below that I'm using to pull out the relevant variables (SampleDate, Depth, and SampleTemperature). How can I index the unique days by both day and month? Thanks.
clear all; close all;
[file, pathname] = uigetfile({'*.*', 'All Files (*.*)'}, 'Load the CTD profile xls file');
fid = fopen(file);
headerline = fgetl(fid);
formatSpec = '%s%s%f%s%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%f%s%s%s';
data = textscan(fid,formatSpec,'Delimiter','\t');
fclose(fid);
%Pull out the row with the headers
headers = textscan(headerline,'%s','Delimiter','\t');
%Remove special characters from the header text
headers{1,1} = regexprep(headers{1,1},'[%!/()^, =]','');
%Convert 1st column into the date time format
data{2} = datetime(data{2}, 'Format', 'MM/dd/yyyy HH:mm:ss', 'InputFormat', 'M/d/yyyy h:mm:ss a');
%Create a structure array
datastr = struct();
for i=1:numel(headers{:})
datastr.(headers{1}{i}) = data{i};
end
%Next, index so looking at the down cast only
idxD = ~cellfun('isempty',strfind(datastr.Updown,'Down'));
%Pull out vectors of interest
SampleDate = datastr.Sampledate(idxD);
Depth = datastr.Depth(idxD);
SampleTemperature = datastr.SampleTemperatureFielddegC(idxD);
clear ans fid file formatSpec headerline k n tline i pathname data headers;
OK, my last comment on this. I think I answered the rest of my own question. I have some erroroneous data to eliminate because of sorting problems, but I'm almost there with below code....
%%Separate dates by unique day
DatesData = datevec(SampleDate) ;
Days = DatesData(:,1:3);
Days = datetime(Days);
[ProfileID,ia,ib] = unique(Days) ;
%%Get indices of repeated values
N = length(ProfileID) ;
idxU = cell(N,1) ;
for i = 1:N
idxU{i} = Days==ProfileID(i) ;
end
%%Create empty matrices and populate with data.
%Use the max cell array size for the # of rows
MaxRows = 425;
T = NaN(MaxRows,N); %SampleTemperature
t = NaN(MaxRows,N); %SampleDate
z = NaN(MaxRows,N); %Depth
for ind = 1:N
%Populate matrices with data
Ti = SampleTemperature(idxU{ind});
T(1:length(Ti), ind) = Ti;
zi = Depth(idxU{ind});
z(1:length(zi), ind) = zi;
ti = datenum(SampleDate(idxU{ind}));
t(1:length(ti), ind) = ti;
clear Ti zi ti
end

Sign in to comment.

More Answers (1)

I'm not sure I've understood exactly what you need to do, but this seems related:
Create a table containing data like yours:
>> Date = datetime({'01/28/2014 12:13:37'; '01/28/2014 12:14:58'; '01/28/2014 12:15:20'; '02/06/2014 11:02:42'; '02/06/2014 11:05:59'; '02/06/2014 11:06:24'; '02/06/2014 11:06:32'});
>> Depth = [1.5; 5; 10; 1; 5; 10; 12];
>> Temperature = [8.22; 8.29; 8.29; 7.72; 7.88; 7.9; 7.92];
Add some variables that will help transform the data:
>> ID = [1;2;3;1;2;3;4];
>> Day = dateshift(Date,'start','day');
>> Day = categorical(Day,datetime({'01/28/2014' '02/06/2014'}),{'First' 'Second'});
>> t = table(ID,Day,Date,Depth,Temperature)
t =
ID Day Date Depth Temperature
__ ______ ____________________ _____ ___________
1 First 28-Jan-2014 12:13:37 1.5 8.22
2 First 28-Jan-2014 12:14:58 5 8.29
3 First 28-Jan-2014 12:15:20 10 8.29
1 Second 06-Feb-2014 11:02:42 1 7.72
2 Second 06-Feb-2014 11:05:59 5 7.88
3 Second 06-Feb-2014 11:06:24 10 7.9
4 Second 06-Feb-2014 11:06:32 12 7.92
Unstack the "stacked" data to an "unstacked" format:
>> t = unstack(t,{'Date' 'Depth' 'Temperature'},'Day','GroupingVariable','ID')
t =
4×7 table
ID Date_First Date_Second Depth_First Depth_Second Temperature_First Temperature_Second
__ ____________________ ____________________ ___________ ____________ _________________ __________________
1 28-Jan-2014 12:13:37 06-Feb-2014 11:02:42 1.5 1 8.22 7.72
2 28-Jan-2014 12:14:58 06-Feb-2014 11:05:59 5 5 8.29 7.88
3 28-Jan-2014 12:15:20 06-Feb-2014 11:06:24 10 10 8.29 7.9
4 NaT 06-Feb-2014 11:06:32 NaN 12 NaN 7.92
Now you can access the paired variables using braces and get back one matirx for each pair:
>> t{:,{'Date_First' 'Date_Second'}}
ans =
4×2 datetime array
28-Jan-2014 12:13:37 06-Feb-2014 11:02:42
28-Jan-2014 12:14:58 06-Feb-2014 11:05:59
28-Jan-2014 12:15:20 06-Feb-2014 11:06:24
NaT 06-Feb-2014 11:06:32
>> t{:,{'Temperature_First' 'Temperature_Second'}}
ans =
8.22 7.72
8.29 7.88
8.29 7.9
NaN 7.92

1 Comment

Hi Peter, thanks for your comments. This is on the right track, but I'm dealing with a very large dataset, so I won't be able to manually assign IDs and names. Ideally, I'd like each parameter stored in a separate table in order to use functions like pcolor. So, that I end up with the final tables like you gave, but in a more automated way. Thanks again!

Sign in to comment.

Categories

Asked:

on 23 Dec 2016

Commented:

on 4 Jan 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!