Multivariate regression with multi-dimensional variables

Dear Reader,
I'm having trouble implementing a regression.
I have the following:
A 3D matrix results_demands(24,1,n) and a 3D matrix results_han(24,3,n), where 24 is the number of time steps and n is the number of observations.
I want to model the regression between the (24,1) vector results_demands and the (24,3) matrix results_han.
What I have already tried is splitting up results_han so that I have a matrix with n rows (observations) and 3 columns, where every entry is a (24,1) vector.
That gives 3 response variables and 1 explanatory variable (results_demands), all in vector form.
How do I implement such a higher-dimensional case?
A visualization of the 3D matrices:
results_han (24,3,n):
[ [vec_i] [vec_i] [vec_i]
  [vec_i] [vec_i] [vec_i]
  ......................
  ...................... ]
with each vec_i a (24,1) column vector.
results_demands (24,1,n):
[ [vec_k]
  [vec_k]
  ....... ]
with each vec_k a (24,1) column vector.
Thank you in advance!

10 Comments

Form the obvious (24*n,3) matrix A from results_han and the (24*n,1) vector b from results_demands, and solve for the (3,1) vector p of regression coefficients using p = A\b.
Or what does your regression model look like?
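A minimal sketch of this stacking, assuming results_han (24x3xn) and results_demands (24x1xn) exist as described above:

```matlab
% Stack all n observations below each other:
% A becomes (24*n,3), b becomes (24*n,1)
n = size(results_han, 3);
A = reshape(permute(results_han, [1 3 2]), 24*n, 3);
b = reshape(results_demands, 24*n, 1);
p = A \ b;   % (3,1) least-squares coefficient vector
```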
Dear @Torsten, thank you! By 'plugging' them below each other, does it still make a regression such that the (24,3) matrix results_han can be obtained from the (24,1) vector results_demands?
I don't want to make a regression with 1-dimensional values but with vectors vs. matrices.
I still don't get your regression model. How many regression coefficients do you expect as the outcome?
@Torsten Just one actually.
Y = X*a + E
where Y is the (24x3) matrix, X is the (24x1) vector, and a is a (1x3) vector.
The main idea is to find a regression such that the matrix Y (results_han) can be expressed in terms of results_demands vectors.
The 'depth' of results_han and results_demands is the amount of data I have available. So if I run my (optimization) model 5 times with my results_demands and obtain 5 results_han, my n is equal to 5.
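In MATLAB, fitting the model Y = X*a + E once per observation reduces to a one-line least-squares solve; a sketch under the dimensions stated above (variable names hypothetical):

```matlab
% One (1x3) coefficient vector per observation:
% Y_i = results_han(:,:,i) is 24x3, X_i = results_demands(:,:,i) is 24x1
n = size(results_han, 3);
a = zeros(n, 3);
for i = 1:n
    a(i, :) = results_demands(:, :, i) \ results_han(:, :, i);  % least squares
end
```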
So you make n independent regressions, and for each you get a different (1x3) vector a. In total you thus get 3*n regression coefficients. Is this correct?
@Torsten I might be completely on the wrong path but if you want to measure a relation/correlation between 2 things, you need multiple observations to measure such a relation, right? So those are the n.
So ultimately I want to insert a vector of demands and, via the regression coefficients, obtain the matrix of han (here and now decisions). That n is used to build a large sample of (demands, han) pairs to measure their relationship.
For this I need to make a (linear) regression and here is where I get stuck.
For me, the question is which data can meaningfully be regressed against each other.
Is it sensible to assume that the here and now decisions are independent of the time at which they were taken to satisfy certain demands? Then all of the observations can be combined into one matrix to determine only three regression coefficients.
Or do the decisions depend on time? Then it would make sense to group together the observations taken at the same time step and regress within each group. For this procedure, you would need 3*24 regression coefficients.
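The two groupings can be sketched as follows (assuming the arrays described above; a sketch, not tested against the real data):

```matlab
n = size(results_han, 3);

% Grouping 1: decisions independent of time -> pool everything,
% 3 regression coefficients in total
A = reshape(permute(results_han, [1 3 2]), 24*n, 3);
b = reshape(results_demands, 24*n, 1);
p_pooled = A \ b;                            % (3,1)

% Grouping 2: decisions depend on time -> one regression per time step,
% 3*24 regression coefficients in total
p_t = zeros(3, 24);
for t = 1:24
    A_t = squeeze(results_han(t, :, :)).';   % (n,3)
    b_t = squeeze(results_demands(t, 1, :)); % (n,1)
    p_t(:, t) = A_t \ b_t;
end
```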
@Torsten They do depend on time.
I have an optimization model. (T = 24 amount of time steps)
One of the parameters of this model is a demand vector. At every time step a certain demand is required, so we obtain a vector of demands (one entry for every time step t in [1,24]).
The here and now decisions are the decisions the model gives after solving. For every time step we also have a certain here and now decision, so we obtain a vector of here and now decisions. Every factory has its own such decisions, different from the others, and since we have 3 factories we obtain a (24x3) matrix of here and now decisions.
My task is to discover if there is a correlation between those demand vectors and the here and now decisions.
So when I have a certain demand vector, can I determine via regression what the here and now decisions will be?
So the data to be regressed against each other are the demand vector and the here and now matrix.
So your advice is to group results_demands and results_han together when they have the same time step?
So the here and now decisions come from an underlying optimization model depending on demand vectors from three factories over a time horizon of 24 hours.
Is the complete demand vector for 24 hours known when the model starts to calculate the here and now decisions? Or does the model only calculate the optimal strategy for the next hour, knowing the demand for the next hour?
Are the decisions taken by the 3 factories independent of one another, or do they take into account the respective demands and the deduced here and now decisions of the other factories?
What you are supposed to do is find a simpler (and faster) model than the big optimization model, based on regression. But this won't be an easy task. My guess is that a neural network is the way to go. It is a (usually very complex) regression model, but the developer does not need to specify the dependencies; the program tries to find them on its own in a training phase based on the demand / here and now decision data. Simple linear regression methods won't be sufficient to reproduce the complex decisions coming from a nonlinear optimizer, in my opinion.
As you've already mentioned in your title, your problem is a multivariate regression (doc mvregress), since you have several response variables affected by your independent variable. Please note that, for the coefficients, this is equivalent to performing 3 separate univariate fits (with fitlm, for instance):
X = randn(1000, 1); % one predictor
Y = randn(1000, 3); % 3 response variables
% multivariate regression
newX = [ones(size(X)), X]; % design matrix with intercept column
coef.mv = mvregress(newX, Y);
% fit 3 separate linear models
for i = 1:3
    mdl = fitlm(X, Y(:, i));
    coef.uv(:, i) = mdl.Coefficients.Estimate;
end
norm(coef.uv - coef.mv) < eps % they're equivalent
ans = logical
1
However, the difference lies in the covariance matrix of the errors (noise), since multivariate regression accounts for the correlation between the responses (relevant if you're doing hypothesis testing, for instance).
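For instance, the error covariance can be requested as a second output of mvregress; a small sketch based on the example above:

```matlab
X = randn(1000, 1);
Y = randn(1000, 3);
newX = [ones(size(X)), X];           % design matrix with intercept
[beta, Sigma] = mvregress(newX, Y);  % Sigma: 3x3 covariance of the errors
% Off-diagonal entries of Sigma capture the correlation between the
% three responses, which three separate fitlm fits do not model.
```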


Answers (1)

Hi Jane,
I understand that you are trying to perform a regression analysis where your explanatory variable is a (24,1) vector from results_demands for each observation, and you have three response variables, each a (24,1) vector from results_han, across the same observations. Given the structure of your data, you're aiming to model the relationship between these variables across time steps and observations.
To achieve this, you'll need to reshape your 3D matrices into 2D matrices where each row represents a time step for a given observation, and then perform regression analysis on these reshaped matrices. Here's how you can approach this problem in MATLAB:
  1. Reshape your 3D matrices: Convert results_demands into a 2D matrix of size (24*n, 1) and results_han into a 2D matrix of size (24*n, 3). This flattens your data across the time steps and observations, aligning them for regression analysis.
  2. Perform regression: Once you have your matrices reshaped, you can use MATLAB's regression functions, such as fitlm for linear regression, to model the relationship between results_demands and each column of results_han.
Here is how you can implement it:
% Assuming results_demands is of size (24, 1, n)
% and results_han is of size (24, 3, n)
n = size(results_demands, 3); % Number of observations
numTimeSteps = 24;
% Reshape the matrices
demands_reshaped = reshape(results_demands, [numTimeSteps*n, 1]);
han_reshaped = reshape(permute(results_han, [1 3 2]), [numTimeSteps*n, 3]);
% Now, demands_reshaped is (24*n, 1) and han_reshaped is (24*n, 3)
% Perform regression for each response variable in results_han
% Here's an example using the first column of han_reshaped as the response variable
lm = fitlm(demands_reshaped, han_reshaped(:, 1));
% Display the regression model summary
disp(lm);
% Repeat the regression for the other columns of han_reshaped as needed
This code snippet demonstrates how to reshape your 3D matrices into 2D matrices suitable for regression analysis and perform a linear regression between your explanatory variable (results_demands) and one of the response variables in results_han. You would repeat the regression process for each response variable in results_han as needed.
Remember, the reshaping aligns all time steps of each observation into a single row for the regression analysis, effectively treating each time step as a separate observation in the regression model. This approach assumes that the relationship you're modeling is consistent across all time steps.
Hope this helps.
Regards,
Nipun

Release: R2021a
Asked: 29 May 2021
Answered: 22 May 2024
