Attempting to find patterns within my data

Hello everyone,
I have an idea I'd like to implement, but I don't quite know how to.
I have created a script that produces a 154xm matrix (each data point I add creates another column). As it stands, I just have a long list of numbers, and it will be impossible to interpret this data once I add more data points (e.g. a 154x100 matrix), so I want to write a program that can analyze the data for me.
It might be easier for me to demonstrate what I want to do:
A =
8 1 2
2 4 4
5 2 1
6 1 3
1 1 1
Assume I have a 5x3 matrix. What I want to do is find a diagonal pattern that runs through my matrix at values below 2. In this example, if we scour each column and its elements, we can easily find the diagonal line that passes through a value below 2 in every column (I have zeroed out all the other values to demonstrate what I mean):
A =
0 0 0
0 0 0
0 0 1
0 1 0
1 0 0
Now I don't actually want to zero out my actual data (since multiple diagonal lines may exist), but I hope it's clear what I'm trying to do: I have found the pattern I was looking for in my data. In a 5x3 matrix you can easily see this by looking at it, but with a 154x100 matrix it becomes impossible to spot by eye.
If it helps, this is the script I am currently using to obtain my data:
predictions=load('Predictions2.txt');
experimental=load('Experimental.txt');
x=predictions(:,1);
error=predictions(:,2); % note: this variable name shadows MATLAB's built-in error() function
y=experimental(:,1);
z = zeros(1,6);
sizeval = 3; % in this example I am using 3 data points, so I will have 3 columns in my final matrix
b = zeros(sizeval,154);
d=(1:154); %this is simply for plotting purposes and is not used in any calculations
e=zeros(1,6);
for n=1:154 % there are 154 predictions, so I am determining the RMSD of 1 data point (using 6 different parameters) against each prediction
    for j=1:sizeval % each data point has 6 parameters, here I am creating the loop to calculate RMSDs for multiple data points
        for i=1:6 % I am taking the RMSD between the prediction and experimental values
            xindex = i+(6*(n-1));
            yindex = i+(6*(j-1));
            z(i)=((x(xindex)-y(yindex)))^2;
            e(1,i)=(z(i)/(error(xindex)^2));
            if e(1,i)>1000
                e(1,i)=0;
            end
        end
        b(j,n)=sqrt((1/5)*sum(e,2)); % computed once all 6 parameters are filled in
    end
end
b' %this is the output of my data, creating a 154xm (m being data points) matrix
With an output like this:
ans =
3.9481 5.3775 5.1606
4.4432 3.6738 3.7466
2.7247 6.6981 6.7029
5.4045 4.2693 3.9113
1.3158 10.7013 10.4940
7.9002 6.2291 5.8123
2.2395 10.3191 10.1340
2.6847 9.3292 9.2099
7.5437 7.5024 7.2936
5.8558 8.5550 8.3015
1.6878 11.2286 11.0484
6.7887 8.6833 8.4203
12.6863 1.7771 0.9488
13.4256 4.2317 3.4892
2.3376 8.3851 8.2385
5.0820 5.3472 5.0439
10.3929 1.7875 1.3311
4.1463 3.4607 2.2643
6.0488 5.8100 5.6339
...
  5 Comments
Image Analyst on 2 Aug 2019
It's easy to get the 1's in A by doing:
[rows, columns] = find(A);
If you want each separate, contiguous grouping of 1's in A, then you can use bwlabel() and/or regionprops(), depending on exactly what you want. Attach your larger matrix as text files if you want an example.
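A minimal sketch of the labeling idea, assuming the Image Processing Toolbox is available and using the small example matrix from the question. The threshold of 2 comes from the question; the names mask, labels, numGroups and stats are only illustrative. With 8-connectivity, bwlabel() treats diagonal neighbours as connected, so a diagonal run of below-threshold cells ends up in one group:

```matlab
A = [8 1 2
     2 4 4
     5 2 1
     6 1 3
     1 1 1];
mask = A < 2;                              % logical matrix of below-threshold cells
[labels, numGroups] = bwlabel(mask, 8);    % 8-connectivity links diagonal neighbours
stats = regionprops(labels, 'PixelIdxList', 'Area');
for k = 1:numGroups
    % Convert the linear indices of group k into row/column pairs
    [r, c] = ind2sub(size(A), stats(k).PixelIdxList);
    fprintf('Group %d has %d cells (row, column):\n', k, stats(k).Area);
    disp([r c]);
end
```

Note that a connected group is not restricted to a straight line: in this example the bottom row is also below the threshold, so it is merged into the same group as the diagonal.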
Image Analyst on 3 Aug 2019
You can certainly threshold:
A = b < someValue; % Produces a logical matrix. Or use > someValue.
Then you can skeletonize the lines/regions down to single-pixel-wide lines with bwmorph():
A = bwmorph(A, 'skel', inf);
imshow(A);


Accepted Answer

the cyclist on 2 Aug 2019
% The original data
A = [
8 1 2
2 4 4
5 2 1
6 1 3
1 1 1];
% Get the dimensions of A
[m,n] = size(A);
% Initialize the pattern matrix as all false. Will fill in valid
% antidiagonals as true.
pattern = false(m,n);
% Find the vector of linear indices that span the first possible
% antidiagonal
dvec = n : m-1 : n + (n-1)*(m-1);
% Work down all antidiagonals, and fill in "true" if the pattern is
% matched, updating the linear indices as we go.
for ni = n : m
    pattern(dvec) = all(A(dvec)<2);
    dvec = dvec + 1;
end
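To read off where the matched antidiagonals sit once pattern has been filled in, one option is find(). The sketch below repeats the loop above on the same A and then lists the row and column of every flagged cell. The names threshold, rowIdx and colIdx are illustrative, and the approach assumes the matrix has at least as many rows as columns (as in the 154xm case from the question):

```matlab
A = [8 1 2; 2 4 4; 5 2 1; 6 1 3; 1 1 1];
threshold = 2;                        % cutoff taken from the question
[m, n] = size(A);
pattern = false(m, n);
dvec = n : m-1 : n + (n-1)*(m-1);     % linear indices of the first full antidiagonal
for k = n:m
    pattern(dvec) = all(A(dvec) < threshold);
    dvec = dvec + 1;                  % shift the antidiagonal down by one row
end
[rowIdx, colIdx] = find(pattern);     % positions of all cells on matched antidiagonals
disp([rowIdx colIdx]);
```

Because the original data in A is never modified, the same check can be rerun with a different threshold without reloading anything.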
  19 Comments
the cyclist on 7 Aug 2019
What Guillaume said is all true.
"How do I learn a programming language (or programming in general) really well?" is a huge topic. In the case of MATLAB there are very good beginner-level materials out there, e.g. the MATLAB Onramp.
Things that I think help a person come up to speed more quickly:
  • Having real-world problems that one is trying to solve. In my experience, nothing motivates one to learn more than the need for a solution.
  • Trying to understand the core concepts of the language. For example, understanding the power of vectorization is key to using MATLAB well.
  • Not just blindly copying & pasting code (from here, Stack Overflow, etc), but instead trying to really understand what the algorithms are doing. [You seem to be trying that!] Remembering those techniques, for next time, helps you build up that "bag of tricks" for similar problems.
  • Really really trying hard to solve problems yourself before asking for help. In my experience, I remember better when I figured it out for myself. (There is of course a balance here, between the value of figuring it out, and the frustration of pounding your head against a wall.)
In the end, it really is the experience of doing, over and over again, that builds that expertise.
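As one illustration of the vectorization point, the triple loop from the question could be sketched without loops roughly as follows. This is only a sketch under stated assumptions: the random vectors stand in for the contents of Predictions2.txt and Experimental.txt (which are not shown in the thread), the names nPred, nParam, nData, err, X, S, Y and E are made up for the example, and implicit expansion (R2016b or later) is assumed. The 1/5 divisor and the cutoff of 1000 are copied from the original script as-is.

```matlab
% Synthetic stand-ins for the text files in the question (assumption)
rng(0);
nPred = 154; nParam = 6; nData = 3;
x   = rand(nPred*nParam, 1);          % plays the role of predictions(:,1)
err = 0.1 + rand(nPred*nParam, 1);    % plays the role of predictions(:,2); avoids shadowing error()
y   = rand(nData*nParam, 1);          % plays the role of experimental(:,1)

% Put the 6 parameters along dimension 1, data points along 2, predictions along 3
X = reshape(x,   nParam, 1, nPred);   % 6 x 1 x 154
S = reshape(err, nParam, 1, nPred);   % 6 x 1 x 154
Y = reshape(y,   nParam, nData);      % 6 x 3

E = ((X - Y) ./ S).^2;                % 6 x 3 x 154 via implicit expansion
E(E > 1000) = 0;                      % same cutoff as the original script
b = reshape(sqrt(sum(E, 1) / 5), nData, nPred);   % 3 x 154, same layout as b in the loop version
```

As in the question, b' then gives the 154xm matrix to threshold and search for diagonals.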
Sam Mahdi on 7 Aug 2019
To Guillaume:
No, sorry, I was trying to understand the cyclist's code first before I moved on to yours.
But thank you guys for your help and feedback. I'm currently in a machine learning class that uses MATLAB, so I'm sorta learning linear algebra and all the things you can do with matrices and vectors/arrays as I go, as well as trying to apply it to what I'm doing (like my job above).


More Answers (1)

Guillaume on 2 Aug 2019
Edited: Guillaume on 2 Aug 2019
%demo data:
A = logical([
0 1 0
0 0 1
1 0 1
0 1 0
1 0 1
0 1 0
0 1 1])
Finding the start index (in the first column) of diagonals:
indices = hankel(1:size(A, 1)+1-size(A, 2), size(A, 1)+1-size(A, 2):size(A, 1)) + (0:size(A, 2)-1) * size(A, 1);
isdiago = all(A(indices), 2);
diag_idx = indices(isdiago)
Finding the start index in the first column of antidiagonals:
indices = toeplitz(size(A, 2):size(A, 1), size(A, 2):-1:1) + (0:size(A, 2)-1) * size(A, 1)
isantidiag = all(A(indices), 2);
antidiag_idx = indices(isantidiag)
If you want the indices in all the columns, just repmat the isdiago, isantidiag across all columns of the respective indices.
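A short sketch of that repmat step for the diagonal case, reusing the demo data above. The names m, n, all_diag_idx, r and c are only illustrative, and implicit expansion (R2016b or later) is assumed for the addition of the column offsets:

```matlab
A = logical([
    0 1 0
    0 0 1
    1 0 1
    0 1 0
    1 0 1
    0 1 0
    0 1 1]);
[m, n] = size(A);
% Each row of indices holds the linear indices of one full diagonal
indices = hankel(1:m+1-n, m+1-n:m) + (0:n-1) * m;
isdiago = all(A(indices), 2);                     % true for diagonals made only of 1's
all_diag_idx = indices(repmat(isdiago, 1, n));    % every cell on a matched diagonal
[r, c] = ind2sub([m n], all_diag_idx);            % the same cells as row/column pairs
disp([r c]);
```

Logical indexing returns the result in column-major order, so the cells come out column by column. If one row per matched diagonal is more convenient, indices(isdiago, :) keeps each diagonal together instead.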
