How to list compared values from compared datasets

Hi board,
I am trying to automatically evaluate experimental mass spectra datasets (one column, 10'000's of rows) for matching values of simulated spectra datasets (10's of columns, 100's rows, 10's to 1000's of tables/arrays/dimensions depending on the complexity of the experiment). There is a slight allowed deviation of experimental and simulated datapoints based on a fraction of each datapoint. After finding values that (inside the given tolerance) show up in both datasets, I need the position of each value, as it is directly related to information I need to get from the experiment. I know, I can in principle get a list of matching common values using
CommonValues = ismembertol(experimental,simulated,tolerance)
I also know I can get the position of each common value by
[col,row] = find(CommonValues==1)
and aligning them via
AssignedData = table(CommonValues,col,row)
But this only gives me the values and their position in the simulated spectra. I still do not know what exact actual values in the experimental spectra they were matched to. So I already have all the "numbers" but I can not list them so I can use the data I get from the calculation. I can not simply align them again using the above method, as they will contain vastly different numbers of rows - the simulated data will have datapoints that are either present in multiple columns/rows/dimensions yet are still needed for evaluation or multiple datapoints inside the allowed deviation margins.
The final output I would need would be a table or array, that shows
expDatapoint1 simDatapoint1.1 Deviation1.1 allowedMaxDeviation1 simulatedCol simulatedRow
expDatapoint1 simDatapoint1.2 Deviation1.2 allowedMaxDeviation1 simulatedCol simulatedRow
expDatapoint1 simDatapoint1.3 Deviation1.3 allowedMaxDeviation1 simulatedCol simulatedRow
expDatapoint2 simDatapoint2.1 Deviation2.1 allowedMaxDeviation2 simulatedCol simulatedRow
expDatapoint2 simDatapoint2.2 Deviation2.2 allowedMaxDeviation2 simulatedCol simulatedRow
and so forth.
As you might be able to tell, I'm a chemist, not a programmer, and really new to MATLAB. I tried searching for solutions for a few days and already worked through most of the "getting started" parts of the help pages, but I do not actually know WHAT to look for. I used to do the above task manually by creating huge simulated datasets in Excel, going through each and every datapoint from my spectra via the highlighting function, and then putting together the above table by hand, which works for very small and simple experiments only. Any help is highly appreciated!
Matthias

Accepted Answer

Shruti Sapre on 31 Jul 2015
Hi Matthias,
I understand that you want to find values in the “experimental” dataset that fall within a particular tolerance of the elements in the “simulated” dataset. You are using the ismembertol function to do the same:
>> CommonValues = ismembertol(experimental, simulated, tolerance)
This function will return a logical array “CommonValues” the same size as “experimental”. Since your “experimental” array is a column vector, the return value will also be a column vector. The location of “1”s in this vector will correspond to elements in “experimental” array that are within tolerance of elements in “simulated”.
If you need the position of matched elements in the “experimental” array, you can use the “find” function:
>> expRows = find(CommonValues == 1);
This will give you a vector “expRows” containing the locations of the elements in “experimental” that are within tolerance of elements in “simulated”. (These are row numbers, since “experimental” is a column vector.)
You can also get the actual elements in “experimental” using the below command:
>> expValues = experimental(CommonValues);
This will give you a vector with elements from “experimental” that matched to elements in “simulated”.
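As a small illustration of the three commands above, here is a sketch on made-up numbers (the values and the tolerance are assumptions chosen for the example, not taken from the question). Note that by default ismembertol scales the tolerance by the largest absolute value found in both inputs, so it is not a fraction of each individual datapoint.

```matlab
% Toy data (hypothetical values, for illustration only)
experimental = [100.00; 150.02; 200.50; 300.00];   % column vector
simulated    = [100.01 250.00; 150.00 300.02];     % 2-by-2 matrix
tolerance    = 1e-4;   % scaled by max(abs([experimental(:); simulated(:)]))

% Logical mask, same size as "experimental"
CommonValues = ismembertol(experimental, simulated, tolerance);

expRows   = find(CommonValues);          % row numbers of the matched entries
expValues = experimental(CommonValues);  % the matched experimental values
```

With these numbers the effective absolute tolerance is about 0.03, so rows 1, 2 and 4 match and row 3 (200.50) does not.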
If you need to get the index location in “simulated” for each element in “experimental” that is a member of “simulated”, you can specify an array in the output parameters:
>> [CommonValues, locSimulated] = ismembertol(experimental, simulated, tolerance);
“locSimulated” will be an array the same size as “experimental” (unless the “ByRows” option is specified). For each element of “experimental” that is within tolerance of an element of “simulated”, it contains the index of that element in “simulated”. It will contain a 0 whenever an element in “experimental” is not a member of “simulated”. Note that if there are multiple matches in “simulated” for a single entry in “experimental”, only the first matched location in “simulated” would be returned. The contents are linear indices into “simulated”. Please refer to the MATLAB documentation on array indexing for details on linear indexing.
These linear indices can be converted to subscripts (for example, rows and columns in the case of “simulated” being a 2D matrix) using the ind2sub function.
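A minimal sketch of that conversion, again on made-up numbers (the data and tolerance are assumptions for illustration). Keep in mind that ind2sub, like find, returns the row subscript first and the column subscript second.

```matlab
% Toy data (hypothetical values, for illustration only)
experimental = [100.00; 150.02; 200.50; 300.00];
simulated    = [100.01 250.00; 150.00 300.02];     % 2-by-2 matrix
tolerance    = 1e-4;

[CommonValues, locSimulated] = ismembertol(experimental, simulated, tolerance);

% Keep only the entries that found a match (non-zero linear indices)
matched = locSimulated(CommonValues);

% Convert linear indices into row/column subscripts of "simulated"
[simRow, simCol] = ind2sub(size(simulated), matched);
```

Here “locSimulated” comes out as [1; 2; 0; 4], which corresponds to positions (1,1), (2,1) and (2,2) in “simulated”. For a 3D “simulated” array, ind2sub accepts a third output for the page index.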
You can get the elements in “simulated” using the above indices by the following command:
>> simulated(locSimulated(locSimulated ~= 0))
This will first get all the indices in “simulated” that are not equal to 0 (that is, the ones that are within tolerance) and will return values in “simulated” based on the non-zero positions.
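If you need every match rather than only the first one, together with the deviations, one possible approach is sketched below. It relies on two documented name-value options of ismembertol: 'OutputAllIndices' (returns all matching locations as a cell array) and 'DataScale' (setting it to 1 makes the tolerance absolute). Everything else is an assumption made for the example: the toy values, the name “relTol”, the interpretation of the allowed deviation as a fraction of each experimental value, and the output column names.

```matlab
% Toy data (hypothetical values, for illustration only)
experimental = [100.00; 150.02; 200.50; 300.00];
simulated    = [100.01 250.00; 150.00 300.02; 100.02 150.03];   % 3-by-2
relTol       = 1.5e-4;  % assumed: allowed deviation as a fraction of each experimental value

% Loosest absolute tolerance any experimental point can need; this yields a
% superset of candidates that is filtered per point inside the loop.
absTolMax = relTol * max(abs(experimental));
[~, locAll] = ismembertol(experimental, simulated, absTolMax, ...
    'DataScale', 1, 'OutputAllIndices', true);

nExp = numel(experimental);
rows = cell(nExp, 1);
for k = 1:nExp
    idx = locAll{k};
    idx = idx(idx > 0);          % drop any "no match" marker
    idx = sort(idx(:));          % force a sorted column, even when empty

    simVals = simulated(idx);
    simVals = simVals(:);
    dev     = abs(simVals - experimental(k));
    allowed = relTol * abs(experimental(k));   % per-point limit

    keep = dev <= allowed;       % apply the per-point tolerance
    idx = idx(keep);  simVals = simVals(keep);  dev = dev(keep);

    [simRow, simCol] = ind2sub(size(simulated), idx);
    n = numel(idx);
    rows{k} = [repmat(experimental(k), n, 1), simVals, dev, ...
               repmat(allowed, n, 1), simCol, simRow];
end

result = array2table(vertcat(rows{:}), 'VariableNames', ...
    {'expValue', 'simValue', 'deviation', 'allowedMax', 'simCol', 'simRow'});
```

With these numbers “result” has four rows: 100.00 pairs with 100.01 only (100.02 is inside the loose candidate tolerance but outside the per-point limit), 150.02 pairs with both 150.00 and 150.03, and 300.00 pairs with 300.02. The unmatched value 200.50 contributes no rows.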
Hope that helps!
-Shruti
  1 Comment
Matthias Eing on 5 Aug 2015
Thanks a lot, this is pretty much what I needed. At the time I asked, I did not quite understand how functions with multiple output arguments work and how to index arrays at all. Thank god Matlab is rather easy to learn =)
