I need to extract specific data and sum them up in different variables
I have a table of data: essentially two columns and about 8000 rows. Among those rows is information I need to collate to analyse later. The data came from a long text file. It looks like this:
Column 1 Column 2
BEGIN STUDY
Block 1
Ratio Known Outcome known
Ratio 30/70
green
red
red
green
Decision Bean count 4
I have about 40 participants and each participant does four blocks (four different conditions - configurations of ratio known or unknown, outcome known or unknown) and each condition has four ratios. Needless to say, for each participant and across participants, and conditions/ratios, this 'decision bean count' could be different, so the number of rows between 'ratio' and 'decision bean count' is also different.
I had this crazy idea I could do a while loop, but have no idea how to phrase this ... find the beginning of each participant/block/ratio, followed by: while decision bean count = false, row x + 1 (yes, you can laugh, this is how much of a novice I am).
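One way to phrase that loop is sketched below. It is only an illustration under assumptions: the lines are taken to be already loaded into a cell array (here called fileLines, a made-up name), the labels are assumed to match the excerpt above exactly, and the second trial (Ratio 60/40) is invented to show that the number of colour rows between 'Ratio' and 'Decision Bean count' does not matter. Instead of counting rows, the loop remembers the most recent block and ratio and writes a result row each time it reaches a 'Decision Bean count' line.

```matlab
% Hypothetical excerpt typed in by hand; the second trial is invented.
fileLines = { ...
    'BEGIN STUDY'; ...
    'Block 1'; ...
    'Ratio Known Outcome known'; ...
    'Ratio 30/70'; ...
    'green'; 'red'; 'red'; 'green'; ...
    'Decision Bean count 4'; ...
    'Ratio 60/40'; ...
    'red'; 'green'; ...
    'Decision Bean count 2'};

% One row per trial. Columns: block, first ratio number, second ratio number, DBC.
results = zeros(0, 4);
block = NaN; ratioA = NaN; ratioB = NaN;   % "current state" while scanning

for k = 1:numel(fileLines)
    txt = strtrim(fileLines{k});

    % A line such as 'Block 1' starts a new block.
    tok = regexp(txt, '^Block\s+(\d+)$', 'tokens', 'once');
    if ~isempty(tok)
        block = str2double(tok{1});
        continue
    end

    % A line such as 'Ratio 30/70' starts a new trial.
    % 'Ratio Known Outcome known' does not match because it has no digits.
    tok = regexp(txt, '^Ratio\s+(\d+)/(\d+)$', 'tokens', 'once');
    if ~isempty(tok)
        ratioA = str2double(tok{1});
        ratioB = str2double(tok{2});
        continue
    end

    % A line such as 'Decision Bean count 4' ends the trial: store one row.
    tok = regexp(txt, '^Decision Bean count\s+(\d+)$', 'tokens', 'once', 'ignorecase');
    if ~isempty(tok)
        results(end+1, :) = [block, ratioA, ratioB, str2double(tok{1})];
    end
    % Any other line (colours, headers) is simply skipped.
end

disp(results)
```

The same pattern would extend to participant IDs and conditions by adding one more "remember the latest value" branch for each, provided those lines can be recognised by a fixed label.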
I need to extract the data (i.e. the outcome measure of interest which is the Decision Bean Count) for:
Each participant
Differentiating different conditions
Differentiating different ratios
At the end I need to have two matrices, with four columns each:
1 with all the Decision Bean count scores, per participant, per condition
1 with all the Decision Bean count scores, per participant, per ratio
I am a total beginner in Matlab. The only thing I have managed to do is get Matlab to create a table with my data. Whatever I try next, it just gives me the same table in the output ...
This feels very complex to me, but if it is crystal clear for anyone out there who could help, I would be so grateful.
kind regards
Laura
Answers (1)
William Rose
on 29 May 2022
1 vote
I think I and others will be able to understand your problem better if you post a spreadsheet with columns for each of the different quantities. I expect that there will be columns for:
Subject number (1 to 40); "configuration of ratios known" (true or false); "outcome known" (true or false); ratio (four possible values). Every row should include a value for each of these quantities, even if the value is repeated from the preceding row. In other words, there will be many rows with each Subject Number, and so on.
There will be more columns, but I am not certain what they will be, because I do not understand your experimental protocol. Maybe there is a column for color (red or green). Maybe there is a column for Bean Count, or Decision Bean Count, or both. Is the value in Decision Bean Count a number, or a True/False? Do you want to compute the value of Decision Bean Count based on the values in the preceding columns and rows? This is where I do not understand the problem.
15 Comments
Laura Lennuyeux-Comnene
on 29 May 2022
William Rose
on 29 May 2022
It is my understanding, based on your original post and your comment, that Decision Bean Count (DBC) is already in the text file. I think you are saying that you do not need to calculate it. You just need to extract it. For every DBC, there is also a participant number, a ratio, and two experimental conditions, which are in the text file, but not necessarily on the same row as the DBC.
You want to create two matrices with four columns each.
Matrix A has "all the DBC scores", i.e. it has as many rows as there are DBC scores. The columns are: subject number, condition 1, condition 2, DBC. Condition 1 is "configuration of ratios known", and has value T or F (1 or 0). Condition 2 is "outcome known", and has value T or F (1 or 0).
Matrix B has "all the DBC scores", i.e. it has as many rows as there are DBC scores. The columns are: subject number, ratio (4 possible values), and DBC. You said this matrix has four columns, but I do not understand what the fourth column is.
I don't think I will be able to assist without seeing the text file, to get a better understanding. I'm sure there are people on this site who are a lot better than me at working with text files.
Laura Lennuyeux-Comnene
on 29 May 2022
William Rose
on 30 May 2022
I have looked at your text file. I have saved it as an Excel file (Jellybean2.xlsx, attached). I have added two tabs. Tab 2 includes my general comments on the file structure. Tab 3 includes an example of how the summarized data could be organized. Please read the information on those tabs, and comment if you wish.
There is no Sona ID for the first set of five blocks. All 32 subsequent sets of 5 blocks have a Sona ID.
Laura Lennuyeux-Comnene
on 30 May 2022
Laura Lennuyeux-Comnene
on 30 May 2022
William Rose
on 31 May 2022
I opened the file in Excel and looked at it until I started to understand its structure. I inspected the first 780 lines in order to generate the Example Summary Table in tab 3 of the Excel workbook.
If you like the data that is in the Example Summary Table, then we can think about how to automate the process.
Laura Lennuyeux-Comnene
on 31 May 2022
William Rose
on 1 Jun 2022
Here is a script, processTextFileLLC.m, that extracts the data from the text file. Results are written to a separate text file. The script includes many explanatory comments.
To analyze a different text file, change the input file name (variable infilespec) at the top of the script. The output file name will update automatically.
The text file you sent me, Jellybeans2.txt, did not have a subject ID or other header lines for the first subject. The script has an error when it attempts to analyze this file, because it cannot determine the subject ID for the first subject. Therefore I added a first subject ID (17600) to the input data file. The modified file is called Jellybeans2a.txt, and is attached.
The results file (Jellybeans2aResults.txt) is tab-delimited text, with six columns. It has a header row with column labels, and as many rows of numbers as there are trials. Practice trials are excluded. You can open the results file in Excel, and sort on a column of interest, such as condition or ratio, to assist in organizing and analyzing the results. Here is a screenshot of the top of the results file, when it is opened with the Windows "Notepad" app.

I compared the first 36 rows in the results file to the 36 rows of results in tab 3 of Excel workbook Jellybeans2.xlsx, sent previously, which I obtained by carefully reading the file. This is why I went through the time-consuming process of determining results for thirty-six trials by eye: so that I would have something to compare with the script's output once the script was written, which it now is. There were four numbers, out of 36x6=216 numbers, which did not match. This is because I made a mistake when determining the results by eye. The script is doing things correctly.
Now that you have all these results, you can decide how you want to analyze the results, i.e. what statistical tests to perform, what graphs to make, etc.
William Rose
on 2 Jun 2022
In earlier comments, you expressed your interest in "confidence ratings". What do you mean by this? One possibility is that you want to estimate the mean value of DBC for different ratios, and that you would like a confidence interval for the DBC. This can be formalized in a statement such as "There is a 95% chance that the true value of the DBC lies between x and y."
Is that what you mean?
William Rose
on 2 Jun 2022
The script which I posted computes the ratio using the two numbers in the order that they appear in the original file. Therefore 40/60 becomes 0.67 and 60/40 becomes 1.50. If you want to treat the DBCs from 40/60 and 60/40 as being from the same ratio, then you can and should combine them. Likewise for the other three pairs of ratios.
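A minimal sketch of one way to combine such pairs is shown below. It is not taken from the posted script: the variable names and the example ratio pairs are assumptions made for illustration. The idea is to always divide the smaller number by the larger one, so that the order in which the two numbers appear in the file no longer matters.

```matlab
% Ratio pairs as they might appear in the file (hypothetical example values).
ratioPairs = [40 60; 60 40; 30 70; 70 30];

% Ratio in file order: 40/60 gives 0.67 and 60/40 gives 1.50.
rawRatio = ratioPairs(:,1) ./ ratioPairs(:,2);

% Order-independent ratio: smaller number divided by larger number,
% so 40/60 and 60/40 both become 0.67.
foldedRatio = min(ratioPairs, [], 2) ./ max(ratioPairs, [], 2);

disp([rawRatio, foldedRatio])
```

With the folded value, trials from 40/60 and 60/40 share one ratio label and can be grouped together in later analysis.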
Laura Lennuyeux-Comnene
on 6 Jun 2022
William Rose
on 6 Jun 2022
I am glad to hear that you took time to enjoy the Jubilee festivities. Celebration of a 70 year reign doesn't happen every day.
My script does not extract the confidence number, because I did not understand that you wanted the confidence for each trial. Therefore I have modified the script. The script now reports the confidence for each trial, in the output file, in a newly added seventh column, labelled "Conf". The new script is attached.
William Rose
on 7 Jun 2022
When the script runs, it produces this output on the console. You should get a similar result.
>> processTextFileLLC
Input file: Jellybeans2a.txt, output file: Jellybeans2aResults.txt.
Number of subjects=33, blocks=132, trials=528, DBCs=528, Confs=528.
>>
The script produces a tab-delimited text file. Here are the first 5 lines and the last 4 lines of the results file created by the script.

You should get the same. You can open this text file with Excel or Matlab or your favorite statistics program for analysis. For example, you might want to know if the mean DBC is different for different conditions. Or you might want to ask if Confidence is different for different ratios. This file has eight, not four, ratios, as I explained in an earlier post. If you want only four ratios, then we can adjust the script to make this happen. If you make this adjustment, then 40 red:60 green will be considered equivalent to 40 green:60 red, for analysis purposes.
"It may take me a while to work out what is going on, as I am rather slow at these things." Compared to people who have been programming in Matlab or some other language for many years, maybe you are slow, or maybe not. It is hard reading other people's code. They had a strategy in mind when they wrote it, which is usually not obvious. It makes sense to them, and it works, but it can be hard to decipher. And even if you are slow compared to some others, you are trying to learn a programming language, which most investigators will not do.
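As a rough illustration of the first question (mean DBC per condition), here is a minimal sketch using accumarray. The numbers are made up and the variable names (subj, cond, dbc) are assumptions; they stand in for the corresponding columns of the results file, with subjects and conditions coded as small positive integers.

```matlab
% Hypothetical values standing in for three columns of the results file.
subj = [1 1 1 1 2 2 2 2]';   % subject index (1, 2, ...)
cond = [1 2 3 4 1 2 3 4]';   % condition code (1 to 4)
dbc  = [4 6 3 5 2 8 5 7]';   % decision bean count for each trial

% Mean DBC for each of the four conditions, pooled over subjects.
meanByCond = accumarray(cond, dbc, [4 1], @mean);

% Matrix with one row per subject and one column per condition.
% Each cell holds the mean DBC of that subject in that condition.
dbcMatrix = accumarray([subj cond], dbc, [2 4], @mean);

disp(meanByCond')
disp(dbcMatrix)
```

The second result has the shape asked for in the original question: one row per participant and four columns, one per condition. Replacing cond with a ratio code (1 to 4) would give the per-ratio version.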
William Rose
on 7 Jun 2022
What software do you want to use to do your statistical analysis of the data? Matlab, Excel, R, something else?
What questions do you want to ask and answer with the data you have collected? You and your advisor probably figured this out before you started collecting data. I suspect that you would like to know if DBC varies with condition, and if DBC varies with ratio. I suspect you would like to know if Confidence varies with condition and if Confidence varies with ratio. But maybe I am wrong.
Other thoughts: Again, you and your advisor probably have already figured out what follows. You have two pairs of conditions. Therefore you could analyze condition with a one-factor ANOVA, with four possible values for the one factor. Or you could do a two-factor ANOVA, with two possible values for each factor. There is also the question of whether to do standard ANOVA with an F-test, or do a non-parametric test. The Kruskal-Wallis test is a viable non-parametric alternative to the one-factor ANOVA. There is not a good non-parametric alternative to the two-factor ANOVA. (The Friedman test, which is non-parametric, is not an option, since you have repeated measures in each condition.) See this discussion and this discussion for more info on the lack of a non-parametric alternative to 2-factor ANOVA.