| Description |
The environment: the subject is presented with pairs of symbols, (A,B), (C,D) and (E,F).
A wins in 80% of the cases and B in 20%; C wins in 70% and D in 30%; E wins in 60% and F in 40%.
After a response signal an answer is expected. If the answer corresponds to the winning choice, a reward follows; otherwise a negative reward follows. No answer, double conflicting answers and answers given before the AW signal are discouraged (low negative reward).
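The reward scheme above can be sketched as a small Python simulation of a single trial's outcome. The reward magnitudes (+1, -1, -0.1), the `reward` function and the `"both"` marker for conflicting answers are assumptions for illustration only; the description distinguishes just a reward, a negative reward and a low negative reward.

```python
import random

# Winning probability of the first symbol in each pair (from the description):
# A beats B 80% of the time, C beats D 70%, E beats F 60%.
PAIRS = {("A", "B"): 0.8, ("C", "D"): 0.7, ("E", "F"): 0.6}

# Hypothetical reward magnitudes; the description only says "reward",
# "negative reward" and "low negative reward".
REWARD_WIN = 1.0
REWARD_LOSS = -1.0
REWARD_INVALID = -0.1  # no answer, conflicting answers, or answer before the AW signal


def draw_winner(pair, rng):
    """Sample which symbol of the pair wins on this trial."""
    return pair[0] if rng.random() < PAIRS[pair] else pair[1]


def reward(pair, answer, answered_after_signal, rng):
    """Return the reward for one trial.

    answer is one symbol of the pair, None (no answer) or "both"
    (double conflicting answers).
    """
    if answer is None or answer == "both" or not answered_after_signal:
        return REWARD_INVALID
    if answer not in pair:
        raise ValueError(f"{answer!r} is not in pair {pair}")
    return REWARD_WIN if answer == draw_winner(pair, rng) else REWARD_LOSS
```

Under these assumed magnitudes, always choosing A should average about 0.8 · 1 + 0.2 · (-1) = 0.6 per trial, which a few thousand simulated trials with `reward(("A", "B"), "A", True, rng)` will approach.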
The goal is to develop a strategy that maximizes reward starting from almost no prior knowledge.
The prior knowledge schematizes the introductory instructions given to a subject of an fMRI/behavioural test.
In this case:
You will be presented with a pair of symbols on each trial; only one is winning, and you have to choose it, answering after a response signal.
The strategy can be complicated.
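One possible strategy for maximizing reward from almost no prior knowledge is to keep a running estimate of the reward earned by each symbol and to choose the better-valued symbol of each pair, with occasional random exploration (epsilon-greedy). This is a generic reinforcement-learning technique swapped in for illustration, not the strategy used by the original simulator; the class name, the +1/-1 rewards and the `epsilon` value are assumptions.

```python
import random

# Probability that the first symbol of each pair wins (from the task description).
PAIRS = {("A", "B"): 0.8, ("C", "D"): 0.7, ("E", "F"): 0.6}


class SampleAverageLearner:
    """Hypothetical learner: one running-mean reward estimate per symbol,
    with epsilon-greedy choice. Starts with no estimates at all."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon      # exploration rate (illustrative value)
        self.rng = random.Random(seed)
        self.value = {}             # symbol -> mean reward observed so far
        self.count = {}             # symbol -> number of times chosen

    def choose(self, pair):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(pair)  # explore: pick a random symbol
        # Exploit: higher estimate wins; unseen symbols count as 0.0,
        # and max() returns the first symbol on ties.
        return max(pair, key=lambda s: self.value.get(s, 0.0))

    def update(self, symbol, reward):
        n = self.count.get(symbol, 0) + 1
        self.count[symbol] = n
        old = self.value.get(symbol, 0.0)
        # Incremental mean: a delta rule with step size 1/n.
        self.value[symbol] = old + (reward - old) / n


def run(n_trials=6000, seed=0):
    """Simulate n_trials trials with assumed +1 (win) / -1 (loss) rewards."""
    env_rng = random.Random(seed + 1)
    agent = SampleAverageLearner(seed=seed)
    pairs = list(PAIRS)
    for _ in range(n_trials):
        pair = env_rng.choice(pairs)
        choice = agent.choose(pair)
        winner = pair[0] if env_rng.random() < PAIRS[pair] else pair[1]
        agent.update(choice, 1.0 if choice == winner else -1.0)
    return agent
```

After `agent = run()`, `agent.value` should rank A above B and C above D; the E/F pair, whose probabilities are closest, typically needs the most trials to separate reliably.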
The Simulator:
Pairs of symbols are represented as logical conditions on different input lines of the system.
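A minimal sketch of such an encoding, assuming one boolean input line per symbol (the line layout and the function names are hypothetical, not taken from the simulator): presenting a pair raises the two corresponding lines, and each pair is recognised by a logical AND over its two lines.

```python
SYMBOLS = "ABCDEF"  # one input line per symbol (hypothetical layout)
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def encode(pair):
    """Return the input-line vector for a presented pair:
    True on the lines of the two symbols shown, False elsewhere."""
    for s in pair:
        if s not in SYMBOLS:
            raise ValueError(f"unknown symbol {s!r}")
    return tuple(s in pair for s in SYMBOLS)


def presented_pair(lines):
    """Evaluate one logical condition per pair (both of its lines on);
    return the pair whose condition holds, or None if no pair is present."""
    active = dict(zip(SYMBOLS, lines))
    for a, b in PAIRS:
        if active[a] and active[b]:
            return (a, b)
    return None
```

For example, `encode(("C", "D"))` yields `(False, False, True, True, False, False)`, and `presented_pair` maps that vector back to `("C", "D")`.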
The different phases of the trial are tracked by a finite-state automaton, which tries to synchronise with the trial. The automaton has 6 states:
-at the beginning
0) pre-trial state
-after presentation of the symbols
1) elaboration state:
the system tries to make a choice; any answer is discouraged
-after presentation of the answer signal
2) the system gives an answer (first symbol, second symbol, or no answer)
3) waiting for reward
-after presentation of the reward
4) elaboration of the choice/reward dependencies and update of the internal state
5) if an error occurs (anticipated answer, unexpected stimulation, etc.), the automaton tries to re-synchronise with the trial.
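The six states above can be sketched as a small transition function. The event names (`"symbols"`, `"answer_signal"`, `"reward"`) and the re-synchronisation rule (stay in the error state until a new pair of symbols appears) are assumptions made for illustration; the description does not specify how the original automaton re-synchronises.

```python
from enum import IntEnum


class State(IntEnum):
    PRE_TRIAL = 0
    ELABORATION = 1
    ANSWER = 2
    WAIT_REWARD = 3
    UPDATE = 4
    ERROR = 5


# External event expected in each waiting state, and the state it leads to
# (hypothetical event names).
EXPECTED = {
    State.PRE_TRIAL: ("symbols", State.ELABORATION),
    State.ELABORATION: ("answer_signal", State.ANSWER),
    State.WAIT_REWARD: ("reward", State.UPDATE),
}


def step(state, event):
    """Advance the automaton on one external event."""
    if state == State.ERROR:
        # Assumed re-synchronisation: ignore events until new symbols appear.
        return State.ELABORATION if event == "symbols" else State.ERROR
    if state in (State.ANSWER, State.UPDATE):
        # These states complete on their own (answer given / internal state
        # updated); any external event arriving here is unexpected.
        return State.ERROR
    expected, nxt = EXPECTED[state]
    # Anything other than the expected event (e.g. an anticipated answer or
    # an unexpected stimulation) sends the automaton to the error state.
    return nxt if event == expected else State.ERROR


def finish(state):
    """Internal transitions: after answering, wait for the reward;
    after updating, go back to the pre-trial state."""
    return {State.ANSWER: State.WAIT_REWARD, State.UPDATE: State.PRE_TRIAL}.get(state, state)
```

A regular trial then runs PRE_TRIAL → ELABORATION → ANSWER → WAIT_REWARD → UPDATE → PRE_TRIAL, while, for instance, a reward arriving during elaboration drops the automaton into ERROR until the next pair of symbols is shown.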
error-negativity response: |