% Decision Making Models Template
% Topics: Expected utility, prospect theory, drift-diffusion, Bayesian decisions, reinforcement learning
% Style: Computational neuroscience report with behavioral modeling

\documentclass[a4paper, 11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath, amssymb}
\usepackage{graphicx}
\usepackage{siunitx}
\usepackage{booktabs}
\usepackage{subcaption}
\usepackage[makestderr]{pythontex}

% Theorem environments
\newtheorem{definition}{Definition}[section]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{example}{Example}[section]
\newtheorem{remark}{Remark}[section]

\title{Computational Models of Human Decision Making:\\
From Rational Choice to Bounded Rationality}
\author{Cognitive Neuroscience Laboratory}
\date{\today}

\begin{document}
\maketitle

\begin{abstract}
This report presents a comprehensive computational analysis of human decision-making processes,
spanning classical normative theories to modern descriptive models. We examine expected utility
theory as the rational benchmark, prospect theory's account of systematic deviations from
rationality, drift-diffusion models for response time distributions, Bayesian frameworks for
decision making under uncertainty, and reinforcement learning models of value-based choice.
Computational simulations reveal the parametric signatures that distinguish these frameworks
and their ability to capture key phenomena including loss aversion, probability weighting,
speed-accuracy tradeoffs, and learning dynamics.
\end{abstract}

\section{Introduction}

Decision making is a fundamental cognitive process that has been studied across economics,
psychology, and neuroscience. Classical economic theory posits that humans are rational agents
who maximize expected utility, but decades of empirical research have revealed systematic
deviations from this normative ideal. Modern computational approaches seek to characterize
both the mechanisms underlying choice behavior and the real-time dynamics of the decision process.

\begin{definition}[Decision Problem]
A decision problem consists of a set of alternatives $\mathcal{A}$, a set of possible outcomes
$\mathcal{O}$, and a probability distribution $p(o|a)$ mapping actions to outcomes. The decision
maker selects an action $a^* \in \mathcal{A}$ to optimize some objective function.
\end{definition}

\section{Expected Utility Theory}

\subsection{Theoretical Foundation}

Expected utility theory (EUT), formalized by von Neumann and Morgenstern, assumes that decision
makers assign utilities $u(o)$ to outcomes and choose actions that maximize expected utility.

\begin{definition}[Expected Utility]
For a risky prospect that yields outcome $o_i$ with probability $p_i$, the expected utility is:
\begin{equation}
EU = \sum_{i=1}^{n} p_i \cdot u(o_i)
\end{equation}
where $u: \mathcal{O} \rightarrow \mathbb{R}$ is a utility function.
\end{definition}

\begin{theorem}[Risk Aversion]
A decision maker is risk-averse if $u$ is concave ($u'' < 0$), risk-neutral if $u$ is linear,
and risk-seeking if $u$ is convex ($u'' > 0$).
\end{theorem}

\subsection{Computational Analysis}

We now examine the implications of different utility functions for choice behavior under risk.
The curvature of the utility function determines risk preferences: concave functions exhibit
diminishing marginal utility, leading to risk aversion. We simulate decisions over monetary
gambles with varying utility function parameters to demonstrate these effects.

\begin{pycode}
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fsolve
from scipy.stats import norm

np.random.seed(42)

# Expected Utility Theory
def power_utility(x, alpha=0.5):
    """Power utility function with risk aversion parameter alpha"""
    if alpha == 1.0:
        return x
    return np.sign(x) * np.abs(x)**alpha

def expected_utility(outcomes, probabilities, alpha=0.5):
    """Calculate expected utility for a prospect"""
    utilities = power_utility(outcomes, alpha)
    return np.sum(probabilities * utilities)

# Simulate choice behavior under different risk attitudes
wealth = 100.0
stake = 50.0
win_prob = 0.5

# Range of risk aversion parameters
alphas = np.linspace(0.3, 1.5, 100)
certainty_equivalents = []

for alpha in alphas:
    # Gamble: win or lose $50 with equal probability
    outcomes = np.array([wealth + stake, wealth - stake])
    probs = np.array([win_prob, 1 - win_prob])
    eu_gamble = expected_utility(outcomes, probs, alpha)

    # Find certainty equivalent: CE such that u(wealth + CE) = EU(gamble)
    def objective(ce):
        return power_utility(wealth + ce, alpha) - eu_gamble

    ce = fsolve(objective, 0)[0]
    certainty_equivalents.append(ce)

certainty_equivalents = np.array(certainty_equivalents)

# Risk premium: expected value - certainty equivalent
expected_value = stake * win_prob - stake * (1 - win_prob)  # = 0 for a fair gamble
risk_premium = expected_value - certainty_equivalents

# Create first figure
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Utility functions with different risk attitudes
x_wealth = np.linspace(0, 200, 500)
ax1.plot(x_wealth, power_utility(x_wealth, 0.5), 'b-', linewidth=2.5, label=r'$\alpha=0.5$ (Risk-averse)')
ax1.plot(x_wealth, power_utility(x_wealth, 1.0), 'g--', linewidth=2.5, label=r'$\alpha=1.0$ (Risk-neutral)')
ax1.plot(x_wealth, power_utility(x_wealth, 1.5), 'r-.', linewidth=2.5, label=r'$\alpha=1.5$ (Risk-seeking)')
ax1.axvline(wealth, color='gray', linestyle=':', alpha=0.5)
ax1.axvline(wealth + stake, color='orange', linestyle=':', alpha=0.5)
ax1.axvline(wealth - stake, color='purple', linestyle=':', alpha=0.5)
ax1.set_xlabel('Wealth (\$)', fontsize=11)
ax1.set_ylabel('Utility $u(x)$', fontsize=11)
ax1.set_title('Expected Utility Theory: Risk Attitudes', fontsize=12, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# Plot 2: Risk premium as function of risk aversion
ax2.plot(alphas, risk_premium, 'b-', linewidth=2.5)
ax2.axhline(0, color='black', linestyle='--', alpha=0.5)
ax2.axvline(1.0, color='gray', linestyle=':', alpha=0.5, label='Risk-neutral')
ax2.fill_between(alphas, risk_premium, 0, where=(alphas < 1.0), alpha=0.3, color='red', label='Risk premium (averse)')
ax2.fill_between(alphas, risk_premium, 0, where=(alphas > 1.0), alpha=0.3, color='green', label='Risk premium (seeking)')
ax2.set_xlabel(r'Risk Aversion Parameter $\alpha$', fontsize=11)
ax2.set_ylabel('Risk Premium (\$)', fontsize=11)
ax2.set_title('Risk Premium vs. Risk Attitude', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_eut.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics for reporting
alpha_moderate = 0.5
eu_gamble_moderate = expected_utility(
    np.array([wealth + stake, wealth - stake]),
    np.array([win_prob, 1 - win_prob]),
    alpha_moderate
)
ce_moderate = fsolve(lambda ce: power_utility(wealth + ce, alpha_moderate) - eu_gamble_moderate, 0)[0]
rp_moderate = expected_value - ce_moderate
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_eut.pdf}
\caption{Expected utility theory predictions for risky choice. Left panel shows utility functions
with varying risk attitudes: concave (risk-averse, $\alpha=0.5$), linear (risk-neutral, $\alpha=1.0$),
and convex (risk-seeking, $\alpha=1.5$). Vertical lines indicate current wealth (\$100) and potential
outcomes (\$150, \$50) from a 50-50 gamble. Right panel displays the risk premium as a function of
the risk aversion parameter $\alpha$, demonstrating that risk-averse agents ($\alpha < 1$) demand
compensation to accept fair gambles, while risk-seeking agents ($\alpha > 1$) accept actuarially
unfavorable gambles. For $\alpha=0.5$, the risk premium is approximately \$\py{f"{abs(rp_moderate):.2f}"}:
the agent values the fair gamble at that much less than its expected value and would pay up to this
amount to avoid it.}
\label{fig:eut}
\end{figure}

The analysis demonstrates that a moderately risk-averse agent ($\alpha = \py{f"{alpha_moderate}"}$)
assigns the gamble a certainty equivalent of \$\py{f"{wealth + ce_moderate:.2f}"} in final wealth
(\$\py{f"{abs(ce_moderate):.2f}"} below current wealth), yielding a risk premium of
\$\py{f"{abs(rp_moderate):.2f}"}. This quantifies the intuition that risk-averse individuals
prefer certain outcomes over risky prospects with the same expected value.
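
As a quick cross-check on the numerical root finding, the $\alpha = 0.5$ case admits a closed form
(square-root utility), so the risk premium can be computed directly. The short sketch below reuses
\texttt{wealth}, \texttt{stake}, and \texttt{win\_prob} from the block above; the value it produces
should agree with the fsolve-based estimate.

\begin{pycode}
# Closed-form cross-check for alpha = 0.5 (square-root utility), reusing the
# wealth, stake, and win_prob values defined in the previous block.
eu_closed = win_prob * np.sqrt(wealth + stake) + (1 - win_prob) * np.sqrt(wealth - stake)
ce_closed = eu_closed**2 - wealth   # invert u(x) = sqrt(x); CE expressed relative to current wealth
rp_closed = 0.0 - ce_closed         # the fair gamble has an expected value of 0
# rp_closed is approximately 6.70 and should match rp_moderate from the block above.
\end{pycode}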

\section{Prospect Theory}

\subsection{Value Function and Loss Aversion}

Kahneman and Tversky's prospect theory revolutionized the study of decision making by documenting
systematic violations of expected utility theory. The theory proposes that people evaluate outcomes
relative to a reference point and exhibit loss aversion: losses loom larger than gains.

\begin{definition}[Prospect Theory Value Function]
The value function is defined as:
\begin{equation}
v(x) = \begin{cases}
x^\alpha & \text{if } x \geq 0 \\
-\lambda |x|^\beta & \text{if } x < 0
\end{cases}
\end{equation}
where $\alpha, \beta \in (0,1)$ capture diminishing sensitivity, and $\lambda > 1$ represents
loss aversion (typical value: $\lambda \approx 2.25$).
\end{definition}

\subsection{Probability Weighting}

Prospect theory also incorporates nonlinear probability weighting, whereby small probabilities
are overweighted and large probabilities are underweighted.

\begin{definition}[Probability Weighting Function]
The one-parameter Prelec weighting function is:
\begin{equation}
w(p) = \exp\{-(-\ln p)^\gamma\}
\end{equation}
where $\gamma < 1$ produces the characteristic inverse-S shape.
\end{definition}

\subsection{Computational Simulation}

We now implement the full prospect theory model to analyze how loss aversion and probability
weighting jointly affect decision making. The model predicts specific phenomena, such as the
fourfold pattern of risk attitudes and preference reversals, that cannot be explained by
expected utility theory.

\begin{pycode}
# Prospect Theory Implementation
def prospect_value_function(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function with loss aversion"""
    if isinstance(x, np.ndarray):
        v = np.zeros_like(x, dtype=float)  # float output even for integer-valued inputs
        v[x >= 0] = x[x >= 0]**alpha
        v[x < 0] = -lam * np.abs(x[x < 0])**beta
        return v
    else:
        if x >= 0:
            return x**alpha
        else:
            return -lam * np.abs(x)**beta

def prelec_weighting(p, gamma=0.61):
    """Prelec probability weighting function"""
    return np.exp(-(-np.log(p))**gamma)

def prospect_theory_value(outcomes, probabilities, alpha=0.88, beta=0.88, lam=2.25, gamma=0.61):
    """Calculate prospect theory value for a gamble"""
    values = prospect_value_function(outcomes, alpha, beta, lam)
    weights = prelec_weighting(probabilities, gamma)
    return np.sum(weights * values)

# Parameters from Tversky & Kahneman (1992)
alpha_tk = 0.88
beta_tk = 0.88
lambda_tk = 2.25
gamma_tk = 0.61

# Create value function plot
x_range = np.linspace(-100, 100, 500)
v_values = prospect_value_function(x_range, alpha_tk, beta_tk, lambda_tk)

fig2, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 11))

# Plot 1: Value function
ax1.plot(x_range, v_values, 'b-', linewidth=2.5)
ax1.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax1.axvline(0, color='black', linestyle='-', linewidth=0.5)
# Add reference lines showing loss aversion
x_ref = 50
v_gain = prospect_value_function(x_ref, alpha_tk, beta_tk, lambda_tk)
v_loss = prospect_value_function(-x_ref, alpha_tk, beta_tk, lambda_tk)
ax1.plot([x_ref, x_ref], [0, v_gain], 'g--', alpha=0.5)
ax1.plot([-x_ref, -x_ref], [v_loss, 0], 'r--', alpha=0.5)
ax1.plot([0, x_ref], [v_gain, v_gain], 'g--', alpha=0.5)
ax1.plot([0, -x_ref], [v_loss, v_loss], 'r--', alpha=0.5)
ax1.text(x_ref+5, v_gain/2, f'Gain: {v_gain:.1f}', fontsize=9, color='green')
ax1.text(-x_ref-25, v_loss/2, f'Loss: {v_loss:.1f}', fontsize=9, color='red')
ax1.set_xlabel('Outcome (Relative to Reference Point)', fontsize=11)
ax1.set_ylabel('Subjective Value $v(x)$', fontsize=11)
ax1.set_title(f'Prospect Theory Value Function ($\\lambda={lambda_tk}$)', fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Plot 2: Probability weighting
p_range = np.linspace(0.01, 0.99, 100)
w_values = prelec_weighting(p_range, gamma_tk)
ax2.plot(p_range, w_values, 'b-', linewidth=2.5, label=f'$\\gamma={gamma_tk}$')
ax2.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Identity (no weighting)')
ax2.fill_between(p_range, p_range, w_values, where=(w_values > p_range),
                 alpha=0.3, color='red', label='Overweighting')
ax2.fill_between(p_range, p_range, w_values, where=(w_values < p_range),
                 alpha=0.3, color='blue', label='Underweighting')
ax2.set_xlabel('Objective Probability $p$', fontsize=11)
ax2.set_ylabel('Decision Weight $w(p)$', fontsize=11)
ax2.set_title('Prelec Probability Weighting Function', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Plot 3: Fourfold pattern of risk attitudes
# Gains domain
gain_amounts = np.linspace(10, 100, 20)
p_low = 0.01
p_high = 0.90

ce_gain_low = []
ce_gain_high = []

for gain in gain_amounts:
    # Low probability gain
    pt_val_low = prospect_theory_value(np.array([gain, 0]), np.array([p_low, 1-p_low]),
                                       alpha_tk, beta_tk, lambda_tk, gamma_tk)
    # Find CE
    ce_low = fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val_low, gain*p_low)[0]
    ce_gain_low.append(ce_low)

    # High probability gain
    pt_val_high = prospect_theory_value(np.array([gain, 0]), np.array([p_high, 1-p_high]),
                                        alpha_tk, beta_tk, lambda_tk, gamma_tk)
    ce_high = fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val_high, gain*p_high)[0]
    ce_gain_high.append(ce_high)

expected_gain_low = gain_amounts * p_low
expected_gain_high = gain_amounts * p_high

ax3.plot(expected_gain_low, ce_gain_low, 'b-', linewidth=2.5, label=f'Low prob ({p_low})')
ax3.plot(expected_gain_high, ce_gain_high, 'g-', linewidth=2.5, label=f'High prob ({p_high})')
ax3.plot([0, 100], [0, 100], 'k--', linewidth=1, label='Risk-neutral')
ax3.set_xlabel('Expected Value', fontsize=11)
ax3.set_ylabel('Certainty Equivalent', fontsize=11)
ax3.set_title('Fourfold Pattern: Gains Domain', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)

# Plot 4: Loss aversion magnitude
lambda_values = np.linspace(1.0, 3.5, 50)
mixed_gamble_acceptance = []

for lam in lambda_values:
    # Mixed gamble: 50% chance to win $100, 50% chance to lose $100
    outcomes = np.array([100.0, -100.0])
    probs = np.array([0.5, 0.5])
    pt_val = prospect_theory_value(outcomes, probs, alpha_tk, beta_tk, lam, gamma_tk)
    mixed_gamble_acceptance.append(pt_val)

mixed_gamble_acceptance = np.array(mixed_gamble_acceptance)

ax4.plot(lambda_values, mixed_gamble_acceptance, 'b-', linewidth=2.5)
ax4.axhline(0, color='red', linestyle='--', linewidth=2, label='Indifference threshold')
ax4.axvline(lambda_tk, color='gray', linestyle=':', alpha=0.7, label=f'Typical $\\lambda={lambda_tk}$')
ax4.fill_between(lambda_values, mixed_gamble_acceptance, 0,
                 where=(mixed_gamble_acceptance > 0), alpha=0.3, color='green', label='Accept')
ax4.fill_between(lambda_values, mixed_gamble_acceptance, 0,
                 where=(mixed_gamble_acceptance < 0), alpha=0.3, color='red', label='Reject')
ax4.set_xlabel('Loss Aversion Parameter $\\lambda$', fontsize=11)
ax4.set_ylabel('Prospect Value', fontsize=11)
ax4.set_title('Mixed Gamble Acceptance vs. Loss Aversion', fontsize=12, fontweight='bold')
ax4.legend(fontsize=10)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_prospect.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics
loss_aversion_ratio = abs(v_loss / v_gain)
overweighting_01 = prelec_weighting(0.01, gamma_tk) / 0.01
underweighting_90 = prelec_weighting(0.90, gamma_tk) / 0.90
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_prospect.pdf}
\caption{Prospect theory components and predictions. (a) Value function showing loss aversion:
losses loom larger than equivalent gains by a factor of $\lambda=\py{f"{lambda_tk}"}$, resulting
in a loss aversion ratio of \py{f"{loss_aversion_ratio:.2f}"}. The function is steeper for losses
than for gains (loss aversion) and is concave for gains but convex for losses, reflecting
diminishing sensitivity in both domains. (b) Prelec probability weighting
function ($\gamma=\py{f"{gamma_tk}"}$) exhibits an inverse-S shape: small probabilities (e.g., 0.01)
are overweighted by a factor of \py{f"{overweighting_01:.2f}"}, while large probabilities (e.g., 0.90)
are underweighted to \py{f"{underweighting_90:.3f}"} of their objective value. (c) Fourfold pattern
in the gains domain: risk-seeking for low-probability gains (lottery tickets) and risk-averse for
high-probability gains; the mirror-image pattern holds for losses, where risk aversion for
low-probability losses corresponds to insurance purchasing. (d) Mixed gamble acceptance threshold:
agents with typical loss aversion ($\lambda \approx \py{f"{lambda_tk}"}$) reject 50-50 mixed gambles
even when the expected value is zero, requiring gains to be roughly twice as large as losses to accept.}
\label{fig:prospect}
\end{figure}

The prospect theory analysis reveals that with parameters $\alpha = \py{f"{alpha_tk}"}$,
$\beta = \py{f"{beta_tk}"}$, $\lambda = \py{f"{lambda_tk}"}$, and $\gamma = \py{f"{gamma_tk}"}$,
losses are experienced as \py{f"{loss_aversion_ratio:.2f}"} times more impactful than equivalent
gains. This loss aversion explains the rejection of symmetric mixed gambles and the endowment effect.
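
The gains-domain panel above covers only half of the fourfold pattern. As a complementary sketch,
the block below reuses \texttt{prospect\_theory\_value} and \texttt{prospect\_value\_function} from
the code above to compute certainty equivalents for all four probability-by-domain cells; the helper
\texttt{ce\_for} is introduced here purely for illustration.

\begin{pycode}
# Fourfold-pattern sketch: certainty equivalents (CE) for a +/- $100 outcome at
# low and high probability, reusing the prospect theory functions defined above.
def ce_for(outcome, p):
    """CE of a prospect paying `outcome` with probability p (else 0) under prospect theory."""
    pt_val = prospect_theory_value(np.array([outcome, 0.0]), np.array([p, 1 - p]),
                                   alpha_tk, beta_tk, lambda_tk, gamma_tk)
    return fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val,
                  outcome * p)[0]

fourfold_ce = {(domain, p): ce_for(outcome, p)
               for domain, outcome in [('gain', 100.0), ('loss', -100.0)]
               for p in (0.05, 0.95)}
# Expected signature: CE above the expected value for low-probability gains (risk seeking),
# below it for high-probability gains (risk aversion), and the mirror image in the loss domain,
# where risk aversion for low-probability losses corresponds to insurance purchasing.
\end{pycode}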

\section{Drift-Diffusion Model}

\subsection{Sequential Sampling Framework}

The drift-diffusion model (DDM) provides a mechanistic account of two-alternative forced choice
decisions, including both choice and response time distributions. Evidence accumulates over time
as a noisy process until reaching a decision boundary.

\begin{definition}[Drift-Diffusion Process]
The accumulated evidence $x(t)$ evolves according to:
\begin{equation}
dx = \mu \, dt + \sigma \, dW
\end{equation}
where $\mu$ is the drift rate (evidence quality), $\sigma$ is diffusion noise, and $dW$ is a
Wiener increment. A decision occurs when $x(t)$ reaches either boundary $\pm a$.
\end{definition}

\begin{theorem}[Mean Response Time]
For symmetric boundaries at $\pm a$ with starting point $z=0$, the mean response time is:
\begin{equation}
\mathbb{E}[T] = \frac{a}{\mu} \tanh\left(\frac{a\mu}{\sigma^2}\right) + T_{er}
\end{equation}
where $T_{er}$ is non-decision time (encoding and response execution).
\end{theorem}

\subsection{Simulation of Decision Dynamics}

We simulate the drift-diffusion process using Euler-Maruyama discretization to generate response
time distributions and examine the speed-accuracy tradeoff. The model predicts that increasing
boundary separation increases accuracy but slows responses, while drift rate affects both.

\begin{pycode}
# Drift-Diffusion Model Simulation
def simulate_ddm_trial(drift_rate, boundary, noise=1.0, dt=0.001, non_decision_time=0.3):
    """Simulate a single DDM trial via Euler-Maruyama discretization"""
    evidence = 0.0
    t = 0.0
    max_time = 10.0  # Prevent infinite loops

    while abs(evidence) < boundary and t < max_time:
        evidence += drift_rate * dt + noise * np.sqrt(dt) * np.random.randn()
        t += dt

    choice = 1 if evidence >= boundary else -1
    reaction_time = t + non_decision_time
    return choice, reaction_time

def simulate_ddm_experiment(drift_rate, boundary, n_trials=1000, noise=1.0, non_decision_time=0.3):
    """Simulate multiple DDM trials"""
    choices = []
    rts = []

    for _ in range(n_trials):
        choice, rt = simulate_ddm_trial(drift_rate, boundary, noise, non_decision_time=non_decision_time)
        choices.append(choice)
        rts.append(rt)

    return np.array(choices), np.array(rts)

# Simulate DDM with different parameters
drift_rates = [0.1, 0.3, 0.5]
boundaries = [0.8, 1.2, 1.6]
n_trials = 2000

fig3 = plt.figure(figsize=(14, 10))

# Drift rate manipulation
ax1 = plt.subplot(2, 3, 1)
for drift in drift_rates:
    choices, rts = simulate_ddm_experiment(drift, boundary=1.2, n_trials=n_trials)
    correct_rts = rts[choices == 1]
    ax1.hist(correct_rts, bins=50, alpha=0.6, density=True, label=f'$\\mu={drift}$')

ax1.set_xlabel('Response Time (s)', fontsize=11)
ax1.set_ylabel('Density', fontsize=11)
ax1.set_title('RT Distributions: Drift Rate Effect', fontsize=12, fontweight='bold')
ax1.legend(fontsize=10)
ax1.set_xlim(0, 5)

# Boundary manipulation
ax2 = plt.subplot(2, 3, 2)
for bound in boundaries:
    choices, rts = simulate_ddm_experiment(drift_rate=0.3, boundary=bound, n_trials=n_trials)
    correct_rts = rts[choices == 1]
    ax2.hist(correct_rts, bins=50, alpha=0.6, density=True, label=f'$a={bound}$')

ax2.set_xlabel('Response Time (s)', fontsize=11)
ax2.set_ylabel('Density', fontsize=11)
ax2.set_title('RT Distributions: Boundary Effect', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.set_xlim(0, 5)

# Speed-accuracy tradeoff
ax3 = plt.subplot(2, 3, 3)
boundaries_sac = np.linspace(0.5, 2.0, 10)
accuracy_sac = []
mean_rt_sac = []

for bound in boundaries_sac:
    choices, rts = simulate_ddm_experiment(drift_rate=0.3, boundary=bound, n_trials=1000)
    accuracy_sac.append(np.mean(choices == 1))
    mean_rt_sac.append(np.mean(rts))

ax3.plot(mean_rt_sac, accuracy_sac, 'bo-', linewidth=2, markersize=6)
ax3.set_xlabel('Mean Response Time (s)', fontsize=11)
ax3.set_ylabel('Accuracy', fontsize=11)
ax3.set_title('Speed-Accuracy Tradeoff', fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3)

# Sample trajectories
ax4 = plt.subplot(2, 3, 4)
drift_demo = 0.4
bound_demo = 1.5
n_trajectories = 20

for _ in range(n_trajectories):
    evidence_path = [0]
    t_path = [0]
    evidence = 0.0
    t = 0.0
    dt = 0.01

    while abs(evidence) < bound_demo and t < 5:
        evidence += drift_demo * dt + 1.0 * np.sqrt(dt) * np.random.randn()
        t += dt
        evidence_path.append(evidence)
        t_path.append(t)

    color = 'green' if evidence >= bound_demo else 'red'
    ax4.plot(t_path, evidence_path, color=color, alpha=0.3, linewidth=1)

ax4.axhline(bound_demo, color='blue', linestyle='--', linewidth=2, label='Upper boundary')
ax4.axhline(-bound_demo, color='orange', linestyle='--', linewidth=2, label='Lower boundary')
ax4.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax4.set_xlabel('Time (s)', fontsize=11)
ax4.set_ylabel('Evidence $x(t)$', fontsize=11)
ax4.set_title(f'Sample Trajectories ($\\mu={drift_demo}$, $a={bound_demo}$)', fontsize=12, fontweight='bold')
ax4.legend(fontsize=10)
ax4.set_xlim(0, 5)
ax4.grid(True, alpha=0.3)

# RT quantiles for correct and error responses
ax5 = plt.subplot(2, 3, 5)
choices_qp, rts_qp = simulate_ddm_experiment(drift_rate=0.4, boundary=1.2, n_trials=5000)
correct_rts = rts_qp[choices_qp == 1]
error_rts = rts_qp[choices_qp == -1]

quantiles = [0.1, 0.3, 0.5, 0.7, 0.9]
correct_quantiles = np.quantile(correct_rts, quantiles)
error_quantiles = np.quantile(error_rts, quantiles) if len(error_rts) > 10 else np.zeros(len(quantiles))

ax5.plot(quantiles, correct_quantiles, 'go-', linewidth=2, markersize=8, label='Correct')
if len(error_rts) > 10:
    ax5.plot(quantiles, error_quantiles, 'ro-', linewidth=2, markersize=8, label='Error')
ax5.set_xlabel('Quantile', fontsize=11)
ax5.set_ylabel('Response Time (s)', fontsize=11)
ax5.set_title('RT Quantiles: Correct vs. Error', fontsize=12, fontweight='bold')
ax5.legend(fontsize=10)
ax5.grid(True, alpha=0.3)

# Drift rate vs accuracy/RT
ax6 = plt.subplot(2, 3, 6)
drift_range = np.linspace(0.05, 0.8, 15)
acc_by_drift = []
rt_by_drift = []

for drift in drift_range:
    choices_d, rts_d = simulate_ddm_experiment(drift, boundary=1.2, n_trials=500)
    acc_by_drift.append(np.mean(choices_d == 1))
    rt_by_drift.append(np.mean(rts_d))

ax6_twin = ax6.twinx()
line1 = ax6.plot(drift_range, acc_by_drift, 'b-o', linewidth=2, markersize=5, label='Accuracy')
line2 = ax6_twin.plot(drift_range, rt_by_drift, 'r-s', linewidth=2, markersize=5, label='Mean RT')

ax6.set_xlabel('Drift Rate $\\mu$', fontsize=11)
ax6.set_ylabel('Accuracy', fontsize=11, color='blue')
ax6_twin.set_ylabel('Mean RT (s)', fontsize=11, color='red')
ax6.set_title('Drift Rate Effects', fontsize=12, fontweight='bold')
ax6.tick_params(axis='y', labelcolor='blue')
ax6_twin.tick_params(axis='y', labelcolor='red')
ax6.grid(True, alpha=0.3)

lines = line1 + line2
labels = [l.get_label() for l in lines]
ax6.legend(lines, labels, fontsize=10, loc='center right')

plt.tight_layout()
plt.savefig('decision_making_ddm.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key DDM statistics
drift_key = 0.4
boundary_key = 1.2
choices_key, rts_key = simulate_ddm_experiment(drift_key, boundary_key, n_trials=5000)
accuracy_key = np.mean(choices_key == 1)
mean_rt_key = np.mean(rts_key)
std_rt_key = np.std(rts_key)
median_rt_key = np.median(rts_key)
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_ddm.pdf}
\caption{Drift-diffusion model simulations of two-alternative forced choice. (a) Response time
distributions for different drift rates ($\mu$): higher evidence quality produces faster, more
accurate decisions with reduced RT variability. (b) Boundary separation effects: increasing the
decision threshold ($a$) from 0.8 to 1.6 produces slower but more conservative responses with
broader RT distributions. (c) Speed-accuracy tradeoff: varying boundary separation traces out the
characteristic monotonic tradeoff in which longer mean RTs accompany higher accuracy. (d) Twenty
sample evidence accumulation trajectories (green = upper boundary, red = lower boundary or timeout)
showing the stochastic nature of the decision process. (e) RT quantiles for correct and error
responses; in this basic model, without inter-trial variability in drift or starting point, the
two distributions are closely matched. (f) Drift rate effects on both accuracy
and mean RT: with $a=\py{f"{boundary_key}"}$, a drift rate of $\mu=\py{f"{drift_key}"}$ yields
accuracy = \py{f"{accuracy_key:.3f}"} and mean RT = \py{f"{mean_rt_key:.3f}"} s (SD = \py{f"{std_rt_key:.3f}"} s).}
\label{fig:ddm}
\end{figure}

The drift-diffusion analysis with $\mu = \py{f"{drift_key}"}$ and $a = \py{f"{boundary_key}"}$
produces mean RT = \py{f"{mean_rt_key:.2f}"} s (median = \py{f"{median_rt_key:.2f}"} s) with
accuracy = \py{f"{100*accuracy_key:.1f}"}\%. The model captures the characteristic positive skew
of empirical RT distributions and the speed-accuracy tradeoff.
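
As a sanity check on the simulation, the closed-form results for the basic model with symmetric
boundaries and no inter-trial variability can be evaluated at the same parameters: accuracy is
$1/\left(1+e^{-2a\mu/\sigma^2}\right)$ and mean RT follows the theorem above (cf. Bogacz et al.,
2006). The sketch below reuses \texttt{drift\_key} and \texttt{boundary\_key}; the analytic values
should agree with the simulated \texttt{accuracy\_key} and \texttt{mean\_rt\_key} up to Monte Carlo
and discretization error.

\begin{pycode}
# Analytic predictions for the basic DDM (symmetric bounds at +/-a, unbiased start,
# drift mu, noise sigma), evaluated at the parameters simulated above.
sigma_ddm = 1.0   # diffusion noise used in the simulations
t_er = 0.3        # non-decision time used in the simulations
analytic_accuracy = 1.0 / (1.0 + np.exp(-2.0 * boundary_key * drift_key / sigma_ddm**2))
analytic_mean_rt = (boundary_key / drift_key) * np.tanh(boundary_key * drift_key / sigma_ddm**2) + t_er
# With mu = 0.4 and a = 1.2 these are roughly 0.72 and 1.64 s, which should match
# accuracy_key and mean_rt_key from the simulation.
\end{pycode}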

\section{Bayesian Decision Making}

\subsection{Decision Making Under Uncertainty}

Bayesian decision theory provides a normative framework for optimal decision making when beliefs
must be updated from noisy evidence. The decision maker maintains a posterior distribution over
states and chooses actions to minimize expected loss.

\begin{definition}[Bayes' Rule]
Given evidence $e$ and hypotheses $h_i$, the posterior probability is:
\begin{equation}
P(h_i | e) = \frac{P(e | h_i) P(h_i)}{\sum_j P(e | h_j) P(h_j)}
\end{equation}
\end{definition}

\begin{theorem}[Optimal Bayesian Decision]
The action $a^*$ that minimizes expected loss $L(a, h)$ is:
\begin{equation}
a^* = \arg\min_a \sum_h L(a, h) P(h | e)
\end{equation}
\end{theorem}

\subsection{Sequential Evidence Accumulation}

We simulate a medical diagnosis scenario where a physician must decide whether a patient has
a disease based on sequential test results. The Bayesian framework prescribes how beliefs should
be updated and when sufficient evidence has accumulated to make a decision.

\begin{pycode}
# Bayesian Decision Making
def bayesian_update(prior, likelihood):
    """Update a discrete belief distribution given the likelihood of the observed evidence"""
    posterior_unnorm = prior * likelihood
    return posterior_unnorm / np.sum(posterior_unnorm)

# Medical diagnosis scenario
n_tests = 20
sensitivity = 0.85   # P(test+ | disease)
specificity = 0.90   # P(test- | healthy)
prior_prob_disease = 0.10

# True state: patient has disease
true_state = 1

# Generate test results
np.random.seed(123)
test_results = []
for _ in range(n_tests):
    if true_state == 1:
        test_results.append(1 if np.random.rand() < sensitivity else 0)
    else:
        test_results.append(0 if np.random.rand() < specificity else 1)

# Sequential belief updating
beliefs = [prior_prob_disease]
for test in test_results:
    # Current belief as a distribution over [healthy, disease]
    prior = np.array([1 - beliefs[-1], beliefs[-1]])

    # Likelihood of the observed result under each state
    if test == 1:
        likelihood = np.array([1 - specificity, sensitivity])
    else:
        likelihood = np.array([specificity, 1 - sensitivity])

    # Bayesian update
    posterior = bayesian_update(prior, likelihood)
    beliefs.append(posterior[1])

beliefs = np.array(beliefs)

# Create Bayesian decision figure
fig4, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Sequential belief updating
ax1.plot(range(len(beliefs)), beliefs, 'b-o', linewidth=2, markersize=4)
ax1.axhline(0.5, color='red', linestyle='--', linewidth=2, label='Decision threshold')
ax1.axhline(prior_prob_disease, color='gray', linestyle=':', alpha=0.7, label='Prior')
ax1.fill_between(range(len(beliefs)), 0.5, beliefs, where=(beliefs > 0.5),
                 alpha=0.3, color='green', label='Diagnose disease')
ax1.fill_between(range(len(beliefs)), 0.5, beliefs, where=(beliefs <= 0.5),
                 alpha=0.3, color='blue', label='Diagnose healthy')
ax1.set_xlabel('Number of Tests', fontsize=11)
ax1.set_ylabel('P(Disease | Evidence)', fontsize=11)
ax1.set_title('Sequential Bayesian Belief Updating', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Plot 2: Likelihood ratio and evidence strength
log_likelihood_ratios = []
for test in test_results:
    if test == 1:
        lr = sensitivity / (1 - specificity)
    else:
        lr = (1 - sensitivity) / specificity
    log_likelihood_ratios.append(np.log(lr))

cumulative_log_lr = np.cumsum(log_likelihood_ratios)

ax2.plot(range(1, len(cumulative_log_lr)+1), cumulative_log_lr, 'b-o', linewidth=2, markersize=4)
ax2.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax2.axhline(np.log(3), color='green', linestyle='--', alpha=0.7, label='Moderate evidence (3:1)')
ax2.axhline(-np.log(3), color='red', linestyle='--', alpha=0.7, label='Moderate evidence (1:3)')
ax2.set_xlabel('Test Number', fontsize=11)
ax2.set_ylabel('Cumulative Log Likelihood Ratio', fontsize=11)
ax2.set_title('Evidence Accumulation (Log Odds)', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Plot 3: Optimal decision boundary with costs
costs_fn = 1.0  # Cost of a false negative (missed disease)
costs_fp_range = np.linspace(0.1, 2.0, 50)
optimal_thresholds = []

for cost_fp in costs_fp_range:
    # Optimal threshold: the P(disease) at which expected costs of the two actions are equal,
    # i.e., cost_fp * (1 - p) = cost_fn * p
    threshold = cost_fp / (cost_fp + costs_fn)
    optimal_thresholds.append(threshold)

ax3.plot(costs_fp_range, optimal_thresholds, 'b-', linewidth=2.5)
ax3.axvline(costs_fn, color='gray', linestyle=':', alpha=0.7, label='Equal costs')
ax3.axhline(0.5, color='gray', linestyle=':', alpha=0.7)
ax3.set_xlabel('Cost of False Positive', fontsize=11)
ax3.set_ylabel('Optimal Decision Threshold', fontsize=11)
ax3.set_title(f'Optimal Threshold vs. Cost Ratio (FN cost = {costs_fn})', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)

# Plot 4: ROC curve and decision boundaries
# ROC implied by an equal-variance Gaussian (binormal) model whose d-prime reproduces
# the test's sensitivity and specificity, so the operating point lies on the curve
d_prime = norm.ppf(sensitivity) + norm.ppf(specificity)
fpr_range = np.linspace(0.001, 0.999, 200)
tpr_range = norm.cdf(d_prime + norm.ppf(fpr_range))

ax4.plot(fpr_range, tpr_range, 'b-', linewidth=2.5, label='ROC curve (binormal)')
ax4.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Chance')
ax4.fill_between([0, 1], [0, 1], [1, 1], alpha=0.2, color='green', label='Better than chance')
ax4.scatter([1 - specificity], [sensitivity], s=200, c='red', marker='*',
            edgecolors='black', linewidths=2, label=f'Test performance ({sensitivity}, {specificity})', zorder=5)
ax4.set_xlabel('False Positive Rate (1 - Specificity)', fontsize=11)
ax4.set_ylabel('True Positive Rate (Sensitivity)', fontsize=11)
ax4.set_title('ROC Curve: Diagnostic Performance', fontsize=12, fontweight='bold')
ax4.legend(fontsize=9)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_bayesian.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics
final_belief = beliefs[-1]
n_positive_tests = np.sum(test_results)
final_log_odds = cumulative_log_lr[-1]
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_bayesian.pdf}
\caption{Bayesian decision making for medical diagnosis. (a) Sequential belief updating: starting
from a prior of P(disease) = \py{f"{prior_prob_disease}"}, the posterior probability evolves with
each test result (sensitivity = \py{f"{sensitivity}"}, specificity = \py{f"{specificity}"}). After
\py{n_tests} tests with \py{n_positive_tests} positive results, the final belief reaches \py{f"{final_belief:.3f}"}.
The decision threshold of 0.5 determines when to diagnose disease. (b) Cumulative log likelihood ratio
tracks total evidence strength: values above $\ln 3$ represent moderate evidence for disease. The final
cumulative log likelihood ratio is \py{f"{final_log_odds:.2f}"}, indicating strong evidence. (c) The
optimal decision threshold depends on the cost ratio: when false negatives are more costly than false
positives, the threshold should decrease to favor diagnosing disease; with equal costs, the threshold
is 0.5. (d) ROC curve (equal-variance binormal model matched to the test's sensitivity and specificity)
shows the tradeoff between true positive rate and false positive rate. The test's operating point
(red star) indicates better-than-chance discrimination, with the area under the curve summarizing
overall diagnostic accuracy.}
\label{fig:bayesian}
\end{figure}

The Bayesian analysis shows that after \py{n_tests} sequential tests, the posterior probability
of disease is \py{f"{final_belief:.3f}"}, corresponding to a cumulative log likelihood ratio of
\py{f"{final_log_odds:.2f}"}. This demonstrates how weak evidence from any single test accumulates
to produce strong diagnostic confidence, formalizing the intuitive notion that multiple
corroborating tests increase certainty.

\section{Reinforcement Learning}

\subsection{Value-Based Decision Making}

Reinforcement learning provides a framework for learning to make decisions through trial and error.
Agents learn value functions that predict expected future reward and use these to guide action selection.

\begin{definition}[Q-Learning]
The action-value function $Q(s, a)$ represents the expected return from taking action $a$ in state $s$.
The Q-learning update rule is:
\begin{equation}
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
\end{equation}
where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
\end{definition}

\begin{definition}[Softmax Action Selection]
Actions are selected probabilistically according to:
\begin{equation}
P(a|s) = \frac{\exp(\beta Q(s,a))}{\sum_{a'} \exp(\beta Q(s,a'))}
\end{equation}
where $\beta$ is the inverse temperature controlling the exploration-exploitation tradeoff.
\end{definition}
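
A small numerical sketch illustrates how the inverse temperature reshapes choice probabilities for
a fixed set of action values; the helper \texttt{softmax\_probs} below is introduced only for this
illustration, and the example Q-values match the arm means used in the simulation that follows.

\begin{pycode}
# Softmax choice probabilities for illustrative Q-values at two inverse temperatures.
def softmax_probs(q, beta):
    """Softmax with inverse temperature beta (max subtracted for numerical stability)."""
    e = np.exp(beta * (q - np.max(q)))
    return e / np.sum(e)

q_demo = np.array([1.0, 2.5, 1.5, 0.5])          # same values as the bandit arm means below
probs_exploratory = softmax_probs(q_demo, 0.5)    # low beta: probabilities spread across arms
probs_exploitative = softmax_probs(q_demo, 5.0)   # high beta: mass concentrated on the best arm
\end{pycode}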

\subsection{Multi-Armed Bandit Simulation}

We simulate a classic reinforcement learning task where an agent must learn which of several slot
machines (arms) provides the highest reward. The agent faces the exploration-exploitation dilemma:
whether to exploit known good options or explore alternatives that might be better.

\begin{pycode}
# Reinforcement Learning: Multi-armed bandit
class MultiArmedBandit:
    def __init__(self, n_arms, true_values):
        self.n_arms = n_arms
        self.true_values = true_values

    def pull_arm(self, arm):
        """Pull arm and receive a noisy reward"""
        return self.true_values[arm] + np.random.randn() * 0.5

class QLearningAgent:
    def __init__(self, n_arms, learning_rate=0.1, temperature=1.0):
        # Note: 'temperature' is used as the softmax inverse temperature beta
        self.n_arms = n_arms
        self.Q = np.zeros(n_arms)  # Q-values
        self.alpha = learning_rate
        self.beta = temperature
        self.action_counts = np.zeros(n_arms)

    def select_action(self):
        """Softmax action selection"""
        exp_Q = np.exp(self.beta * (self.Q - np.max(self.Q)))  # shift by max for numerical stability
        probs = exp_Q / np.sum(exp_Q)
        return np.random.choice(self.n_arms, p=probs)

    def update(self, action, reward):
        """Value update; for a stateless bandit the Q-learning rule reduces to this delta rule"""
        self.Q[action] += self.alpha * (reward - self.Q[action])
        self.action_counts[action] += 1

# Setup bandit task
n_arms = 4
true_values = np.array([1.0, 2.5, 1.5, 0.5])  # True mean rewards
n_trials = 500

# Simulate different learning rates
learning_rates = [0.01, 0.1, 0.5]
temperature = 2.0

fig5, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

for lr in learning_rates:
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=lr, temperature=temperature)

    Q_history = [agent.Q.copy()]
    rewards_history = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)

        Q_history.append(agent.Q.copy())
        rewards_history.append(reward)

    Q_history = np.array(Q_history)

    # Plot Q-value learning (only for one learning rate to avoid clutter)
    if lr == 0.1:
        for arm in range(n_arms):
            ax1.plot(Q_history[:, arm], label=f'Arm {arm+1} (true: {true_values[arm]:.1f})', alpha=0.8)

ax1.axhline(0, color='black', linestyle='-', linewidth=0.5)
for arm in range(n_arms):
    ax1.axhline(true_values[arm], color='gray', linestyle=':', alpha=0.5)
ax1.set_xlabel('Trial', fontsize=11)
ax1.set_ylabel('Q-value', fontsize=11)
ax1.set_title(f'Q-Learning Dynamics ($\\alpha=0.1$, $\\beta={temperature}$)', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)

# Compare learning rates
for lr in learning_rates:
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=lr, temperature=temperature)

    cumulative_reward = 0
    cumulative_rewards = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)
        cumulative_reward += reward
        cumulative_rewards.append(cumulative_reward)

    ax2.plot(cumulative_rewards, linewidth=2, label=f'$\\alpha={lr}$')

# Optimal cumulative reward
optimal_reward_per_trial = np.max(true_values)
optimal_cumulative = np.arange(1, n_trials+1) * optimal_reward_per_trial
ax2.plot(optimal_cumulative, 'k--', linewidth=2, label='Optimal')

ax2.set_xlabel('Trial', fontsize=11)
ax2.set_ylabel('Cumulative Reward', fontsize=11)
ax2.set_title('Learning Rate Comparison', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

# Inverse temperature effects on exploration
temperatures = [0.5, 2.0, 5.0]
ax3_colors = ['blue', 'green', 'red']

for temp, color in zip(temperatures, ax3_colors):
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=0.1, temperature=temp)

    action_history = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)
        action_history.append(action)

    # Plot action selection frequency over time
    window = 50
    action_probs = []
    for i in range(window, n_trials):
        recent_actions = action_history[i-window:i]
        probs = [recent_actions.count(a) / window for a in range(n_arms)]
        action_probs.append(probs)

    action_probs = np.array(action_probs)
    best_arm = np.argmax(true_values)

    ax3.plot(range(window, n_trials), action_probs[:, best_arm],
             color=color, linewidth=2, label=f'$\\beta={temp}$')

ax3.set_xlabel('Trial', fontsize=11)
ax3.set_ylabel('P(Select Best Arm)', fontsize=11)
ax3.set_title('Inverse Temperature Effects: Exploration vs. Exploitation', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)
ax3.set_ylim(0, 1)

# Final Q-values and action selection probabilities
ax4_agent = QLearningAgent(n_arms, learning_rate=0.1, temperature=2.0)
bandit_final = MultiArmedBandit(n_arms, true_values)

for trial in range(n_trials):
    action = ax4_agent.select_action()
    reward = bandit_final.pull_arm(action)
    ax4_agent.update(action, reward)

x_pos = np.arange(n_arms)
width = 0.35

bars1 = ax4.bar(x_pos - width/2, true_values, width, label='True values', color='steelblue', edgecolor='black')
bars2 = ax4.bar(x_pos + width/2, ax4_agent.Q, width, label='Learned Q-values', color='coral', edgecolor='black')

# Add action selection probabilities as text
exp_Q = np.exp(ax4_agent.beta * ax4_agent.Q)
action_probs_final = exp_Q / np.sum(exp_Q)
for i, (bar, prob) in enumerate(zip(bars2, action_probs_final)):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height + 0.1,
             f'P={prob:.2f}', ha='center', va='bottom', fontsize=9)

ax4.set_xlabel('Arm', fontsize=11)
ax4.set_ylabel('Value', fontsize=11)
ax4.set_title(f'Learned Values After {n_trials} Trials', fontsize=12, fontweight='bold')
ax4.set_xticks(x_pos)
ax4.set_xticklabels([f'Arm {i+1}' for i in range(n_arms)])
ax4.legend(fontsize=10)
ax4.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('decision_making_rl.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key RL statistics
final_Q = ax4_agent.Q
Q_errors = np.abs(final_Q - true_values)
mean_Q_error = np.mean(Q_errors)
best_arm = np.argmax(true_values)
learned_best = np.argmax(final_Q)
correct_identification = (learned_best == best_arm)
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_rl.pdf}
\caption{Reinforcement learning in a four-armed bandit task with true arm values of
[\py{', '.join([f"{v:.1f}" for v in true_values])}]. (a) Q-learning dynamics with $\alpha=0.1$
and $\beta=\py{f"{temperature}"}$: Q-values converge toward the true values through reward feedback,
with learning speed depending on how often each arm is selected. Dotted lines show true values.
(b) Learning rate effects on cumulative reward: faster learning ($\alpha=0.5$) initially
outperforms but may be less stable; slower learning ($\alpha=0.01$) produces smoother but
delayed convergence. The black dashed line shows optimal performance (always selecting the best arm).
(c) The inverse temperature controls the exploration-exploitation balance: low $\beta=0.5$
produces more exploration and slower convergence on the best arm, whereas high $\beta=5.0$
exploits the current best estimate quickly but can lock onto a suboptimal arm. (d) Final learned
Q-values after \py{n_trials} trials (coral) compared to true values (blue), with action selection
probabilities shown above the bars. The agent correctly identifies the best arm (Arm 2) with Q-value
\py{f"{final_Q[1]:.2f}"} vs. true \py{f"{true_values[1]:.2f}"}, mean absolute error = \py{f"{mean_Q_error:.2f}"}.}
\label{fig:rl}
\end{figure}

The reinforcement learning simulation demonstrates value-based decision making through Q-learning.
After \py{n_trials} trials with $\alpha = 0.1$ and $\beta = \py{f"{temperature}"}$, the agent's
learned Q-values have a mean absolute error of \py{f"{mean_Q_error:.3f}"} relative to the true values,
and it correctly identifies the best arm (Arm \py{best_arm + 1}) with selection probability
\py{f"{action_probs_final[best_arm]:.2f}"}. This illustrates how experience-based learning
converges to near-optimal policies.

\section{Model Comparison and Integration}

The five computational frameworks presented here address different aspects of decision making:

\begin{itemize}
\item \textbf{Expected Utility Theory} provides the normative benchmark for rational choice under risk
\item \textbf{Prospect Theory} captures systematic deviations including loss aversion and probability distortion
\item \textbf{Drift-Diffusion Models} explain response time distributions and speed-accuracy tradeoffs
\item \textbf{Bayesian Decision Making} prescribes optimal belief updating under uncertainty
\item \textbf{Reinforcement Learning} accounts for value learning through experience
\end{itemize}

\begin{remark}[Integration of Frameworks]
Recent research integrates these approaches: drift-diffusion can be derived from Bayesian sequential
analysis; reinforcement learning models can incorporate prospect theory's asymmetric value function;
and neural implementations suggest the brain approximates Bayesian inference through sampling mechanisms.
\end{remark}

\section{Conclusions}

This computational analysis reveals the mechanistic underpinnings of human decision making:

\begin{enumerate}
\item Expected utility with $\alpha = \py{f"{alpha_moderate}"}$ predicts a risk premium of
\$\py{f"{abs(rp_moderate):.2f}"} for a 50-50 gamble of $\pm$\$50
\item Prospect theory with $\lambda = \py{f"{lambda_tk}"}$ captures loss aversion, with losses
weighted \py{f"{loss_aversion_ratio:.2f}"} times as heavily as gains
\item Drift-diffusion with $\mu = \py{f"{drift_key}"}$ and $a = \py{f"{boundary_key}"}$ produces
mean RT = \py{f"{mean_rt_key:.2f}"} s and accuracy = \py{f"{100*accuracy_key:.1f}"}\%
\item Bayesian updating with sensitivity = \py{f"{sensitivity}"} and specificity = \py{f"{specificity}"}
achieves posterior confidence of \py{f"{final_belief:.3f}"} after \py{n_tests} tests
\item Q-learning with $\alpha = 0.1$ converges to within \py{f"{mean_Q_error:.3f}"} of the true values
after \py{n_trials} trials
\end{enumerate}

These computational models provide quantitative frameworks for understanding both optimal decision
strategies and systematic deviations from rationality observed in human behavior.

\section*{References}

\begin{itemize}
\item von Neumann, J., \& Morgenstern, O. (1944). \textit{Theory of Games and Economic Behavior}. Princeton University Press.
\item Kahneman, D., \& Tversky, A. (1979). Prospect theory: An analysis of decision under risk. \textit{Econometrica}, 47(2), 263-291.
\item Tversky, A., \& Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. \textit{Journal of Risk and Uncertainty}, 5(4), 297-323.
\item Ratcliff, R., \& McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. \textit{Neural Computation}, 20(4), 873-922.
\item Ratcliff, R., Smith, P. L., Brown, S. D., \& McKoon, G. (2016). Diffusion decision model: Current issues and history. \textit{Trends in Cognitive Sciences}, 20(4), 260-281.
\item Bogacz, R., Brown, E., Moehlis, J., Holmes, P., \& Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. \textit{Psychological Review}, 113(4), 700-765.
\item Edwards, W., Lindman, H., \& Savage, L. J. (1963). Bayesian statistical inference for psychological research. \textit{Psychological Review}, 70(3), 193-242.
\item Sutton, R. S., \& Barto, A. G. (2018). \textit{Reinforcement Learning: An Introduction} (2nd ed.). MIT Press.
\item Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., \& Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. \textit{Nature}, 441(7095), 876-879.
\item Dayan, P., \& Daw, N. D. (2008). Decision theory, reinforcement learning, and the brain. \textit{Cognitive, Affective, \& Behavioral Neuroscience}, 8(4), 429-453.
\item Gold, J. I., \& Shadlen, M. N. (2007). The neural basis of decision making. \textit{Annual Review of Neuroscience}, 30, 535-574.
\item Busemeyer, J. R., \& Diederich, A. (2010). \textit{Cognitive Modeling}. Sage Publications.
\item Prelec, D. (1998). The probability weighting function. \textit{Econometrica}, 66(3), 497-527.
\item Rescorla, R. A., \& Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black \& W. F. Prokasy (Eds.), \textit{Classical Conditioning II: Current Research and Theory} (pp. 64-99). Appleton-Century-Crofts.
\item Glimcher, P. W. (2011). \textit{Foundations of Neuroeconomic Analysis}. Oxford University Press.
\item Lee, M. D., \& Wagenmakers, E. J. (2013). \textit{Bayesian Cognitive Modeling: A Practical Course}. Cambridge University Press.
\item Wald, A. (1947). \textit{Sequential Analysis}. John Wiley \& Sons.
\item Tversky, A., \& Fox, C. R. (1995). Weighing risk and uncertainty. \textit{Psychological Review}, 102(2), 269-283.
\item Forstmann, B. U., Ratcliff, R., \& Wagenmakers, E. J. (2016). Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. \textit{Annual Review of Psychology}, 67, 641-666.
\item Niv, Y. (2009). Reinforcement learning in the brain. \textit{Journal of Mathematical Psychology}, 53(3), 139-154.
\end{itemize}

\end{document}