% Decision Making Models Template
% Topics: Expected utility, prospect theory, drift-diffusion, Bayesian decisions, reinforcement learning
% Style: Computational neuroscience report with behavioral modeling

\documentclass[a4paper, 11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath, amssymb}
\usepackage{graphicx}
\usepackage{siunitx}
\usepackage{booktabs}
\usepackage{subcaption}
\usepackage[makestderr]{pythontex}

% Theorem environments
\newtheorem{definition}{Definition}[section]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{example}{Example}[section]
\newtheorem{remark}{Remark}[section]

\title{Computational Models of Human Decision Making:\\
From Rational Choice to Bounded Rationality}
\author{Cognitive Neuroscience Laboratory}
\date{\today}

\begin{document}
\maketitle

\begin{abstract}
This report presents a comprehensive computational analysis of human decision-making processes,
spanning classical normative theories to modern descriptive models. We examine expected utility
theory as the rational benchmark, prospect theory's account of systematic deviations from
rationality, drift-diffusion models for response time distributions, Bayesian frameworks for
decision making under uncertainty, and reinforcement learning models of value-based choice.
Computational simulations reveal the parametric signatures that distinguish these frameworks
and their ability to capture key phenomena including loss aversion, probability weighting,
speed-accuracy tradeoffs, and learning dynamics.
\end{abstract}

\section{Introduction}

Decision making is a fundamental cognitive process that has been studied across economics,
psychology, and neuroscience. Classical economic theory posits that humans are rational agents
who maximize expected utility, but decades of empirical research have revealed systematic
deviations from this normative ideal. Modern computational approaches seek to characterize
both the mechanisms underlying choice behavior and the real-time dynamics of the decision process.

\begin{definition}[Decision Problem]
A decision problem consists of a set of alternatives $\mathcal{A}$, a set of possible outcomes
$\mathcal{O}$, and a probability distribution $p(o|a)$ mapping actions to outcomes. The decision
maker selects an action $a^* \in \mathcal{A}$ to optimize some objective function.
\end{definition}

\section{Expected Utility Theory}

\subsection{Theoretical Foundation}

Expected utility theory (EUT), formalized by von Neumann and Morgenstern, assumes that decision
makers assign utilities $u(o)$ to outcomes and choose actions that maximize expected utility.

\begin{definition}[Expected Utility]
For a risky prospect that yields outcome $o_i$ with probability $p_i$, the expected utility is:
\begin{equation}
EU = \sum_{i=1}^{n} p_i \cdot u(o_i)
\end{equation}
where $u: \mathcal{O} \rightarrow \mathbb{R}$ is a utility function.
\end{definition}

\begin{theorem}[Risk Aversion]
A decision maker is risk-averse if $u$ is concave ($u'' < 0$), risk-neutral if $u$ is linear,
and risk-seeking if $u$ is convex ($u'' > 0$).
\end{theorem}

\subsection{Computational Analysis}

We now examine the implications of different utility functions for choice behavior under risk.
The curvature of the utility function determines risk preferences: concave functions exhibit
diminishing marginal utility, leading to risk aversion. We simulate decisions over monetary
gambles with varying utility function parameters to demonstrate these effects.

\begin{pycode}
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fsolve
from scipy.stats import norm

np.random.seed(42)

# Expected Utility Theory
def power_utility(x, alpha=0.5):
    """Power utility function with risk aversion parameter alpha"""
    if alpha == 1.0:
        return x
    return np.sign(x) * np.abs(x)**alpha

def expected_utility(outcomes, probabilities, alpha=0.5):
    """Calculate expected utility for a prospect"""
    utilities = power_utility(outcomes, alpha)
    return np.sum(probabilities * utilities)

# Simulate choice behavior under different risk attitudes
wealth = 100.0
stake = 50.0
win_prob = 0.5

# Range of risk aversion parameters
alphas = np.linspace(0.3, 1.5, 100)
certainty_equivalents = []

for alpha in alphas:
    # Gamble: win or lose $50 with equal probability
    outcomes = np.array([wealth + stake, wealth - stake])
    probs = np.array([win_prob, 1 - win_prob])
    eu_gamble = expected_utility(outcomes, probs, alpha)

    # Find certainty equivalent: CE such that u(wealth + CE) = EU(gamble)
    def objective(ce):
        return power_utility(wealth + ce, alpha) - eu_gamble

    ce = fsolve(objective, 0)[0]
    certainty_equivalents.append(ce)

certainty_equivalents = np.array(certainty_equivalents)

# Risk premium: expected value - certainty equivalent
expected_value = stake * win_prob - stake * (1 - win_prob)  # = 0 for a fair gamble
risk_premium = expected_value - certainty_equivalents

# Create first figure
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Utility functions with different risk attitudes
x_wealth = np.linspace(0, 200, 500)
ax1.plot(x_wealth, power_utility(x_wealth, 0.5), 'b-', linewidth=2.5, label=r'$\alpha=0.5$ (Risk-averse)')
ax1.plot(x_wealth, power_utility(x_wealth, 1.0), 'g--', linewidth=2.5, label=r'$\alpha=1.0$ (Risk-neutral)')
ax1.plot(x_wealth, power_utility(x_wealth, 1.5), 'r-.', linewidth=2.5, label=r'$\alpha=1.5$ (Risk-seeking)')
ax1.axvline(wealth, color='gray', linestyle=':', alpha=0.5)
ax1.axvline(wealth + stake, color='orange', linestyle=':', alpha=0.5)
ax1.axvline(wealth - stake, color='purple', linestyle=':', alpha=0.5)
ax1.set_xlabel('Wealth (\$)', fontsize=11)
ax1.set_ylabel('Utility $u(x)$', fontsize=11)
ax1.set_title('Expected Utility Theory: Risk Attitudes', fontsize=12, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# Plot 2: Risk premium as function of risk aversion
ax2.plot(alphas, risk_premium, 'b-', linewidth=2.5)
ax2.axhline(0, color='black', linestyle='--', alpha=0.5)
ax2.axvline(1.0, color='gray', linestyle=':', alpha=0.5, label='Risk-neutral')
ax2.fill_between(alphas, risk_premium, 0, where=(alphas < 1.0), alpha=0.3, color='red', label='Risk premium (averse)')
ax2.fill_between(alphas, risk_premium, 0, where=(alphas > 1.0), alpha=0.3, color='green', label='Risk premium (seeking)')
ax2.set_xlabel(r'Risk Aversion Parameter $\alpha$', fontsize=11)
ax2.set_ylabel('Risk Premium (\$)', fontsize=11)
ax2.set_title('Risk Premium vs. Risk Attitude', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_eut.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics for reporting
alpha_moderate = 0.5
eu_gamble_moderate = expected_utility(
    np.array([wealth + stake, wealth - stake]),
    np.array([win_prob, 1 - win_prob]),
    alpha_moderate
)
ce_moderate = fsolve(lambda ce: power_utility(wealth + ce, alpha_moderate) - eu_gamble_moderate, 0)[0]
rp_moderate = expected_value - ce_moderate
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_eut.pdf}
\caption{Expected utility theory predictions for risky choice. Left panel shows utility functions
with varying risk attitudes: concave (risk-averse, $\alpha=0.5$), linear (risk-neutral, $\alpha=1.0$),
and convex (risk-seeking, $\alpha=1.5$). Vertical lines indicate current wealth (\$100) and potential
outcomes (\$150, \$50) from a 50-50 gamble. Right panel displays the risk premium as a function of
the risk aversion parameter $\alpha$, demonstrating that risk-averse agents ($\alpha < 1$) demand
compensation to accept fair gambles, while risk-seeking agents ($\alpha > 1$) accept actuarially
unfavorable gambles. For $\alpha=0.5$, the risk premium is approximately \$\py{f"{abs(rp_moderate):.2f}"}:
the agent values the fair gamble at that much less than its expected value and would pay up to this
amount to avoid it.}
\label{fig:eut}
\end{figure}

The analysis demonstrates that a moderately risk-averse agent ($\alpha = \py{f"{alpha_moderate}"}$)
assigns the gamble a certainty equivalent of \$\py{f"{wealth + ce_moderate:.2f}"} in final wealth
(\$\py{f"{abs(ce_moderate):.2f}"} below current wealth), yielding a risk premium of
\$\py{f"{abs(rp_moderate):.2f}"}. This quantifies the intuition that risk-averse individuals
prefer certain outcomes over risky prospects with the same expected value.
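
As a quick cross-check on the numerical root finding, the $\alpha = 0.5$ case admits a closed form
(square-root utility), so the risk premium can be computed directly. The short sketch below reuses
\texttt{wealth}, \texttt{stake}, and \texttt{win\_prob} from the block above; the value it produces
should agree with the fsolve-based estimate.

\begin{pycode}
# Closed-form cross-check for alpha = 0.5 (square-root utility), reusing the
# wealth, stake, and win_prob values defined in the previous block.
eu_closed = win_prob * np.sqrt(wealth + stake) + (1 - win_prob) * np.sqrt(wealth - stake)
ce_closed = eu_closed**2 - wealth   # invert u(x) = sqrt(x); CE expressed relative to current wealth
rp_closed = 0.0 - ce_closed         # the fair gamble has an expected value of 0
# rp_closed is approximately 6.70 and should match rp_moderate from the block above.
\end{pycode}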

\section{Prospect Theory}

\subsection{Value Function and Loss Aversion}

Kahneman and Tversky's prospect theory revolutionized the study of decision making by documenting
systematic violations of expected utility theory. The theory proposes that people evaluate outcomes
relative to a reference point and exhibit loss aversion: losses loom larger than gains.

\begin{definition}[Prospect Theory Value Function]
The value function is defined as:
\begin{equation}
v(x) = \begin{cases}
x^\alpha & \text{if } x \geq 0 \\
-\lambda |x|^\beta & \text{if } x < 0
\end{cases}
\end{equation}
where $\alpha, \beta \in (0,1)$ capture diminishing sensitivity, and $\lambda > 1$ represents
loss aversion (typical value: $\lambda \approx 2.25$).
\end{definition}

\subsection{Probability Weighting}

Prospect theory also incorporates nonlinear probability weighting, whereby small probabilities
are overweighted and large probabilities are underweighted.

\begin{definition}[Probability Weighting Function]
The one-parameter Prelec weighting function is:
\begin{equation}
w(p) = \exp\{-(-\ln p)^\gamma\}
\end{equation}
where $\gamma < 1$ produces the characteristic inverse-S shape.
\end{definition}

\subsection{Computational Simulation}

We now implement the full prospect theory model to analyze how loss aversion and probability
weighting jointly affect decision making. The model predicts specific phenomena, such as the
fourfold pattern of risk attitudes and preference reversals, that cannot be explained by
expected utility theory.

\begin{pycode}
# Prospect Theory Implementation
def prospect_value_function(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function with loss aversion"""
    if isinstance(x, np.ndarray):
        v = np.zeros_like(x, dtype=float)  # float output even for integer-valued inputs
        v[x >= 0] = x[x >= 0]**alpha
        v[x < 0] = -lam * np.abs(x[x < 0])**beta
        return v
    else:
        if x >= 0:
            return x**alpha
        else:
            return -lam * np.abs(x)**beta

def prelec_weighting(p, gamma=0.61):
    """Prelec probability weighting function"""
    return np.exp(-(-np.log(p))**gamma)

def prospect_theory_value(outcomes, probabilities, alpha=0.88, beta=0.88, lam=2.25, gamma=0.61):
    """Calculate prospect theory value for a gamble"""
    values = prospect_value_function(outcomes, alpha, beta, lam)
    weights = prelec_weighting(probabilities, gamma)
    return np.sum(weights * values)

# Parameters from Tversky & Kahneman (1992)
alpha_tk = 0.88
beta_tk = 0.88
lambda_tk = 2.25
gamma_tk = 0.61

# Create value function plot
x_range = np.linspace(-100, 100, 500)
v_values = prospect_value_function(x_range, alpha_tk, beta_tk, lambda_tk)

fig2, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 11))

# Plot 1: Value function
ax1.plot(x_range, v_values, 'b-', linewidth=2.5)
ax1.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax1.axvline(0, color='black', linestyle='-', linewidth=0.5)
# Add reference lines showing loss aversion
x_ref = 50
v_gain = prospect_value_function(x_ref, alpha_tk, beta_tk, lambda_tk)
v_loss = prospect_value_function(-x_ref, alpha_tk, beta_tk, lambda_tk)
ax1.plot([x_ref, x_ref], [0, v_gain], 'g--', alpha=0.5)
ax1.plot([-x_ref, -x_ref], [v_loss, 0], 'r--', alpha=0.5)
ax1.plot([0, x_ref], [v_gain, v_gain], 'g--', alpha=0.5)
ax1.plot([0, -x_ref], [v_loss, v_loss], 'r--', alpha=0.5)
ax1.text(x_ref+5, v_gain/2, f'Gain: {v_gain:.1f}', fontsize=9, color='green')
ax1.text(-x_ref-25, v_loss/2, f'Loss: {v_loss:.1f}', fontsize=9, color='red')
ax1.set_xlabel('Outcome (Relative to Reference Point)', fontsize=11)
ax1.set_ylabel('Subjective Value $v(x)$', fontsize=11)
ax1.set_title(f'Prospect Theory Value Function ($\\lambda={lambda_tk}$)', fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Plot 2: Probability weighting
p_range = np.linspace(0.01, 0.99, 100)
w_values = prelec_weighting(p_range, gamma_tk)
ax2.plot(p_range, w_values, 'b-', linewidth=2.5, label=f'$\\gamma={gamma_tk}$')
ax2.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Identity (no weighting)')
ax2.fill_between(p_range, p_range, w_values, where=(w_values > p_range),
                 alpha=0.3, color='red', label='Overweighting')
ax2.fill_between(p_range, p_range, w_values, where=(w_values < p_range),
                 alpha=0.3, color='blue', label='Underweighting')
ax2.set_xlabel('Objective Probability $p$', fontsize=11)
ax2.set_ylabel('Decision Weight $w(p)$', fontsize=11)
ax2.set_title('Prelec Probability Weighting Function', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Plot 3: Fourfold pattern of risk attitudes
# Gains domain
gain_amounts = np.linspace(10, 100, 20)
p_low = 0.01
p_high = 0.90

ce_gain_low = []
ce_gain_high = []

for gain in gain_amounts:
    # Low probability gain
    pt_val_low = prospect_theory_value(np.array([gain, 0]), np.array([p_low, 1-p_low]),
                                       alpha_tk, beta_tk, lambda_tk, gamma_tk)
    # Find CE
    ce_low = fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val_low, gain*p_low)[0]
    ce_gain_low.append(ce_low)

    # High probability gain
    pt_val_high = prospect_theory_value(np.array([gain, 0]), np.array([p_high, 1-p_high]),
                                        alpha_tk, beta_tk, lambda_tk, gamma_tk)
    ce_high = fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val_high, gain*p_high)[0]
    ce_gain_high.append(ce_high)

expected_gain_low = gain_amounts * p_low
expected_gain_high = gain_amounts * p_high

ax3.plot(expected_gain_low, ce_gain_low, 'b-', linewidth=2.5, label=f'Low prob ({p_low})')
ax3.plot(expected_gain_high, ce_gain_high, 'g-', linewidth=2.5, label=f'High prob ({p_high})')
ax3.plot([0, 100], [0, 100], 'k--', linewidth=1, label='Risk-neutral')
ax3.set_xlabel('Expected Value', fontsize=11)
ax3.set_ylabel('Certainty Equivalent', fontsize=11)
ax3.set_title('Fourfold Pattern: Gains Domain', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)

# Plot 4: Loss aversion magnitude
lambda_values = np.linspace(1.0, 3.5, 50)
mixed_gamble_acceptance = []

for lam in lambda_values:
    # Mixed gamble: 50% chance to win $100, 50% chance to lose $100
    outcomes = np.array([100.0, -100.0])
    probs = np.array([0.5, 0.5])
    pt_val = prospect_theory_value(outcomes, probs, alpha_tk, beta_tk, lam, gamma_tk)
    mixed_gamble_acceptance.append(pt_val)

mixed_gamble_acceptance = np.array(mixed_gamble_acceptance)

ax4.plot(lambda_values, mixed_gamble_acceptance, 'b-', linewidth=2.5)
ax4.axhline(0, color='red', linestyle='--', linewidth=2, label='Indifference threshold')
ax4.axvline(lambda_tk, color='gray', linestyle=':', alpha=0.7, label=f'Typical $\\lambda={lambda_tk}$')
ax4.fill_between(lambda_values, mixed_gamble_acceptance, 0,
                 where=(mixed_gamble_acceptance > 0), alpha=0.3, color='green', label='Accept')
ax4.fill_between(lambda_values, mixed_gamble_acceptance, 0,
                 where=(mixed_gamble_acceptance < 0), alpha=0.3, color='red', label='Reject')
ax4.set_xlabel('Loss Aversion Parameter $\\lambda$', fontsize=11)
ax4.set_ylabel('Prospect Value', fontsize=11)
ax4.set_title('Mixed Gamble Acceptance vs. Loss Aversion', fontsize=12, fontweight='bold')
ax4.legend(fontsize=10)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_prospect.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics
loss_aversion_ratio = abs(v_loss / v_gain)
overweighting_01 = prelec_weighting(0.01, gamma_tk) / 0.01
underweighting_90 = prelec_weighting(0.90, gamma_tk) / 0.90
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_prospect.pdf}
\caption{Prospect theory components and predictions. (a) Value function showing loss aversion:
losses loom larger than equivalent gains by a factor of $\lambda=\py{f"{lambda_tk}"}$, resulting
in a loss aversion ratio of \py{f"{loss_aversion_ratio:.2f}"}. The function is steeper for losses
than for gains (loss aversion) and is concave for gains but convex for losses, reflecting
diminishing sensitivity in both domains. (b) Prelec probability weighting
function ($\gamma=\py{f"{gamma_tk}"}$) exhibits an inverse-S shape: small probabilities (e.g., 0.01)
are overweighted by a factor of \py{f"{overweighting_01:.2f}"}, while large probabilities (e.g., 0.90)
are underweighted to \py{f"{underweighting_90:.3f}"} of their objective value. (c) Fourfold pattern
in the gains domain: risk-seeking for low-probability gains (lottery tickets) and risk-averse for
high-probability gains; the mirror-image pattern holds for losses, where risk aversion for
low-probability losses corresponds to insurance purchasing. (d) Mixed gamble acceptance threshold:
agents with typical loss aversion ($\lambda \approx \py{f"{lambda_tk}"}$) reject 50-50 mixed gambles
even when the expected value is zero, requiring gains to be roughly twice as large as losses to accept.}
\label{fig:prospect}
\end{figure}

The prospect theory analysis reveals that with parameters $\alpha = \py{f"{alpha_tk}"}$,
$\beta = \py{f"{beta_tk}"}$, $\lambda = \py{f"{lambda_tk}"}$, and $\gamma = \py{f"{gamma_tk}"}$,
losses are experienced as \py{f"{loss_aversion_ratio:.2f}"} times more impactful than equivalent
gains. This loss aversion explains the rejection of symmetric mixed gambles and the endowment effect.
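
The gains-domain panel above covers only half of the fourfold pattern. As a complementary sketch,
the block below reuses \texttt{prospect\_theory\_value} and \texttt{prospect\_value\_function} from
the code above to compute certainty equivalents for all four probability-by-domain cells; the helper
\texttt{ce\_for} is introduced here purely for illustration.

\begin{pycode}
# Fourfold-pattern sketch: certainty equivalents (CE) for a +/- $100 outcome at
# low and high probability, reusing the prospect theory functions defined above.
def ce_for(outcome, p):
    """CE of a prospect paying `outcome` with probability p (else 0) under prospect theory."""
    pt_val = prospect_theory_value(np.array([outcome, 0.0]), np.array([p, 1 - p]),
                                   alpha_tk, beta_tk, lambda_tk, gamma_tk)
    return fsolve(lambda x: prospect_value_function(x, alpha_tk, beta_tk, lambda_tk) - pt_val,
                  outcome * p)[0]

fourfold_ce = {(domain, p): ce_for(outcome, p)
               for domain, outcome in [('gain', 100.0), ('loss', -100.0)]
               for p in (0.05, 0.95)}
# Expected signature: CE above the expected value for low-probability gains (risk seeking),
# below it for high-probability gains (risk aversion), and the mirror image in the loss domain,
# where risk aversion for low-probability losses corresponds to insurance purchasing.
\end{pycode}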

\section{Drift-Diffusion Model}

\subsection{Sequential Sampling Framework}

The drift-diffusion model (DDM) provides a mechanistic account of two-alternative forced choice
decisions, including both choice and response time distributions. Evidence accumulates over time
as a noisy process until reaching a decision boundary.

\begin{definition}[Drift-Diffusion Process]
The accumulated evidence $x(t)$ evolves according to:
\begin{equation}
dx = \mu \, dt + \sigma \, dW
\end{equation}
where $\mu$ is the drift rate (evidence quality), $\sigma$ is diffusion noise, and $dW$ is a
Wiener increment. A decision occurs when $x(t)$ reaches either boundary $\pm a$.
\end{definition}

\begin{theorem}[Mean Response Time]
For symmetric boundaries at $\pm a$ with starting point $z=0$, the mean response time is:
\begin{equation}
\mathbb{E}[T] = \frac{a}{\mu} \tanh\left(\frac{a\mu}{\sigma^2}\right) + T_{er}
\end{equation}
where $T_{er}$ is non-decision time (encoding and response execution).
\end{theorem}

\subsection{Simulation of Decision Dynamics}

We simulate the drift-diffusion process using Euler-Maruyama discretization to generate response
time distributions and examine the speed-accuracy tradeoff. The model predicts that increasing
boundary separation increases accuracy but slows responses, while drift rate affects both.

\begin{pycode}
# Drift-Diffusion Model Simulation
def simulate_ddm_trial(drift_rate, boundary, noise=1.0, dt=0.001, non_decision_time=0.3):
    """Simulate a single DDM trial via Euler-Maruyama discretization"""
    evidence = 0.0
    t = 0.0
    max_time = 10.0  # Prevent infinite loops

    while abs(evidence) < boundary and t < max_time:
        evidence += drift_rate * dt + noise * np.sqrt(dt) * np.random.randn()
        t += dt

    choice = 1 if evidence >= boundary else -1
    reaction_time = t + non_decision_time
    return choice, reaction_time

def simulate_ddm_experiment(drift_rate, boundary, n_trials=1000, noise=1.0, non_decision_time=0.3):
    """Simulate multiple DDM trials"""
    choices = []
    rts = []

    for _ in range(n_trials):
        choice, rt = simulate_ddm_trial(drift_rate, boundary, noise, non_decision_time=non_decision_time)
        choices.append(choice)
        rts.append(rt)

    return np.array(choices), np.array(rts)

# Simulate DDM with different parameters
drift_rates = [0.1, 0.3, 0.5]
boundaries = [0.8, 1.2, 1.6]
n_trials = 2000

fig3 = plt.figure(figsize=(14, 10))

# Drift rate manipulation
ax1 = plt.subplot(2, 3, 1)
for drift in drift_rates:
    choices, rts = simulate_ddm_experiment(drift, boundary=1.2, n_trials=n_trials)
    correct_rts = rts[choices == 1]
    ax1.hist(correct_rts, bins=50, alpha=0.6, density=True, label=f'$\\mu={drift}$')

ax1.set_xlabel('Response Time (s)', fontsize=11)
ax1.set_ylabel('Density', fontsize=11)
ax1.set_title('RT Distributions: Drift Rate Effect', fontsize=12, fontweight='bold')
ax1.legend(fontsize=10)
ax1.set_xlim(0, 5)

# Boundary manipulation
ax2 = plt.subplot(2, 3, 2)
for bound in boundaries:
    choices, rts = simulate_ddm_experiment(drift_rate=0.3, boundary=bound, n_trials=n_trials)
    correct_rts = rts[choices == 1]
    ax2.hist(correct_rts, bins=50, alpha=0.6, density=True, label=f'$a={bound}$')

ax2.set_xlabel('Response Time (s)', fontsize=11)
ax2.set_ylabel('Density', fontsize=11)
ax2.set_title('RT Distributions: Boundary Effect', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.set_xlim(0, 5)

# Speed-accuracy tradeoff
ax3 = plt.subplot(2, 3, 3)
boundaries_sac = np.linspace(0.5, 2.0, 10)
accuracy_sac = []
mean_rt_sac = []

for bound in boundaries_sac:
    choices, rts = simulate_ddm_experiment(drift_rate=0.3, boundary=bound, n_trials=1000)
    accuracy_sac.append(np.mean(choices == 1))
    mean_rt_sac.append(np.mean(rts))

ax3.plot(mean_rt_sac, accuracy_sac, 'bo-', linewidth=2, markersize=6)
ax3.set_xlabel('Mean Response Time (s)', fontsize=11)
ax3.set_ylabel('Accuracy', fontsize=11)
ax3.set_title('Speed-Accuracy Tradeoff', fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3)

# Sample trajectories
ax4 = plt.subplot(2, 3, 4)
drift_demo = 0.4
bound_demo = 1.5
n_trajectories = 20

for _ in range(n_trajectories):
    evidence_path = [0]
    t_path = [0]
    evidence = 0.0
    t = 0.0
    dt = 0.01

    while abs(evidence) < bound_demo and t < 5:
        evidence += drift_demo * dt + 1.0 * np.sqrt(dt) * np.random.randn()
        t += dt
        evidence_path.append(evidence)
        t_path.append(t)

    color = 'green' if evidence >= bound_demo else 'red'
    ax4.plot(t_path, evidence_path, color=color, alpha=0.3, linewidth=1)

ax4.axhline(bound_demo, color='blue', linestyle='--', linewidth=2, label='Upper boundary')
ax4.axhline(-bound_demo, color='orange', linestyle='--', linewidth=2, label='Lower boundary')
ax4.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax4.set_xlabel('Time (s)', fontsize=11)
ax4.set_ylabel('Evidence $x(t)$', fontsize=11)
ax4.set_title(f'Sample Trajectories ($\\mu={drift_demo}$, $a={bound_demo}$)', fontsize=12, fontweight='bold')
ax4.legend(fontsize=10)
ax4.set_xlim(0, 5)
ax4.grid(True, alpha=0.3)

# RT quantiles for correct and error responses
ax5 = plt.subplot(2, 3, 5)
choices_qp, rts_qp = simulate_ddm_experiment(drift_rate=0.4, boundary=1.2, n_trials=5000)
correct_rts = rts_qp[choices_qp == 1]
error_rts = rts_qp[choices_qp == -1]

quantiles = [0.1, 0.3, 0.5, 0.7, 0.9]
correct_quantiles = np.quantile(correct_rts, quantiles)
error_quantiles = np.quantile(error_rts, quantiles) if len(error_rts) > 10 else np.zeros(len(quantiles))

ax5.plot(quantiles, correct_quantiles, 'go-', linewidth=2, markersize=8, label='Correct')
if len(error_rts) > 10:
    ax5.plot(quantiles, error_quantiles, 'ro-', linewidth=2, markersize=8, label='Error')
ax5.set_xlabel('Quantile', fontsize=11)
ax5.set_ylabel('Response Time (s)', fontsize=11)
ax5.set_title('RT Quantiles: Correct vs. Error', fontsize=12, fontweight='bold')
ax5.legend(fontsize=10)
ax5.grid(True, alpha=0.3)

# Drift rate vs accuracy/RT
ax6 = plt.subplot(2, 3, 6)
drift_range = np.linspace(0.05, 0.8, 15)
acc_by_drift = []
rt_by_drift = []

for drift in drift_range:
    choices_d, rts_d = simulate_ddm_experiment(drift, boundary=1.2, n_trials=500)
    acc_by_drift.append(np.mean(choices_d == 1))
    rt_by_drift.append(np.mean(rts_d))

ax6_twin = ax6.twinx()
line1 = ax6.plot(drift_range, acc_by_drift, 'b-o', linewidth=2, markersize=5, label='Accuracy')
line2 = ax6_twin.plot(drift_range, rt_by_drift, 'r-s', linewidth=2, markersize=5, label='Mean RT')

ax6.set_xlabel('Drift Rate $\\mu$', fontsize=11)
ax6.set_ylabel('Accuracy', fontsize=11, color='blue')
ax6_twin.set_ylabel('Mean RT (s)', fontsize=11, color='red')
ax6.set_title('Drift Rate Effects', fontsize=12, fontweight='bold')
ax6.tick_params(axis='y', labelcolor='blue')
ax6_twin.tick_params(axis='y', labelcolor='red')
ax6.grid(True, alpha=0.3)

lines = line1 + line2
labels = [l.get_label() for l in lines]
ax6.legend(lines, labels, fontsize=10, loc='center right')

plt.tight_layout()
plt.savefig('decision_making_ddm.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key DDM statistics
drift_key = 0.4
boundary_key = 1.2
choices_key, rts_key = simulate_ddm_experiment(drift_key, boundary_key, n_trials=5000)
accuracy_key = np.mean(choices_key == 1)
mean_rt_key = np.mean(rts_key)
std_rt_key = np.std(rts_key)
median_rt_key = np.median(rts_key)
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_ddm.pdf}
\caption{Drift-diffusion model simulations of two-alternative forced choice. (a) Response time
distributions for different drift rates ($\mu$): higher evidence quality produces faster, more
accurate decisions with reduced RT variability. (b) Boundary separation effects: increasing the
decision threshold ($a$) from 0.8 to 1.6 produces slower but more conservative responses with
broader RT distributions. (c) Speed-accuracy tradeoff: varying boundary separation traces out the
characteristic monotonic tradeoff in which longer mean RTs accompany higher accuracy. (d) Twenty
sample evidence accumulation trajectories (green = upper boundary, red = lower boundary or timeout)
showing the stochastic nature of the decision process. (e) RT quantiles for correct and error
responses; in this basic model, without inter-trial variability in drift or starting point, the
two distributions are closely matched. (f) Drift rate effects on both accuracy
and mean RT: with $a=\py{f"{boundary_key}"}$, a drift rate of $\mu=\py{f"{drift_key}"}$ yields
accuracy = \py{f"{accuracy_key:.3f}"} and mean RT = \py{f"{mean_rt_key:.3f}"} s (SD = \py{f"{std_rt_key:.3f}"} s).}
\label{fig:ddm}
\end{figure}

The drift-diffusion analysis with $\mu = \py{f"{drift_key}"}$ and $a = \py{f"{boundary_key}"}$
produces mean RT = \py{f"{mean_rt_key:.2f}"} s (median = \py{f"{median_rt_key:.2f}"} s) with
accuracy = \py{f"{100*accuracy_key:.1f}"}\%. The model captures the characteristic positive skew
of empirical RT distributions and the speed-accuracy tradeoff.
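
As a sanity check on the simulation, the closed-form results for the basic model with symmetric
boundaries and no inter-trial variability can be evaluated at the same parameters: accuracy is
$1/\left(1+e^{-2a\mu/\sigma^2}\right)$ and mean RT follows the theorem above (cf. Bogacz et al.,
2006). The sketch below reuses \texttt{drift\_key} and \texttt{boundary\_key}; the analytic values
should agree with the simulated \texttt{accuracy\_key} and \texttt{mean\_rt\_key} up to Monte Carlo
and discretization error.

\begin{pycode}
# Analytic predictions for the basic DDM (symmetric bounds at +/-a, unbiased start,
# drift mu, noise sigma), evaluated at the parameters simulated above.
sigma_ddm = 1.0   # diffusion noise used in the simulations
t_er = 0.3        # non-decision time used in the simulations
analytic_accuracy = 1.0 / (1.0 + np.exp(-2.0 * boundary_key * drift_key / sigma_ddm**2))
analytic_mean_rt = (boundary_key / drift_key) * np.tanh(boundary_key * drift_key / sigma_ddm**2) + t_er
# With mu = 0.4 and a = 1.2 these are roughly 0.72 and 1.64 s, which should match
# accuracy_key and mean_rt_key from the simulation.
\end{pycode}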

\section{Bayesian Decision Making}

\subsection{Decision Making Under Uncertainty}

Bayesian decision theory provides a normative framework for optimal decision making when beliefs
must be updated from noisy evidence. The decision maker maintains a posterior distribution over
states and chooses actions to minimize expected loss.

\begin{definition}[Bayes' Rule]
Given evidence $e$ and hypotheses $h_i$, the posterior probability is:
\begin{equation}
P(h_i | e) = \frac{P(e | h_i) P(h_i)}{\sum_j P(e | h_j) P(h_j)}
\end{equation}
\end{definition}

\begin{theorem}[Optimal Bayesian Decision]
The action $a^*$ that minimizes expected loss $L(a, h)$ is:
\begin{equation}
a^* = \arg\min_a \sum_h L(a, h) P(h | e)
\end{equation}
\end{theorem}

\subsection{Sequential Evidence Accumulation}

We simulate a medical diagnosis scenario where a physician must decide whether a patient has
a disease based on sequential test results. The Bayesian framework prescribes how beliefs should
be updated and when sufficient evidence has accumulated to make a decision.

\begin{pycode}
# Bayesian Decision Making
def bayesian_update(prior, likelihood):
    """Update a discrete belief distribution given the likelihood of the observed evidence"""
    posterior_unnorm = prior * likelihood
    return posterior_unnorm / np.sum(posterior_unnorm)

# Medical diagnosis scenario
n_tests = 20
sensitivity = 0.85   # P(test+ | disease)
specificity = 0.90   # P(test- | healthy)
prior_prob_disease = 0.10

# True state: patient has disease
true_state = 1

# Generate test results
np.random.seed(123)
test_results = []
for _ in range(n_tests):
    if true_state == 1:
        test_results.append(1 if np.random.rand() < sensitivity else 0)
    else:
        test_results.append(0 if np.random.rand() < specificity else 1)

# Sequential belief updating
beliefs = [prior_prob_disease]
for test in test_results:
    # Current belief as a distribution over [healthy, disease]
    prior = np.array([1 - beliefs[-1], beliefs[-1]])

    # Likelihood of the observed result under each state
    if test == 1:
        likelihood = np.array([1 - specificity, sensitivity])
    else:
        likelihood = np.array([specificity, 1 - sensitivity])

    # Bayesian update
    posterior = bayesian_update(prior, likelihood)
    beliefs.append(posterior[1])

beliefs = np.array(beliefs)

# Create Bayesian decision figure
fig4, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Sequential belief updating
ax1.plot(range(len(beliefs)), beliefs, 'b-o', linewidth=2, markersize=4)
ax1.axhline(0.5, color='red', linestyle='--', linewidth=2, label='Decision threshold')
ax1.axhline(prior_prob_disease, color='gray', linestyle=':', alpha=0.7, label='Prior')
ax1.fill_between(range(len(beliefs)), 0.5, beliefs, where=(beliefs > 0.5),
                 alpha=0.3, color='green', label='Diagnose disease')
ax1.fill_between(range(len(beliefs)), 0.5, beliefs, where=(beliefs <= 0.5),
                 alpha=0.3, color='blue', label='Diagnose healthy')
ax1.set_xlabel('Number of Tests', fontsize=11)
ax1.set_ylabel('P(Disease | Evidence)', fontsize=11)
ax1.set_title('Sequential Bayesian Belief Updating', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Plot 2: Likelihood ratio and evidence strength
log_likelihood_ratios = []
for test in test_results:
    if test == 1:
        lr = sensitivity / (1 - specificity)
    else:
        lr = (1 - sensitivity) / specificity
    log_likelihood_ratios.append(np.log(lr))

cumulative_log_lr = np.cumsum(log_likelihood_ratios)

ax2.plot(range(1, len(cumulative_log_lr)+1), cumulative_log_lr, 'b-o', linewidth=2, markersize=4)
ax2.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax2.axhline(np.log(3), color='green', linestyle='--', alpha=0.7, label='Moderate evidence (3:1)')
ax2.axhline(-np.log(3), color='red', linestyle='--', alpha=0.7, label='Moderate evidence (1:3)')
ax2.set_xlabel('Test Number', fontsize=11)
ax2.set_ylabel('Cumulative Log Likelihood Ratio', fontsize=11)
ax2.set_title('Evidence Accumulation (Log Odds)', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# Plot 3: Optimal decision boundary with costs
costs_fn = 1.0  # Cost of a false negative (missed disease)
costs_fp_range = np.linspace(0.1, 2.0, 50)
optimal_thresholds = []

for cost_fp in costs_fp_range:
    # Optimal threshold: the P(disease) at which expected costs of the two actions are equal,
    # i.e., cost_fp * (1 - p) = cost_fn * p
    threshold = cost_fp / (cost_fp + costs_fn)
    optimal_thresholds.append(threshold)

ax3.plot(costs_fp_range, optimal_thresholds, 'b-', linewidth=2.5)
ax3.axvline(costs_fn, color='gray', linestyle=':', alpha=0.7, label='Equal costs')
ax3.axhline(0.5, color='gray', linestyle=':', alpha=0.7)
ax3.set_xlabel('Cost of False Positive', fontsize=11)
ax3.set_ylabel('Optimal Decision Threshold', fontsize=11)
ax3.set_title(f'Optimal Threshold vs. Cost Ratio (FN cost = {costs_fn})', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)

# Plot 4: ROC curve and decision boundaries
# ROC implied by an equal-variance Gaussian (binormal) model whose d-prime reproduces
# the test's sensitivity and specificity, so the operating point lies on the curve
d_prime = norm.ppf(sensitivity) + norm.ppf(specificity)
fpr_range = np.linspace(0.001, 0.999, 200)
tpr_range = norm.cdf(d_prime + norm.ppf(fpr_range))

ax4.plot(fpr_range, tpr_range, 'b-', linewidth=2.5, label='ROC curve (binormal)')
ax4.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Chance')
ax4.fill_between([0, 1], [0, 1], [1, 1], alpha=0.2, color='green', label='Better than chance')
ax4.scatter([1 - specificity], [sensitivity], s=200, c='red', marker='*',
            edgecolors='black', linewidths=2, label=f'Test performance ({sensitivity}, {specificity})', zorder=5)
ax4.set_xlabel('False Positive Rate (1 - Specificity)', fontsize=11)
ax4.set_ylabel('True Positive Rate (Sensitivity)', fontsize=11)
ax4.set_title('ROC Curve: Diagnostic Performance', fontsize=12, fontweight='bold')
ax4.legend(fontsize=9)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decision_making_bayesian.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key statistics
final_belief = beliefs[-1]
n_positive_tests = np.sum(test_results)
final_log_odds = cumulative_log_lr[-1]
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_bayesian.pdf}
\caption{Bayesian decision making for medical diagnosis. (a) Sequential belief updating: starting
from a prior of P(disease) = \py{f"{prior_prob_disease}"}, the posterior probability evolves with
each test result (sensitivity = \py{f"{sensitivity}"}, specificity = \py{f"{specificity}"}). After
\py{n_tests} tests with \py{n_positive_tests} positive results, the final belief reaches \py{f"{final_belief:.3f}"}.
The decision threshold of 0.5 determines when to diagnose disease. (b) Cumulative log likelihood ratio
tracks total evidence strength: values above $\ln 3$ represent moderate evidence for disease. The final
cumulative log likelihood ratio is \py{f"{final_log_odds:.2f}"}, indicating strong evidence. (c) The
optimal decision threshold depends on the cost ratio: when false negatives are more costly than false
positives, the threshold should decrease to favor diagnosing disease; with equal costs, the threshold
is 0.5. (d) ROC curve (equal-variance binormal model matched to the test's sensitivity and specificity)
shows the tradeoff between true positive rate and false positive rate. The test's operating point
(red star) indicates better-than-chance discrimination, with the area under the curve summarizing
overall diagnostic accuracy.}
\label{fig:bayesian}
\end{figure}

The Bayesian analysis shows that after \py{n_tests} sequential tests, the posterior probability
of disease is \py{f"{final_belief:.3f}"}, corresponding to a cumulative log likelihood ratio of
\py{f"{final_log_odds:.2f}"}. This demonstrates how weak evidence from any single test accumulates
to produce strong diagnostic confidence, formalizing the intuitive notion that multiple
corroborating tests increase certainty.

\section{Reinforcement Learning}

\subsection{Value-Based Decision Making}

Reinforcement learning provides a framework for learning to make decisions through trial and error.
Agents learn value functions that predict expected future reward and use these to guide action selection.

\begin{definition}[Q-Learning]
The action-value function $Q(s, a)$ represents the expected return from taking action $a$ in state $s$.
The Q-learning update rule is:
\begin{equation}
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
\end{equation}
where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
\end{definition}

\begin{definition}[Softmax Action Selection]
Actions are selected probabilistically according to:
\begin{equation}
P(a|s) = \frac{\exp(\beta Q(s,a))}{\sum_{a'} \exp(\beta Q(s,a'))}
\end{equation}
where $\beta$ is the inverse temperature controlling the exploration-exploitation tradeoff.
\end{definition}
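
A small numerical sketch illustrates how the inverse temperature reshapes choice probabilities for
a fixed set of action values; the helper \texttt{softmax\_probs} below is introduced only for this
illustration, and the example Q-values match the arm means used in the simulation that follows.

\begin{pycode}
# Softmax choice probabilities for illustrative Q-values at two inverse temperatures.
def softmax_probs(q, beta):
    """Softmax with inverse temperature beta (max subtracted for numerical stability)."""
    e = np.exp(beta * (q - np.max(q)))
    return e / np.sum(e)

q_demo = np.array([1.0, 2.5, 1.5, 0.5])          # same values as the bandit arm means below
probs_exploratory = softmax_probs(q_demo, 0.5)    # low beta: probabilities spread across arms
probs_exploitative = softmax_probs(q_demo, 5.0)   # high beta: mass concentrated on the best arm
\end{pycode}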

\subsection{Multi-Armed Bandit Simulation}

We simulate a classic reinforcement learning task where an agent must learn which of several slot
machines (arms) provides the highest reward. The agent faces the exploration-exploitation dilemma:
whether to exploit known good options or explore alternatives that might be better.

\begin{pycode}
# Reinforcement Learning: Multi-armed bandit
class MultiArmedBandit:
    def __init__(self, n_arms, true_values):
        self.n_arms = n_arms
        self.true_values = true_values

    def pull_arm(self, arm):
        """Pull arm and receive a noisy reward"""
        return self.true_values[arm] + np.random.randn() * 0.5

class QLearningAgent:
    def __init__(self, n_arms, learning_rate=0.1, temperature=1.0):
        # Note: 'temperature' is used as the softmax inverse temperature beta
        self.n_arms = n_arms
        self.Q = np.zeros(n_arms)  # Q-values
        self.alpha = learning_rate
        self.beta = temperature
        self.action_counts = np.zeros(n_arms)

    def select_action(self):
        """Softmax action selection"""
        exp_Q = np.exp(self.beta * (self.Q - np.max(self.Q)))  # shift by max for numerical stability
        probs = exp_Q / np.sum(exp_Q)
        return np.random.choice(self.n_arms, p=probs)

    def update(self, action, reward):
        """Value update; for a stateless bandit the Q-learning rule reduces to this delta rule"""
        self.Q[action] += self.alpha * (reward - self.Q[action])
        self.action_counts[action] += 1

# Setup bandit task
n_arms = 4
true_values = np.array([1.0, 2.5, 1.5, 0.5])  # True mean rewards
n_trials = 500

# Simulate different learning rates
learning_rates = [0.01, 0.1, 0.5]
temperature = 2.0

fig5, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

for lr in learning_rates:
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=lr, temperature=temperature)

    Q_history = [agent.Q.copy()]
    rewards_history = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)

        Q_history.append(agent.Q.copy())
        rewards_history.append(reward)

    Q_history = np.array(Q_history)

    # Plot Q-value learning (only for one learning rate to avoid clutter)
    if lr == 0.1:
        for arm in range(n_arms):
            ax1.plot(Q_history[:, arm], label=f'Arm {arm+1} (true: {true_values[arm]:.1f})', alpha=0.8)

ax1.axhline(0, color='black', linestyle='-', linewidth=0.5)
for arm in range(n_arms):
    ax1.axhline(true_values[arm], color='gray', linestyle=':', alpha=0.5)
ax1.set_xlabel('Trial', fontsize=11)
ax1.set_ylabel('Q-value', fontsize=11)
ax1.set_title(f'Q-Learning Dynamics ($\\alpha=0.1$, $\\beta={temperature}$)', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)

# Compare learning rates
for lr in learning_rates:
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=lr, temperature=temperature)

    cumulative_reward = 0
    cumulative_rewards = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)
        cumulative_reward += reward
        cumulative_rewards.append(cumulative_reward)

    ax2.plot(cumulative_rewards, linewidth=2, label=f'$\\alpha={lr}$')

# Optimal cumulative reward
optimal_reward_per_trial = np.max(true_values)
optimal_cumulative = np.arange(1, n_trials+1) * optimal_reward_per_trial
ax2.plot(optimal_cumulative, 'k--', linewidth=2, label='Optimal')

ax2.set_xlabel('Trial', fontsize=11)
ax2.set_ylabel('Cumulative Reward', fontsize=11)
ax2.set_title('Learning Rate Comparison', fontsize=12, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

# Inverse temperature effects on exploration
temperatures = [0.5, 2.0, 5.0]
ax3_colors = ['blue', 'green', 'red']

for temp, color in zip(temperatures, ax3_colors):
    bandit = MultiArmedBandit(n_arms, true_values)
    agent = QLearningAgent(n_arms, learning_rate=0.1, temperature=temp)

    action_history = []

    for trial in range(n_trials):
        action = agent.select_action()
        reward = bandit.pull_arm(action)
        agent.update(action, reward)
        action_history.append(action)

    # Plot action selection frequency over time
    window = 50
    action_probs = []
    for i in range(window, n_trials):
        recent_actions = action_history[i-window:i]
        probs = [recent_actions.count(a) / window for a in range(n_arms)]
        action_probs.append(probs)

    action_probs = np.array(action_probs)
    best_arm = np.argmax(true_values)

    ax3.plot(range(window, n_trials), action_probs[:, best_arm],
             color=color, linewidth=2, label=f'$\\beta={temp}$')

ax3.set_xlabel('Trial', fontsize=11)
ax3.set_ylabel('P(Select Best Arm)', fontsize=11)
ax3.set_title('Inverse Temperature Effects: Exploration vs. Exploitation', fontsize=12, fontweight='bold')
ax3.legend(fontsize=10)
ax3.grid(True, alpha=0.3)
ax3.set_ylim(0, 1)

# Final Q-values and action selection probabilities
ax4_agent = QLearningAgent(n_arms, learning_rate=0.1, temperature=2.0)
bandit_final = MultiArmedBandit(n_arms, true_values)

for trial in range(n_trials):
    action = ax4_agent.select_action()
    reward = bandit_final.pull_arm(action)
    ax4_agent.update(action, reward)

x_pos = np.arange(n_arms)
width = 0.35

bars1 = ax4.bar(x_pos - width/2, true_values, width, label='True values', color='steelblue', edgecolor='black')
bars2 = ax4.bar(x_pos + width/2, ax4_agent.Q, width, label='Learned Q-values', color='coral', edgecolor='black')

# Add action selection probabilities as text
exp_Q = np.exp(ax4_agent.beta * ax4_agent.Q)
action_probs_final = exp_Q / np.sum(exp_Q)
for i, (bar, prob) in enumerate(zip(bars2, action_probs_final)):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height + 0.1,
             f'P={prob:.2f}', ha='center', va='bottom', fontsize=9)

ax4.set_xlabel('Arm', fontsize=11)
ax4.set_ylabel('Value', fontsize=11)
ax4.set_title(f'Learned Values After {n_trials} Trials', fontsize=12, fontweight='bold')
ax4.set_xticks(x_pos)
ax4.set_xticklabels([f'Arm {i+1}' for i in range(n_arms)])
ax4.legend(fontsize=10)
ax4.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('decision_making_rl.pdf', dpi=150, bbox_inches='tight')
plt.close()

# Calculate key RL statistics
final_Q = ax4_agent.Q
Q_errors = np.abs(final_Q - true_values)
mean_Q_error = np.mean(Q_errors)
best_arm = np.argmax(true_values)
learned_best = np.argmax(final_Q)
correct_identification = (learned_best == best_arm)
\end{pycode}

\begin{figure}[htbp]
\centering
\includegraphics[width=\textwidth]{decision_making_rl.pdf}
\caption{Reinforcement learning in a four-armed bandit task with true arm values of
[\py{', '.join([f"{v:.1f}" for v in true_values])}]. (a) Q-learning dynamics with $\alpha=0.1$
and $\beta=\py{f"{temperature}"}$: Q-values converge toward the true values through reward feedback,
with learning speed depending on how often each arm is selected. Dotted lines show true values.
(b) Learning rate effects on cumulative reward: faster learning ($\alpha=0.5$) initially
outperforms but may be less stable; slower learning ($\alpha=0.01$) produces smoother but
delayed convergence. The black dashed line shows optimal performance (always selecting the best arm).
(c) The inverse temperature controls the exploration-exploitation balance: low $\beta=0.5$
produces more exploration and slower convergence on the best arm, whereas high $\beta=5.0$
exploits the current best estimate quickly but can lock onto a suboptimal arm. (d) Final learned
Q-values after \py{n_trials} trials (coral) compared to true values (blue), with action selection
probabilities shown above the bars. The agent correctly identifies the best arm (Arm 2) with Q-value
\py{f"{final_Q[1]:.2f}"} vs. true \py{f"{true_values[1]:.2f}"}, mean absolute error = \py{f"{mean_Q_error:.2f}"}.}
\label{fig:rl}
\end{figure}

The reinforcement learning simulation demonstrates value-based decision making through Q-learning.
After \py{n_trials} trials with $\alpha = 0.1$ and $\beta = \py{f"{temperature}"}$, the agent's
learned Q-values have a mean absolute error of \py{f"{mean_Q_error:.3f}"} relative to the true values,
and it correctly identifies the best arm (Arm \py{best_arm + 1}) with selection probability
\py{f"{action_probs_final[best_arm]:.2f}"}. This illustrates how experience-based learning
converges to near-optimal policies.

\section{Model Comparison and Integration}

The five computational frameworks presented here address different aspects of decision making:

\begin{itemize}
\item \textbf{Expected Utility Theory} provides the normative benchmark for rational choice under risk
\item \textbf{Prospect Theory} captures systematic deviations including loss aversion and probability distortion
\item \textbf{Drift-Diffusion Models} explain response time distributions and speed-accuracy tradeoffs
\item \textbf{Bayesian Decision Making} prescribes optimal belief updating under uncertainty
\item \textbf{Reinforcement Learning} accounts for value learning through experience
\end{itemize}

\begin{remark}[Integration of Frameworks]
Recent research integrates these approaches: drift-diffusion can be derived from Bayesian sequential
analysis; reinforcement learning models can incorporate prospect theory's asymmetric value function;
and neural implementations suggest the brain approximates Bayesian inference through sampling mechanisms.
\end{remark}

\section{Conclusions}

This computational analysis reveals the mechanistic underpinnings of human decision making:

\begin{enumerate}
\item Expected utility with $\alpha = \py{f"{alpha_moderate}"}$ predicts a risk premium of
\$\py{f"{abs(rp_moderate):.2f}"} for a 50-50 gamble of $\pm$\$50
\item Prospect theory with $\lambda = \py{f"{lambda_tk}"}$ captures loss aversion, with losses
weighted \py{f"{loss_aversion_ratio:.2f}"} times as heavily as gains
\item Drift-diffusion with $\mu = \py{f"{drift_key}"}$ and $a = \py{f"{boundary_key}"}$ produces
mean RT = \py{f"{mean_rt_key:.2f}"} s and accuracy = \py{f"{100*accuracy_key:.1f}"}\%
\item Bayesian updating with sensitivity = \py{f"{sensitivity}"} and specificity = \py{f"{specificity}"}
achieves posterior confidence of \py{f"{final_belief:.3f}"} after \py{n_tests} tests
\item Q-learning with $\alpha = 0.1$ converges to within \py{f"{mean_Q_error:.3f}"} of the true values
after \py{n_trials} trials
\end{enumerate}

These computational models provide quantitative frameworks for understanding both optimal decision
strategies and systematic deviations from rationality observed in human behavior.

\section*{References}

\begin{itemize}
\item von Neumann, J., \& Morgenstern, O. (1944). \textit{Theory of Games and Economic Behavior}. Princeton University Press.
\item Kahneman, D., \& Tversky, A. (1979). Prospect theory: An analysis of decision under risk. \textit{Econometrica}, 47(2), 263-291.
\item Tversky, A., \& Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. \textit{Journal of Risk and Uncertainty}, 5(4), 297-323.
\item Ratcliff, R., \& McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. \textit{Neural Computation}, 20(4), 873-922.
\item Ratcliff, R., Smith, P. L., Brown, S. D., \& McKoon, G. (2016). Diffusion decision model: Current issues and history. \textit{Trends in Cognitive Sciences}, 20(4), 260-281.
\item Bogacz, R., Brown, E., Moehlis, J., Holmes, P., \& Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. \textit{Psychological Review}, 113(4), 700-765.
\item Edwards, W., Lindman, H., \& Savage, L. J. (1963). Bayesian statistical inference for psychological research. \textit{Psychological Review}, 70(3), 193-242.
\item Sutton, R. S., \& Barto, A. G. (2018). \textit{Reinforcement Learning: An Introduction} (2nd ed.). MIT Press.
\item Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., \& Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. \textit{Nature}, 441(7095), 876-879.
\item Dayan, P., \& Daw, N. D. (2008). Decision theory, reinforcement learning, and the brain. \textit{Cognitive, Affective, \& Behavioral Neuroscience}, 8(4), 429-453.
\item Gold, J. I., \& Shadlen, M. N. (2007). The neural basis of decision making. \textit{Annual Review of Neuroscience}, 30, 535-574.
\item Busemeyer, J. R., \& Diederich, A. (2010). \textit{Cognitive Modeling}. Sage Publications.
\item Prelec, D. (1998). The probability weighting function. \textit{Econometrica}, 66(3), 497-527.
\item Rescorla, R. A., \& Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black \& W. F. Prokasy (Eds.), \textit{Classical Conditioning II: Current Research and Theory} (pp. 64-99). Appleton-Century-Crofts.
\item Glimcher, P. W. (2011). \textit{Foundations of Neuroeconomic Analysis}. Oxford University Press.
\item Lee, M. D., \& Wagenmakers, E. J. (2013). \textit{Bayesian Cognitive Modeling: A Practical Course}. Cambridge University Press.
\item Wald, A. (1947). \textit{Sequential Analysis}. John Wiley \& Sons.
\item Tversky, A., \& Fox, C. R. (1995). Weighing risk and uncertainty. \textit{Psychological Review}, 102(2), 269-283.
\item Forstmann, B. U., Ratcliff, R., \& Wagenmakers, E. J. (2016). Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. \textit{Annual Review of Psychology}, 67, 641-666.
\item Niv, Y. (2009). Reinforcement learning in the brain. \textit{Journal of Mathematical Psychology}, 53(3), 139-154.
\end{itemize}

\end{document}