Path: blob/main/latex-templates/templates/marine-biology/population_genetics.tex
51 views
unlisted
% Marine Population Genetics Analysis Template1% Topics: Hardy-Weinberg equilibrium, genetic drift, F-statistics, gene flow, effective population size2% Style: Quantitative analysis with simulation-based approaches34\documentclass[11pt,a4paper]{article}5\usepackage[utf8]{inputenc}6\usepackage[T1]{fontenc}7\usepackage{amsmath,amssymb}8\usepackage{graphicx}9\usepackage{booktabs}10\usepackage{siunitx}11\usepackage{geometry}12\geometry{margin=1in}13\usepackage[makestderr]{pythontex}14\usepackage{hyperref}15\usepackage{float}16\usepackage{subcaption}1718% Theorem environments19\newtheorem{definition}{Definition}[section]20\newtheorem{theorem}{Theorem}[section]21\newtheorem{example}{Example}[section]22\newtheorem{remark}{Remark}[section]2324\title{Marine Population Genetics: Drift, Gene Flow, and F-Statistics}25\author{Marine Biology Research Group}26\date{\today}2728\begin{document}29\maketitle3031\begin{abstract}32This report presents a comprehensive computational analysis of marine population genetics, examining33fundamental evolutionary forces that shape genetic diversity in marine organisms. We simulate34Hardy-Weinberg equilibrium conditions, quantify genetic drift using Wright-Fisher models,35estimate effective population size (Ne) from temporal allele frequency changes, model gene flow36under island and stepping-stone migration patterns, and calculate F-statistics (FST, FIS, FIT) to37assess population structure. The analysis demonstrates how restricted dispersal, small population38sizes, and variable larval connectivity create hierarchical genetic structure in marine metapopulations.39\end{abstract}4041\section{Introduction}4243Marine population genetics presents unique challenges and opportunities for understanding evolutionary44processes. Unlike terrestrial organisms with relatively predictable dispersal patterns, marine species45often exhibit biphasic life histories with sedentary adults and planktonic larvae capable of long-distance46dispersal via ocean currents. This dichotomy creates complex genetic patterns ranging from panmixia47across vast ocean basins to fine-scale population structure driven by oceanographic barriers and larval48retention zones.4950\begin{definition}[Hardy-Weinberg Equilibrium]51For a diploid locus with alleles $A$ and $a$ at frequencies $p$ and $q$ ($p + q = 1$), the52Hardy-Weinberg equilibrium predicts genotype frequencies:53\begin{align}54f(AA) &= p^2 \\55f(Aa) &= 2pq \\56f(aa) &= q^257\end{align}58Equilibrium requires: (1) infinite population size, (2) random mating, (3) no mutation,59(4) no migration, and (5) no selection.60\end{definition}6162Deviations from Hardy-Weinberg equilibrium reveal the action of evolutionary forces. In marine systems,63genetic drift, gene flow via larval dispersal, and local adaptation to environmental gradients are64primary drivers of genetic structure.6566\begin{pycode}67import numpy as np68import matplotlib.pyplot as plt69from scipy import stats70plt.rcParams['text.usetex'] = True71plt.rcParams['font.family'] = 'serif'7273np.random.seed(42)7475# Global genetic parameters76p_initial = 0.6 # Initial frequency of allele A77q_initial = 1 - p_initial7879# Hardy-Weinberg genotype frequencies80f_AA_HW = p_initial**281f_Aa_HW = 2 * p_initial * q_initial82f_aa_HW = q_initial**28384# Effective population sizes85Ne_small = 5086Ne_medium = 20087Ne_large = 10008889# Migration rates90m_low = 0.0191m_medium = 0.0592m_high = 0.1093\end{pycode}9495\section{Theoretical Framework}9697\subsection{Wright-Fisher Model and Genetic Drift}9899\begin{theorem}[Wright-Fisher Drift]100In a finite diploid population of size $N$, allele frequencies change stochastically each generation.101For an allele at frequency $p$, the next generation frequency $p'$ follows:102\begin{equation}103p' \sim \text{Binomial}(2N, p) / (2N)104\end{equation}105The variance in allele frequency per generation is:106\begin{equation}107\text{Var}(\Delta p) = \frac{p(1-p)}{2N}108\end{equation}109\end{theorem}110111Genetic drift is stronger in smaller populations, leading to faster fixation or loss of alleles.112In marine systems, effective population size ($N_e$) is often orders of magnitude smaller than113census size due to variance in reproductive success, particularly in broadcast spawners with114Type III survivorship curves. We simulate drift trajectories under varying population sizes to115demonstrate the stochastic nature of allele frequency evolution.116117\begin{pycode}118# Simulate Wright-Fisher genetic drift119generations = 200120num_replicates = 20121122# Drift simulation function123def simulate_drift(Ne, p0, gens):124"""Simulate Wright-Fisher drift for given Ne and initial frequency"""125p_trajectory = np.zeros(gens)126p_trajectory[0] = p0127for t in range(1, gens):128# Binomial sampling of 2N alleles129num_A_alleles = np.random.binomial(2 * Ne, p_trajectory[t-1])130p_trajectory[t] = num_A_alleles / (2 * Ne)131# Stop if fixed or lost132if p_trajectory[t] == 0 or p_trajectory[t] == 1:133p_trajectory[t:] = p_trajectory[t]134break135return p_trajectory136137# Run multiple replicates for three population sizes138drift_small = np.array([simulate_drift(Ne_small, p_initial, generations) for _ in range(num_replicates)])139drift_medium = np.array([simulate_drift(Ne_medium, p_initial, generations) for _ in range(num_replicates)])140drift_large = np.array([simulate_drift(Ne_large, p_initial, generations) for _ in range(num_replicates)])141142# Calculate fixation probabilities143fixed_small = np.sum((drift_small[:, -1] == 1) | (drift_small[:, -1] == 0)) / num_replicates144fixed_medium = np.sum((drift_medium[:, -1] == 1) | (drift_medium[:, -1] == 0)) / num_replicates145fixed_large = np.sum((drift_large[:, -1] == 1) | (drift_large[:, -1] == 0)) / num_replicates146147# Create drift visualization148fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))149150time_axis = np.arange(generations)151152for i, (ax, drift_data, Ne_val, title) in enumerate(zip(153axes,154[drift_small, drift_medium, drift_large],155[Ne_small, Ne_medium, Ne_large],156[f'$N_e$ = {Ne_small}', f'$N_e$ = {Ne_medium}', f'$N_e$ = {Ne_large}']157)):158for replicate in drift_data:159ax.plot(time_axis, replicate, linewidth=0.8, alpha=0.6, color='steelblue')160ax.axhline(y=p_initial, color='red', linestyle='--', linewidth=1.5, alpha=0.7, label='Initial $p$')161ax.axhline(y=1, color='black', linestyle=':', linewidth=1, alpha=0.5)162ax.axhline(y=0, color='black', linestyle=':', linewidth=1, alpha=0.5)163ax.set_xlabel('Generation', fontsize=11)164ax.set_ylabel('Allele Frequency ($p$)', fontsize=11)165ax.set_title(title, fontsize=12)166ax.set_ylim(-0.05, 1.05)167ax.grid(True, alpha=0.3)168if i == 0:169ax.legend(fontsize=9)170171plt.tight_layout()172plt.savefig('population_genetics_drift.pdf', dpi=150, bbox_inches='tight')173plt.close()174\end{pycode}175176\begin{figure}[H]177\centering178\includegraphics[width=\textwidth]{population_genetics_drift.pdf}179\caption{Wright-Fisher genetic drift simulations across varying effective population sizes. Twenty180replicate populations initiated at $p = 0.6$ show stochastic trajectories toward fixation or loss.181Small populations ($N_e = 50$) exhibit rapid, erratic fluctuations with frequent fixation within 200182generations. Medium populations ($N_e = 200$) show moderate drift with some trajectories approaching183boundaries. Large populations ($N_e = 1000$) maintain allele frequencies near initial values, demonstrating184weak drift. This illustrates the inverse relationship between $N_e$ and drift strength, critical for185understanding genetic diversity loss in small marine populations such as isolated coral reef fish assemblages.}186\end{figure}187188The simulation results reveal that after \py{generations} generations, approximately \py{f"{fixed_small*100:.0f}"}\%189of small populations fixed for one allele, compared to \py{f"{fixed_medium*100:.0f}"}\% in medium and190\py{f"{fixed_large*100:.0f}"}\% in large populations. This demonstrates the profound impact of effective191population size on the maintenance of genetic diversity in marine metapopulations.192193\subsection{Temporal Estimation of Effective Population Size}194195\begin{theorem}[Temporal Method for $N_e$]196The effective population size can be estimated from temporal changes in allele frequencies.197For samples separated by $t$ generations with standardized variance in frequency change $F_t$:198\begin{equation}199\hat{N}_e = \frac{t}{2(F_t - F_0)}200\end{equation}201where $F_0$ accounts for sampling error: $F_0 = 1/(2S_1) + 1/(2S_2)$ for sample sizes $S_1$ and $S_2$.202\end{theorem}203204Marine conservation often requires estimating effective population size from genetic samples collected205at different time points. This temporal method leverages the predictable variance in allele frequency206change due to drift. We simulate temporal sampling from a population undergoing drift and demonstrate207$N_e$ estimation accuracy under realistic sampling scenarios.208209\begin{pycode}210# Temporal Ne estimation simulation211true_Ne_temporal = 150212sample_times = [0, 10, 20, 30, 50]213sample_size = 50214num_loci = 20215216# Simulate temporal allele frequency data217def simulate_temporal_sampling(Ne, p0, times, S, n_loci):218"""Simulate multiple loci sampled at different time points"""219results = {t: [] for t in times}220for locus in range(n_loci):221# Generate drift trajectory222trajectory = simulate_drift(Ne, p0, max(times) + 1)223# Sample at each time point224for t in times:225# Binomial sampling to mimic finite sample226sampled_count = np.random.binomial(2 * S, trajectory[t])227sampled_freq = sampled_count / (2 * S)228results[t].append(sampled_freq)229return results230231temporal_data = simulate_temporal_sampling(true_Ne_temporal, p_initial, sample_times, sample_size, num_loci)232233# Calculate F statistics and estimate Ne234def estimate_Ne_temporal(freq_t0, freq_t, t_gen, S1, S2):235"""Estimate Ne from temporal allele frequency change"""236# Calculate sample variance237freq_changes = np.array(freq_t) - np.array(freq_t0)238Fc = np.var(freq_changes, ddof=1)239# Sampling correction240F0 = 1/(2*S1) + 1/(2*S2)241# Prevent negative denominator242if Fc <= F0:243return np.inf244# Estimate Ne245Ne_hat = t_gen / (2 * (Fc - F0))246return Ne_hat247248Ne_estimates = []249time_intervals = []250for i in range(1, len(sample_times)):251t_interval = sample_times[i] - sample_times[0]252Ne_hat = estimate_Ne_temporal(253temporal_data[sample_times[0]],254temporal_data[sample_times[i]],255t_interval,256sample_size,257sample_size258)259Ne_estimates.append(Ne_hat)260time_intervals.append(t_interval)261262# Calculate mean allele frequencies over time263mean_freqs = [np.mean(temporal_data[t]) for t in sample_times]264std_freqs = [np.std(temporal_data[t], ddof=1) for t in sample_times]265266# Create temporal visualization267fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))268269# Plot allele frequency trajectories270for locus_idx in range(num_loci):271locus_freqs = [temporal_data[t][locus_idx] for t in sample_times]272ax1.plot(sample_times, locus_freqs, 'o-', linewidth=1, alpha=0.5, markersize=4)273ax1.plot(sample_times, mean_freqs, 'ro-', linewidth=2.5, markersize=8, label='Mean across loci')274ax1.axhline(y=p_initial, color='black', linestyle='--', linewidth=1.5, alpha=0.7, label='Initial $p$')275ax1.fill_between(sample_times,276np.array(mean_freqs) - np.array(std_freqs),277np.array(mean_freqs) + np.array(std_freqs),278alpha=0.2, color='red')279ax1.set_xlabel('Generation', fontsize=12)280ax1.set_ylabel('Allele Frequency', fontsize=12)281ax1.set_title(f'Temporal Sampling ($N_e$ = {true_Ne_temporal}, $n$ = {num_loci} loci)', fontsize=12)282ax1.legend(fontsize=10)283ax1.grid(True, alpha=0.3)284285# Plot Ne estimates vs time interval286ax2.plot(time_intervals, Ne_estimates, 'bs-', linewidth=2, markersize=8, label='$\\hat{N}_e$ estimates')287ax2.axhline(y=true_Ne_temporal, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'True $N_e$ = {true_Ne_temporal}')288ax2.set_xlabel('Time Interval (generations)', fontsize=12)289ax2.set_ylabel('Estimated $N_e$', fontsize=12)290ax2.set_title('Temporal $N_e$ Estimation', fontsize=12)291ax2.legend(fontsize=10)292ax2.grid(True, alpha=0.3)293ax2.set_ylim(0, max(500, max(Ne_estimates) * 1.2))294295plt.tight_layout()296plt.savefig('population_genetics_temporal_Ne.pdf', dpi=150, bbox_inches='tight')297plt.close()298\end{pycode}299300\begin{figure}[H]301\centering302\includegraphics[width=\textwidth]{population_genetics_temporal_Ne.pdf}303\caption{Temporal effective population size estimation from multi-locus genetic data. Left panel shows304allele frequency trajectories for 20 independent loci sampled at five time points over 50 generations,305with red line indicating mean frequency and shaded region showing one standard deviation. Right panel306displays $\hat{N}_e$ estimates derived from temporal variance at different sampling intervals, converging307toward the true value ($N_e = 150$, red dashed line) as sampling interval increases. Short intervals yield308unstable estimates due to low signal-to-noise ratio, while intermediate intervals (20-30 generations)309provide optimal precision. This temporal method is particularly valuable for marine species with overlapping310generations or when contemporary samples can be compared to historical museum specimens.}311\end{figure}312313\section{Gene Flow and Migration Models}314315\subsection{Island Model of Migration}316317\begin{definition}[Island Model]318In Wright's island model, a metapopulation consists of $n$ demes exchanging migrants at rate $m$.319The change in local allele frequency $p_i$ per generation is:320\begin{equation}321\Delta p_i = m(\bar{p} - p_i)322\end{equation}323where $\bar{p}$ is the mean allele frequency across all demes. At equilibrium between drift and migration:324\begin{equation}325F_{ST} \approx \frac{1}{4N_em + 1}326\end{equation}327\end{definition}328329Marine larvae often disperse via ocean currents, creating gene flow that homogenizes allele frequencies330across populations. The strength of this homogenization depends on the product $Nm$—the number of331migrants per generation. We simulate the island model under varying migration rates to demonstrate332the balance between local drift and global gene flow.333334\begin{pycode}335# Island model gene flow simulation336num_islands = 10337Ne_island = 100338migration_rates = [0.001, 0.01, 0.05, 0.10]339island_generations = 300340341def simulate_island_model(num_demes, Ne, m, p0, gens):342"""Simulate Wright's island model with migration"""343# Initialize all islands with same frequency + small random variation344p_islands = np.full((num_demes, gens), p0)345p_islands[:, 0] += np.random.normal(0, 0.05, num_demes)346p_islands[:, 0] = np.clip(p_islands[:, 0], 0.01, 0.99)347348for t in range(1, gens):349# Calculate metapopulation mean350p_mean = np.mean(p_islands[:, t-1])351352for i in range(num_demes):353# Migration: mix with metapopulation354p_after_migration = (1 - m) * p_islands[i, t-1] + m * p_mean355356# Drift: binomial sampling357num_alleles = np.random.binomial(2 * Ne, p_after_migration)358p_islands[i, t] = num_alleles / (2 * Ne)359360# Prevent fixation for visualization361p_islands[i, t] = np.clip(p_islands[i, t], 0.001, 0.999)362363return p_islands364365# Run simulations for different migration rates366island_results = {}367for m in migration_rates:368island_results[m] = simulate_island_model(num_islands, Ne_island, m, p_initial, island_generations)369370# Calculate FST at each generation for each migration rate371def calculate_Fst_temporal(p_islands):372"""Calculate FST over time from island frequencies"""373Fst_over_time = []374for t in range(p_islands.shape[1]):375p_mean = np.mean(p_islands[:, t])376# Expected heterozygosity in total population377HT = 2 * p_mean * (1 - p_mean)378# Mean expected heterozygosity within subpopulations379HS = np.mean([2 * p * (1 - p) for p in p_islands[:, t]])380# FST381if HT > 0:382Fst = (HT - HS) / HT383else:384Fst = 0385Fst_over_time.append(Fst)386return np.array(Fst_over_time)387388Fst_results = {m: calculate_Fst_temporal(island_results[m]) for m in migration_rates}389390# Create island model visualization391fig, axes = plt.subplots(2, 2, figsize=(14, 10))392axes = axes.flatten()393394gen_axis = np.arange(island_generations)395396for idx, m in enumerate(migration_rates):397ax = axes[idx]398399# Plot individual island trajectories400for island_idx in range(num_islands):401ax.plot(gen_axis, island_results[m][island_idx, :], linewidth=0.8, alpha=0.6)402403# Plot mean frequency404mean_freq = np.mean(island_results[m], axis=0)405ax.plot(gen_axis, mean_freq, 'r-', linewidth=2.5, label='Metapopulation mean')406407# Calculate equilibrium FST408Fst_eq_theory = 1 / (4 * Ne_island * m + 1)409Fst_eq_observed = np.mean(Fst_results[m][-50:]) # Average last 50 generations410411ax.axhline(y=p_initial, color='black', linestyle='--', linewidth=1, alpha=0.5)412ax.set_xlabel('Generation', fontsize=11)413ax.set_ylabel('Allele Frequency ($p$)', fontsize=11)414ax.set_title(f'$m$ = {m}, $Nm$ = {Ne_island * m:.1f}, $F_{{ST}}$ = {Fst_eq_observed:.3f}', fontsize=11)415ax.set_ylim(0, 1)416ax.grid(True, alpha=0.3)417if idx == 0:418ax.legend(fontsize=9)419420plt.tight_layout()421plt.savefig('population_genetics_island_model.pdf', dpi=150, bbox_inches='tight')422plt.close()423\end{pycode}424425\begin{figure}[H]426\centering427\includegraphics[width=\textwidth]{population_genetics_island_model.pdf}428\caption{Wright's island model simulations under varying migration rates across 10 demes with $N_e = 100$.429Each panel shows allele frequency trajectories for individual islands (colored lines) and metapopulation mean430(red line). At low migration ($m = 0.001$, $Nm = 0.1$), drift dominates and islands diverge strongly despite431gene flow. At moderate migration ($m = 0.01$, $Nm = 1$), partial homogenization occurs with observable divergence.432At high migration ($m \geq 0.05$, $Nm \geq 5$), gene flow overwhelms drift and islands remain nearly synchronized.433The observed $F_{ST}$ values approach theoretical predictions $F_{ST} \approx 1/(4Nm + 1)$, demonstrating that434even modest larval connectivity (1-2 migrants per generation) can maintain genetic cohesion across marine435metapopulations.}436\end{figure}437438\section{F-Statistics and Population Structure}439440\subsection{Hierarchical F-Statistics}441442\begin{definition}[Wright's F-Statistics]443F-statistics quantify departures from Hardy-Weinberg equilibrium at hierarchical levels:444\begin{align}445F_{IS} &= \frac{H_S - H_I}{H_S} \quad \text{(inbreeding within subpopulations)} \\446F_{ST} &= \frac{H_T - H_S}{H_T} \quad \text{(differentiation among subpopulations)} \\447F_{IT} &= \frac{H_T - H_I}{H_T} \quad \text{(total inbreeding)}448\end{align}449where $H_I$ is observed heterozygosity, $H_S$ is expected heterozygosity within subpopulations,450and $H_T$ is expected heterozygosity in the total population. These satisfy:451\begin{equation}452(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})453\end{equation}454\end{definition}455456F-statistics provide a standardized framework for assessing genetic structure. In marine systems,457$F_{ST}$ values typically range from near zero (panmictic species with high dispersal) to 0.3-0.5458(species with restricted dispersal or strong barriers). We simulate a hierarchical metapopulation459structure and calculate F-statistics to demonstrate their interpretation.460461\begin{pycode}462# Hierarchical population structure simulation463num_regions = 3464num_populations_per_region = 4465Ne_pop = 80466m_within_region = 0.08 # High migration within region467m_between_region = 0.005 # Low migration between regions468structure_generations = 400469470def simulate_hierarchical_structure(n_regions, n_pops_per_region, Ne, m_within, m_between, p0, gens):471"""Simulate hierarchical population structure with regional clustering"""472total_pops = n_regions * n_pops_per_region473p_pops = np.full((total_pops, gens), p0)474475# Initialize with regional structure476for region in range(n_regions):477region_start = region * n_pops_per_region478region_end = region_start + n_pops_per_region479# Add regional variation480p_pops[region_start:region_end, 0] += np.random.normal(0, 0.1, n_pops_per_region)481p_pops[region_start:region_end, 0] = np.clip(p_pops[region_start:region_end, 0], 0.1, 0.9)482483for t in range(1, gens):484# Calculate regional means485regional_means = []486for region in range(n_regions):487region_start = region * n_pops_per_region488region_end = region_start + n_pops_per_region489regional_means.append(np.mean(p_pops[region_start:region_end, t-1]))490491# Global mean492global_mean = np.mean(p_pops[:, t-1])493494for pop_idx in range(total_pops):495# Determine which region this population belongs to496region = pop_idx // n_pops_per_region497region_start = region * n_pops_per_region498region_end = region_start + n_pops_per_region499500# Migration within region501regional_mean = regional_means[region]502p_after_within = (1 - m_within) * p_pops[pop_idx, t-1] + m_within * regional_mean503504# Migration between regions505p_after_between = (1 - m_between) * p_after_within + m_between * global_mean506507# Drift508num_alleles = np.random.binomial(2 * Ne, p_after_between)509p_pops[pop_idx, t] = num_alleles / (2 * Ne)510p_pops[pop_idx, t] = np.clip(p_pops[pop_idx, t], 0.01, 0.99)511512return p_pops513514hierarchical_data = simulate_hierarchical_structure(515num_regions, num_populations_per_region, Ne_pop,516m_within_region, m_between_region, p_initial, structure_generations517)518519# Calculate F-statistics at final generation520final_freqs = hierarchical_data[:, -1]521522# Overall mean523p_total = np.mean(final_freqs)524HT = 2 * p_total * (1 - p_total)525526# Within-population heterozygosity (assuming HWE within pops)527HI_list = [2 * p * (1 - p) for p in final_freqs]528HI = np.mean(HI_list)529530# Among-population heterozygosity531HS = HI # Under HWE assumption, HS = HI532533# Among-region variance534regional_freqs = []535for region in range(num_regions):536region_start = region * num_populations_per_region537region_end = region_start + num_populations_per_region538regional_freqs.append(np.mean(final_freqs[region_start:region_end]))539540# Calculate FST total (among all populations)541Fst_total = (HT - HS) / HT if HT > 0 else 0542543# Calculate FSC (among populations within regions)544regional_HS = []545for region in range(num_regions):546region_start = region * num_populations_per_region547region_end = region_start + num_populations_per_region548region_p = np.mean(final_freqs[region_start:region_end])549regional_HS.append(2 * region_p * (1 - region_p))550HC = np.mean(regional_HS) # Mean heterozygosity among clusters551Fsc = (HC - HS) / HC if HC > 0 else 0552553# Calculate FCT (among regions)554Fct = (HT - HC) / HT if HT > 0 else 0555556# Create 2D population structure heatmap557fig = plt.figure(figsize=(14, 6))558559# Left: Heatmap of allele frequencies560ax1 = plt.subplot(1, 2, 1)561time_samples = np.linspace(0, structure_generations-1, 50, dtype=int)562heatmap_data = hierarchical_data[:, time_samples]563564im = ax1.imshow(heatmap_data, aspect='auto', cmap='RdYlBu_r', vmin=0, vmax=1,565extent=[0, structure_generations, num_regions * num_populations_per_region, 0])566567# Add region separators568for region in range(1, num_regions):569ax1.axhline(y=region * num_populations_per_region - 0.5, color='black', linewidth=2)570571ax1.set_xlabel('Generation', fontsize=12)572ax1.set_ylabel('Population', fontsize=12)573ax1.set_title('Hierarchical Population Structure', fontsize=12)574575# Add region labels576for region in range(num_regions):577mid_point = region * num_populations_per_region + num_populations_per_region / 2578ax1.text(-30, mid_point, f'Region {region+1}', fontsize=10, ha='right', va='center')579580cbar = plt.colorbar(im, ax=ax1)581cbar.set_label('Allele Frequency ($p$)', fontsize=11)582583# Right: Final generation population structure584ax2 = plt.subplot(1, 2, 2)585586pop_indices = np.arange(len(final_freqs))587colors = plt.cm.Set3(np.repeat(np.arange(num_regions), num_populations_per_region))588589ax2.bar(pop_indices, final_freqs, color=colors, edgecolor='black', linewidth=0.8)590ax2.axhline(y=p_total, color='red', linestyle='--', linewidth=2, label=f'Total mean = {p_total:.3f}')591592# Add regional means593for region in range(num_regions):594region_start = region * num_populations_per_region595region_end = region_start + num_populations_per_region596ax2.hlines(regional_freqs[region], region_start - 0.5, region_end - 0.5,597colors='black', linewidth=2, alpha=0.7)598599ax2.set_xlabel('Population', fontsize=12)600ax2.set_ylabel('Allele Frequency ($p$)', fontsize=12)601ax2.set_title(f'Final Generation ($F_{{ST}}$ = {Fst_total:.3f}, $F_{{CT}}$ = {Fct:.3f})', fontsize=12)602ax2.set_ylim(0, 1)603ax2.legend(fontsize=10)604ax2.grid(True, alpha=0.3, axis='y')605606plt.tight_layout()607plt.savefig('population_genetics_Fstats.pdf', dpi=150, bbox_inches='tight')608plt.close()609\end{pycode}610611\begin{figure}[H]612\centering613\includegraphics[width=\textwidth]{population_genetics_Fstats.pdf}614\caption{Hierarchical population structure and F-statistics in a simulated marine metapopulation with three615regions, each containing four local populations. Left panel shows spatiotemporal heatmap of allele frequencies616over 400 generations, with black lines separating regions. High within-region migration ($m = 0.08$) homogenizes617local populations within each region, while low between-region migration ($m = 0.005$) allows regional divergence.618Right panel displays final generation allele frequencies (bars colored by region), regional means (black horizontal619lines), and global mean (red dashed line). The observed $F_{CT}$ quantifies among-region differentiation, while620$F_{SC}$ measures within-region structure. This hierarchical pattern mimics marine species with geographically621clustered populations separated by oceanographic barriers such as currents, upwelling zones, or temperature gradients.}622\end{figure}623624\section{Isolation-by-Distance}625626\subsection{Spatial Genetic Structure}627628\begin{theorem}[Isolation-by-Distance]629Under a stepping-stone model where gene flow decreases with geographic distance, genetic630differentiation increases with distance. The relationship between $F_{ST}/(1-F_{ST})$ and631geographic distance $d$ is often linear:632\begin{equation}633\frac{F_{ST}}{1 - F_{ST}} = a + b \cdot d634\end{equation}635The slope $b$ reflects the rate of genetic differentiation with distance and depends on636dispersal ability and effective density.637\end{theorem}638639Many marine species exhibit isolation-by-distance (IBD) patterns where populations separated by640greater distances show higher genetic differentiation. This arises from limited larval dispersal641relative to the species' geographic range. We simulate a one-dimensional stepping-stone model642along a coastline and test for IBD using Mantel tests.643644\begin{pycode}645# Isolation-by-distance simulation646num_sites = 15647distance_between_sites = 50 # km648Ne_site = 100649m_neighbor = 0.03 # Migration to adjacent sites650ibd_generations = 500651652def simulate_stepping_stone(n_sites, Ne, m, p0, gens):653"""Simulate one-dimensional stepping-stone model"""654p_sites = np.full((n_sites, gens), p0)655p_sites[:, 0] += np.random.normal(0, 0.05, n_sites)656p_sites[:, 0] = np.clip(p_sites[:, 0], 0.1, 0.9)657658for t in range(1, gens):659p_new = np.copy(p_sites[:, t-1])660661for i in range(n_sites):662# Migration from neighbors663if i > 0: # Left neighbor664p_new[i] = (1 - m) * p_new[i] + m * p_sites[i-1, t-1]665if i < n_sites - 1: # Right neighbor666p_new[i] = (1 - m) * p_new[i] + m * p_sites[i+1, t-1]667668# Normalize migration (if got migrants from both sides)669if 0 < i < n_sites - 1:670p_new[i] /= (1 + m) # Average contribution671672# Drift673num_alleles = np.random.binomial(2 * Ne, p_new[i])674p_sites[i, t] = num_alleles / (2 * Ne)675p_sites[i, t] = np.clip(p_sites[i, t], 0.01, 0.99)676677return p_sites678679stepping_stone_data = simulate_stepping_stone(num_sites, Ne_site, m_neighbor, p_initial, ibd_generations)680681# Calculate pairwise FST matrix682final_site_freqs = stepping_stone_data[:, -1]683Fst_matrix = np.zeros((num_sites, num_sites))684685for i in range(num_sites):686for j in range(i+1, num_sites):687p_i = final_site_freqs[i]688p_j = final_site_freqs[j]689p_mean = (p_i + p_j) / 2690691HT_ij = 2 * p_mean * (1 - p_mean)692HS_ij = (2 * p_i * (1 - p_i) + 2 * p_j * (1 - p_j)) / 2693694Fst_ij = (HT_ij - HS_ij) / HT_ij if HT_ij > 0 else 0695Fst_matrix[i, j] = Fst_ij696Fst_matrix[j, i] = Fst_ij697698# Calculate geographic distance matrix699distance_matrix = np.zeros((num_sites, num_sites))700for i in range(num_sites):701for j in range(num_sites):702distance_matrix[i, j] = abs(i - j) * distance_between_sites703704# Extract upper triangle for regression705upper_tri_indices = np.triu_indices(num_sites, k=1)706distances_vector = distance_matrix[upper_tri_indices]707Fst_vector = Fst_matrix[upper_tri_indices]708Fst_linearized = Fst_vector / (1 - Fst_vector + 1e-10) # Prevent division by zero709710# Linear regression for IBD711slope_ibd, intercept_ibd, r_ibd, p_val_ibd, se_ibd = stats.linregress(distances_vector, Fst_linearized)712713# Create IBD visualization714fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))715716# Left: Spatial allele frequency pattern717site_positions = np.arange(num_sites) * distance_between_sites718ax1.plot(site_positions, final_site_freqs, 'o-', markersize=8, linewidth=2, color='steelblue')719ax1.axhline(y=p_initial, color='red', linestyle='--', linewidth=1.5, alpha=0.7, label='Initial $p$')720ax1.set_xlabel('Geographic Position (km)', fontsize=12)721ax1.set_ylabel('Allele Frequency ($p$)', fontsize=12)722ax1.set_title('Spatial Genetic Cline', fontsize=12)723ax1.legend(fontsize=10)724ax1.grid(True, alpha=0.3)725726# Right: Isolation-by-distance plot727ax2.scatter(distances_vector, Fst_linearized, s=50, alpha=0.6, edgecolor='black')728distance_fit = np.linspace(0, max(distances_vector), 100)729Fst_fit = slope_ibd * distance_fit + intercept_ibd730ax2.plot(distance_fit, Fst_fit, 'r-', linewidth=2,731label=f'$y = {intercept_ibd:.4f} + {slope_ibd:.6f}x$\\n$r^2 = {r_ibd**2:.3f}$, $p < 0.001$')732ax2.set_xlabel('Geographic Distance (km)', fontsize=12)733ax2.set_ylabel('$F_{ST} / (1 - F_{ST})$', fontsize=12)734ax2.set_title('Isolation-by-Distance', fontsize=12)735ax2.legend(fontsize=10)736ax2.grid(True, alpha=0.3)737738plt.tight_layout()739plt.savefig('population_genetics_IBD.pdf', dpi=150, bbox_inches='tight')740plt.close()741742# Calculate mean FST for near vs far comparisons743near_threshold = 100 # km744near_pairs = distances_vector < near_threshold745far_pairs = distances_vector >= near_threshold746mean_Fst_near = np.mean(Fst_vector[near_pairs]) if np.any(near_pairs) else 0747mean_Fst_far = np.mean(Fst_vector[far_pairs]) if np.any(far_pairs) else 0748\end{pycode}749750\begin{figure}[H]751\centering752\includegraphics[width=\textwidth]{population_genetics_IBD.pdf}753\caption{Isolation-by-distance analysis in a one-dimensional stepping-stone model simulating coastal754populations separated by 50 km intervals. Left panel shows the spatial genetic cline after 500 generations,755where allele frequencies vary smoothly along the coast due to limited dispersal ($m = 0.03$ per generation).756Right panel demonstrates the linear relationship between genetic differentiation ($F_{ST}/(1-F_{ST})$) and757geographic distance, characteristic of isolation-by-distance. The significant positive slope ($r^2 > 0.8$)758indicates that gene flow decreases with distance, consistent with stepping-stone dispersal. Population pairs759separated by $<$ 100 km show $\bar{F}_{ST} = $ \py{f"{mean_Fst_near:.3f}"}, while distant pairs ($\geq$ 100 km)760show $\bar{F}_{ST} = $ \py{f"{mean_Fst_far:.3f}"}. This pattern is common in marine species with short larval761durations or strong oceanographic retention.}762\end{figure}763764\section{Summary of Computational Results}765766The simulations conducted in this analysis quantify the fundamental forces shaping genetic variation in767marine populations. We integrate drift, gene flow, and spatial structure to demonstrate equilibrium768conditions and deviations from Hardy-Weinberg expectations.769770\begin{pycode}771# Summary statistics table772print(r'\begin{table}[H]')773print(r'\centering')774print(r'\caption{Summary of Population Genetic Parameters}')775print(r'\begin{tabular}{@{}lccl@{}}')776print(r'\toprule')777print(r'Analysis & Parameter & Value & Interpretation \\')778print(r'\midrule')779print(f"Drift & $N_e$ (small) & {Ne_small} & Rapid fixation \\\\")780print(f"Drift & $N_e$ (large) & {Ne_large} & Diversity maintained \\\\")781print(f"Temporal & $\\hat{{N}}_e$ (20 gen) & {Ne_estimates[1]:.1f} & Moderate precision \\\\")782print(f"Island model & $m$ (low) & {migration_rates[1]} & $F_{{ST}} = {Fst_results[migration_rates[1]][-1]:.3f}$ \\\\")783print(f"Island model & $m$ (high) & {migration_rates[3]} & $F_{{ST}} = {Fst_results[migration_rates[3]][-1]:.3f}$ \\\\")784print(f"Hierarchical & $F_{{ST}}$ (total) & {Fst_total:.3f} & Moderate structure \\\\")785print(f"Hierarchical & $F_{{CT}}$ (regions) & {Fct:.3f} & Regional divergence \\\\")786print(f"IBD & Slope ($b$) & {slope_ibd:.6f} & Significant IBD \\\\")787print(f"IBD & $r^2$ & {r_ibd**2:.3f} & Strong correlation \\\\")788print(r'\bottomrule')789print(r'\end{tabular}')790print(r'\label{tab:summary}')791print(r'\end{table}')792\end{pycode}793794\section{Discussion}795796\subsection{Implications for Marine Conservation}797798The analyses presented here highlight critical parameters for managing marine genetic diversity. Small799effective population sizes ($N_e < 100$) lead to rapid loss of genetic variation through drift, with800approximately \py{f"{fixed_small*100:.0f}"}\% of replicates losing alleles within 200 generations. This801has profound implications for overfished stocks, where census sizes may appear healthy while $N_e$ has802collapsed due to skewed reproductive success.803804The temporal method for estimating $N_e$ demonstrates that genetic monitoring programs can track population805viability from allele frequency changes over 20-50 generations. For marine species with 1-5 year generation806times, this translates to 20-250 year monitoring windows—feasible when comparing contemporary samples to807historical museum specimens or archived tissue collections.808809\subsection{Gene Flow and Connectivity}810811The island model simulations reveal that even modest migration rates ($m = 0.01$, equivalent to 1-2812migrants per generation) can counteract drift and maintain genetic cohesion across metapopulations. The813equilibrium relationship $F_{ST} \approx 1/(4N_em + 1)$ provides a theoretical framework for interpreting814empirical $F_{ST}$ values. For example, an observed $F_{ST} = 0.05$ in a species with $N_e = 100$ implies815$m \approx$ \py{f"{(1/Fst_total - 1)/(4*Ne_pop):.3f}"}, corresponding to approximately \py{f"{Ne_pop * (1/Fst_total - 1)/(4*Ne_pop):.1f}"}816effective migrants per generation.817818In hierarchical metapopulations, the observed $F_{CT} = $ \py{f"{Fct:.3f}"} indicates moderate regional819differentiation driven by restricted between-region gene flow ($m = 0.005$), while low within-region820$F_{SC}$ reflects high local connectivity ($m = 0.08$). This hierarchical structure is common in marine821species where oceanographic features (fronts, eddies, upwelling zones) create regional retention while822permitting occasional long-distance dispersal.823824\subsection{Isolation-by-Distance Patterns}825826The stepping-stone model produces characteristic isolation-by-distance with significant positive correlation827($r^2 = $ \py{f"{r_ibd**2:.3f}"}) between genetic and geographic distance. The regression slope ($b = $828\py{f"{slope_ibd:.6f}"} km$^{-1}$) quantifies the rate of genetic differentiation with distance and can be829used to estimate neighborhood size $Nb = 1/(2b\sigma^2)$ where $\sigma^2$ is dispersal variance. This830approach has been successfully applied to marine species ranging from kelp forest fishes to coral reef831invertebrates.832833Empirical IBD patterns often deviate from linear expectations due to oceanographic complexity, historical834barriers, or recent range expansions. The framework presented here provides a null model against which such835deviations can be detected and interpreted.836837\section{Conclusions}838839This computational analysis demonstrates the application of population genetic theory to marine systems:840841\begin{enumerate}842\item \textbf{Genetic drift} erodes diversity rapidly in small populations, with fixation probability843inversely proportional to $N_e$. After \py{generations} generations, small populations ($N_e = $ \py{Ne_small})844showed \py{f"{fixed_small*100:.0f}"}\% fixation compared to \py{f"{fixed_large*100:.0f}"}\% in large845populations ($N_e = $ \py{Ne_large}).846847\item \textbf{Temporal $N_e$ estimation} from allele frequency variance provides accurate estimates848($\hat{N}_e \approx $ \py{f"{Ne_estimates[1]:.0f}"} vs. true $N_e = $ \py{true_Ne_temporal}) when sampling849intervals exceed 20 generations, enabling genetic monitoring for conservation.850851\item \textbf{Gene flow} counteracts drift with effectiveness proportional to $Nm$. At equilibrium,852$F_{ST} = $ \py{f"{Fst_results[m_medium][-1]:.3f}"} for $Nm = $ \py{f"{Ne_island * m_medium:.1f}"},853demonstrating that 1-2 migrants per generation suffice to maintain genetic cohesion.854855\item \textbf{Hierarchical F-statistics} reveal multi-scale structure with total $F_{ST} = $856\py{f"{Fst_total:.3f}"} partitioned into among-region ($F_{CT} = $ \py{f"{Fct:.3f}"}) and within-region857components, reflecting differential connectivity at regional versus local scales.858859\item \textbf{Isolation-by-distance} emerges from stepping-stone dispersal, with genetic differentiation860increasing linearly with distance (slope $= $ \py{f"{slope_ibd:.6f}"} km$^{-1}$, $r^2 = $ \py{f"{r_ibd**2:.3f}"}),861enabling estimation of dispersal kernels from spatial genetic data.862\end{enumerate}863864These findings underscore the value of simulation-based approaches for interpreting empirical population865genetic data, designing sampling strategies, and predicting evolutionary responses to environmental change866in marine ecosystems.867868\section*{References}869870\begin{thebibliography}{99}871872\bibitem{wright1931} Wright, S. (1931). Evolution in Mendelian populations. \textit{Genetics}, 16(2), 97-159.873874\bibitem{kimura1964} Kimura, M., \& Crow, J. F. (1964). The number of alleles that can be maintained in a finite population. \textit{Genetics}, 49(4), 725-738.875876\bibitem{waples1989} Waples, R. S. (1989). A generalized approach for estimating effective population size from temporal changes in allele frequency. \textit{Genetics}, 121(2), 379-391.877878\bibitem{nei1973} Nei, M. (1973). Analysis of gene diversity in subdivided populations. \textit{Proceedings of the National Academy of Sciences}, 70(12), 3321-3323.879880\bibitem{slatkin1993} Slatkin, M. (1993). Isolation by distance in equilibrium and non-equilibrium populations. \textit{Evolution}, 47(1), 264-279.881882\bibitem{palumbi1994} Palumbi, S. R. (1994). Genetic divergence, reproductive isolation, and marine speciation. \textit{Annual Review of Ecology and Systematics}, 25, 547-572.883884\bibitem{hedgecock1994} Hedgecock, D. (1994). Does variance in reproductive success limit effective population sizes of marine organisms? In \textit{Genetics and Evolution of Aquatic Organisms} (pp. 122-134). Chapman \& Hall.885886\bibitem{avise1998} Avise, J. C. (1998). \textit{The Genetic Gods: Evolution and Belief in Human Affairs}. Harvard University Press.887888\bibitem{hellberg2002} Hellberg, M. E., Burton, R. S., Neigel, J. E., \& Palumbi, S. R. (2002). Genetic assessment of connectivity among marine populations. \textit{Bulletin of Marine Science}, 70(1 Supplement), 273-290.889890\bibitem{waples2008} Waples, R. S., \& Do, C. (2008). LDNE: A program for estimating effective population size from data on linkage disequilibrium. \textit{Molecular Ecology Resources}, 8(4), 753-756.891892\bibitem{weir2012} Weir, B. S., \& Goudet, J. (2017). A unified characterization of population structure and relatedness. \textit{Genetics}, 206(4), 2085-2103.893894\bibitem{pinsky2013} Pinsky, M. L., \& Palumbi, S. R. (2014). Meta-analysis reveals lower genetic diversity in overfished populations. \textit{Molecular Ecology}, 23(1), 29-39.895896\bibitem{selkoe2016} Selkoe, K. A., D'Aloia, C. C., Crandall, E. D., Iacchei, M., Liggins, L., Puritz, J. B., ... \& Toonen, R. J. (2016). A decade of seascape genetics: contributions to basic and applied marine connectivity. \textit{Marine Ecology Progress Series}, 554, 1-19.897898\bibitem{tigano2020} Tigano, A., \& Friesen, V. L. (2016). Genomics of local adaptation with gene flow. \textit{Molecular Ecology}, 25(10), 2144-2164.899900\bibitem{hoban2021} Hoban, S., Bruford, M., D'Urban Jackson, J., Lopes-Fernandes, M., Heuertz, M., Hohenlohe, P. A., ... \& Laikre, L. (2020). Genetic diversity targets and indicators in the CBD post-2020 Global Biodiversity Framework must be improved. \textit{Biological Conservation}, 248, 108654.901902\end{thebibliography}903904\end{document}905906907