Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
Download

IPython notebook Gamblers Ruin.ipynb

196 views
Kernel: Unknown Kernel

Last time: Classify the states into:

  • Recurrent states - You visit infinitely many times (almost surely). In other words, you always return.

  • Transient states - You visit finitely many times (alomst surely) In other words, eventually you do not return

Group recurrent states into recurrence classes (alternatively communication classes) - maximal sets of recurrent states that communicate with each other

A Markov chain has at least one recurrence classes (maybe more).

An irreducible Markov chain has exactly one recurrence class and no transient states.

It's often convenient to relabel states so that states in the same recurrence class are numbered consecutively and transient state have the highest numbers. This permutes rows and columns of PP and gives the nice block forms on p.20 (a block for each recurrence class) and p.27 (ONE block for ALL recurrence classes). That second block form is P=[R0ST] P = \left[ \begin{array}{c|c} R & 0 \\ \hline S & T \end{array} \right]

note: Lawler uses QQ for the bottom right. This conflicts with a different use of QQ in section 1.2. So, I'll use TT instead.

The main idea is that, in the long term, the transient state don't matter and each recurrence class operates as its own, independent Markov chain.

So, we prove theorems about irreducible Markov chains and then apply them to the recurrence classes of more complicated chains.

A recurrence class may be

  • aperiodic - In the long term, it converges to its own stationary distribution π⃗\vec{\pi}

  • periodic with period p≥2p\geq 2 - In the long term, it oscillates among pp distributions.

Different Recurrences classes may do differnet things in the long-term.

Fact to prove: Lemma 1: If PP is a stochastic matrix, then λ=1\lambda=1 is always an eigenvalue and 1⃗=[11⋮1]\vec{1}= \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1\end{bmatrix} is an associated right eigenvector.

pf - done

Lemma 2: If PP is a stochastic matrix, then all eigenvalues have modulus less than or equal to one.

pf - done

Lemma 3: If PP is an irreducible stochastic matrix, then 1⃗\vec{1} is the only right eigenvector for λ=1\lambda=1 (up to scalar multiple). Since right and left kernels have the same dimension for square matrices, this means that there is only one only left eigenvector for λ=1\lambda=1 (up to scalar multiple). pf - in class quickly today

Periodic behavior Let JiJ_i be the set of all possible first return times to ii ={r:pr(i,i)>0}= \{r : p_r(i,i)>0\}

The period of a state ii is the greatest common divisior of JiJ_i.

Lemma 4: All states in the same recurrence class have the same period.

Facts (parts of the Perron-Frobenius Theorem or applications of linear algebra)

  • If periodic with period p≥2p \geq 2, then there are eigenvalue of modulus 1 that are not 11. They are complex numbers zz such that zp=1z^p = 1 (roots of unity). In the long term, the chain oscillates through associated left eigenvectors.

  • If aperiodic, there is a long term stationary distribution π⃗\vec{\pi}; entry ii is the reciprocal of the expected first return time to state ii.

First return time to ii is denoted TiT_i. It is a random variable. If lots of us all start at ii, we'll return at different times. The average/mean/expected first return time is E(Ti)=∑r=1∞rP(Ti=r)E(T_i) = \sum_{r=1}^{\infty} r P(T_i=r)

Recall: P=[R0ST] P = \left[ \begin{array}{c|c} R & 0 \\ \hline S & T \end{array} \right] Say you start at transient state ii. You may jump to other transient states for a while. But you will eventually reach a recurrent state and never leave its recurrence class (we say you are absorbed).

Theorem

  • The probability that you are absorbed at recurrent state jj is the (i,j)(i,j) entry of MSMS

  • On average, the number of jumps you will make before absorption equals the sum of row ii in MM.

where

MM tells us where and when.

from __future__ import division, print_function from numpy import * set_printoptions(suppress=True,precision=3) def normalize(P): """ Accepts a single row vector or a matrix. Normalizes so rows sum to 1. """ Q = array(P) #Convert to numpy array (if necessary) Q = Q.T #Transpose sums = array([sum(Q,0)]) #Compute columns sums. sums[abs(sums)<0.001] = 1 #Set column sum to 1 for columns of all zeros. Q = Q / sums #Divide each column by its sum. Q = Q.T #Tranpose again to undo original transpose. return Q def left_eigs(P,printout=False): """ Computes all eigenvalues and the associated LEFT eigenvectors. """ esys = eig(P.T) #eig is a numpy command that return all eigenvalues and right eigenvectors. #We use P.T (the transpose) so that we get left eigenvectors evals = esys[0] evecs = normalize(esys[1].T) perm = argsort(abs(evals)) #sorts evals smallest to largest modulus perm = perm[::-1] #sort largest to smallest evals = evals[perm] #applies the sorting to evals evecs = evecs[perm] #applies the sorting to evecs if printout == True: print("\nAll eigenvalues the associated LEFT eigenvectors.") for eval,evec in zip(evals,evecs): print("mod=%-6.3f eval=%-16s evec=%s" % (abs(eval),array([eval]),evec)) #We use array([eval]) #rather than just eval so the numpy print options apply. Else, it prints ugly. limit_evals = copy(evals) limit_evals[abs(limit_evals)<0.999]=0 #eigenvalues of modulus less than 1 go to zero over time evals = matrix(diag(evals)) #Puts the evals on the diagonal. Matrix denoted D in Lawler p.15 limit_evals = matrix(diag(limit_evals)) #Just the ones that matter in long term evecs = matrix(evecs) #Matrix denote Q^-1 in Lawler p.15 return evecs, evals, limit_evals def add_headers(A,row_headers=[],column_headers=[]): """ Adds row and column headers to an array for display purposes. """ if len(row_headers)>0 and len(column_headers)>0: row_headers = insert(row_headers,0,[-1]) if len(column_headers)>0: A = insert(A,0,column_headers,axis=0) if len(row_headers)>0: A = A.T A = insert(A,0,row_headers,axis=0) A = A.T return array(A) def permuate_matrix(A,perm): """ Permutes rows and columns to accomplish a relabeling of the states """ A = A[perm] A = A.T A = A[perm] A = A.T return A def simulate_gamblers_ruin_simple(prob_win,prob_draw,prob_lose,break_house,start_bankroll,numPlays=1000,numPlayers=10): """ A simple simulation algorithm only for the gambler's ruin problem. A general (but slower) algorithm is presented above. """ bankroll = zeros([numPlays+1,numPlayers],dtype='int') bankroll[0] = ones(numPlayers)*start_bankroll for play in arange(numPlays): for player in arange(numPlayers): if bankroll[play,player] == 0 or bankroll[play,player] == break_house: bankroll[play+1,player] = bankroll[play,player] else: r = random.random(1) if 0 <= r < prob_win: bankroll[play+1,player] = bankroll[play,player] + 1 elif prob_win <= r < prob_win+prob_draw: bankroll[play+1,player] = bankroll[play,player] elif prob_win+prob_draw <= r < 1: bankroll[play+1,player] = bankroll[play,player] - 1 numStates = break_house+1 dist = compute_dist(bankroll,numStates) return bankroll,dist def simulate(P,start_state=0,numSteps=100,numTrials=10): """ A general algorithm to simulate finite state, discrete time Markov chains. """ numStates = shape(P)[0] states = arange(numStates) history = zeros([numSteps+1,numTrials],dtype='int') history[0] = ones(numTrials)*start_state for step in arange(numSteps): for trial in arange(numTrials): current_state = history[step,trial] r = random.random(1) cutoff = 0. for next_state in states: cutoff += P[current_state,next_state] if r < cutoff: history[step+1,trial] = next_state break dist = compute_dist(history,numStates) return history,dist def compute_dist(history,numState): """ Computes step-by-step distributions from simulation data. """ dist = [bincount(step,minlength=numStates) for step in history] dist = normalize(array(dist)) return dist def plot_dist(dist,print_steps): """ Plots histograms of evolution of distribution. """ numStates = shape(P)[0] states = arange(numStates) for step in print_steps: print("After %s steps" % step) plt.bar(states,dist[step],align='center') plt.ylim([0,1]) plt.xticks(states) plt.show() def absorption(history,recurrent_states): """ Computes when simulation runs were absorbed into a recurrent class. """ alive = history for state in recurrent_states: alive[alive==state] = 0 alive[alive>0]=1 propAlive = sum(alive[-1,:]) / len(alive[-1,:]) jumps = sum(alive,axis=0) print("Proportion not yet absorbed = %.2f" % propAlive) print("Mean number of jumps = %.2f" % mean(jumps)) return jumps,propAlive
def long_term_from_eigs(P,initial_dist=False): """ Computes long term behavior using eigenvalues and eigenvectors, as shown in Lawler 1.5 """ numStates = shape(P)[0] states = arange(numStates) if initial_dist == False: initial_dist = zeros(numStates) initial_dist[0] = 1 elif isinstance(initial_dist,int): i = initial_dist initial_dist = zeros(numStates) initial_dist[i] = 1 evecs, evals, limit_evals = left_eigs(P,printout=False) #Observe that these two are the same #print(P) #print(inv(evecs) * evals * evecs) angles = abs(angle(limit_evals)) angles = angles[angles>0.001] if len(angles)>0: #Then there eigenvalues of modulus 1 other than 1. So the chain is periodic xi = min(angles) period = 2*pi/xi period = int(round(period,0)) print("\nPeriodic chain with period %s" % period) #We'll print a period of the limiting P in limit_P. limit_P = [eye(numStates)]*(period) #Initialize list. Eye(n) makes the nxn identity print("Initial distribution \n %s \n" % (initial_dist)) print("\nWe'll print one period of the long-term P^r and the non-stationary steady-state oscillations.") for r in arange(period): limit_P[r] = inv(evecs) * limit_evals**(r+1) * evecs print("\nlimit_P_%s \n %s \n" % (r,limit_P[r])) print("limit_pi_%s \n %s \n" % (r,initial_dist*limit_P[r])) else: period = 1 print("\nAperiodic chain") limit_P = inv(evecs) * limit_evals * evecs print("\nlimit_P \n %s" % limit_P) print("\nlimit_pi \n %s" % (initial_dist*limit_P)) #We determine the recurrent classes and the transient states new_states = copy(states) col_sums = (sum(limit_P,axis=0)).A1 #.A1 turns the "matrix" into a flat array. Else next line errs transient_states = new_states[col_sums<0.001] transient_states.tolist() #We'll use a plain Python list rather than numpy arrays. It prints nicer and recurrence classes will be a list of lists with (possibly) different lengths. new_states[transient_states] = -1 recurrent_classes = [] #list of lists - one list for each recurrence class. May have different lengths. recurrent_states = [] #flat list of all recurrent states for x in new_states: if x == -1: #state already classified continue new_class = [y for y in new_states if y!=-1 and limit_P[x,y]>0] recurrent_classes.append(new_class) #append maintains list of list structure recurrent_states.extend(new_class) #extend maintains flat list structure new_states[new_class] = -1 print("\nRecurrent classes are %s" % recurrent_classes) print("Transient states are %s" % transient_states) perm = recurrent_states[:] #makes copy of flat list of recurrent states. It is already ordered as desired perm.extend(transient_states) #adds transient states; it's now the perumtation that relabel states as on p.20 P_perm = permuate_matrix(P,perm) limit_P_perm = permuate_matrix(limit_P,perm) #print("\nPermuted P \n %s" % add_headers(P_perm,perm,perm)) #print("\nPermuted limit_P \n %s" % add_headers(limit_P_perm,perm,perm)) numRec = len(recurrent_states) numTrans = len(transient_states) if numTrans>0:#If there are transient states, we compute probs ending up in various recurrent classes and expected times S = P_perm[numRec:,:numRec] T = P_perm[numRec:,numRec:] #Lawler calls this Q in section 1.5, but that conflicts with notation in section 1.2. So I'll use T. I = eye(*shape(T)) M = inv(I-T) #print("\nT = lower right %s by %s transient part \n(Lawler calls this Q in section 1.5, but that conflicts with notation in section 1.2. So I'll use T.)\n %s" % (numTrans,numTrans,add_headers(T,transient_states,transient_states))) #print("\nM = inv(I-T) \n %s" % M) #print("\nProbability of absorption at state j for chain starting at state i \n %s" % (add_headers(M*S,transient_states,recurrent_states))) print("\nExpected steps before absorption for chain starting at state i \n %s" % (add_headers(sum(M,axis=1),transient_states)))
prob_win = 0.3 #prob win if you play prob_draw = 0.1 #prob do not play prob_lose = 1 - prob_win - prob_draw numStates = 7 start_state = 3 numSteps = 300 numTrials = 300 boundary_type = 2 #0 = absorbing: Never leave boundary once reached #1 = semi-reflecting: Leave boundary only if jump toward interior (1 outcome) #2 = fully-reflecting: Leave boundary if jump either direction (2 outcomes) states = arange(numStates) #Recurrence and transience are not automatically detected until the eigen-analysis is performed. #It is the users job to correctly specify them for use in simulations. recurrent_states = states if boundary_type == 0: recurrent_states = [0,numStates-1] #recurrent_states = [a differnet list??] P = matrix(identity(numStates)) for i in arange(1,numStates-1): P[i,i-1] = prob_lose P[i,i] = prob_draw P[i,i+1] = prob_win if boundary_type == 0: print("Absorbing Boundary") elif boundary_type == 1: P[0,0] = prob_lose + prob_draw P[0,1] = prob_win P[-1,-2] = prob_lose r=.1 P[-1,-1] = prob_win + r #+r WHAT IS r SUPPOSED TO BE?????????????????????? print("Semi-Reflecting Boundary") elif boundary_type == 2: P[0,0] = prob_draw P[0,1] = prob_win + prob_lose P[-1,-2] = prob_win + prob_lose P[-1,-1] = prob_draw print("Fully-Reflecting Boundary") print("Gambler's Ruin Transition Matrix:\n") print(P,"\n\n") """ n=5 P=matrix([[1/2,1/2,0,0,0], [1/6,5/6,0,0,0], [0,0,3/4,1/4,0], [0,0,1/8,2/3,5/24], [0,0,0,1/6,5/6] ]) """ P=matrix(P) numStates = shape(P)[0] states = arange(numStates) if boundary_type == 0: print("\n\nSimulating with start state %s" % start_state) print("Using simple algorithm just gambler's ruin only.") break_house = numStates-1 history, dist = simulate_gamblers_ruin_simple(prob_win,prob_draw,prob_lose,break_house,start_state,numPlays=numSteps,numPlayers=numTrials) print("Final distribution\n%s" % dist[-1]) jumps,propAlive = absorption(history,recurrent_states) print("\n\nSimulating with start state %s" % start_state) print("Using general algorithm for any finite state, disrete time Markov chain.") history, dist = simulate(P,start_state,numSteps,numTrials) #plot_dist(dist,arange(0,numSteps+1,50)) print("Final distribution\n%s" % dist[-1]) jumps,propAlive = absorption(history,recurrent_states) print("\n\nNow we'll do the analysis using eigenvalues presented in Lawler 1.5") long_term_from_eigs(P,start_state)
Fully-Reflecting Boundary Gambler's Ruin Transition Matrix: [[ 0.1 0.9 0. 0. 0. 0. 0. ] [ 0.6 0.1 0.3 0. 0. 0. 0. ] [ 0. 0.6 0.1 0.3 0. 0. 0. ] [ 0. 0. 0.6 0.1 0.3 0. 0. ] [ 0. 0. 0. 0.6 0.1 0.3 0. ] [ 0. 0. 0. 0. 0.6 0.1 0.3] [ 0. 0. 0. 0. 0. 0.9 0.1]] Simulating with start state 3 Using general algorithm for any finite state, disrete time Markov chain. Final distribution [ 0.277 0.407 0.133 0.117 0.037 0.023 0.007] Proportion not yet absorbed = 0.00 Mean number of jumps = 0.00 Now we'll do the analysis using eigenvalues presented in Lawler 1.5 Aperiodic chain limit_P [[ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008] [ 0.254 0.381 0.19 0.095 0.048 0.024 0.008]] limit_pi [[ 0.254 0.381 0.19 0.095 0.048 0.024 0.008]] Recurrent classes are [[0, 1, 2, 3, 4, 5, 6]] Transient states are []
N=numStates-1 j=start_state print(j/N) print(j*(N-j))
0.5 9
Absorbing Boundary Gambler's Ruin Transition Matrix: [[ 1. 0. 0. 0. 0. 0. 0. ] [ 0.6 0.1 0.3 0. 0. 0. 0. ] [ 0. 0.6 0.1 0.3 0. 0. 0. ] [ 0. 0. 0.6 0.1 0.3 0. 0. ] [ 0. 0. 0. 0.6 0.1 0.3 0. ] [ 0. 0. 0. 0. 0.6 0.1 0.3] [ 0. 0. 0. 0. 0. 0. 1. ]] Semi-Reflecting Boundary Gambler's Ruin Transition Matrix: [[ 0.7 0.3 0. 0. 0. 0. 0. ] #[ 0.6 0.1 0.3 0. 0. 0. 0. ] #[ 0. 0.6 0.1 0.3 0. 0. 0. ] #[ 0. 0. 0.6 0.1 0.3 0. 0. ] #[ 0. 0. 0. 0.6 0.1 0.3 0. ] #[ 0. 0. 0. 0. 0.6 0.1 0.3] [ 0. 0. 0. 0. 0. 0.6 0.4]] Fully-Reflecting Boundary Gambler's Ruin Transition Matrix: [[ 0.1 0.9 0. 0. 0. 0. 0. ] #[ 0.6 0.1 0.3 0. 0. 0. 0. ] #[ 0. 0.6 0.1 0.3 0. 0. 0. ] #[ 0. 0. 0.6 0.1 0.3 0. 0. ] #[ 0. 0. 0. 0.6 0.1 0.3 0. ] #[ 0. 0. 0. 0. 0.6 0.1 0.3] [ 0. 0. 0. 0. 0. 0.9 0.1]]
File "<ipython-input-6-eec925a2ae8b>", line 1 Absorbing Boundary ^ SyntaxError: invalid syntax