CoCalc -- Gamblers Ruin.ipynb

IPython notebook Gamblers Ruin.ipynb

¹⁹⁶ views

Kernel: Unknown Kernel

Last time: Classify the states into:

Recurrent states - You visit infinitely many times (almost surely). In other words, you always return.
Transient states - You visit finitely many times (alomst surely) In other words, eventually you do not return

Group recurrent states into recurrence classes (alternatively communication classes) - maximal sets of recurrent states that communicate with each other

A Markov chain has at least one recurrence classes (maybe more).

An irreducible Markov chain has exactly one recurrence class and no transient states.

It's often convenient to relabel states so that states in the same recurrence class are numbered consecutively and transient state have the highest numbers. This permutes rows and columns of $P$ and gives the nice block forms on p.20 (a block for each recurrence class) and p.27 (ONE block for ALL recurrence classes). That second block form is $P = \left[ \begin{array}{c|c} R & 0 \\ \hline S & T \end{array} \right]$

note: Lawler uses $Q$ for the bottom right. This conflicts with a different use of $Q$ in section 1.2. So, I'll use $T$ instead.

The main idea is that, in the long term, the transient state don't matter and each recurrence class operates as its own, independent Markov chain.

So, we prove theorems about irreducible Markov chains and then apply them to the recurrence classes of more complicated chains.

A recurrence class may be

aperiodic - In the long term, it converges to its own stationary distribution $\vec{\pi}$
periodic with period $p\geq 2$ - In the long term, it oscillates among $p$ distributions.

Different Recurrences classes may do differnet things in the long-term.

Fact to prove: Lemma 1: If $P$ is a stochastic matrix, then $\lambda=1$ is always an eigenvalue and $\vec{1}= \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1\end{bmatrix}$ is an associated right eigenvector.

pf - done

Lemma 2: If $P$ is a stochastic matrix, then all eigenvalues have modulus less than or equal to one.

pf - done

Lemma 3: If $P$ is an irreducible stochastic matrix, then $\vec{1}$ is the only right eigenvector for $\lambda=1$ (up to scalar multiple). Since right and left kernels have the same dimension for square matrices, this means that there is only one only left eigenvector for $\lambda=1$ (up to scalar multiple). pf - in class quickly today

Periodic behavior Let $J_i$ be the set of all possible first return times to $i$ $= \{r : p_r(i,i)>0\}$

The period of a state $i$ is the greatest common divisior of $J_i$ .

Lemma 4: All states in the same recurrence class have the same period.

Facts (parts of the Perron-Frobenius Theorem or applications of linear algebra)

If periodic with period $p \geq 2$ , then there are eigenvalue of modulus 1 that are not $1$ . They are complex numbers $z$ such that $z^p = 1$ (roots of unity). In the long term, the chain oscillates through associated left eigenvectors.
If aperiodic, there is a long term stationary distribution $\vec{\pi}$ ; entry $i$ is the reciprocal of the expected first return time to state $i$ .

First return time to $i$ is denoted $T_i$ . It is a random variable. If lots of us all start at $i$ , we'll return at different times. The average/mean/expected first return time is $E(T_i) = \sum_{r=1}^{\infty} r P(T_i=r)$

Recall: $P = \left[ \begin{array}{c|c} R & 0 \\ \hline S & T \end{array} \right]$ Say you start at transient state $i$ . You may jump to other transient states for a while. But you will eventually reach a recurrent state and never leave its recurrence class (we say you are absorbed).

Theorem

The probability that you are absorbed at recurrent state $j$ is the $(i,j)$ entry of $MS$
On average, the number of jumps you will make before absorption equals the sum of row $i$ in $M$ .

where

$M$ tells us where and when.

In [9]:

from __future__ import division, print_function
from numpy import *
set_printoptions(suppress=True,precision=3)

def normalize(P):
    """
    Accepts a single row vector or a matrix.  Normalizes so rows sum to 1.
    """
    Q = array(P)               #Convert to numpy array (if necessary)
    Q = Q.T                    #Transpose
    sums = array([sum(Q,0)])   #Compute columns sums.
    sums[abs(sums)<0.001] = 1  #Set column sum to 1 for columns of all zeros.
    Q = Q / sums               #Divide each column by its sum.
    Q = Q.T                    #Tranpose again to undo original transpose.
    return Q

def left_eigs(P,printout=False):
    """
    Computes all eigenvalues and the associated LEFT eigenvectors.
    """
    esys = eig(P.T) #eig is a numpy command that return all eigenvalues and right eigenvectors.  
    #We use P.T (the transpose) so that we get left eigenvectors
    evals = esys[0]
    evecs = normalize(esys[1].T)
    perm = argsort(abs(evals)) #sorts evals smallest to largest modulus
    perm = perm[::-1] #sort largest to smallest
    evals = evals[perm] #applies the sorting to evals    
    evecs = evecs[perm] #applies the sorting to evecs
    if printout == True:
        print("\nAll eigenvalues the associated LEFT eigenvectors.")
        for eval,evec in zip(evals,evecs):
            print("mod=%-6.3f eval=%-16s evec=%s" % (abs(eval),array([eval]),evec)) #We use array([eval])
            #rather than just eval so the numpy print options apply.  Else, it prints ugly.    
    limit_evals = copy(evals)
    limit_evals[abs(limit_evals)<0.999]=0 #eigenvalues of modulus less than 1 go to zero over time
    evals = matrix(diag(evals)) #Puts the evals on the diagonal.  Matrix denoted D in Lawler p.15
    limit_evals = matrix(diag(limit_evals)) #Just the ones that matter in long term
    evecs = matrix(evecs) #Matrix denote Q^-1 in Lawler p.15
    return evecs, evals, limit_evals


def add_headers(A,row_headers=[],column_headers=[]):
    """
    Adds row and column headers to an array for display purposes.
    """
    if len(row_headers)>0 and len(column_headers)>0:
        row_headers = insert(row_headers,0,[-1])
    if len(column_headers)>0:
        A = insert(A,0,column_headers,axis=0)
    if len(row_headers)>0:
        A = A.T    
        A = insert(A,0,row_headers,axis=0)
        A = A.T
    return array(A)

def permuate_matrix(A,perm):
    """
    Permutes rows and columns to accomplish a relabeling of the states
    """
    A = A[perm]
    A = A.T
    A = A[perm]
    A = A.T
    return A


def simulate_gamblers_ruin_simple(prob_win,prob_draw,prob_lose,break_house,start_bankroll,numPlays=1000,numPlayers=10):
    """
    A simple simulation algorithm only for the gambler's ruin problem.  A general (but slower) algorithm is presented above.
    """
    bankroll = zeros([numPlays+1,numPlayers],dtype='int')
    bankroll[0] = ones(numPlayers)*start_bankroll
    
    for play in arange(numPlays):
        for player in arange(numPlayers):
            if bankroll[play,player] == 0 or bankroll[play,player] == break_house:
                bankroll[play+1,player] = bankroll[play,player]
            else:
                r = random.random(1)
                if 0 <= r < prob_win:
                    bankroll[play+1,player] = bankroll[play,player] + 1
                elif prob_win <= r < prob_win+prob_draw:
                    bankroll[play+1,player] = bankroll[play,player]
                elif prob_win+prob_draw <= r < 1:
                    bankroll[play+1,player] = bankroll[play,player] - 1
    numStates = break_house+1
    dist = compute_dist(bankroll,numStates)
    return bankroll,dist


def simulate(P,start_state=0,numSteps=100,numTrials=10):
    """
    A general algorithm to simulate finite state, discrete time Markov chains.
    """
    numStates = shape(P)[0]
    states = arange(numStates)

    history = zeros([numSteps+1,numTrials],dtype='int')
    history[0] = ones(numTrials)*start_state
    for step in arange(numSteps):
        for trial in arange(numTrials):
            current_state = history[step,trial]
            r = random.random(1)
            cutoff = 0.
            for next_state in states:
                cutoff += P[current_state,next_state]                
                if r < cutoff:
                    history[step+1,trial] = next_state
                    break
    dist = compute_dist(history,numStates)
    return history,dist


def compute_dist(history,numState):
    """
    Computes step-by-step distributions from simulation data.
    """
    dist = [bincount(step,minlength=numStates) for step in history]
    dist = normalize(array(dist))
    return dist
    

def plot_dist(dist,print_steps):
    """
    Plots histograms of evolution of distribution.
    """
    numStates = shape(P)[0]
    states = arange(numStates)

    for step in print_steps:
        print("After %s steps" % step)
        plt.bar(states,dist[step],align='center')        
        plt.ylim([0,1])
        plt.xticks(states)
        plt.show()


def absorption(history,recurrent_states):
    """
    Computes when simulation runs were absorbed into a recurrent class.
    """
    alive = history    
    for state in recurrent_states:
        alive[alive==state] = 0
    
    alive[alive>0]=1
    propAlive = sum(alive[-1,:]) / len(alive[-1,:])
    jumps = sum(alive,axis=0)
    print("Proportion not yet absorbed = %.2f" % propAlive)
    print("Mean number of jumps = %.2f" % mean(jumps))
    return jumps,propAlive

In [3]:

def long_term_from_eigs(P,initial_dist=False):
    """
    Computes long term behavior using eigenvalues and eigenvectors, as shown in Lawler 1.5
    """
    numStates = shape(P)[0]
    states = arange(numStates)

    if initial_dist == False:
        initial_dist = zeros(numStates)
        initial_dist[0] = 1
    elif isinstance(initial_dist,int):
        i = initial_dist
        initial_dist = zeros(numStates)
        initial_dist[i] = 1

    evecs, evals, limit_evals = left_eigs(P,printout=False)
    #Observe that these two are the same
    #print(P)
    #print(inv(evecs) * evals * evecs)

    angles = abs(angle(limit_evals))
    angles = angles[angles>0.001]
    if len(angles)>0: #Then there eigenvalues of modulus 1 other than 1.  So the chain is periodic
        xi = min(angles)
        period = 2*pi/xi
        period = int(round(period,0))    
        print("\nPeriodic chain with period %s" % period)

        #We'll print a period of the limiting P in limit_P.
        limit_P = [eye(numStates)]*(period) #Initialize list.  Eye(n) makes the nxn identity

        print("Initial distribution \n %s \n" % (initial_dist))
        print("\nWe'll print one period of the long-term P^r and the non-stationary steady-state oscillations.")
        for r in arange(period):
            limit_P[r] = inv(evecs) * limit_evals**(r+1) * evecs
            print("\nlimit_P_%s \n %s \n" % (r,limit_P[r]))            
            print("limit_pi_%s \n %s \n" % (r,initial_dist*limit_P[r]))
    else:
        period = 1         
        print("\nAperiodic chain")
        limit_P = inv(evecs) * limit_evals * evecs
        print("\nlimit_P \n %s" % limit_P)
        print("\nlimit_pi \n %s" % (initial_dist*limit_P))

        #We determine the recurrent classes and the transient states        
        new_states = copy(states)
        col_sums = (sum(limit_P,axis=0)).A1 #.A1 turns the "matrix" into a flat array.  Else next line errs
        transient_states = new_states[col_sums<0.001]
        transient_states.tolist() #We'll use a plain Python list rather than numpy arrays. It prints nicer and recurrence classes will be a list of lists with (possibly) different lengths.
        new_states[transient_states] = -1

        recurrent_classes = [] #list of lists - one list for each recurrence class.  May have different lengths.
        recurrent_states = [] #flat list of all recurrent states
        for x in new_states:
            if x == -1: #state already classified
                continue
            new_class = [y for y in new_states if y!=-1 and limit_P[x,y]>0]
            recurrent_classes.append(new_class) #append maintains list of list structure
            recurrent_states.extend(new_class) #extend maintains flat list structure
            new_states[new_class] = -1
        print("\nRecurrent classes are %s" % recurrent_classes)
        print("Transient states are %s" % transient_states)    

        perm = recurrent_states[:] #makes copy of flat list of recurrent states.  It is already ordered as desired
        perm.extend(transient_states) #adds transient states; it's now the perumtation that relabel states as on p.20 
        P_perm = permuate_matrix(P,perm)
        limit_P_perm = permuate_matrix(limit_P,perm)
        #print("\nPermuted P \n %s" % add_headers(P_perm,perm,perm))
        #print("\nPermuted limit_P \n %s" % add_headers(limit_P_perm,perm,perm))    
        numRec = len(recurrent_states)
        numTrans = len(transient_states)
        if numTrans>0:#If there are transient states, we compute probs ending up in various recurrent classes and expected times
            S = P_perm[numRec:,:numRec]
            T = P_perm[numRec:,numRec:] #Lawler calls this Q in section 1.5, but that conflicts with notation in section 1.2.  So I'll use T.
            I = eye(*shape(T))
            M = inv(I-T)
            #print("\nT = lower right %s by %s transient part \n(Lawler calls this Q in section 1.5, but that conflicts with notation in section 1.2.  So I'll use T.)\n %s" % (numTrans,numTrans,add_headers(T,transient_states,transient_states)))
            #print("\nM = inv(I-T) \n %s" % M)        
            #print("\nProbability of absorption at state j for chain starting at state i \n %s" % (add_headers(M*S,transient_states,recurrent_states)))            
            print("\nExpected steps before absorption for chain starting at state i \n %s" % (add_headers(sum(M,axis=1),transient_states)))

In [10]:

prob_win = 0.3 #prob win if you play
prob_draw = 0.1 #prob do not play
prob_lose = 1 - prob_win - prob_draw

numStates = 7 
start_state = 3
numSteps = 300
numTrials = 300

boundary_type = 2

#0 = absorbing: Never leave boundary once reached
#1 = semi-reflecting: Leave boundary only if jump toward interior (1 outcome)
#2 = fully-reflecting: Leave boundary if jump either direction (2 outcomes)

states = arange(numStates)
#Recurrence and transience are not automatically detected until the eigen-analysis is performed.  
#It is the users job to correctly specify them for use in simulations.
recurrent_states = states
if boundary_type == 0:
    recurrent_states = [0,numStates-1]
#recurrent_states = [a differnet list??]



P = matrix(identity(numStates))
for i in arange(1,numStates-1):
    P[i,i-1] = prob_lose
    P[i,i] = prob_draw
    P[i,i+1] = prob_win
    

    
    
    
if boundary_type == 0:
    print("Absorbing Boundary")
elif boundary_type == 1:
    P[0,0] = prob_lose + prob_draw
    P[0,1] = prob_win
    P[-1,-2] = prob_lose
    r=.1
    P[-1,-1] = prob_win + r #+r WHAT IS r SUPPOSED TO BE??????????????????????
    print("Semi-Reflecting Boundary")
elif boundary_type == 2:
    P[0,0] = prob_draw
    P[0,1] = prob_win + prob_lose
    P[-1,-2] = prob_win + prob_lose
    P[-1,-1] = prob_draw
    print("Fully-Reflecting Boundary")
    
    
print("Gambler's Ruin Transition Matrix:\n")
print(P,"\n\n")



    
    
"""
n=5
P=matrix([[1/2,1/2,0,0,0],
          [1/6,5/6,0,0,0],
          [0,0,3/4,1/4,0],
          [0,0,1/8,2/3,5/24],
          [0,0,0,1/6,5/6]
          ])
"""
P=matrix(P)

numStates = shape(P)[0]
states = arange(numStates)


if boundary_type == 0:
    print("\n\nSimulating with start state %s" % start_state)
    print("Using simple algorithm just gambler's ruin only.")
    break_house = numStates-1
    history, dist = simulate_gamblers_ruin_simple(prob_win,prob_draw,prob_lose,break_house,start_state,numPlays=numSteps,numPlayers=numTrials)
    print("Final distribution\n%s" % dist[-1])
    jumps,propAlive = absorption(history,recurrent_states)

print("\n\nSimulating with start state %s" % start_state)
print("Using general algorithm for any finite state, disrete time Markov chain.")
history, dist = simulate(P,start_state,numSteps,numTrials)
#plot_dist(dist,arange(0,numSteps+1,50))
print("Final distribution\n%s" % dist[-1])
jumps,propAlive = absorption(history,recurrent_states)

print("\n\nNow we'll do the analysis using eigenvalues presented in Lawler 1.5")
long_term_from_eigs(P,start_state)

Out[10]:

Fully-Reflecting Boundary
Gambler's Ruin Transition Matrix:

[[ 0.1  0.9  0.   0.   0.   0.   0. ]
 [ 0.6  0.1  0.3  0.   0.   0.   0. ]
 [ 0.   0.6  0.1  0.3  0.   0.   0. ]
 [ 0.   0.   0.6  0.1  0.3  0.   0. ]
 [ 0.   0.   0.   0.6  0.1  0.3  0. ]
 [ 0.   0.   0.   0.   0.6  0.1  0.3]
 [ 0.   0.   0.   0.   0.   0.9  0.1]] 




Simulating with start state 3
Using general algorithm for any finite state, disrete time Markov chain.
Final distribution
[ 0.277  0.407  0.133  0.117  0.037  0.023  0.007]
Proportion not yet absorbed = 0.00
Mean number of jumps = 0.00


Now we'll do the analysis using eigenvalues presented in Lawler 1.5

Aperiodic chain

limit_P 
 [[ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]
 [ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]]

limit_pi 
 [[ 0.254  0.381  0.19   0.095  0.048  0.024  0.008]]

Recurrent classes are [[0, 1, 2, 3, 4, 5, 6]]
Transient states are []

In [5]:

N=numStates-1
j=start_state
print(j/N)
print(j*(N-j))

Out[5]:

0.5
9

In [6]:

Absorbing Boundary
Gambler's Ruin Transition Matrix:

[[ 1.   0.   0.   0.   0.   0.   0. ]
 [ 0.6  0.1  0.3  0.   0.   0.   0. ]
 [ 0.   0.6  0.1  0.3  0.   0.   0. ]
 [ 0.   0.   0.6  0.1  0.3  0.   0. ]
 [ 0.   0.   0.   0.6  0.1  0.3  0. ]
 [ 0.   0.   0.   0.   0.6  0.1  0.3]
 [ 0.   0.   0.   0.   0.   0.   1. ]] 

Semi-Reflecting Boundary
Gambler's Ruin Transition Matrix:

[[ 0.7  0.3  0.   0.   0.   0.   0. ]
#[ 0.6  0.1  0.3  0.   0.   0.   0. ]
#[ 0.   0.6  0.1  0.3  0.   0.   0. ]
#[ 0.   0.   0.6  0.1  0.3  0.   0. ]
#[ 0.   0.   0.   0.6  0.1  0.3  0. ]
#[ 0.   0.   0.   0.   0.6  0.1  0.3]
 [ 0.   0.   0.   0.   0.   0.6  0.4]] 

Fully-Reflecting Boundary
Gambler's Ruin Transition Matrix:

[[ 0.1  0.9  0.   0.   0.   0.   0. ]
#[ 0.6  0.1  0.3  0.   0.   0.   0. ]
#[ 0.   0.6  0.1  0.3  0.   0.   0. ]
#[ 0.   0.   0.6  0.1  0.3  0.   0. ]
#[ 0.   0.   0.   0.6  0.1  0.3  0. ]
#[ 0.   0.   0.   0.   0.6  0.1  0.3]
 [ 0.   0.   0.   0.   0.   0.9  0.1]]

Out[6]:

  File "<ipython-input-6-eec925a2ae8b>", line 1
    Absorbing Boundary
                     ^
SyntaxError: invalid syntax

In [ ]:

Product

Resources

Company