GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 07 - Global Optimization 2/Self_Assess_Soln_07.ipynb
# imports
import GAUtilities as ga
import numpy as np
import pandas as pd
from simanneal import Annealer
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")
from scipy.optimize import minimize
import random
import warnings
warnings.filterwarnings('ignore')
from time import time

Self-Assessment: Using Maximization with GA

To maximize, we need to negate the fitness function, because our genetic algorithm only minimizes. Negating the fitness function is simple: just put a negative sign in front of the returned value. Let's set up the same population used in the lesson and get the maximized fitness.

np.random.seed(10101010)  # for everything, everything, everything... (Courtesy of the Violent Femmes)

pop_size = 6  # should be even due to the way we'll implement crossover
ind_size = 3  # this is the number of genes in each individual

# bounds are used for both real and integer problems.
# For integer problems, the upper bound should be 1 over what you actually want.
bounds = [1, 7]

# each type of problem might use a different type of population.
# This one is a simple matrix of integers.
pop = np.random.randint(low=bounds[0], high=bounds[1], size=(pop_size, ind_size))

# this is our objective function for this particular problem.
# Each problem requires a different objective function.
def obj_sumDice(x):
    x = np.array(x)  # force a numpy array here so that the math below works
    return -np.sum(x)

# compute the fitness by passing in the function and population
fitness = ga.computeFitness(obj_sumDice, pop)
fitness
array([ -9., -13., -9., -15., -13., -7.])

Because our fitness function is so simple, and because we know we're always using numpy arrays, we could also pass numpy's sum function directly to our helper function like this to get the minimization fitness values.

# calling np.sum directly without our wrapper function
fitness = ga.computeFitness(np.sum, pop)
fitness
array([ 9., 13., 9., 15., 13., 7.])
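The GAUtilities source isn't listed in this notebook, but conceptually ga.computeFitness just evaluates the objective on each row of the population. A plausible sketch (an assumption about the helper, not its actual code):

def compute_fitness(f, pop, **kwargs):
    # evaluate the objective f on each individual (row) of pop
    return np.array([f(ind, **kwargs) for ind in pop], dtype=float)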

But if you try to put a negative sign in front of np.sum, you'll get an error. Numpy, though, has its own negation function, np.negative, which we can apply to the computed fitness values like this to handle the maximization problem.

# negating np.sum for a maximization problem
fitness = np.negative(ga.computeFitness(np.sum, pop))
fitness
array([ -9., -13., -9., -15., -13., -7.])
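Since the genetic algorithm minimizes, the maximizing individual is simply the one with the smallest negated fitness. A quick check using plain numpy:

best_idx = np.argmin(fitness)             # smallest negated fitness = largest sum
print(pop[best_idx], -fitness[best_idx])  # the best roll and its total of 15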

Self-Assessment: Exploring Tournament Selection

What happens for smaller tournament sizes? You should notice more diversity in the selected population: more individuals with high (worse) fitness values get selected, and there are fewer repeats in the selected population.

For larger tournament sizes? There is less diversity in the selected population, mostly individuals with low (better) fitness values get selected, and there are more repeats.

For tournament size 1? Each tournament is just a uniformly random draw with no selection pressure at all, so this yields the most diverse selected population with the fewest repeats.

For tournament size the same as the population size? The selected population contains only the individual with the lowest fitness value. This means crossover will have no effect since all the individuals are the same; only the mutation operator will have an effect.

How does tournament size affect the exploration versus exploitation tradeoff? Small tournament sizes encourage more exploration and less exploitation, while larger tournament sizes have the opposite effect. A minimal sketch of tournament selection follows.
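A selection step like ga.tournamentSelection can be sketched in a few lines of numpy. This is a hypothetical illustration; the sampling-without-replacement detail and minimization are assumptions here, not the actual library code:

def tournament_selection(pop, fitness, tourn_size):
    # one tournament per selection slot: draw tourn_size distinct contestants
    # at random and keep the one with the lowest (best) fitness
    pop_size = pop.shape[0]
    selected = np.empty_like(pop)
    for i in range(pop_size):
        contestants = np.random.choice(pop_size, size=tourn_size, replace=False)
        winner = contestants[np.argmin(fitness[contestants])]
        selected[i] = pop[winner]
    return selected

With tourn_size = 1 each "tournament" has no competition, and with tourn_size = pop_size every tournament contains the overall best individual, which therefore wins every slot; this is exactly the behavior described above.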

Self-Assessment: Crossover probability

  • What happens if cx_prob = 0? No mating occurs, so there is no sharing of information between individuals. This reduces the algorithm to a population of parallel random local searches driven by mutation alone.

  • What happens if cx_prob = 1? Every pair of individuals mates, so there is no chance that a very good solution survives more than one generation unless it happens to mate with a copy of itself. (See the sketch after this list.)
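To make the role of cx_prob concrete, here is a rough sketch of a one-point crossover step in the spirit of ga.onePointCrossover (a hypothetical implementation; the adjacent-pairing scheme is an assumption): each adjacent pair of parents mates with probability cx_prob, otherwise the pair is copied through unchanged.

def one_point_crossover(pop, cx_prob):
    # pop_size is assumed even so rows pair up as (0,1), (2,3), ...
    children = pop.copy()
    pop_size, ind_size = pop.shape
    for i in range(0, pop_size, 2):
        if np.random.rand() < cx_prob:           # this pair mates
            cx = np.random.randint(1, ind_size)  # interior cut point
            children[i, cx:] = pop[i + 1, cx:]   # swap tails past the cut
            children[i + 1, cx:] = pop[i, cx:]
    return children

With cx_prob = 0 children is identical to pop (parallel local searches); with cx_prob = 1 every pair swaps tails, so a parent survives intact only if the swapped tails happen to be identical.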

Self-Assessment: Mutation Parameters

  • What is the effect of mut_prob = 1? Every individual is subject to mutation (each of its genes still mutates only with probability ind_prob).

  • What is the effect of mut_prob = 0? No individuals are mutated, so the genetic algorithm relies on mating alone to improve the population.

  • What is the effect of increasing ind_prob? Larger values mean more genes change within each mutating individual.

  • What would happen if you made sigma really large? The mutations could take very large steps, which could make the search behave erratically; mutated individuals might have very little in common with their parents. This favors exploration over exploitation.

  • What would happen if you made sigma really small? The steps would be very small, so the search remains very local. This favors exploitation over exploration. (A sketch of this style of mutation follows.)
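The sigma questions refer to Gaussian mutation of real-valued individuals. A minimal sketch of that operator, assuming mut_prob, ind_prob, and sigma behave as described above (hypothetical, not the actual GAUtilities code):

def gaussian_mutation(pop, mut_prob, ind_prob, sigma):
    # pop is a (pop_size, ind_size) array of real-valued individuals
    mutated = pop.astype(float).copy()
    for i in range(pop.shape[0]):
        if np.random.rand() < mut_prob:          # does this individual mutate?
            for j in range(pop.shape[1]):
                if np.random.rand() < ind_prob:  # does this gene mutate?
                    mutated[i, j] += np.random.normal(0, sigma)  # Gaussian step
    return mutated

Since sigma is the standard deviation of each step, a large sigma scatters offspring far from their parents (heavy exploration) while a small sigma keeps the search local (heavy exploitation).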

Self-Assessment: Genetic Algorithm for the Value Balancing Problem

# problem data and fitness function - DO NOT CHANGE
np.random.seed(123)
tot_num_items = 1000  # should be divisible by 4
num_items = int(tot_num_items / 4)
num_groups = 4
values = np.random.randint(10, 100, size=num_items)
values = np.hstack([values, values, values, values])
np.random.seed()

def group_fitness(groups, values, num_groups):
    # groups must be a numpy array with ind_size entries (only one individual)
    sums = np.array([sum(values[groups == i]) for i in range(num_groups)])
    return max(sums) - min(sums)
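Because values stacks four identical copies of the same 250 numbers, a perfectly balanced grouping exists and should score 0. A quick sanity check of group_fitness (plain numpy, nothing assumed beyond the cell above):

balanced = np.repeat(np.arange(num_groups), num_items)  # one identical copy of the values per group
print(group_fitness(balanced, values, num_groups))      # prints 0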
## Solution Code for Genetic Algorithm

# genetic algorithm parameters
pop_size = 1000           # should be even due to the way we'll implement crossover
ind_size = tot_num_items  # determines number of input variables for each individual
tourn_size = 5            # tournament size for selection
cx_prob = 0.8             # probability a pair of parents crossover to produce two children
mut_prob = 0.1            # probability an individual mutates
ind_prob = 0.01           # probability each variable in an individual mutates
num_iter = 100            # number of genetic algorithm iterations
update_iter = 10          # how often to display output
num_elite = 10            # number of elite individuals to save each generation
fitness_fun = group_fitness  # change this to the fitness function for your problem

# np.random.seed(121)  # set seed here for reproducibility (you can change this number to explore)

time_start = time()  # record start time to compute time elapsed at the end

# initialize population and fitness
pop = np.random.randint(low=0, high=num_groups, size=(pop_size, ind_size))

# note how we're passing named parameters into the computeFitness function
fitness = ga.computeFitness(fitness_fun, pop, values=values, num_groups=num_groups)

# initialize stats and output
stats, best_fitness, best_x = ga.initStats(fitness, pop, num_iter)

# This is where the guts of the algorithm start
for iter in range(num_iter):
    # sort the population
    # pop = ga.sortPop(pop, fitness)

    # tournament selection
    selected_pop = ga.tournamentSelection(pop, tourn_size).astype(int)

    # one-point crossover (mating)
    cx_pop = ga.onePointCrossover(selected_pop, cx_prob).astype(int)

    # uniform int mutation
    mut_pop = ga.uniformIntMutation(cx_pop, mut_prob, ind_prob, [0, num_groups]).astype(int)

    # elitism
    pop = ga.addElitism(pop, mut_pop, num_elite)

    # evaluate fitness on new population
    fitness = ga.computeFitness(fitness_fun, pop, values=values, num_groups=num_groups)

    # collect stats and output to screen
    stats, best_fitness, best_x = ga.updateStats(stats, fitness, best_x, pop, iter, update_iter)

time_elapsed = time() - time_start

#####################
# Everything in the algorithm is done, and now we're just outputting the final result
#####################
print(f"The minimized maximum difference between value totals is {best_fitness:.0f}")
print(f"The total time elapsed is: {time_elapsed:0.2f}")
print(f"The total number of function evaluations is: {(num_iter+1)*pop_size:.0f}")
Iteration | Best this iter | Best ever
        1 |      2.460e+02 | 2.460e+02
       10 |      1.270e+02 | 5.700e+01
       20 |      2.110e+02 | 5.700e+01
       30 |      2.460e+02 | 5.700e+01
       40 |      7.700e+01 | 5.700e+01
       50 |      7.800e+01 | 5.700e+01
       60 |      1.770e+02 | 5.700e+01
       70 |      1.440e+02 | 5.700e+01
       80 |      1.340e+02 | 5.700e+01
       90 |      8.700e+01 | 4.000e+01
      100 |      1.500e+02 | 4.000e+01
The minimized maximum difference between value totals is 40
The total time elapsed is: 34.09
The total number of function evaluations is: 101000