GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 07 - Global Optimization 2/Self_Assess_Soln_07.ipynb
# imports
import GAUtilities as ga
import numpy as np
import pandas as pd
from simanneal import Annealer
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")
from scipy.optimize import minimize
import random
import warnings
warnings.filterwarnings('ignore')
from time import time

Self-Assessment: Using Maximization with GA

To maximize, we need to negate the fitness function, because our genetic algorithm only minimizes. Negating the fitness function is simple: just put a negative sign in front of the returned value. Let's set up the same population used in the lesson and get the maximized fitness.

np.random.seed(10101010)  # for everything, everything, everything... (Courtesy of the Violent Femmes)

pop_size = 6  # should be even due to the way we'll implement crossover
ind_size = 3  # this is the number of genes in each individual

# bounds are used for both real and integer problems.
# For integer problems, the upper bound should be 1 over what you actually want.
bounds = [1, 7]

# each type of problem might use a different type of population.
# This one is a simple matrix of integers.
pop = np.random.randint(low=bounds[0], high=bounds[1], size=(pop_size, ind_size))

# this is our objective function for this particular problem.
# Each problem requires a different objective function.
def obj_sumDice(x):
    x = np.array(x)  # force a numpy array here so that the math below works
    return -np.sum(x)

# compute the fitness by passing in the function and population
fitness = ga.computeFitness(obj_sumDice, pop)
fitness
array([ -9., -13., -9., -15., -13., -7.])

Because our fitness function is so simple, and because we know we're always using numpy arrays, we could also pass numpy's sum function directly to our helper function like this to get the minimization fitness values.

# calling np.sum directly without our wrapper function
fitness = ga.computeFitness(np.sum, pop)
fitness
array([ 9., 13., 9., 15., 13., 7.])
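The GAUtilities source isn't listed in this notebook, but conceptually ga.computeFitness just evaluates the objective on each row of the population. A plausible sketch (an assumption about the helper, not its actual code):

def compute_fitness(f, pop, **kwargs):
    # evaluate the objective f on each individual (row) of pop
    return np.array([f(ind, **kwargs) for ind in pop], dtype=float)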

But if you try to put a negative sign in front of np.sum, you'll get an error. Numpy, though, has its own negation function, np.negative, which we can apply to the computed fitness values like this to handle the maximization problem.

# negating np.sum for a maximization problem
fitness = np.negative(ga.computeFitness(np.sum, pop))
fitness
array([ -9., -13., -9., -15., -13., -7.])
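Since the genetic algorithm minimizes, the maximizing individual is simply the one with the smallest negated fitness. A quick check using plain numpy:

best_idx = np.argmin(fitness)             # smallest negated fitness = largest sum
print(pop[best_idx], -fitness[best_idx])  # the best roll and its total of 15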

Self-Assessment: Exploring Tournament Selection

What happens for smaller tournament sizes? You should notice more diversity in the selected population: more individuals with high (worse) fitness values get selected, and there are fewer repeats in the selected population.

For larger tournament sizes? There is less diversity in the selected population, mostly individuals with low (better) fitness values get selected, and there are more repeats.

For tournament size 1? Each tournament is just a uniformly random draw with no selection pressure at all, so this yields the most diverse selected population with the fewest repeats.

For tournament size the same as the population size? The selected population contains only the individual with the lowest fitness value. This means crossover will have no effect since all the individuals are the same; only the mutation operator will have an effect.

How does tournament size affect the exploration versus exploitation tradeoff? Small tournament sizes encourage more exploration and less exploitation, while larger tournament sizes have the opposite effect. A minimal sketch of tournament selection follows.
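A selection step like ga.tournamentSelection can be sketched in a few lines of numpy. This is a hypothetical illustration; the sampling-without-replacement detail and minimization are assumptions here, not the actual library code:

def tournament_selection(pop, fitness, tourn_size):
    # one tournament per selection slot: draw tourn_size distinct contestants
    # at random and keep the one with the lowest (best) fitness
    pop_size = pop.shape[0]
    selected = np.empty_like(pop)
    for i in range(pop_size):
        contestants = np.random.choice(pop_size, size=tourn_size, replace=False)
        winner = contestants[np.argmin(fitness[contestants])]
        selected[i] = pop[winner]
    return selected

With tourn_size = 1 each "tournament" has no competition, and with tourn_size = pop_size every tournament contains the overall best individual, which therefore wins every slot; this is exactly the behavior described above.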

Self-Assessment: Crossover probability

  • What happens if cx_prob = 0? No mating occurs, so there is no sharing of information between individuals. This reduces the algorithm to a population of parallel random local searches driven by mutation alone.

  • What happens if cx_prob = 1? Every pair of individuals mates, so there is no chance that a very good solution survives more than one generation unless it happens to mate with a copy of itself. (See the sketch after this list.)
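To make the role of cx_prob concrete, here is a rough sketch of a one-point crossover step in the spirit of ga.onePointCrossover (a hypothetical implementation; the adjacent-pairing scheme is an assumption): each adjacent pair of parents mates with probability cx_prob, otherwise the pair is copied through unchanged.

def one_point_crossover(pop, cx_prob):
    # pop_size is assumed even so rows pair up as (0,1), (2,3), ...
    children = pop.copy()
    pop_size, ind_size = pop.shape
    for i in range(0, pop_size, 2):
        if np.random.rand() < cx_prob:           # this pair mates
            cx = np.random.randint(1, ind_size)  # interior cut point
            children[i, cx:] = pop[i + 1, cx:]   # swap tails past the cut
            children[i + 1, cx:] = pop[i, cx:]
    return children

With cx_prob = 0 children is identical to pop (parallel local searches); with cx_prob = 1 every pair swaps tails, so a parent survives intact only if the swapped tails happen to be identical.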

Self-Assessment: Mutation Parameters

  • What is the effect of mut_prob = 1? Every individual is subject to mutation (each of its genes still mutates only with probability ind_prob).

  • What is the effect of mut_prob = 0? No individuals are mutated, so the genetic algorithm relies on mating alone to improve the population.

  • What is the effect of increasing ind_prob? Larger values mean more genes change within each mutating individual.

  • What would happen if you made sigma really large? The mutations could take very large steps, which could make the search behave erratically; mutated individuals might have very little in common with their parents. This favors exploration over exploitation.

  • What would happen if you made sigma really small? The steps would be very small, so the search remains very local. This favors exploitation over exploration. (A sketch of this style of mutation follows.)
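The sigma questions refer to Gaussian mutation of real-valued individuals. A minimal sketch of that operator, assuming mut_prob, ind_prob, and sigma behave as described above (hypothetical, not the actual GAUtilities code):

def gaussian_mutation(pop, mut_prob, ind_prob, sigma):
    # pop is a (pop_size, ind_size) array of real-valued individuals
    mutated = pop.astype(float).copy()
    for i in range(pop.shape[0]):
        if np.random.rand() < mut_prob:          # does this individual mutate?
            for j in range(pop.shape[1]):
                if np.random.rand() < ind_prob:  # does this gene mutate?
                    mutated[i, j] += np.random.normal(0, sigma)  # Gaussian step
    return mutated

Since sigma is the standard deviation of each step, a large sigma scatters offspring far from their parents (heavy exploration) while a small sigma keeps the search local (heavy exploitation).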

Self-Assessment: Genetic Algorithm for the Value Balancing Problem

# problem data and fitness function - DO NOT CHANGE
np.random.seed(123)
tot_num_items = 1000  # should be divisible by 4
num_items = int(tot_num_items / 4)
num_groups = 4
values = np.random.randint(10, 100, size=num_items)
values = np.hstack([values, values, values, values])
np.random.seed()

def group_fitness(groups, values, num_groups):
    # groups must be a numpy array with ind_size entries (only one individual)
    sums = np.array([sum(values[groups == i]) for i in range(num_groups)])
    return max(sums) - min(sums)
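Because values stacks four identical copies of the same 250 numbers, a perfectly balanced grouping exists and should score 0. A quick sanity check of group_fitness (plain numpy, nothing assumed beyond the cell above):

balanced = np.repeat(np.arange(num_groups), num_items)  # one identical copy of the values per group
print(group_fitness(balanced, values, num_groups))      # prints 0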
## Solution Code for Genetic Algorithm

# genetic algorithm parameters
pop_size = 1000           # should be even due to the way we'll implement crossover
ind_size = tot_num_items  # determines number of input variables for each individual
tourn_size = 5            # tournament size for selection
cx_prob = 0.8             # probability a pair of parents crossover to produce two children
mut_prob = 0.1            # probability an individual mutates
ind_prob = 0.01           # probability each variable in an individual mutates
num_iter = 100            # number of genetic algorithm iterations
update_iter = 10          # how often to display output
num_elite = 10            # number of elite individuals to save each generation
fitness_fun = group_fitness  # change this to the fitness function for your problem

# np.random.seed(121)  # set seed here for reproducibility (you can change this number to explore)

time_start = time()  # record start time to compute time elapsed at the end

# initialize population and fitness
pop = np.random.randint(low=0, high=num_groups, size=(pop_size, ind_size))

# note how we're passing named parameters into the computeFitness function
fitness = ga.computeFitness(fitness_fun, pop, values=values, num_groups=num_groups)

# initialize stats and output
stats, best_fitness, best_x = ga.initStats(fitness, pop, num_iter)

# This is where the guts of the algorithm start
for iter in range(num_iter):
    # sort the population
    # pop = ga.sortPop(pop, fitness)

    # tournament selection
    selected_pop = ga.tournamentSelection(pop, tourn_size).astype(int)

    # one-point crossover (mating)
    cx_pop = ga.onePointCrossover(selected_pop, cx_prob).astype(int)

    # uniform int mutation
    mut_pop = ga.uniformIntMutation(cx_pop, mut_prob, ind_prob, [0, num_groups]).astype(int)

    # elitism
    pop = ga.addElitism(pop, mut_pop, num_elite)

    # evaluate fitness on new population
    fitness = ga.computeFitness(fitness_fun, pop, values=values, num_groups=num_groups)

    # collect stats and output to screen
    stats, best_fitness, best_x = ga.updateStats(stats, fitness, best_x, pop, iter, update_iter)

time_elapsed = time() - time_start

#####################
# Everything in the algorithm is done, and now we're just outputting the final result
#####################
print(f"The minimized maximum difference between value totals is {best_fitness:.0f}")
print(f"The total time elapsed is: {time_elapsed:0.2f}")
print(f"The total number of function evaluations is: {(num_iter+1)*pop_size:.0f}")
Iteration | Best this iter | Best ever
        1 |      2.460e+02 | 2.460e+02
       10 |      1.270e+02 | 5.700e+01
       20 |      2.110e+02 | 5.700e+01
       30 |      2.460e+02 | 5.700e+01
       40 |      7.700e+01 | 5.700e+01
       50 |      7.800e+01 | 5.700e+01
       60 |      1.770e+02 | 5.700e+01
       70 |      1.440e+02 | 5.700e+01
       80 |      1.340e+02 | 5.700e+01
       90 |      8.700e+01 | 4.000e+01
      100 |      1.500e+02 | 4.000e+01
The minimized maximum difference between value totals is 40
The total time elapsed is: 34.09
The total number of function evaluations is: 101000