# imports
import numpy as np

Load Balancing With Constraints Example

Note: Make sure you've already read Part 1, in which we do this problem without constraints. We'll only be explaining the new bits here.

In this example we're going to show how you could use various approaches to solve a constrained load balancing problem.

For this problem, we're talking about execution times on computer processors, where the total execution time on certain processors is limited to a set amount. You might see this happen when one processor needs to "reserve" processing cycles for some other job not in our load balancing list.

We can describe it as:

given a list of n execution times, divide them to be executed on k processors so that the total execution time on each processor is as close to the same as possible, while the y constrained processors each stay under their execution time limit x.

There are two kinds of constraints we can implement with our metaheuristic algorithms:

  • hard constraints

  • soft constraints

We'll start with hard constraints.

Hard Constraint

A hard constraint is a constraint which rejects any solution that doesn't meet our specifications. We've seen these before in Pyomo when we build our lists of constraints or our bounds.
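
As a quick reminder, a hard constraint in Pyomo looks something like the generic sketch below (this model is purely illustrative and not tied to the load balancing problem):

import pyomo.environ as pyo

model = pyo.ConcreteModel()
# bounds are hard constraints: x can never leave [0, 10]
model.x = pyo.Var(bounds=(0, 10))
model.y = pyo.Var(within=pyo.NonNegativeReals)
# an explicit constraint: any solution with x + y > 10 is simply infeasible
model.limit = pyo.Constraint(expr=model.x + model.y <= 10)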

We can use hard constraints with some of our metaheuristic methods. We'll use hard constraints with greedy local search and simulated annealing.

There are two ways to implement hard constraints - directly in the loop of our hand-coded solutions, or in the move function. We'll look at both approaches with our greedy local search.
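
Schematically, the two patterns look like this (a minimal sketch; within_limits, move, and score are stand-ins for whatever feasibility test, move function, and objective your problem uses, not functions defined elsewhere in this notebook):

# hypothetical feasibility test shared by both patterns
def within_limits(assign, times, conproc, conmax):
    return all(sum(times[assign == c]) <= conmax[c] for c in conproc)

# Pattern 1: the move is unconstrained; the search loop rejects infeasible candidates
def step_with_loop_check(current, times, k, conproc, conmax, move, score):
    candidate = move(current, k)
    if score(candidate) < score(current) and within_limits(candidate, times, conproc, conmax):
        return candidate  # accept: better AND feasible
    return current

# Pattern 2: the move itself refuses to produce an infeasible state
def constrained_move(current, times, k, conproc, conmax, move):
    candidate = move(current, k)
    if within_limits(candidate, times, conproc, conmax):
        return candidate  # feasible moves pass through
    return current        # infeasible moves are undone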

Hard Constraint - Objective Function

For a hard constraint, our objective function remains identical. (If you need a refresher on what the objective function is doing, please see the Lesson_05_Load_Balancing notebook.)

# original objective function = total squared deviation of times from balanced times
def balance_metric(assign, times, k):
    target = sum(times)/k
    return sum( (sum(times[assign==j])-target)**2 for j in range(k) )

Hard Constraint - Enforced in the Loop

When we implement our hard constraint directly in our loop for greedy local search, the move function remains the same as it was without constraints.

# define a move function which changes one processor assignment randomly
def reassign_one(assign, k):
    n = len(assign)
    # choose a job and a new processor assignment
    which_job = np.random.randint(0,n,1)[0]
    which_proc = np.random.randint(0,k,1)[0]
    new_assign = assign.copy()
    new_assign[which_job] = which_proc
    return new_assign

Hard Constraint - Enforced in the Move Function

But, we can put the hard constraint directly in our move function, instead. We'll implement it by first completing the move, then checking to see if the new assignments meet our constraints. If they do not, we'll return the original assignments. If they do, we'll return the new assignments.

To do this, we'll need to pass in two additional parameters:

  • conproc - the list of constrained processors

  • conmax - the list of max total processing time allowed on each constrained processor

Let's look at the function first.

# define a move function which changes one processor assignment randomly,
# rejecting any move that violates the hard constraint
# note: this relies on the global `times` array, and conmax is indexed by
# processor id, so the constrained processors must be numbered 0..len(conmax)-1
def reassign_one_constrained(assign, k, conproc, conmax):
    # get the reassignment
    new_assign = reassign_one(assign, k)
    ###################
    # NEW - Evaluate if the new assignments meet our constraint
    over_max = True in [sum(times[new_assign==c]) > conmax[c] for c in conproc]
    # Only return a new assignment if it meets our constraints
    # uncomment this line to see total time on each processor
    #print('Total time on each processor (inside function):', [ sum(times[new_assign==j]) for j in range(k)])
    if over_max == False:
        #print('Not over max')  # uncomment this line to see if it passed
        return new_assign
    else:
        #print('Over max')  # uncomment this line to see if it failed
        return assign

Let's see what the constrained move function looks like with a simple problem. We'll create some sample data and run the reassign_one_constrained() function, constraining processor 0 to a total processing time of 10. You can uncomment the print statements in the function to see what's happening inside, or you can just compare what goes in with what comes out below. Note that if we start with more than 10 on the constrained processor (0), we'll still end up with more than 10 on it if our reassignment doesn't "fix" it.

k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign = np.array([0,0,0,1,1,1,2,2,2])
# total time on each processor ... should be the same
print('Total time on each processor (going in):', [ sum(times[assign==j]) for j in range(k)])
# reassign one, with processor 0 constrained to 10
new_assign = reassign_one_constrained(assign, k, [0], [10])
print('Processor 0 Constrained to 10:', new_assign)
print('Total time on each processor (coming out):', [ sum(times[new_assign==j]) for j in range(k)])
Total time on each processor (going in): [12, 12, 12]
Processor 0 Constrained to 10: [0 0 0 1 1 1 2 2 2]
Total time on each processor (coming out): [12, 12, 12]

Greedy Local Search - Hard Constraint In the Loop

Let's see what it looks like to use the original function in a greedy local search, and enforce the hard constraint right in the loop.

We'll also track whether the algorithm ever finds a solution that meets the constraints. It's possible with a hard constraint that we never find a solution that works.

# local search function
def load_balance_local_loop_constraint(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0, high=k, size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        # this is the same as before - using our original reassign_one function
        new_x = reassign_one(current_x, k)
        ##########################################################
        # NEW - Evaluate if the new assignments meet our constraint
        over_max = True in [sum(times[new_x==c]) > conmax[c] for c in conproc]
        ##################################
        new_f = balance_metric(new_x, times, k)
        ####################################
        # NEW - check both that the new value is less than the old and that we are within our constraints
        if new_f < current_f and over_max == False:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Greedy Local Search - Hard Constraint In the Move Function

Next, let's implement the hard constraint directly in our move function (reassign_one_constrained).

Again, to use that, we need our two additional parameters, so we'll update our load_balance_local function to take in 2 additional parameters:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

We'll again track whether the algorithm ever finds a solution that meets the constraints.

# local search function
def load_balance_local_function_constraint(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0, high=k, size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ##################################
        # NEW - pass the extra parameters to reassign_one_constrained
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ##################################
        new_f = balance_metric(new_x, times, k)
        if new_f < current_f:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Let's run each of these with a small number of processors and a small number of job execution times. First let's generate some random data and see what the time on each processor would be if the loads were completely balanced.

# generate random job times
np.random.seed(666)  # comment this out to play with new numbers
# we'll start with 30 execution times
n = 30
# and 3 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high=max_time, size=n)
assign = np.random.randint(low=0, high=k, size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
Total time on each processor, if completely balanced: 1220.6666666666667

Running local search with constraints

Let's start with setting processor 0 to be constrained to a max processing time of 1100. Run this code several times. How often do you get convergence?

#####################
# NEW: adding our 2 additional parameters to the function and one additional return variable
#####################
print('Using the Constraint Directly in the Loop')
best_assign, best_f, num_iter, converged = load_balance_local_loop_constraint(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
print('------------------------------------------')
print('Using the Constraint in the Function')
best_assign, best_f, num_iter, converged = load_balance_local_function_constraint(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
Using the Constraint Directly in the Loop
The algorithm found a solution that met the criteria: False
The best assignment is [2 0 0 2 0 0 1 1 1 1 1 1 0 0 0 2 1 2 1 0 0 1 2 0 0 2 1 1 2 2]
Total time on each processor: [1353, 1373, 936]
The deviation from balance is 121752.66666666666
It took 5000 iterations.
------------------------------------------
Using the Constraint in the Function
The algorithm found a solution that met the criteria: True
The best assignment is [2 1 2 0 0 1 1 2 1 1 2 0 2 2 0 1 2 0 0 0 1 1 1 0 0 2 2 1 1 0]
Total time on each processor: [1077, 1306, 1279]
The deviation from balance is 31324.666666666668
It took 5022 iterations.

What's happening?

This isn't working very well, is it? Why not? Well, let's say we start with 1400 on our constrained processor and our largest processing time is 200. 1400 - 200 = 1200, which is still larger than our constraint allows. In other words, there's no single move that can reach a feasible solution.
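
A quick sketch makes this concrete (the demo_ variables use hypothetical numbers, not the data above): put seven 200-unit jobs on processor 0 for a load of 1400, with a cap of 1100. No single-job move can reach feasibility:

# hypothetical worst case: 1400 on processor 0, constrained to 1100
demo_times = np.array([200] * 7)
demo_assign = np.zeros(7, dtype=int)  # everything on processor 0
demo_conmax0 = 1100

# the best any single-job move can do is remove one job's time from processor 0
best_after_one_move = sum(demo_times[demo_assign == 0]) - max(demo_times)
print(best_after_one_move)                  # 1200
print(best_after_one_move <= demo_conmax0)  # False - still infeasible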

How do we fix it?

There are a couple of approaches we could take. We've already seen one approach in a prior lesson: multi-start. We could wrap our load_balance_local_function_constraint or our load_balance_local_loop_constraint in a multi-start function and cross our fingers and hope that at least one of our starts randomly begins with a feasible solution. With some algorithms, that's our only choice to avoid getting stuck in a local optimum.
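
For reference, a multi-start wrapper could look something like this sketch (multistart_constrained is our name, not a library function; it reuses load_balance_local_function_constraint from above, and the default of 20 starts is arbitrary):

def multistart_constrained(times, k, max_no_improve, conproc, conmax, num_starts=20):
    """Run the constrained search from several random starts and keep the best result
    that ever accepted a feasible solution."""
    best_x, best_f = None, None
    for _ in range(num_starts):
        x, f, iters, converged = load_balance_local_function_constraint(
            times, k, max_no_improve, conproc, conmax)
        # only keep runs that found a feasible solution
        if converged and (best_f is None or f < best_f):
            best_x, best_f = x, f
    return best_x, best_f  # (None, None) if no start ever found a feasible solution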

But for our problem, we can use another approach. We can start from a random, but feasible solution. We can do this in 2 ways - we can put all of our processes on a single processor and let the algorithm "spread them out" or we can randomly distribute our processes across the unconstrained processors.

We'll actually write code that handles both, because with only 3 processors, if we constrain 2, we only have a single processor to start with. But, for larger problems, we'll randomly distribute across the unconstrained processors.

We'll need a new version of our load_balance_local function.

# local search function
def load_balance_local_feasible(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    ###############################################
    # NEW - get a random sample of size n using choices from the unconstrained processors
    # this will assign all to one processor if only one is unconstrained
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    ###############################################
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ##################################
        # NEW - pass the extra parameters to reassign_one_constrained
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ##################################
        new_f = balance_metric(new_x, times, k)
        if new_f < current_f:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Trying again

Now let's try our code and see if we get feasible solutions. Run this multiple times.

best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
The algorithm found a solution that met the criteria: True
The best assignment is [0 2 1 1 2 0 2 1 0 0 0 1 1 1 1 0 2 2 0 1 0 2 2 2 0 1 1 2 1 0]
Total time on each processor: [1093, 1297, 1272]
The deviation from balance is 24760.666666666664
It took 5292 iterations.

So much better! We consistently find a feasible solution.

What if we wanted to constrain 2 of our processors? Easy! We just add to our conproc and conmax lists. This time, let's constrain processor 0 to a max time of 1200 and processor 1 to a max time of 1100. Again, run this code multiple times and see how often the algorithm converges.

# adding our 2 constrained processors and their max times here
best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, 5000, [0,1], [1200,1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
The algorithm found a solution that met the criteria: True
The best assignment is [2 2 1 2 0 1 2 1 2 1 2 1 1 2 0 1 0 0 0 0 2 2 1 0 2 2 1 0 1 0]
Total time on each processor: [1178, 1077, 1407]
The deviation from balance is 57180.666666666664
It took 5093 iterations.

Simulated Annealing - By Hand - Hard Constraints

We can take the same hard constraint approach with our hand-coded simulated annealing problem. Once again, we'll add 2 parameters to our custom_simanneal function:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

We'll start with a feasible solution (no processes on the constrained processor).

And once again we'll pass back a convergence variable to let us know if we ever found a solution that matched our constraints.

Personally, I think it's neater and cleaner to use the reassign_one_constrained function. So we'll use that one for our moves.

We'll use the same set of jobs from the previous example so you can compare.

def custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax):
    # get the length of our jobs
    n = len(times)
    # starts from a random assignment to the unconstrained processors
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    # this is just for tracking
    iterations = 1
    trajectory = [[iterations, current_f]]
    trajectory_best = [[iterations, best_f]]
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ###################################
        # NEW - add the 2 extra parameters
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ###################################
        new_f = balance_metric(new_x, times, k)
        # determine the change in score
        delta = new_f - current_f
        # determine the probability of accepting this solution
        prob = np.exp(min(delta, 0) / temp)
        # determine if we'll accept this solution
        accept = new_f < current_f or np.random.uniform() < prob
        if accept:
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                #################
                # New - track if we ever got a better solution than the first
                converged = True
                #################
                best_x = current_x
                best_f = current_f
                num_moves_no_improve = 0
        temp *= alpha
        iterations += 1
        trajectory.append([iterations, current_f])
        trajectory_best.append([iterations, best_f])
    return best_x, best_f, iterations, trajectory, trajectory_best, converged  #### NEW: Return extra variable

#######
# New - add the 2 extra parameters
best_x, best_f, iterations, trajectory, trajectory_best, converged = custom_simanneal(times, k, 1000, 500, .99, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_f)  # note: this prints best_f; print best_x to see the assignment itself
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])
print('The deviation from balance is', best_f)
The algorithm found a solution that met the criteria: True
The best assignment is 22858.666666666668
Total time on each processor: [1098, 1294, 1270]
The deviation from balance is 22858.666666666668

The simanneal Package - Hard Constraints - No External Functions

We can also use a hard constraint with the simanneal package. Let's see what it looks like if we don't use external functions. To do this, you add your code within the package's move and energy functions. To use a hard constraint in simanneal, you'd enforce the constraint in the move function, just like we did with our hand-coding.

We again need our two extra variables:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

But this time we'll pass them into the initialization function of the simanneal package.

To start with a feasible solution, we'll need to generate it outside the package and pass in the initial assignment.

Let's see what that looks like.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemHC(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemHC)
        super(loadProblemHC, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign_one function"""
        # pick one of the jobs and assign it to one of k processors
        #############################
        # NEW - We have to COPY the state
        assign = self.state.copy()
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc
        #################################################
        # NEW - hard constraint enforcement
        over_max = True in [sum(self.times[assign==c]) > self.conmax[c] for c in self.conproc]
        # Only update the state if it meets our requirements
        if over_max == False:
            # we only update the state if we've met our constraints
            self.state = assign

    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        target = sum(times)/k
        return sum( (sum(times[assign==j])-target)**2 for j in range(k) )

################################
# NEW - generate an initial feasible assignment
################################
unconstrained = [x for x in list(range(k)) if x not in [0]]
assign = np.random.choice(unconstrained, size=n, replace=True)

# initialize the class
ld = loadProblemHC(assign, times, k, [0], [1100])
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   730.00000      22620.67    55.30%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   730.00000      27402.67    56.20%     0.06%     0:00:14     0:00:00
The best set is: [0 2 2 1 2 2 0 2 1 1 1 2 1 0 0 1 1 1 2 1 0 1 2 2 0 0 2 1 0 0]
Total time on each processor: [1100, 1281, 1281]
The best score is: 21840.666666666668

The simanneal Package - Hard Constraints - External Functions

The simanneal package will also work with external functions. Remember, we've already written nice, self-contained functions that correspond to simanneal's functions.

  • move = reassign_one_constrained

  • energy = balance_metric

We can just plop those external functions into simanneal's functions and run our code.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemHCWithFunctions(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemHCWithFunctions)
        super(loadProblemHCWithFunctions, self).__init__(state)  # important!

    def move(self):
        """Can directly call our reassign_one_constrained function"""
        # since reassign_one_constrained copies the assignment internally, we can pass self.state directly
        self.state = reassign_one_constrained(self.state, self.k, self.conproc, self.conmax)

    def energy(self):
        """Can directly call our balance_metric function"""
        return balance_metric(self.state, self.times, self.k)

################################
# AGAIN - generate an initial feasible assignment
################################
unconstrained = [x for x in list(range(k)) if x not in [0]]
assign = np.random.choice(unconstrained, size=n, replace=True)

# initialize the class
ld2 = loadProblemHCWithFunctions(assign, times, k, [0], [1100])
ld2.set_schedule(ld2.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld2.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld2.anneal()
print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   490.00000      27220.67    55.95%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   490.00000      22082.67    55.09%     0.14%     0:00:13     0:00:00
The best set is: [0 0 1 2 1 1 1 2 1 0 0 0 1 2 0 1 2 0 1 1 2 2 1 1 0 2 2 0 2 2]
Total time on each processor: [1100, 1281, 1281]
The best score is: 21840.666666666668

Note: Simanneal evaluates the problem space before running. If your constraint is set too low, simanneal will print the first pink line, and then just hang. If you're playing with this and it gets stuck, you'll need to restart your kernel and loosen your constraints.
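
If you'd rather skip the auto exploration entirely, simanneal also lets you set the annealing schedule yourself by assigning the Annealer attributes directly; a minimal sketch (these particular values are untuned guesses, not recommendations for this problem):

# instead of ld.set_schedule(ld.auto(minutes=.2)), set an explicit schedule
ld = loadProblemHC(assign, times, k, [0], [1100])
ld.Tmax = 25000.0  # starting temperature
ld.Tmin = 2.5      # ending temperature
ld.steps = 50000   # total number of iterations
ld.copy_strategy = "deepcopy"
best_assign, best_score = ld.anneal()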

Soft Constraints

As we've seen, hard constraints can sometimes fail to produce a feasible solution at all, and the search can't even move closer to feasibility because it rejects every option that doesn't meet the constraint. Soft constraints fix that problem. We won't always get a solution that meets the constraint, but the algorithm will at least have a chance to get closer to an optimal solution. The Big M approach is a type of soft constraint.

In the metaheuristic algorithms we're exploring, soft constraints are implemented in the objective function. Instead of rejecting a solution outright, a penalty is incorporated. For a minimization problem, a positive number is added when the constraint isn't met. For a maximization problem, a negative number is added.

In our code, we're adding a multiplier to the penalty. The larger the multiplier, the "harder" the soft constraint. We'll pass the multiplier in as a parameter so we can adjust it on the fly.

Let's look at what this looks like with a hand-coded problem.

We'll keep our original objective function (balance_metric), but we'll add a new wrapper function (balance_metric_constrained). This one will take in 3 additional parameters:

  • a list of constrained processors

  • a list of the max times on each processor

  • a penalty multiplier (integer)

# constrained objective function = total squared deviation of times from balanced times,
# plus a penalty for constraint violations
def balance_metric_constrained(assign, times, k, conproc, conmax, penalty_multiplier):
    # get the unconstrained balance metric
    dev_uncon = balance_metric(assign, times, k)
    # sum the squared overage on the constrained processors and apply our penalty
    dev_penalty = penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c], 0)**2 for c in conproc )
    return (dev_uncon + dev_penalty)

Testing the Soft Constraint

We'll test our two functions with some hand-coded assignments. We'll use 9 jobs on 3 processors. First we'll look at them as an unconstrained, perfectly balanced problem.

# testing a perfectly balanced, unconstrained problem
k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign = np.array([0,0,0,1,1,1,2,2,2])
# total time on each processor ... should be the same
print('Total time on each processor:', [ sum(times[assign==j]) for j in range(k)])
# print the original balance metric
print('Unconstrained Balance Metric:', balance_metric(assign,times,k))
Total time on each processor: [12, 12, 12]
Unconstrained Balance Metric: 0.0

Now we'll add some constraints. Note that neither our times nor assignments are changing. But, we're essentially changing the target for some of our processors. We're going to set processor 0 to a max limit of 10 and a penalty of 5.

# total time on each processor has not changed
print('Total time on each processor (has not changed):', [ sum(times[assign==j]) for j in range(k)])
# constrain processor 0 to 10, with a penalty multiplier of 5
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10], 5))
Total time on each processor (has not changed): [12, 12, 12]
Constrained Balance Metric: 20.0

What happens if we increase the penalty to 10?

# total time on each processor has not changed
print('Total time on each processor (has not changed):', [ sum(times[assign==j]) for j in range(k)])
# constrain processor 0 to 10, with a penalty multiplier of 10
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10], 10))
Total time on each processor (has not changed): [12, 12, 12]
Constrained Balance Metric: 40.0

With the constraint in place, what was a completely balanced solution no longer looks so great. What would happen if we switch our assignments around some?

# new assignments
assign = np.array([1,0,0,1,1,1,2,2,2])
print('Total time on each processor (has changed):', [ sum(times[assign==j]) for j in range(k)])
# check the unconstrained balance metric
print('Balance Metric without constraints', balance_metric(assign,times,k))
# check the constrained balance metric
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10],5))
Total time on each processor (has changed): [10, 14, 12]
Balance Metric without constraints 8.0
Constrained Balance Metric: 8.0

Here, we've met our constraint, so our unconstrained and constrained balance metrics match.

The simanneal Package - Soft Constraint

With simanneal, instead of adding our hard constraint to the move() function, we'd add our soft constraint to the energy() function. We'll need to pass in the penalty multiplier at initialization.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemSC(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 3 extra parameters
    def __init__(self, state, times, k, conproc, conmax, penalty_multiplier):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 3 new variables
        self.conproc = conproc
        self.conmax = conmax
        self.penalty_multiplier = penalty_multiplier
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemSC)
        super(loadProblemSC, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign_one function"""
        # pick one of the jobs and assign it to one of k processors
        ##################################
        # NEW - back to just changing the state directly
        assign = self.state
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc

    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        conproc = self.conproc
        conmax = self.conmax
        ############################################
        # NEW - determine the energy and apply a penalty
        target = sum(times)/k
        # sum the total processor deviation
        dev_uncon = sum( (sum(times[assign==j])-target)**2 for j in range(k) )
        # sum the constrained processor deviation and apply penalty
        dev_penalty = self.penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c], 0)**2 for c in conproc )
        # the /100 just scales the energy down
        return (dev_uncon + dev_penalty)/100

Let's generate some more test data and run our soft constraint version of simanneal.

# generate random job times
np.random.seed(666)  # comment this out to play with new numbers
# we'll use 30 execution times again
n = 30
# and 3 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high=max_time, size=n)
assign = np.random.randint(low=0, high=k, size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
Total time on each processor, if completely balanced: 1220.6666666666667

With completely balanced loads, we'd have 1220-ish on each processor. We'll set processor 0's constraint to 1100. Run this code several times. Change up the penalty multiplier. How often do you meet the constraint? How balanced does the workload seem compared to our hard constraint version?

# initialize the class
ld = loadProblemSC(assign, times, k, [0], [1100], 5)  # penalty = 5
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
#print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     3.00000        181.21     9.20%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     3.00000        168.09     9.41%     0.00%     0:00:12     0:00:00
Total time on each processor: [1128, 1267, 1267]
The best score is: 168.00666666666663

Genetic Algorithm with DEAP - Soft Constraints

Again with DEAP we'll implement the soft constraint in our evaluation (fitness) function. This requires a few small but important changes. First, the things that stay the same.

  • Our create_individual() function

  • Our customGA() function

Neither of these changes at all, and we can just copy/paste the code from the load balancing without constraints example.

# No changes to this function
def create_individual(k, n):
    current_x = np.random.randint(low=0, high=k, size=n)
    return current_x.tolist()  # this converts our np array back to a list

# no changes here, call this to execute the genetic algorithm
def customGA(in_toolbox, in_tools, in_stats, pop_size, cx_prob, mut_prob, max_gen, max_no_improve):
    pop = in_toolbox.population(n=pop_size)
    logbook = in_tools.Logbook()
    hof = in_tools.HallOfFame(1)

    # Evaluate the entire population
    fitnesses = list(map(in_toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
    hof.update(pop)
    best_val = hof[0].fitness.values

    num_no_improve = 0
    generation = 0
    while num_no_improve < max_no_improve and generation < max_gen:
        # Select the next generation individuals
        selected = in_toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = list(map(in_toolbox.clone, selected))

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < cx_prob:
                in_toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values
        for mutant in offspring:
            if random.random() < mut_prob:
                in_toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(in_toolbox.evaluate, invalid_ind)
        num_evals = 0
        for ind, fit in zip(invalid_ind, fitnesses):
            num_evals += 1
            ind.fitness.values = fit

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # track the best value and reset counter if there is a change
        hof.update(pop)
        curr_best_val = hof[0].fitness.values[0]
        num_no_improve += 1
        if curr_best_val != best_val:
            best_val = curr_best_val
            num_no_improve = 0

        # record stats
        record = in_stats.compile(pop)
        logbook.record(gen=generation, evals=num_evals, **record)

        # increment generation
        generation += 1

    best_x = list(hof[0])
    return best_val, best_x, logbook

Our balance_metric_tuple function does need to be updated. We need to take in the 3 additional parameters (do you have these down yet?):

  • conproc - a list of constrained processors

  • conmax - a list of the max times on each processor

  • penalty_multiplier - integer to control how "hard" the penalty is

# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign, times, k, conproc, conmax, penalty):
    # make the list a numpy array
    assign_np = np.array(assign)
    # call the balance_metric_constrained function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax, penalty)
    return (metric, )  # note that we're returning a tuple

# let's test this function
balance_metric_tuple(assign, times, k, [0], [10], 5)
(3058021.6666666665,)

The only other thing we'll need to change is how we set up our evaluate function. Most of this code is identical to what you've seen before. Note the one changed line (marked NEW).

import random
from deap import base
from deap import creator
from deap import tools
from functools import partial

creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox = base.Toolbox()
toolbox.register("assignments", create_individual, k, n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
###############################
# NEW - this line needs additional parameters
###############################
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=[0], conmax=[10], penalty=5)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=k-1, indpb=0.1)

stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# define search parameters
pop_size = 200
crossover_prob = 0.3
mutation_prob = 0.5
max_gen = 2000
max_no_improve = 200

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
Genetic Algorithm Best Result 1691209.6666666665
Total time on each processor: [289, 1687, 1686]

Increasing Problem Size

We used a very small problem to demonstrate each of the methods above. Now it's time to create a much larger problem and see how our algorithms perform.

We'll also set conproc and conmax to constrain 2 of our 10 processors.

####################################
# Setting up a new bigger problem
####################################
n = 1000
k = 10
# we're going to set some min/max times here for the jobs
min_time = 20
max_time = 200
# randomly generate some jobs
times = np.random.randint(low=min_time, high=max_time, size=n, dtype='int64')
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
conproc = [0,1]
conmax = [8000,8000]
Total time on each processor, if completely balanced: 11282.9

Let's create a pandas dataframe for storing our results. We'll update this after each algorithm and at the end we can sort and compare to see which algorithm performed the best.

# set up a dataframe for storing results
import pandas as pd

approaches = ['Baseline', 'HC-Local Search', 'HC-Custom Annealing', 'HC-Simanneal',
              'SC-Simanneal-5', 'SC-Simanneal-50', 'SC-Simanneal-500',
              'SC-GA-5', 'SC-GA-50', 'SC-GA-500',
              'SC-GA-5-FStart', 'SC-GA-50-FStart', 'SC-GA-500-FStart']
r_df = pd.DataFrame({'Score': [None]*len(approaches),
                     'Constrained Score': [None]*len(approaches),
                     'Time on Processor': [None]*len(approaches),
                     'Over Constraints': [None]*len(approaches)},
                    index=approaches)
pd.set_option('display.float_format', '{:.2f}'.format)
pd.set_option('max_colwidth', 200)

# create a function for updating the grid
def updateResults(df, score, c_score, time, approach, assign, conproc, conmax):
    over_max = True in [sum(times[assign==c]) > conmax[c] for c in conproc]
    df.loc[approach, 'Score'] = score
    df.loc[approach, 'Constrained Score'] = c_score
    df.loc[approach, 'Time on Processor'] = time
    df.loc[approach, 'Over Constraints'] = over_max
    return(df)

r_df

Baseline

Let's see what our baseline deviation from balanced loads is with a size this large. (Note, there's randomness here and some algorithms set their own baseline. But this should give us a general idea.)

We'll also create a new metric to get an apples-to-apples comparison of the scores, using the max constraints as the target for the constrained processors and an even split of the remaining work as the target for the unconstrained processors.

# constrained objective function = total squared deviation of times from target times,
# accounting for constraints
def get_c_metric(assign, times, k, conproc, conmax):
    # get the total time
    total = sum(times)
    # get the balanced target for the unconstrained processors
    balance_target = ((total-sum(conmax))/(k-len(conproc)))
    # get the unconstrained processors
    uncon = np.delete(np.array(range(k)), np.array(conproc))
    # sum the unconstrained processor deviation from the balance target
    dev_uncon = sum( (sum(times[assign==j])-balance_target)**2 for j in uncon )
    # sum the squared deviation from the max on the constrained processors
    # (note: conmax is indexed by processor id, so constrained processors must be numbered 0..len(conmax)-1)
    dev_cons = sum( (sum(times[assign==j])-conmax[j])**2 for j in conproc )
    return dev_uncon + dev_cons

# get the baseline
assign = np.random.randint(low=0, high=k, size=n)
time_on_proc = [ sum(times[assign==j]) for j in range(k)]
baseline = balance_metric(assign, times, k)
print('Baseline with random assignments:', baseline)
c_metric = get_c_metric(assign, times, k, conproc, conmax)
print('Baseline constrained metric', c_metric)
r_df = updateResults(r_df, baseline, c_metric, time_on_proc, 'Baseline', assign, conproc, conmax)
Baseline with random assignments: 5143156.899999999
Baseline constrained metric 39721121.875

Greedy local search (hard constraint)

The only parameter we can fiddle with in our greedy local search is how many iterations we're willing to go with no improvement. Try changing the 5000 to see if it gets better results; a small sweep sketch follows the parameter below.

  • max_no_improve = 5000
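
One way to experiment systematically is a small sweep (a sketch; the candidate values, the single run per value, and the sweep_ variable names are arbitrary choices of ours):

# sweep a few max_no_improve values and compare the best deviation found
for mni in [1000, 5000, 20000]:
    sweep_assign, sweep_f, sweep_iter, sweep_conv = load_balance_local_feasible(times, k, mni, conproc, conmax)
    print('max_no_improve =', mni, '| best deviation =', sweep_f, '| converged =', sweep_conv)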

#### Greedy Local Search ####
#####################
# Parameters
max_no_improve = 5000
#####################
best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, max_no_improve, conproc, conmax)
print('Greedy Local Search best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_f, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'HC-Local Search', best_assign, conproc, conmax)
Greedy Local Search best result: 27050792.900000002
The algorithm found a solution that met the criteria: True
Total time on each processor: [8000, 7987, 12109, 12113, 12093, 12107, 12100, 12100, 12110, 12110]
Constrained metric 517.625

Custom Simulated Annealing - Hard Constraint

For our custom simulated annealing, we can tweak the following parameters:

  • max_no_improve = 1000

  • temp = 500

  • alpha = .99

Try tweaking these parameters to see if you can get a better result.

#### Custom Simulated Annealing ####
#####################
# Parameters
max_no_improve = 1000
temp = 500
alpha = .99
#####################
best_x, best_f, iterations, trajectory, trajectory_best, converged = custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax)
print('Custom Simulated Annealing best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])
c_metric = get_c_metric(best_x, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_f, c_metric, [ sum(times[best_x==j]) for j in range(k)], 'HC-Custom Annealing', best_x, conproc, conmax)
Custom Simulated Annealing best result: 39844248.9
The algorithm found a solution that met the criteria: True
Total time on each processor: [7866, 7909, 12927, 13944, 11136, 11193, 12832, 12770, 10051, 12201]
Constrained metric 11054036.625

Simanneal Package - Hard Constraint

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

#### Simanneal Package ####
#####################
# Parameters
wait_time = .2
#####################
unconstrained = [x for x in list(range(k)) if x not in conproc]
assign = np.random.choice(unconstrained, size=n, replace=True)
ld = loadProblemHC(assign, times, k, conproc, conmax)
ld.set_schedule(ld.auto(minutes=wait_time))
ld.copy_strategy = "deepcopy"
best_assign, best_score = ld.anneal()
print('Simanneal Package best result', best_score)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'HC-Simanneal', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
    39.00000   27100358.90    28.90%     0.00%     0:00:28    -1:59:49
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
    39.00000   27001600.90    31.71%     0.00%     0:00:12     0:00:00
Simanneal Package best result 27001554.900000006
Total time on each processor: [7998, 7995, 12102, 12102, 12111, 12094, 12114, 12112, 12092, 12109]
Constrained metric 523.125

Simanneal Package - Soft Constraint - Penalty Multiplier 5

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 5)  # penalty 5
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-5', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     1.30000     215556.02     9.20%     0.00%     0:00:33    -1:59:45
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     1.30000     215568.74     9.41%     0.00%     0:00:12     0:00:00
Total time on each processor: [8656, 8649, 11937, 11954, 11935, 11948, 11937, 11944, 11933, 11936]
The best score is: 215556.01899999997
Constrained metric 1064797.125

Simanneal Package - Soft Constraint - Penalty Multiplier 50

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 50)  # penalty 50
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-50', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.73000     262891.15    10.90%     0.00%     0:00:38    -1:59:40
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.73000     262920.43    10.29%     0.29%     0:00:12     0:00:00
Total time on each processor: [8083, 8086, 12084, 12091, 12086, 12076, 12077, 12080, 12071, 12095]
The best score is: 262891.149
Constrained metric 18309.125

Simanneal Package - Soft Constraint - Penalty Multiplier 500

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 500)  # penalty 500
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-500', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.33000     269110.31     9.70%     0.00%     0:00:45    -1:59:33
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.33000     269258.09    10.00%     0.00%     0:00:13     0:00:00
Total time on each processor: [8000, 8007, 12112, 12094, 12100, 12114, 12104, 12102, 12099, 12097]
The best score is: 269110.30900000007
Constrained metric 400.625

DEAP Genetic Algorithm - Penalty 5

DEAP has a lot of parameters to tweak. Try tweaking some of the following to see if you can get a better result.

  • pop_size = 200

  • crossover_prob = 0.3

  • mutation_prob = 0.5

  • max_gen = 2000

  • max_no_improve = 200

(Note: we need to repeat a lot of code when we're changing the problem space with DEAP. DEAP hard-codes the k and n in our functions when we set it up, so we need to essentially start from scratch. We've included all the necessary code without comments in the cell below.)

Warning: This code will be slow to run.

#### DEAP Genetic Algorithm ####
####################
# Parameters
pop_size = 200
crossover_prob = 0.5
mutation_prob = 0.5
max_gen = 3000
max_no_improve = 500
#####################

###################################
# Leave everything below here alone
###################################
# how we create our individuals
def create_individual(k, n):
    current_x = np.random.randint(low=0, high=k, size=n)
    return current_x.tolist()  # this converts our np array back to a list

# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign, times, k, conproc, conmax, penalty):
    # make the list a numpy array
    assign_np = np.array(assign)
    # call the balance_metric_constrained function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax, penalty)
    return (metric, )  # note that we're returning a tuple

# delete references to classes to avoid warning
del creator.FitnessLoad
del creator.Individual
creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox = base.Toolbox()
toolbox.register("assignments", create_individual, k, n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=5)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=k-1, indpb=0.1)

stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-5', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 21557311.9
Total time on each processor: [8644, 8661, 11928, 11930, 11960, 11927, 11951, 11926, 11952, 11950]
Constrained metric 1065907.125

DEAP Genetic Algorithm - Penalty 50

To update just the penalty, we can unregister and re-register the evaluate function.

Warning: This code will be slow to run.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=50)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-50', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 26297626.900000006
Total time on each processor: [8090, 8089, 12083, 12103, 12064, 12056, 12077, 12072, 12101, 12094]
Constrained metric 22093.625

DEAP Genetic Algorithm - Penalty 500

To update just the penalty, we can unregister and re-register the evaluate function.

Warning: This code will be slow to run.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=500)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-500', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 27743348.9
Total time on each processor: [8041, 8017, 12058, 12371, 12264, 12115, 12172, 11906, 11726, 12159]
Constrained metric 290788.375

DEAP Genetic Algorithm - Feasible Start - Penalty 5

We found that the genetic algorithm wasn't performing as well as we thought it might, so we decided to try giving it a feasible start, even though we're using a soft constraint. Algorithms can only perform as well as we program them to perform. It's okay to try multiple options!

To create a feasible start, we needed a new version of our create_individual function, and we need to re-create our Individual class and re-register our assignments and evaluate functions.

# how we create our individuals - now only from the unconstrained processors
def create_individual(k, n, conproc):
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    return current_x.tolist()  # this converts our np array back to a list

del creator.Individual
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox.unregister("assignments")
toolbox.register("assignments", create_individual, k, n, conproc)
toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=5)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-5-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 21560589.9
Total time on each processor: [8652, 8671, 11930, 11941, 11943, 11976, 11970, 11903, 11923, 11920]
Constrained metric 1098475.625

DEAP Genetic Algorithm - Feasible Start - Penalty 50

Once again, to update just the penalty, we only need to unregister and re-register the evaluate function.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=50)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-50-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 26439400.900000006
Total time on each processor: [8089, 8098, 12182, 12282, 12032, 11989, 12080, 12222, 11883, 11972]
Constrained metric 154325.625

DEAP Genetic Algorithm - Feasible Start - Penalty 500

One last time...

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=500)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-500-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 27398408.900000006
Total time on each processor: [8019, 7995, 12013, 11713, 12084, 11836, 12283, 12388, 12230, 12268]
Constrained metric 389229.375

Final Results

The results below are sorted by the constrained score, which measures how close each run came both to meeting the constraints and to distributing the remaining work evenly across the unconstrained processors. Based on these results, which of the algorithms performed "best" for you?

r_df.sort_values(by=['Constrained Score'])
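
If you only want to compare the runs that actually stayed within the constraints, one option is to filter on the Over Constraints column before sorting (a small usage sketch):

# show only the runs that met the constraints, best constrained score first
r_df[r_df['Over Constraints'] == False].sort_values(by=['Constrained Score'])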