# imports
import numpy as np

Load Balancing With Constraints Example

Note: Make sure you've already read Part 1, in which we do this problem without constraints. We'll only be explaining the new bits here.

In this example we're going to show how you could use various approaches to solve a constrained load balancing problem.

For this problem, we're talking about execution times on computer processors, where the total execution time on certain processors is limited to a set amount. You might see this happen when one processor needs to "reserve" processing cycles for some other job not in our load balancing list.

We can describe it as:

given a list of n execution times, divide them to be executed on k processors so that the total execution time on each processor is as close to the same as possible, while the y constrained processors each stay under their execution time limit x.

There are two kinds of constraints we can implement with our metaheuristic algorithms:

  • hard constraints

  • soft constraints

We'll start with hard constraints.

Hard Constraint

A hard constraint is a constraint which rejects any solution that doesn't meet our specifications. We've seen these before in Pyomo when we build our lists of constraints or our bounds.
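
As a quick reminder, a hard constraint in Pyomo looks something like the generic sketch below (this model is purely illustrative and not tied to the load balancing problem):

import pyomo.environ as pyo

model = pyo.ConcreteModel()
# bounds are hard constraints: x can never leave [0, 10]
model.x = pyo.Var(bounds=(0, 10))
model.y = pyo.Var(within=pyo.NonNegativeReals)
# an explicit constraint: any solution with x + y > 10 is simply infeasible
model.limit = pyo.Constraint(expr=model.x + model.y <= 10)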

We can use hard constraints with some of our metaheuristic methods. We'll use hard constraints with greedy local search and simulated annealing.

There are two ways to implement hard constraints - directly in the loop of our hand-coded solutions, or in the move function. We'll look at both approaches with our greedy local search.
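
Schematically, the two patterns look like this (a minimal sketch; within_limits, move, and score are stand-ins for whatever feasibility test, move function, and objective your problem uses, not functions defined elsewhere in this notebook):

# hypothetical feasibility test shared by both patterns
def within_limits(assign, times, conproc, conmax):
    return all(sum(times[assign == c]) <= conmax[c] for c in conproc)

# Pattern 1: the move is unconstrained; the search loop rejects infeasible candidates
def step_with_loop_check(current, times, k, conproc, conmax, move, score):
    candidate = move(current, k)
    if score(candidate) < score(current) and within_limits(candidate, times, conproc, conmax):
        return candidate  # accept: better AND feasible
    return current

# Pattern 2: the move itself refuses to produce an infeasible state
def constrained_move(current, times, k, conproc, conmax, move):
    candidate = move(current, k)
    if within_limits(candidate, times, conproc, conmax):
        return candidate  # feasible moves pass through
    return current        # infeasible moves are undone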

Hard Constraint - Objective Function

For a hard constraint, our objective function remains identical. (If you need a refresher on what the objective function is doing, please see the Lesson_05_Load_Balancing notebook.)

# original objective function = total squared deviation of times from balanced times
def balance_metric(assign, times, k):
    target = sum(times)/k
    return sum( (sum(times[assign==j])-target)**2 for j in range(k) )

Hard Constraint - Enforced in the Loop

When we implement our hard constraint directly in our loop for greedy local search, the move function remains the same as it was without constraints.

# define a move function which changes one processor assignment randomly
def reassign_one(assign, k):
    n = len(assign)
    # choose a job and a new processor assignment
    which_job = np.random.randint(0,n,1)[0]
    which_proc = np.random.randint(0,k,1)[0]
    new_assign = assign.copy()
    new_assign[which_job] = which_proc
    return new_assign

Hard Constraint - Enforced in the Move Function

But, we can put the hard constraint directly in our move function, instead. We'll implement it by first completing the move, then checking to see if the new assignments meet our constraints. If they do not, we'll return the original assignments. If they do, we'll return the new assignments.

To do this, we'll need to pass in two additional parameters:

  • conproc - the list of constrained processors

  • conmax - the list of max total processing time allowed on each constrained processor

Let's look at the function first.

# define a move function which changes one processor assignment randomly,
# rejecting any move that violates the hard constraint
# note: this relies on the global `times` array, and conmax is indexed by
# processor id, so the constrained processors must be numbered 0..len(conmax)-1
def reassign_one_constrained(assign, k, conproc, conmax):
    # get the reassignment
    new_assign = reassign_one(assign, k)
    ###################
    # NEW - Evaluate if the new assignments meet our constraint
    over_max = True in [sum(times[new_assign==c]) > conmax[c] for c in conproc]
    # Only return a new assignment if it meets our constraints
    # uncomment this line to see total time on each processor
    #print('Total time on each processor (inside function):', [ sum(times[new_assign==j]) for j in range(k)])
    if over_max == False:
        #print('Not over max')  # uncomment this line to see if it passed
        return new_assign
    else:
        #print('Over max')  # uncomment this line to see if it failed
        return assign

Let's see what the constrained move function looks like with a simple problem. We'll create some sample data and run the reassign_one_constrained() function, constraining processor 0 to a total processing time of 10. You can uncomment the print statements in the function to see what's happening inside, or you can just compare what goes in with what comes out below. Note that if we start with more than 10 on the constrained processor (0), we'll still end up with more than 10 on it if our reassignment doesn't "fix" it.

k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign = np.array([0,0,0,1,1,1,2,2,2])
# total time on each processor ... should be the same
print('Total time on each processor (going in):', [ sum(times[assign==j]) for j in range(k)])
# reassign one, with processor 0 constrained to 10
new_assign = reassign_one_constrained(assign, k, [0], [10])
print('Processor 0 Constrained to 10:', new_assign)
print('Total time on each processor (coming out):', [ sum(times[new_assign==j]) for j in range(k)])
Total time on each processor (going in): [12, 12, 12]
Processor 0 Constrained to 10: [0 0 0 1 1 1 2 2 2]
Total time on each processor (coming out): [12, 12, 12]

Greedy Local Search - Hard Constraint In the Loop

Let's see what it looks like to use the original function in a greedy local search, and enforce the hard constraint right in the loop.

We'll also track whether the algorithm ever finds a solution that meets the constraints. It's possible with a hard constraint that we never find a solution that works.

# local search function
def load_balance_local_loop_constraint(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0, high=k, size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        # this is the same as before - using our original reassign_one function
        new_x = reassign_one(current_x, k)
        ##########################################################
        # NEW - Evaluate if the new assignments meet our constraint
        over_max = True in [sum(times[new_x==c]) > conmax[c] for c in conproc]
        ##################################
        new_f = balance_metric(new_x, times, k)
        ####################################
        # NEW - check both that the new value is less than the old and that we are within our constraints
        if new_f < current_f and over_max == False:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Greedy Local Search - Hard Constraint In the Move Function

Next, let's implement the hard constraint directly in our move function (reassign_one_constrained).

Again, to use that, we need our two additional parameters, so we'll update our load_balance_local function to take in 2 additional parameters:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

We'll again track whether the algorithm ever finds a solution that meets the constraints.

# local search function
def load_balance_local_function_constraint(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0, high=k, size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ##################################
        # NEW - pass the extra parameters to reassign_one_constrained
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ##################################
        new_f = balance_metric(new_x, times, k)
        if new_f < current_f:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Let's run each of these with a small number of processors and a small number of job execution times. First let's generate some random data and see what the time on each processor would be if the loads were completely balanced.

# generate random job times
np.random.seed(666)  # comment this out to play with new numbers
# we'll start with 30 execution times
n = 30
# and 3 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high=max_time, size=n)
assign = np.random.randint(low=0, high=k, size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
Total time on each processor, if completely balanced: 1220.6666666666667

Running local search with constraints

Let's start with setting processor 0 to be constrained to a max processing time of 1100. Run this code several times. How often do you get convergence?

#####################
# NEW: adding our 2 additional parameters to the function and one additional return variable
#####################
print('Using the Constraint Directly in the Loop')
best_assign, best_f, num_iter, converged = load_balance_local_loop_constraint(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
print('------------------------------------------')
print('Using the Constraint in the Function')
best_assign, best_f, num_iter, converged = load_balance_local_function_constraint(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
Using the Constraint Directly in the Loop
The algorithm found a solution that met the criteria: False
The best assignment is [2 0 0 2 0 0 1 1 1 1 1 1 0 0 0 2 1 2 1 0 0 1 2 0 0 2 1 1 2 2]
Total time on each processor: [1353, 1373, 936]
The deviation from balance is 121752.66666666666
It took 5000 iterations.
------------------------------------------
Using the Constraint in the Function
The algorithm found a solution that met the criteria: True
The best assignment is [2 1 2 0 0 1 1 2 1 1 2 0 2 2 0 1 2 0 0 0 1 1 1 0 0 2 2 1 1 0]
Total time on each processor: [1077, 1306, 1279]
The deviation from balance is 31324.666666666668
It took 5022 iterations.

What's happening?

This isn't working very well, is it? Why not? Well, let's say we start with 1400 on our constrained processor and our largest processing time is 200. 1400 - 200 = 1200, which is still larger than our constraint allows. In other words, there's no single move that can reach a feasible solution.
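
A quick sketch makes this concrete (the demo_ variables use hypothetical numbers, not the data above): put seven 200-unit jobs on processor 0 for a load of 1400, with a cap of 1100. No single-job move can reach feasibility:

# hypothetical worst case: 1400 on processor 0, constrained to 1100
demo_times = np.array([200] * 7)
demo_assign = np.zeros(7, dtype=int)  # everything on processor 0
demo_conmax0 = 1100

# the best any single-job move can do is remove one job's time from processor 0
best_after_one_move = sum(demo_times[demo_assign == 0]) - max(demo_times)
print(best_after_one_move)                  # 1200
print(best_after_one_move <= demo_conmax0)  # False - still infeasible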

How do we fix it?

There are a couple of approaches we could take. We've already seen one approach in a prior lesson: multi-start. We could wrap our load_balance_local_function_constraint or our load_balance_local_loop_constraint in a multi-start function and cross our fingers and hope that at least one of our starts randomly begins with a feasible solution. With some algorithms, that's our only choice to avoid getting stuck in a local optimum.
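
For reference, a multi-start wrapper could look something like this sketch (multistart_constrained is our name, not a library function; it reuses load_balance_local_function_constraint from above, and the default of 20 starts is arbitrary):

def multistart_constrained(times, k, max_no_improve, conproc, conmax, num_starts=20):
    """Run the constrained search from several random starts and keep the best result
    that ever accepted a feasible solution."""
    best_x, best_f = None, None
    for _ in range(num_starts):
        x, f, iters, converged = load_balance_local_function_constraint(
            times, k, max_no_improve, conproc, conmax)
        # only keep runs that found a feasible solution
        if converged and (best_f is None or f < best_f):
            best_x, best_f = x, f
    return best_x, best_f  # (None, None) if no start ever found a feasible solution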

But for our problem, we can use another approach. We can start from a random, but feasible solution. We can do this in 2 ways - we can put all of our processes on a single processor and let the algorithm "spread them out" or we can randomly distribute our processes across the unconstrained processors.

We'll actually write code that handles both, because with only 3 processors, if we constrain 2, we only have a single processor to start with. But, for larger problems, we'll randomly distribute across the unconstrained processors.

We'll need a new version of our load_balance_local function.

# local search function
def load_balance_local_feasible(times, k, max_no_improve, conproc, conmax):
    n = len(times)
    ###############################################
    # NEW - get a random sample of size n using choices from the unconstrained processors
    # this will assign all to one processor if only one is unconstrained
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    ###############################################
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ##################################
        # NEW - pass the extra parameters to reassign_one_constrained
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ##################################
        new_f = balance_metric(new_x, times, k)
        if new_f < current_f:
            #################################
            # NEW - track if we ever accept a solution
            converged = True
            #################################
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                best_x = current_x
                best_f = current_f
    return best_x, best_f, iterations, converged

Trying again

Now let's try our code and see if we get feasible solutions. Run this multiple times.

best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, 5000, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
The algorithm found a solution that met the criteria: True
The best assignment is [0 2 1 1 2 0 2 1 0 0 0 1 1 1 1 0 2 2 0 1 0 2 2 2 0 1 1 2 1 0]
Total time on each processor: [1093, 1297, 1272]
The deviation from balance is 24760.666666666664
It took 5292 iterations.

So much better! We consistently find a feasible solution.

What if we wanted to constrain 2 of our processors? Easy! We just add to our conproc and conmax lists. This time, let's constrain processor 0 to a max time of 1200 and processor 1 to a max time of 1100. Again, run this code multiple times and see how often the algorithm converges.

# adding our 2 constrained processors and their max times here
best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, 5000, [0,1], [1200,1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')
The algorithm found a solution that met the criteria: True
The best assignment is [2 2 1 2 0 1 2 1 2 1 2 1 1 2 0 1 0 0 0 0 2 2 1 0 2 2 1 0 1 0]
Total time on each processor: [1178, 1077, 1407]
The deviation from balance is 57180.666666666664
It took 5093 iterations.

Simulated Annealing - By Hand - Hard Constraints

We can take the same hard constraint approach with our hand-coded simulated annealing problem. Once again, we'll add 2 parameters to our custom_simanneal function:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

We'll start with a feasible solution (no processes on the constrained processor).

And once again we'll pass back a convergence variable to let us know if we ever found a solution that matched our constraints.

Personally, I think it's neater and cleaner to use the reassign_one_constrained function. So we'll use that one for our moves.

We'll use the same set of jobs from the previous example so you can compare.

def custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax):
    # get the length of our jobs
    n = len(times)
    # starts from a random assignment to the unconstrained processors
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    # this is just for tracking
    iterations = 1
    trajectory = [[iterations, current_f]]
    trajectory_best = [[iterations, best_f]]
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ###################################
        # NEW - add the 2 extra parameters
        new_x = reassign_one_constrained(current_x, k, conproc, conmax)
        ###################################
        new_f = balance_metric(new_x, times, k)
        # determine the change in score
        delta = new_f - current_f
        # determine the probability of accepting this solution
        prob = np.exp(min(delta, 0) / temp)
        # determine if we'll accept this solution
        accept = new_f < current_f or np.random.uniform() < prob
        if accept:
            current_x = new_x
            current_f = new_f
            if current_f < best_f:
                #################
                # New - track if we ever got a better solution than the first
                converged = True
                #################
                best_x = current_x
                best_f = current_f
                num_moves_no_improve = 0
        temp *= alpha
        iterations += 1
        trajectory.append([iterations, current_f])
        trajectory_best.append([iterations, best_f])
    return best_x, best_f, iterations, trajectory, trajectory_best, converged  #### NEW: Return extra variable

#######
# New - add the 2 extra parameters
best_x, best_f, iterations, trajectory, trajectory_best, converged = custom_simanneal(times, k, 1000, 500, .99, [0], [1100])
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_f)  # note: this prints best_f; print best_x to see the assignment itself
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])
print('The deviation from balance is', best_f)
The algorithm found a solution that met the criteria: True
The best assignment is 22858.666666666668
Total time on each processor: [1098, 1294, 1270]
The deviation from balance is 22858.666666666668

The simanneal Package - Hard Constraints - No External Functions

We can also use a hard constraint with the simanneal package. Let's see what it looks like if we don't use external functions. To do this, you add your code within the package's move and energy functions. To use a hard constraint in simanneal, you'd enforce the constraint in the move function, just like we did with our hand-coding.

We again need our two extra variables:

  • conproc - a list of the processors to constrain

  • conmax - a list of the max times on each processor.

But this time we'll pass them into the initialization function of the simanneal package.

To start with a feasible solution, we'll need to generate it outside the package and pass in the initial assignment.

Let's see what that looks like.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemHC(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemHC)
        super(loadProblemHC, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign_one function"""
        # pick one of the jobs and assign it to one of k processors
        #############################
        # NEW - We have to COPY the state
        assign = self.state.copy()
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc
        #################################################
        # NEW - hard constraint enforcement
        over_max = True in [sum(self.times[assign==c]) > self.conmax[c] for c in self.conproc]
        # Only update the state if it meets our requirements
        if over_max == False:
            # we only update the state if we've met our constraints
            self.state = assign

    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        target = sum(times)/k
        return sum( (sum(times[assign==j])-target)**2 for j in range(k) )

################################
# NEW - generate an initial feasible assignment
################################
unconstrained = [x for x in list(range(k)) if x not in [0]]
assign = np.random.choice(unconstrained, size=n, replace=True)

# initialize the class
ld = loadProblemHC(assign, times, k, [0], [1100])
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   730.00000      22620.67    55.30%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   730.00000      27402.67    56.20%     0.06%     0:00:14     0:00:00
The best set is: [0 2 2 1 2 2 0 2 1 1 1 2 1 0 0 1 1 1 2 1 0 1 2 2 0 0 2 1 0 0]
Total time on each processor: [1100, 1281, 1281]
The best score is: 21840.666666666668

The simanneal Package - Hard Constraints - External Functions

The simanneal package will also work with external functions. Remember, we've already written nice, self-contained functions that correspond to simanneal's functions.

  • move = reassign_one_constrained

  • energy = balance_metric

We can just plop those external functions into simanneal's functions and run our code.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemHCWithFunctions(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemHCWithFunctions)
        super(loadProblemHCWithFunctions, self).__init__(state)  # important!

    def move(self):
        """Can directly call our reassign_one_constrained function"""
        # since reassign_one_constrained copies the assignment internally, we can pass self.state directly
        self.state = reassign_one_constrained(self.state, self.k, self.conproc, self.conmax)

    def energy(self):
        """Can directly call our balance_metric function"""
        return balance_metric(self.state, self.times, self.k)

################################
# AGAIN - generate an initial feasible assignment
################################
unconstrained = [x for x in list(range(k)) if x not in [0]]
assign = np.random.choice(unconstrained, size=n, replace=True)

# initialize the class
ld2 = loadProblemHCWithFunctions(assign, times, k, [0], [1100])
ld2.set_schedule(ld2.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld2.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld2.anneal()
print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   490.00000      27220.67    55.95%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   490.00000      22082.67    55.09%     0.14%     0:00:13     0:00:00
The best set is: [0 0 1 2 1 1 1 2 1 0 0 0 1 2 0 1 2 0 1 1 2 2 1 1 0 2 2 0 2 2]
Total time on each processor: [1100, 1281, 1281]
The best score is: 21840.666666666668

Note: Simanneal evaluates the problem space before running. If your constraint is set too low, simanneal will print the first pink line, and then just hang. If you're playing with this and it gets stuck, you'll need to restart your kernel and loosen your constraints.
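
If you'd rather skip the auto exploration entirely, simanneal also lets you set the annealing schedule yourself by assigning the Annealer attributes directly; a minimal sketch (these particular values are untuned guesses, not recommendations for this problem):

# instead of ld.set_schedule(ld.auto(minutes=.2)), set an explicit schedule
ld = loadProblemHC(assign, times, k, [0], [1100])
ld.Tmax = 25000.0  # starting temperature
ld.Tmin = 2.5      # ending temperature
ld.steps = 50000   # total number of iterations
ld.copy_strategy = "deepcopy"
best_assign, best_score = ld.anneal()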

Soft Constraints

As we've seen, hard constraints can sometimes fail to produce a feasible solution at all, and the search can't even move closer to feasibility because it rejects every option that doesn't meet the constraint. Soft constraints fix that problem. We won't always get a solution that meets the constraint, but the algorithm will at least have a chance to get closer to an optimal solution. The Big M approach is a type of soft constraint.

In the metaheuristic algorithms we're exploring, soft constraints are implemented in the objective function. Instead of rejecting a solution outright, a penalty is incorporated. For a minimization problem, a positive number is added when the constraint isn't met. For a maximization problem, a negative number is added.

In our code, we're adding a multiplier to the penalty. The larger the multiplier, the "harder" the soft constraint. We'll pass the multiplier in as a parameter so we can adjust it on the fly.

Let's look at what this looks like with a hand-coded problem.

We'll keep our original objective function (balance_metric), but we'll add a new wrapper function (balance_metric_constrained). This one will take in 3 additional parameters:

  • a list of constrained processors

  • a list of the max times on each processor

  • a penalty multiplier (integer)

# constrained objective function = total squared deviation of times from balanced times,
# plus a penalty for constraint violations
def balance_metric_constrained(assign, times, k, conproc, conmax, penalty_multiplier):
    # get the unconstrained balance metric
    dev_uncon = balance_metric(assign, times, k)
    # sum the squared overage on the constrained processors and apply our penalty
    dev_penalty = penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c], 0)**2 for c in conproc )
    return (dev_uncon + dev_penalty)

Testing the Soft Constraint

We'll test our two functions with some hand-coded assignments. We'll use 9 jobs on 3 processors. First we'll look at them as an unconstrained, perfectly balanced problem.

# testing a perfectly balanced, unconstrained problem
k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign = np.array([0,0,0,1,1,1,2,2,2])
# total time on each processor ... should be the same
print('Total time on each processor:', [ sum(times[assign==j]) for j in range(k)])
# print the original balance metric
print('Unconstrained Balance Metric:', balance_metric(assign,times,k))
Total time on each processor: [12, 12, 12]
Unconstrained Balance Metric: 0.0

Now we'll add some constraints. Note that neither our times nor assignments are changing. But, we're essentially changing the target for some of our processors. We're going to set processor 0 to a max limit of 10 and a penalty of 5.

# total time on each processor has not changed
print('Total time on each processor (has not changed):', [ sum(times[assign==j]) for j in range(k)])
# constrain processor 0 to 10, with a penalty multiplier of 5
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10], 5))
Total time on each processor (has not changed): [12, 12, 12]
Constrained Balance Metric: 20.0

What happens if we increase the penalty to 10?

# total time on each processor has not changed
print('Total time on each processor (has not changed):', [ sum(times[assign==j]) for j in range(k)])
# constrain processor 0 to 10, with a penalty multiplier of 10
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10], 10))
Total time on each processor (has not changed): [12, 12, 12]
Constrained Balance Metric: 40.0

With the constraint in place, what was a completely balanced solution no longer looks so great. What would happen if we switch our assignments around some?

# new assignments
assign = np.array([1,0,0,1,1,1,2,2,2])
print('Total time on each processor (has changed):', [ sum(times[assign==j]) for j in range(k)])
# check the unconstrained balance metric
print('Balance Metric without constraints', balance_metric(assign,times,k))
# check the constrained balance metric
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10],5))
Total time on each processor (has changed): [10, 14, 12]
Balance Metric without constraints 8.0
Constrained Balance Metric: 8.0

Here, we've met our constraint, so our unconstrained and constrained balance metrics match.

The simanneal Package - Soft Constraint

With simanneal, instead of adding our hard constraint to the move() function, we'd add our soft constraint to the energy() function. We'll need to pass in the penalty multiplier at initialization.

# this line just imports the package
from simanneal import Annealer

# this is the line where we decide what we're calling this problem
class loadProblemSC(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    # NEW - add 3 extra parameters
    def __init__(self, state, times, k, conproc, conmax, penalty_multiplier):
    ###############################
        # this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New - set up 3 new variables
        self.conproc = conproc
        self.conmax = conmax
        self.penalty_multiplier = penalty_multiplier
        ###########################
        # this is how we initialize - note we're calling super with the same name as above (loadProblemSC)
        super(loadProblemSC, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign_one function"""
        # pick one of the jobs and assign it to one of k processors
        ##################################
        # NEW - back to just changing the state directly
        assign = self.state
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc

    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        conproc = self.conproc
        conmax = self.conmax
        ############################################
        # NEW - determine the energy and apply a penalty
        target = sum(times)/k
        # sum the total processor deviation
        dev_uncon = sum( (sum(times[assign==j])-target)**2 for j in range(k) )
        # sum the constrained processor deviation and apply penalty
        dev_penalty = self.penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c], 0)**2 for c in conproc )
        # the /100 just scales the energy down
        return (dev_uncon + dev_penalty)/100

Let's generate some more test data and run our soft constraint version of simanneal.

# generate random job times
np.random.seed(666)  # comment this out to play with new numbers
# we'll use 30 execution times again
n = 30
# and 3 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high=max_time, size=n)
assign = np.random.randint(low=0, high=k, size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
Total time on each processor, if completely balanced: 1220.6666666666667

With completely balanced loads, we'd have 1220-ish on each processor. We'll set processor 0's constraint to 1100. Run this code several times. Change up the penalty multiplier. How often do you meet the constraint? How balanced does the workload seem compared to our hard constraint version?

# initialize the class
ld = loadProblemSC(assign, times, k, [0], [1100], 5)  # penalty = 5
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
#print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     3.00000        181.21     9.20%     0.00%     0:00:02     0:00:00
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     3.00000        168.09     9.41%     0.00%     0:00:12     0:00:00
Total time on each processor: [1128, 1267, 1267]
The best score is: 168.00666666666663

Genetic Algorithm with DEAP - Soft Constraints

Again with DEAP we'll implement the soft constraint in our evaluation (fitness) function. This requires a few small but important changes. First, the things that stay the same.

  • Our create_individual() function

  • Our customGA() function

Neither of these changes at all, and we can just copy/paste the code from the load balancing without constraints example.

# No changes to this function
def create_individual(k, n):
    current_x = np.random.randint(low=0, high=k, size=n)
    return current_x.tolist()  # this converts our np array back to a list

# no changes here, call this to execute the genetic algorithm
def customGA(in_toolbox, in_tools, in_stats, pop_size, cx_prob, mut_prob, max_gen, max_no_improve):
    pop = in_toolbox.population(n=pop_size)
    logbook = in_tools.Logbook()
    hof = in_tools.HallOfFame(1)

    # Evaluate the entire population
    fitnesses = list(map(in_toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
    hof.update(pop)
    best_val = hof[0].fitness.values

    num_no_improve = 0
    generation = 0
    while num_no_improve < max_no_improve and generation < max_gen:
        # Select the next generation individuals
        selected = in_toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = list(map(in_toolbox.clone, selected))

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < cx_prob:
                in_toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values
        for mutant in offspring:
            if random.random() < mut_prob:
                in_toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(in_toolbox.evaluate, invalid_ind)
        num_evals = 0
        for ind, fit in zip(invalid_ind, fitnesses):
            num_evals += 1
            ind.fitness.values = fit

        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # track the best value and reset counter if there is a change
        hof.update(pop)
        curr_best_val = hof[0].fitness.values[0]
        num_no_improve += 1
        if curr_best_val != best_val:
            best_val = curr_best_val
            num_no_improve = 0

        # record stats
        record = in_stats.compile(pop)
        logbook.record(gen=generation, evals=num_evals, **record)

        # increment generation
        generation += 1

    best_x = list(hof[0])
    return best_val, best_x, logbook

Our balance_metric_tuple function does need to be updated. We need to take in the 3 additional parameters (do you have these down yet?):

  • conproc - a list of constrained processors

  • conmax - a list of the max times on each processor

  • penalty_multiplier - integer to control how "hard" the penalty is

# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign, times, k, conproc, conmax, penalty):
    # make the list a numpy array
    assign_np = np.array(assign)
    # call the balance_metric_constrained function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax, penalty)
    return (metric, )  # note that we're returning a tuple

# let's test this function
balance_metric_tuple(assign, times, k, [0], [10], 5)
(3058021.6666666665,)

The only other thing we'll need to change is how we set up our evaluate function. Most of this code is identical to what you've seen before. Note the one changed line (marked NEW).

import random
from deap import base
from deap import creator
from deap import tools
from functools import partial

creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox = base.Toolbox()
toolbox.register("assignments", create_individual, k, n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
###############################
# NEW - this line needs additional parameters
###############################
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=[0], conmax=[10], penalty=5)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=k-1, indpb=0.1)

stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# define search parameters
pop_size = 200
crossover_prob = 0.3
mutation_prob = 0.5
max_gen = 2000
max_no_improve = 200

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
Genetic Algorithm Best Result 1691209.6666666665
Total time on each processor: [289, 1687, 1686]

Increasing Problem Size

We used a very small problem to demonstrate each of the methods above. Now it's time to create a much larger problem and see how our algorithms perform.

We'll also set conproc and conmax to constrain 2 of our 10 processors.

####################################
# Setting up a new bigger problem
####################################
n = 1000
k = 10
# we're going to set some min/max times here for the jobs
min_time = 20
max_time = 200
# randomly generate some jobs
times = np.random.randint(low=min_time, high=max_time, size=n, dtype='int64')
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)
conproc = [0,1]
conmax = [8000,8000]
Total time on each processor, if completely balanced: 11282.9

Let's create a pandas dataframe for storing our results. We'll update this after each algorithm and at the end we can sort and compare to see which algorithm performed the best.

# set up a dataframe for storing results
import pandas as pd

approaches = ['Baseline', 'HC-Local Search', 'HC-Custom Annealing', 'HC-Simanneal',
              'SC-Simanneal-5', 'SC-Simanneal-50', 'SC-Simanneal-500',
              'SC-GA-5', 'SC-GA-50', 'SC-GA-500',
              'SC-GA-5-FStart', 'SC-GA-50-FStart', 'SC-GA-500-FStart']
r_df = pd.DataFrame({'Score': [None]*len(approaches),
                     'Constrained Score': [None]*len(approaches),
                     'Time on Processor': [None]*len(approaches),
                     'Over Constraints': [None]*len(approaches)},
                    index=approaches)
pd.set_option('display.float_format', '{:.2f}'.format)
pd.set_option('max_colwidth', 200)

# create a function for updating the grid
def updateResults(df, score, c_score, time, approach, assign, conproc, conmax):
    over_max = True in [sum(times[assign==c]) > conmax[c] for c in conproc]
    df.loc[approach, 'Score'] = score
    df.loc[approach, 'Constrained Score'] = c_score
    df.loc[approach, 'Time on Processor'] = time
    df.loc[approach, 'Over Constraints'] = over_max
    return(df)

r_df

Baseline

Let's see what our baseline deviation from balanced loads is with a size this large. (Note, there's randomness here and some algorithms set their own baseline. But this should give us a general idea.)

We'll also create a new metric to get an apples-to-apples comparison of the scores, using the max constraints as the target for the constrained processors and an even split of the remaining work as the target for the unconstrained processors.

# constrained objective function = total squared deviation of times from target times,
# accounting for constraints
def get_c_metric(assign, times, k, conproc, conmax):
    # get the total time
    total = sum(times)
    # get the balanced target for the unconstrained processors
    balance_target = ((total-sum(conmax))/(k-len(conproc)))
    # get the unconstrained processors
    uncon = np.delete(np.array(range(k)), np.array(conproc))
    # sum the unconstrained processor deviation from the balance target
    dev_uncon = sum( (sum(times[assign==j])-balance_target)**2 for j in uncon )
    # sum the squared deviation from the max on the constrained processors
    # (note: conmax is indexed by processor id, so constrained processors must be numbered 0..len(conmax)-1)
    dev_cons = sum( (sum(times[assign==j])-conmax[j])**2 for j in conproc )
    return dev_uncon + dev_cons

# get the baseline
assign = np.random.randint(low=0, high=k, size=n)
time_on_proc = [ sum(times[assign==j]) for j in range(k)]
baseline = balance_metric(assign, times, k)
print('Baseline with random assignments:', baseline)
c_metric = get_c_metric(assign, times, k, conproc, conmax)
print('Baseline constrained metric', c_metric)
r_df = updateResults(r_df, baseline, c_metric, time_on_proc, 'Baseline', assign, conproc, conmax)
Baseline with random assignments: 5143156.899999999
Baseline constrained metric 39721121.875

Greedy local search (hard constraint)

The only parameter we can fiddle with in our greedy local search is how many iterations we're willing to go with no improvement. Try changing the 5000 to see if it gets better results; a small sweep sketch follows the parameter below.

  • max_no_improve = 5000
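
One way to experiment systematically is a small sweep (a sketch; the candidate values, the single run per value, and the sweep_ variable names are arbitrary choices of ours):

# sweep a few max_no_improve values and compare the best deviation found
for mni in [1000, 5000, 20000]:
    sweep_assign, sweep_f, sweep_iter, sweep_conv = load_balance_local_feasible(times, k, mni, conproc, conmax)
    print('max_no_improve =', mni, '| best deviation =', sweep_f, '| converged =', sweep_conv)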

#### Greedy Local Search ####
#####################
# Parameters
max_no_improve = 5000
#####################
best_assign, best_f, num_iter, converged = load_balance_local_feasible(times, k, max_no_improve, conproc, conmax)
print('Greedy Local Search best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_f, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'HC-Local Search', best_assign, conproc, conmax)
Greedy Local Search best result: 27050792.900000002
The algorithm found a solution that met the criteria: True
Total time on each processor: [8000, 7987, 12109, 12113, 12093, 12107, 12100, 12100, 12110, 12110]
Constrained metric 517.625

Custom Simulated Annealing - Hard Constraint

For our custom simulated annealing, we can tweak the following parameters:

  • max_no_improve = 1000

  • temp = 500

  • alpha = .99

Try tweaking these parameters to see if you can get a better result.

#### Custom Simulated Annealing ####
#####################
# Parameters
max_no_improve = 1000
temp = 500
alpha = .99
#####################
best_x, best_f, iterations, trajectory, trajectory_best, converged = custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax)
print('Custom Simulated Annealing best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])
c_metric = get_c_metric(best_x, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_f, c_metric, [ sum(times[best_x==j]) for j in range(k)], 'HC-Custom Annealing', best_x, conproc, conmax)
Custom Simulated Annealing best result: 39844248.9
The algorithm found a solution that met the criteria: True
Total time on each processor: [7866, 7909, 12927, 13944, 11136, 11193, 12832, 12770, 10051, 12201]
Constrained metric 11054036.625

Simanneal Package - Hard Constraint

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

#### Simanneal Package ####
#####################
# Parameters
wait_time = .2
#####################
unconstrained = [x for x in list(range(k)) if x not in conproc]
assign = np.random.choice(unconstrained, size=n, replace=True)
ld = loadProblemHC(assign, times, k, conproc, conmax)
ld.set_schedule(ld.auto(minutes=wait_time))
ld.copy_strategy = "deepcopy"
best_assign, best_score = ld.anneal()
print('Simanneal Package best result', best_score)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'HC-Simanneal', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
    39.00000   27100358.90    28.90%     0.00%     0:00:28    -1:59:49
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
    39.00000   27001600.90    31.71%     0.00%     0:00:12     0:00:00
Simanneal Package best result 27001554.900000006
Total time on each processor: [7998, 7995, 12102, 12102, 12111, 12094, 12114, 12112, 12092, 12109]
Constrained metric 523.125

Simanneal Package - Soft Constraint - Penalty Multiplier 5

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 5)  # penalty 5
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-5', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     1.30000     215556.02     9.20%     0.00%     0:00:33    -1:59:45
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     1.30000     215568.74     9.41%     0.00%     0:00:12     0:00:00
Total time on each processor: [8656, 8649, 11937, 11954, 11935, 11948, 11937, 11944, 11933, 11936]
The best score is: 215556.01899999997
Constrained metric 1064797.125

Simanneal Package - Soft Constraint - Penalty Multiplier 50

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 50)  # penalty 50
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-50', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.73000     262891.15    10.90%     0.00%     0:00:38    -1:59:40
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.73000     262920.43    10.29%     0.29%     0:00:12     0:00:00
Total time on each processor: [8083, 8086, 12084, 12091, 12086, 12076, 12077, 12080, 12071, 12095]
The best score is: 262891.149
Constrained metric 18309.125

Simanneal Package - Soft Constraint - Penalty Multiplier 500

The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.

  • wait_time = .2

# return to random assignments
assign = np.random.randint(low=0, high=k, size=n)
# initialize the class
ld = loadProblemSC(assign, times, k, conproc, conmax, 500)  # penalty 500
ld.set_schedule(ld.auto(minutes=.2))  # set approximate time to find results
# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy"
# this is what kicks it off
best_assign, best_score = ld.anneal()
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score)
c_metric = get_c_metric(best_assign, times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_score, c_metric, [ sum(times[best_assign==j]) for j in range(k)], 'SC-Simanneal-500', best_assign, conproc, conmax)
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.33000     269110.31     9.70%     0.00%     0:00:45    -1:59:33
 Temperature        Energy    Accept   Improve     Elapsed   Remaining
     0.33000     269258.09    10.00%     0.00%     0:00:13     0:00:00
Total time on each processor: [8000, 8007, 12112, 12094, 12100, 12114, 12104, 12102, 12099, 12097]
The best score is: 269110.30900000007
Constrained metric 400.625

DEAP Genetic Algorithm - Penalty 5

DEAP has a lot of parameters to tweak. Try tweaking some of the following to see if you can get a better result.

  • pop_size = 200

  • crossover_prob = 0.3

  • mutation_prob = 0.5

  • max_gen = 2000

  • max_no_improve = 200

(Note: we need to repeat a lot of code when we're changing the problem space with DEAP. DEAP hard-codes the k and n in our functions when we set it up, so we need to essentially start from scratch. We've included all the necessary code without comments in the cell below.)

Warning: This code will be slow to run.

#### DEAP Genetic Algorithm ####
####################
# Parameters
pop_size = 200
crossover_prob = 0.5
mutation_prob = 0.5
max_gen = 3000
max_no_improve = 500
#####################

###################################
# Leave everything below here alone
###################################
# how we create our individuals
def create_individual(k, n):
    current_x = np.random.randint(low=0, high=k, size=n)
    return current_x.tolist()  # this converts our np array back to a list

# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign, times, k, conproc, conmax, penalty):
    # make the list a numpy array
    assign_np = np.array(assign)
    # call the balance_metric_constrained function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax, penalty)
    return (metric, )  # note that we're returning a tuple

# delete references to classes to avoid warning
del creator.FitnessLoad
del creator.Individual
creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox = base.Toolbox()
toolbox.register("assignments", create_individual, k, n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=5)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=k-1, indpb=0.1)

stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-5', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 21557311.9
Total time on each processor: [8644, 8661, 11928, 11930, 11960, 11927, 11951, 11926, 11952, 11950]
Constrained metric 1065907.125

DEAP Genetic Algorithm - Penalty 50

To update just the penalty, we can unregister and re-register the evaluate function.

Warning: This code will be slow to run.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=50)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-50', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 26297626.900000006
Total time on each processor: [8090, 8089, 12083, 12103, 12064, 12056, 12077, 12072, 12101, 12094]
Constrained metric 22093.625

DEAP Genetic Algorithm - Penalty 500

To update just the penalty, we can unregister and re-register the evaluate function.

Warning: This code will be slow to run.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=500)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-500', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 27743348.9
Total time on each processor: [8041, 8017, 12058, 12371, 12264, 12115, 12172, 11906, 11726, 12159]
Constrained metric 290788.375

DEAP Genetic Algorithm - Feasible Start - Penalty 5

We found that the genetic algorithm wasn't performing as well as we thought it might, so we decided to try giving it a feasible start, even though we're using a soft constraint. Algorithms can only perform as well as we program them to perform. It's okay to try multiple options!

To create a feasible start, we needed a new version of our create_individual function, and we need to re-create our Individual class and re-register our assignments and evaluate functions.

# how we create our individuals - now only from the unconstrained processors
def create_individual(k, n, conproc):
    unconstrained = [x for x in list(range(k)) if x not in conproc]
    current_x = np.random.choice(unconstrained, size=n, replace=True)
    return current_x.tolist()  # this converts our np array back to a list

del creator.Individual
creator.create("Individual", list, fitness=creator.FitnessLoad)

toolbox.unregister("assignments")
toolbox.register("assignments", create_individual, k, n, conproc)
toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=5)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-5-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 21560589.9
Total time on each processor: [8652, 8671, 11930, 11941, 11943, 11976, 11970, 11903, 11923, 11920]
Constrained metric 1098475.625

DEAP Genetic Algorithm - Feasible Start - Penalty 50

Once again, to update just the penalty, we only need to unregister and re-register the evaluate function.

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=50)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-50-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 26439400.900000006
Total time on each processor: [8089, 8098, 12182, 12282, 12032, 11989, 12080, 12222, 11883, 11972]
Constrained metric 154325.625

DEAP Genetic Algorithm - Feasible Start - Penalty 500

One last time...

toolbox.unregister("evaluate")
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc, conmax=conmax, penalty=500)

# get solution
best_balance, best_assign, log = customGA(toolbox, tools, stats, pop_size, crossover_prob, mutation_prob, max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])
c_metric = get_c_metric(np.array(best_assign), times, k, conproc, conmax)
print('Constrained metric', c_metric)
r_df = updateResults(r_df, best_balance, c_metric, [ sum(times[np.array(best_assign)==j]) for j in range(k)], 'SC-GA-500-FStart', np.array(best_assign), conproc, conmax)
Genetic Algorithm Best Result 27398408.900000006
Total time on each processor: [8019, 7995, 12013, 11713, 12084, 11836, 12283, 12388, 12230, 12268]
Constrained metric 389229.375

Final Results

The results below are sorted by the constrained score, which measures how close each run came both to meeting the constraints and to distributing the remaining work evenly across the unconstrained processors. Based on these results, which of the algorithms performed "best" for you?

r_df.sort_values(by=['Constrained Score'])
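
If you only want to compare the runs that actually stayed within the constraints, one option is to filter on the Over Constraints column before sorting (a small usage sketch):

# show only the runs that met the constraints, best constrained score first
r_df[r_df['Over Constraints'] == False].sort_values(by=['Constrained Score'])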