Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
ethen8181
GitHub Repository: ethen8181/machine-learning
Path: blob/master/operation_research/ortools.ipynb
1470 views
Kernel: Python 3 (ipykernel)
# code for loading the format for the notebook import os # path : store the current path to convert back to it later path = os.getcwd() os.chdir(os.path.join('..', 'notebook_format')) from formats import load_style load_style(plot_style=False)
os.chdir(path) # 1. magic to print version # 2. magic so that the notebook will reload external python modules %load_ext watermark %load_ext autoreload %autoreload 2 from collections import namedtuple from ortools.linear_solver import pywraplp %watermark -a 'Ethen' -d -t -v -p ortools
Author: Ethen Python implementation: CPython Python version : 3.7.11 IPython version : 7.27.0 ortools: 9.0.9048

Operation Research Quick Intro Via Ortools

The way to think about operation research, or optimization problem is we want to maximizie/minimize some objective, while being subjected to certain constraints.

For example, say we are deciding whether to buy ice cream or boba tea for dessert. Each type of food has an associated value, and cost, while we have a certain budget that we don't wish to exceed.

ParseError: KaTeX parse error: Expected 'EOF', got '_' at position 60: …lue}_{\text{ice_̲cream}} \cdot \…

Say we are able to replace the value, cost, and budget part with actual numbers (in practice, assigning actual numbers to each of these coefficients is often times core pieces of the work).

ParseError: KaTeX parse error: Expected 'EOF', got '_' at position 54: …\cdot \text{ice_̲cream} + 2 \cdo…

Given this toy problem, we can eye ball the solution, and see that we should use our limited budget to buy a boba tea for dessert. Operation research, a.k.a optimization techniques helps us algorithmically find solutions for these types of problems at a much larger scale.

The following section, uses ortools library to solve this problem programmatically.

budget = 1 DessertInfo = namedtuple('DessertInfo', ['name', 'value', 'cost']) dessert_infos = [ DessertInfo('ice_cream', 3, 2), DessertInfo('boba', 2, 1), ] num_desserts = len(dessert_infos) dessert_infos
[DessertInfo(name='ice_cream', value=3, cost=2), DessertInfo(name='boba', value=2, cost=1)]
# creates solver solver = pywraplp.Solver.CreateSolver('GLOP') # creates variables variables = [solver.NumVar(0, solver.infinity(), dessert_info.name) for dessert_info in dessert_infos] # define constraints constraint_coefficients = [dessert_info.cost for dessert_info in dessert_infos] constraint = [constraint_coefficients[i] * variables[i] for i in range(num_desserts)] solver.Add(solver.Sum(constraint) <= budget) # define objective objective_coefficients = [dessert_info.value for dessert_info in dessert_infos] objective = constraint = [objective_coefficients[i] * variables[i] for i in range(num_desserts)] solver.Maximize(solver.Sum(objective)) # solve status = solver.Solve() # extract optimal/feasible value if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE: optimal_value = solver.Objective().Value() print(f'Optimal Value: = {optimal_value}') for i in range(num_desserts): print(variables[i].name(), variables[i].solution_value())
Optimal Value: = 2.0 ice_cream 0.0 boba 1.0

A couple of important things to note:

  • We are solving a Linear Programming problem, where we are computing the best solution to a given problem modeled as a series of linear relationships.

  • In this article, we won't be diving into the algorithms/solvers that are the workhorse behind the scenes that's finding the solution for us, and focus on how to frame the optimization problem.

  • We didn't explicitly specify this in our optimization formula, but notice the definition of NumVar specifies that our variables can take on numeric solutions. Often times, our problem might require some of the variables to be integers, these are called Mixed Integer Programming. e.g. In our example, we probably can't buy 1.5 portion of boba. In these cases, we can specify our variables to be IntVar.

  • There're other open sourced frameworks other than ortools out there, feel free to pick and choose based on preferences or speed. The exact API might be different, but the main idea revolves around defining the objective, defining the variables, adding the constraints, solving it and extracting the optimal/feasible solution.

Assignment Problem

Continuing with our discussions around Mixed Integer Programming, a closely related problem is the assignment problem, where our variables involves boolean decisions of 0 and 1 values.

We'll use the examples from this blog post, Blog: Towards optimal personalization: synthesisizing machine learning and operations research.

Say we are working in the marketing team, and we have different types of churn prevention channel, each having different prices, while different users/customers' retention rate is different for each channel. Our constraint is not spending above our monthly marketing budget, and the goal is to maxmize the total number of retained customers.

maximizeu,cRu,cAu,csubject tou,cPu,cAu,cBcAu,c=1,uUau,c{0,1}\begin{align} \text{maximize} & \sum_{u, c} R_{u, c} A_{u, c} \nonumber \\ \text{subject to} & \sum_{u, c} P_{u, c} A_{u, c} \leq B \\ & \sum_{c} A_{u, c} = 1, \forall u \in U \\ & a_{u, c} \in \{0, 1\} \end{align}

Where:

  • UU: is the set of all users.

  • CC: is the set of all channels.

  • Ru,cR_{u, c}: is the rentention probability if we were to notify the user, uu, using the channel cc.

  • Au,cA_{u, c}: is the assignment boolean decision variable, i.e. it takes on the value of 1 if we decided to reach out to user uu with channel cc, 0 otherwise.

  • Pu,cP_{u, c}: is the price/cost if we were to notify the user, uu, using the channel cc.

  • We have a constraint saying each customer can only receive the retention message via one channel, to prevent bombarding them.

  • As well as a constraint saying our cost shouldn't exceed our monthly budget BB.

Let's say we have 4 channels: email (0.25), push notification (0.3), text message (0.85), and phone call (5.0). Number in parenthesis indicates the cost/price. As for the retention probability, we will be using some randomly generated numbers, but imagine in real world scenarios where this can come from aggregated historical information, or even generated by some machine learning models.

budget = 1000 price = [25, 30, 85, 250] # rentention probability for each customer and channel pair retention_prob = [ [0.02, 0.27, 0.17, 0.87], [0.14, 0.21, 0.28, 0.014], [0.13, 0.003, 0.016, 0.64], [0.14, 0.04, 0.14, 0.26], [0.04, 0.24, 0.11, 0.31], ] num_users = len(retention_prob) num_channels = len(retention_prob[0])
# creates the solver for the mixed integer programming solver = pywraplp.Solver.CreateSolver('SCIP') # variable: assignment problem, creating a dictionary of binary variables variables = {} for i in range(num_users): for j in range(num_channels): variables[i, j] = solver.IntVar(0, 1, f'prob{i}_{j}')
# constraint: each user is assigned to at most 1 channel. for i in range(num_users): solver.Add(solver.Sum([variables[i, j] for j in range(num_channels)]) <= 1) # constraint: total cost should not exceed budget constraints = [] for j in range(num_channels): for i in range(num_users): constraint = price[j] * variables[i, j] constraints.append(constraint) solver.Add(solver.Sum(constraints) <= budget)
<ortools.linear_solver.pywraplp.Constraint; proxy of <Swig Object of type 'operations_research::MPConstraint *' at 0x7fa6e04b9f60> >
# objective objective_terms = [] for i in range(num_users): for j in range(num_channels): objective_terms.append(retention_prob[i][j] * variables[i, j]) solver.Maximize(solver.Sum(objective_terms))
# invokes the solver status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE: optimal_value = solver.Objective().Value() print(f'Optimal Value: = {optimal_value}') for i in range(num_users): for j in range(num_channels): # check indicator variable's value, with tolerance for floating point arithmetic if variables[i, j].solution_value() > 0.5: print(f'User {i} assigned to Channel {j}, Cost = {price[j]}')
Optimal Value: = 2.29 User 0 assigned to Channel 3, Cost = 250 User 1 assigned to Channel 2, Cost = 85 User 2 assigned to Channel 3, Cost = 250 User 3 assigned to Channel 3, Cost = 250 User 4 assigned to Channel 1, Cost = 30

In this article, we took a sneak peak into some problems that can benefit from leveraging optimization. The problems that we deal with in real world settings can be a lot more complicated than the examples seen here, but hopefully, this gives you the idea that whenever we see a problem that involves maximizing some objectives given some constraint, we have a tool at hand that we can turn to.

Reference