GitHub Repository: ethen8181/machine-learning
Path: blob/master/operation_research/ortools.ipynb
¹⁴⁷⁰ views

Kernel: Python 3 (ipykernel)

In [1]:

# code for loading the format for the notebook
import os

# path : store the current path to convert back to it later
path = os.getcwd()
os.chdir(os.path.join('..', 'notebook_format'))

from formats import load_style
load_style(plot_style=False)

Out[1]:

In [2]:

os.chdir(path)

# 1. magic to print version
# 2. magic so that the notebook will reload external python modules
%load_ext watermark
%load_ext autoreload
%autoreload 2

from collections import namedtuple
from ortools.linear_solver import pywraplp

%watermark -a 'Ethen' -d -t -v -p ortools

Out[2]:

Author: Ethen

Python implementation: CPython
Python version       : 3.7.11
IPython version      : 7.27.0

ortools: 9.0.9048

Operation Research Quick Intro Via Ortools

The way to think about operation research, or optimization problem is we want to maximizie/minimize some objective, while being subjected to certain constraints.

For example, say we are deciding whether to buy ice cream or boba tea for dessert. Each type of food has an associated value, and cost, while we have a certain budget that we don't wish to exceed.

ParseError: KaTeX parse error: Expected 'EOF', got '_' at position 60: …lue}_{\text{ice_̲cream}} \cdot \…

Say we are able to replace the value, cost, and budget part with actual numbers (in practice, assigning actual numbers to each of these coefficients is often times core pieces of the work).

ParseError: KaTeX parse error: Expected 'EOF', got '_' at position 54: …\cdot \text{ice_̲cream} + 2 \cdo…

Given this toy problem, we can eye ball the solution, and see that we should use our limited budget to buy a boba tea for dessert. Operation research, a.k.a optimization techniques helps us algorithmically find solutions for these types of problems at a much larger scale.

The following section, uses ortools library to solve this problem programmatically.

In [3]:

budget = 1
DessertInfo = namedtuple('DessertInfo', ['name', 'value', 'cost'])
dessert_infos = [
    DessertInfo('ice_cream', 3, 2),
    DessertInfo('boba', 2, 1),
]
num_desserts = len(dessert_infos)
dessert_infos

Out[3]:

[DessertInfo(name='ice_cream', value=3, cost=2),
 DessertInfo(name='boba', value=2, cost=1)]

In [4]:

# creates solver
solver = pywraplp.Solver.CreateSolver('GLOP')

# creates variables
variables = [solver.NumVar(0, solver.infinity(), dessert_info.name) for dessert_info in dessert_infos]

# define constraints
constraint_coefficients = [dessert_info.cost for dessert_info in dessert_infos]
constraint = [constraint_coefficients[i] * variables[i] for i in range(num_desserts)]
solver.Add(solver.Sum(constraint) <= budget)

# define objective
objective_coefficients = [dessert_info.value for dessert_info in dessert_infos]
objective = constraint = [objective_coefficients[i] * variables[i] for i in range(num_desserts)]
solver.Maximize(solver.Sum(objective))

# solve
status = solver.Solve()

# extract optimal/feasible value
if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:
    optimal_value = solver.Objective().Value()
    print(f'Optimal Value: = {optimal_value}')
    for i in range(num_desserts):
        print(variables[i].name(), variables[i].solution_value())

Out[4]:

Optimal Value: = 2.0
ice_cream 0.0
boba 1.0

A couple of important things to note:

We are solving a Linear Programming problem, where we are computing the best solution to a given problem modeled as a series of linear relationships.
In this article, we won't be diving into the algorithms/solvers that are the workhorse behind the scenes that's finding the solution for us, and focus on how to frame the optimization problem.
We didn't explicitly specify this in our optimization formula, but notice the definition of NumVar specifies that our variables can take on numeric solutions. Often times, our problem might require some of the variables to be integers, these are called Mixed Integer Programming. e.g. In our example, we probably can't buy 1.5 portion of boba. In these cases, we can specify our variables to be IntVar.
There're other open sourced frameworks other than ortools out there, feel free to pick and choose based on preferences or speed. The exact API might be different, but the main idea revolves around defining the objective, defining the variables, adding the constraints, solving it and extracting the optimal/feasible solution.

Assignment Problem

Continuing with our discussions around Mixed Integer Programming, a closely related problem is the assignment problem, where our variables involves boolean decisions of 0 and 1 values.

We'll use the examples from this blog post, Blog: Towards optimal personalization: synthesisizing machine learning and operations research.

Say we are working in the marketing team, and we have different types of churn prevention channel, each having different prices, while different users/customers' retention rate is different for each channel. Our constraint is not spending above our monthly marketing budget, and the goal is to maxmize the total number of retained customers.

\begin{align} \text{maximize} & \sum_{u, c} R_{u, c} A_{u, c} \nonumber \\ \text{subject to} & \sum_{u, c} P_{u, c} A_{u, c} \leq B \\ & \sum_{c} A_{u, c} = 1, \forall u \in U \\ & a_{u, c} \in \{0, 1\} \end{align}

Where:

$U$ : is the set of all users.
$C$ : is the set of all channels.
$R_{u, c}$ : is the rentention probability if we were to notify the user, $u$ , using the channel $c$ .
$A_{u, c}$ : is the assignment boolean decision variable, i.e. it takes on the value of 1 if we decided to reach out to user $u$ with channel $c$ , 0 otherwise.
$P_{u, c}$ : is the price/cost if we were to notify the user, $u$ , using the channel $c$ .
We have a constraint saying each customer can only receive the retention message via one channel, to prevent bombarding them.
As well as a constraint saying our cost shouldn't exceed our monthly budget $B$ .

Let's say we have 4 channels: email (0.25), push notification (0.3), text message (0.85), and phone call (5.0). Number in parenthesis indicates the cost/price. As for the retention probability, we will be using some randomly generated numbers, but imagine in real world scenarios where this can come from aggregated historical information, or even generated by some machine learning models.

In [5]:

budget = 1000
price = [25, 30, 85, 250]

# rentention probability for each customer and channel pair
retention_prob = [
    [0.02, 0.27, 0.17, 0.87],
    [0.14, 0.21, 0.28, 0.014],
    [0.13, 0.003, 0.016, 0.64],
    [0.14, 0.04, 0.14, 0.26],
    [0.04, 0.24, 0.11, 0.31],
]
num_users = len(retention_prob)
num_channels = len(retention_prob[0])

In [6]:

# creates the solver for the mixed integer programming
solver = pywraplp.Solver.CreateSolver('SCIP')

# variable: assignment problem, creating a dictionary of binary variables
variables = {}
for i in range(num_users):
    for j in range(num_channels):
        variables[i, j] = solver.IntVar(0, 1, f'prob{i}_{j}')

In [7]:

# constraint: each user is assigned to at most 1 channel.
for i in range(num_users):
    solver.Add(solver.Sum([variables[i, j] for j in range(num_channels)]) <= 1)

# constraint: total cost should not exceed budget
constraints = []
for j in range(num_channels):
    for i in range(num_users):
        constraint = price[j] * variables[i, j]
        constraints.append(constraint)

solver.Add(solver.Sum(constraints) <= budget)

Out[7]:

<ortools.linear_solver.pywraplp.Constraint; proxy of <Swig Object of type 'operations_research::MPConstraint *' at 0x7fa6e04b9f60> >

In [8]:

# objective
objective_terms = []
for i in range(num_users):
    for j in range(num_channels):
        objective_terms.append(retention_prob[i][j] * variables[i, j])

solver.Maximize(solver.Sum(objective_terms))

In [9]:

# invokes the solver
status = solver.Solve()

In [10]:

if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:
    optimal_value = solver.Objective().Value()
    print(f'Optimal Value: = {optimal_value}')
    for i in range(num_users):
        for j in range(num_channels):
            # check indicator variable's value, with tolerance for floating point arithmetic
            if variables[i, j].solution_value() > 0.5:
                print(f'User {i} assigned to Channel {j}, Cost = {price[j]}')

Out[10]:

Optimal Value: = 2.29
User 0 assigned to Channel 3, Cost = 250
User 1 assigned to Channel 2, Cost = 85
User 2 assigned to Channel 3, Cost = 250
User 3 assigned to Channel 3, Cost = 250
User 4 assigned to Channel 1, Cost = 30

In this article, we took a sneak peak into some problems that can benefit from leveraging optimization. The problems that we deal with in real world settings can be a lot more complicated than the examples seen here, but hopefully, this gives you the idea that whenever we see a problem that involves maximizing some objectives given some constraint, we have a tool at hand that we can turn to.

Table of Contents

Operation Research Quick Intro Via Ortools

Assignment Problem

Reference

Product

Resources

Company