Preventing Customer Churn by Optimizing Incentive Programs


Table of contents

  1. Background
  2. Setup
  3. Data Exploration
  4. Model Training
  5. Model Hosting
  6. Assess and optimize
  7. Summary

Background

_This notebook has been adapted from Amazon SageMaker Examples - Customer Churn. That notebook was, in turn, adapted from an earlier AWS blog post and notebook._

This notebook also builds on the AWS blog Gain customer insights using Amazon Aurora machine learning, which focused on integrating churn information into the customer service response process, offering selected customers an incentive. For ease of experimentation, this notebook has been built to run stand-alone, but the methods developed here are intended for integration into the environment of the prior blog.

In this version of the notebook, in addition to building the predictive model we explore the key question: How do we create an optimal incentive program that we (as the provider) think is most likely to reduce churn, with the minimum cost to us?

Losing customers is costly for any business. Identifying unhappy customers early on gives the business a chance to offer them incentives to stay. This notebook describes using machine learning (ML) for the automated identification of unhappy customers, also known as customer churn prediction. ML models rarely give perfect predictions though, so this notebook is also about how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.

We use an example of churn that is familiar to all of us – leaving a mobile phone operator. Seems like I can always find fault with my provider du jour! And if my provider knows that I’m thinking of leaving, it can offer timely incentives – I can always use a phone upgrade or perhaps have a new feature activated – and I might just stick around. Incentives are often much more cost effective than losing and reacquiring a customer.

The first sections of the notebook are identical to the source XGBoost notebook. Substantial differences begin at the heading, "Assess and optimize".


Setup

This notebook was created and tested on an ml.m4.xlarge notebook instance.

Import the libraries:

In [1]:
import os
import sys
import boto3
import re
import sagemaker

# To get the container for training
from sagemaker.amazon.amazon_estimator import get_image_uri
# To run predictions against the model 
from sagemaker.predictor import csv_serializer
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

# Data manipulations:
import pandas as pd
import numpy as np

# Data Plotting
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# To mix code and markdown
from IPython.display import Markdown
# from ColorBrewer
plot_color = "#4daf4a"

%matplotlib inline
In [2]:
accountid = boto3.client('sts').get_caller_identity()['Account']
role = sagemaker.get_execution_role()
sess = sagemaker.Session()

# provide a prefix to be attached to the output files in the bucket
prefix = 'sagemaker/xgboost-churn-ecooptimize'
In [3]:
# Please give the name of an existing bucket to use
bucket = 'vmegler-projects'

Data Exploration

Mobile operators have historical records on which customers ultimately ended up churning and which continued using the service. We can use this historical information to construct an ML model of one mobile operator’s churn using a process called training. After training the model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model, and have the model predict whether this customer is going to churn. Of course, we expect the model to make mistakes–after all, predicting the future is tricky business! But I’ll also show how to deal with prediction errors.

The dataset we use is publicly available and was mentioned in the book Discovering Knowledge in Data by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets.

In [4]:
if not os.path.exists("DKD2e_data_sets.zip"):
    !wget http://dataminingconsultant.com/DKD2e_data_sets.zip
    !unzip -o DKD2e_data_sets.zip
else:
    print("File has been already downloaded")
File has been already downloaded

By modern standards, it’s a relatively small dataset, with only 3,333 records, where each record uses 21 attributes to describe the profile of a customer of an unknown US mobile operator. The attributes are:

  • State: the US state in which the customer resides, indicated by a two-letter abbreviation; for example, OH or NJ
  • Account Length: the number of days that this account has been active
  • Area Code: the three-digit area code of the corresponding customer’s phone number
  • Phone: the remaining seven-digit phone number
  • Int’l Plan: whether the customer has an international calling plan: yes/no
  • VMail Plan: whether the customer has a voice mail feature: yes/no
  • VMail Message: presumably the average number of voice mail messages per month
  • Day Mins: the total number of calling minutes used during the day
  • Day Calls: the total number of calls placed during the day
  • Day Charge: the billed cost of daytime calls
  • Eve Mins, Eve Calls, Eve Charge: the minutes, number of calls, and billed cost for calls placed during the evening
  • Night Mins, Night Calls, Night Charge: the minutes, number of calls, and billed cost for calls placed during nighttime
  • Intl Mins, Intl Calls, Intl Charge: the minutes, number of calls, and billed cost for international calls
  • CustServ Calls: the number of calls placed to Customer Service
  • Churn?: whether the customer left the service: true/false

The last attribute, Churn?, is known as the target attribute–the attribute that we want the ML model to predict. Because the target attribute is binary, our model will be performing binary prediction, also known as binary classification.

Preview the first few rows:

In [5]:
# read the customer churn data to pandas DataFrame
pd.set_option('display.max_columns', 25)
churn = pd.read_csv('./Data sets/churn.txt')
# review the top rows
churn.head()
Out[5]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge Eve Mins Eve Calls Eve Charge Night Mins Night Calls Night Charge Intl Mins Intl Calls Intl Charge CustServ Calls Churn?
0 KS 128 415 382-4657 no yes 25 265.1 110 45.07 197.4 99 16.78 244.7 91 11.01 10.0 3 2.70 1 False.
1 OH 107 415 371-7191 no yes 26 161.6 123 27.47 195.5 103 16.62 254.4 103 11.45 13.7 3 3.70 1 False.
2 NJ 137 415 358-1921 no no 0 243.4 114 41.38 121.2 110 10.30 162.6 104 7.32 12.2 5 3.29 0 False.
3 OH 84 408 375-9999 yes no 0 299.4 71 50.90 61.9 88 5.26 196.9 89 8.86 6.6 7 1.78 2 False.
4 OK 75 415 330-6626 yes no 0 166.7 113 28.34 148.3 122 12.61 186.9 121 8.41 10.1 3 2.73 3 False.

Exploring the Data

Let's begin exploring the data.

_This section is identical to the original notebook, Amazon SageMaker Examples - Customer Churn. While data exploration is an important topic, it's not the focus of this walkthrough. Therefore it's been removed in the interests of brevity, but the actions taken based on the analysis have been kept (i.e., which columns were kept or removed). Please refer to the original notebook for this section._

In [6]:
# Now, we'll save a copy of the original dataset, for use later
churn_save = churn.copy()
# Then, we'll add a column for the total customer spend
churn_save['Total Customer Spend'] = churn_save.apply(lambda x: x['Day Charge'] + x['Night Charge'] + x['Eve Charge'] 
                                                      + x['Intl Charge'], axis=1)
churn_save['Area Code'] = churn['Area Code'].astype(object)
In [7]:
churn = churn.drop('Phone', axis=1)
churn['Area Code'] = churn['Area Code'].astype(object)

Let's remove one feature from each of the highly correlated pairs: Day Charge from the pair with Day Mins, Eve Charge from the pair with Eve Mins, Night Charge from the pair with Night Mins, and Intl Charge from the pair with Intl Mins:

In [8]:
churn = churn.drop(['Day Charge', 'Eve Charge', 'Night Charge', 'Intl Charge'], axis=1)

Now that we've cleaned up our dataset, let's determine which algorithm to use. As mentioned above, there appear to be some variables where both high and low (but not intermediate) values are predictive of churn. In order to accommodate this in an algorithm like linear regression, we'd need to generate polynomial (or bucketed) terms. Instead, let's attempt to model this problem using gradient boosted trees. Amazon SageMaker provides an XGBoost container that we can use to train in a managed, distributed setting, and then host as a real-time prediction endpoint. XGBoost uses gradient boosted trees which naturally account for non-linear relationships between features and the target variable, as well as accommodating complex interactions between features.

Amazon SageMaker XGBoost can train on data in either CSV or LibSVM format. For this example, we'll stick with CSV. Per the algorithm documentation, the data should:

  • Have the predictor variable in the first column
  • Not have a header row

Since the model will not retain the feature names, we will need to assign them from the original data (excluding the target variable) when we explore the results later.

But first, let's convert our categorical features into numeric features, since the algorithm handles only numeric features. Then, we place the outcome as the first column.

In [9]:
model_data = pd.get_dummies(churn)
model_data = pd.concat([model_data['Churn?_True.'], model_data.drop(['Churn?_False.', 'Churn?_True.'], axis=1)], axis=1)

And now let's split the data into training, validation, and test sets. This will help prevent us from overfitting the model, and allow us to test the model's accuracy on data it hasn't already seen.

Note that different splits of the data may create slightly different results. In addition, on different runs against the same data, XGBoost may choose different combinations of features and trees that give similar model performance.

In [10]:
train_data, validation_data, test_data = np.split(model_data.sample(frac = 1, random_state = 1729), 
                                                  [int(0.7 * len(model_data)), int(0.9 * len(model_data))])
train_data.to_csv('train.csv', header = False, index = False)
validation_data.to_csv('validation.csv', header = False, index = False)
test_data.to_csv('test.csv', header = False, index = False)
In [11]:
test_data_columns=test_data.columns
test_data_columns
test_data.shape
Out[11]:
Index(['Churn?_True.', 'Account Length', 'VMail Message', 'Day Mins',
       'Day Calls', 'Eve Mins', 'Eve Calls', 'Night Mins', 'Night Calls',
       'Intl Mins', 'Intl Calls', 'CustServ Calls', 'State_AK', 'State_AL',
       'State_AR', 'State_AZ', 'State_CA', 'State_CO', 'State_CT', 'State_DC',
       'State_DE', 'State_FL', 'State_GA', 'State_HI', 'State_IA', 'State_ID',
       'State_IL', 'State_IN', 'State_KS', 'State_KY', 'State_LA', 'State_MA',
       'State_MD', 'State_ME', 'State_MI', 'State_MN', 'State_MO', 'State_MS',
       'State_MT', 'State_NC', 'State_ND', 'State_NE', 'State_NH', 'State_NJ',
       'State_NM', 'State_NV', 'State_NY', 'State_OH', 'State_OK', 'State_OR',
       'State_PA', 'State_RI', 'State_SC', 'State_SD', 'State_TN', 'State_TX',
       'State_UT', 'State_VA', 'State_VT', 'State_WA', 'State_WI', 'State_WV',
       'State_WY', 'Area Code_408', 'Area Code_415', 'Area Code_510',
       'Int'l Plan_no', 'Int'l Plan_yes', 'VMail Plan_no', 'VMail Plan_yes'],
      dtype='object')
Out[11]:
(334, 70)
In [12]:
train_data.head()
Out[12]:
Churn?_True. Account Length VMail Message Day Mins Day Calls Eve Mins Eve Calls Night Mins Night Calls Intl Mins Intl Calls CustServ Calls ... State_VT State_WA State_WI State_WV State_WY Area Code_408 Area Code_415 Area Code_510 Int'l Plan_no Int'l Plan_yes VMail Plan_no VMail Plan_yes
1095 0 106 0 274.4 120 198.6 82 160.8 62 6.0 3 1 ... 0 0 0 0 0 0 0 1 1 0 1 0
608 0 28 0 187.8 94 248.6 86 208.8 124 10.6 5 0 ... 0 0 0 0 1 0 1 0 1 0 1 0
2908 1 148 0 279.3 104 201.6 87 280.8 99 7.9 2 2 ... 0 0 0 0 0 0 1 0 1 0 1 0
943 0 132 0 191.9 107 206.9 127 272.0 88 12.6 2 1 ... 0 0 0 0 0 0 0 1 1 0 1 0
693 0 92 29 155.4 110 188.5 104 254.9 118 8.0 4 3 ... 0 0 0 0 0 0 0 1 1 0 0 1

5 rows × 70 columns

In [13]:
train_data.columns
Out[13]:
Index(['Churn?_True.', 'Account Length', 'VMail Message', 'Day Mins',
       'Day Calls', 'Eve Mins', 'Eve Calls', 'Night Mins', 'Night Calls',
       'Intl Mins', 'Intl Calls', 'CustServ Calls', 'State_AK', 'State_AL',
       'State_AR', 'State_AZ', 'State_CA', 'State_CO', 'State_CT', 'State_DC',
       'State_DE', 'State_FL', 'State_GA', 'State_HI', 'State_IA', 'State_ID',
       'State_IL', 'State_IN', 'State_KS', 'State_KY', 'State_LA', 'State_MA',
       'State_MD', 'State_ME', 'State_MI', 'State_MN', 'State_MO', 'State_MS',
       'State_MT', 'State_NC', 'State_ND', 'State_NE', 'State_NH', 'State_NJ',
       'State_NM', 'State_NV', 'State_NY', 'State_OH', 'State_OK', 'State_OR',
       'State_PA', 'State_RI', 'State_SC', 'State_SD', 'State_TN', 'State_TX',
       'State_UT', 'State_VA', 'State_VT', 'State_WA', 'State_WI', 'State_WV',
       'State_WY', 'Area Code_408', 'Area Code_415', 'Area Code_510',
       'Int'l Plan_no', 'Int'l Plan_yes', 'VMail Plan_no', 'VMail Plan_yes'],
      dtype='object')

Now we'll upload these files to S3.

In [14]:
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')

Train

Moving onto training, first we'll need to specify the locations of the XGBoost algorithm containers.

_This section is identical to the original notebook, Amazon SageMaker Examples - Customer Churn._

In [15]:
from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version = '1.0-1')
print(container)
'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3

Then, because we're training with the CSV file format, we'll create s3_inputs that our training function can use as a pointer to the files in S3.

In [16]:
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.

Now, we can specify a few parameters like what type of training instances we'd like to use and how many, as well as our XGBoost hyperparameters. More detail on XGBoost's hyperparameters can be found on their GitHub page.

In [17]:
sess = sagemaker.Session()

xgb = sagemaker.estimator.Estimator(container,
                                    role, 
                                    train_instance_count=1, 
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        num_round=100)

xgb.fit({'train': s3_input_train, 'validation': s3_input_validation}) 
Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
2020-08-28 00:46:51 Starting - Starting the training job...
2020-08-28 00:46:53 Starting - Launching requested ML instances......
2020-08-28 00:48:04 Starting - Preparing the instances for training......
2020-08-28 00:49:11 Downloading - Downloading input data
2020-08-28 00:49:11 Training - Downloading the training image...
2020-08-28 00:49:48 Training - Training image download completed. Training in progress..INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[00:49:51] 2333x69 matrix with 160977 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
INFO:root:Determined delimiter of CSV input is ','
[00:49:51] 666x69 matrix with 45954 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
INFO:root:Single node training.
INFO:root:Train matrix has 2333 rows
INFO:root:Validation matrix has 666 rows
[00:49:51] WARNING: /workspace/src/learner.cc:328: 
Parameters: { num_round, silent } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[0]#011train-error:0.07715#011validation-error:0.09910
[1]#011train-error:0.05058#011validation-error:0.08108
[2]#011train-error:0.04886#011validation-error:0.07508
[3]#011train-error:0.04672#011validation-error:0.07207
[4]#011train-error:0.04801#011validation-error:0.07357
[5]#011train-error:0.04672#011validation-error:0.07057
[6]#011train-error:0.04544#011validation-error:0.07357
[7]#011train-error:0.04372#011validation-error:0.06907
[8]#011train-error:0.04501#011validation-error:0.06907
[9]#011train-error:0.04244#011validation-error:0.06607
[10]#011train-error:0.04029#011validation-error:0.06456
[11]#011train-error:0.03901#011validation-error:0.06757
[12]#011train-error:0.03858#011validation-error:0.06456
[13]#011train-error:0.03772#011validation-error:0.06456
[14]#011train-error:0.03772#011validation-error:0.06607
[15]#011train-error:0.03943#011validation-error:0.06757
[16]#011train-error:0.03858#011validation-error:0.06306
[17]#011train-error:0.03772#011validation-error:0.06456
[18]#011train-error:0.03943#011validation-error:0.06306
[19]#011train-error:0.03986#011validation-error:0.06006
[20]#011train-error:0.03943#011validation-error:0.06306
[21]#011train-error:0.03858#011validation-error:0.06306
[22]#011train-error:0.03815#011validation-error:0.06607
[23]#011train-error:0.03686#011validation-error:0.06757
[24]#011train-error:0.03601#011validation-error:0.06757
[25]#011train-error:0.03429#011validation-error:0.06757
[26]#011train-error:0.03386#011validation-error:0.06907
[27]#011train-error:0.03386#011validation-error:0.06907
[28]#011train-error:0.03386#011validation-error:0.06757
[29]#011train-error:0.03386#011validation-error:0.07207
[30]#011train-error:0.03386#011validation-error:0.07057
[31]#011train-error:0.03429#011validation-error:0.07207
[32]#011train-error:0.03472#011validation-error:0.07057
[33]#011train-error:0.03472#011validation-error:0.06907
[34]#011train-error:0.03300#011validation-error:0.06907
[35]#011train-error:0.03386#011validation-error:0.06907
[36]#011train-error:0.03386#011validation-error:0.06907
[37]#011train-error:0.03300#011validation-error:0.06757
[38]#011train-error:0.03343#011validation-error:0.06757
[39]#011train-error:0.03300#011validation-error:0.06757
[40]#011train-error:0.03172#011validation-error:0.06907
[41]#011train-error:0.03172#011validation-error:0.06907
[42]#011train-error:0.03086#011validation-error:0.06907
[43]#011train-error:0.03129#011validation-error:0.06907
[44]#011train-error:0.03086#011validation-error:0.06757
[45]#011train-error:0.03129#011validation-error:0.06757
[46]#011train-error:0.03086#011validation-error:0.06607
[47]#011train-error:0.03086#011validation-error:0.06607
[48]#011train-error:0.03043#011validation-error:0.06607
[49]#011train-error:0.03000#011validation-error:0.06757
[50]#011train-error:0.03000#011validation-error:0.06757
[51]#011train-error:0.03000#011validation-error:0.06607
[52]#011train-error:0.03000#011validation-error:0.06607
[53]#011train-error:0.03000#011validation-error:0.06607
[54]#011train-error:0.03000#011validation-error:0.06607
[55]#011train-error:0.03043#011validation-error:0.06757
[56]#011train-error:0.03043#011validation-error:0.06757
[57]#011train-error:0.03043#011validation-error:0.06757
[58]#011train-error:0.03043#011validation-error:0.06757
[59]#011train-error:0.03043#011validation-error:0.06757
[60]#011train-error:0.03043#011validation-error:0.06757
[61]#011train-error:0.03043#011validation-error:0.06757
[62]#011train-error:0.03043#011validation-error:0.06757
[63]#011train-error:0.03043#011validation-error:0.06757
[64]#011train-error:0.03043#011validation-error:0.06757
[65]#011train-error:0.03043#011validation-error:0.06757
[66]#011train-error:0.03043#011validation-error:0.06757
[67]#011train-error:0.03043#011validation-error:0.06757
[68]#011train-error:0.02915#011validation-error:0.06757
[69]#011train-error:0.02915#011validation-error:0.06607
[70]#011train-error:0.02915#011validation-error:0.06757
[71]#011train-error:0.02872#011validation-error:0.06607
[72]#011train-error:0.02915#011validation-error:0.06607
[73]#011train-error:0.02915#011validation-error:0.06607
[74]#011train-error:0.02915#011validation-error:0.06607
[75]#011train-error:0.02915#011validation-error:0.06607
[76]#011train-error:0.02915#011validation-error:0.06607
[77]#011train-error:0.02915#011validation-error:0.06607
[78]#011train-error:0.02958#011validation-error:0.06607
[79]#011train-error:0.03000#011validation-error:0.06607
[80]#011train-error:0.02958#011validation-error:0.06607
[81]#011train-error:0.02829#011validation-error:0.06607
[82]#011train-error:0.02829#011validation-error:0.06607
[83]#011train-error:0.02829#011validation-error:0.06607
[84]#011train-error:0.02786#011validation-error:0.06607
[85]#011train-error:0.02786#011validation-error:0.06607
[86]#011train-error:0.02786#011validation-error:0.06607
[87]#011train-error:0.02786#011validation-error:0.06607
[88]#011train-error:0.02829#011validation-error:0.06607
[89]#011train-error:0.02829#011validation-error:0.06607
[90]#011train-error:0.02786#011validation-error:0.06607
[91]#011train-error:0.02786#011validation-error:0.06607
[92]#011train-error:0.02829#011validation-error:0.06607
[93]#011train-error:0.02786#011validation-error:0.06607
[94]#011train-error:0.02786#011validation-error:0.06607
[95]#011train-error:0.02786#011validation-error:0.06607
[96]#011train-error:0.02829#011validation-error:0.06607
[97]#011train-error:0.02829#011validation-error:0.06607
[98]#011train-error:0.02829#011validation-error:0.06607
[99]#011train-error:0.02829#011validation-error:0.06607

2020-08-28 00:50:01 Uploading - Uploading generated training model
2020-08-28 00:50:01 Completed - Training job completed
Training seconds: 67
Billable seconds: 67

Host

Now that we've trained the algorithm, let's create a model and deploy it to a hosted endpoint.

In [19]:
xgb_predictor = xgb.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge', endpoint_name='prevent-churn-oda')
Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
-------------!

Evaluate

This section has minor changes from the original notebook, to set up for the next section. We save the input file, and then add the predictions as a column, so that all customer data is available to us in addition to the prediction.

Now that we have a hosted endpoint running, we can make real-time predictions from our model very easily, simply by making an HTTP POST request. But first, we'll need to set up serializers and deserializers for passing our test_data NumPy arrays to the model behind the endpoint.

In [20]:
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None

For simpler traceability to our original dataset given our focus on optimization rather than on improving the ML model, we'll take a section of our saved dataset and use that to make predictions.

In [21]:
N = 1500 

churn_sample = churn_save.sample(N)
# Convert this subset to dummies for use in inference; but remove columns that may cause dimension explosion
# Note that dummies can be problematic: the sample must end up with the same categorical (dummy) columns as the full dataset.
churn_sample_dummies = pd.get_dummies(churn_sample.drop(['Phone', 'Total Customer Spend'], axis=1))
churn_sample_dummies.shape
Out[21]:
(1500, 75)
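
As the comment above notes, pd.get_dummies only creates columns for categories that appear in the sample, so the sample's columns can drift from the training layout. Below is a minimal sketch of one way to force the same column set and order as the training features (churn_sample_aligned is a hypothetical name, not used elsewhere in this notebook):

# Hypothetical alignment step: reindex the sample's dummy columns against the
# training feature columns; categories missing from the sample become all-zero
# columns, and any unexpected extra columns are dropped.
feature_columns = test_data.columns.drop('Churn?_True.')
churn_sample_aligned = churn_sample_dummies.reindex(columns=feature_columns, fill_value=0)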

Now, we'll use a simple function to:

  1. Loop over our test dataset
  2. Split it into mini-batches of rows
  3. Convert those mini-batches to CSV string payloads
  4. Retrieve mini-batch predictions by invoking the XGBoost endpoint
  5. Collect predictions and convert from the CSV output our model provides into a NumPy array
In [22]:
# This version of 'predict' allows us to pass a dataset with more columns, and a list of the columns to be used in the prediction
def predict_cost(data, columns, rows=500):
    test_data = data[columns]
    test_data_nolab = test_data.values[:, 1:]
    split_array = np.array_split(test_data_nolab, int(test_data_nolab.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])

    return np.fromstring(predictions[1:], sep=',')

predictions = predict_cost(churn_sample_dummies, test_data.columns)
predictions
Out[22]:
array([0.08631791, 0.14205045, 0.03817179, ..., 0.04258348, 0.15909368,
       0.89470887])
In [23]:
churn_sample['Churn Probability'] = predictions
churn_sample.head(5)
Out[23]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge Eve Mins Eve Calls Eve Charge Night Mins Night Calls Night Charge Intl Mins Intl Calls Intl Charge CustServ Calls Churn? Total Customer Spend Churn Probability
913 GA 50 408 377-1218 no yes 24 214.3 129 36.43 289.8 55 24.63 312.5 130 14.06 10.6 4 2.86 1 False. 77.98 0.086318
97 AZ 99 415 327-3954 no no 0 198.2 87 33.69 207.3 76 17.62 190.9 113 8.59 8.7 3 2.35 4 False. 62.25 0.142050
2935 DC 136 510 353-2763 no no 0 204.5 63 34.77 208.8 95 17.75 224.0 119 10.08 9.8 2 2.65 0 False. 65.25 0.038172
3203 PA 142 510 340-6221 no yes 40 230.7 101 39.22 256.8 88 21.83 263.9 92 11.88 6.4 3 1.73 1 False. 74.66 0.118553
1658 UT 111 510 347-4982 no no 0 191.3 80 32.52 138.5 94 11.77 246.0 107 11.07 6.4 3 1.73 2 False. 57.09 0.018856

Assess and optimize

This section contains the new material, and is the focus of this blog post.

We can assess the model performance by looking at the prediction scores, as shown in the original post, Amazon SageMaker Examples - Customer Churn.

While it’s usual to treat this as a binary classification problem (‘1’ or ‘0’), in fact, the real world is less binary: people become “likely to churn” for some time before they actually churn. Loss of “brand loyalty” occurs some time before someone actually buys from a competitor. There's frequently a slow rise in dissatisfaction over time before someone is finally driven to act. Providing the right incentive at the right time can reset a customer's satisfaction.

So how do we calculate the minimum incentive that will give the desired result? Rather than providing a single program to all customers, can we save money and gain a better outcome by using variable incentives, customized to a customer's churn probability and value? And if so, how?

We can do so by building on the components we've developed so far.

Assigning costs to our predictions

What are the costs for our problem of mobile operator churn? The costs, of course, depend on the specific actions that the business takes. Let's make some assumptions here.

First, we'll assign the true negatives the cost of \$0. Our model essentially correctly identified a happy customer in this case, and we won’t offer them an incentive. An alternative is to assign the true negatives the actual value of the customer's spend, as this is the customer's contribution to our overall revenue.

False negatives are the most problematic, because they incorrectly predict that a churning customer will stay. We lose the customer and will have to pay all the costs of acquiring a replacement customer, including foregone revenue, advertising costs, administrative costs, point of sale costs, and likely a phone hardware subsidy. A quick search on the Internet reveals that such costs typically run in the hundreds of dollars, so for now let's assume \$500. This is the cost we'll use for each false negative. Our marketing department should be able to give us a value to use here for the overhead, and we have the actual customer spend for each customer in our dataset.

Finally, we'll give an incentive to customers that our model identifies as churning. At this point let's assume a one-time retention incentive in the amount of \$50. This is the cost we'll apply to both true positive and false positive outcomes. In the case of false positives (the customer is happy, but the model mistakenly predicted churn), we will “waste” the concession. We probably could have spent those dollars more effectively, but it's possible we increased the loyalty of an already loyal customer, so that’s not so bad. We'll be revising this initial approach below.

Let's look at the continuous values of our churn predictions.

In [24]:
plt.hist(predictions)
plt.xlabel('Churn prediction score')
plt.ylabel('Number of customers')
plt.title('Prediction Scores')
plt.show()
[Figure: 'Prediction Scores' histogram — churn prediction score (x-axis) versus number of customers (y-axis). Most customers score near 0, with a smaller cluster near 1.]

Mapping the customer churn threshold

Earlier versions of this notebook have shown the effect of false negatives that are substantially more costly than false positives. Instead of optimizing for error based simply on the number of customers misclassified, we've used a cost function that looks like this:

cost_of_replacing_customer * FN(C) + customer_value * TN(C) + incentive_offered * FP(C) + incentive_offered * TP(C)

FN(C) means that the number of false negatives is a function of the cutoff, C, and similarly for TN, FP, and TP. We'd like to find the cutoff, C, where the result of the expression is smallest.

Right now we'll start by using the same values for all customers, to give us a starting point for discussion with the business. With the estimates we'll use for right now, this equation becomes:

$500 * FN(C) + $0 * TN(C) + $50 * FP(C) + $50 * TP(C)
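
To make this concrete, here is a minimal sketch of that cost function as Python, assuming scikit-learn is available in the environment (cost_at_cutoff is a hypothetical helper and is not used by the cells below, which compute the same quantity with pd.crosstab):

from sklearn.metrics import confusion_matrix

def cost_at_cutoff(y_true, scores, cutoff, cost_fn=500, cost_tn=0, cost_fp=50, cost_tp=50):
    # Label a customer as predicted-churn when the score exceeds the cutoff,
    # then weight each cell of the confusion matrix by its assumed dollar cost.
    y_pred = (scores > cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return cost_fn * fn + cost_tn * tn + cost_fp * fp + cost_tp * tp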

A straightforward way to understand the impact of these numbers is to simply run a simulation over a large number of possible cutoffs. We test 99 cutoff values (0.01 through 0.99, in steps of 0.01) in the for loop below.

In [25]:
import matplotlib.ticker as ticker
cutoffs = np.arange(0.01, 1, 0.01)
costs = []
num_below_cutoff = []
fn = 500
tn = 0
fp = 50
tp = 50
for c in cutoffs:
    crsstb = pd.crosstab(index=churn_sample_dummies['Churn?_True.'],       
                                           columns=np.where(predictions > c, 1, 0))
    if crsstb.shape == (2,1):
        print(crsstb.columns)
        if crsstb.columns[0] == 0:   # Then we're missing the '1' column
            crsstb[1] = 0
        else:
            crsstb[0] = 0
    costs.append(np.sum(np.sum(np.array([[tn, tp], [fn, fp]]) * crsstb )))
    num_below_cutoff.append(np.count_nonzero(np.where(predictions <= c, 1, 0)))
    
costs = np.array(costs)
fig, ax = plt.subplots(1, 1)
plt.plot(cutoffs, costs)
fmt = '${x:,.0f}'
tick = ticker.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick) 
ax.tick_params(axis='y', labelcolor='b')
plt.xlabel('Threshold')
ax.set_ylabel('Cost',color='b')

ax2 = ax.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('Number of customers below cutoff')  
ax2.plot(cutoffs, num_below_cutoff, color='k')
ax2.tick_params(axis='y', labelcolor='k')

plt.title('Cost versus Threshold')
plt.show()
dex = np.argmin(costs)
incentives_paid_to = len(churn_sample_dummies) - num_below_cutoff[dex]
print('Cost is minimized near a cutoff of:', cutoffs[dex], 'for a cost of: $', np.min(costs), 'for these', len(predictions), 'customers.')
print('Incentive is paid to', incentives_paid_to,'customers, for a total outlay of $', incentives_paid_to * tp)
print('Total customer spend of these customers is $',
          churn_sample[churn_sample['Churn Probability'] > cutoffs[dex]]['Total Customer Spend'].sum())
[Figure: 'Cost versus Threshold' — total cost (left axis) and number of customers below the cutoff (right axis) plotted against the cutoff threshold.]
Cost is minimized near a cutoff of: 0.22 for a cost of: $ 26700 for these 1500 customers.
Incentive is paid to 224 customers, for a total outlay of $ 11200
Total customer spend of these customers is $ 15123.499999999998

The above chart shows how picking a threshold too low results in costs skyrocketing as all customers are given a retention incentive. Meanwhile, setting the threshold too high (e.g., 0.7 or above) results in too many lost customers, which ultimately grows to be nearly as costly. In between, there is a large "grey" area, where perhaps some more nuanced incentives would create better outcomes.

The overall cost is minimized at \$26,700 by setting the cutoff to 0.22, which is substantially better than the \$100k+ we would expect to lose by not taking any action.

We can also calculate the dollar outlay of the program and compare it to the total spend of the customers. Here we can see that paying the incentive to all predicted-churn customers will cost \$11,200, and that these customers spend \$15,123 in total. (Your numbers may vary, depending on the specific customers randomly chosen for the sample.)

What happens if we instead have a smaller budget for our campaign? We'll choose a budget of 1% of total customer monthly spend.

In [26]:
# C: Incentive Budget equal to a fixed %age of the Total revenue
C = 0.01*np.sum(churn_sample['Total Customer Spend'].values) 

print('Total budget is:', '${:,.2f}'.format(C))
incentive = C / N
print('Per customer incentive is', '${:,.2f}'.format(incentive))
Total budget is: $893.57
Per customer incentive is $0.60
In [27]:
import matplotlib.ticker as ticker
cutoffs = np.arange(0.01, 1, 0.01)
costs = []
num_below_cutoff = []
fn = 500
tn = 0
fp = incentive
tp = incentive
for c in cutoffs:
    crsstb = pd.crosstab(index=churn_sample_dummies['Churn?_True.'],       
                                           columns=np.where(predictions > c, 1, 0))
    if crsstb.shape == (2,1):
        print(crsstb.columns)
        if crsstb.columns[0] == 0:   # Then we're missing the '1' column
            crsstb[1] = 0
        else:
            crsstb[0] = 0
    costs.append(np.sum(np.sum(np.array([[tn, tp], [fn, fp]]) * crsstb )))
    num_below_cutoff.append(np.count_nonzero(np.where(predictions <= c, 1, 0)))
    
costs = np.array(costs)
fig, ax = plt.subplots(1, 1)
plt.plot(cutoffs, costs)
fmt = '${x:,.0f}'
tick = ticker.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick) 
ax.tick_params(axis='y', labelcolor='b')
plt.xlabel('Threshold')
ax.set_ylabel('Cost',color='b')

ax2 = ax.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('Number of customers below cutoff')  
ax2.plot(cutoffs, num_below_cutoff, color='k')
ax2.tick_params(axis='y', labelcolor='k')

plt.title('Cost versus Threshold')
plt.show()

dex = np.argmin(costs)
incentives_paid_to = len(churn_sample_dummies) - num_below_cutoff[dex]
print('Cost is minimized near a cutoff of:', cutoffs[dex], 'for a cost of: $', np.min(costs), 'for these', len(predictions), 'customers.')
print('Incentive is paid to', incentives_paid_to,'customers, for a total outlay of $', incentives_paid_to * tp)
print('Total customer spend of these customers is $',
          churn_sample[churn_sample['Churn Probability'] > cutoffs[dex]]['Total Customer Spend'].sum())
[Figure: 'Cost versus Threshold' for the budget-constrained incentive — total cost (left axis) and number of customers below the cutoff (right axis) plotted against the cutoff threshold.]
Cost is minimized near a cutoff of: 0.01 for a cost of: $ 1811.9551834 for these 1500 customers.
Incentive is paid to 1363 customers, for a total outlay of $ 811.9551834
Total customer spend of these customers is $ 81586.3

We can see that the total cost to us drops sharply. But it's pretty clear that an incentive of ~\$0.60 is unlikely to change many people's minds.

For better outcomes, we could even offer a range of incentives to customers that meet different criteria. For example, it's worth more to the business to prevent a high spend customer from churning than a low spend customer. We could also target the "grey area" of customers that have less loyalty and could be swayed by another company's advertising. Let's explore that now.

Preventing customer churn using mathematical optimization of incentive programs

Now let's use a more sophisticated approach to developing our customer retention program. We'd like to tailor our incentives to target the customers most likely to reconsider a "churn" decision.

Intuitively, we know that we do not need to offer an incentive to customers with a low churn probability. Also, above some threshold, we've already lost the customer's heart and mind, even if they haven't actually left yet. So the best target for our incentive is between those two thresholds - these are the customers we can convince to stay.

Let's formulate this as a mathematical optimization problem.

The problem under investigation is inherently stochastic in that each customer might churn or not, and might accept the incentive (offer) or not. Stochastic programming [1, 2] is an approach for modeling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real world problems almost invariably include parameters which are unknown at the time a decision should be made. An example would be the construction of an investment portfolio to maximize return. An efficient portfolio would be defined as the portfolio that maximizes the expected return for a given amount of risk (e.g. standard deviation), or the portfolio that minimizes the risk subject to a given expected return [3].

References:
[1] S. Uryasev and P. M. Pardalos, Stochastic Optimization: Algorithms and Applications, Kluwer Academic: Norwell, MA, USA, 2001.
[2] John R. Birge and François V. Louveaux, Introduction to Stochastic Programming, Springer Verlag, New York, 1997.
[3] Francis, J. C. and Kim, D. (2013), Modern Portfolio Theory: Foundations, Analysis, and New Developments (Vol. 795), John Wiley & Sons.

\begin{align}
N & : \text{number of customers} \\
i & \in \{1,\ldots,N\} \\
P_i & : \text{profit generated by customer i} \\
\alpha_i & : \text{probability customer i will churn} \\
c_i & : \text{discount or incentive to be offered to customer i} \\
C & = \sum^{N}_{i=1}c_i, \text{ total retention campaign budget} \\
\gamma_i & > 0, \text{ convincing factor for customer i} \\
\beta_i & = 1-e^{-\gamma_i c_i}, \text{ probability customer i will accept the discount $c_i$} \\
f(c_i) & = \sum^{N}_{i=1} P_i(1-\alpha_i) + \sum^{N}_{i=1} \beta_i(\alpha_i P_i - c_i), \text{ expected total profit}
\end{align}

Our goal is to optimally allocate the discount $c_i$ across the $N$ customers in order to maximize the expected total profit. Mathematically this is equivalent to the following optimization problem:

\begin{aligned}
& \underset{c_i}{\text{maximize}} & & f(c_i) \\
& \text{subject to} & & \sum^{N}_{i=1}c_i \leq C \\
& & & c_i \geq 0.
\end{aligned}

For our situation:

  • We know the number of customers, N.
  • We can use their spend from their customer record as the (upper bound) estimate of the profit they generate, P.
  • We can use the churn score from our ML model as an estimate of the probability of churn, alpha.
  • The incentive, c, is what we'd like to calculate.
  • We'll use 1% of our total revenue as our campaign budget, C.
  • The probability that the customer will be swayed, beta, depends on how convincing the incentive is to the customer - which we've represented as $\gamma$.

That leaves $\gamma$, the convincing factor, to be defined below.

We set up our inputs: P, profit; alpha, our churn probabilities, from our model above; and C, our campaign budget.

In [28]:
# P: vector of the total customer spend
P = churn_sample['Total Customer Spend'].values
# alpha: vector of churn probabilities
alpha = churn_sample['Churn Probability'].values

print('Total budget is:', '${:,.2f}'.format(C))
Total budget is: $893.57

Now we can add a variable (gamma) that allows us to specify how likely we think each customer is to accept the offer and not churn - that is, how convincing they find the incentive.

While this is a matter of business judgment, we can use the graph above to inform that judgment. In this case, the business believes that customers with a churn probability below 0.55 are unlikely to churn even without an incentive; on the other hand, a customer with a churn probability above 0.95 has little loyalty left and is unlikely to be convinced. The real targets for the incentives are the customers with a churn probability between 0.55 and 0.95.

We can incorporate that business insight into the optimization by setting the value of the convincing factor $\gamma$ as follows:

  • $\gamma_i$ = 100 for customers whose churn probability $\alpha_i$ is below 0.55 (they are loyal and unlikely to churn) or above 0.95 (they will most likely leave despite the retention campaign). This effectively makes the size of the discount $c_i$ a minor factor in whether these customers stay.
  • $\gamma_i$ = 1 for customers whose $\alpha_i \in [0.55, 0.95]$. This says the probability that customer i accepts the discount $c_i$ is $\beta_i = 1-e^{-c_i}$.

Once we start to offer these incentives, we can log whether or not each customer accepts the offer and remains a customer. With that information, we can learn this function from experience, and use that learned function to develop the next set of incentives.
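
As a sketch of what that learning step could look like: assuming we have logged, for each targeted customer, the discount offered and whether they accepted it and stayed, $\gamma$ could be re-estimated by maximum likelihood under the same acceptance model $\beta = 1-e^{-\gamma c}$. The names offered_discounts, accepted, and fit_gamma below are hypothetical, not part of this notebook's workflow:

from scipy.optimize import minimize_scalar

def fit_gamma(offered_discounts, accepted):
    # Fit a single convincing factor gamma by maximizing the likelihood of the
    # observed accept/decline outcomes under P(accept | c) = 1 - exp(-gamma * c).
    c = np.asarray(offered_discounts, dtype=float)
    y = np.asarray(accepted, dtype=float)

    def neg_log_likelihood(gamma):
        p = np.clip(1.0 - np.exp(-gamma * c), 1e-9, 1 - 1e-9)   # avoid log(0)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    return minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1000), method='bounded').x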

In [29]:
gamma = np.ones(N)
len(np.where(alpha > 0.95)[0])
Out[29]:
29
In [30]:
indices_gamma_eq_zero = np.union1d(np.where(alpha > 0.95)[0], np.where(alpha < 0.55)[0])
gamma[indices_gamma_eq_zero] = 100
gamma
Out[30]:
array([100., 100., 100., ..., 100., 100.,   1.])

Solving the optimization problem

There are a variety of open source solvers available that can solve this optimization problem for us. Examples include SciPy's scipy.optimize.minimize, or faster open source solvers like GEKKO (https://gekko.readthedocs.io/en/latest/), which is what we use here. For large-scale problems, we recommend commercial optimization solvers such as CPLEX or GUROBI.
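
For reference, here is a minimal sketch of how the same problem could be posed to SciPy's SLSQP solver, using the P, alpha, gamma, C, and N defined above (neg_expected_profit and optimal_discounts are hypothetical names, and with N=1500 decision variables this approach is likely to be much slower than GEKKO):

from scipy.optimize import minimize

def neg_expected_profit(c, gamma, alpha, P):
    # Negative of the expected total profit f(c) defined above,
    # since SciPy minimizes rather than maximizes.
    beta = 1.0 - np.exp(-gamma * c)
    return -(np.sum(P * (1 - alpha)) + np.sum(beta * (alpha * P - c)))

res = minimize(neg_expected_profit,
               x0=np.full(N, C / N),                 # start from the uniform allocation
               args=(gamma, alpha, P),
               method='SLSQP',
               bounds=[(0, None)] * N,               # c_i >= 0
               constraints=[{'type': 'eq',           # total spend equals the budget C
                             'fun': lambda c: np.sum(c) - C}])
optimal_discounts = res.x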

In [31]:
!pip install gekko
Collecting gekko
  Downloading gekko-0.2.8-py3-none-any.whl (10.8 MB)
     |████████████████████████████████| 10.8 MB 3.8 MB/s eta 0:00:01
Requirement already satisfied: flask in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from gekko) (1.1.1)
Requirement already satisfied: numpy>=1.8 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from gekko) (1.18.1)
Collecting flask-cors
  Downloading Flask_Cors-3.0.8-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: Jinja2>=2.10.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from flask->gekko) (2.11.1)
Requirement already satisfied: Werkzeug>=0.15 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from flask->gekko) (1.0.1)
Requirement already satisfied: itsdangerous>=0.24 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from flask->gekko) (1.1.0)
Requirement already satisfied: click>=5.1 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from flask->gekko) (7.1.1)
Requirement already satisfied: Six in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from flask-cors->gekko) (1.14.0)
Requirement already satisfied: MarkupSafe>=0.23 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (from Jinja2>=2.10.1->flask->gekko) (1.1.1)
Installing collected packages: flask-cors, gekko
Successfully installed flask-cors-3.0.8 gekko-0.2.8
WARNING: You are using pip version 20.0.2; however, version 20.2.2 is available.
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.

Note: due to the stochastic nature of the algorithm, it may occasionally fail to converge on a solution. In these cases, re-running the solve is often enough; as a last resort, slightly modifying the value of C has been found to help the solver find a solution.
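
One way to automate that advice is a small retry wrapper; the sketch below assumes the GEKKO model m and the budget parameter from the next cell have already been built, and relies on GEKKO raising an exception when the solver cannot find a solution:

# Hypothetical retry wrapper: re-run the solve a few times, nudging the budget
# slightly on each failure, before giving up.
for attempt in range(3):
    try:
        m.solve(disp=False)
        break
    except Exception:
        budget.value = C * (1 + 0.001 * (attempt + 1))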

In [32]:
from gekko import GEKKO
m = GEKKO(remote=False)
m.options.SOLVER = 3 #IPOPT Solver
m.options.IMODE = 3

#C=1000
# variable array dimension
# create array
x = m.Array(m.Var,N)
for i in range(N):
    x[i].value = C / N
    x[i].lower = 0
    x[i].upper = 10000000
    
# create parameter
budget = m.Param(value = C)
ival_eq = [m.Intermediate(x[i]) for i in range(N)]
#ival_eq_2 = [m.Intermediate(x[i]) for i in range(int(N/2),N)]
 
m.Equation(sum(ival_eq)==budget)

beta =  [1 - m.exp(-gamma[i] *  x[i]) for i in range(N)]
ival = [m.Intermediate(beta[i] * (alpha[i] * P[i] - x[i])) for i in range(N)]
#ival_2 = [m.Intermediate(beta[i] * (alpha[i] * P[i] - x[i])) for i in range(int(N/2),N)]
m.Obj(-sum(ival))

# minimize objective
m.solve()
print(x)
Out[32]:
<gekko.gekko.EquationObj at 0x7f7f042b7d30>
 ----------------------------------------------------------------
 APMonitor, Version 0.9.2
 APMonitor Optimization Suite
 ----------------------------------------------------------------
 
 
 --------- APM Model Size ------------
 Each time step contains
   Objects      :            0
   Constants    :            0
   Variables    :         1501
   Intermediates:         3000
   Connections  :            0
   Equations    :         3002
   Residuals    :            2
 
 Number of state variables:           1500
 Number of total equations: -            1
 Number of slack variables: -            0
 ---------------------------------------
 Degrees of freedom       :           1499
 
 solver            3  not supported
 using default solver: APOPT
 ----------------------------------------------
 Steady State Optimization with APOPT Solver
 ----------------------------------------------
 
 Iter    Objective  Convergence
    0 -4.81147E+03  4.77485E-12
    1 -7.47394E+03  1.13687E-13
    2 -1.16981E+04  7.10543E-15
    3 -1.16455E+04  4.26326E-14
    4 -8.83350E+03  1.49214E-13
    5 -9.28159E+03  7.10543E-15
    6 -8.89492E+03  0.00000E+00
    7 -1.19155E+04  1.80097E-07
    8 -6.87825E+02  8.03724E-08
    9 -1.13235E+04  0.00000E+00
 
 Iter    Objective  Convergence
   10 -1.18032E+04  2.13163E-14
   11 -1.19574E+04  3.19744E-14
   12 -1.18852E+04  2.48690E-14
   13 -1.20330E+04  1.77636E-14
   14 -1.21382E+04  1.42109E-14
   15 -1.23457E+04  6.03961E-14
   16 -1.14126E+04  3.19744E-14
   17 -1.16003E+04  1.77636E-14
   18 -1.10583E+04  9.59233E-14
   19 -6.17011E+03  1.77636E-14
 
 Iter    Objective  Convergence
   20 -1.23492E+04  7.10543E-14
   21 -1.21520E+04  1.27898E-13
   22 -1.21718E+04  1.13687E-13
   23 -1.19909E+04  4.26326E-14
   24 -1.23984E+04  1.13687E-13
   25 -1.24075E+04  1.42109E-14
   26 -1.24247E+04  1.42109E-13
   27 -1.24673E+04  0.00000E+00
   28 -1.23035E+04  2.70006E-13
   29 -1.18135E+04  3.69482E-13
 
 Iter    Objective  Convergence
   30 -3.34952E+03  3.41061E-13
   31 -1.23530E+04  2.70006E-13
   32 -1.19112E+04  4.26326E-14
   33 -1.24662E+04  2.13163E-13
   34 -1.24026E+04  2.84217E-14
   35 -1.24288E+04  4.26326E-14
   36 -1.24267E+04  7.10543E-14
   37 -1.24409E+04  7.10543E-14
   38 -1.24659E+04  0.00000E+00
   39 -1.11331E+04  1.56319E-13
 
 Iter    Objective  Convergence
   40 -1.24485E+04  1.13687E-13
   41 -1.24376E+04  2.84217E-14
   42 -1.23611E+04  2.27374E-13
   43 -1.24351E+04  1.42109E-13
   44 -1.24223E+04  5.68434E-14
   45 -1.24552E+04  1.42109E-13
   46 -1.23619E+04  1.70530E-13
   47 -1.24694E+04  5.68434E-14
   48 -1.24395E+04  1.42109E-14
   49 -1.24519E+04  1.42109E-14
 
 Iter    Objective  Convergence
   50 -1.24534E+04  1.13687E-13
   51 -1.24672E+04  5.68434E-14
   52 -1.24694E+04  4.26326E-14
   53 -1.24561E+04  4.26326E-14
   54 -1.24702E+04  5.68434E-14
   55 -1.12212E+04  1.42109E-14
   56 -1.24692E+04  2.13163E-13
   57 -1.24694E+04  2.70006E-13
   58 -1.24710E+04  5.11591E-13
   59 -1.24666E+04  1.27898E-13
 
 Iter    Objective  Convergence
   60 -1.24680E+04  4.68958E-13
   61 -1.24695E+04  2.55795E-13
   62 -1.23095E+04  3.41061E-13
   63 -1.24703E+04  4.68958E-13
   64 -1.24532E+04  1.13687E-13
   65 -1.24680E+04  1.84741E-13
   66 -1.24694E+04  1.13687E-13
   67 -1.24175E+04  1.56319E-13
   68 -1.23470E+04  5.68434E-14
   69 -1.24697E+04  2.84217E-14
 
 Iter    Objective  Convergence
   70 -1.24710E+04  1.70530E-13
   71 -1.24610E+04  1.98952E-13
   72 -1.24457E+04  2.84217E-14
   73 -1.24710E+04  2.55795E-13
   74 -1.24505E+04  3.55271E-13
   75 -1.24685E+04  2.84217E-14
   76 -1.24705E+04  1.42109E-14
   77 -1.24711E+04  7.10543E-14
   78 -1.24711E+04  1.13687E-13
   79 -1.24711E+04  1.13687E-13
 
 Iter    Objective  Convergence
   80 -1.24710E+04  7.10543E-14
   81 -1.24707E+04  1.13687E-13
   82 -1.24694E+04  1.42109E-14
   83 -1.24005E+04  1.27898E-13
   84 -1.24689E+04  1.13687E-13
   85 -1.24666E+04  3.83693E-13
   86 -1.24709E+04  2.27374E-13
   87 -1.24711E+04  7.10543E-14
   88 -1.24711E+04  1.84741E-13
   89 -1.24711E+04  3.83693E-13
 
 Iter    Objective  Convergence
   90 -1.24711E+04  2.27374E-13
   91 -1.24711E+04  5.68434E-14
   92 -1.24711E+04  4.26326E-14
   93 -1.24711E+04  1.42109E-13
   94 -1.24711E+04  1.42109E-13
   95 -1.24707E+04  2.84217E-14
   96 -1.24711E+04  1.42109E-13
   97 -1.24710E+04  9.94760E-14
   98 -1.24710E+04  2.84217E-14
   99 -1.24710E+04  1.13687E-13
 
 Iter    Objective  Convergence
  100 -1.24676E+04  4.26326E-14
  101 -1.24555E+04  1.13687E-13
  102 -1.24710E+04  4.26326E-14
  103 -1.24711E+04  8.52651E-14
  104 -1.24711E+04  1.42109E-13
  105 -1.24711E+04  4.26326E-14
  106 -1.24711E+04  0.00000E+00
  107 -1.24711E+04  5.68434E-14
  108 -1.24711E+04  8.52651E-14
  109 -1.24711E+04  1.42109E-13
 
 Iter    Objective  Convergence
  110 -1.24710E+04  1.42109E-13
  111 -1.24711E+04  1.70530E-13
  112 -1.24711E+04  9.94760E-14
  113 -1.24691E+04  2.84217E-14
  114 -1.24711E+04  0.00000E+00
  115 -1.24711E+04  8.52651E-14
  116 -1.24088E+04  2.84217E-14
  117 -8.34477E+03  2.84217E-14
  118 -1.17423E+04  9.94760E-14
  119 -1.19087E+04  2.84217E-14
 
 Iter    Objective  Convergence
  120 -1.07721E+04  1.42109E-14
  121 -1.19612E+04  1.56319E-13
  122 -1.11347E+04  2.27374E-13
  123 -1.17466E+04  8.88178E-15
  124 -1.21940E+04  1.33227E-15
  125 -1.23306E+04  4.44089E-15
  126 -1.17885E+04  1.33227E-15
  127 -1.24709E+04  2.27374E-13
  128 -1.06476E+04  9.94760E-14
  129 -1.24250E+04  1.13687E-13
 
 Iter    Objective  Convergence
  130 -1.24540E+04  2.84217E-14
  131 -1.24372E+04  1.56319E-13
  132 -1.24229E+04  4.26326E-14
  133 -1.24606E+04  1.42109E-14
  134 -1.24469E+04  7.10543E-14
  135 -1.24627E+04  1.13687E-13
  136 -1.16485E+04  1.42109E-14
  137 -1.24710E+04  1.42109E-14
  138 -9.30343E+03  5.68434E-14
  139 -1.18981E+04  1.27898E-13
 
 Iter    Objective  Convergence
  140 -1.12718E+04  2.70006E-13
  141 -1.22600E+04  1.42109E-13
  142 -1.12298E+04  7.10543E-14
  143 -1.24611E+04  1.27898E-13
  144 -1.24225E+04  1.98952E-13
  145 -1.23729E+04  7.10543E-14
  146 -1.04682E+04  1.13687E-13
  147 -1.20924E+04  5.68434E-14
  148 -1.11509E+04  5.68434E-14
  149 -1.16738E+04  1.42109E-14
 
 Iter    Objective  Convergence
  150 -1.20624E+04  1.42109E-14
  151 -1.24414E+04  1.42109E-14
  152 -1.24527E+04  0.00000E+00
  153 -1.18729E+04  1.42109E-14
  154 -1.16353E+04  1.42109E-14
  155 -1.23774E+04  1.42109E-14
  156 -1.22544E+04  1.70530E-13
  157 -1.24705E+04  9.94760E-14
  158 -8.66774E+03  1.42109E-14
  159 -1.22839E+04  1.27898E-13
 
 Iter    Objective  Convergence
  160 -1.12396E+04  7.10543E-14
  161 -1.24705E+04  1.98952E-13
  162 -1.24317E+04  1.27898E-13
  163 -1.23191E+04  7.10543E-14
  164 -9.67644E+03  5.68434E-14
  165 -1.21751E+04  2.84217E-14
  166 -1.21813E+04  7.10543E-14
  167 -1.22751E+04  2.84217E-14
  168 -1.05263E+04  5.68434E-14
  169 -1.24244E+04  7.10543E-14
 
 Iter    Objective  Convergence
  170 -1.21266E+04  9.94760E-14
  171 -1.21183E+04  1.27898E-13
  172 -1.19652E+04  7.10543E-14
  173 -1.23524E+04  1.13687E-13
  174 -1.23618E+04  1.27898E-13
  175 -1.23471E+04  2.84217E-14
  176 -1.22438E+04  5.68434E-14
  177 -1.24699E+04  1.56319E-13
  178 -1.22454E+04  2.13163E-13
  179 -1.24693E+04  1.98952E-13
 
 Iter    Objective  Convergence
  180 -1.23172E+04  1.42109E-14
  181 -1.24011E+04  2.13163E-13
  182 -1.24711E+04  2.41585E-13
  183 -1.24711E+04  2.41585E-13
 Successful solution
 
 ---------------------------------------------------
 Solver         :  IPOPT (v3.12)
 Solution time  :    128.328199999989      sec
 Objective      :   -12471.0658015244     
 Successful solution
 ---------------------------------------------------
 

[[0.087191431872] [0.089943445181] [0.077092360184] ... [0.077413960003]
 [0.091585091986] [5.9918941496]]
In [33]:
# GEKKO returns an array of variable objects, so extract each value into a flat NumPy array
x = np.array([a[0] for a in x])

We verify that the budget constraint C is met.

In [34]:
print('Total spend is', '${:,.2f}'.format(np.sum(x)), 'compared to our budget of', '${:,.2f}'.format(C))
print('Total customer spend is', '${:,.2f}'.format(churn_sample['Total Customer Spend'].sum()), 'for', len(churn_sample), 'customers.' )
Total spend is $893.57 compared to our budget of $893.57
Total customer spend is $89,356.77 for 1500 customers.

Now we evaluate the expected total profit for the following scenarios:

  1. Optimal discount allocation, as calculated by our optimization algorithm
  2. Uniform discount allocation - every customer is offered the same incentive
  3. No discount
In [35]:
def expected_total_profit(x, gamma, alpha, P):
    # beta: vector of probabilities customer will accept the offer
    beta = 1  - np.exp(-gamma * (x))
    
    return np.sum(P * (1 - alpha)) + np.sum(beta * (alpha * P - x))
In [36]:
expected_total_profit_no_campaign = expected_total_profit(0, gamma, alpha, P)
expected_total_profit_optimal = expected_total_profit(x, gamma, alpha, P)
expected_total_profit_uniform_campaign = expected_total_profit((C/N)*np.ones(N), gamma, alpha, P)
In [37]:
plt.figure(figsize=(10, 6))
data = [expected_total_profit_optimal, expected_total_profit_uniform_campaign, expected_total_profit_no_campaign]
labels = ['Optimised Campaign', 'Naive Uniform Spend', 'No Campaign']
plt.xticks(range(len(data)), labels)
plt.xlabel('Budget Allocation Approach')
plt.ylabel('Expected Total Profit')
# plt.title('Benefit of Optimisation with N=%i Customers' %N)
plt.bar(range(len(data)), data)
ax = plt.gca()
fmt = '${x:,.0f}'
formatter = ticker.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(formatter)

plt.savefig("Profits from optimisation", transparent=True)
plt.show()

print("Expected total profit compared to no campaign:  %.0f%%" %(100*(expected_total_profit_optimal-expected_total_profit_no_campaign)/expected_total_profit_no_campaign))
print("Expected total profit compared to uniform discount allocation:  %.0f%%" %(100*(expected_total_profit_optimal-expected_total_profit_uniform_campaign)/expected_total_profit_uniform_campaign))
[Figure: bar chart of expected total profit for the Optimised Campaign, Naive Uniform Spend, and No Campaign budget allocation approaches.]
Expected total profit compared to no campaign:  16%
Expected total profit compared to uniform discount allocation:  4%

Lastly, we add the discount to our customer data.

In [38]:
churn_sample['Convincing Factor'] = gamma
churn_sample['Optimal Discount'] = x
In [39]:
churn_sample['Optimal Discount'].hist(bins=20)
plt.axvline(x=C/N, linewidth=3, color='r')
plt.xlabel('Discount in $')
plt.ylabel('# of Subscribers Offered Discount')
Out[39]:
[Histogram of optimal discounts: x-axis Discount in $, y-axis # of Subscribers Offered Discount, with a vertical red line at the uniform per-customer budget C/N]

We can see that this graph mirrors the histogram of churn probabilities shown above: most customers are unlikely to churn and are offered a very small discount, while a smaller group is likely to churn and is offered a much larger discount.

The vertical line shows the discount offered by a naive, uniform allocation of the budget across all customers.

For each customer, we now have their total spend and the optimal incentive to offer them. The discount varies with churn probability, and we're assured that the incentive campaign fits within our budget.
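
To make that relationship explicit, here is a minimal sketch that plots the optimized discount against the predicted churn probability, reusing the churn_sample columns and the C/N uniform budget computed above:

# Sketch: how the optimized discount grows with the predicted churn probability
plt.figure(figsize=(8, 5))
plt.scatter(churn_sample['Churn Probability'], churn_sample['Optimal Discount'],
            s=10, color=plot_color, alpha=0.5)
plt.axhline(y=C/N, linewidth=1, color='r', linestyle='--')  # uniform per-customer budget
plt.xlabel('Predicted Churn Probability')
plt.ylabel('Optimal Discount in $')
plt.show()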

In [40]:
churn_sample[['State', 'Area Code', 'Phone', 'Churn?', 'Total Customer Spend', 'Churn Probability', 'Optimal Discount', 
              'Convincing Factor']].sort_values(by='Optimal Discount', ascending=False).head(15)
Out[40]:
|      | State | Area Code | Phone    | Churn? | Total Customer Spend | Churn Probability | Optimal Discount | Convincing Factor |
|------|-------|-----------|----------|--------|----------------------|-------------------|------------------|-------------------|
| 832  | SC    | 408       | 335-1874 | True.  | 83.14                | 0.917121          | 6.478214         | 1.0               |
| 2708 | MI    | 408       | 389-4608 | True.  | 80.44                | 0.944693          | 6.474639         | 1.0               |
| 901  | CT    | 510       | 370-5527 | True.  | 85.46                | 0.885804          | 6.470634         | 1.0               |
| 331  | MT    | 415       | 387-5453 | True.  | 78.93                | 0.947161          | 6.457338         | 1.0               |
| 859  | AL    | 408       | 374-9203 | True.  | 79.37                | 0.940146          | 6.455318         | 1.0               |
| 1893 | IN    | 510       | 330-9354 | True.  | 78.80                | 0.917642          | 6.421701         | 1.0               |
| 2942 | GA    | 415       | 328-5188 | True.  | 80.52                | 0.896742          | 6.420118         | 1.0               |
| 978  | VT    | 415       | 394-7447 | True.  | 83.52                | 0.863931          | 6.419509         | 1.0               |
| 2210 | UT    | 415       | 367-3220 | True.  | 78.22                | 0.920567          | 6.417463         | 1.0               |
| 1933 | PA    | 408       | 377-5043 | True.  | 79.03                | 0.907595          | 6.413227         | 1.0               |
| 3265 | ID    | 415       | 408-1913 | True.  | 79.19                | 0.905222          | 6.412584         | 1.0               |
| 1078 | ME    | 408       | 333-7631 | True.  | 82.80                | 0.864165          | 6.410425         | 1.0               |
| 1736 | NV    | 415       | 334-5029 | True.  | 76.64                | 0.920564          | 6.395602         | 1.0               |
| 1472 | MD    | 415       | 400-7002 | True.  | 79.54                | 0.872084          | 6.377540         | 1.0               |
| 946  | NJ    | 408       | 332-5949 | True.  | 78.77                | 0.879278          | 6.375991         | 1.0               |
In [41]:
churn_sample[['State', 'Area Code', 'Phone', 'Churn?', 'Total Customer Spend', 'Churn Probability', 'Optimal Discount', 
              'Convincing Factor']].sample(15)
Out[41]:
|      | State | Area Code | Phone    | Churn? | Total Customer Spend | Churn Probability | Optimal Discount | Convincing Factor |
|------|-------|-----------|----------|--------|----------------------|-------------------|------------------|-------------------|
| 2975 | WV    | 415       | 382-3512 | False. | 50.73                | 0.014496          | 0.064396         | 100.0             |
| 2702 | VT    | 510       | 333-9664 | False. | 54.50                | 0.022015          | 0.069551         | 100.0             |
| 1258 | RI    | 510       | 422-8268 | False. | 68.17                | 0.051344          | 0.080563         | 100.0             |
| 70   | NJ    | 408       | 359-1231 | False. | 72.08                | 0.036865          | 0.077753         | 100.0             |
| 285  | SD    | 408       | 412-1194 | False. | 55.56                | 0.017662          | 0.067447         | 100.0             |
| 487  | IN    | 415       | 363-3911 | False. | 73.41                | 0.061725          | 0.083185         | 100.0             |
| 2149 | IA    | 415       | 341-6743 | False. | 74.54                | 0.061287          | 0.083268         | 100.0             |
| 1497 | MT    | 510       | 393-3274 | False. | 55.45                | 0.025156          | 0.071119         | 100.0             |
| 2444 | TX    | 415       | 408-9572 | False. | 76.81                | 0.066599          | 0.084449         | 100.0             |
| 2347 | IL    | 415       | 340-6908 | True.  | 60.66                | 0.837967          | 6.042959         | 1.0               |
| 2263 | UT    | 510       | 370-9563 | False. | 71.76                | 0.013441          | 0.067272         | 100.0             |
| 547  | VT    | 510       | 378-3508 | True.  | 49.13                | 0.859363          | 5.840068         | 1.0               |
| 2249 | FL    | 510       | 343-3340 | False. | 41.92                | 0.042234          | 0.073586         | 100.0             |
| 301  | FL    | 415       | 416-1676 | True.  | 78.74                | 0.959138          | 0.111469         | 100.0             |
| 270  | PA    | 415       | 410-3390 | False. | 69.43                | 0.030898          | 0.075559         | 100.0             |

Depending on the size of the total budget we allocate, we may find that we end up offering every customer some discount. This discount allocation problem is reminiscent of the water-filling algorithm in wireless communications [4,5], which maximizes the mutual information between the input and output of a channel composed of several subchannels (such as a frequency-selective channel, a time-varying channel, or a set of parallel subchannels arising from the use of multiple antennas at both ends of the link) under a global power constraint at the transmitter. More power is allocated to the subchannels with higher gains in order to maximize the total data rate, or capacity, across all the channels. The solution to this class of problems has a vivid physical interpretation: pour a limited volume of water into a tank whose bottom forms a staircase, with step heights determined by the inverses of the subchannel gains.
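
For readers unfamiliar with that analogy, here is a minimal sketch of the classic water-filling allocation; the sub-channel gains and power budget below are made-up values chosen purely for illustration.

import numpy as np

# Water-filling sketch: split a total power budget across sub-channels with gains g
# to maximize sum(log2(1 + g_i * p_i)); the optimum has the form p_i = max(0, mu - 1/g_i).
g = np.array([2.0, 1.0, 0.5, 0.1])   # hypothetical sub-channel gains
P_total = 4.0                        # hypothetical total power budget

# Find the water level mu by bisection so the allocation uses the whole budget
lo, hi = 0.0, P_total + 1.0 / g.min()
for _ in range(100):
    mu = (lo + hi) / 2
    p = np.maximum(0.0, mu - 1.0 / g)
    if p.sum() > P_total:
        hi = mu
    else:
        lo = mu

print('power allocation:', np.round(p, 3))   # stronger channels receive more power

Stronger sub-channels sit on a lower step of the staircase and so collect more water, much as customers with a higher churn risk collect a larger share of the discount budget here; the difference is that water-filling has this simple closed-form threshold rule, while our objective does not.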

Unfortunately, our problem does not admit an intuitive interpretation like the water-filling solution does. Because of the form of the objective function, the system of equations and inequalities given by the KKT conditions [6] does not admit a closed-form solution.
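
As a sketch of the obstruction, using the same quantities as in expected_total_profit above (the multipliers $\lambda$ and $\mu_i$ are generic labels introduced here for illustration, not the notebook's notation), the stationarity condition for customer $i$ in the KKT system reads

$$\gamma_i e^{-\gamma_i x_i}\,(\alpha_i P_i - x_i) - \left(1 - e^{-\gamma_i x_i}\right) - \lambda + \mu_i = 0,$$

with $\lambda \ge 0$ attached to the budget constraint $\sum_i x_i \le C$ and $\mu_i \ge 0$ to $x_i \ge 0$. Because $x_i$ appears both inside and outside the exponential, this equation cannot be solved for $x_i$ in closed form, unlike the water-filling condition, which is linear in the allocated power; hence the numerical solver used above.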

The optimal incentives calculated here are the result of an optimization routine designed to maximize an economic quantity: the expected total profit. While this approach provides a principled way for Marketing teams to make systematic, quantitative, analytics-driven decisions, it is important to remember that the objective function being optimized is only a proxy for the actual total profit. We cannot compute the actual profit of decisions before their outcomes are known, any more than we could maximize actual investment returns using future stock prices. However, techniques such as the potential outcomes framework [7] could be leveraged to design strategies for back-testing our solution.

References:

[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[5] D. P. Palomar and J. R. Fonollosa, "Practical algorithms for a family of water-filling solutions," IEEE Trans. Signal Process., vol. 53, no. 2, pp. 686-695, Feb. 2005.
[6] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[7] G. W. Imbens and D. B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.

Conclusion

We’ve now taken another step towards preventing customer churn. We’ve built on the prior blog, where we integrated our customer data with our ML model to predict churn. We can now experiment with variations on this optimization equation, and see the effect of different campaign budgets or even different theories of how they should be modeled.

To gather more data on effective incentives and customer behavior, we could also test several campaigns against different subsets of our customers. We can collect their responses – do they churn after being offered this incentive, or not? – and use that data in a future ML model to further refine the incentives offered. We can use this data to learn what kinds of incentives convince customers with different characteristics to stay, and then use that new function within this optimization.

Now, we’re empowering Marketing with the tools to make data-driven decisions that they can quickly turn into action. This approach can drive fast iterations on incentive programs, moving at the speed with which our customers make decisions. Over to you, Marketing!

Clean-up

If you're ready to be done with this notebook, please run the cell below. This will remove the hosted endpoint you created and avoid any charges from a stray instance being left on.

In [42]:
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)
In [ ]: