Data & Concept Drift Monitoring
This article describes how ModelOp Center enables ongoing Data Drift and Concept Drift Monitoring.
Introduction
Monitoring data (both input data and model output, i.e., concept) for drift is necessary to verify that the assumptions made during model development are still valid in a production setting. For instance, a data scientist may assume that the values of a particular feature are normally distributed, or may have chosen the encoding of a certain categorical variable with a particular multinomial distribution in mind. Tests should be run routinely against batches of live data, comparing them with the distribution of the training (reference) data to confirm that these assumptions still hold; if the tests fail, appropriate alerts should be raised for the data scientist or ModelOps engineer to investigate.
ModelOp Center provides a number of drift monitors out of the box (OOTB) and also allows you to write your own drift monitor. The following sections describe how to add a drift monitor (using an OOTB monitor as the example) and the detailed makeup of drift monitors for multiple types of models.
Adding Drift Monitors
As background on the terminology and concepts used below, please read the Monitoring Concepts section of the Model overview documentation.
To add drift monitoring to your business model, you will add an existing “Monitor” to a snapshot (deployable model) of the business model under consideration. Below are the steps to accomplish this. For tutorial purposes, these instructions use all out-of-the-box and publicly available content provided by ModelOp, focusing on the German Credit Model and its related assets.
Associate a Monitor to a Snapshot of a Business Model
In ModelOp Center (MOC), navigate to the business model to be monitored. In our example here, that’s the German Credit Model.
Navigate to the specific snapshot of the business model. If no snapshots exist, create one.
On the Monitoring tab, click + Add, then click Monitor.
Search for (or select) the Data Drift Monitor: Comprehensive Analysis from the list of OOTB monitors.
Select a snapshot of the monitor. By default, a snapshot is created for each OOTB monitor.
On the Input Assets page, you’ll notice that two assets are required: a baseline data asset and a sample data asset. This is because a drift monitor compares a slice of production data (sample) to a reference data set (baseline). For our example, select df_baseline_scored.json as the Baseline Data Asset and df_sample_scored.json as the Sample Data Asset. Since these files are already assets of the business model, we can find them under Select Existing.
On the Threshold page, click ADD A THRESHOLD, then select the .dmn file data_drift_DMN.dmn. Since the file is already an asset of the business model, we can find it under Select Existing. If the business model does not have a .dmn asset, the user may upload one from a local directory during the monitor association process. More on thresholds and decision tables in the next section.
The last step in adding a monitor is adding an optional schedule. To do so, click ADD A SCHEDULE. The Schedule Name field is free-form. The Signal Name field is a dropdown; choose a signal that corresponds to your ticketing system (Jira, ServiceNow). Lastly, set the frequency of the monitoring job, either with the wizard or by entering a cron expression. Note: schedules are optional; a monitor may be run on demand from the business model’s snapshot page or with a curl command.
On the Review page, click SAVE.
To run a monitor on demand from an external application, click COPY CURL TO RUN JOB EXTERNALLY, then run the copied curl command from the application of your choosing.
Define thresholds for your model
As mentioned in the Monitoring Concepts article, ModelOp Center uses decision tables to define the thresholds within which the model should operate for the given monitor.
The first step is to define these thresholds. For this tutorial, we will leverage the example data_drift_DMN.dmn decision table. Specifically, this decision table ensures that the credit_amount_ks_pvalue and installment_rate_js_distance metrics of the German Credit Model are within specification. credit_amount_ks_pvalue is the p-value returned by the Kolmogorov-Smirnov 2-sample test for the feature credit_amount, and installment_rate_js_distance is the Jensen-Shannon distance for the feature installment_rate. If the p-value is sufficiently large (for example, over 0.05), there is no significant evidence that the two samples differ. If the p-value is small, you can conclude that the samples likely come from different distributions and generate an alert.
The credit_amount_ks_pvalue and installment_rate_js_distance values can be accessed directly from the Monitoring Test Result by design. The drift monitor produces more metrics OOTB; we will discuss these in more detail later.
In our example, the .dmn file is already an asset of the business model and is versioned and managed along with the source code in the same GitHub repo. This is considered best practice, as the decision tables are closely tied to the specific business model under consideration. However, it is not a requirement that the .dmn files be available as model assets ahead of time.
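To make the decision-table idea concrete, here is a minimal Python sketch of the kind of check such a table expresses. It is not ModelOp Center's DMN engine: the function name evaluate_thresholds, the rule format, and the 0.2 limit for installment_rate_js_distance are hypothetical, while the 0.05 p-value limit mirrors the example above. Treating a missing or null metric as a violation is likewise an assumed design choice.

```python
from typing import Dict, List, Optional, Tuple

# (metric name, bound type, limit). Hypothetical rules for illustration only;
# the real thresholds live in the .dmn decision table.
HYPOTHETICAL_RULES: List[Tuple[str, str, float]] = [
    ("credit_amount_ks_pvalue", "min", 0.05),       # small p-value suggests drift
    ("installment_rate_js_distance", "max", 0.2),   # assumed limit, not from the source
]


def evaluate_thresholds(
    metrics: Dict[str, Optional[float]],
    rules: List[Tuple[str, str, float]] = HYPOTHETICAL_RULES,
) -> List[str]:
    """Return violation messages; an empty list means every check passed."""
    violations: List[str] = []
    for name, bound, limit in rules:
        value = metrics.get(name)
        if value is None:
            # A missing or null metric cannot be evaluated, so flag it for review.
            violations.append(f"{name}: missing or null")
        elif bound == "min" and value < limit:
            violations.append(f"{name}={value} is below {limit}")
        elif bound == "max" and value > limit:
            violations.append(f"{name}={value} is above {limit}")
    return violations
```

In ModelOp Center itself, this comparison is performed by the MLC against the .dmn file (see Monitoring Execution); the sketch only mirrors the idea so that thresholds can be reasoned about before they are encoded in a decision table.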
Run a Monitor On-demand (UI)
To run a monitor on-demand from the UI, navigate to the business model’s snapshot page and click the play button next to the monitor of interest. A monitoring job will be initiated, and you will be redirected to the corresponding job page once the job is created.
Schedule a Monitor DIY (CURL)
Monitors can be scheduled to run using your preferred enterprise scheduling capability (Control-M, Airflow, Autosys, etc.). While the details will depend on the specific scheduling software, at the highest level the user simply needs to make a REST call to the ModelOp Center API. Here are the steps:
Obtain the Business Model snapshot’s UUID. This can be found, for instance, in the URL of the snapshot page, as shown in this example:
Similarly, obtain the Monitoring Model snapshot’s UUID.
Within the scheduler, configure the REST call to ModelOp Center’s automation engine to trigger the monitor for your model:
Obtain a valid auth token.
Make a call (POST) to the ModelOp Center API to initiate the monitor. The endpoint is: <MOC_INSTANCE_URL>/mlc-service/rest/signal
The body should contain references to the Model Life Cycle (MLC) being triggered, as well as the business model and monitor snapshots, as shown below:
{
  "name": "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira",
  "variables": {
    "DEPLOYABLE_MODEL_ID": { "value": "<UUID_of_business_model_snapshot>" },
    "ASSOCIATED_MODEL_ID": { "value": "<UUID_of_monitoring_model_snapshot>" }
  }
}
This process is made easier by copying the curl command provided at the last step of the monitoring wizard. The copied command will look something like this (note that it targets the signalResponsive variant of the endpoint):
curl 'http://localhost:8090/mlc-service/rest/signalResponsive' \
  -X POST \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <token>' \
  --data-raw '{"name":"com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira","variables":{"DEPLOYABLE_MODEL_ID":{"value":"23282688-62a6-47ae-8603-16f380efca57"},"ASSOCIATED_MODEL_ID":{"value":"1dc64c1e-3634-4e2e-b37d-71d04a9ee5ef"}}}'
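For schedulers that can run Python (Airflow, for example), the same POST can be issued without curl. The sketch below uses only the Python standard library. The helper names build_signal_body, build_signal_request, and trigger_monitor are hypothetical; the auth token is assumed to have been obtained separately; and the default path is the /mlc-service/rest/signal endpoint given above (pass path="/mlc-service/rest/signalResponsive" if your instance expects the variant used by the copied command).

```python
import json
import urllib.request
from typing import Tuple

SIGNAL_NAME = "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira"


def build_signal_body(business_snapshot_id: str, monitor_snapshot_id: str,
                      signal_name: str = SIGNAL_NAME) -> dict:
    """Build the request body referencing the MLC signal and both snapshots."""
    return {
        "name": signal_name,
        "variables": {
            "DEPLOYABLE_MODEL_ID": {"value": business_snapshot_id},
            "ASSOCIATED_MODEL_ID": {"value": monitor_snapshot_id},
        },
    }


def build_signal_request(moc_url: str, token: str, body: dict,
                         path: str = "/mlc-service/rest/signal") -> urllib.request.Request:
    """Prepare (but do not send) the authenticated POST request."""
    return urllib.request.Request(
        url=moc_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def trigger_monitor(moc_url: str, token: str, business_snapshot_id: str,
                    monitor_snapshot_id: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Send the signal and return the HTTP status code and response body."""
    body = build_signal_body(business_snapshot_id, monitor_snapshot_id)
    request = build_signal_request(moc_url, token, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Separating request construction from sending keeps the payload easy to inspect and test; only trigger_monitor touches the network.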
Monitoring Execution
Once the scheduler triggers the signal, the corresponding MLC (listening to that signal) will be initiated. The sequence of events includes:
Preparing the monitoring job with all artifacts necessary to run the job
Creating the monitoring job
Parsing the results into viewable test results
Comparing the results against the thresholds in the decision table
Taking action, which could include creating a notification and/or opening up an incident in JIRA/ServiceNow/etc.
These steps can be summarized in the following Model Life Cycle (MLC)
Monitoring Results and Notifications
Sample Standard Output of Data Drift Monitors
Monitoring Test Results are listed under the Test Results table:
Upon clicking on the “View” icon, you’ll have two options for looking at test results: graphical (under “Test Results”), and raw (under “Raw Results”).
Visual elements
Summary Metrics: these are a subset of all metrics computed by the monitor, returned as key:value pairs for ease of reference. Below is a portion of the table:
Data Drift Metrics
Summary Metrics
Kolmogorov-Smirnov p-values
Jensen-Shannon distances
Kullback-Leibler divergences
Epps-Singleton p-values
Raw Results
The “Raw Results” tab shows a clickable (expandable and collapsible) JSON representation of the test results.
To get a JSON file of the test results,
Navigate to the “Jobs” tab of the snapshot and click on “Details” next to the monitoring job of interest
Click on “Download File” under “Outputs”
Note that the top-level key:value pairs are what is shown in the “Summary Metrics” table.
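Once the JSON file has been downloaded, the Summary Metrics view can be approximated locally. This minimal sketch assumes, based on the note above, that the summary metrics are exactly the top-level entries whose values are not nested objects or lists (detailed per-test results, such as the data_drift list, are nested). summary_metrics and load_summary are hypothetical helper names.

```python
import json
from typing import Any, Dict


def summary_metrics(results: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only top-level scalar entries (numbers, strings, booleans, nulls)."""
    return {key: value for key, value in results.items()
            if not isinstance(value, (dict, list))}


def load_summary(path: str) -> Dict[str, Any]:
    """Read a downloaded test-result JSON file and return its summary metrics."""
    with open(path, encoding="utf-8") as handle:
        return summary_metrics(json.load(handle))
```

For example, load_summary("<downloaded_results>.json") would return a flat dictionary that could then be passed to a threshold check or logged by an external scheduler.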
Sample Monitoring Notification
Notifications arising from monitoring jobs can be found under the corresponding model test result.
If a ticketing system such as Jira is configured in ModelOp Center, a ticket will be created when an ERROR occurs (as in the example above), and a link to the ticket will be available next to the notification. In the example above, a metric fell outside a preset threshold, and thus the monitoring job failed.
Drift Monitors Details
Choosing a drift monitor for a business model depends in practice on the particular model under consideration. For example, concept drift in a binary classification model is best monitored by running a Summary test (basic statistics) instead of a 2-sample test, since there are only two possible outcomes and thus a very small range for the random variable. In addition, feature types (numerical vs. categorical, referred to in MOC terminology as dataClass) play an important role in choosing the right monitor. Some monitors, such as Kullback-Leibler (KL), accommodate both numerical and categorical data, whereas others (usually 2-sample tests such as Kolmogorov-Smirnov or Epps-Singleton) work only on numerical features.
That said, model type and feature dataClass are the only abstractions to consider when choosing a drift monitor; out-of-the-box monitoring takes care of the rest.
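The selection guidance above can be summarized as a small lookup. This is a hypothetical helper (applicable_drift_tests and the test identifiers are invented for illustration), not part of the OOTB monitors. Excluding Jensen-Shannon for categorical columns is an inference from its Gaussian KDE remark below, and offering the summary test for categorical columns is an assumption.

```python
from typing import List


def applicable_drift_tests(data_class: str, binary_target: bool = False) -> List[str]:
    """Hypothetical helper: list drift tests suited to a column, per the guidance above."""
    if binary_target:
        # Only two possible outcomes: basic summary statistics are preferred
        # over 2-sample tests.
        return ["summary"]
    if data_class == "numerical":
        return ["epps_singleton", "kolmogorov_smirnov", "jensen_shannon",
                "kullback_leibler", "summary"]
    if data_class == "categorical":
        # 2-sample tests (and the KDE-based Jensen-Shannon distance) need numerical data.
        return ["kullback_leibler", "summary"]
    raise ValueError(f"Unknown dataClass: {data_class!r}")
```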
Out-of-the-Box Monitors
The following OOTB monitors are currently implemented; each is listed with a link to the SciPy function it is based on:
Epps-Singleton 2-Sample Test
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.epps_singleton_2samp.html
Tests whether two samples have the same underlying distribution. Returns a p-value. Samples do not have to be continuous.
If the output of the Epps-Singleton test on two distributions is a p-value that is less than a certain threshold (e.g., 0.05), then we can reject the null hypothesis that the two samples come from the same underlying distribution. When applied to a feature (or a target variable) of a dataset, this determines whether there is drift between a baseline and a sample dataset in that feature (or target variable).
Remarks:
Null values in the samples will cause the Epps-Singleton test to fail. As such, null values are dropped when calculating the Epps-Singleton test.
The Epps-Singleton test will fail when either sample contains fewer than five values. In such cases, the Epps-Singleton test will return a null metric.
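A minimal sketch of the null handling described above, using the SciPy function linked for this test. The helper name epps_singleton_pvalue is hypothetical and this is not the Python Monitoring package's implementation: it drops NaN/None values and returns None (a null metric) when either sample has fewer than five values or SciPy raises an error.

```python
from typing import Optional, Sequence

import numpy as np
from scipy import stats


def epps_singleton_pvalue(baseline: Sequence[float],
                          sample: Sequence[float]) -> Optional[float]:
    """Return the Epps-Singleton p-value, or None when the test cannot run."""
    # Convert to float arrays; None entries become NaN and are then dropped.
    x = np.asarray(baseline, dtype=float)
    y = np.asarray(sample, dtype=float)
    x = x[~np.isnan(x)]
    y = y[~np.isnan(y)]
    # SciPy rejects samples with fewer than five values; report a null metric instead.
    if x.size < 5 or y.size < 5:
        return None
    try:
        return float(stats.epps_singleton_2samp(x, y).pvalue)
    except (ValueError, np.linalg.LinAlgError):
        # Any other failure inside SciPy is also reported as a null metric.
        return None
```

A returned p-value below the chosen threshold (for example 0.05) would then be treated as evidence of drift for that feature.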
Kolmogorov-Smirnov 2-Sample Test
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html#scipy.stats.ks_2samp
A 2-sample goodness-of-fit test of whether two samples share the same underlying distribution. Returns a p-value. Only works on continuous distributions of data.
If the output of the Kolmogorov-Smirnov test on two distributions is a p-value that is less than a certain threshold (e.g., 0.05), then we can reject the null hypothesis that the two samples have an identical underlying distribution. When applied to a feature (or a target variable) of a dataset, this determines whether there is drift between a baseline and a sample dataset in that feature (or target variable).
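The decision rule above can be sketched with scipy.stats.ks_2samp. The helper name ks_drift and the returned dictionary keys are hypothetical; alpha=0.05 mirrors the example threshold, and dropping NaN values is an assumption carried over from the remarks on the other tests (the source does not state how KS handles nulls).

```python
from typing import Dict, Optional, Sequence, Union

import numpy as np
from scipy import stats


def ks_drift(baseline: Sequence[float], sample: Sequence[float],
             alpha: float = 0.05) -> Optional[Dict[str, Union[float, bool]]]:
    """Run a 2-sample KS test and flag drift when the p-value falls below alpha."""
    x = np.asarray(baseline, dtype=float)
    y = np.asarray(sample, dtype=float)
    # Assumed null handling: drop NaN/None values before testing.
    x = x[~np.isnan(x)]
    y = y[~np.isnan(y)]
    if x.size == 0 or y.size == 0:
        return None
    result = stats.ks_2samp(x, y)
    pvalue = float(result.pvalue)
    return {
        "ks_statistic": float(result.statistic),
        "ks_pvalue": pvalue,
        # Reject the null hypothesis of identical distributions.
        "drift_detected": pvalue < alpha,
    }
```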
Jensen-Shannon Distance
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html
Computes the Jensen-Shannon distance between two distributions, which is the square root of the Jensen-Shannon divergence metric.
The output of the Jensen-Shannon distance calculation is not a p-value, like the Epps-Singleton or the Kolmogorov-Smirnov tests, but a distance. As such, there is no one-size-fits-all or universally accepted value that shows that the two distributions are significantly different. However, it is useful to keep track of how the distance between two distributions changes over time.
Remarks:
Null values in the samples will cause the Jensen-Shannon distance to fail. As such, null values are dropped when calculating the Jensen-Shannon distance.
Because the Jensen-Shannon distance calculation fits a Gaussian KDE to the samples, an error occurs when there is little to no variance in the samples (e.g., all constant values). In such cases, the Jensen-Shannon distance will return a null metric.
Kullback-Leibler Divergence
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html
Computes the Kullback-Leibler divergence (also called relative entropy) between two distributions. It is computed by bucketing the samples, computing the element-wise Kullback-Leibler divergence for each bucket, then summing over the buckets to obtain the final divergence over the samples. Because the Kullback-Leibler divergence is asymmetric, the order in which the samples are passed to the calculation can produce different results.
The output of the Kullback-Leibler divergence calculation is not a p-value (like the Epps-Singleton and Kolmogorov-Smirnov tests), nor is it a distance (like the Jensen-Shannon distance), but rather a measure of how divergent two distributions are. Like the Jensen-Shannon distance, there is no one-size-fits-all or universally accepted value that determines whether two distributions are significantly different, but the Kullback-Leibler divergence provides one more option for detecting possible drift.
Remarks:
The Kullback-Leibler divergence may return a value of Inf (when the support of one sample is not contained within the support of the other sample, or when one sample distribution has a much “wider tail” than the other). In such cases, the order of the samples is reversed and the Kullback-Leibler divergence is recalculated (with an appropriate logger.warning raised). However, if the reversed order of samples also returns Inf, the Kullback-Leibler divergence will return a null metric.
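The bucketing and order-reversal fallback described above can be sketched with NumPy histograms and scipy.special.kl_div. The helper name kl_divergence, the 10 shared equal-width buckets, and computing KL(sample || baseline) first are assumptions for illustration; the OOTB monitor's bucketing and default order may differ.

```python
import logging
import math
from typing import Optional, Sequence

import numpy as np
from scipy.special import kl_div

logger = logging.getLogger(__name__)


def _bucketed_kl(p_counts: np.ndarray, q_counts: np.ndarray) -> float:
    """KL(p || q) for two histograms over the same buckets."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    # kl_div is element-wise; summing over buckets gives the divergence
    # (its extra -p + q terms cancel because p and q each sum to 1).
    return float(np.sum(kl_div(p, q)))


def kl_divergence(baseline: Sequence[float], sample: Sequence[float],
                  bins: int = 10) -> Optional[float]:
    """Bucketed KL divergence with an order-reversal fallback; None if still Inf."""
    x = np.asarray(baseline, dtype=float)
    y = np.asarray(sample, dtype=float)
    x = x[~np.isnan(x)]
    y = y[~np.isnan(y)]
    if x.size == 0 or y.size == 0:
        return None
    # Shared bucket edges over the combined range of both samples.
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    base_counts, _ = np.histogram(x, bins=edges)
    sample_counts, _ = np.histogram(y, bins=edges)
    value = _bucketed_kl(sample_counts, base_counts)  # assumed order: KL(sample || baseline)
    if math.isinf(value):
        logger.warning("KL divergence is Inf; recalculating with samples in reversed order")
        value = _bucketed_kl(base_counts, sample_counts)
    return None if math.isinf(value) else value
```

For instance, a sample containing a value outside the baseline's range makes the first order infinite, so the reversed order is used instead; fully disjoint samples are infinite both ways and yield None.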
Model Assumptions
Business Models considered for drift monitoring have a couple of requirements:
An extended schema asset for the input data.
Input data contains at least one numerical column and/or one categorical column. The exact requirement depends on the specific monitor being used.
Model Execution
During execution, drift monitors perform the following steps:
The init function extracts the extended input schema from the job JSON.
Monitoring parameters are set based on the extracted schema: numerical_columns and categorical_columns are determined accordingly. In the case of concept drift monitoring, target_column (the score column) and label_type (numerical vs. categorical) are also determined at this step.
The metrics function runs the appropriate drift monitoring test: Epps-Singleton, Jensen-Shannon, Kullback-Leibler, Kolmogorov-Smirnov, or pandas DataFrame.describe(). When the drift monitor (data drift or concept drift) is a comprehensive monitor, all of the tests above are performed.
Test results are appended to the list of data_drift or concept_drift tests returned by the model, and key-value pairs are added to the top level of the output dictionary.
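The parameter-setting step above can be sketched as follows. The function name infer_drift_parameters, the field keys (name, role, dataClass), and the role values predictor and score are assumptions about the extended schema's layout made for illustration; the actual Python Monitoring package may read the schema differently.

```python
from typing import Any, Dict, List, Optional


def infer_drift_parameters(schema_fields: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive drift-monitoring parameters from a (hypothetical) extended schema."""
    numerical_columns: List[str] = []
    categorical_columns: List[str] = []
    target_column: Optional[str] = None
    label_type: Optional[str] = None
    for field in schema_fields:
        role = field.get("role")
        data_class = field.get("dataClass")
        if role == "score":
            # Used by concept drift monitors: the model output column and its type.
            target_column = field["name"]
            label_type = data_class
        elif role == "predictor":
            # Used by data drift monitors: split input features by dataClass.
            if data_class == "numerical":
                numerical_columns.append(field["name"])
            elif data_class == "categorical":
                categorical_columns.append(field["name"])
    return {
        "numerical_columns": numerical_columns,
        "categorical_columns": categorical_columns,
        "target_column": target_column,
        "label_type": label_type,
    }
```

Fields with any other role or dataClass are simply ignored in this sketch; a real monitor might instead validate the schema and report unsupported fields.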
For a deeper look at OOTB monitors, see the Python Monitoring package documentation (subscribed customers only): https://modelopdocs.atlassian.net/wiki/spaces/py132