...
Introduction
Monitoring a model for its statistical performance is necessary to verify that the model is producing good output (inferences/scores) as compared to actual ground truth, and to track whether assumptions made during model development are still valid in a production setting. For instance, a data scientist may assume that the values of a particular feature are normally distributed, or the choice of encoding for a certain categorical variable may have been made with a certain multinomial distribution in mind. Statistical metrics provide excellent insight into the predictive power of the model, including helping to identify degradation in the model’s ability to predict correctly. These statistical monitors should be run routinely against batches of live, labeled data, and the results compared against the original metrics produced during training (and against the distribution of the training data) to ensure that those assumptions still hold and that the model is performing within specification. If the production statistical metrics deviate beyond a set threshold, the appropriate alerts are raised for the data scientist or ModelOps engineer to investigate.
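As an illustration of this comparison only (not the ModelOp Center implementation), the hedged Python sketch below computes performance metrics on a labeled production batch and flags deviation from training-time values; the column names, baseline numbers, and tolerance are hypothetical.

    # Illustrative sketch only; assumes scikit-learn and a pandas DataFrame with
    # hypothetical columns "label" (ground truth) and "score" (model output).
    import pandas as pd
    from sklearn.metrics import f1_score, roc_auc_score

    def production_metrics(batch: pd.DataFrame) -> dict:
        """Compute statistical performance metrics on a labeled production batch."""
        return {
            "auc": roc_auc_score(batch["label"], batch["score"]),
            "f1": f1_score(batch["label"], batch["score"] > 0.5),
        }

    def deviates(prod: dict, training: dict, tolerance: float = 0.05) -> bool:
        """Flag the model if any metric drops more than `tolerance` below training."""
        return any(training[m] - prod[m] > tolerance for m in training)

    # Hypothetical baselines recorded during model development.
    training_baseline = {"auc": 0.87, "f1": 0.78}
    # deviates(production_metrics(live_batch), training_baseline) -> raise an alert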
ModelOp Center provides a number of statistical monitors out of the box, but also allows you to write your own custom metrics to monitor the statistical performance of the model. The subsequent sections describe how to add a statistical monitor (assuming an out-of-the-box monitor) and the detailed makeup of a statistical monitor for multiple types of models.
Adding Statistical Monitors
...
As background on the terminology and concepts used below, please read the Monitoring Concepts section of the Model overview documentation.
...
As mentioned in the Monitoring Concepts article, ModelOp Center uses decision tables to define the thresholds within which the model should operate for the given monitor.
The first step is to define these thresholds. For this tutorial, we will leverage the example Performance-test.dmn decision table. This assumes that the out-of-the-box metrics function in the Consumer Credit Default example model is used, which outputs AUC, ROC, and F1, amongst others. Specifically, this decision table ensures that the F1 and AUC scores from the Consumer Linear Demo model are within specification. Save the files locally to your machine.
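The decision table itself is authored in DMN, but conceptually it encodes simple threshold rules against the metrics produced by the metrics function. The sketch below is a Python rendering of that idea with purely illustrative bounds; the actual values live in Performance-test.dmn.

    # Conceptual rendering of the threshold rules a decision table such as
    # Performance-test.dmn encodes; the numeric bounds here are hypothetical.
    def check_thresholds(metrics: dict) -> list:
        """Return human-readable violations; an empty list means within spec."""
        violations = []
        if metrics["auc"] < 0.80:  # assumed lower bound for AUC
            violations.append(f"AUC {metrics['auc']:.3f} below 0.80")
        if metrics["f1"] < 0.70:   # assumed lower bound for F1
            violations.append(f"F1 {metrics['f1']:.3f} below 0.70")
        return violations

    print(check_thresholds({"auc": 0.83, "f1": 0.65}))  # -> ['F1 0.650 below 0.70']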
...
Schedule: Monitors can be scheduled to run using your preferred enterprise scheduling capability (Control-M, Airflow, Autosys, etc.).
While the details will depend on the specific scheduling software, at the highest level, the user simply needs to create a REST call to the ModelOp Center API. Here are the steps:
Obtain the Model snapshot’s unique ID, which can be found on the Model snapshot screen. Simply copy the ID (the snapshot GUID) from the URL bar.
Within the scheduler, configure the REST call to ModelOp Center’s automation engine to trigger the monitor for your model:
Obtain a valid auth token
Make a call to the ModelOp Center API to initiate the monitor
Example:
    {
      "name": "com.modelop.mlc.definitions.Signals_MODEL_BACK_TEST",
      "variables": {
        "MODEL_ID": {
          "value": "FILL-IN-SNAPSHOT-GUID"
        }
      }
    }
For more details on triggering monitors, visit the article Triggering Metrics Tests.
Monitoring Execution: once the scheduler triggers the monitoring job, the relevant model life cycle will initiate the specific monitor, which typically includes the following steps (see the conceptual sketch after this list):
Preparing the monitoring job with all artifacts necessary to run the job
Creating the monitoring job
Parsing the results into viewable test results
Comparing the results against the thresholds in the decision table
Taking action, which could include creating a notification and/or opening up an incident in JIRA/ServiceNow/etc.
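As a purely conceptual sketch (not the model life cycle implementation), the flow described above roughly amounts to the following, reusing the hypothetical production_metrics and check_thresholds helpers from earlier in this article; the notify callback is likewise an assumption.

    # Conceptual sketch of the monitoring flow described above; not the actual
    # ModelOp Center model life cycle implementation.
    def run_monitor(live_batch, notify):
        """Compute metrics, compare them against thresholds, and act on violations."""
        metrics = production_metrics(live_batch)  # hypothetical helper defined above
        violations = check_thresholds(metrics)    # hypothetical helper defined above
        if violations:
            # In ModelOp Center this step could create a notification or open an
            # incident in JIRA/ServiceNow/etc.
            notify("Model performance out of specification: " + "; ".join(violations))
        return metrics, violations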
...
All monitor job results are persisted and can be viewed directly by clicking the specific “result” in the “Model Tests” section of the model snapshot page:
Statistical Monitor Details
...