This article describes how ModelOp Center enables ongoing Drift Monitoring.
Introduction
Monitoring incoming data for statistical drift is necessary to track whether assumptions made during model development are still valid in a production setting. For instance, a data scientist may assume that the values of a particular feature are normally distributed, or may have chosen the encoding of a categorical variable with a particular multinomial distribution in mind. Tests should routinely compare batches of live data against the distribution of the training data to confirm that these assumptions still hold. If a test fails, an appropriate alert should be raised for the data scientist or ModelOps engineer to investigate.
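Each of the two assumptions mentioned above can be checked with a standard statistical test. The sketch below is illustrative only and is not ModelOp Center's built-in monitor. It uses SciPy's D'Agostino-Pearson `normaltest` for the normality assumption and a chi-square goodness-of-fit test (`chisquare`) for a categorical feature's multinomial frequencies. The function names, the `alpha` default, and the smoothing constant are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare, normaltest


def check_normality(values, alpha=0.05):
    """Return (p_value, alert) for the hypothesis that `values` are normally distributed."""
    # normaltest combines skewness and kurtosis (D'Agostino-Pearson); it needs at least 8 values.
    p_value = float(normaltest(np.asarray(values, dtype=float)).pvalue)
    return p_value, p_value < alpha


def check_category_frequencies(train, batch, alpha=0.05, smoothing=1e-6):
    """Return (p_value, alert) comparing batch category counts with training proportions."""
    train_props = pd.Series(train).value_counts(normalize=True)
    batch_counts = pd.Series(batch).value_counts()
    # Use the union of categories so categories unseen in training are still counted.
    categories = train_props.index.union(batch_counts.index)
    # Hypothetical smoothing keeps expected counts non-zero for categories unseen in training.
    props = train_props.reindex(categories, fill_value=0.0) + smoothing
    props = props / props.sum()
    observed = batch_counts.reindex(categories, fill_value=0).to_numpy(dtype=float)
    expected = props.to_numpy() * observed.sum()
    p_value = float(chisquare(f_obs=observed, f_exp=expected).pvalue)
    return p_value, p_value < alpha
```

With this smoothing, a category that never appeared in training produces a very small expected count and therefore a large chi-square statistic. This raises an alert, which is usually the desired behavior for a new category.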
ModelOp Center provides a number of drift monitors out of the box and also lets you write your own. The following sections describe how to add a drift monitor (using an out-of-the-box monitor) and how drift monitors are built for different types of models.
Adding Drift Monitors
To add a drift monitor to your model, you associate an existing monitor model with your model. The steps below walk through this. For tutorial purposes, these instructions use only the out-of-the-box (OOTB) and off-the-shelf (OTS) content provided by ModelOp.
1. Define KPIs and thresholds for the model.
   a. Edit the provided Data-drift.dmn file to reflect your desired tolerance for data drift.
   b. Repeat for the provided Concept-drift.dmn and Performance-test.dmn files.
   c. Save the files locally to your machine.
2. Associate the monitor models with the model snapshot.
   a. Navigate to the specific model snapshot.
   b. Using the Associated Models widget, create a data drift association. Use the provided data and the DMN file you edited in step 1.
   c. Before leaving the model snapshot screen, copy the snapshot ID from the URL bar. You will need it later.
3. To test, run a monitoring job manually by making a REST call to the ModelOp Center automation engine that triggers a data drift detection job on your model.
   a. Obtain a valid auth token.
   b. Make a call to the MLC API to initiate the monitor:
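The manual test follows a two-step pattern: obtain a token, then call the API with that token. The sketch below shows this pattern using only Python's standard library. It assumes an OAuth2 client-credentials token endpoint. The token URL, the trigger URL, and the JSON key `modelSnapshotId` are hypothetical placeholders, not documented ModelOp Center values. Substitute the endpoint and payload from your deployment's API documentation.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build an OAuth2 client-credentials token request (assumed auth scheme)."""
    form = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "client_id": client_id, "client_secret": client_secret}
    ).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_trigger_request(trigger_url, token, snapshot_id):
    """Build a POST asking a (hypothetical) endpoint to run the monitor for a snapshot."""
    # "modelSnapshotId" is a placeholder key, not a documented ModelOp field.
    body = json.dumps({"modelSnapshotId": snapshot_id}).encode("utf-8")
    return urllib.request.Request(
        trigger_url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode its JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical use would be `token = send(build_token_request(TOKEN_URL, CLIENT_ID, CLIENT_SECRET))["access_token"]` followed by `send(build_trigger_request(TRIGGER_URL, token, SNAPSHOT_ID))`, where `SNAPSHOT_ID` is the ID copied from the URL bar. The requests are built separately from sending, so you can inspect or log them before contacting the server.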
Drift Monitor Details
Because the same dataset may serve several models, you can write one drift detection model and associate it with several models. The association is made during the Model Lifecycle process. The drift model compares the training data of the associated models with a given batch of data. The following is a simple example:
```python
import pandas as pd
from scipy.stats import ks_2samp


# modelop.init
def begin():
    """
    A function to read the training data and save it, along with its
    numerical features, globally.
    """
    global train, numerical_features
    train = pd.read_csv('training_data.csv')
    numerical_features = train.select_dtypes(['int64', 'float64']).columns


# modelop.metrics
def metrics(data):
    """
    A function to compute KS p-values on input (sample) data as compared
    to training (baseline) data.
    """
    # One two-sample Kolmogorov-Smirnov test per numerical feature
    ks_tests = [ks_2samp(train.loc[:, feat], data.loc[:, feat])
                for feat in numerical_features]
    pvalues = [test.pvalue for test in ks_tests]
    ks_pvalues = dict(zip(numerical_features, pvalues))
    yield dict(pvalues=ks_pvalues)
```
This drift model runs a two-sample Kolmogorov-Smirnov test between each numerical feature of the training data and the incoming batch, and it reports the p-values. If a p-value is above the chosen significance level (commonly 0.01 or 0.05), there is no evidence that the feature's distribution has shifted. If it is below that level, the difference between the two samples is statistically significant, and an alert can be generated.
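To turn the p-values yielded by the metrics function into alerts, compare each one with a significance level. The example runs one test per numerical feature, so with many features a fixed cutoff such as 0.05 will flag some unchanged features purely by chance. The sketch below can optionally apply a Bonferroni correction, which divides the cutoff by the number of features. This correction is an added choice, not something the example above does. The function name, defaults, and synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def flag_drifted_features(pvalues, alpha=0.05, bonferroni=True):
    """Return the sorted names of features whose KS p-value falls below the cutoff."""
    if not pvalues:
        return []
    # Bonferroni: divide alpha by the number of tests to limit family-wise false alarms.
    cutoff = alpha / len(pvalues) if bonferroni else alpha
    return sorted(name for name, p in pvalues.items() if p < cutoff)


# Synthetic data: "age" keeps its distribution, "income" shifts by one standard deviation.
rng = np.random.default_rng(42)
train = pd.DataFrame({"age": rng.normal(40, 10, 2000), "income": rng.normal(50, 10, 2000)})
batch = pd.DataFrame({"age": rng.normal(40, 10, 500), "income": rng.normal(60, 10, 500)})

pvalues = {feat: float(ks_2samp(train[feat], batch[feat]).pvalue) for feat in train.columns}
drifted = flag_drifted_features(pvalues)
```

Bonferroni is conservative. If missing a real shift is more costly than a false alarm, pass `bonferroni=False` and use the plain cutoff.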
If the training data is too large to fit in memory, you can compute summary statistics of the training data ahead of time, store them (for example, in a pickle file), and load them in the init function of the drift model. The metrics function can then apply other statistical tests that compare those stored statistics with the statistics of the incoming batch.
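The following is a minimal sketch of that approach for numerical features. It reduces the training data to decile bin edges and bin proportions per feature. It then compares a batch against those bins with a chi-square goodness-of-fit test. The binning scheme, the choice of test, and all function names are illustrative assumptions, since the article does not prescribe them. In a drift model, `load_summary` would be called from the init function and `compare_batch` from the metrics function.

```python
import pickle

import numpy as np
import pandas as pd
from scipy.stats import chisquare


def summarize_training(train, n_bins=10):
    """Reduce each numerical column to interior bin edges and training bin proportions."""
    summary = {}
    for feat in train.select_dtypes(["int64", "float64"]).columns:
        values = train[feat].dropna().to_numpy()
        # Interior quantile edges; np.unique drops duplicate edges for discrete features.
        edges = np.unique(np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1]))
        counts = np.bincount(np.digitize(values, edges), minlength=len(edges) + 1)
        summary[feat] = {"edges": edges, "props": counts / counts.sum()}
    return summary


def save_summary(summary, path):
    """Pickle the summary so the drift model can load it at init time."""
    with open(path, "wb") as f:
        pickle.dump(summary, f)


def load_summary(path):
    """Load a summary previously written by save_summary."""
    with open(path, "rb") as f:
        return pickle.load(f)


def compare_batch(summary, batch, smoothing=1e-6):
    """Return a chi-square p-value per feature, comparing batch bin counts with training proportions."""
    pvalues = {}
    for feat, stats in summary.items():
        values = batch[feat].dropna().to_numpy()
        observed = np.bincount(np.digitize(values, stats["edges"]), minlength=len(stats["edges"]) + 1)
        # Smoothing avoids zero expected counts in bins that were empty in training.
        props = stats["props"] + smoothing
        expected = props / props.sum() * observed.sum()
        pvalues[feat] = float(chisquare(f_obs=observed, f_exp=expected).pvalue)
    return pvalues
```

The pickle holds only a few numbers per feature, so it stays small regardless of the size of the training set.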
Spark Drift Model Details
A similar drift detection method may be used for PySpark models with HDFS assets by parsing the HDFS asset URLs from the parameters of the metrics function. The following is a simple example:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassificationModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator


# modelop.init
def begin():
    print("Begin function...")
    global SPARK
    SPARK = SparkSession.builder.appName("DriftTest").getOrCreate()
    global MODEL
    MODEL = RandomForestClassificationModel.load("/hadoop/demo/titanic-spark/titanic")


# modelop.metrics
def metrics(external_inputs, external_outputs, external_model_assets):
    # Grab single input asset and single output asset file paths
    input_asset_path = external_inputs[0]["fileUrl"]
    output_asset_path = external_outputs[0]["fileUrl"]

    input_df = SPARK.read.format("csv").option("header", "true").load(input_asset_path)
    predictions = predict(input_df)

    # Compare predictions with the true label and compute accuracy
    evaluator = MulticlassClassificationEvaluator(
        labelCol="Survived", predictionCol="prediction", metricName="accuracy"
    )
    accuracy = evaluator.evaluate(predictions)

    # Write the metric as a single JSON file to the output asset location
    output_df = SPARK.createDataFrame([(accuracy,)], ["accuracy"])
    print("Metrics output:")
    output_df.show()
    output_df.coalesce(1).write.mode("overwrite").option("header", "true").format(
        "json"
    ).save(output_asset_path)

    SPARK.stop()


def predict(input_df):
    # Cast numeric columns and keep only the columns the model needs
    dataset = input_df.select(
        col("Survived").cast("float"),
        col("Pclass").cast("float"),
        col("Sex"),
        col("Age").cast("float"),
        col("Fare").cast("float"),
        col("Embarked"),
    )
    # Treat "?" as missing and drop incomplete rows
    dataset = dataset.replace("?", None).dropna(how="any")

    # Encode categorical columns as numeric indices. These indexers are fitted on
    # the incoming batch, so their label ordering follows this batch's frequencies.
    dataset = (
        StringIndexer(inputCol="Sex", outputCol="Gender", handleInvalid="keep")
        .fit(dataset)
        .transform(dataset)
    )
    dataset = (
        StringIndexer(inputCol="Embarked", outputCol="Boarded", handleInvalid="keep")
        .fit(dataset)
        .transform(dataset)
    )
    dataset = dataset.drop("Sex")
    dataset = dataset.drop("Embarked")

    # Assemble the feature vector expected by the random forest model
    required_features = ["Pclass", "Age", "Fare", "Gender", "Boarded"]
    assembler = VectorAssembler(inputCols=required_features, outputCol="features")
    transformed_data = assembler.transform(dataset)

    predictions = MODEL.transform(transformed_data)
    return predictions
```
This model uses a Spark MulticlassClassificationEvaluator to compute the accuracy of the Titanic model's predictions on the incoming batch, and it writes the result as JSON to the output asset location.
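The Spark example reads exactly one input asset and one output asset through `external_inputs[0]["fileUrl"]` and `external_outputs[0]["fileUrl"]`. A monitor might instead receive several assets, or an asset that is misconfigured. In that case, a small helper can validate the list before Spark tries to load anything. The sketch below relies only on the `fileUrl` key used in the example. The `hdfs` scheme requirement and the function name `asset_urls` are assumptions for illustration.

```python
from urllib.parse import urlparse


def asset_urls(assets, allowed_schemes=("hdfs",)):
    """Return the fileUrl of every asset, rejecting missing keys and unexpected schemes."""
    if not assets:
        raise ValueError("expected at least one asset")
    urls = []
    for i, asset in enumerate(assets):
        url = asset.get("fileUrl")
        if not url:
            raise ValueError(f"asset {i} has no fileUrl")
        # urlparse returns an empty scheme for bare paths such as /data/in.csv
        scheme = urlparse(url).scheme
        if scheme not in allowed_schemes:
            raise ValueError(f"asset {i} uses unsupported scheme {scheme!r}: {url}")
        urls.append(url)
    return urls
```

Inside `metrics`, you could then write `input_asset_path = asset_urls(external_inputs)[0]`. If your deployment passes bare paths without a scheme, include `""` in `allowed_schemes`.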