Build a Custom Test or Monitor
This article provides an overview of ModelOp Center’s Model Monitoring approach, including the use of various metrics to enable comprehensive monitoring throughout the life cycle of a model.
Writing a Custom Monitor
The Metrics Function allows you to define the custom metrics that you would like to monitor for your model. The metrics function is included in the source code that is registered as a model in ModelOp Center, and that model is then added as an associated model for monitoring. You can use a Metrics Job to execute this script manually against data, or use an MLC Process to trigger automatic execution. See Model Batch Jobs and Tests for more information.
You can specify a Metrics Function either with a # modelop.metrics smart tag comment before the function definition, or by selecting it within the Command Center after the model source code is registered. The Metrics Function executes against a batch of records and yields test results as a JSON object of the form {"metric_1": <value_1>, ..., "metric_n": <value_n>}. These values are used to populate the Test Results visuals within the UI.
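At its simplest, a metrics function just yields one dictionary of metric names and values. A minimal sketch (the metric names and values here are placeholders, not part of any required schema):

# modelop.metrics
def metrics(data):
    # Compute whatever values you want to track; keys and values below are placeholders
    yield {"metric_1": 0.95, "metric_2": 42}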
Python Custom Monitor Example
Here is an example of how to code a Metrics Function. It calculates the ROC curve, AUC, F2 score, and confusion matrix.
# Imports used by this example
import gensim
import pandas as pd
import sklearn.metrics

# Note: lasso_model_artifacts, preprocess(), pad_sparse_matrix(), and matrix_to_dicts()
# are defined elsewhere in the model source (e.g., loaded or declared in the init function).

# modelop.metrics
def metrics(x):
    # Unpack the trained model artifacts
    lasso_model = lasso_model_artifacts['lasso_model']
    dictionary = lasso_model_artifacts['dictionary']
    threshold = lasso_model_artifacts['threshold']
    tfidf_model = lasso_model_artifacts['tfidf_model']

    # Ground-truth labels and preprocessed text
    actuals = x.flagged
    cleaned = preprocess(x.content)

    # Convert the cleaned text to a padded sparse bag-of-words matrix, then to TF-IDF vectors
    corpus = cleaned.apply(dictionary.doc2bow)
    corpus_sparse = gensim.matutils.corpus2csc(corpus).transpose()
    corpus_sparse_padded = pad_sparse_matrix(sp_mat=corpus_sparse,
                                             length=corpus_sparse.shape[0],
                                             width=len(dictionary))
    tfidf_vectors = tfidf_model.transform(corpus_sparse_padded)

    # Score the records and apply the decision threshold
    probabilities = lasso_model.predict_proba(tfidf_vectors)[:, 1]
    predictions = pd.Series(probabilities > threshold, index=x.index).astype(int)

    # Compute the monitoring metrics
    confusion_matrix = sklearn.metrics.confusion_matrix(actuals, predictions)
    fpr, tpr, thres = sklearn.metrics.roc_curve(actuals, predictions)
    auc_val = sklearn.metrics.auc(fpr, tpr)
    f2_score = sklearn.metrics.fbeta_score(actuals, predictions, beta=2)

    # Shape the results for the Test Results visuals
    roc_curve = [{'fpr': f, 'tpr': t} for f, t in zip(fpr, tpr)]
    labels = ['Compliant', 'Non-Compliant']
    cm = matrix_to_dicts(confusion_matrix, labels)

    test_results = dict(
        roc_curve=roc_curve,
        auc=auc_val,
        f2_score=f2_score,
        confusion_matrix=cm
    )

    yield test_results
Here is an example of expected output from this function:
{
"roc_curve":
[
{"fpr": 0.0, "tpr": 0.0},
{"fpr": 0.026, "tpr": 0.667},
{"fpr": 1.0, "tpr": 1.0}
],
"auc": 0.821,
"f2_score": 0.625,
"confusion_matrix":
[
{"Compliant": 76, "Non-Compliant": 2},
{"Compliant": 1, "Non-Compliant": 2}
]
}
R Custom Monitor Example
# import libraries
library(tidymodels)
library(readr)

# modelop.init
begin <- function() {
    # run any steps for when the monitor is loaded on the ModelOp runtime
}

# modelop.metrics
metrics <- function(data) {
    df <- data.frame(data)
    get_metrics <- metric_set(f_meas, accuracy, sensitivity, specificity, precision)
    output <- get_metrics(data = df, truth = as.factor(label_value), estimate = as.factor(score))
    mtr <- list(PerformanceMetrics = output)
    emit(mtr)
}
Custom Monitor Output - Charts, Graphs, Tables
To have the results of your custom monitor displayed as a Bar Graph, Line Chart, or Table, the output of your metrics function should follow the structure described in the Monitor Output Structure article (linked at the bottom of this page); an illustrative sketch is shown below.
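In the Python example above, auc is a single scalar value while roc_curve and confusion_matrix are arrays of row-like dictionaries; the Monitor Output Structure article describes which output shapes map to bar graphs, line charts, and tables. As an illustration only (the keys and values below mirror that example and are not a definitive format):

# modelop.metrics
def metrics(data):
    # Illustrative shapes only: a scalar metric and an array of row-like dictionaries.
    # See the Monitor Output Structure article for the exact conventions each chart type expects.
    yield {
        "auc": 0.821,
        "roc_curve": [
            {"fpr": 0.0, "tpr": 0.0},
            {"fpr": 1.0, "tpr": 1.0}
        ]
    }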
Define the required assets for your monitor
You can include a file named "required_assets.json" in your custom monitor repository; ModelOp Center uses it to ensure that the required assets are acquired and passed to the monitor's execution.
A sample required_assets.json might look like the sketch below:
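(This sketch is illustrative, assembled from the field descriptions that follow; the values such as the '.pkl' regex and the 'explainer_shap.pkl' filename come from those descriptions, and the exact top-level layout may differ.)

[
    {
        "usage": "ADDITIONAL_ASSET",
        "name": ".*\\.pkl",
        "filename": "explainer_shap.pkl",
        "metaData": {
            "description": "Pickled SHAP explainer that the monitor loads at execution time"
        }
    },
    {
        "usage": "INPUT_ASSET",
        "assetRole": "<optional asset role to match>",
        "metaData": {
            "description": "Scored data passed to the metrics function as input"
        }
    }
]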
The example above illustrates the fields that can be populated in each asset object. None of them is required except 'usage':
usage (required) - Specifies whether the asset is to be used as INPUT_ASSET or as ADDITIONAL_ASSET. This tells the UI and the MLC whether the asset should be passed to the job as data or as additionalAssets.
assetRole (optional) - If you want to match only a specific asset role, define it here.
name (optional) - A regular expression that allows assets to be matched by name, for example when two assets share the same role but have different names.
description field within metaData (optional) - Displayed in the Monitor Wizard screen as a hint to the user about what the asset represents to the monitor model.
filename (optional) - If the monitor model requires the file to have a specific name, use the filename field to specify the final file name. In the example above, any file with the extension '.pkl' can be chosen (based on the regex in the "name" field), but it is always placed into the job as "explainer_shap.pkl" (the value provided in filename).
Ability to add non-data assets
As described in "Define the required assets for your monitor" above, assets whose usage is "INPUT_ASSET" are fed into the metrics function as arguments. If you instead specify the usage as "ADDITIONAL_ASSET", the file is simply materialized into the runtime at execution time and is available for the model to load at will. In the example above, the user can upload any file with the extension ".pkl"; it is renamed to "explainer_shap.pkl", so the monitor model can load it with certainty. This is especially useful for files that do not represent data.
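For illustration, a monitor might load such an additional asset in its init function, assuming ModelOp Center has materialized the file into the job's working directory under the filename declared in required_assets.json:

import pickle

# modelop.init
def init():
    global explainer
    # Assumption for this sketch: the additional asset is available in the working
    # directory under the filename declared in required_assets.json.
    with open("explainer_shap.pkl", "rb") as f:
        explainer = pickle.load(f)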
Next Article: Monitor Output Structure >