
This article provides an overview of ModelOp Center’s Model Monitoring approach, including the use of various metrics to enable comprehensive monitoring throughout the life cycle of a model.


Introduction

ModelOp Center provides comprehensive operational, quality, risk, and process monitoring throughout the entire life cycle of a model. ModelOp Center uses the concept of an “associated model,” which allows the user to attach specific monitors to a model and run them routinely, either on a scheduled or a triggered basis. Monitors are associated models that can be tied to one or more business models, or “base models.”

ModelOp Center ships with a number of monitors out of the box, which the user can select and use without modification. Additionally, the user may write a custom monitoring function, which can be registered as an associated model and set to run against the user’s model. ModelOp also provides a monitoring SDK in the form of a Python package to assist in writing custom monitoring functions or supplementing the out-of-the-box monitors. This gives the enterprise the flexibility to select the metrics that best fit its unique business, technical, and risk requirements. Furthermore, these monitors are integrated into model life cycles, allowing the user not only to observe issues via the monitor, but to automatically compare monitor outcomes against model-specific thresholds and take remediation action if there are deviations.

The sections below provide an overview of monitor selection as well as how to test monitors within ModelOp Center. Subsequent articles go into detail on enabling statistical monitoring, drift monitoring, and ethical fairness monitoring.

Monitoring Concepts

As background, ModelOp Center treats all “monitors” as models themselves. This allows for reuse, robust governance, and auditability around the critical monitors that ensure an enterprise’s decisioning assets are performing optimally and within governance thresholds.

Additionally, ModelOp Center uses decision tables to determine if a model is running within the desired thresholds. Decision tables are an industry-standard approach for defining the rules by which a decision should be made. ModelOp chose decision tables for monitoring because experience has shown that a number of factors weigh into whether a model is actually having an issue, often combining technical, statistical, business, and other metadata to ascertain whether the model is operating out of bounds. Decision tables give data scientists and ModelOps engineers the flexibility to incorporate these varying requirements, providing more precise monitoring and alerting when a model begins operating out of specification.
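
As a purely illustrative sketch, the kind of combined rule a decision table might encode could look like the following in code form. The metric names and threshold values are hypothetical; in ModelOp Center this logic lives in a DMN decision table, not in Python.

# Hypothetical sketch of the kind of rule a monitoring decision table encodes.
# Metric names and threshold values are illustrative, not ModelOp defaults.
def evaluate_thresholds(monitor_output: dict) -> list:
    alerts = []

    # Combine statistical and business signals before raising an alert
    if monitor_output.get("auc", 1.0) < 0.75 and monitor_output.get("daily_volume", 0) > 1000:
        alerts.append("Model AUC below 0.75 at production volume")

    # Drift alone may only warrant a warning unless fairness also degrades
    if monitor_output.get("data_drift_p_value", 1.0) < 0.05 and monitor_output.get("disparate_impact", 1.0) < 0.8:
        alerts.append("Input drift detected alongside fairness degradation")

    return alerts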

Choosing Evaluation Metrics

To test the efficacy of a model, a metric should be chosen during model development and used to benchmark the model. The chosen metric should reflect the underlying business problem. For instance, in a binary classification problem with very unbalanced class frequencies, accuracy is a poor choice of metric. A “model” which always predicts that the more common class will occur will be very accurate, but will not do a good job of predicting the less frequent class.

Take compliance in internal communications as an example. Very few internal communications may be non-compliant, but a model which never flags possible non-compliance is worthless even if it is highly accurate. A better metric in this case is an F1 score or, more generally, an Fβ score with β > 1. The latter weights recall more heavily, rewarding the model for catching true positives and punishing it for false negatives: occurrences where the model fails to detect non-compliant communication.

Similarly, for regression problems, the data scientist should choose a metric based on the error profile that is acceptable: if a few large errors are tolerable as long as most errors are small, mean absolute error (MAE) is appropriate; if no error should exceed a particular threshold, the max error is the right choice. A metric like root mean squared error (RMSE) interpolates between these cases.
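
As a quick illustration of those trade-offs, the sketch below computes MAE, RMSE, and max error with scikit-learn on a small set of hypothetical predictions (the values are made up for demonstration only):

# Illustrative only: compare regression error metrics on hypothetical data
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, max_error

y_true = np.array([10.0, 12.0, 15.0, 20.0, 22.0])
y_pred = np.array([10.5, 11.0, 15.2, 28.0, 21.5])  # one large miss (20.0 -> 28.0)

mae = mean_absolute_error(y_true, y_pred)         # stays small; tolerant of the single large miss
rmse = mean_squared_error(y_true, y_pred) ** 0.5  # penalizes the large miss more heavily
worst = max_error(y_true, y_pred)                 # dominated entirely by the large miss

print(f"MAE={mae:.2f}, RMSE={rmse:.2f}, max error={worst:.2f}")
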
There are metrics for every type of problem: multi-class classification, all varieties of regression, unsupervised clustering, etc. They range from quite simple to quite intricate, but whatever the problem, a metric should be decided upon early in development and used to test the model as it is promoted to UAT and then into production. Here are some metrics it might encounter along the way:

  • The F1 score is a measure of a test’s accuracy. It considers both the precision p and the recall r of the test: p is the number of correct positive results divided by the number of all positive results returned by the classifier, and r is the number of correct positive results divided by the number of samples that should have been identified as positive.

  • SHAP values (interpretability) are used on a per-record basis to justify why a particular record or client received the score it did. This makes SHAP a better fit for the action/scoring function than for the Metrics Function.

  • The ROC curve, which plots the true positive rate against the false positive rate across classification thresholds

  • The AUC (Area Under the ROC Curve)

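To make the accuracy-versus-recall point above concrete, here is a small illustrative sketch on hypothetical, heavily imbalanced data (not a ModelOp monitor): a classifier that never flags anything and a classifier that catches most non-compliant records end up with the same accuracy on a 5% positive-rate sample, but the F2 score separates them clearly.

# Illustrative only: accuracy can look strong on imbalanced data while F-scores expose the problem
import numpy as np
from sklearn.metrics import accuracy_score, fbeta_score

y_true = np.array([1] * 5 + [0] * 95)       # 5% positive class (e.g., non-compliant messages)

always_negative = np.zeros(100, dtype=int)  # "model" that never flags anything
imperfect = y_true.copy()
imperfect[:2] = 0                           # misses 2 of the 5 positives
imperfect[5:8] = 1                          # and raises 3 false positives

for name, y_pred in [("always-negative", always_negative), ("imperfect", imperfect)]:
    acc = accuracy_score(y_true, y_pred)
    f2 = fbeta_score(y_true, y_pred, beta=2, zero_division=0)
    print(f"{name}: accuracy={acc:.2f}, F2={f2:.3f}")
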
Note: There can be other considerations that determine which model to promote to production. For example, the situation may favor a model with better inference speed, interpretability, etc.

Out of the Box Metrics

ModelOp Center ships with multiple out-of-the-box monitors, which are registered as associated models. The user may add one of these associated monitors to their model or write a custom metrics function (see next section). These monitors can also be customized via the ModelOp monitoring Python package. See here for documentation on the monitoring package. Here is a sampling of out-of-the-box monitors across four categories:

Operational Performance:

Automatically monitor model operations to ensure that models are running at agreed-upon service levels and delivering decisions at the rate expected. Operational performance monitors include:

  • Model availability and SLA performance

  • Data throughput and latency of inference execution

  • Volume and frequency of input requests for the application

  • Input data adherence to the defined schema for the model

  • Input data records for inferences are within established ranges

Quality Performance:

Ensure that model decisions and outcomes are within established data quality controls, eliminating the risk of unexpected and inaccurate decisions. Quality performance monitors include:

  • Data drift of input data

  • Concept drift of output

  • Statistical effectiveness of model output

Risk Performance

Controlling risk and ensuring that models operate within established business risk and compliance ranges, while delivering ethically fair results, is a constant challenge. Prevent out-of-compliance issues with automated, continuous risk performance monitoring. Risk performance monitors include:

  • Ethical fairness of model output

  • Interpretability of model features weighting

Process Performance:

Continuous monitoring of the end-to-end model operations process ensures that all steps are properly executed and adhered to. Collect and retain data for each step in the model life cycle, resulting in reproducibility and auditability. Process performance monitors include:

  • Registration processes

  • Operational processes

  • Monitoring processes

  • Governance processes

Writing a Custom Monitor

The Metrics Function allows you to define custom metrics that you would like to monitor for your model. This metrics function would be included in the source code that is registered as a model in ModelOp Center and then added as an associated model for monitoring. You can use the Metrics Job to manually execute this script against data, or use an MLC Process to trigger automatic execution. See Model Batch Jobs and Tests for more information.

You can specify a Metrics Function either with a # modelop.metrics smart tag comment before the function definition, or by selecting it within the Command Center after the model source code is registered. The Metrics Function executes against a batch of records and yields test results as a JSON object of the form {"metric_1": <value_1>, ..., "metric_n": <value_n>}. These values are used to populate the Test Results visuals within the UI (as seen at the bottom of this page).
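
As a minimal sketch, a Metrics Function can be as simple as the following; the metric names are placeholders, and a fuller example follows below.

# modelop.metrics
def metrics(data):
    # 'data' is a batch of records (e.g., a pandas DataFrame) supplied by the metrics job.
    # Compute whatever values you want to track; the keys below are placeholders.
    test_results = {
        "record_count": len(data),
        "example_metric": 0.0  # replace with a real computation
    }
    yield test_results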

Python Custom Monitor Example

Here is an example of a Metrics Function. It calculates the ROC curve, AUC, F2 score, and confusion matrix. (The model artifacts and helper functions it references, such as preprocess, pad_sparse_matrix, and matrix_to_dicts, are defined elsewhere in the model’s source code.)

# imports used by this monitor
import gensim.matutils
import pandas as pd
import sklearn.metrics

# modelop.metrics
def metrics(x):

    # Load the trained model artifacts (lasso_model_artifacts is defined elsewhere in the model source)
    lasso_model = lasso_model_artifacts['lasso_model']
    dictionary = lasso_model_artifacts['dictionary']
    threshold = lasso_model_artifacts['threshold']
    tfidf_model = lasso_model_artifacts['tfidf_model']

    actuals = x.flagged

    # Vectorize the raw text and score it with the trained model
    cleaned = preprocess(x.content)
    corpus = cleaned.apply(dictionary.doc2bow)
    corpus_sparse = gensim.matutils.corpus2csc(corpus).transpose()
    corpus_sparse_padded = pad_sparse_matrix(sp_mat=corpus_sparse,
                                             length=corpus_sparse.shape[0],
                                             width=len(dictionary))
    tfidf_vectors = tfidf_model.transform(corpus_sparse_padded)

    probabilities = lasso_model.predict_proba(tfidf_vectors)[:, 1]

    predictions = pd.Series(probabilities > threshold, index=x.index).astype(int)

    # Compute the evaluation metrics
    confusion_matrix = sklearn.metrics.confusion_matrix(actuals, predictions)

    fpr, tpr, thres = sklearn.metrics.roc_curve(actuals, predictions)

    auc_val = sklearn.metrics.auc(fpr, tpr)
    f2_score = sklearn.metrics.fbeta_score(actuals, predictions, beta=2)

    # Use a distinct loop variable so the input DataFrame x is not shadowed
    roc_curve = [{'fpr': rate[0], 'tpr': rate[1]} for rate in zip(fpr, tpr)]
    labels = ['Compliant', 'Non-Compliant']
    cm = matrix_to_dicts(confusion_matrix, labels)

    test_results = dict(
        roc_curve=roc_curve,
        auc=auc_val,
        f2_score=f2_score,
        confusion_matrix=cm
    )

    yield test_results

Here is an example of expected output from this function:

{
  "roc_curve": 
    [
      {"fpr": 0.0, "tpr": 0.0}, 
      {"fpr": 0.026, "tpr": 0.667}, 
      {"fpr": 1.0, "tpr": 1.0}
    ], 
  "auc": 0.821, 
  "f2_score": 0.625, 
  "confusion_matrix": 
    [
      {"Compliant": 76, "Non-Compliant": 2}, 
      {"Compliant": 1, "Non-Compliant": 2}
    ]
}

R Custom Monitor Example

# import libraries
library(tidymodels)
library(readr)

# modelop.init
begin <- function() {
  # run any steps for when the monitor is loaded on the ModelOp runtime
}

# modelop.metrics
metrics <- function(data) {
  df <- data.frame(data)
  get_metrics <- metric_set(f_meas, accuracy, sensitivity, specificity, precision)
  output <- get_metrics(data = df, truth = as.factor(label_value), estimate = as.factor(score))
  mtr <- list(PerformanceMetrics = output)
  emit(mtr)
}

Custom Monitor Output - Charts, Graphs, Tables

To have the results of your custom monitor displayed as a Bar Graph, Line Chart, or Table, the output of your metrics function should leverage the following format:

  	"time_line_graph": {
	  "title" : "Example Line Graph - Timeseries Data",
	  "x_axis_label": "X Axis",
	  "y_axis_label": "Y Axis",
	  "data": {
		"data1": [["2023-02-27T20:10:20",100], ["2023-03-01T20:10:20",200], ["2023-03-03T20:10:20", 300]],
		"data2": [["2023-02-28T20:10:20", 350], ["2023-03-02T20:10:20", 250], ["2023-03-04T20:10:20", 150]]
	  }
	},
  "generic_line_graph": {
	"title" : "Example Line Graph - XY Data",
	"x_axis_label": "X Axis",
	"y_axis_label": "Y Axis",
	"data": {
	  "data1": [[1,100], [3,200], [5, 300]],
	  "data2": [[2, 350], [4, 250], [6, 150]]
	}
  },
  "decimal_line_graph": {
	"title" : "Example Line Graph - Decimal Data",
	"x_axis_label": "X Axis",
	"y_axis_label": "Y Axis",
	"data": {
	  "data1": [[1,1.23], [3,2.456], [5, 3.1415]],
	  "data2": [[2, 4.75], [4, 2.987], [6, 1.375]]
	}
  },
  "generic_bar_graph": {
	"title" : "Example Bar Chart",
	"x_axis_label": "X Axis",
	"y_axis_label": "Y Axis",
	"rotated": false,
	"data" : {
	  "data1": [1, 2, 3, 4],
	  "data2": [4, 3, 2, 1]
	},
	"categories": ["cat1", "cat2", "cat3", "cat4"]
  },
  "horizontal_bar_graph": {
	"title" : "Example Bar Chart",
	"x_axis_label": "X Axis",
	"y_axis_label": "Y Axis",
	"rotated": true,
	"data" : {
	  "data1": [1, 2, 3, 4],
	  "data2": [4, 3, 2, 1]
	},
	"categories": ["cat1", "cat2", "cat3", "cat4"]
  },
  "generic_table": [
	  {"data1" : 1, "data2" : 2, "data3" : 3},
	  {"data1" : 2, "data2" : 3, "data3": 4},
	  {"data1" :  3, "data2" : 4, "data3" : 5}
	]
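
For instance, a custom metrics function can assemble one of the structures above in Python before yielding it. The sketch below is illustrative only: it assumes the input batch is a pandas DataFrame with a hypothetical "category" column and emits a bar graph in the documented format.

# modelop.metrics
def metrics(data):
    # Hypothetical example: counts per category computed from the input batch
    categories = ["cat1", "cat2", "cat3", "cat4"]
    counts = [int((data["category"] == c).sum()) for c in categories]  # assumes a 'category' column

    yield {
        "generic_bar_graph": {
            "title": "Example Bar Chart",
            "x_axis_label": "Category",
            "y_axis_label": "Count",
            "rotated": False,
            "data": {"counts": counts},
            "categories": categories
        }
    }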

Adding A Monitor to a Business Model

How to Add a Monitor

Monitors can be added to a snapshot of a business model to provide ongoing review of the model’s holistic performance and adherence to KPIs.

1. First, make sure the monitor you want to associate with a business model is imported into ModelOp Center and a snapshot has been created. The monitors list can be accessed by clicking on the “Monitors” main menu item. The monitors will be listed like below:

2. Next, in the Model Snapshot page, click on the “Monitoring” tab. Click the “+ Add” button and select “Monitors” from the drop down.

3. The monitoring wizard opens and walks the user through adding a monitor to a business model. First, select the specific monitoring model to add:

4. Next, select the specific Snapshot (version) of the monitoring model to be used. As background, ModelOp Center versions and manages monitoring models in the same way as business models, allowing for reusability and auditability of monitors as well.

5. Input Assets. Add input assets for the monitoring model to consume. Input assets can be URLs to S3 or HDFS files, SQL assets, directly uploaded files, or assets that are attached to the business model.

Note: click the “Learn More” drop down to review more information about the Monitor, including the required inputs, the output structure, and other details.

6. Thresholds. Optionally, thresholds can be added to the Monitor through the use of a decision table (DMN file). When the monitoring job is run, the DMN file will be examined by an MLC to determine whether the business model is outside of specification and an alert notification should be sent. The user can select an existing DMN or choose to generate a new set of thresholds.

If the user chooses to generate a DMN threshold file, they can use the DMN builder to construct one:

If the user selects a DMN from the existing business model, they also have the option to 'edit' the file.

This will open the DMN builder with the selected file as a base. Any modifications made to this file through the DMN builder will not alter the original and will only be saved to the monitor:

7. Schedule. Optionally, one or more schedules can be set to launch monitoring jobs automatically. These schedules can fire a signal to trigger an MLC to create a monitoring metrics job. Additionally, ModelOp Center supports enterprise schedulers such as Control-M, AutoSys and SSIS via API calls.

8. Review. Review the details of the monitor, and if satisfied, click “Save.”

9. The monitor will appear under the “Configured Monitors and Associated Models” on the business model’s snapshot page.

From here, the monitor can be edited, as required, via the same Monitor Wizard.

Using Dynamic Parameters for Monitoring Setup

For obtaining data from certain data systems (like SQL or REST-based systems), it can be useful to specify parameters that are dynamically set during monitoring execution. The most common example would be for setting time windows, e.g. “obtain all data over the last week.” The below approach outlines how to set up a Monitor to use parameters such as current day, current week, etc. Please note that a user can customize the specific parameter values within a Model Life Cycle.
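
To illustrate the idea, a SQL asset’s query might contain placeholders such as ${WEEK_START} and ${WEEK_END}. Conceptually, the replacement step works roughly like the Python sketch below; the query and computed dates are hypothetical, and in practice the substitution happens inside the model life cycle’s BPMN script, not in user code.

from datetime import date, timedelta
from string import Template

# Hypothetical parameterized query stored on a SQL asset
query = "SELECT * FROM scores WHERE scored_at BETWEEN '${WEEK_START}' AND '${WEEK_END}'"

# Values the life cycle would compute at run time, formatted as yyyy-MM-dd by default
today = date.today()
week_start = today - timedelta(days=today.weekday())
values = {
    "WEEK_START": week_start.isoformat(),
    "WEEK_END": (week_start + timedelta(days=6)).isoformat(),
}

print(Template(query).substitute(values))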

Example Steps to Use Parameters for Monitoring

  1. From the assets tab of a Business Model wizard, click to add an asset and choose the appropriate asset (REST, SQL, etc.)

  2. Within the resulting “add asset” configuration, the user can add placeholder logic (using standard ${FOO} notation) to the Query Params or Form Data values

  3. Next the user will create a monitor using that asset, and select the newly created REST (or SQL) Asset as input for the job

  4. In the Schedule step of the Wizard, the user will add signal variables that match the placeholder name used in the Asset params (e.g. FOO above). Note that you can pass the optional variable ASSET_REPLACE_FORMAT to specify a date format for the replacement; otherwise, the default is yyyy-MM-dd.


  5. NOTE: In the RunAssociatedModelMetrics.bpmn (both the Jira and ServiceNow versions), there is a script that replaces the placeholders of all Assets that reference a variable using the ${FOO} notation above. By default, the value of such a variable will be one of the pre-loaded instance variables listed below. However, these can be customized within the BPMN if the user would like to add other variables.

    1. ${QUARTER_START}

    2. ${QUARTER_END}

    3. ${MONTH_START}

    4. ${MONTH_END}

    5. ${WEEK_START}

    6. ${WEEK_END}

    7. ${TODAY} or alias ${CURRENT_DAY}

  6. Make any final updates/changes to the monitor and click to save the monitor

  7. If desired, click the “play” button to run the monitor

  8. When the monitor runs, the ModelOp engine will make the call to the REST or SQL data system using the replaced value. Note that the last request in the image shows the actual Today and Quarter End dates being passed, returning the desired dynamic response.


Generate a Schema

Schema Guidelines:

The schema is meant to provide a “contract” between the data input/output and the model code. For testing and monitoring models, ModelOp Center uses extended Avro schemas attached to business models to determine the characteristics of input data for monitoring runs, such as identifier columns, weight columns, score columns, and others.

Specific requirements for ModelOp Center schemas include:

  1. The schemas should include:

    1. Input features

    2. Model output (also known as Predictions and Scores)

    3. Actual Values (also known as Ground Truth or Labels)

  2. Per the Avro spec, the schema field names MUST:

    1. Start with [A-Za-z_]

    2. Subsequently contain only [A-Za-z0-9_]

    3. Explicitly, the field names should NOT contain spaces or special characters

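For illustration, a minimal Avro record schema meeting the naming requirements above might look like the following. The field names are hypothetical, and any ModelOp-specific extended attributes (such as identifier, score, or label column designations set through the UI) are omitted here.

{
  "type": "record",
  "name": "loan_default_input",
  "fields": [
    {"name": "customer_id", "type": "string"},
    {"name": "credit_score", "type": "int"},
    {"name": "loan_amount", "type": "double"},
    {"name": "predicted_default", "type": "int"},
    {"name": "actual_default", "type": ["null", "int"], "default": null}
  ]
}
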
Schemas can be generated via the ModelOp Center UI or they can be imported with the business model via GitHub or Bitbucket.

To generate a schema, go to the Business Model and click the “Schemas” tab, then click the “Generate Extended Schema” button. A UI will open to aid with generating a schema:

To infer an extended schema from a dataset, a user can either upload a CSV or JSON file with some sample records or paste sample records into the top text box. Click “Generate Schema”.

A generated schema will appear in the preview on the bottom text box after clicking “Generate Schema”:

This schema can then be downloaded and uploaded to the git repository that backs the business model, and ModelOp Center will track the schema.

Note: for certain models that may not be backed by git, it is possible to save the generated schema to the business model via the “Save as Input Schema” or “Save as Output Schema” options. However, it is strongly recommended to keep schema files in source code control.

Running a Monitor Manually

Run a Monitor from ModelOp Center UI

After adding a monitor to a business model’s snapshot (see the “Adding a Monitor” section above), the Play button next to the monitor can be clicked to run an ad hoc monitoring job:

Upon successful initiation of the Monitoring job, the user will be directed to the specific Monitoring Job’s job details page, where the user can see the actual monitor execution and results:

Run a Metric Job Manually from the CLI

  1. To create a ‘metrics job’ from the CLI, use the command:

moc job create testjob <deployable-model-uuid> <input-file-name> <output-file-name> optional-flags

     This command yields a UUID for the job.

  2. To find the raw JSON results of the job, use the command:

moc job result <uuid>

Run a Metrics Job from the ModelOp Center UI

See Manually Create a Batch Job in the Command Center.

Viewing the Results of a Monitoring Job

To see the results of a monitor or test, navigate to Model Snapshot page and select the Monitoring tab:

Individual Monitor Test Results

1. Click on the individual test result of interest:

2. The test result details are displayed:

ModelOp Center supports a variety of visualizations for data science metrics for out of the box monitors. These visualizations can be added to custom monitors by following the standard metrics format as outlined in this documentation.

Monitoring Results over Time

1. Click on the “Results over Time” button:

2. All of the monitoring test results will be plotted over time (assuming the metric can be plotted).

Alerting & Notifications

Alerts, Tasks, and Notification messages provide visibility into information and actions that need to be taken as a result of model monitoring. These “messages” are surfaced throughout the ModelOp Command Center UI and are typically also tied into enterprise ticketing systems such as ServiceNow and/or JIRA.

The types of messages generated from Model Monitoring include:

  • Alerts - test failures, model errors, runtime issues, and other situations that require a response.

    • Alerts are automatically raised by system monitors or as the output of monitor comparison in a model life cycle.

  • Tasks - user tasks such as approve a model, acknowledge a failed test, etc.

    • For example, viewing and responding to test failures.

  • Notifications - includes system status, runtime status and errors, model errors, and other information generated by ModelOp Center automatically.

Next Article: Operational Monitoring >
