After successfully adding a monitor (refer to the Add a Monitor page), users can run the monitor and review the detailed model test results in ModelOp Center. This article demonstrates this process using an out-of-the-box (OOTB) monitor, the Bias Monitor, to illustrate how ModelOp Center facilitates ongoing ethical bias and fairness monitoring.

Table of Contents

Introduction

Organizations need visibility into how models are forming predictions, in particular whether the model is generating unfair results for certain protected classes. Bias monitors should be run routinely against batches of labeled and scored data to ensure that the model is performing within specifications. If the production bias metrics deviate beyond pre-set thresholds, the appropriate alerts are raised for the data scientist or ModelOps engineer to investigate.

ModelOp Center provides bias monitors out-of-the-box (OOTB) but also allows you to write your own custom bias/fairness/group metrics to monitor your model. The subsequent sections describe how to add a bias monitor - assuming an OOTB monitor - and the detailed makeup of a bias monitor.

Adding a Bias Monitor to a Business Model

As background on the terminology and concepts used below, please read the Monitoring Concepts section of the Model overview documentation. To add bias monitoring to your business model, you will add an existing “Monitor” to a snapshot (deployable model) of the business model under consideration. Below are the steps to accomplish this. For tutorial purposes, the instructions below use an out-of-the-box monitor and publicly available content provided by ModelOp, focusing on the German Credit Model and its related assets.

The "German Credit Data" dataset classifies people described by a set of attributes as good or bad credit risks. Among the twenty attributes is gender (reported as a hybrid status_sex attribute), which is considered a protected attribute in most financial risk models. It is therefore of the utmost importance that any machine learning model aiming to assign risk levels to lessees is not biased against a particular gender.

It is important to note that simply excluding gender from the training step does not guarantee an unbiased model, as gender could be highly correlated to other unprotected attributes, such as annual income.

Open-source Python libraries developed to address the problem of Bias and Fairness in AI are available. Among these, Aequitas can be easily leveraged to calculate Bias and Fairness metrics of a particular ML model, given a labeled and scored data set, as well as a set of protected attributes. In the case of German Credit Data, ground truths are provided and predictions can be generated by, say, a logistic regression model. Scores (predictions), label values (ground truths), and protected attributes (e.g. gender) can then be given as inputs to the Aequitas library. Aequitas's Group() class calculates commonly used metrics such as false-positive rate (FPR) and false omission rate (FOR), as well as counts by group and group prevalence among the sample population. It returns group counts and group value bias metrics in a data frame.

For instance, one could discover that under the trained logistic regression model, females have an FPR=0.32, whereas males have an FPR=0.16. This means that women are twice as likely to be falsely labeled as high-risk as men. The Aequitas Bias() class calculates disparities between groups, where a disparity is a ratio of a metric for a group of interest compared to a reference group. For example, the FPR-disparity for the example above between males and females, where males are the reference group, is equal to 32/16 = 2. Disparities are computed for each bias metric and are returned by Aequitas in a data frame.
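As an illustration of the Aequitas workflow described above, the following is a minimal sketch run outside of ModelOp Center. It assumes a pandas DataFrame with the score and label_value column names that Aequitas expects plus a gender column; the data, column names, and choice of reference group are illustrative only.

Code Block
languagepy
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias

# Labeled and scored sample (illustrative records only)
df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 0, 1, 1],
    "label_value": [1, 0, 0, 1, 0, 1, 1, 0],
    "gender": ["female", "female", "female", "male",
               "male", "male", "female", "male"],
})

# Group metrics (FPR, FOR, counts, prevalence, ...) per gender value
group = Group()
crosstab, _ = group.get_crosstabs(df)

# Disparities relative to a chosen reference group (males here)
bias = Bias()
disparities = bias.get_disparity_predefined_groups(
    crosstab, original_df=df, ref_groups_dict={"gender": "male"}
)
print(disparities[["attribute_value", "fpr", "fpr_disparity"]])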

Associate a Monitor to a Snapshot of a Business Model

Steps to follow:

  1. In MOC, navigate to the business model to be monitored. In our case, as described above, that’s the German Credit Model.

  2. Navigate to the specific snapshot of the business model. If no snapshots exist, create one.

  3. On the Monitoring tab, click on + Add, then click on Monitor.

  4. Following the steps in Add a Monitor, search for (or select) the Bias Monitor: Disparity and Group Metrics from the list of OOTB monitors.

  5. Select a snapshot of the monitor. By default, a snapshot is created for each OOTB monitor.

  6. On the Input Assets page, you’ll notice that the only asset that is required is sample data. This is because a bias monitor computes metrics on 1 dataset only, and thus does not do a comparison to a baseline/reference dataset. For our example, select df_sample_scored.json as the Sample Data Asset. Since the file is already an asset of the business model, we can find it under Select Existing.

  7. On the Threshold page, click on ADD A THRESHOLD, then select the .dmn file bias_disparity_DMN.dmn. Since the file is already an asset of the business model, we can find it under Select Existing. If the business model does not have a .dmn asset, the user may upload one from a local directory during the monitor association process. Note: threshold files are optional; if absent, a monitoring job will not be considered a Pass or Fail. More on thresholds and decision tables in the next section.

  8. The last step in adding a monitor is adding an optional schedule. To do so, click on ADD A SCHEDULE. The Schedule Name field is free-form. The Signal Name field is a dropdown. Choose a signal that corresponds to your ticketing system (Jira, ServiceNow). Lastly, set the frequency of the monitoring job. This can be done either by the wizard or by entering a cron expression. Note: schedules are optional; a monitor may be run on-demand from the business model’s snapshot page, or by a curl command.

  9. On the Review page, click SAVE.


To run a monitor on demand, click on COPY CURL TO RUN JOB EXTERNALLY. The CURL command can then be run from the application of your choosing.

Define Thresholds for your Model

As mentioned in the Monitoring Concepts article, ModelOp Center uses decision tables to define the thresholds within which the model should operate for the given monitor.

  • The first step is to define these thresholds. For this tutorial, we will leverage the example bias_disparity_DMN.dmn decision table. Specifically, this decision table ensures that the gender_female_statistical_parity and gender_female_impact_parity metrics of the German Credit Model are within specification. Note that .dmn files can be opened and edited using Camunda Modeler; a sketch of the equivalent checks follows this list.

  • The gender_female_statistical_parity and gender_female_impact_parity values can be accessed directly from the Monitoring Test Result by design. More metrics are produced OOTB by the bias monitor. We will discuss this in more detail later.

  • In our example, the .dmn file is already an asset of the business model and versioned/managed along with the source code in the same Github repo. This is considered best practice, as the decision tables are closely tied to the specific business model under consideration. However, it is not a requirement that the .dmn files are available as model assets ahead of time.
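For intuition, the checks encoded in a decision table like bias_disparity_DMN.dmn behave roughly like the Python sketch below. The 0.8 and 1.25 bounds are illustrative (the common four-fifths-rule range); the actual rules and thresholds live in the .dmn file and are evaluated by ModelOp Center, not by hand-written code.

Code Block
languagepy
# Illustrative only: the real checks are defined in bias_disparity_DMN.dmn
# and evaluated by ModelOp Center's decision engine.
def evaluate_thresholds(test_result: dict, lower: float = 0.8, upper: float = 1.25) -> bool:
    """Return True (pass) if the monitored disparity metrics fall within bounds."""
    monitored_metrics = [
        "gender_female_statistical_parity",
        "gender_female_impact_parity",
    ]
    return all(lower <= test_result[m] <= upper for m in monitored_metrics)

# Example: a passing result
print(evaluate_thresholds({
    "gender_female_statistical_parity": 0.9,
    "gender_female_impact_parity": 0.8889,
}))  # True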


Run a Monitor On-demand (UI)

To run a monitor on-demand from the UI, navigate to the business model’s snapshot page and click the play button next to the monitor of interest. A monitoring job will be initiated, and you will be redirected to the corresponding job page once the job is created. The model test results created from this execution are described in the Monitoring Results and Notifications section below.


Schedule a Monitor DIY (CURL)

Monitors can be scheduled to run using your preferred enterprise scheduling capability (Control-M, Airflow, Autosys, etc.). While the details will depend on the specific scheduling software, at the highest level, the user simply needs to create a REST call to the ModelOp Center API. Here are the steps:

  1. Obtain the Business Model snapshot’s UUID. This can be found, for instance, in the URL of the snapshot page, as shown in this example:

  2. Similarly, obtain the Monitoring Model snapshot’s UUID by navigating to Inventory from the main menu, selecting Monitors from the drop-down menu, and searching for the name of the monitor attached to the given business model.

  3. Within the scheduler, configure the REST call to ModelOp Center’s automation engine to trigger the monitor for your model:

    1. Obtain a valid auth token

    2. Make a call (POST) to the ModelOp Center API to initiate the monitor. The endpoint is

      Code Block
      <MOC_INSTANCE_URL>/mlc-service/rest/signal
    3. The body should contain references to the Model Life Cycle (MLC) being triggered, as well as the business model and monitor snapshots, as shown below:

      Code Block
      {
          "name": "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira",
          "variables": {
              "DEPLOYABLE_MODEL_ID" : {
                  "value": <UUID_of_business_model_snapshot_as_a_string>
              },
              "ASSOCIATED_MODEL_ID": {
                  "value": <UUID_of_monitoring_model_snapshot_as_a_string>
              }
          }
      }

This process is made easier by copying the CURL command provided at the last step of the monitoring wizard.


The copied command will look something like this:

curl 'https://mocaasin.modelop.center/mlc-service/rest/signal' -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json' -X POST -H 'Authorization: Bearer <token>' --data-raw '{"name":"com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira","variables":{"DEPLOYABLE_MODEL_ID":{"value":"d440a597-4cfb-4541-874f-356cca742446"},"ASSOCIATED_MODEL_ID":{"value":"cb328e3a-7fcb-4853-9833-44c0b26b0b13"}}}'
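If you prefer to trigger the signal from a script rather than curl, an equivalent call might look like the following minimal sketch. The instance URL, token retrieval, and snapshot UUIDs are placeholders you would supply yourself.

Code Block
languagepy
import requests

MOC_INSTANCE_URL = "https://<your-moc-instance>"   # placeholder
TOKEN = "<token>"                                  # obtain a valid auth token first

payload = {
    "name": "com.modelop.mlc.definitions.Signals_Run_Associated_Model_Jira",
    "variables": {
        "DEPLOYABLE_MODEL_ID": {"value": "<UUID_of_business_model_snapshot>"},
        "ASSOCIATED_MODEL_ID": {"value": "<UUID_of_monitoring_model_snapshot>"},
    },
}

# POST the signal to ModelOp Center's automation engine
response = requests.post(
    f"{MOC_INSTANCE_URL}/mlc-service/rest/signal",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print(response.status_code)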

Monitoring Execution

Once the scheduler triggers the signal, the corresponding MLC (listening to that signal) will be initiated. The sequence of events includes:

  1. Preparing the monitoring job with all artifacts necessary to run the job

  2. Creating the monitoring job

  3. Parsing the results into viewable test results

  4. Comparing the results against the thresholds in the decision table

  5. Taking action, which could include creating a notification and/or opening up an incident in JIRA/ServiceNow/etc.

These steps can be summarized in the corresponding Model Life Cycle (MLC).

Monitoring Results and Notifications

Sample Standard Output of Bias Monitors

Monitoring Test Results are listed under the Test Results table.


Upon clicking the View icon, you’ll have two options for looking at test results: graphical (under Test Results) and raw (under Raw Results).

Visual elements

  1. Summary Metrics: These are a subset of all metrics computed by the monitor, returned as key:value pairs for ease of reference.


  2. Bias/Disparity Metrics

  3. Group Metrics


Raw Results

The Raw Results tab shows a clickable (expandable and collapsible) JSON representation of the test results.


To get a JSON file of the test results:

  1. Navigate to the Jobs tab of the snapshot and click on the monitoring job of interest to see its details.

  2. Click on Download File under Outputs.


Contents of the file:

Code Block
languagepy
{
  "bias": [
    {
      "test_name": "Aequitas Bias",
      "test_category": "bias",
      "test_type": "bias",
      "protected_class": "gender",
      "test_id": "bias_bias_gender",
      "reference_group": "male",
      "thresholds": null,
      "values": [
        {
          "attribute_name": "gender",
          "attribute_value": "female",
          "ppr_disparity": 0.5,
          "pprev_disparity": 0.8889,
          "precision_disparity": 1.36,
          "fdr_disparity": 0.7568,
          "for_disparity": 1.6098,
          "fpr_disparity": 0.7648,
          "fnr_disparity": 1.32,
          "tpr_disparity": 0.8976,
          "tnr_disparity": 1.15,
          "npv_disparity": 0.9159
        },
        {
          "attribute_name": "gender",
          "attribute_value": "male",
          "ppr_disparity": 1,
          "pprev_disparity": 1,
          "precision_disparity": 1,
          "fdr_disparity": 1,
          "for_disparity": 1,
          "fpr_disparity": 1,
          "fnr_disparity": 1,
          "tpr_disparity": 1,
          "tnr_disparity": 1,
          "npv_disparity": 1
        }
      ]
    },
    {
      "test_name": "Aequitas Bias",
      "test_category": "bias",
      "test_type": "bias",
      "protected_class": "gender",
      "test_id": "bias_bias_gender",
      "reference_group": "female",
      "thresholds": null,
      "values": [
        {
          "attribute_name": "gender",
          "attribute_value": "female",
          "ppr_disparity": 1,
          "pprev_disparity": 1,
          "precision_disparity": 1,
          "fdr_disparity": 1,
          "for_disparity": 1,
          "fpr_disparity": 1,
          "fnr_disparity": 1,
          "tpr_disparity": 1,
          "tnr_disparity": 1,
          "npv_disparity": 1
        },
        {
          "attribute_name": "gender",
          "attribute_value": "male",
          "ppr_disparity": 2,
          "pprev_disparity": 1.125,
          "precision_disparity": 0.7353,
          "fdr_disparity": 1.3214,
          "for_disparity": 0.6212,
          "fpr_disparity": 1.3075,
          "fnr_disparity": 0.7576,
          "tpr_disparity": 1.1141,
          "tnr_disparity": 0.8695,
          "npv_disparity": 1.0918
        }
      ]
    },
    {
      "test_name": "Aequitas Group",
      "test_category": "bias",
      "test_type": "group",
      "protected_class": "gender",
      "test_id": "bias_group_gender",
      "reference_group": null,
      "values": [
        {
          "attribute_name": "gender",
          "attribute_value": "female",
          "tpr": 0.68,
          "tnr": 0.7021,
          "for": 0.1951,
          "fdr": 0.4516,
          "fpr": 0.2979,
          "fnr": 0.32,
          "npv": 0.8049,
          "precision": 0.5484,
          "ppr": 0.3333,
          "pprev": 0.4306,
          "prev": 0.3472
        },
        {
          "attribute_name": "gender",
          "attribute_value": "male",
          "tpr": 0.7576,
          "tnr": 0.6105,
          "for": 0.1212,
          "fdr": 0.5968,
          "fpr": 0.3895,
          "fnr": 0.2424,
          "npv": 0.8788,
          "precision": 0.4032,
          "ppr": 0.6667,
          "pprev": 0.4844,
          "prev": 0.2578
        }
      ]
    }
  ],
  "ref_male_gender_female_statistical_parity": 0.5,
  "ref_male_gender_female_impact_parity": 0.8889,
  "ref_male_gender_male_statistical_parity": 1,
  "ref_male_gender_male_impact_parity": 1,
  "ref_female_gender_female_statistical_parity": 1,
  "ref_female_gender_female_impact_parity": 1,
  "ref_female_gender_male_statistical_parity": 2,
  "ref_female_gender_male_impact_parity": 1.125,
  "Bias_maxPPRDisparityValue": 2,
  "Bias_maxPPRDisparityValueFeature": "ref_female_gender_male_statistical_parity",
  "Bias_minPPRDisparityValue": 0.5,
  "Bias_minPPRDisparityValueFeature": "ref_male_gender_female_statistical_parity"
}

Note that the top-level key:value pairs (as opposed to the nested bias and group test objects) are what get shown in the Summary Metrics table.
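To work with the downloaded test results programmatically, a small sketch like the following can pull out the summary metrics and the per-group detail. The filename is hypothetical; use whatever name the downloaded output is saved under.

Code Block
languagepy
import json

# Hypothetical filename for the downloaded output
with open("bias_monitor_test_result.json") as f:
    result = json.load(f)

# Top-level scalar entries (the summary metrics)
summary = {k: v for k, v in result.items() if not isinstance(v, (list, dict))}
print(summary)

# Detailed values per protected-class value for each bias/group test
for test in result["bias"]:
    for row in test["values"]:
        print(test["test_name"], test.get("reference_group"), row["attribute_value"])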

Sample Monitoring Notification

Notifications arising from monitoring jobs can be found under the corresponding model test result.


If a ticketing system, such as Jira, is configured in ModelOp Center, a ticket will be created when an ERROR occurs (as above), and a link to the ticket will be available next to the notification. In the example above, a metric fell outside a preset threshold, and thus the monitoring job failed.

Bias Monitor Details

Model Assumptions

Business Models considered for bias monitoring have a few requirements:

  1. An extended schema asset for the input data.

  2. Model type is binary classification.

  3. Protected classes under consideration are categorical features.

  4. Input data contains columns for label (ground truth), score (model output), and at least 1 protected class.
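For concreteness, a single input record satisfying these assumptions might look like the sketch below. The column names are illustrative; the actual names come from the business model's extended input schema.

Code Block
languagepy
# Illustrative scored-and-labeled record with one protected class (gender).
# Actual column names are defined by the business model's extended input schema.
record = {
    "gender": "female",   # protected class (categorical)
    "label_value": 1,     # ground truth
    "score": 0,           # model output (binary classification)
    # ... remaining model features ...
}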

Protected Classes with Numerical Values

While not supported out of the box, a bias monitor can be easily edited to allow protected classes with numerical values, such as a continuous age column. Consider the following example:

Code Block
languagepy
import pandas
from modelop.monitors.bias import BiasMonitor

dataframe = pandas.DataFrame(
    [
        {"gender": "male", "age": 10, "prediction": 0, "label": 0},
        {"gender": "male", "age": 18, "prediction": 0, "label": 0},
        {"gender": "female", "age": 20, "prediction": 0, "label": 0},
        {"gender": "female", "age": 25, "prediction": 0, "label": 0},
        {"gender": "male", "age": 30, "prediction": 1, "label": 1},
        {"gender": "female", "age": 40, "prediction": 1, "label": 1},
        {"gender": "female", "age": 42, "prediction": 1, "label": 1},
        {"gender": "male", "age": 50, "prediction": 0, "label": 1},
        {"gender": "male", "age": 55, "prediction": 0, "label": 1},
        {"gender": "female", "age": 60, "prediction": 0, "label": 1},
        {"gender": "male", "age": 70, "prediction": 1, "label": 0},
        {"gender": "female", "age": 80, "prediction": 1, "label": 1}
    ]
)

# Instantiate Bias Monitor
numerical_bias_monitor = BiasMonitor(
    dataframe=dataframe,
    score_column="prediction",
    label_column="label",
    protected_classes=[
        {"protected_class":"age", "numerical_cutoffs": [40]}
    ]
)

# Compute bias metrics
numerical_bias_monitor.compute_bias_metrics(
    pre_defined_test='aequitas_bias',
    flatten=False,
    include_min_max_features=False
)

In this example, the numerical_cutoffs list passed to the BiasMonitor tells it to split the age values at 40. This produces an artificial age_bucketed column with two categorical values: (-inf, 40) and [40, +inf). The bias metrics are then computed on the engineered categorical feature:

Code Block
languagepy
{
    'bias': [
        {
            'test_name': 'Aequitas Bias',
            'test_category': 'bias',
            'test_type': 'bias',
            'protected_class': 'age_bucketed',
            'test_id': 'bias_bias_age_bucketed',
            'reference_group': '(-inf, 40)',
            'thresholds': None,
            'values': [
                {
                    'attribute_name': 'age_bucketed',
                    'attribute_value': '(-inf, 40)',
                    'ppr_disparity': 1.0,
                    'pprev_disparity': 1.0,
                    'precision_disparity': 1.0,
                    'fdr_disparity': None,
                    'for_disparity': None,
                    'fpr_disparity': None,
                    'fnr_disparity': None,
                    'tpr_disparity': 1.0,
                    'tnr_disparity': 1.0,
                    'npv_disparity': 1.0
                },
                {
                    'attribute_name': 'age_bucketed',
                    'attribute_value': '[40, +inf)',
                    'ppr_disparity': 4.0,
                    'pprev_disparity': 2.8571,
                    'precision_disparity': 0.75,
                    'fdr_disparity': 10.0,
                    'for_disparity': 10.0,
                    'fpr_disparity': 10.0,
                    'fnr_disparity': 10.0,
                    'tpr_disparity': 0.5,
                    'tnr_disparity': 0.0,
                    'npv_disparity': 0.0
                }
            ]
        }
    ]
}

To split on more than one value, simply add all interval endpoints to the numerical_cutoffs list. For example,

Code Block
languagepy
numerical_bias_monitor = BiasMonitor(
    dataframe=dataframe,
    score_column="prediction",
    label_column="label",
    protected_classes=[
        {"protected_class":"age", "numerical_cutoffs": [21, 50, 64]}
    ]
)

Model Execution

During execution, bias monitors execute the following steps:

  1. The init function extracts the extended input schema (corresponding to the BUSINESS_MODEL being monitored) from the job JSON.

  2. Monitoring parameters are set based on the schema above. protected_classes, label_column, and score_column are determined accordingly.

  3. The metrics function runs an Aequitas Bias test and/or an Aequitas Group test for each protected class in the list of protected classes. A reference group for each protected class is chosen by default (first occurrence).

    • The combination of bias and group metrics to be computed depends on the specific flavor of the bias monitor:

      • Bias Monitor: Group Metrics computes group metrics only for each protected class: tpr, tnr, for, fdr, fpr, fnr, npv, precision, ppr, pprev, prev

      • Bias Monitor: Disparity Metrics computes disparity metrics only for each protected class: ppr_disparity, pprev_disparity, precision_disparity, fdr_disparity, for_disparity, fpr_disparity, fnr_disparity, tpr_disparity, tnr_disparity, npv_disparity

      • Bias Monitor: Disparity and Group Metrics computes both group and disparity metrics for each protected class

  4. Test results are appended to the list of bias tests to be returned by the model.
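For orientation, here is a highly simplified, hypothetical sketch of that flow, reusing the BiasMonitor class shown earlier. The schema parsing in init, the hard-coded column names, and the exact shape of the protected_classes argument are all illustrative; the real OOTB monitors rely on ModelOp helper utilities and their signatures may differ.

Code Block
languagepy
import pandas
from modelop.monitors.bias import BiasMonitor

# Monitoring parameters; in the real monitor these are derived from the
# business model's extended input schema during init (names illustrative).
PROTECTED_CLASSES = ["gender"]
LABEL_COLUMN = "label_value"
SCORE_COLUMN = "score"

# modelop.init
def init(job_json):
    # The OOTB monitor parses the job JSON here to extract the extended
    # input schema and set the parameters above; omitted in this sketch.
    pass

# modelop.metrics
def metrics(dataframe: pandas.DataFrame):
    bias_tests = []
    for protected_class in PROTECTED_CLASSES:
        monitor = BiasMonitor(
            dataframe=dataframe,
            score_column=SCORE_COLUMN,
            label_column=LABEL_COLUMN,
            protected_classes=[{"protected_class": protected_class}],
        )
        # Aequitas Bias test; the "Disparity and Group Metrics" flavor also
        # runs an Aequitas Group test for each protected class.
        result = monitor.compute_bias_metrics(
            pre_defined_test="aequitas_bias", flatten=False
        )
        bias_tests.extend(result["bias"])
    yield {"bias": bias_tests}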

For a deeper look at OOTB monitors, see /wiki/spaces/ARCHIVE/pages/1726840843.