Introduction
Enterprises have invested millions of dollars in Data Science efforts over the past few years, and many have realized early successes in their programs. However, these enterprises typically use a wide variety of model development tools, languages/frameworks, environments, and operational systems across disparate teams. Each team typically takes its own approach to operating, managing, and governing its Data Science programs, and few teams track the business value that their Data Science models unlock. As a result, executives have little to no visibility into (a) all Data Science models used for production business decisioning, (b) the return on investment for those models, and (c) the health and status of those models.
ModelOp Center’s executive dashboard allows enterprises to overcome these challenges by providing visibility into the business value of each model, as well as the operational, IT, risk, and data science KPIs for all models across the enterprise, regardless of where the model was developed, the language/framework used, the environment in which it runs, etc.
Dashboard Overview
Key Concepts
“Model in Production”: ModelOp Center considers a “model in production” as a given model snapshot in “state”=DEPLOYED to a runtime that has the “inProduction” flag set to true. Typically, a model life cycle (MLC) is what orchestrates the process of putting a model in production, whereby the MLC sets the snapshot’s state to “DEPLOYED” and also pushes the snapshot to the appropriate runtime(s) that have the “inProduction” flag set to true. A user can verify that a model is in production by navigating to the snapshot page and viewing the deployments section.
“Open Priority Issue”: ModelOp Center considers an “open priority issue” as any notification of “severity”=CRITICAL or HIGH that has an attached ticket (JIRA, ServiceNow, etc.) of elevated priority. A user can verify that a given model has an Open Priority Issue by navigating to the model snapshot page and checking whether there are any “Associated Tickets” of severity=Error and status=(not closed).
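The two definitions above can be read as simple predicates. The following is a minimal sketch only: the `state`, `inProduction`, and `severity` names come from the definitions above, while the dictionary shapes and the ticket fields `elevatedPriority` and `status` are hypothetical stand-ins, not the actual ModelOp Center data model.

```python
from typing import Any, Mapping, Optional, Sequence

PRIORITY_SEVERITIES = {"CRITICAL", "HIGH"}


def is_model_in_production(
    snapshot: Mapping[str, Any],
    runtimes: Sequence[Mapping[str, Any]],
) -> bool:
    """Snapshot is DEPLOYED and at least one target runtime is flagged inProduction."""
    if snapshot.get("state") != "DEPLOYED":
        return False
    return any(runtime.get("inProduction") is True for runtime in runtimes)


def is_open_priority_issue(notification: Mapping[str, Any]) -> bool:
    """CRITICAL/HIGH notification with an attached, elevated-priority, non-closed ticket."""
    # Hypothetical shape: the attached ticket is stored under a "ticket" key.
    ticket: Optional[Mapping[str, Any]] = notification.get("ticket")
    if notification.get("severity") not in PRIORITY_SEVERITIES or ticket is None:
        return False
    # "elevatedPriority" and "status" are assumed field names for illustration.
    return ticket.get("elevatedPriority") is True and ticket.get("status") != "closed"
```

For example, a snapshot with `state` set to `DEPLOYED` that is pushed only to runtimes where `inProduction` is false would not count toward the dashboard under this reading.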
Dashboard Details:
The ModelOp Center Executive Dashboard is divided into 3 main sections:
Summary: provides cumulative statistics such as the number of models in production, cumulative ROI of all models, and cumulative usage (inferences)
Individual Model Status: for each production model, provides a summary of the current business value and open priority issues, as well as a “heatmap” of the current status of all major health indicators against KPIs.
Issues: provides insights into the list of open issues by Business Unit as well as the breakdown of the issue types
Dashboard Section Details
The Summary and Issues sections are standard across all ModelOp Center implementations. However, the “Per Model Status” section can be customized to the specific requirements of a given implementation. See the “Default Dashboard” section below for examples of how to customize the dashboard.
Summary Section Metrics
The Summary section contains the following information:
Cumulative Value: the sum of the dollar amounts in the “Business KPI” field of each production model in the “Per Model Status” section. Only numerical dollar values in the “Business KPI” field are summed; all other KPI metrics are ignored. This field is dynamically calculated upon dashboard load.
Daily Inferences: the sum of the values in the “Daily Inferences” field of each production model in the “Per Model Status” section. Only numerical values are summed; non-numerical values are ignored. This field is dynamically calculated upon dashboard load.
Models in Production: per the “Key Concepts” section, ModelOp Center considers a “model in production” as a given model snapshot in “state”=DEPLOYED to a runtime that has the “inProduction” flag set to true. This field is dynamically obtained upon Dashboard load by querying ModelOp Center’s “Business Model” inventory to determine all models matching the aforementioned criteria.
Open Priority Issues: per the “Key Concepts” section, ModelOp Center considers an “open priority issue” as any notification of “severity”=CRITICAL or HIGH that has an attached ticket (JIRA, ServiceNow, etc.) of elevated priority. This field is dynamically obtained upon Dashboard load by querying ModelOp Center’s Notifications API for notifications matching these criteria.
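The “Cumulative Value” rule (sum the numerical dollar values, ignore everything else) can be sketched as follows. This is an illustration under assumptions: the row key `businessKpi` and the accepted string format (an optional `$`, digits, optional thousands separators) are hypothetical, since the exact representation used by the dashboard is not described here.

```python
import re
from typing import Any, Iterable, Mapping, Optional

# Assumed format: optional "$", digits with optional commas, optional decimals.
_DOLLAR_PATTERN = re.compile(r"^\$?\s*(-?\d[\d,]*(?:\.\d+)?)$")


def parse_dollars(value: Any) -> Optional[float]:
    """Return the numeric dollar amount, or None when the value is not numeric."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        match = _DOLLAR_PATTERN.match(value.strip())
        if match:
            return float(match.group(1).replace(",", ""))
    return None


def cumulative_value(rows: Iterable[Mapping[str, Any]]) -> float:
    """Sum numeric "businessKpi" values (hypothetical key); skip all other values."""
    total = 0.0
    for row in rows:
        amount = parse_dollars(row.get("businessKpi"))
        if amount is not None:
            total += amount
    return total
```

The same pattern would apply to “Daily Inferences”, with non-numerical entries skipped rather than treated as zero-valued errors.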
Individual Model Status
The center section displays the status for each individual model “in production”. This section is configurable (see below); however, the information produced by the Default Dashboard model includes the following:
Deployed Models By Business KPI: the name of the Business model and the name of the specific snapshot that is currently “in production.” This field is dynamically obtained upon Dashboard load by querying ModelOp Center’s “Business Model” inventory to determine all models matching the “model in production” criteria.
Business Unit: the business unit to which the model belongs. This information is pulled dynamically from the “Model Organization” metadata element of a Business model.
Business KPI: the cumulative business value for the Business model. This information is calculated by the “Dashboard Model” that is run regularly for each Business model that is “in production.”
Open Priority Issues: the number of “open priority issues” for the given Business Model and the trend over the last 30 days. Per the “Key Concepts” section, ModelOp Center considers an “open priority issue” as any notification of “severity”=CRITICAL or HIGH that has an attached ticket (JIRA, ServiceNow, etc.) of elevated priority. This field is dynamically obtained upon Dashboard load by querying ModelOp Center’s Notifications API for notifications matching these criteria.
Heatmap Values: this section provides a red/green/yellow status of the key health indicators for a model. These metrics are calculated by the “Dashboard Model” and evaluated against thresholds using the “Dashboard Model’s” DMN file to produce the specific red/green/yellow/gray status. Note that, by default, GRAY indicates that the “Dashboard model” could not process that particular metric. The metrics included in the default Heatmap are:
Characteristic Stability: calculates the characteristic stability index for each feature and compares the max value against the thresholds (from the DMN) to determine the status.
Performance Monitor: calculates the performance metrics (e.g. AUC or RMSE) for the model using ground truth and compares them against the thresholds (from the DMN) to determine the status.
Ethical Fairness: calculates the maximum and minimum proportional parity for each protected class and compares the max and min values against the thresholds (from the DMN) to determine the status.
Data Drift: calculates the p-value from a Kolmogorov-Smirnov test for each feature and compares the max value against the thresholds (from the DMN) to determine the status.
Output Integrity: verifies that all input records received a corresponding output inference/score, using a unique identifier to ensure the model produced the appropriate output for all inputs.
Concept Drift: calculates the p-value from a Kolmogorov-Smirnov test for the output score column(s) and compares the max value against the thresholds (from the DMN) to determine the status.
Daily Inferences: the count of inferences processed by the given Business model over the period. This information is calculated by the “Dashboard Model” that is run regularly for each Business model that is “in production.”
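To make the “max value across features” idea concrete, here is a minimal sketch of a stability index computed per feature and reduced to its maximum. It uses the commonly cited population stability index formula, sum of (actual − expected) × ln(actual / expected) over bins. The equal-width binning over the baseline range, the bin count, and the small floor for empty bins are assumptions chosen for illustration and may differ from what the “Dashboard model” actually does.

```python
import math
from typing import List, Mapping, Sequence


def _bin_fractions(
    values: Sequence[float], lo: float, hi: float, bins: int, eps: float = 1e-6
) -> List[float]:
    """Fraction of values per equal-width bin; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline
    for value in values:
        index = int((value - lo) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    # Floor empty bins at eps so the logarithm below is always defined.
    return [max(count / len(values), eps) for count in counts]


def stability_index(
    baseline: Sequence[float], sample: Sequence[float], bins: int = 10
) -> float:
    """Stability index of `sample` against `baseline` (both must be non-empty)."""
    lo, hi = min(baseline), max(baseline)
    expected = _bin_fractions(baseline, lo, hi, bins)
    actual = _bin_fractions(sample, lo, hi, bins)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def max_stability_index(
    baseline: Mapping[str, Sequence[float]],
    sample: Mapping[str, Sequence[float]],
    bins: int = 10,
) -> float:
    """Largest per-feature index: the single value compared against the thresholds."""
    return max(stability_index(baseline[f], sample[f], bins) for f in baseline)
```

Reducing to the maximum means a single drifting feature is enough to change the model's heatmap status, which is the behaviour the descriptions above imply.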
Issues Section
The bottom section provides insights into the list of open issues by Business Unit as well as the breakdown of the issue types:
Issues by Business Unit: displays the count of issues for each day over the last 30 days, grouped by Business Unit. This field is calculated by the “Dashboard Model” for each Business Model in production (in the “notificationsTimelineYTD” section of the Dashboard model test result). Upon page load, the ModelOp Center UI dynamically aggregates all open priority issues across all models and displays the count for each day, grouped by Business Unit.
Issues by Type: displays the breakdown of issues by issue type (in percentage form). This field is calculated by the “Dashboard Model” for each Business Model in production (in the “notificationsGroupedByTypeYTD” section of the Dashboard model test result). Upon page load, the ModelOp Center UI dynamically aggregates all issue types across all models and displays the aggregate percentage breakdown by issue type.
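The “Issues by Type” aggregation can be sketched as below. The input shape is an assumption: each Business Model is taken to contribute a mapping of issue type to count, loosely modelled on the “notificationsGroupedByTypeYTD” section, whose real structure is not shown here. The issue type names in the usage note are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def issue_type_percentages(
    per_model_counts: Iterable[Mapping[str, int]],
) -> Dict[str, float]:
    """Aggregate issue counts across models and return each type's percentage share."""
    totals: Counter = Counter()
    for counts in per_model_counts:
        totals.update(counts)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no issues anywhere: nothing to chart
    return {kind: 100.0 * count / grand_total for kind, count in totals.items()}
```

With two models reporting `{"Drift": 3, "Performance": 1}` and `{"Drift": 1, "Bias": 3}`, the aggregate shares are 50% Drift, 12.5% Performance, and 37.5% Bias.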
Running the Dashboard
Scheduled Runs
By default, the “Dashboard model” runs on a regular schedule, triggered by a built-in scheduler.
To View the Current Schedule
Select “Dashboard” from the main menu
Select the gear icon in the upper right hand corner of the screen and select the “Scheduler” option
Expand the active schedule (or choose to add a schedule)
View schedule details
To Modify the Current Schedule:
Select “Dashboard” from the main menu
Select the gear icon in the upper right hand corner of the screen and select the “Scheduler” option
Expand the active schedule (or choose to add a schedule)
Make the modifications to the schedule, using the “Advanced” tab (default for the Dashboard model) OR the “Standard” option:
Advanced (cron expression) Option:
Standard Option: selecting the “Minutes”, “Hourly”, “Daily”, “Weekly”, “Monthly”, or “Yearly” tabs
Run On-demand (UI)
To run a monitor on-demand from the UI:
Select “Dashboard” from the main menu
Select the gear icon in the upper right hand corner of the screen and select the “Scheduler” option
Expand the active schedule (or choose to add a schedule)
Click the “Fire a Signal” button
A request to run the “Dashboard model” will be made:
Once the “Dashboard model” starts, the User is directed to the “Jobs” page to view the status of the “Dashboard Model” job runs:
Note that a “Dashboard model” job will be created for each business model that is “deployed in production” (see the Key Concepts section above)
Dashboard Model Execution Process Details
Once the scheduler triggers the Dashboard Model run, the corresponding Dashboard MLC is initiated (by default, this MLC is called “Dashboard Process”). The sequence of events is:
Get a list of all Business Models that are “deployed in production” (see the Key Concepts section above)
For each Business Model that is “deployed in production”, the MLC will:
Take the data and other assets from the Business Model and pass them to the “Dashboard model” to produce all of the metrics for the given Business Model
After the job(s) complete, generate a test result (including the heatMap) for the Business Model.
If a DMN with the name "dashboard_model.dmn" is found among the dashboard model's assets, use it to produce the “heatmap” items and other pass/fail metrics for the Business Model.
If there are failures according to the thresholds, a notification in ModelOp Center will be produced with the failures.
If there are errors in executing the “Dashboard model”, a notification in ModelOp Center will be produced.
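The sequence above can be approximated in a few lines of plain Python. This is a conceptual sketch, not the MLC itself: `run_dashboard_model` and `evaluate_dmn` are hypothetical callables standing in for the batch job and the DMN evaluation, and treating a RED heatmap entry as a threshold “failure” is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Metrics = Dict[str, Any]
Heatmap = List[Dict[str, str]]  # e.g. [{"monitor_name": "Data Drift", "color": "RED"}]


def run_dashboard_process(
    production_models: List[Dict[str, Any]],
    run_dashboard_model: Callable[[Dict[str, Any]], Metrics],
    evaluate_dmn: Optional[Callable[[Metrics], Heatmap]] = None,
) -> Dict[str, Any]:
    """One test result per production model, plus any notifications raised."""
    test_results: Dict[str, Any] = {}
    notifications: List[Dict[str, Any]] = []

    for model in production_models:
        name = model["name"]
        try:
            metrics = run_dashboard_model(model)
        except Exception as exc:  # execution error: notify and move on
            notifications.append(
                {"model": name, "kind": "execution_error", "detail": str(exc)}
            )
            continue

        result: Dict[str, Any] = {"metrics": metrics}
        if evaluate_dmn is not None:  # only when a DMN asset is present
            heatmap = evaluate_dmn(metrics)
            result["heatMap"] = heatmap
            # Assumption: RED rows are what counts as a threshold failure.
            failed = [row["monitor_name"] for row in heatmap if row["color"] == "RED"]
            if failed:
                notifications.append(
                    {"model": name, "kind": "threshold_failure", "detail": failed}
                )
        test_results[name] = result

    return {"test_results": test_results, "notifications": notifications}
```

Note that in this sketch one model's execution error does not stop the remaining models from being processed, which matches the one-job-per-model behaviour described earlier.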
These steps can be summarized in the following Model Life Cycle (MLC):
Dashboard Monitor Results and Notifications
Default Output
For a given Business Model, the Dashboard Test Results are listed under the Test Results table:
View an Individual Dashboard Test Result
Clicking the “View” icon displays the Test Result details, including the Test Result notifications, graphical output, and raw output:
Test Result Notifications: provides a list of related notifications. This could include:
Execution Errors: when a given “Dashboard model” run could not execute successfully.
Typically this is due to a system issue, such as a problem with the ModelOp runtime used to run the “Dashboard model”.
To troubleshoot, click on the “Jobs” tab of the Business Model snapshot. Click on the latest “Dashboard model” job run to view the logs from the Dashboard model job run.
Individual Monitor Issue: when a given monitor (e.g. a monitor that calculates drift) did not run successfully or had to be skipped:
Typically this is due to missing inputs for that given monitor
To troubleshoot, click on the “Jobs” tab of the Business Model snapshot. Click on the latest “Dashboard model” job run to view the logs from the Dashboard model job run.
Graphical Test Results: shows a table view of all the output metrics from the Default Dashboard model, which includes the following items by default:
Numerical Metrics: Business Value, Data Drift, Statistical Performance, Stability, Ethical Fairness, Inferences:
Heatmap: contains the list of items that appear in the “heatmap” and their status (typically Green, Yellow, Red):
Raw Test Results: contains the detailed list (json) of all output from the “Dashboard model”:
Installing the Dashboard Model - Initial Install & Setup
By default, ModelOp Center will create and configure the default “Dashboard model” during ModelOp Center installation. Below are more details on this process and troubleshooting steps, as required.
Dashboard Model Installation
To configure the dashboard the following steps need to be taken:
Add the repository attributes to the model manager application.yaml (see below for an example)
Restart model-manager
Multiple repositories can be listed. For each repository, provide the following attributes:
repositoryRemote: The git clone URL
repositoryBranch: the branch of the git repository
deployedModel: (optional) if deployedModel.runtimeName is set ModelOp will create a snapshot and deploy to the runtimeName after finishing the import
schedule: (optional) The schedule on which to run the deployed model by triggering the signal in signalActionName
Below is an example that would be placed in the application.yaml:

```yaml
model-manage:
  git:
    load-on-startup:
      repos:
        - repositoryBranch: master
          repositoryRemote: https://github.com/modelop/default_dashboard_model
          deployedModel:
            runtimeName: engine-8
          schedule:
            quartzSchedule: '0 0 7 * * ? *'
            signalActionName: com.modelop.mlc.definitions.Signals_DASHBOARD_MODEL
```

Notes:
repositoryRemote should be replaced with the URL of your own private repository in order to have the proper permissions.
runtimeName should be the name of an existing runtime in your environment where the model can be deployed.
quartzSchedule is a Quartz schedule; the value shown runs every day at 7 am. This can be adjusted as necessary.
signalActionName is the signal to trigger on the schedule.
You will need to restart model-manage for the change to take effect.
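When adjusting the schedule, it helps to remember that Quartz cron expressions have six required fields plus an optional year, starting with seconds, unlike five-field Unix cron. The small helper below is a hypothetical illustration (it is not part of ModelOp Center) that labels the fields of an expression such as the one above.

```python
from typing import Dict

# Quartz field order; the trailing "year" field is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
)


def describe_quartz(expression: str) -> Dict[str, str]:
    """Label each field of a Quartz cron expression without validating its values."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = describe_quartz("0 0 7 * * ? *")
# fields["hours"] is "7": the run fires at 07:00:00 every day.
```

Under this reading, changing the third field from `7` to `19` would move the daily run to 7 pm.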
Dashboard Process
Upon installation of the environment, the default MLCs should be loaded in the ModelOp Center instance. The Dashboard Process MLC is used to generate Dashboard test results. This MLC can be executed on a schedule or on demand using the signal com.modelop.mlc.definitions.Signals_DASHBOARD_MODEL, which is the signalActionName value in the snippet above.
Dashboard Schedule Manual Setup
If you want to manually modify the schedule for the Dashboard process:
Navigate to “Dashboard” page
Click on the gear icon in the upper right hand corner and select “Scheduler”
Edit the current schedule or add additional schedules using com.modelop.mlc.definitions.Signals_DASHBOARD_MODEL as the signal name.
Dashboard Schedule Troubleshooting
If no Dashboard model is deployed, attempting to edit the schedules will result in a dialog like the following:
This can be resolved by deploying the dashboard model to an appropriate runtime, either by following the instructions in the Dashboard Model Installation section or by deploying manually.
Configuring the Dashboard Model
Dashboard Monitor Assumptions
To run the “Dashboard model”, there are a few requirements:
A Dashboard model and related “batch deployment of the Dashboard model” have been created (a default version is created with ModelOp Center installation)
A Dashboard scheduler has been created (a default version is created with ModelOp Center installation)
A Dashboard model threshold file (e.g. dashboard_model.dmn) is present on the “Dashboard model”
One or more Business Models are configured correctly (see the Included Monitors section for more details on prerequisites):
Contains an extended schema asset that specifies the appropriate fields for drift, labeled fields, score fields, etc.
Contains the requisite model metadata for business value calculation.
“Baseline” data asset present
Production “Comparator” data asset present
Business Model is “deployed in production”
Included Monitors
The default “Dashboard model” contains a number of individual monitors that are commonly used for models across the enterprise. While the “Dashboard model” can be customized to the needs of the business, the tables below detail the default monitors, the metrics calculated, and the inputs required:
Business KPI & Inferences
Metric | Description | Required Inputs for a Given Business Model | Metric for Evaluation (from Dashboard Model Test Result) |
---|---|---|---|
Business KPI | The cumulative business value for the Business model. | | actualROIAllTime |
Daily Inferences | The count of inferences processed by the given Business model over the period | | allVolumetricMonitorRecordCount |
Heatmap Monitors
Monitor name | Description | Required Inputs for a Given Business Model | Metric for evaluation | Heatmap criteria |
---|---|---|---|---|
Data drift | Calculates the p-value from a Kolmogorov-Smirnov test for each feature and compares the max value against the thresholds (from the DMN) to determine the status | | max( <feature_1>: <p-value>, ...:..., <feature_n>: <p-value>) i.e. the max of all the p-values across all the features | max(p-value) > 2 → RED; 1 < max(p-value) < 2 → YELLOW; max(p-value) < 1 → GREEN; max(p-value) IS NULL or test fails → GRAY |
Concept drift | Calculates the p-value from a Kolmogorov-Smirnov test for the output score column(s) and compares the max value against the thresholds (from the DMN) to determine the status | | max( i.e. the max of all the p-values across the score columns (usually there is only one but there could be multiple) | max(p-value) > 2 → RED; 1 < max(p-value) < 2 → YELLOW; max(p-value) < 1 → GREEN; max(p-value) IS NULL or test fails → GRAY |
Statistical performance | Calculates the performance metrics (e.g. AUC or RMSE) for the model using ground truth. Compares against the thresholds (from the DMN) to determine the status | | | 0.6 < |
Characteristic Stability | Calculates the characteristic stability index for each feature and compares the max value against the thresholds (from the DMN) to determine the status | | max( i.e. the max of all the stability indexes across all features | max( 0.1 < max( max( max( |
Ethical Fairness | Calculates the maximum and minimum proportional parity for each protected class and compares the max and min values against the thresholds (from the DMN) to determine the status | | max( | max( max( |
Configuring Dashboard Thresholds
Defining Thresholds for the Dashboard
As mentioned in the Monitoring Concepts article, ModelOp Center uses decision tables to define the thresholds within which the model should operate. This applies to the “Dashboard Model” as well. However, the decision table for the “Dashboard Model” uses a few specific features to populate the UI dynamically. For this tutorial, we will use the default dashboard_model_demo.dmn decision table, which checks that all typical metrics for an example classification model (e.g. the German Credit Model) are within specification.
Decision Table Inputs:
The input column names on the decision tables can be any of the element names in the “Dashboard Model” model test results (see Running the Dashboard Model section):
The input column values for the Dashboard Model are typically a specific value for the given metric OR the MAX or MIN of a given metric across all features or characteristics (e.g. for Data Drift, the max p-value for a KS test across all input features). Using MAX and MIN thresholds provides a consistent way to evaluate metric thresholds across all features for a given model, as opposed to having to define thresholds for each individual feature.
Decision Table Output:
“monitor_name”: the category for the specific metric being considered. A customer can define any categories they want by entering the desired category name in the decision table output column named “monitor_name”.
“color”: the resulting status color that will be displayed in the Executive Dashboard “heatmap” for that “monitor name” based on the thresholds defined in the decision table inputs. Using this fictitious example, if the max p-value for data drift is greater than 2, then the Executive Dashboard heatmap will display a “Red” value for the “Data Drift” monitor category.
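The decision-table logic for one monitor can be mimicked in plain Python to show how an input metric maps to the two outputs. This is a sketch only: the real evaluation is performed by the DMN, the thresholds (2 and 1) are the fictitious example values used in this article, and treating exactly 1 or exactly 2 as YELLOW is an assumption, since the boundaries are not specified.

```python
import math
from typing import Dict, Optional


def data_drift_row(max_p_value: Optional[float]) -> Dict[str, str]:
    """Map the max p-value across features to a heatmap row (monitor_name, color)."""
    if max_p_value is None or math.isnan(max_p_value):
        color = "GRAY"  # metric missing or the test failed
    elif max_p_value > 2:
        color = "RED"
    elif max_p_value < 1:
        color = "GREEN"
    else:
        color = "YELLOW"  # between the thresholds; boundary handling is assumed
    return {"monitor_name": "Data Drift", "color": color}
```

For instance, `data_drift_row(2.5)` yields a “Data Drift” row with color `RED`, which is the value the Executive Dashboard heatmap would display for that category.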
Uploading Dashboard Thresholds Changes
Modifications to the “Default Dashboard” thresholds can be made by updating the “dashboard_model.dmn” file in the “Default Dashboard” git repository. To find the backing git repository for the “Default Dashboard” model:
Navigate to Monitors from the main menu
Select the “Default Dashboard” monitor from the list
Select the “Repository” tab. The git repository URL and branch are listed here:
To make changes to the “Default Dashboard” thresholds, simply check the relevant changes to “dashboard_model.dmn” into the git repository and branch listed in the prior step.
Once ModelOp Center performs its regular git sync (or the User selects the “Sync GIT” option in the Repository tab of step 3), the updated “dashboard_model.dmn” file should be brought into ModelOp Center.
Confirm by selecting the “Assets” tab of the “Default Dashboard” monitor and click on the “dashboard_model.dmn” file to get more details:
Select the “view source” icon to look for the changes for confirmation.