This article describes how to use the ModelOp Command Center as the central station to monitor your models. It also describes the MLC processes that generate alerts and how to react to the alerts, tasks, and notifications reported by the MLC. The primary audience for this article is the ModelOps Support Team.

Table of Contents

Introduction

ModelOp Command Center Dashboard

The ModelOp Command Center Dashboard provides visibility into the health of the system and any events, conditions, or tasks that need attention. You can customize an MLC to test for technical problems within the model, for underlying business problems revealed in the data set, and for model efficacy, using metrics that warn the operator when a metric falls outside its configured range.

Operators in the MLC are configured to produce messages that are accessible through the Dashboard.
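
As a hedged illustration of that kind of range check — the metric names, thresholds, and function below are hypothetical, not ModelOp's actual API — an operator's logic might look like this in Python:

    # Hypothetical sketch of the range check an MLC operator might apply.
    # Metric names and thresholds are illustrative.
    THRESHOLDS = {
        "auc": (0.75, 1.0),        # warn if model discrimination degrades
        "null_rate": (0.0, 0.02),  # warn if input data quality degrades
    }

    def out_of_range(metrics: dict) -> list:
        """Return the metrics that fall outside their configured range."""
        violations = []
        for name, (low, high) in THRESHOLDS.items():
            value = metrics.get(name)
            if value is not None and not (low <= value <= high):
                violations.append((name, value, (low, high)))
        return violations

Any violations returned by a check like this would be surfaced as messages on the Dashboard.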


ModelOp Center Messages

ModelOp Center enables comprehensive monitoring of a deployed model through several mechanisms:

  • Runtime monitoring

  • Backtest metrics monitoring

  • Alerting & notifications

Runtime Monitoring

To get real-time insight into how your model is performing, you can click into a detailed, real-time view of the Runtime information for the deployed model. This includes real-time monitors about the infrastructure, data throughput, model logs, and lineage.

To see the Runtime Monitoring, navigate to the deployed model: Runtimes → Runtime Dashboard → <Runtime where your model is deployed>

The Runtime monitor displays the following information about the Runtime environment:

  • Endpoint throughput - volume of data through the deployed model

  • CPU Utilization - User CPU utilization and Kernel CPU usage

  • System Resource Usage - real-time memory usage

  • Lineage of the deployment - MLC Process metadata that details the deployment information and history

  • Logs - A live scroll of the model logs
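
The same information is typically available programmatically for support tooling. As a sketch only — the endpoint path and response fields below are hypothetical, not a documented ModelOp API — a script polling runtime health might look like:

    import requests

    # Hypothetical endpoint and field names, for illustration only;
    # consult the ModelOp Center API documentation for the real paths.
    BASE_URL = "https://modelop-center.example.com"

    def runtime_snapshot(runtime_id: str) -> dict:
        resp = requests.get(f"{BASE_URL}/api/runtimes/{runtime_id}", timeout=10)
        resp.raise_for_status()
        data = resp.json()
        return {
            "throughput": data.get("endpointThroughput"),
            "cpu_user": data.get("cpuUser"),
            "cpu_kernel": data.get("cpuKernel"),
            "memory": data.get("memoryUsage"),
        }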


Backtest Metrics Monitoring

While some models may allow for inline efficacy monitoring, most models do not obtain ground truth until a future date, which necessitates the use of regular backtesting. ModelOp Center allows you to define metrics functions that can be used to execute regular backtests. An MLC process can automate the regular execution of a backtest to compute statistical metrics. See Model Efficacy Metrics and Monitoring for more details on defining and executing backtesting metrics.
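
As a minimal sketch of what such a metrics function can look like — assuming a Python model, a pandas DataFrame of scored records joined with later-arriving ground truth, and hypothetical column names:

    # Minimal backtest metrics sketch; the annotation style and the
    # column names ("label", "score") are illustrative.
    import pandas
    from sklearn.metrics import accuracy_score, roc_auc_score

    # modelop.metrics
    def metrics(df: pandas.DataFrame):
        """Compare recorded scores against ground truth that arrived later."""
        yield {
            "auc": roc_auc_score(df["label"], df["score"]),
            "accuracy": accuracy_score(df["label"], df["score"] > 0.5),
        }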

Alerting & Notifications

Alert, Task, and Notification messages provide visibility into information and actions that need to be taken as a result of model monitoring. These “messages” are surfaced through the Command Center Dashboard, but can also be tied into enterprise ticketing and alerting systems.

The types of messages generated from Model Monitoring include:

  • Alerts - test failures, model errors, and other situations that require a response.

  • Tasks - user tasks such as approving a model or acknowledging a failed test.

    • For details about viewing and responding to test failures, see Addressing User Tasks on this page.

  • Notifications - system status, engine runtime status and errors, model errors, and other information generated by ModelOp Center automatically.
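
To make the distinction concrete, the shape of these messages might resemble the following; the field names are illustrative, not ModelOp's actual schema:

    # Illustrative message shapes only; fields are not ModelOp's schema.
    alert = {"type": "ALERT", "severity": "ERROR",
             "detail": "AUC 0.71 fell below threshold 0.75"}
    task = {"type": "TASK", "assignee": None,
            "detail": "Acknowledge failed backtest for deployed model"}
    notification = {"type": "NOTIFICATION", "severity": "INFO",
                    "detail": "Runtime engine started"}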

Responding to Notifications & Alerts

Notifications and Alerts provide visibility into real-time information about all models across teams and the organization. Notifications are generated by ModelOp Center as well as by MLC Processes, and provide information on what is happening within the system. Alerts function similarly but require investigation and response. Alerts are generated by the MLC Process with a severity of ERROR; you can use logic in the process to determine when an Alert is raised for ModelOps Support to respond to, such as a test exceeding its threshold or a deployment failure.
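
As a hedged sketch of that decision logic — the function and severities below are illustrative, not a ModelOp API — an MLC process might gate a backtest result like this:

    # Hypothetical gate an MLC process might apply to a backtest result.
    # Severity ERROR surfaces as an Alert for ModelOps Support; anything
    # informational surfaces as a Notification.
    def gate_backtest(auc: float, threshold: float = 0.75) -> dict:
        if auc < threshold:
            return {"severity": "ERROR",
                    "detail": f"AUC {auc:.2f} below threshold {threshold:.2f}"}
        return {"severity": "INFO", "detail": "backtest within threshold"}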

Responding to Notifications & Alerts:

1. In the Command Center home screen, select the Notification from the table to learn more. For example, selecting a Runtime Notification takes you to the Runtime Detail View for that Runtime.

2. Selecting an Alert provides information about the context of the Alert. For example, by clicking “Test Failures” in the Models pane, you can see the details of the model that failed the backtest.

3. Clicking on the first Alert gives you more information about the item.

4. You can also act on the Alert to route the issue for resolution. In this case, the main action is to Notify the Developer.

User Tasks

User tasks are displayed under the Tasks & Alerts tab in the left sidebar and are filtered by All Open Tasks and Tasks in Progress. ModelOp Center has a lightweight task management tool and also integrates with the task management tool of your choice, such as ServiceNow or Jira.
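
For example, if tasks are mirrored into Jira, a minimal integration sketch using Jira's standard REST API could look like the following; the URL, credentials, and project key are placeholders:

    import requests

    # Minimal sketch of mirroring a ModelOp task into Jira.
    # The URL, credentials, and project key are placeholders.
    JIRA_URL = "https://your-domain.atlassian.net"
    AUTH = ("ops-bot@example.com", "api-token")

    def create_jira_task(summary: str, description: str) -> str:
        payload = {
            "fields": {
                "project": {"key": "MLOPS"},
                "summary": summary,
                "description": description,
                "issuetype": {"name": "Task"},
            }
        }
        resp = requests.post(f"{JIRA_URL}/rest/api/2/issue",
                             json=payload, auth=AUTH, timeout=10)
        resp.raise_for_status()
        return resp.json()["key"]  # e.g. "MLOPS-123"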


Typically, user tasks are related to either (a) model productionization or (b) model monitoring.

1. To view user tasks, click the Tasks & Alerts icon in the left sidebar.


2. Note that this particular task resulted from a failed model backtest for a currently deployed production model.

3. Click into the User Task for context on the task to be performed.

4. You can assign the issue to yourself and begin work on the task to address the issue.