Overview
ModelOp Center enables comprehensive monitoring of a deployed model through several mechanisms:
Runtime monitoring
Backtest metrics monitoring
Alerting & Notifications
Runtime Monitoring
To get real-time insight into how your model is performing, you can click into a detailed real-time view of the Runtime information for the deployed model. This view includes real-time monitors covering the infrastructure, data throughput, model logs, and lineage.
To see the Runtime Monitoring, navigate to the deployed model: Runtimes → Runtime Dashboard → <Runtime where your model is deployed>
The Runtime monitor displays the following information about the Runtime environment:
Endpoint throughput - volume of data through the deployed model
CPU Utilization - user and kernel CPU usage
System Resource Usage - real-time memory usage
Lineage of the deployment - MLC Process metadata that details the deployment information and history
Logs - A live scroll of the model logs
Backtest Metrics Monitoring
While some models may allow for inline efficacy monitoring, most models do not obtain ground truth until a future date, which makes regular backtesting necessary. ModelOp Center allows you to define metrics functions that are executed as regular backtests, and an MLC process can automate their execution on a schedule to compute statistical metrics. See Model Efficacy Metrics and Monitoring for more details on defining and executing backtesting metrics.
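As an illustration, a backtest metrics function might look like the following minimal sketch. This is a hypothetical example, not the documented ModelOp Center interface: it assumes a Python model, the `# modelop.metrics` smart-tag convention for marking the metrics function, and a DataFrame in which each prediction has been joined with its eventual ground-truth label (the column names `prediction` and `label` are illustrative).

```python
import pandas as pd

# modelop.metrics
def metrics(df):
    """Compute backtest metrics once ground-truth labels are available.

    Assumes df pairs each binary model prediction ('prediction') with
    its eventual ground-truth label ('label'); both column names are
    illustrative, not a ModelOp Center requirement.
    """
    tp = int(((df["prediction"] == 1) & (df["label"] == 1)).sum())
    fp = int(((df["prediction"] == 1) & (df["label"] == 0)).sum())
    fn = int(((df["prediction"] == 0) & (df["label"] == 1)).sum())
    tn = int(((df["prediction"] == 0) & (df["label"] == 0)).sum())

    accuracy = (tp + tn) / len(df)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Metrics functions yield a dictionary of named statistics, which
    # the backtest job records for monitoring.
    yield {
        "accuracy": round(accuracy, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
    }
```

The key design point is that the function receives scored data with ground truth attached and emits named statistics; the MLC process then compares those statistics against thresholds on each scheduled run.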
ModelOp Center Messages
Messages provide visibility into the models across the organization on the Command Center Dashboard.
The types of messages generated include:
Alerts - test failures, model errors, and other situations that require a response.
For details about viewing and responding to test failures, see Model Test Failures on this page.
For information about how to configure an alert in an MLC Process, see https://docs.camunda.org/optimize/latest/user-guide/alerting/.
Tasks - user tasks, such as approving a model or acknowledging a failed test.
For details about viewing and responding to user tasks, see Addressing User Tasks on this page.
Notifications - includes system status, engine status and errors, model errors, and other information generated by ModelOp Center automatically.
Responding to Notifications & Alerts
Notifications and Alerts provide visibility into the models across teams and the organization. Notifications are generated both by ModelOp Center itself and by MLC Processes, and report what is happening within the system. Alerts function similarly but require investigation and response: they are generated by an MLC Process with a severity of ERROR, and you can use logic in the process to determine when they are raised for ModelOps Support to respond to (a test exceeding its threshold, a deployment failure, etc.).
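The threshold logic described above can be sketched as follows. This is a hypothetical illustration of the decision an MLC process makes, not the ModelOp Center wire format: the record shape, field names, and severity string are assumptions.

```python
def evaluate_backtest(metrics, thresholds):
    """Return one Alert-style record (severity ERROR) for every metric
    that falls below its configured minimum.

    The record shape is illustrative only; in practice the MLC process
    itself raises the Alert that appears on the Command Center Dashboard.
    """
    alerts = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is not None and value < minimum:
            alerts.append({
                "severity": "ERROR",
                "message": f"Backtest metric '{name}' = {value} "
                           f"is below threshold {minimum}",
            })
    return alerts
```

In this sketch, a backtest whose AUC falls below its configured floor would produce exactly one ERROR-severity record for ModelOps Support to act on, while passing metrics produce nothing.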
To respond to a Notification or Alert:
Select a Notification from the table to learn more. For example, selecting a Runtime Notification takes you to the Runtime Detail View for that Runtime.
Selecting an Alert provides information about its context. In this example, you can see the details of the model that failed the backtest.
You can also act on the Alert to route the issue for resolution. In this case, the ModelOp Support Team can notify the Developer about the issue.
<screenshot>
User Tasks
User tasks are displayed under the Tasks & Alerts tab in the left-hand column. ModelOp Center includes a lightweight task management tool and also integrates with the task management tool of your choice, such as ServiceNow or Jira.
Click the Tasks & Alerts icon in the sidebar. Tasks are filtered by All Open Tasks and Tasks in Progress.
Click into the User Task for context on the task to be performed.
You have the option to assign the issue to yourself.
At this point, you can complete the required task and the MLC Process will update.
<screenshot>
Job Monitoring
Batch Jobs are used for business-critical activities such as scoring, tests, and training, so they should be monitored similarly to the Runtimes.
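One common pattern for monitoring a batch job programmatically is to poll its status until it reaches a terminal state. The sketch below is hypothetical: `get_status` stands in for a call that retrieves the job's current state (for example, via the ModelOp Center REST API), and the state names are assumptions rather than the documented job lifecycle.

```python
import time

# Assumed terminal states; the actual job lifecycle names may differ.
TERMINAL_STATES = {"COMPLETE", "ERROR", "CANCELLED"}

def wait_for_job(get_status, poll_seconds=10, timeout_seconds=3600):
    """Poll get_status() until the job reaches a terminal state.

    get_status is a zero-argument callable returning the job's current
    state string; in practice it would wrap an API call. Raises
    TimeoutError if the job does not finish within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("batch job did not reach a terminal state in time")
```

A monitoring script could alert on an `ERROR` result the same way an MLC Process alerts on a failed backtest, so batch jobs get the same visibility as deployed Runtimes.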