This article describes how to use the ModelOp Command Center as the central station for monitoring your models. It also describes the MLC processes that generate alerts, and how to react to the alerts, tasks, and notifications reported by the MLC. The primary audience for this article is the ModelOps Support Team, which uses these capabilities to enable operational monitoring, focused on ensuring that models are available and running within SLAs on the target runtimes.

Introduction

ModelOp Center enables comprehensive monitoring of a deployed model through several mechanisms:

  • Runtime monitoring

  • Backtest metrics monitoring

  • Alerting & notifications

    Operational performance monitors include:

    • Runtime Monitoring:

      • Model availability and SLA performance

      • Data throughput and latency with inference execution

    • Model Data Monitoring:

      • Input (and output) data adherence to the model’s defined schema

    Runtime Monitoring

    To get real-time insight into how your model is performing, you can click into a detailed, real-time view of the Runtime information for the deployed model. This includes real-time monitors about the infrastructure, data throughput, model logs, and lineage, where available.

    To see the Runtime Monitoring, navigate to the deployed model: Runtimes → Runtime Dashboard → <Runtime where your model is deployed>

    The Runtime monitor displays the following information about the Runtime environment:

    • Endpoint throughput - volume of data through the deployed model

    • CPU Utilization - User CPU utilization and Kernel CPU usage

    • System Resource Usage - real-time memory usage

    • Lineage of the deployment - MLC Process metadata that details the deployment information and history

    • Logs - A live scroll of the model logs


    Backtest Metrics

    While some models may allow for inline efficacy monitoring, most models do not obtain ground truth until a future date, which necessitates the use of regular backtesting. ModelOp Center allows you to define metrics functions that can be used to execute regular backtests. An MLC process can automate the regular execution of a backtest to compute statistical metrics. See Model Efficacy Metrics and Monitoring for more details on defining and executing backtesting metrics.
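    The sketch below illustrates what such a backtest metrics function might look like in Python. It is a minimal example only: the # modelop.metrics smart comment, the pandas DataFrame input, and the label and prediction column names are assumptions made for illustration, not requirements of ModelOp Center.

    Code Block
    # modelop.metrics
    def metrics(df):
        # Illustrative backtest: compare model predictions against ground truth.
        # Assumes df is a pandas DataFrame with a 'label' column (actual outcomes)
        # and a 'prediction' column (model scores).
        from sklearn.metrics import roc_auc_score, f1_score

        yield {
            'AUC': roc_auc_score(df['label'], df['prediction']),
            'F1': f1_score(df['label'], df['prediction'].round())
        }
    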

    Alerting & Notifications

    Alerts, Tasks, and Notification messages provide visibility into information and actions that need to be taken as a result of model monitoring. These “messages” are surfaced through the Command Center Dashboard, but can also be tied into enterprise ticketing and alerting systems.

    The types of messages generated from Model Monitoring include:

    • Alerts - test failures, model errors, and other situations that require a response.

    • Tasks - user tasks, such as approving a model or acknowledging a failed test.

      • See the Responding to Notifications, Tasks & Alerts section below for details about viewing and responding to test failures.

    • Notifications - includes system status, runtime status and errors, model errors, and other information generated by ModelOp Center automatically.

    Responding to Notifications, Tasks & Alerts

    ModelOp Center integrates with the ticketing system of your choice, such as ServiceNow or Jira. Model Lifecycle processes can be configured to generate alerts on these events and assign them to an appropriate party for further investigation.

    Model Data Monitoring

    Although not required, ModelOp Center provides its own runtime out of the box, which can validate incoming and outgoing data from the model for adherence to a defined schema. The schema is a defined structure that the model expects; it ensures that erroneous data is not accidentally processed by the model, which could cause model stability errors or downtime.

    Overview

    ModelOp Center enforces strict typing of engine inputs and outputs at two levels: stream input/output, and model input/output. Types are declared using AVRO schema.

    To support this functionality, ModelOp Center’s Model Manage maintains a database of named AVRO schemas. Python and R models must then reference their input and output schemas using smart comments. (PrettyPFA and PFA models instead explicitly include their AVRO types as part of the model format.) Stream descriptors may either reference a named schema from Model Manage, or they may explicitly declare schemas.
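    As an illustrative sketch only, a stream descriptor that references the named schema input_schema from Model Manage (rather than declaring the schema inline) might look like the following. The descriptor fields and the "$ref" reference convention shown here are assumptions and may differ across ModelOp Center versions.

    Code Block
    {
      "Transport": {"Type": "REST"},
      "Encoding": "json",
      "Schema": {"$ref": "input_schema"}
    }
    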

    In either case, ModelOp Center performs the following type checks:

    1. Before starting a job: the input stream’s schema is checked for compatibility against the model’s input schema, and the output stream’s schema is checked for compatibility against the model’s output schema.

    2. When incoming data is received: the incoming data is checked against the input schemas of the stream and model.

    3. When output is produced by the model: the outgoing data is checked against the model’s and stream’s output schemas.

    Failures of any of these checks are reported: schema incompatibilities between the model and the input or output streams will produce an error, and the engine will not run the job. Input or output records that are rejected due to schema incompatibility appear as messages in the ModelOp runtime logs.

    Examples

    The following model takes in a record with three fields (name, x, and y), and returns the product of the two numbers.

    Code Block
    # modelop.schema.0: input_schema.avsc
    # modelop.schema.1: output_schema.avsc
    
    def action(datum):
        # Score one record: pass the name through and return the product of x and y.
        my_name = datum['name']
        x = datum['x']
        y = datum['y']
        yield {'name': my_name, 'product': x * y}
    

    The corresponding input and output AVRO schemas are:

    Code Block
    {
      "type":"record",
      "name":"input",
      "fields": [
        {"name":"name", "type":"string"},
        {"name":"x", "type":"double"},
        {"name":"y", "type":"double"}
        ]
    }
    
    {
      "type":"record",
      "name":"output",
      "fields": [
        {"name":"name", "type":"string"},
        {"name":"product", "type":"double"}
        ]
    }
    

    So, for example, this model may take as input the JSON record

    Code Block
    {"name":"Bob", "x":4.0, "y":1.5}
    

    and score this record to produce

    Code Block
    {"name":"Bob", "product":"6.0"}
    
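    By contrast, a record that does not conform to the input schema, for example one in which x arrives as a string rather than a double, fails the runtime data check described above: the record is rejected instead of being passed to the model, and the rejection appears as a message in the ModelOp runtime logs.

    Code Block
    {"name":"Bob", "x":"four", "y":1.5}
    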

    Note that in the model’s smart comments, the CLI commands, and the stream descriptor schema references, schemas are referenced by their name in Model Manage, not by filename or any other property.

    Next Article: Drift Monitoring >