Model Lifecycle Management: Overview
This article describes the ModelOp Center’s MLC (Model Life Cycle) Manager and the MLC Process, and how they are used to drive automation and repeatability for a robust ModelOps program.
Overview
The ModelOp Life Cycle Manager (MLC Manager) automates operations related to the productionization, monitoring, and governance of models so that you can get them into service quickly, keep track of how each model is performing, and have easy access to the entire history of each model. A large enterprise can have hundreds or thousands of models, each with different business requirements and a different pathway to production. The MLC Manager provides flexibility in how you manage and automate portions of a model’s life cycle to meet the disparate needs across groups, all in a central, governed location.
There are two core concepts to how ModelOp Center achieves enterprise-scale automation and repeatability: the MLC Manager and the MLC Process. The subsequent sections provide more detail on each, and then dive into several scenarios of how to leverage the MLC Manager and MLC Processes.
MLC Manager
The MLC Manager is a low-code automation framework that executes, monitors, and manages MLC Processes. The MLC Manager is built on top of Camunda: a leading Java-based framework supporting Business Process Model and Notation (BPMN) for workflow and process automation. The MLC Manager is the answer to a number of obstacles faced by teams:
Reduces the time it takes to get a model from the model factory into production by defining a consistent methodology within your business to move the model through each required step, and track its progress throughout your organization.
Ensures that all models in production produce optimal results and stay within compliance rules.
Scales the functions necessary to manage the hundreds or thousands of models across the enterprise, controlling the most important tasks and processes for a variety of different models.
MLC Process
The MLC Process encodes and automates a set of steps in a model’s life cycle, which can range from model registration, to submitting models for full productionization, to continuous production testing, and eventual retirement. The MLC Manager executes and monitors each MLC Process, and automatically captures metadata and information about the model’s journey through the MLC Process.
An MLC Process can apply to an individual model or a set of models, using common criteria such as business unit, model language, or the model framework they employ. Regardless, the MLC Process provides the consistent methodology for managing the various pathways of a model’s journey in an enterprise, across all models and all groups. This could include highly regulated models that require strict regulatory oversight, or rapid deployment internal-use-only models that require a minimal process.
A typical ModelOp Center implementation will have more than one MLC Process. Each MLC Process is defined in any BPMN compliant editor, such as Camunda Modeler, as a BPMN file.
MLC Processes leverage the standard elements of a Camunda BPMN along with custom delegates that interface with ModelOp Center. This allows the flexibility to orchestrate complex operations within ModelOp Center. The common entities within an MLC Process include:
Signal events - events that initiate the MLC Process or trigger an action to occur from within an MLC. These can be triggered on events such as when a model is changed or based on a timer.
Tasks - there are a variety of tasks within an MLC Process:
User tasks - manual tasks for specific users to perform, such as approvals. These pause the progress of the workflow until completed.
External service calls - used to integrate and interact with other systems.
Script tasks - run custom code, including inline Groovy. Typically, you use variables and model metadata to determine parameters for calls to ModelOp Center.
ModelOp Center calls - specific calls to ModelOp Center that automate interactions with the model including Batch Jobs (see: Model Batch Jobs and Tests) and Model Deployments.
Gateway - decision logic gates that control the flow based on information in the process, such as model metadata, test results, etc.
The automated operations within an MLC Process include collecting key metrics to help calculate Key Performance Indicators (KPI), such as how long it takes to get Models into Business or get changes approved.
For more details on the standard elements of BPMN 2.0, you can see the full documentation of Camunda at https://camunda.com/bpmn/reference/.
Scenarios for MLC Processes
The MLC Manager provides flexibility with how you manage and automate the various life cycles of models across the enterprise. Each model in the enterprise can take a wide variety of paths to production, have different patterns for monitoring, and have various continuous improvement or retirement steps.
MLC Processes are often triggered by external events. Some typical examples of this include:
a model is marked ready for productionization
a time-based event
new data arrives in a location
a notification is received
a manual intervention by a user
In fact, the ModelOp Command Center has several UI features that use MLC Processes under the hood. These include creating a snapshot, which begins the process for productionization, and creating a Job from the Jobs page, which handles executing the job. MLC Processes can therefore automate a wide range of tasks. The following sections provide more detail on these typical processes.
Model Productionization
MLC Processes can automate the productionization of a model, regardless of whether the path to production is simple or complex. For example, you can use an MLC Process to deploy a newly registered model into your QA runtime, run the model through a series of tests, trigger an automated security scan, and seek appropriate approvals before it is deployed into Production. MLC Processes can be created in a flexible manner to meet the needs of your team. They can be configured to automatically locate an available runtime that is compatible with the current model, or a specific group of runtimes can be targeted by tags. See this article for more details on how this is accomplished. The example in Deploy with Test and Jira includes these deployment pieces.
Model Refresh & Retraining
After the initial deployment, it’s important to have a way to rapidly retrain or trigger the refresh of a model to ensure it is performing optimally. Retraining can be automated within an MLC process to run on a schedule or when new labeled data is available. Using the same MLC process, the new candidate model can be compared against the current deployed model using a Champion/Challenger Model Comparison. Finally, the MLC Process can automate the steps required for Change Management including re-testing and approvals. The example Deploy with Test and Jira demonstrates how you can build these operations into an MLC Process.
Approval & Tasks
Throughout the MLC Process, you can include User Tasks to direct specific team members or roles to review and approve changes to the models. ModelOp Center integrates with existing IT task management systems, such as JIRA, and ticketing systems, such as ServiceNow, to incorporate these user-based tasks. For each of these externally created tasks or approvals, the MLC can inject model-specific metadata to provide context for the task or approval.
Monitoring Models
You can monitor models using MLC Processes by automatically running Model Batch Jobs and Tests on a model. You can run Batch Jobs on a schedule or based on new labeled or ground truth data becoming available. For example, you can run a Batch Metrics Job using the Run Back Test MLC to calculate the statistical performance and/or determine if the model has started to produce ethically biased predictions, and then use decision criteria to determine which action to take. A common pattern is to generate an alert into ModelOp Center for the ModelOp Support Team to triage.
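The decision step in such a monitor can be pictured as a small rule check over the metrics that a Batch Metrics Job produces. The Python sketch below is illustrative only: the metric names (auc, disparate_impact), the thresholds, and the action labels are hypothetical, and in an MLC Process this kind of logic would typically be expressed in a DMN file or a gateway rather than in Python.

```python
def choose_action(metrics, thresholds):
    """Return "raise_alert" if any metric breaches its bounds, else "no_action".

    thresholds maps a metric name to a (lower_bound, upper_bound) pair;
    None means that side is unbounded. All names here are hypothetical.
    """
    for name, (lower, upper) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is treated as a failure so that it gets triaged.
            return "raise_alert"
        if lower is not None and value < lower:
            return "raise_alert"
        if upper is not None and value > upper:
            return "raise_alert"
    return "no_action"


# Hypothetical criteria: a minimum AUC and an acceptable disparate-impact band.
THRESHOLDS = {"auc": (0.70, None), "disparate_impact": (0.80, 1.25)}
```

Treating a missing metric as a breach is a design choice in this sketch: it errs toward alerting the support team rather than silently passing a test that produced incomplete results.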
Sample MLC Processes
This section describes some specific examples of MLC Processes in detail.
Deploy with run test and Jira MLC Process
Deploy with run test and Jira is an MLC Process that incorporates several of the patterns described earlier in a single MLC Process for managing the creation of a model, or changes to an existing model.
Model Submitted - a deployable model object has been created, which is a snapshot in time of the model with the typical goal of moving the model into production. Clicking “Create New Snapshot” in the Model Details page triggers the start of this MLC process.
Training - based on a metadata flag, a Training Job can be initiated to train the model. A service task automatically polls checking for the Training Job to finish before proceeding.
Testing - based on a metadata flag “run_test” (read from a snapshot-level tag), an automated, reproducible Metrics Job executes, and the results are persisted in ModelOp Center. If the test process fails, a Jira ticket is created with the failure information, and another test run is attempted if the Jira ticket is moved to Done.
Approval Based on Test Results - based on a metadata flag “jira” (read from a snapshot-level tag), the previous test results are analyzed if a DMN file is associated with the model. The details of the model, including all of the core information about the model, the changes to the model, and the test results, are passed on to the reviewer on the Jira ticket.
Model Deployment - the MLC Process receives a list of ModelOp Runtimes. If the matching runtime has no endpoints defined, the model is deployed as batch, leaving it ready for scoring on the matching runtime at the time of batch execution; otherwise, it is deployed as an Online deployment.
Error handling - when models are rejected, errors occur while running the tests, or the model fails to deploy, the process creates Jira tasks to review the reasons for failure so that reviewers can take the appropriate actions.
A rejection Jira ticket is created with the details for the reviewer if:
A Metrics Test Job (step 3) was intended to run, but the model is missing test data
The job execution failed
The analyzed test results didn’t pass the DMN criteria
The generated Jira review was rejected
An error Jira ticket is created if other general errors are found during the process of deployment.
Update Expiration Date - when the Jira ticket created for the test results is moved to “Done”, the expiration date on the snapshot is updated to the value of the ticket’s Expiration Date field. If the Expiration Date field is not present in the Jira project, the snapshot’s expiration date is updated to next year.
Add Schedule - after the model is deployed, any schedules present on the previously deployed snapshot are copied to the newly deployed snapshot of the same model.
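Two of the branching rules above are compact enough to sketch in code. The Python below is one reading of the Model Deployment and Update Expiration Date steps, not the MLC Process implementation: the function names, the list-of-endpoints argument, the ISO date format of the Jira field, and the interpretation of “next year” as one year from today are all assumptions.

```python
from datetime import date


def deployment_mode(runtime_endpoints):
    """Batch when the matching runtime defines no endpoints, online otherwise."""
    return "online" if runtime_endpoints else "batch"


def resolve_expiration(jira_fields, today):
    """Pick the snapshot expiration date from a (hypothetical) dict of Jira fields."""
    raw = jira_fields.get("Expiration Date")
    if raw:
        # Assumes the field holds an ISO date such as "2026-01-15".
        return date.fromisoformat(raw)
    # Field absent or empty: fall back to one year from today (an assumption).
    try:
        return today.replace(year=today.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        return today.replace(year=today.year + 1, day=28)
```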
Run Back Test MLC Process
The Run Back Test is a simple monitor that runs a test against a new set of labeled data for a given model.
Start Event - a triggered signal event initiates the monitor. This signal (com.modelop.mlc.definitions.Signals_Run_Back_Test_Jira / com.modelop.mlc.definitions.Signals_Run_Back_Test_ServiceNow) can be triggered through a REST API call that provides the variables used during the process.
Get model - based on the MODEL_ID signal variable, the process fetches the snapshot.
Get data - based on the variables provided with the signal, the input and output are determined. If the signal has INPUT_FILE, the process uses it as input to the job; otherwise, it finds an asset with the TEST_DATA role on the model. Similarly, if OUTPUT_FILE is provided with the signal, it is used for storing the job’s output. Otherwise, the process creates an embedded output file and uses it for the job.
Run and Analyze test - runs a Metrics Test batch job to evaluate the model with the new data. If the model has an associated DMN file, it is used to determine the success criteria for the test results.
Test Passed - generates a notification stating that the test passed
Test Failed - if the test fails, a Jira/ServiceNow ticket is created. The details of the model, including all of the core information about the model, the changes to the model, and the failed test results, are passed on to the reviewer on the Jira/ServiceNow ticket.
Error Handling - the following scenarios apply if an error or exception occurs in the process:
a. If there is an exception while running a test job, a Jira/ServiceNow review ticket is created and all the exception details are passed on to the reviewer.
b. If any exception or error occurs during the process, a notification is generated with the failure reason to notify the user.
For more details on how to run this MLC see the article on triggering metrics tests.
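As a rough illustration of the REST trigger, the sketch below builds the HTTP request without sending it. It assumes that the MLC Manager exposes the standard Camunda 7 REST API (where signals are delivered with POST /signal) under the /mlc-service/rest path mentioned later in this article, and that signal variables are passed as plain strings. The helper name, host, and placeholder values are hypothetical; check your environment before relying on any of them.

```python
import json
import urllib.request


def build_signal_request(base_url, token, signal_name, variables):
    """Build (but do not send) a Camunda 7 style POST /signal request."""
    body = {
        "name": signal_name,
        # Camunda expects each variable as an object with "value" and "type".
        "variables": {
            key: {"value": value, "type": "String"}
            for key, value in variables.items()
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mlc-service/rest/signal",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


# Placeholder values; only MODEL_ID is shown because it is the one variable
# whose meaning the steps above spell out.
request = build_signal_request(
    base_url="http://moc.example.com",
    token="<access_token>",
    signal_name="com.modelop.mlc.definitions.Signals_Run_Back_Test_Jira",
    variables={"MODEL_ID": "<model-id>"},
)
# To actually send it: urllib.request.urlopen(request)
```

INPUT_FILE and OUTPUT_FILE could be added to the variables dictionary in the same way, but their expected value format is not described here, so they are left out of the example.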
Run Batch Model Job MLC Process
Run Batch Model Job is an MLC Process that triggers the execution of a scoring job on a batch-deployed model. For this MLC Process to pick up the model correctly, the model must have been deployed as batch.
Start event - a triggered signal event initiates the execution. This signal (com.modelop.mlc.definitions.Signals_DEPLOYED_BATCH_JOB) should contain a variable with the TAG of the model (model service tag) to run.
Get Deployed Model - based on the tag given in the input signal, the process finds the most recent batch deployed model in “deployed” (active) state using the MODEL_STAGE (if provided) to match the deployment target of this execution.
Set inputs and outputs in order - Creates the input and output parameters from the provided signal variables.
Get compatible runtime - Finds the matching target runtime to run this scoring job.
Run job - runs the Scoring batch job to obtain the model’s inferences on the given data.
Error handling - if certain conditions are not met, an error is raised and a Jira notification is created.
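The Get Deployed Model step can be read as a filter followed by a “most recent” pick. The sketch below illustrates that reading over plain dictionaries. The record fields (tag, state, deployment_type, stage, created) are hypothetical stand-ins chosen for the example, not ModelOp Center’s actual data model.

```python
def find_deployed_model(deployments, tag, model_stage=None):
    """Return the newest active batch deployment for a tag.

    deployments is a list of dicts with hypothetical keys; "created" is
    assumed to be an ISO-8601 timestamp string, which sorts chronologically.
    """
    candidates = [
        d for d in deployments
        if d["tag"] == tag
        and d["state"] == "deployed"
        and d["deployment_type"] == "batch"
        # MODEL_STAGE is optional; when absent, any stage matches.
        and (model_stage is None or d["stage"] == model_stage)
    ]
    if not candidates:
        # Mirrors the error-handling step: an unmet condition raises an error.
        raise LookupError(f"no active batch deployment found for tag {tag!r}")
    return max(candidates, key=lambda d: d["created"])
```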
Create & Deploy a New MLC Process Using Camunda Modeler
1. Download and install Camunda Modeler on your local machine. Go to https://camunda.com/download/modeler/. (Supported versions: 4.8.1+)
2. Create a new BPMN. Go to https://docs.camunda.org/manual/7.13/modeler/bpmn/ for a quickstart on BPMN modeling.
3. When the MLC Process is ready, select the Deploy icon to put it in the MLC Manager. Note: while the URL will be highly dependent on your environment’s exact setup, it likely uses a path such as
http://<moc-base-url>/mlc-service/rest
3(a). If your target URL is in an OAuth2-secured environment, Camunda Modeler will ask for an authentication method. Provide a bearer token, which you can obtain by logging in to your secured ModelOp Center:
curl --location --request POST 'http://<moc-base-url>/gocli/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'username=<user>' \
--data-urlencode 'password=<password>'
In the command above, replace <moc-base-url> with your environment’s base URL and supply the corresponding <user> and <password>.
Retrieve the “access_token” from the response of the command above and paste it in the “Token” field presented.
4. Verify that your new MLC Process is registered with MLC Manager. Go to the Command Center and click the Models icon in the left column.
5. For Camunda Modeler version 5.x.x, make sure to select Camunda Platform 7 when creating a new BPMN file.
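For scripted use, the token retrieval in step 3(a) can be approximated in Python. This sketch mirrors the curl command shown in that step: it builds the same form-encoded POST to /gocli/token and pulls access_token out of the response. The function names are hypothetical, and the response is assumed to be a JSON object containing an access_token field, which the step implies but does not show.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url, user, password):
    """Build (but do not send) the form-encoded token request."""
    form = urllib.parse.urlencode({"username": user, "password": password})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/gocli/token",
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_access_token(response_text):
    """Return the access_token value from a JSON response body."""
    payload = json.loads(response_text)
    token = payload.get("access_token")
    if not token:
        raise ValueError("response did not contain an access_token field")
    return token


# Usage against a real environment (not executed here):
# with urllib.request.urlopen(build_token_request(url, user, pw)) as resp:
#     token = extract_access_token(resp.read().decode("utf-8"))
```

The returned string is the value to paste into the “Token” field in Camunda Modeler.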