Overview
A Batch Job is an execution of a model against a batch of records, used to score, test, or train the model. You can build Batch Jobs into an MLC Process, create them manually from the Command Center, or run them from the command line. The main types of Batch Jobs are detailed in the following table.
Type | Description |
---|---|
Scoring Job | Executes the Scoring Function to yield predictions for each of the records in the input data. This can be used for conducting testing or for production Batch Jobs. |
Metrics Job | Executes the Metric Function against labeled test data. This yields efficacy metrics and/or bias detection and interpretability metrics. |
Training Job | Executes the Training Function to train or re-train a model. The output is typically a trained model artifact or other type of attachment. |
For additional context on the functions, see Creating Production Ready Models.
For more information about efficacy metrics, see Model Efficacy Metrics and Monitoring.
For more information about bias and interpretability metrics, see Model Governance: Bias & Interpretability.
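To make the Scoring Job row concrete, the following is a minimal sketch of applying a scoring function to a batch so that every input record yields one prediction. The function names `score` and `run_scoring_job`, the record fields `id` and `amount`, and the threshold rule are all hypothetical and chosen only for illustration; they are not ModelOp Center's actual function signature or runtime behavior.

```python
def score(record):
    """Hypothetical scoring function: flag a record whose amount exceeds 100."""
    return {"id": record["id"], "prediction": 1 if record["amount"] > 100.0 else 0}


def run_scoring_job(records):
    """Apply the scoring function to every input record, preserving order."""
    return [score(record) for record in records]


# Illustrative input batch (field names are assumptions, not a required schema).
batch = [
    {"id": "a", "amount": 250.0},
    {"id": "b", "amount": 40.0},
]

predictions = run_scoring_job(batch)
print(predictions)
# [{'id': 'a', 'prediction': 1}, {'id': 'b', 'prediction': 0}]
```

The output has exactly one entry per input record, which is the property that lets the same job be used for both testing and production scoring.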
Batch Job Scenarios
You can use Batch Jobs for several different scenarios in a Model’s Life Cycle as detailed in the following table.
Scenario | Job Type | Description |
---|---|---|
Testing a Model | Scoring Job | Use the Scoring Job to score test data so that you can conduct functional, performance, or system testing of the model execution code. You can enable or disable schema checking during this test. |
Model Back-Test/Evaluation | Metrics Job | The Metrics Job executes the Metrics Function against labeled test data to generate evaluation metrics such as F1, Confusion Matrices, ROC Curve, and AUC. See Model Efficacy Metrics and Monitoring for more information. |
Ethical Fairness Detection | Metrics Job | Use the Metrics Job to run the Metrics Function against labeled data to generate metrics that detect ethical fairness. See Model Governance: Bias & Interpretability for more information. |
Re-Training/Refresh | Training Job | When new labeled data is available, use a Training Job to create a new trained model artifact. This can be automated in an MLC Process. See Model Lifecycle Manager: Automation for more information. |
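As an illustration of the back-test scenario, the sketch below computes two of the evaluation metrics named in the table, a confusion matrix and F1, from labeled data and predictions. It assumes binary labels where 1 is the positive class. The function names and the sample data are hypothetical, and this is not the Metrics Function that ModelOp Center itself executes.

```python
def confusion_matrix(labels, predictions):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for label, prediction in zip(labels, predictions):
        if prediction == 1 and label == 1:
            counts["tp"] += 1
        elif prediction == 1 and label == 0:
            counts["fp"] += 1
        elif prediction == 0 and label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def f1_score(counts):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denominator = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denominator if denominator else 0.0


def metrics(labels, predictions):
    """Bundle the confusion matrix and F1 into a single result dictionary."""
    counts = confusion_matrix(labels, predictions)
    return {"confusion_matrix": counts, "f1": f1_score(counts)}


# Made-up labeled test data and model predictions, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]

result = metrics(labels, predictions)
print(result["confusion_matrix"])  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
print(round(result["f1"], 4))      # 0.6667
```

Returning all metrics in one dictionary keeps the result easy to serialize as a single output record.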
Input & Output Data Sets for Batch Jobs
Batch Jobs require input data in order to run. You can upload your input data set into the Command Center or specify the location of your local input data set file from the command line. You can also leverage data that is stored in an S3-compliant storage location, and reference that data set in your Batch Job.
When saved as an Embedded File, the output is accessible under Models > Model Tests. From there you can inspect the details of the Batch Job, and use the file in a Champion/Challenger Model Comparison test.
When saved as an S3 Based File, the Download File button directs you to the requisite S3 object when you select your Batch Job from the list.
Create a Batch Job in the Command Center
Click Runtimes in the left column.
Click Create a New Batch Job. A list of models appears.
Click the model you want to test.
Provide the input data set. Click Choose File, select the file with your input data, and then click Upload File or Embed File.
Designate the name of the output data set and the location where it should be posted upon completion.
Click Create Scoring Job, Create Metrics Job, or Create Training Job at the bottom of the page to tell ModelOp Center which function to leverage.
The Job Details screen displays the status of the job as it runs.
Create a Batch Job from the CLI
Install the ModelOp CLI if it is not already installed. See the ModelOp CLI Reference for install instructions.
Type
moc job create [batchjob | testjob | trainingjob] <deployable-or-deployed-model-uuid> input.json output.json
Where:
batchjob is a Scoring Job as described earlier in this article
testjob is a Metrics Job as described earlier in this article
trainingjob is a Training Job as described earlier in this article
deployable-or-deployed-model-uuid is the uuid of a model already registered with the ModelOp Center (see the ModelOp CLI Reference for how to find these uuids)
input.json is the name of the data set to run the job against
output.json is the name of the output file
Type
moc job result <uuid>
where <uuid> is the unique identifier generated by the command in the previous step. If the output file is embedded, the results are displayed in the terminal. If the job uses an external file asset (S3) for the output, the command returns a link to the S3 object where the results are stored.
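If you want to script these two commands, the sketch below assembles their argument lists in Python, using only the subcommands and positional arguments shown above. The helper names `build_create_command`, `build_result_command`, and `run` are hypothetical. The sketch does not parse the job uuid out of the `moc job create` output, because that output format is not described in this article.

```python
import shutil
import subprocess

# Job type keywords taken from the command synopsis in this article.
JOB_TYPES = ("batchjob", "testjob", "trainingjob")


def build_create_command(job_type, model_uuid, input_file, output_file):
    """Return the argument list for `moc job create`, validating the job type."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    return ["moc", "job", "create", job_type, model_uuid, input_file, output_file]


def build_result_command(job_uuid):
    """Return the argument list for `moc job result`."""
    return ["moc", "job", "result", job_uuid]


def run(command):
    """Run a moc command and return its standard output as text."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("the moc CLI was not found on PATH")
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


# Building a command does not execute it; "<model-uuid>" is a placeholder.
create_cmd = build_create_command("batchjob", "<model-uuid>", "input.json", "output.json")
print(" ".join(create_cmd))
```

Passing the arguments as a list rather than as one shell string avoids quoting problems when file names contain spaces. To execute a job, call `run(create_cmd)`, read the job uuid from the text it returns, and pass that uuid to `build_result_command`.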