The following are steps to run a Batch Scoring Job on a Batch Deployed Model.

Pre-requisites:

Deploy a Model as Batch

Prior to running a Batch Scoring job, you should have a Model Deployed as Batch. To do so, please refer to the section https://modelopdocs.atlassian.net/wiki/spaces/dv25/pages/1655341915/Operationalizing+Models%3A+Batch#Operationalize-a-Model---Batch-Deployment-in-a-ModelOp-Runtime.

Prepare Runtimes

Identify the target Runtimes across the requisite Environments.

Note: this step may already have been completed as part of the pre-requisites. However, Runtime matching also happens when a Job is scheduled, so the engine has to match at Job creation time, not only at deployment time.

For each target Runtime, complete the following:

  1. Add “Environment/Stage Tags”. Based on the environments/stages required (see pre-requisites), add the necessary “environment/stage tag” to the runtime.

    1. Example: add a “DEV” tag to the Runtime in their development environment, an “SIT” tag to the Runtime in their SIT environment, a “UAT” tag to the Runtime in their UAT environment, and ultimately a “PROD” tag to the Runtime in their Prod environment

  2. Add “Model Service Tags”. The Model “Service” tag will be used to identify that this specific runtime is designed to be a target runtime for that particular model. Add the appropriate “Model Service Tag” to the runtime.

    1. Example: for a 3rd party credit card model, add a “cc-fraud” Model Service Tag to the “Dev”, “SIT”, “UAT”, and “Prod” runtimes.
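The tagging scheme above can be pictured with a small sketch. It is only an illustration: this page does not describe how ModelOp Center performs the matching internally, so the rule used here (a Runtime qualifies when it carries both the Model Service Tag and the Environment/Stage Tag) is an assumption, and the Runtime names and the matching_runtimes helper are hypothetical.

```python
def matching_runtimes(runtimes, model_service_tag, stage_tag):
    """Return the names of runtimes that carry both requested tags.

    Assumption: a runtime is a candidate only when it has the model
    service tag AND the environment/stage tag.
    """
    wanted = {model_service_tag, stage_tag}
    return [r["name"] for r in runtimes if wanted <= set(r["tags"])]


# Hypothetical runtimes tagged as in the examples above.
runtimes = [
    {"name": "runtime-dev", "tags": ["DEV", "cc-fraud"]},
    {"name": "runtime-prod", "tags": ["PROD", "cc-fraud"]},
    {"name": "runtime-other", "tags": ["PROD"]},
]

print(matching_runtimes(runtimes, "cc-fraud", "PROD"))  # ['runtime-prod']
```

A Runtime that has only the stage tag (runtime-other in the sketch) is not selected, which is why both kinds of tag need to be added to each target Runtime.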

Running a Batch Job with MLC

Trigger Job Creation:

Launch MLC via REST API

Example request:

curl --request POST 'http://gateway/mlc-service/rest/signalResponsive' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{token}}' \
--data-raw '{
    "name": "com.modelop.mlc.definitions.Signals_DEPLOYED_BATCH_JOB",
    "variables": {
        "TAG": {
            "value": "model-service-tag"
        },
        "MODEL_STAGE": {
            "value": "PROD"
        }
    }
}'

Additional example including a custom input asset:
{
    "name": "com.modelop.mlc.definitions.Signals_DEPLOYED_BATCH_JOB",
    "variables": {
        "TAG": {
            "value": "model-service-tag"
        },
        "MODEL_STAGE": {
            "value": "PROD"
        },
        "INPUT_ASSETS": {
            "value": "[{\"name\": \"input_data.json\",\"assetType\": \"EXTERNAL_FILE\",\"repositoryInfo\": {\"repositoryType\": \"S3_REPOSITORY\",\"secure\": false,\"host\": \"modelop\",\"port\": 9000,\"region\": \"default-region\"},\"fileUrl\": \"http://modelop:9000/modelop/input_data.json\",\"filename\": \"input_data.json\",\"fileFormat\":\"JSON\"}]",
            "type": "Object",
            "valueInfo": {
                "objectTypeName": "java.util.ArrayList<com.modelop.sdk.dataobjects.v2.assets.ExternalFileAsset>",
                "serializationDataFormat": "application/json"
            }
        }
    }
}

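Note that the INPUT_ASSETS value is the asset list serialized as an escaped JSON string, and that valueInfo carries the serialization information (the object type name and the data format).
For more information, see “Variables in the REST API” in the Camunda docs.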
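Escaping the asset list by hand is error-prone, so the request body can be generated instead. The following is a minimal sketch in Python that reproduces the structure of the example above. The build_batch_job_signal helper is hypothetical (it is not part of any ModelOp SDK), and the asset values are the ones from the example.

```python
import json


def build_batch_job_signal(tag, stage, input_assets=None):
    """Build the signal body; input_assets is an optional list of asset dicts."""
    variables = {
        "TAG": {"value": tag},
        "MODEL_STAGE": {"value": stage},
    }
    if input_assets is not None:
        variables["INPUT_ASSETS"] = {
            # Serialize the list to a JSON string first; the outer
            # json.dumps call below then escapes its quotes.
            "value": json.dumps(input_assets),
            "type": "Object",
            "valueInfo": {
                "objectTypeName": "java.util.ArrayList<com.modelop.sdk.dataobjects.v2.assets.ExternalFileAsset>",
                "serializationDataFormat": "application/json",
            },
        }
    return {
        "name": "com.modelop.mlc.definitions.Signals_DEPLOYED_BATCH_JOB",
        "variables": variables,
    }


# Asset taken from the example request above.
asset = {
    "name": "input_data.json",
    "assetType": "EXTERNAL_FILE",
    "repositoryInfo": {
        "repositoryType": "S3_REPOSITORY",
        "secure": False,
        "host": "modelop",
        "port": 9000,
        "region": "default-region",
    },
    "fileUrl": "http://modelop:9000/modelop/input_data.json",
    "filename": "input_data.json",
    "fileFormat": "JSON",
}

signal = build_batch_job_signal("model-service-tag", "PROD", [asset])
body = json.dumps(signal, indent=4)
print(body)
```

The printed body can be saved to a file and sent with the curl request shown earlier, for example through curl's --data-binary @file option.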

Launch MLC via MOC CLI

  • Make sure to have the MOC CLI installed.

  • Create a JSON file with a structure similar to the one described in the body of the request above.

  • Trigger the signal with the following command:
    moc mlc trigger --file <local file>

Additional details

Additional options for the MLC trigger through the CLI (file vs. body):

moc mlc trigger -h

Trigger/launch an MLC process by providing signal object json body using --file or --body flag.

Usage:
  moc mlc trigger [flags]

Examples:

# Trigger mlc using signal object from a file
moc mlc trigger --file ./path/to/file/signal.json

# Trigger mlc using raw json
moc mlc trigger --body {"name":"com.modelop.mlc.definitions.Signals_start_data_drift","variables":{"TAG":{"value":"model_a","type":"Object","valueInfo":{"objectTypeName":"java.lang.String","serializationDataFormat":"application/json"}}}}

Flags:
      --body string   Provide JSON body for launching the MLC
  -f, --file string   Use json from the file for launching the MLC
  -h, --help          help for trigger

Follow up:

There are several validation points for following up on a process triggered via MLC. The following diagram identifies them, and each is explained below.

  1. Use the “processInstanceId” returned by the signalResponsive endpoint mentioned above to call the following endpoint and retrieve the “jobId” from the JSON response.
    http://gateway/model-manage/api/jobHistories/search/findAllByJobMLCS_ProcessInstanceRootProcessInstanceId?processInstanceId={processInstanceId}

  2. Use the “jobId” returned by the previous call to check the status of the job at the following endpoint.
    http://gateway/model-manage/api/jobs/{jobId}
    The job state returned here shows whether the job finished successfully, finished in error, or is still running.

  3. If the job never ran because of an error during the MLC, follow up on the incidents of the running process instance through this endpoint.
    http://gateway/mlc-service/rest/incident?processInstanceId={processInstanceId}
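The three checks above can be scripted. As a minimal sketch, the helpers below only build the URLs for each step. The function names are hypothetical, “http://gateway” is the placeholder host used throughout this page, and the IDs in the usage lines are made up. No assumption is made about the shape of the JSON responses, so parsing them is left out.

```python
from urllib.parse import quote

GATEWAY = "http://gateway"  # placeholder host, as in the endpoints above


def job_history_url(process_instance_id, gateway=GATEWAY):
    """Step 1: look up the job history by root process instance ID."""
    return (
        f"{gateway}/model-manage/api/jobHistories/search/"
        "findAllByJobMLCS_ProcessInstanceRootProcessInstanceId"
        f"?processInstanceId={quote(process_instance_id, safe='')}"
    )


def job_url(job_id, gateway=GATEWAY):
    """Step 2: check the state of the job."""
    return f"{gateway}/model-manage/api/jobs/{quote(job_id, safe='')}"


def incident_url(process_instance_id, gateway=GATEWAY):
    """Step 3: list incidents when the job never ran."""
    return (
        f"{gateway}/mlc-service/rest/incident"
        f"?processInstanceId={quote(process_instance_id, safe='')}"
    )


def follow_up_urls(process_instance_id, job_id=None):
    """Return the URLs to check, in order.

    With a job ID, check the job itself (step 2); without one,
    fall back to the incidents of the process instance (step 3).
    """
    urls = [job_history_url(process_instance_id)]
    if job_id is not None:
        urls.append(job_url(job_id))
    else:
        urls.append(incident_url(process_instance_id))
    return urls


# Made-up IDs, for illustration only.
for url in follow_up_urls("example-process-id", job_id="example-job-id"):
    print(url)
```

Each URL can then be requested with the same Authorization header as the signalResponsive call.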

Running a Batch Job with CLI

Trigger Job Creation:

Create Job via MOC CLI

  • Make sure to have the MOC CLI installed.

  • Create a JSON file with a structure similar to the one described in the body of the request above.

  • Retrieve the deployment id
    moc deployment ls <storedModel name> --state deployed --tag <target stage>

  • Create the job with the following command:
    moc job create deployedbatch <deployedModel ID> <input_file> <output_file> [flags]
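When the job is created from a script, it helps to assemble the command as an argument list rather than a single string. The sketch below does that in Python. The deployed_batch_command helper is hypothetical, it covers only three of the flags listed in the command help (--upload-input, --upload-output, --engine), and the deployment ID is the sample one from the help examples.

```python
import shlex


def deployed_batch_command(deployment_id, input_file, output_file,
                           upload_input=False, upload_output=False,
                           engine=None):
    """Build the argument list for 'moc job create deployedbatch'."""
    argv = ["moc", "job", "create", "deployedbatch",
            deployment_id, input_file, output_file]
    if upload_input:
        argv.append("--upload-input")
    if upload_output:
        argv.append("--upload-output")
    if engine:
        argv += ["--engine", engine]
    return argv


argv = deployed_batch_command(
    "4e4a19c7-2acb-4337-83c6-d0cc82db5a96",
    "input.json",
    "output.json",
    upload_output=True,
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(argv))
```

The list can be passed to subprocess.run(argv, check=True) on a machine where the MOC CLI is installed and configured. Passing a list avoids shell quoting problems with values such as SQL connection URLs.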

Additional details

Additional options for creating a batch scoring job through the CLI for a model deployed as batch:

moc job create deployedbatch -h

Create and run a deployed batch job in ModelOp Center using a deployment ID, an input file, and an output file name.

Input can be provided by the following methods:
    • Provide the path to a local file that will be embedded in the database for input. There is a size limit of 10MB for embedding files. If unsure of the file size, use the --force flag; this will not fail the command, and will push the file to the S3 bucket configured with ModelOp Center.
    • Provide the path to a local file and use the --upload-input flag to push the file to the S3 bucket configured with ModelOp Center.
    • Provide a URL to an existing file in the S3 bucket in the format [http/s3/S3n/S3a]://Domain/PATH/file.txt. The credentials should be configured with the ModelOp Center. When using a URL of a file in S3, use the --input-region flag to provide the S3 region. If the URL is not using one of the schemes - "http", "s3", "s3n", "s3a", use the --external-input flag to enforce the URL to be an external asset URL.
    • Provide a SQL asset as input. Use the connection URL as the input URL, e.g., mysql://username:password@host:3306/db_name. The query can be provided using the --input-query flag, and additional parameters can be provided using the --input-param flag. The --input-param flag can be used multiple times and the query parameters will be stored in the order the flags are provided. If the connection URL is not using one of the schemes - "mysql", "sqlserver", "snowflakedsiidriver", "db2", use the --sql-input flag to enforce the URL to be used as a SQL connection URL.
    • Provide a HDFS asset using a URL, e.g., hdfs:///hadoop/demo/test_model/sample_data.csv. If the URL is not using the "hdfs" scheme, use the --hdfs-input flag to enforce the URL to be a HDFS asset URL.
    • Use an existing asset from the storedModel by providing the asset name, e.g., ref:asset_tes.json

Output can be provided in a similar way to the input, but uses output-related flags:
    • Provide a name for the output file that will be embedded in the database.
    • Provide a name of the file, and use the --upload-output flag to push the file to the S3 bucket configured with ModelOp Center.
    • Provide a URL to an existing file in the S3 bucket in a similar format as the input file. When using a URL of a file in S3, use the --output-region flag to provide the S3 region. If the URL is not using one of the schemes - "http", "s3", "s3n", "s3a", use the --external-output flag to enforce the URL to be external asset URL.
    • Provide a SQL asset as output. Use connection URL as the output URL, e.g., mysql://username:password@host:3306/db_name. The query can be provided using the --output-query flag, and additional parameters can be provided using the --output-param flag. The --output-param flag can be used multiple times and the query parameters will be stored in the order the flags are provided. If the connection URL is not using one of the schemes - "mysql", "sqlserver", "snowflakedsiidriver", "db2", use the --sql-output flag to enforce the URL to be used as a SQL connection URL.
    • Provide a HDFS asset using URL, e.g., hdfs:///hadoop/demo/test_model/sample_data.csv. If the URL is not using the "hdfs" scheme, use the --hdfs-output flag to enforce the URL to be a HDFS asset URL.
    • Use an existing asset from the storedModel by providing the asset name, e.g., ref:output.json

Once the job is created, an engine is assigned to the job based on the MLC used for engine-to-job assignments. To specify the target engine, use the --engine flag and provide the engine name where the job should run.

By default, schema checking is disabled for all jobs. To enable input and/or output schema checking, use the --input-schema-check, --output-schema-check or --schema-check flags.

The deployment provided as a command argument should be a batch deployment and not a persistent deployment (endpoint deployment). To make sure that the batch model is in the DEPLOYED state, use --enforce-deployed flag.

By default, the command creates a job of type MODEL_BATCH_JOB using the deployed model provided. To create a job of type MODEL_BATCH_TRAINING_JOB and MODEL_BATCH_TEST_JOB, use --training-job and --test-job flag respectively.

Usage:
  moc job create deployedbatch <deployedModel ID> <input_file> <output_file> [flags]

Examples:

# Input: local file (embed) - Output: Empty embedded file
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 input.json output.json

# Input: local file (embed) - Output: Create empty S3 file
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 input.json output.json --upload-output

# Input: Upload local file to S3 - Output: Create empty S3 file
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 input.json --upload-input output.json --upload-output

# Input: URL to file in S3 bucket configured with ModelOp Center - Output: Create empty S3 file
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 https://modelop.s3.us-east-2.amazonaws.com/test_model_data/input.json --input-region us-east-2 output.json --upload-output

# Input: SQL asset - Output: SQL asset
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 mysql://username:password@host:3306/db_name --input-query 'SELECT symbol,price FROM xyz_table' mysql://username:password@host:3306/db_name --output-query 'INSERT INTO test_output (value) VALUES (?)' --output-param total

# Input: URL to file in S3 bucket configured with ModelOp Center - Output: SQL asset
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 https://modelop.s3.us-east-2.amazonaws.com/test_model_data/input.json --input-region us-east-2 mysql://username:password@host:3306/db_name --output-query 'INSERT INTO test_output (value) VALUES (?)' --output-param total

# Input: HDFS asset - Output: HDFS asset
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 hdfs:///hadoop/demo/test_model/sample_data.csv hdfs:///hadoop/demo/test_model/sample_output.csv

# Input: Referenced asset from storedModel - Output: Referenced asset from storedModel
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 ref:input.sql ref:output.sql

# Input - Upload local file to S3 and Output - Create empty S3 file, create deployed batch test job
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 input.json --upload-input output.json --upload-output --test-job

# Input - SQL asset and Output - Create empty S3 file, create deployed batch training job
moc job create deployedbatch 4e4a19c7-2acb-4337-83c6-d0cc82db5a96 mysql://username:password@host:3306/db_name --input-query 'SELECT symbol,price FROM xyz_table' output.json --upload-output --training-job

Flags:
      --enforce-deployed           Enforce the state to be DEPLOYED on the model provided
      --engine string              Specify target engine name where the job should run
      --external-input             Force the URL provided for input to be an external S3 file URL
      --external-output            Force the URL provided for output to be an external S3 file URL
  -f, --force                      In case file is too large to be stored as a local asset, store it as an external asset
      --hdfs-input                 Force the URL provided for input to be a HDFS URL
      --hdfs-output                Force the URL provided for output to be a HDFS URL
  -h, --help                       help for deployedbatch
      --input-param stringArray    Provide parameters for the input SQL query
      --input-query string         Provide a query string for the input SQL asset
      --input-region string        Provide the region for the input S3 URL
      --input-schema-check         Enable schema checking on input
      --output-param stringArray   Provide parameters for the output SQL query
      --output-query string        Provide a query string for the output SQL asset
      --output-region string       Provide the region for the output S3 URL
      --output-schema-check        Enable schema checking on output
      --schema-check               Enable schema checking on both input and output
      --sql-input                  Force the URL provided for input to be a SQL connection string
      --sql-output                 Force the URL provided for output to be a SQL connection string
      --test-job                   Create MODEL_BATCH_TEST_JOB with the model provided
      --training-job               Create MODEL_BATCH_TRAINING_JOB with the model provided
      --upload-input               Upload the input file provided to the S3 bucket configured with ModelOp Center
      --upload-output              Create an output file with the provided name in the S3 bucket configured with ModelOp Center

For more information about the CLI command, see the moc job docs.

Follow up:

The above command returns the jobId, which can be used with the following REST API endpoint to query the job status:
http://gateway/model-manage/api/jobs/{jobId}
