Triggering Model Test MLCs
Three MLCs can be triggered with a REST call to run a model test:
Process name - CronTriggeredDriftTest.bpmn
Process name - CronTriggeredConceptDrift.bpmn
Process name - CronTriggeredBackTest.bpmn
Drift Test
Prerequisites:
A drift model whose metrics function takes two inputs: the input data as the first input and the baseline/training data as the second.
An input data file - this can be provided when adding the drift model as the associated model, or when triggering the MLC.
A baseline/training data file - this can be provided as a model asset on the base model for which the drift test will run, or when triggering the MLC.
An output file - required only if the output must be stored externally (as an S3, SQL, or HDFS asset). It can be provided when triggering the signal. By default, the process creates an embedded output file for the test.
Running the Drift test:
1. Create a snapshot of the drift model (this snapshot will be used for the association).
2. Add a baseline/training data file to the base model, if available (it can instead be provided later in the REST call that triggers the process).
3. Create a snapshot of the base model. When creating the snapshot:
   a. Add an associated model using the snapshot created in step 1.
   b. Use “Data Drift Model” as the association role.
   c. Provide the input data asset, if available (it can instead be provided later in the REST call that triggers the process).
   d. (Optional) Provide the DMN (test data comparator) file that compares the output of the drift test.
4. Make an HTTP POST request to {modelop-center-url}/mlc-service/api/signal (this might require a JWT token for authentication) using the following request body:
{
  "name": "com.modelop.mlc.definitions.Signals_MODEL_DATA_DRIFT_TEST",
  "variables": {
    "MODEL_ID": { "value": "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d" },
    "INPUT_FILE": {
      "value": {
        "assetId": "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
        "name": "input.csv",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/input.csv",
        "filename": "input.csv",
        "fileFormat": "CSV",
        "fileSize": 0
      }
    },
    "BASELINE_DATA_FILE": {
      "value": {
        "assetId": "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
        "name": "baseline.csv",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/baseline.csv",
        "filename": "baseline.csv",
        "fileFormat": "CSV",
        "fileSize": 0
      }
    },
    "OUTPUT_FILE": {
      "value": {
        "assetId": "c95c5338-36b9-421f-917b-ceabc8f7d821",
        "name": "output.json",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/output.json",
        "filename": "output.json",
        "fileFormat": "JSON",
        "fileSize": 0
      }
    }
  }
}
Notes:
“value” of MODEL_ID is the ID of the base model snapshot created in step 3.
“value” of INPUT_FILE can be any asset type - File, S3, SQL, HDFS. If the input data was provided in step 3(c), “INPUT_FILE” can be removed from the request body.
“value” of BASELINE_DATA_FILE can be any asset type - File, S3, SQL, HDFS. If the baseline data was provided in step 2, “BASELINE_DATA_FILE” can be removed from the request body.
“value” of OUTPUT_FILE can be any asset type - File, S3, SQL, HDFS. “OUTPUT_FILE” is optional: if it is omitted from the request body, the MLC creates a new file that is embedded in the job.
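The request described above can be sketched in Python using only the standard library. This is a minimal sketch, not ModelOp's client: the helper names (hdfs_asset, build_drift_signal, send_signal) are hypothetical, and sending the JWT as an "Authorization: Bearer" header is an assumption about how the token is expected. The signal name, endpoint path, and asset fields are copied from the request body above.

```python
import json
import urllib.request

SIGNAL_NAME = "com.modelop.mlc.definitions.Signals_MODEL_DATA_DRIFT_TEST"


def hdfs_asset(asset_id, file_url, file_format):
    """Build an HDFS asset descriptor shaped like the ones in the example body."""
    filename = file_url.rsplit("/", 1)[-1]
    return {
        "assetId": asset_id,
        "name": filename,
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": {"repositoryType": "HDFS_REPOSITORY", "port": 0},
        "fileUrl": file_url,
        "filename": filename,
        "fileFormat": file_format,
        "fileSize": 0,
    }


def build_drift_signal(model_id, input_file=None, baseline_file=None, output_file=None):
    """Build the signal body, leaving out any file that was not supplied."""
    variables = {"MODEL_ID": {"value": model_id}}
    optional = {
        "INPUT_FILE": input_file,
        "BASELINE_DATA_FILE": baseline_file,
        "OUTPUT_FILE": output_file,
    }
    for key, asset in optional.items():
        if asset is not None:
            variables[key] = {"value": asset}
    return {"name": SIGNAL_NAME, "variables": variables}


def send_signal(base_url, body, jwt_token=None):
    """POST the body to the signal endpoint and return the HTTP status code."""
    headers = {"Content-Type": "application/json"}
    if jwt_token:
        # Assumption: the token is passed as a standard bearer token.
        headers["Authorization"] = "Bearer " + jwt_token
    request = urllib.request.Request(
        base_url.rstrip("/") + "/mlc-service/api/signal",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Baseline data is assumed to be attached to the base model already (step 2),
    # so only the input file is passed here. Nothing is sent over the network.
    body = build_drift_signal(
        "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d",
        input_file=hdfs_asset(
            "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
            "hdfs:///hadoop/demo/titanic-spark/input.csv",
            "CSV",
        ),
    )
    print(json.dumps(body, indent=2))
```

To actually trigger the MLC, pass the body to send_signal together with the ModelOp Center URL and, if required, a token.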
Concept-Drift Test
Prerequisites:
A concept-drift model whose metrics function takes two inputs: the input data as the first input and the baseline/training data as the second.
An input data file - this can be provided when adding the concept-drift model as the associated model, or when triggering the MLC.
A baseline/training data file - this can be provided as a model asset on the base model for which the concept-drift test will run, or when triggering the MLC.
An output file - required only if the output must be stored externally (as an S3, SQL, or HDFS asset). It can be provided when triggering the signal. By default, the process creates an embedded output file for the test.
Running the Concept-Drift test:
1. Create a snapshot of the concept-drift model (this snapshot will be used for the association).
2. Add a baseline/training data file to the base model, if available (it can instead be provided later in the REST call that triggers the process).
3. Create a snapshot of the base model. When creating the snapshot:
   a. Add an associated model using the snapshot created in step 1.
   b. Use “Concept Drift Model” as the association role.
   c. Provide the input data asset, if available (it can instead be provided later in the REST call that triggers the process).
   d. (Optional) Provide the DMN (test data comparator) file that compares the output of the concept-drift test.
4. Make an HTTP POST request to {modelop-center-url}/mlc-service/api/signal (this might require a JWT token for authentication) using the following request body:
{
  "name": "com.modelop.mlc.definitions.Signals_MODEL_CONCEPT_DRIFT_TEST",
  "variables": {
    "MODEL_ID": { "value": "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d" },
    "INPUT_FILE": {
      "value": {
        "assetId": "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
        "name": "input.csv",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/input.csv",
        "filename": "input.csv",
        "fileFormat": "CSV",
        "fileSize": 0
      }
    },
    "BASELINE_DATA_FILE": {
      "value": {
        "assetId": "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
        "name": "baseline.csv",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/baseline.csv",
        "filename": "baseline.csv",
        "fileFormat": "CSV",
        "fileSize": 0
      }
    },
    "OUTPUT_FILE": {
      "value": {
        "assetId": "c95c5338-36b9-421f-917b-ceabc8f7d821",
        "name": "output.json",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/output.json",
        "filename": "output.json",
        "fileFormat": "JSON",
        "fileSize": 0
      }
    }
  }
}
Notes:
“value” of MODEL_ID is the ID of the base model snapshot created in step 3.
“value” of INPUT_FILE can be any asset type - File, S3, SQL, HDFS. If the input data was provided in step 3(c), “INPUT_FILE” can be removed from the request body.
“value” of BASELINE_DATA_FILE can be any asset type - File, S3, SQL, HDFS. If the baseline data was provided in step 2, “BASELINE_DATA_FILE” can be removed from the request body.
“value” of OUTPUT_FILE can be any asset type - File, S3, SQL, HDFS. “OUTPUT_FILE” is optional: if it is omitted from the request body, the MLC creates a new file that is embedded in the job.
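As the notes say, every file variable can be dropped when the corresponding asset is already attached or the default embedded output is acceptable, which leaves a body containing only the signal name and MODEL_ID. The sketch below illustrates that minimal case. The function name concept_drift_signal is hypothetical, and rejecting unknown variable names is a convenience of this sketch, not documented behavior of the MLC service.

```python
import json

SIGNAL_NAME = "com.modelop.mlc.definitions.Signals_MODEL_CONCEPT_DRIFT_TEST"
FILE_VARIABLES = {"INPUT_FILE", "BASELINE_DATA_FILE", "OUTPUT_FILE"}


def concept_drift_signal(snapshot_id, **assets):
    """Build the signal body; assets are optional file descriptors keyed by variable name."""
    unknown = set(assets) - FILE_VARIABLES
    if unknown:
        # Catch typos such as BASELINE_FILE before the request is sent.
        raise ValueError("unknown variables: " + ", ".join(sorted(unknown)))
    variables = {"MODEL_ID": {"value": snapshot_id}}
    for name, asset in assets.items():
        variables[name] = {"value": asset}
    return {"name": SIGNAL_NAME, "variables": variables}


# Input and baseline data are assumed to be attached already (steps 2 and 3(c)),
# and the default embedded output file is used, so only MODEL_ID is needed.
payload = json.dumps(concept_drift_signal("5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d"), indent=2)
print(payload)
```

An external output location can still be added when needed, for example by passing OUTPUT_FILE=<asset descriptor> as a keyword argument.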
Back Test
Prerequisites:
An input data file - this can be provided as a model asset on the base model, or when triggering the MLC.
A baseline/training data file - this can be provided as a model asset on the base model, or when triggering the MLC.
An output file - required only if the output must be stored externally (as an S3, SQL, or HDFS asset). It can be provided when triggering the signal. By default, the process creates an embedded output file for the test.
Running the Back Test:
1. Add an input data file to the base model, if available (it can instead be provided later in the REST call that triggers the process).
2. (Optional) Add the DMN (test data comparator) file that compares the output of the back test as an asset on the base model.
3. Create a snapshot of the base model.
4. Make an HTTP POST request to {modelop-center-url}/mlc-service/api/signal (this might require a JWT token for authentication) using the following request body:
{
  "name": "com.modelop.mlc.definitions.Signals_MODEL_BACK_TEST",
  "variables": {
    "MODEL_ID": { "value": "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d" },
    "INPUT_FILE": {
      "value": {
        "assetId": "db6ce585-ae83-4814-94b4-f4ac1ecec54c",
        "name": "input.csv",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/input.csv",
        "filename": "input.csv",
        "fileFormat": "CSV",
        "fileSize": 0
      }
    },
    "OUTPUT_FILE": {
      "value": {
        "assetId": "c95c5338-36b9-421f-917b-ceabc8f7d821",
        "name": "output.json",
        "assetType": "EXTERNAL_FILE",
        "assetRole": "UNKNOWN",
        "repositoryInfo": { "repositoryType": "HDFS_REPOSITORY", "port": 0 },
        "fileUrl": "hdfs:///hadoop/demo/titanic-spark/output.json",
        "filename": "output.json",
        "fileFormat": "JSON",
        "fileSize": 0
      }
    }
  }
}
Notes:
“value” of MODEL_ID is the ID of the base model snapshot created in step 3.
“value” of INPUT_FILE can be any asset type - File, S3, SQL, HDFS. If the input data was provided in step 1, “INPUT_FILE” can be removed from the request body.
“value” of OUTPUT_FILE can be any asset type - File, S3, SQL, HDFS. “OUTPUT_FILE” is optional: if it is omitted from the request body, the MLC creates a new file that is embedded in the job.
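Hand-edited request bodies are easy to break, for example by dropping a comma between two variables, so it can help to check a body locally before posting it. The sketch below does that with Python's json module. The function name validate_signal_body is hypothetical, and the check that a name and a MODEL_ID variable are present reflects the examples on this page, not a published schema.

```python
import json


def validate_signal_body(text):
    """Parse a request body and check the fields used by the examples on this page."""
    try:
        body = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            "request body is not valid JSON: %s at position %d" % (err.msg, err.pos)
        ) from err
    if not isinstance(body, dict) or "name" not in body:
        raise ValueError("request body needs a signal name")
    variables = body.get("variables")
    if not isinstance(variables, dict) or "MODEL_ID" not in variables:
        raise ValueError("request body needs a MODEL_ID variable")
    return body


good = """
{ "name": "com.modelop.mlc.definitions.Signals_MODEL_BACK_TEST",
  "variables": { "MODEL_ID": { "value": "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d" } } }
"""

# Same shape, but the comma between MODEL_ID and OUTPUT_FILE is missing.
bad = """
{ "name": "com.modelop.mlc.definitions.Signals_MODEL_BACK_TEST",
  "variables": { "MODEL_ID": { "value": "5acaa712-fa7f-45ca-8ae5-f0bdd7e8536d" }
                 "OUTPUT_FILE": { "value": {} } } }
"""

print(validate_signal_body(good)["name"])
try:
    validate_signal_body(bad)
except ValueError as err:
    print("rejected:", err)
```

Building the body as a Python dictionary and serializing it with json.dumps avoids this class of error altogether.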