Spark jobs submit orchestration overview
This article provides an overview on how MOC leverages spark-submit
to execute Spark
jobs.
Table of Contents
Â
Design principles
To ensure efficient and robust execution of Spark jobs, the spark-submit process within ModelOp Center has been meticulously designed and architected. The following principles and features have been incorporated to optimize the integration and operation of Spark jobs in both local and cluster deploy modes:
Â
Code Implementation: Whenever possible, implement Python and/or PySpark components using Python. This approach ensures easier testing and debugging in both local and cluster deploy modes.
Job Monitoring: The spark-runtime-service monitors jobs in the WAITING and RUNNING states until they reach COMPLETE, ERROR, or CANCELLED states. Any detected changes in job status will trigger updates in the model manager.
Job Submission Process: Before submitting a job to the Spark cluster, the ModelOpPySparkDriver extracts
inputData
,outputData
, andstoredModel.modelAssets
from the MODEL_BATCH_JOB, MODEL_BATCH_TEST_JOB, or MODEL_BATCH_TRAINING_JOB and passes them as parameters to the function to be executed. These external file assets are provided as lists of assets:input_data
– [{"filename": "test.csv", "assetType": "EXTERNAL_FILE", "fileUrl": "hdfs:///hadoop/test.csv"}]output_data
– [{"filename": "output.csv", "assetType": "EXTERNAL_FILE", "fileUrl": "hdfs:///hadoop/output.csv"}]
Logging: The YarnMonitorService component listens for and stores the stdout and stderr outputs produced by the running job on the Spark cluster. The storage of stdout and stderr in jobMessages is configurable.
Architecture Overview
The following diagram represents a high level overview on how core components interact with each other, when a Spark job is submitted to the cluster.
Pre spark submit process flow
The following diagram illustrates the transformation process from a ModelOp Center Batch job into a SparkLauncher:
Monitoring Spark jobs
ModelOp Center has a component called ModelOpJobMonitor
in charge of:
Submitting jobs to the
Spark
clusterMonitoring job statuses
In order to successfully fulfill these responsibilities, it relies on three key services:
SparkLauncherService
– component in charge of translating jobs intoSparkLauncher
objectsYarnMonitorService
– component in charge of fetching job statuses and output produced from theSpark
clusterKerberizedYarnMonitorService
– component that is an instance ofYarnMonitorService
and in charge of authenticating the principal withKerberos
before delegating control toYarnMonitorService
Â
Supported Spark jobs types
Scoring jobs - Scoring job details here
Metrics jobs - these jobs produce a
ModelTestResult
ModelOpPySparkDriver.py
ModelOp Center utilizes a predefined PySpark class called ModelOpPySparkDriver.py
to manage execution and job orchestration for Spark jobs. This class abstracts the complexity of these processes by taking ModelOp Center Batch Jobs as input and translating them into the necessary format for job execution.
Â
Â