Overview
To enable integration with existing Spark environments, ModelOp Center provides a Spark runtime microservice. This component is responsible for submitting Spark jobs to a pre-defined Spark cluster, monitoring their statuses, and updating them in model-manage accordingly.
The Spark runtime service should be able to run outside the K8s fleet, most likely running on an edge node.
It also supports auto-enrollment with Model-Manage and Eureka, as well as OAuth2 interaction.
Spark-Runtime-Service core components:
ModelOpJobMonitor for ModelBatchJobs:
Monitor the job repository for CREATED ModelBatchJobs with SparkRuntime engines.
Launch SparkSubmit jobs using the SparkRuntime engine data.
Update the job status from CREATED to WAITING.
Update the job, appending the SparkApplication job ID.
Monitor the job repository for WAITING & RUNNING ModelBatchJobs with SparkRuntime engines.
Extract the SparkApplication job ID and use it to query the job status at the Spark cluster.
Update the job status and job repository accordingly with Spark cluster job updates.
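The monitoring flow above can be sketched as a polling loop with two passes: one that submits CREATED jobs and one that refreshes WAITING/RUNNING jobs from the cluster. This is a minimal sketch with an in-memory job repository; the class and field names are assumptions, not the actual ModelOp Center API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical job states mirroring the transitions described above.
CREATED, WAITING, RUNNING, COMPLETE, ERROR = "CREATED", "WAITING", "RUNNING", "COMPLETE", "ERROR"

@dataclass
class ModelBatchJob:
    job_id: str
    status: str = CREATED
    spark_application_id: Optional[str] = None  # filled in after spark-submit

class ModelOpJobMonitor:
    """Polls the job repository and keeps statuses in sync with the Spark cluster."""

    def __init__(self, repository, spark_submit, spark_cluster):
        self.repository = repository        # list-like store of ModelBatchJobs
        self.spark_submit = spark_submit    # callable: job -> SparkApplication job ID
        self.spark_cluster = spark_cluster  # callable: app ID -> cluster-side status

    def poll_created(self):
        # CREATED jobs are submitted, tagged with the SparkApplication job ID,
        # and moved to WAITING.
        for job in self.repository:
            if job.status == CREATED:
                job.spark_application_id = self.spark_submit(job)
                job.status = WAITING

    def poll_active(self):
        # WAITING/RUNNING jobs are refreshed with the Spark cluster's status.
        for job in self.repository:
            if job.status in (WAITING, RUNNING):
                job.status = self.spark_cluster(job.spark_application_id)
```

With stub submit/status callables, a CREATED job moves to WAITING with its application ID recorded, then to whatever status the cluster reports on the next pass.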
PySparkPreprocessor:
Component in charge of translating a ModelBatchJob into a PySparkJobManifest.
Reads all the ExternalFileAsset inputs & outputs, leaving them available to the PySparkJob as lists during runtime.
From ModelBatchJobs:
Extract and write the StoredModel primary source code to a tmp file to be executed inside SparkSubmit.
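The translation step can be sketched as follows. The dict keys (`inputs`, `outputs`, `fileUrl`, `storedModel`, `primarySource`) and the manifest fields are illustrative assumptions about the ModelBatchJob shape, not the real schema:

```python
import tempfile
from dataclasses import dataclass
from typing import List

@dataclass
class PySparkJobManifest:
    input_files: List[str]   # ExternalFileAsset inputs, as a list for the PySparkJob
    output_files: List[str]  # ExternalFileAsset outputs, likewise
    source_path: str         # tmp file holding the StoredModel primary source

def preprocess(model_batch_job: dict) -> PySparkJobManifest:
    """Translate a ModelBatchJob (here a plain dict) into a PySparkJobManifest."""
    # ExternalFileAsset inputs & outputs are exposed to the PySparkJob as lists.
    inputs = [asset["fileUrl"] for asset in model_batch_job["inputs"]]
    outputs = [asset["fileUrl"] for asset in model_batch_job["outputs"]]

    # The StoredModel primary source code is written to a tmp file so that
    # spark-submit can execute it; the file is cleaned up after the run.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
        tmp.write(model_batch_job["storedModel"]["primarySource"])
        source_path = tmp.name

    return PySparkJobManifest(inputs, outputs, source_path)
```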
SparkSubmitService:
Build the SparkSubmit execution out of the PySparkJobManifest.
Read/recover the SparkJob output from different sources.
Clean up the tmp file once SparkSubmit finishes or fails.
Support Kerberos authentication.
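Building the invocation can be sketched as assembling a spark-submit argument list from the manifest. `--master`, `--principal`, and `--keytab` are standard spark-submit flags (the latter two cover the Kerberos case); how the input/output lists are passed as application arguments is an assumption here:

```python
from typing import List, Optional

def build_spark_submit_command(
    source_path: str,
    input_files: List[str],
    output_files: List[str],
    master: str = "yarn",
    principal: Optional[str] = None,  # Kerberos principal, if the cluster is secured
    keytab: Optional[str] = None,     # matching keytab file
) -> List[str]:
    """Assemble the spark-submit argument list for a PySparkJobManifest."""
    cmd = ["spark-submit", "--master", master]
    if principal and keytab:
        # Standard spark-submit flags for Kerberos authentication.
        cmd += ["--principal", principal, "--keytab", keytab]
    # The extracted source file, then its inputs and outputs as application
    # arguments (comma-joined lists here, as an assumption).
    cmd += [source_path, ",".join(input_files), ",".join(output_files)]
    return cmd
```

In the service this list would be handed to a process launcher, with the tmp source file deleted in a finally block so cleanup happens whether the submit finishes or fails.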
ApplicationListener:
Auto-enroll as a Spark engine at model-manage.
Support secured OAuth2 communication with Model-Manage and Eureka.
Health Check (goal: leverage the existing Spring Cloud actuator functionality):
Maintain keep-alive/health status with Eureka.
If the Spark cluster experiences connectivity issues, the service should be able to remove itself from Model-Manage.
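The self-removal behavior can be sketched as a health check that deregisters the engine after repeated connectivity failures. This is a sketch only: `check_cluster` and `deregister` stand in for the actual Spark cluster ping and Model-Manage call, and the consecutive-failure threshold is an assumed policy:

```python
class EngineHealthCheck:
    """Tracks Spark cluster connectivity and deregisters the engine from
    Model-Manage after a configurable number of consecutive failures."""

    def __init__(self, check_cluster, deregister, max_failures: int = 3):
        self.check_cluster = check_cluster  # callable -> bool (cluster reachable?)
        self.deregister = deregister        # callable removing this engine from Model-Manage
        self.max_failures = max_failures
        self.failures = 0
        self.registered = True

    def tick(self) -> bool:
        """Run one health-check cycle; return the current registration state."""
        if self.check_cluster():
            self.failures = 0  # connectivity restored, reset the counter
        else:
            self.failures += 1
            if self.registered and self.failures >= self.max_failures:
                self.deregister()
                self.registered = False
        return self.registered
```

In the real service, the periodic tick and the keep-alive toward Eureka would come from the Spring Cloud actuator/discovery machinery rather than a hand-rolled loop.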