Overview
To enable integration with existing Spark environments, ModelOp Center provides a Spark runtime microservice. This component is responsible for submitting Spark jobs to a pre-defined Spark cluster, monitoring their statuses, and updating them in model-manage accordingly.
The Spark runtime service should be able to run outside the K8s fleet, most likely running on an edge node.
It also supports auto-enrollment with Model-Manage and Eureka, as well as OAuth2 interaction.
Spark-Runtime-Service core components:
ModelOpJobMonitor for ModelBatchJobs:
Monitor the job repository for CREATED ModelBatchJobs with SparkRuntime engines.
Launch SparkSubmit jobs using the SparkRuntime engine data.
Update the job status from CREATED to WAITING.
Update the job, appending the SparkApplication job ID.
Monitor the job repository for WAITING & RUNNING ModelBatchJobs with SparkRuntime engines.
Extract the SparkApplication job ID and use it to query the job status at the Spark cluster.
Update the job status and job repository accordingly with Spark cluster job updates.
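The monitoring flow above can be sketched as a polling loop with two passes: one that submits CREATED jobs and one that refreshes WAITING/RUNNING jobs from the cluster. This is a minimal sketch with an in-memory job repository; the class and field names are assumptions, not the actual ModelOp Center API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical job states mirroring the transitions described above.
CREATED, WAITING, RUNNING, COMPLETE, ERROR = "CREATED", "WAITING", "RUNNING", "COMPLETE", "ERROR"

@dataclass
class ModelBatchJob:
    job_id: str
    status: str = CREATED
    spark_application_id: Optional[str] = None  # filled in after spark-submit

class ModelOpJobMonitor:
    """Polls the job repository and keeps statuses in sync with the Spark cluster."""

    def __init__(self, repository, spark_submit, spark_cluster):
        self.repository = repository        # list-like store of ModelBatchJobs
        self.spark_submit = spark_submit    # callable: job -> SparkApplication job ID
        self.spark_cluster = spark_cluster  # callable: app ID -> cluster-side status

    def poll_created(self):
        # CREATED jobs are submitted, tagged with the SparkApplication job ID,
        # and moved to WAITING.
        for job in self.repository:
            if job.status == CREATED:
                job.spark_application_id = self.spark_submit(job)
                job.status = WAITING

    def poll_active(self):
        # WAITING/RUNNING jobs are refreshed with the Spark cluster's status.
        for job in self.repository:
            if job.status in (WAITING, RUNNING):
                job.status = self.spark_cluster(job.spark_application_id)
```

With stub submit/status callables, a CREATED job moves to WAITING with its application ID recorded, then to whatever status the cluster reports on the next pass.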
PySparkPreprocessor:
Component in charge of translating a ModelBatchJob into a PySparkJobManifest.
Reads all the ExternalFileAsset inputs & outputs, leaving them available to the PySparkJob as lists during runtime.
From ModelBatchJobs:
Extract and write the StoredModel primary source code to a tmp file to be executed inside SparkSubmit.
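The translation step can be sketched as follows. The dict keys (`inputs`, `outputs`, `fileUrl`, `storedModel`, `primarySource`) and the manifest fields are illustrative assumptions about the ModelBatchJob shape, not the real schema:

```python
import tempfile
from dataclasses import dataclass
from typing import List

@dataclass
class PySparkJobManifest:
    input_files: List[str]   # ExternalFileAsset inputs, as a list for the PySparkJob
    output_files: List[str]  # ExternalFileAsset outputs, likewise
    source_path: str         # tmp file holding the StoredModel primary source

def preprocess(model_batch_job: dict) -> PySparkJobManifest:
    """Translate a ModelBatchJob (here a plain dict) into a PySparkJobManifest."""
    # ExternalFileAsset inputs & outputs are exposed to the PySparkJob as lists.
    inputs = [asset["fileUrl"] for asset in model_batch_job["inputs"]]
    outputs = [asset["fileUrl"] for asset in model_batch_job["outputs"]]

    # The StoredModel primary source code is written to a tmp file so that
    # spark-submit can execute it; the file is cleaned up after the run.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
        tmp.write(model_batch_job["storedModel"]["primarySource"])
        source_path = tmp.name

    return PySparkJobManifest(inputs, outputs, source_path)
```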
SparkSubmitService:
Build the SparkSubmit execution out of the PySparkJobManifest.
Read/recover the SparkJob output from different sources.
Clean up the tmp file once SparkSubmit finishes or fails.
Support Kerberos authentication.
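Building the invocation can be sketched as assembling a spark-submit argument list from the manifest. `--master`, `--principal`, and `--keytab` are standard spark-submit flags (the latter two cover the Kerberos case); how the input/output lists are passed as application arguments is an assumption here:

```python
from typing import List, Optional

def build_spark_submit_command(
    source_path: str,
    input_files: List[str],
    output_files: List[str],
    master: str = "yarn",
    principal: Optional[str] = None,  # Kerberos principal, if the cluster is secured
    keytab: Optional[str] = None,     # matching keytab file
) -> List[str]:
    """Assemble the spark-submit argument list for a PySparkJobManifest."""
    cmd = ["spark-submit", "--master", master]
    if principal and keytab:
        # Standard spark-submit flags for Kerberos authentication.
        cmd += ["--principal", principal, "--keytab", keytab]
    # The extracted source file, then its inputs and outputs as application
    # arguments (comma-joined lists here, as an assumption).
    cmd += [source_path, ",".join(input_files), ",".join(output_files)]
    return cmd
```

In the service this list would be handed to a process launcher, with the tmp source file deleted in a finally block so cleanup happens whether the submit finishes or fails.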
ApplicationListener:
Auto-enroll as a Spark engine at model-manage.
Support secured OAuth2 communication with Model-Manage and Eureka.
Health Check (goal: leverage the existing Spring Cloud actuator functionality):
Maintain keep-alive/health status with Eureka.
If the Spark cluster experiences connectivity issues, the service should be able to remove itself from Model-Manage.
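The self-removal behavior can be sketched as a health check that deregisters the engine after repeated connectivity failures. This is a sketch only: `check_cluster` and `deregister` stand in for the actual Spark cluster ping and Model-Manage call, and the consecutive-failure threshold is an assumed policy:

```python
class EngineHealthCheck:
    """Tracks Spark cluster connectivity and deregisters the engine from
    Model-Manage after a configurable number of consecutive failures."""

    def __init__(self, check_cluster, deregister, max_failures: int = 3):
        self.check_cluster = check_cluster  # callable -> bool (cluster reachable?)
        self.deregister = deregister        # callable removing this engine from Model-Manage
        self.max_failures = max_failures
        self.failures = 0
        self.registered = True

    def tick(self) -> bool:
        """Run one health-check cycle; return the current registration state."""
        if self.check_cluster():
            self.failures = 0  # connectivity restored, reset the counter
        else:
            self.failures += 1
            if self.registered and self.failures >= self.max_failures:
                self.deregister()
                self.registered = False
        return self.registered
```

In the real service, the periodic tick and the keep-alive toward Eureka would come from the Spring Cloud actuator/discovery machinery rather than a hand-rolled loop.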