Overview
To enable integration with existing Spark environments, ModelOp Center provides a Spark runtime microservice. This component is in charge of submitting Spark jobs into a pre-defined Spark cluster, monitoring their statuses, and updating them at model-manage accordingly. It also supports auto-enrollment at Eureka and model-manage, as well as OAuth2 interaction.
The Spark runtime service should be able to run outside the K8s fleet, most likely as an edge node.
Spark-Runtime-Service core components:
ModelOpJobMonitor for ModelBatchJobs:
Monitor the job repository for CREATED ModelBatchJobs with SparkRuntime engine.
Launch a SparkSubmit Job using the SparkRuntime engine data.
Update Job status from CREATED to WAITING.
Update Job by appending SparkApplication Job ID.
Monitor job repository for WAITING & RUNNING ModelBatchJobs with SparkRuntime engine.
Extract the SparkApplication Job ID and use it to query the Job status at the Spark cluster.
Update the Job status and the job repository according to the Spark cluster Job updates.
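The monitoring steps above can be sketched as a single polling pass. This is a minimal illustration, not the service's actual implementation: the `ModelBatchJob` fields, the `SPARK_ENGINE` label, and the `submit` and `query_status` callables are all hypothetical stand-ins for the job repository, the SparkSubmit launcher, and the Spark cluster status query.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

SPARK_ENGINE = "SPARK_RUNTIME"  # hypothetical engine-type label


@dataclass
class ModelBatchJob:
    """Simplified, assumed shape of a job record in the job repository."""
    job_id: str
    engine_type: str
    status: str = "CREATED"
    spark_app_id: Optional[str] = None


def monitor_once(
    jobs: Iterable[ModelBatchJob],
    submit: Callable[[ModelBatchJob], str],
    query_status: Callable[[str], str],
) -> None:
    """Run one monitoring pass over the jobs.

    submit(job) is assumed to launch the job and return the Spark application ID.
    query_status(app_id) is assumed to return the status reported by the cluster.
    """
    for job in jobs:
        if job.engine_type != SPARK_ENGINE:
            continue  # only jobs targeting the Spark runtime engine are handled
        if job.status == "CREATED":
            # Launch the job, record the application ID, and mark it WAITING.
            job.spark_app_id = submit(job)
            job.status = "WAITING"
        elif job.status in ("WAITING", "RUNNING"):
            # Mirror whatever status the cluster reports for this application.
            job.status = query_status(job.spark_app_id)
```

A newly created job is only submitted on its first pass; its status is polled from the following pass onward, which keeps the two phases separate.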
PySparkPreprocessor:
Component in charge of translating a ModelBatchJob into a PySparkJobManifest.
Read all the ExternalFileAsset inputs & outputs, leaving them available for the PySparkJob as lists during runtime.
From ModelBatchJobs:
Extract and write StoredModel primary source code as tmp file to be executed inside SparkSubmit.
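As a rough sketch of the preprocessing described above, the snippet below writes the model source to a temporary file and collects the input and output asset locations as lists. The dictionary keys (`primary_source_code`, `input_assets`, `output_assets`, `file_url`) and the manifest fields are assumptions made for illustration; the real ModelBatchJob and PySparkJobManifest schemas may differ.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PySparkJobManifest:
    """Assumed minimal manifest: script location plus asset lists."""
    script_path: str
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)


def build_manifest(batch_job: Dict) -> PySparkJobManifest:
    """Translate a (simplified, hypothetical) batch job dict into a manifest."""
    # Write the primary source code to a temporary .py file for spark-submit.
    fd, script_path = tempfile.mkstemp(prefix="modelop_", suffix=".py")
    with os.fdopen(fd, "w") as handle:
        handle.write(batch_job["primary_source_code"])

    # Expose the asset locations as plain lists for the PySpark job.
    inputs = [asset["file_url"] for asset in batch_job.get("input_assets", [])]
    outputs = [asset["file_url"] for asset in batch_job.get("output_assets", [])]
    return PySparkJobManifest(script_path, inputs, outputs)
```

The caller owns the temporary file and is expected to delete it once the job has finished or failed.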
SparkSubmitService:
Build SparkSubmit execution out of PySparkJobManifest.
Read/Recover PySparkJob output from different sources.
Clean tmp file once SparkSubmit finished or failed.
Support Kerberos authentication.
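One way to picture these responsibilities is a helper that assembles a `spark-submit` command line and a wrapper that always removes the temporary script afterwards. The function names, parameters, and defaults below are hypothetical. The `--master`, `--deploy-mode`, `--name`, `--principal`, and `--keytab` flags are standard `spark-submit` options, with the last two used for Kerberos when running on YARN; how the actual service passes them is an assumption.

```python
import os
import subprocess
from typing import Callable, List, Optional, Sequence


def build_spark_submit_command(
    script_path: str,
    master: str = "yarn",
    deploy_mode: str = "client",
    app_name: str = "modelop-job",
    principal: Optional[str] = None,
    keytab: Optional[str] = None,
    app_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble the spark-submit argument list for a PySpark script."""
    command = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--name", app_name,
    ]
    # Kerberos credentials are only added when both values are provided.
    if principal and keytab:
        command += ["--principal", principal, "--keytab", keytab]
    command.append(script_path)
    command += list(app_args or [])
    return command


def run_and_clean(
    command: List[str],
    script_path: str,
    runner: Callable = subprocess.run,
) -> int:
    """Run the command and delete the temporary script on success or failure."""
    try:
        return runner(command).returncode
    finally:
        if os.path.exists(script_path):
            os.remove(script_path)
```

The `runner` parameter exists only so the wrapper can be exercised without a Spark installation; by default it calls `subprocess.run`.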
ApplicationListener:
Auto-enroll as Spark engine at model-manage.
Support secured OAuth2 to talk with model-manage and Eureka.
Health Check (goal: leverage existing SpringCloud actuator functionality):
Maintain keep-alive/health status with Eureka.
If the Spark cluster experiences connectivity issues, the service should be able to remove itself from Eureka and model-manage.
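The self-removal behaviour could look roughly like the sketch below. It is written in Python only to illustrate the decision logic, whereas the service itself is expected to rely on SpringCloud actuator functionality. The `cluster_reachable` and `deregister` callables and the `max_failures` threshold are assumptions; the source does not specify how many failed checks should trigger removal.

```python
from typing import Callable


def check_and_deregister(
    cluster_reachable: Callable[[], bool],
    deregister: Callable[[], None],
    failures: int,
    max_failures: int = 3,  # assumed threshold, not taken from the source
) -> int:
    """Run one health check and return the updated consecutive-failure count.

    cluster_reachable() is a hypothetical probe of the Spark cluster.
    deregister() is a hypothetical hook that removes this service from
    Eureka and model-manage.
    """
    if cluster_reachable():
        return 0  # a successful check resets the counter
    failures += 1
    # Deregister exactly once, when the threshold is first reached.
    if failures == max_failures:
        deregister()
    return failures
```

The caller keeps the returned counter between checks, so a single transient failure does not remove the service.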