ModelOp Center integrates with existing Spark environments, allowing enterprises to choose their Model Factory (data science workbench) of choice.


Overview

To enable integration with existing Spark environments, ModelOp Center provides a Spark runtime microservice. This component is in charge of submitting Spark jobs to a pre-defined Spark cluster, monitoring their statuses, and updating them in model-manage accordingly.

The Spark runtime service should be able to run outside the Kubernetes fleet, typically as an edge node.

It also supports auto-enrollment with Model-Manage and Eureka, as well as OAuth2-secured interaction.

Spark-Runtime-Service core components:

ModelOpJobMonitor for ModelBatchJobs:

  • Monitor the job repository for CREATED ModelBatchJobs with a SparkRuntime engine.

    • Launch SparkSubmit jobs using the SparkRuntime engine data.

    • Update the job status from CREATED to WAITING.

    • Update the job, appending the SparkApplication job ID.

  • Monitor the job repository for WAITING and RUNNING ModelBatchJobs with a SparkRuntime engine.

    • Extract the SparkApplication job ID and use it to query the job status on the Spark cluster.

    • Update the job status and the job repository according to the Spark cluster's job updates.
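The two monitoring passes above amount to a small state machine. The following is a minimal sketch of that flow; the `JobStatus` values mirror the states named above, while `job_repository`, `spark_submit`, and `spark_cluster` (and their methods) are hypothetical stand-ins for the service's actual collaborators, not a real ModelOp API.

```python
from enum import Enum

class JobStatus(Enum):
    CREATED = "CREATED"
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"

def poll_created_jobs(job_repository, spark_submit):
    """Launch CREATED ModelBatchJobs and record the Spark application ID."""
    for job in job_repository.find_by_status(JobStatus.CREATED):
        # spark-submit against the cluster described by the SparkRuntime engine
        app_id = spark_submit.launch(job)
        job.status = JobStatus.WAITING          # CREATED -> WAITING
        job.spark_application_id = app_id       # append SparkApplication job ID
        job_repository.save(job)

def poll_active_jobs(job_repository, spark_cluster):
    """Sync WAITING/RUNNING jobs with the status reported by the Spark cluster."""
    for job in job_repository.find_by_status(JobStatus.WAITING, JobStatus.RUNNING):
        cluster_status = spark_cluster.get_status(job.spark_application_id)
        if cluster_status != job.status:
            job.status = cluster_status
            job_repository.save(job)
```

In this sketch each pass is idempotent, so the monitor can simply run both on a fixed polling interval.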

PySparkPreprocessor:

  • Component in charge of translating a ModelBatchJob into a PySparkJobManifest.

  • Reads all ExternalFileAsset inputs & outputs, making them available to the PySparkJob as lists at runtime.

  • From ModelBatchJobs:

    • Extracts the StoredModel's primary source code and writes it to a tmp file to be executed via SparkSubmit.
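The translation step can be sketched as follows. This is an illustrative assumption about the shapes involved: `PySparkJobManifest` here is a plain dataclass, and the `stored_model`, `primary_source_code`, `input_assets`, and `output_assets` attributes are hypothetical field names, not the service's actual schema.

```python
import tempfile
from dataclasses import dataclass, field

@dataclass
class PySparkJobManifest:
    script_path: str                              # tmp file holding the model's source
    input_assets: list = field(default_factory=list)
    output_assets: list = field(default_factory=list)

def build_manifest(model_batch_job) -> PySparkJobManifest:
    """Translate a ModelBatchJob into a PySparkJobManifest."""
    # Write the StoredModel's primary source code to a tmp file for spark-submit.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
        f.write(model_batch_job.stored_model.primary_source_code)
        script_path = f.name
    # Collect the ExternalFileAsset inputs/outputs as lists for the PySpark job.
    return PySparkJobManifest(
        script_path=script_path,
        input_assets=[a.url for a in model_batch_job.input_assets],
        output_assets=[a.url for a in model_batch_job.output_assets],
    )
```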

SparkSubmitService:

  • Build the SparkSubmit execution from the PySparkJobManifest.

  • Read/recover the SparkJob output from different sources.

  • Clean up the tmp file once SparkSubmit finishes or fails.

  • Support Kerberos authentication.
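Building the SparkSubmit execution could look roughly like this. The `--master`, `--deploy-mode`, `--keytab`, and `--principal` flags are standard spark-submit options (the latter two cover the Kerberos case); the manifest fields and the `--inputs`/`--outputs` application arguments are assumptions carried over from the sketch above, not a documented contract.

```python
def build_spark_submit_command(manifest, master="yarn",
                               keytab=None, principal=None):
    """Assemble a spark-submit invocation from a PySparkJobManifest."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    if keytab and principal:
        # Kerberos authentication via spark-submit's keytab/principal options
        cmd += ["--keytab", keytab, "--principal", principal]
    cmd.append(manifest.script_path)
    # Pass the asset lists as application arguments so the PySpark
    # job can read its inputs and outputs at runtime.
    cmd += ["--inputs", ",".join(manifest.input_assets)]
    cmd += ["--outputs", ",".join(manifest.output_assets)]
    return cmd
```

The returned list can be handed directly to a process launcher; the tmp script named by `manifest.script_path` would be deleted once the launched process finishes or fails.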

ApplicationListener:

  • Auto-enroll as a Spark engine with Model-Manage.

  • Support OAuth2-secured communication with Model-Manage and Eureka.

Health Check (goal: leverage the existing Spring Cloud actuator functionality):

  • Maintain keep-alive/health status with Eureka.

  • If the Spark cluster experiences connectivity issues, the service should be able to remove itself from Model-Manage.
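The self-removal behavior can be sketched as a small guard that runs alongside the health check. Here `spark_cluster.ping()`, `model_manage.deregister_engine()`, and `engine_id` are hypothetical names chosen for illustration; the real service would hook this into the Spring Cloud actuator health check noted above.

```python
def check_cluster_and_deregister(spark_cluster, model_manage, engine_id):
    """Report healthy if the Spark cluster is reachable; otherwise
    deregister this engine from Model-Manage and report unhealthy."""
    try:
        spark_cluster.ping()
        return True
    except ConnectionError:
        # Connectivity lost: remove this engine so Model-Manage
        # stops routing ModelBatchJobs to it.
        model_manage.deregister_engine(engine_id)
        return False
```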
