Overview
To enable integration with existing Spark environments, ModelOp Center provides a Spark runtime microservice. This component is responsible for submitting Spark jobs to a pre-defined Spark cluster, monitoring their statuses, and updating the corresponding jobs in model-manage. It also supports auto-enrollment with Eureka and model-manage, as well as OAuth2-secured interactions.
The Spark runtime is able to run outside the K8s fleet, typically, but not exclusively, on an edge node.
Consult the following page for information on configuring the Spark runtime service via Helm: Configuring the Spark Runtime via Helm
Pre-requisites:
The node hosting the spark-runtime-service must meet the following criteria:
- Apache Spark is installed on the host machine. Currently validated Spark and Hadoop versions:
  - Spark 2.4
  - Hadoop 2.6
- The following environment variables are set:
  - SPARK_HOME
  - HADOOP_HOME
  - JAVA_HOME
  - HADOOP_CONF_DIR
- The Hadoop cluster configuration files are present, e.g.:
  - hdfs-site.xml
  - core-site.xml
  - mapred-site.xml
  - yarn-site.xml
- Ensure the host machine can communicate with the Spark cluster, e.g.:
  - master-node
  - yarn
    - nodeManager:
      - remote-app-log-dir
      - remote-app-log-dir-suffix
    - resourceManager:
      - hostname
      - address
  - hdfs
    - host
- Ensure the host machine can communicate with ModelOp Center and the ModelOp Center Eureka registry
Security
Kerberos:
- krb5.conf
- Principal
- keytab
- jaas.conf (optional)
- jaas-conf-key (optional)
| Service | Port |
|---|---|
| Spark | |
| Yarn | |
| HDFS | |
| ModelOp Center | |
Kerberos glossary:
- krb5.conf - the Kerberos client configuration. It tells the host machine where to find the Kerberos server and which rules to follow.
- keytab - the host machine's secret key file, used to prove it is allowed to perform specific actions in the Kerberos environment.
- jaas.conf - a JAAS configuration file that tells the host machine's Java applications how to authenticate with services such as Kerberos; among other things, it points to the location of the keytab.
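As an illustration of how these artifacts are consumed, the sketch below shows a minimal Kerberos login from a JVM process using the Hadoop UserGroupInformation API. The krb5.conf and keytab paths and the principal name are placeholders; the actual values come from the spark-runtime-service configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginSketch {
    public static void main(String[] args) throws Exception {
        // Point the JVM at the Kerberos client configuration (placeholder path).
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        // If a jaas.conf is used, it is supplied the same way (optional, placeholder path).
        // System.setProperty("java.security.auth.login.config", "/etc/security/jaas.conf");

        // Switch the Hadoop security layer to Kerberos before logging in.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in with the service principal and its keytab (placeholder values).
        UserGroupInformation.loginUserFromKeytab(
                "spark-runtime/host.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/spark-runtime.keytab");

        System.out.println("Logged in as " + UserGroupInformation.getCurrentUser());
    }
}
```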
Core components of the spark-runtime-service:
ModelOpJobMonitor
- Monitor the job repository for MODEL_BATCH_JOB, MODEL_BATCH_TEST_JOB and MODEL_BATCH_TRAINING_JOB jobs in CREATED state with SPARK_RUNTIME as the runtime type
- Update the job status from CREATED to WAITING
- Submit the job for execution to the Spark cluster
- Update the job by appending the Spark application id generated by the Spark cluster
- Monitor the job repository for MODEL_BATCH_JOB, MODEL_BATCH_TEST_JOB and MODEL_BATCH_TRAINING_JOB jobs in WAITING or RUNNING state with SPARK_RUNTIME as the runtime type
- Use the Spark application id to query the status of the job while it is still running on the Spark cluster
- Update the job status based on the final Spark application status (see the status-mapping sketch below)
- Update the job with the logs generated by the Spark cluster
- Update the job by storing the output data, if the output data contains embedded asset(s)
- Clean the job's temporary working directory
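The second monitoring loop resolves the stored Spark application id into a YARN application report and folds the result back into the job record. The sketch below illustrates that translation with the Hadoop YARN record types; the RUNNING, COMPLETE and ERROR outcome names are illustrative placeholders, not ModelOp Center's actual internal job states.

```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

public class SparkJobStatusMapper {
    /** Folds a YARN application report into a coarse job status (placeholder names). */
    public static String toJobStatus(ApplicationReport report) {
        YarnApplicationState state = report.getYarnApplicationState();
        switch (state) {
            case FINISHED:
                // The application ended; check whether it actually succeeded.
                return report.getFinalApplicationStatus() == FinalApplicationStatus.SUCCEEDED
                        ? "COMPLETE" : "ERROR";
            case FAILED:
            case KILLED:
                return "ERROR";
            default:
                // NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING -> still in flight.
                return "RUNNING";
        }
    }
}
```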
PySparkPreprocessorService
- Translate the MODEL_BATCH_JOB / MODEL_BATCH_TEST_JOB / MODEL_BATCH_TRAINING_JOB into a PySparkJobManifest
- Create temporary files for the ModelOpPySparkDriver, primary source code, metadata, attachments and non-primary source code
  - The ModelOpPySparkDriver file is the main driver of the Spark job
  - The primary source code file is what will be executed in the Spark cluster
- Create temporary HDFS file(s), if the input data contains embedded asset(s) (see the staging sketch below)
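For the last step above, embedded assets have to be staged somewhere the Spark executors can read them. The sketch below shows one way to write such an asset to a temporary HDFS file with the Hadoop FileSystem API; the /tmp/spark-runtime layout is a placeholder, not the service's actual working-directory convention.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EmbeddedAssetStager {
    /** Writes an embedded asset to a temporary HDFS file and returns its path. */
    public static Path stageAsset(String jobId, String fileName, byte[] content) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the configured HADOOP_CONF_DIR.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder layout: one temporary working directory per job.
        Path target = new Path("/tmp/spark-runtime/" + jobId + "/" + fileName);
        try (FSDataOutputStream out = fs.create(target, true /* overwrite */)) {
            out.write(content);
        }
        return target;
    }
}
```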
SparkLauncherService
- Build a SparkLauncher from the PySparkJobManifest for execution (see the sketch below)
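Conceptually, the manifest is turned into a spark-submit invocation. The sketch below shows the general shape of that step using Spark's SparkLauncher API; the file paths, application name and arguments are placeholders rather than the actual fields of a PySparkJobManifest.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkSubmitSketch {
    public static void main(String[] args) throws Exception {
        SparkLauncher launcher = new SparkLauncher()
                .setSparkHome(System.getenv("SPARK_HOME"))
                .setMaster("yarn")
                .setDeployMode("cluster")
                .setAppName("modelop-batch-job")                    // placeholder name
                .setAppResource("/tmp/spark-runtime/driver.py")     // e.g. the ModelOpPySparkDriver file
                .addPyFile("/tmp/spark-runtime/primary_source.py")  // e.g. the primary model source code
                .addAppArgs("--job-id", "example-job-id");          // placeholder arguments

        // startApplication() returns a handle exposing the YARN application id and state.
        SparkAppHandle handle = launcher.startApplication();
        while (!handle.getState().isFinal()) {
            Thread.sleep(5_000);
        }
        System.out.println("Application " + handle.getAppId() + " finished in state " + handle.getState());
    }
}
```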
ModelManageRegistrationService
- Auto-enroll the Spark runtime as a SPARK_RUNTIME with model-manage
LoadRuntimesListener
- Maintain an alive/healthy status with Eureka
- Re-register, if the spark-runtime-service registration is lost for some reason
KerberizedYarnMonitorService
- Authenticate the principal with Kerberos before attempting to use the YarnClient (see the sketch below)
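Combining the Kerberos login from the pre-requisites section with the YARN status query, the sketch below shows one way such a monitor can wrap its YarnClient calls in an authenticated UserGroupInformation context. The principal and keytab values are placeholders, and the configuration is assumed to be picked up from HADOOP_CONF_DIR.

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class KerberizedYarnMonitorSketch {
    public static ApplicationReport fetchReport(String applicationId) throws Exception {
        // Enable Kerberos for the Hadoop security layer (yarn-site.xml is read from HADOOP_CONF_DIR).
        YarnConfiguration conf = new YarnConfiguration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate the service principal before touching YARN (placeholder credentials).
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "spark-runtime/host.example.com@EXAMPLE.COM",
                "/etc/security/keytabs/spark-runtime.keytab");

        // Run the YarnClient call as the authenticated principal.
        return ugi.doAs((PrivilegedExceptionAction<ApplicationReport>) () -> {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(conf);
            yarnClient.start();
            try {
                return yarnClient.getApplicationReport(ConverterUtils.toApplicationId(applicationId));
            } finally {
                yarnClient.stop();
            }
        });
    }
}
```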