
spark-operator
A Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes, enabling declarative submission, scheduling, and monitoring of Spark jobs via native Kubernetes custom resources.
What is Spark Operator?
The Spark Operator image packages the Kubernetes operator originally developed by the Google Cloud Platform team (now maintained by the Kubeflow community) for running Apache Spark workloads natively on Kubernetes. Instead of submitting Spark jobs imperatively with spark-submit, you define SparkApplication and ScheduledSparkApplication custom resources declaratively, and the operator handles driver and executor pod lifecycle, retries, and status reporting automatically. It is a widely used approach for data engineering teams running batch and streaming Spark pipelines on Kubernetes, and it fits alongside tools like Argo Workflows, Airflow, and Helm-based platform stacks.
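To make the declarative model concrete, here is a minimal Python sketch that builds a SparkApplication manifest as a plain dictionary and prints it as JSON, which kubectl apply -f also accepts. The field names follow the operator's v1beta2 API as we understand it; the helper name spark_application, the image, jar path, service account, and resource sizes are placeholder assumptions, not values from Echo or from your cluster. Note that spec.image here is the Spark runtime image for the driver and executors, not the operator image itself.

```python
import json


def spark_application(name, image, main_class, jar, executors=2):
    """Build a SparkApplication manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,                  # Spark runtime image (placeholder)
            "mainClass": main_class,
            "mainApplicationFile": jar,
            "sparkVersion": "3.5.0",         # assumed version, adjust to your image
            # The operator retries the application according to this policy.
            "restartPolicy": {"type": "OnFailure", "onFailureRetries": 3},
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 1, "memory": "512m"},
        },
    }


manifest = spark_application(
    name="spark-pi",
    image="registry.example.com/spark:3.5.0",              # hypothetical image
    main_class="org.apache.spark.examples.SparkPi",
    jar="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl hands the job to the operator, which then creates the driver and executor pods and records progress in the resource's status.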
What is Echo's Spark Operator image?
Echo's Spark Operator image is a build of the Spark Operator on Echo's hardened base. Echo images are designed to be a drop-in replacement: change the FROM line in your Dockerfile, or the image reference in your Helm values or operator manifests, and CVEs go to zero without breaking your Spark workloads. Every image is tested across cloud providers, use cases, and deployment targets. Echo ships every image in two variants: a distroless variant optimized for runtime use, and a default variant that includes essential build tools, package managers, and shells. For production data pipeline environments, the distroless variant reduces attack surface while keeping operator logic and Kubernetes API interactions fully intact; the default variant suits platform teams that need shell access for debugging or extended tooling.
What is the difference between Echo's Spark Operator image and the public Spark Operator image?
Public Spark Operator images ship on general-purpose bases that carry OS-level tooling your data pipelines don't use in production — but which your security team has to track as CVEs on every scanner run. Echo's build trims the base to what the operator actually needs to manage Spark application lifecycles on Kubernetes, dropping the CVE count to zero without changing CRD behavior or controller logic. As we explored in our post on container vulnerability management, data platform images are often deprioritized in vulnerability programs despite running persistently in production clusters. Echo commits to a 7-day SLA for critical and high severity vulnerabilities, and 10 days for medium, low, and unknown — with vulnerabilities triaged within 24 hours. Echo images are recognized by all major scanners and mirrored to all major registries, so they fit into existing pipelines without changing your registry, scanner, or runtime tooling.
FAQ
Can I replace my Spark Operator image with Echo's Spark Operator image?
Yes. Echo's Spark Operator image is a drop-in replacement. Update the image reference in your Helm values or operator manifests and your Spark workloads keep running — the CVEs disappear, the behavior doesn't. CRD handling, driver and executor pod management, and status reporting all continue to work as expected without any changes to your SparkApplication definitions.
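As a rough illustration of the swap described above, the following Python sketch overrides the image keys in a Helm values mapping and leaves everything else untouched. The image.repository and image.tag key layout, the function name with_operator_image, and both registry paths are assumptions for illustration only; check your chart's values file and your own Echo registry path for the real names.

```python
import copy


def with_operator_image(values, repository, tag):
    """Return a copy of Helm values with only the operator image overridden."""
    updated = copy.deepcopy(values)          # leave the caller's values untouched
    image = updated.setdefault("image", {})  # assumed key layout: image.repository / image.tag
    image["repository"] = repository
    image["tag"] = tag
    return updated


# Hypothetical current values; every other setting must survive the swap.
current = {
    "image": {"repository": "upstream.example.com/spark-operator", "tag": "1.0.0"},
    "replicaCount": 1,
}

updated = with_operator_image(
    current,
    repository="registry.example.com/echo/spark-operator",  # placeholder path
    tag="1.0.0",
)
print(updated["image"])
```

Only the image reference changes, which is the point of a drop-in replacement: SparkApplication definitions and the rest of the chart configuration stay as they were.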
Is Echo's Spark Operator image FIPS-validated?
Yes. Echo's FIPS-validated images use cryptographic modules with an active FIPS 140-3 CMVP certificate, making them fit for federal use — unlike FIPS-compliant images that haven't been validated. This matters for data engineering teams operating Spark pipelines inside FedRAMP boundaries where the full operator stack is in scope.
What is Echo's vulnerability management SLA on the Spark Operator image?
Echo commits to a 7-day SLA for critical and high severity vulnerabilities, and 10 days for medium, low, and unknown — with vulnerabilities triaged within 24 hours. Patches are mirrored automatically into your private registry so you're always running a clean version.
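For teams that track these commitments in their own tooling, the SLA windows can be sketched as simple date arithmetic. This is a minimal sketch, not Echo's implementation: the function name sla_deadlines is hypothetical, and it assumes the clock starts at the time a vulnerability is reported and that unrecognized severities are treated as unknown. Neither assumption is stated in the SLA itself.

```python
from datetime import datetime, timedelta

# Remediation windows in days, as stated in the SLA above.
SLA_DAYS = {"critical": 7, "high": 7, "medium": 10, "low": 10, "unknown": 10}
TRIAGE_WINDOW = timedelta(hours=24)


def sla_deadlines(reported_at, severity):
    """Return (triage_by, fix_by) for a finding reported at `reported_at`."""
    # Assumption: severities not listed above fall back to the "unknown" window.
    days = SLA_DAYS.get(severity.lower(), SLA_DAYS["unknown"])
    return reported_at + TRIAGE_WINDOW, reported_at + timedelta(days=days)


reported = datetime(2025, 1, 1, 9, 0)
triage_by, fix_by = sla_deadlines(reported, "High")
print("triage by:", triage_by)
print("fix by:", fix_by)
```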
Is Echo's Spark Operator image distroless?
Echo ships every image in two variants: a distroless variant optimized for runtime use, and a default variant that includes essential build tools, package managers, and shells. For production operator deployments, the distroless variant is the leaner, more secure choice; for platform teams that wrap the operator with additional tooling or need shell access for debugging, the default variant is the right fit.
How does Echo achieve such a drastic CVE reduction in Spark Operator?
Echo's Spark Operator image is built from source with only the absolute essentials needed to run the operator workload on Kubernetes, which significantly shrinks the attack surface. Echo also patches aggressively over time, with backports available so you can stay on the operator version that works for your platform without forcing a functional change for the sake of security.
Will Echo's Spark Operator image help us achieve FedRAMP authorization?
Yes. The hard parts of FedRAMP — managing vulnerabilities, applying fixes, and using FIPS-validated cryptography — are baked into Echo images, including STIG-hardened configuration and ConMon/POA&M-ready reporting. For data engineering teams running Spark pipelines under an ATO, Echo's hardened Spark Operator image keeps the operator layer in-boundary and compliant.