ollama

An open-source runtime for running large language models locally and in containers, providing a simple API and CLI for pulling, managing, and serving models like Llama, Mistral, and Gemma on CPU and GPU hardware.

ollama

What is Ollama?

The Ollama image packages the Ollama model runtime so you can run large language models inside containers without installing Ollama and its dependencies on the host. Ollama provides a Docker-style experience for LLMs — you pull models by name (ollama pull llama3), serve them via a local REST API compatible with the OpenAI API spec, and manage them through a straightforward CLI. It supports CPU and GPU inference, multi-model serving, and a growing library of open-weight models including Llama 3, Mistral, Gemma, Phi, and Qwen. The Ollama image is widely used for local development, self-hosted AI assistants, private inference endpoints, and as the model serving layer in containerized RAG and agentic pipelines.

What is Echo's Ollama image?

Echo's Ollama image is a hardened build of Ollama on Echo's hardened base. Echo images are designed to be a drop-in replacement: change the FROM line in your Dockerfile and CVEs go to zero without breaking your model serving setup. Every image is tested across clouds, image use cases, and deployment targets. Echo ships every image in two variants: a distroless variant optimized for runtime use, and a default variant that includes essential build tools, package managers, and shells. For production AI inference deployments, the distroless variant minimizes attack surface while keeping model loading, REST API serving, and GPU acceleration fully intact; the default variant suits teams that need shell access for model management, debugging, or integration scripting around the Ollama CLI.

What is the difference between Echo's Ollama image and the public Ollama image?

Public Ollama images ship on bases that include broad OS-level tooling convenient for local development but which contribute CVEs that accumulate on an internet-exposed inference endpoint serving sensitive model interactions. AI inference images are a particularly high-value target — they often run with GPU access, handle proprietary prompts and data, and are increasingly deployed in production environments where security posture matters. Echo's build retains everything Ollama needs for model loading, inference, and API serving while removing the packages that don't belong in a production model server. As we covered in our recent post on CVE-2026-7482: the critical Ollama memory vulnerability, unpatched Ollama images carry real, exploitable risk — not just theoretical CVE counts. Echo commits to a 7-day SLA for critical and high severity vulnerabilities, and 10 days for medium, low, and unknown — with vulnerabilities triaged within 24 hours. Echo images are recognized by all major scanners and mirrored to all major registries, so they fit into existing pipelines without changing your registry, scanner, or runtime tooling.

FAQ

Can I replace my Ollama image with Echo's Ollama image?

Yes. Echo's Ollama image is a drop-in replacement. Update the FROM line in your Dockerfile or the image reference in your deployment manifests and your models keep serving — the CVEs disappear, the behavior doesn't. Model pulling, the REST API, OpenAI-compatible endpoints, GPU acceleration, and CLI interactions all continue to work without any changes to your existing Ollama setup or model library.

Is Echo's Ollama image FIPS-validated?

Yes. Echo's FIPS-validated images use cryptographic modules with an active FIPS 140-3 CMVP certificate, making them fit for federal use — unlike FIPS-compliant images that haven't been validated. This matters for teams deploying private AI inference inside FedRAMP boundaries where the model serving layer and its handling of sensitive prompts and outputs must meet cryptographic requirements.

What is Echo's vulnerability management SLA on the Ollama image?

Echo commits to a 7-day SLA for critical and high severity vulnerabilities, and 10 days for medium, low, and unknown — with vulnerabilities triaged within 24 hours. Patches are mirrored automatically into your private registry so you're always running a clean version — especially important given that Ollama has been the subject of critical CVEs targeting its memory handling and API surface.

Is Echo's Ollama image distroless?

Echo ships every image in two variants: a distroless variant optimized for runtime use, and a default variant that includes essential build tools, package managers, and shells. For production inference deployments, the distroless variant is the leaner, more secure choice; for development or integration environments where shell access is needed for model management or Ollama CLI scripting, the default variant is the right fit.

How does Echo achieve such a drastic CVE reduction in Ollama?

Echo's Ollama image is built from source with only the absolute essentials needed to run the model serving workload, which significantly shrinks the attack surface. Echo also patches aggressively over time, with backports available so you can stay on the Ollama version that works for your inference stack without forcing a functional change for the sake of security.

Will Echo's Ollama image help us achieve FedRAMP?

Yes. The hard parts of FedRAMP — managing vulnerabilities, applying fixes, and using FIPS-validated cryptography — are baked into Echo images, including STIG-hardened configuration and ConMon/POA&M-ready reporting. For teams deploying private AI inference under an ATO, Echo's hardened Ollama image keeps the model serving layer in-boundary and compliant.

Interested in base images that start and stay clean?