Step 1 — Hello to prod

"Twelve lines that hide a hundred-and-fifty."

The Quickstart shipped a bare-bones agent. In this step you graduate it to a production-ready version — same length, but with the four knobs that separate a demo from a real service: distributed tracing, a cross-model fallback, the function-calling loop, and a streaming endpoint that survives network hiccups.

What you build

A Support agent that answers POST /chat over Server-Sent Events, falls back to a cheaper model on provider failure, looks up real order state via a tool, and emits one OpenTelemetry span per call.

The code

Open src/acme_support/agents/support.py (replace acme_support with your package name) and replace its contents with:

from typing import Annotated

from pydantic import BaseModel

from ajolopy import Agent, Stream, Tool


class ChatRequest(BaseModel):
    """Payload accepted by the /chat endpoint."""

    message: str


@Agent(
    model="claude-opus-4-7",
    system="You are Acme Support. Be concise, friendly, and accurate.",
    fallback="claude-haiku-4-5",
)
class Support:
    """The on-call assistant."""

    @Tool
    async def lookup_order(self, order_id: str) -> dict[str, str]:
        """Look up an order's current state by id."""
        return {"order_id": order_id, "status": "in_transit", "eta": "tomorrow"}

    @Stream("/chat")
    async def respond(self, body: Annotated[ChatRequest, Body()]):
        """Stream a reply for the user's message over SSE."""
        async for chunk in self.stream(body.message):
            yield chunk

The handler also needs the Body import, which lives in ajolopy.http. Add it with the other imports at the top of the file:

from ajolopy.http import Body

What every kwarg buys you

The code above is short once you set the imports aside, and each token in it has a job:

| Token | What it does | What it replaces |
| --- | --- | --- |
| `@Agent(model="claude-opus-4-7")` | Picks the provider (Anthropic) by prefix, validates the model, wires the SDK at boot. | ~5 lines of SDK init + provider selection logic. |
| `system="You are Acme Support..."` | Sets the system prompt once. Static strings unlock prompt caching when you opt in with `cache=`. | A `messages=[{"role": "system", ...}]` dance on every call. |
| OpenTelemetry always-on | Emits one span per `run` / `stream` with `gen_ai.*` attributes and `gen_ai.cost_usd`. No kwarg needed: install `ajolopy[otel]` and point standard OTel env vars at your backend. | ~20 lines: tracer setup, manual span boundaries, token/cost accounting wrappers. |
| `fallback="claude-haiku-4-5"` | On retriable provider failure, transparently retries on the named model. | ~40 lines: retry policy, alternate client, error classification. |
| `@Tool` + Python type hints | Synthesises the JSON Schema, registers the tool, runs the function-calling loop. | ~40 lines: hand-written schema, `tool_use` reentry, `tool_result` plumbing. |
| `@Stream("/chat")` | Mounts the method as an SSE endpoint with heartbeats and disconnect cancellation. | ~30 lines: Starlette streaming response, keepalives, cancel-on-disconnect. |
| `Annotated[ChatRequest, Body()]` | Validates the request body against a Pydantic model before invoking your handler. | Manual `await request.json()` plus a hand-rolled validator. |

Twelve lines and one dependency replace the roughly 150 lines and four dependencies that the hand-rolled equivalent takes in the wild today.
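To make the `@Tool` row concrete, here is a minimal sketch of how a JSON Schema could be derived from Python type hints. It is not ajolopy's implementation: `tool_schema` and `_JSON_TYPES` are hypothetical names, the output shape is an assumption, and only a few scalar types are handled.

```python
import inspect
from typing import Any, get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict[str, Any]:
    """Build a JSON Schema-style tool description from a function signature."""
    hints = get_type_hints(func)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        if name == "self":
            continue  # the bound instance is not part of the tool's input
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def lookup_order(self, order_id: str) -> dict[str, str]:
    """Look up an order's current state by id."""
    return {"order_id": order_id, "status": "in_transit", "eta": "tomorrow"}


schema = tool_schema(lookup_order)
```

The docstring becomes the tool description and every parameter without a default is marked required, which is the hand-written work the table row refers to.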

Why a real-typed body, not message: str?

The Quickstart used message: str as a teaching convenience. In a real service you almost always want a Pydantic model on the wire, because it gives you input validation, an OpenAPI schema, and a stable shape clients can rely on. Annotated[..., Body()] is the @Stream pattern documented in the @Stream reference.

Run it

The ajolopy dev server from the Quickstart still works — just relaunch it:

ajolopy dev

In a second terminal hit the endpoint:

curl -N -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Where is order 4392? Use the tool."}'

You should see the response stream back token by token, and your dev-server logs should show a function_call → tool_result → final_text round trip. The agent calls lookup_order("4392"), gets your stub answer, and folds it into the streamed reply.
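The round trip described above can be sketched in plain Python. This is an illustration of the general function-calling loop, not ajolopy's internals: `run_tool_loop`, `scripted_model`, and the message dictionaries are hypothetical, and a scripted stand-in replaces the real model so the example runs offline.

```python
import json
from typing import Any, Callable

# Hypothetical message shape; real provider SDKs use their own formats.
Message = dict[str, Any]


def run_tool_loop(
    model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., Any]],
    user_message: str,
    max_turns: int = 5,
) -> str:
    """Call the model until it returns final text, running requested tools."""
    messages: list[Message] = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = model(messages)
        if reply["type"] == "final_text":
            return reply["text"]
        # The model asked for a tool: run it and feed the result back in.
        result = tools[reply["name"]](**reply["arguments"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "tool_result", "content": json.dumps(result)})
    raise RuntimeError("model did not produce a final answer")


def lookup_order(order_id: str) -> dict[str, str]:
    return {"order_id": order_id, "status": "in_transit", "eta": "tomorrow"}


def scripted_model(messages: list[Message]) -> Message:
    """Stand-in for a real model: request the tool once, then answer."""
    if messages[-1]["role"] == "tool_result":
        order = json.loads(messages[-1]["content"])
        text = f"Order {order['order_id']} is {order['status']}."
        return {"type": "final_text", "text": text}
    return {
        "type": "function_call",
        "name": "lookup_order",
        "arguments": {"order_id": "4392"},
    }


answer = run_tool_loop(
    scripted_model, {"lookup_order": lookup_order}, "Where is order 4392?"
)
```

The `max_turns` cap is a design choice in this sketch: it stops a model that keeps requesting tools from looping forever.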

See the production behaviour at work

Two production behaviours are easiest to verify by inducing the condition they protect against: startup env validation and model fallback.

Env validation

The framework fails fast when a required variable is missing. Temporarily remove your API key and restart:

unset ANTHROPIC_API_KEY
ajolopy dev

ajolopy dev refuses to boot and points at the missing variable. Put the key back and the server starts. That is knob #1 from the seven production primitives: no more 2 a.m. discovery of an unset secret in staging.
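The fail-fast idea is simple enough to sketch by hand. The snippet below is an assumption about the general pattern, not the framework's actual check: `require_env` and `MissingEnvError` are hypothetical names, and the error message format is made up for illustration.

```python
import os
from collections.abc import Mapping


class MissingEnvError(RuntimeError):
    """Raised at startup when required environment variables are unset."""


def require_env(names: list[str], environ: Mapping[str, str] = os.environ) -> None:
    """Raise before serving traffic if any required variable is missing or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingEnvError(
            "missing required environment variables: " + ", ".join(missing)
        )


# Simulate booting with an empty environment, as after `unset ANTHROPIC_API_KEY`.
try:
    require_env(["ANTHROPIC_API_KEY"], environ={})
except MissingEnvError as exc:
    startup_error = str(exc)
```

Running the check once at boot, rather than on first request, is what turns a late runtime failure into an immediate startup failure.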

Fallback

To see the fallback in action, point model= at a temporarily misnamed Anthropic model and watch the span emit a gen_ai.fallback.used attribute with the alternative model on retry. (In production, fallbacks fire on retriable errors such as 5xx responses, rate limits, and timeouts rather than on a missing model, but the OTel attribute is the same.)
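For intuition, the retry-on-another-model behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not ajolopy's retry policy: `RetriableError`, `complete_with_fallback`, and `flaky_provider` are hypothetical, and the provider is faked so the primary model always fails.

```python
import asyncio
from collections.abc import Awaitable, Callable


class RetriableError(Exception):
    """Hypothetical marker for 5xx, rate-limit, and timeout failures."""


async def complete_with_fallback(
    call: Callable[[str, str], Awaitable[str]],
    prompt: str,
    primary: str,
    fallback: str,
) -> tuple[str, bool]:
    """Return (reply, fallback_used), retrying once on the fallback model."""
    try:
        return await call(primary, prompt), False
    except RetriableError:
        # Only retriable failures trigger the fallback; other errors propagate.
        return await call(fallback, prompt), True


async def flaky_provider(model: str, prompt: str) -> str:
    """Fake provider: the primary model always returns a retriable error."""
    if model == "claude-opus-4-7":
        raise RetriableError("503 from provider")
    return f"[{model}] reply to: {prompt}"


reply, fallback_used = asyncio.run(
    complete_with_fallback(
        flaky_provider, "hi", "claude-opus-4-7", "claude-haiku-4-5"
    )
)
```

The boolean in the return value plays the role that the `gen_ai.fallback.used` span attribute plays in the traced version: it records that the answer came from the alternative model.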

What just happened

You have a Support agent that:

Validates its provider env var before accepting traffic.
Streams responses over SSE with heartbeats and disconnect cancellation.
Runs a tool-calling loop without any JSON Schema you had to write.
Falls back to claude-haiku-4-5 automatically if Opus returns a retriable error.
Emits one OpenTelemetry span per request — ready to land in Langfuse / Sentry / Grafana / Honeycomb / Datadog the moment you wire an exporter.

That is production from day one. Twelve lines, one dependency. No manual tool-calling loop, no SSE plumbing, no retry policy, no env-var debugging at deploy time.

What's next

In Step 2 — Evals you keep this same Support class and add the safety net that catches the moment a prompt change, a model upgrade, or a tool tweak silently regresses your quality.