LLM 0.32a0: Messages and Typed Streaming for LLMs

LLM 0.32a0 refactors inputs into message sequences and outputs into typed streaming parts, handling conversations, tools, and multimodal content while remaining backwards-compatible with the existing prompt API.

Message Sequences Replace Prompt for Conversations

Build conversations by passing lists of llm.user() and llm.assistant() messages to model.prompt(messages=...), letting you preload prior exchanges without SQLite hacks. The old prompt="text" form still works; it is converted to a single user message internally.

Before:

conversation = model.conversation()
r1 = conversation.prompt("Capital of France?")  # "Paris"
r2 = conversation.prompt("Germany?")  # "Berlin"

This couldn't ingest external histories easily.

Now:

response = model.prompt([
    llm.user("Capital of France?"),
    llm.assistant("Paris"),
    llm.user("Germany?")
])
print(response.text)  # "Berlin"

Or chain with response.reply("Hungary?") to extend the conversation naturally. This mirrors the messages array of OpenAI's chat completions API, simplifying chat-API emulation and multi-turn flows across 1000+ models via plugins.
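The backwards-compatible input handling described above can be sketched in plain Python. The names here (Message, normalize_input) are illustrative stand-ins, not the library's internals: the point is that a legacy string prompt and a role-tagged message list normalize to the same shape.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

def user(content: str) -> Message:
    return Message("user", content)

def assistant(content: str) -> Message:
    return Message("assistant", content)

def normalize_input(prompt) -> list[Message]:
    # Old-style prompt="text" becomes a single user message,
    # so existing callers keep working unchanged.
    if isinstance(prompt, str):
        return [user(prompt)]
    return list(prompt)

# Both call styles produce the same internal message-list shape:
legacy = normalize_input("Capital of France?")
modern = normalize_input([
    user("Capital of France?"),
    assistant("Paris"),
    user("Germany?"),
])
```

Because the legacy path is just a one-message special case of the general one, no existing prompt code needs to change.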

Typed Streaming Handles Mixed Response Parts

Iterate over response.stream_events (sync) or astream_events (async) to process text, tool calls, reasoning, images, or audio as they arrive, which is crucial for models like Claude that interleave reasoning with tool calls.

Example with tool:

def describe_dog(name: str, bio: str) -> str:
    return f"{name}: {bio}"

response = model.prompt(
    "Invent 3 cool dogs, first talk about your motivations",
    tools=[describe_dog]
)
for event in response.stream_events:
    if event.type == "text":
        print(event.chunk, end="", flush=True)
    elif event.type == "tool_call_name":
        print(f"\nTool call: {event.chunk}(", end="", flush=True)
    elif event.type == "tool_call_args":
        print(event.chunk, end="", flush=True)

The output shows the motivations as text, then three describe_dog calls with JSON arguments like {"name": "Nova Jetpaw", "bio": "..."}. After the stream completes, call response.execute_tool_calls() or response.reply("Tell me about the dogs") to feed the tool results back to the model.
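The dispatch pattern in the loop above is independent of any one model. A minimal self-contained sketch, with a hypothetical StreamEvent dataclass and a fake event source standing in for the library's objects:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class StreamEvent:
    type: str    # "text", "tool_call_name", "tool_call_args", ...
    chunk: str

def fake_stream() -> Iterator[StreamEvent]:
    # Stand-in for a model stream: text first, then a tool call
    # split across name and argument events.
    yield StreamEvent("text", "I want bold, heroic dogs. ")
    yield StreamEvent("tool_call_name", "describe_dog")
    yield StreamEvent("tool_call_args", '{"name": "Rex", "bio": "brave"}')

def render(events) -> str:
    # Dispatch on the event type tag, exactly as in the loop above.
    out = []
    for event in events:
        if event.type == "text":
            out.append(event.chunk)
        elif event.type == "tool_call_name":
            out.append(f"\nTool call: {event.chunk}(")
        elif event.type == "tool_call_args":
            out.append(event.chunk)
    return "".join(out)

print(render(fake_stream()))
```

Unknown event types fall through untouched, so a consumer written this way keeps working when a model adds new part types such as images or audio.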

The CLI gains -R/--no-reasoning to suppress reasoning tokens, which are otherwise streamed to stderr in a different color. The release also supports server-side tools such as OpenAI's code interpreter and Anthropic's web search, plus emerging multimodal outputs.

Trade-off: the event loop is more verbose than the old for chunk in response iteration, but it unlocks tool and reasoning parsing without custom plugins.

Serialize Responses for Custom Storage

Convert any response to a JSON-serializable form via response.to_dict() (a TypedDict), store it anywhere, then reconstruct it with Response.from_dict(). This replaces rigid SQLite-only conversation persistence and lets you build pluggable storage backends.
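The round-trip pattern looks like the following sketch. StoredResponse here is a hypothetical stand-in for the library's Response class; the point is that the dict form is JSON-safe, so any backend that can hold a string can hold a conversation.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class StoredResponse:
    model: str
    messages: list    # role/content dicts
    text: str

    def to_dict(self) -> dict:
        # Flatten to plain dicts and lists, safe for json.dumps.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "StoredResponse":
        return cls(**data)

original = StoredResponse(
    model="example-model",
    messages=[{"role": "user", "content": "Germany?"}],
    text="Berlin",
)

# Any JSON-capable backend works: a file, Redis, Postgres, an object store.
payload = json.dumps(original.to_dict())
restored = StoredResponse.from_dict(json.loads(payload))
```

Because the intermediate form is just a dict, the storage layer never needs to know anything about the response class itself.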

Future: graph-based SQLite logging for deduplicated chat histories (targeted at 0.32 or 0.33). The alpha is being tested against plugins like llm-anthropic, exercising Claude Sonnet 4.6 streaming.

Summarized by x-ai/grok-4.1-fast via openrouter

6641 input / 1874 output tokens in 19176ms

© 2026 Edge