LLM 0.32a0: Messages and Typed Streaming for LLMs
LLM 0.32a0 refactors inputs into message sequences and outputs into typed streaming parts, handling conversations, tools, and multimodal content while keeping existing prompt APIs backwards compatible.
Message Sequences Replace Prompt for Conversations
Build conversations by passing lists of llm.user() and llm.assistant() messages to model.prompt(messages=...), enabling you to preload prior exchanges without SQLite hacks. Old prompt="text" still works—it converts to a single user message internally.
Before:
conversation = model.conversation()
r1 = conversation.prompt("Capital of France?") # "Paris"
r2 = conversation.prompt("Germany?") # "Berlin"
This couldn't ingest external histories easily.
Now:
response = model.prompt([
    llm.user("Capital of France?"),
    llm.assistant("Paris"),
    llm.user("Germany?")
])
print(response.text) # "Berlin"
Or chain with response.reply("Hungary?") to extend the conversation naturally. This mirrors OpenAI's chat completions messages array, making it straightforward to import external histories and build multi-turn flows across the 1,000+ models available via plugins.
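Since the message helpers mirror OpenAI's format, the mapping is easy to picture. Here is a minimal sketch, assuming llm.user() and llm.assistant() simply wrap a role and content; the Message class and helper names below are illustrative stand-ins, not the library's internals.

```python
from dataclasses import dataclass

# Illustrative stand-ins for llm.user() / llm.assistant().
@dataclass
class Message:
    role: str
    content: str

def user(content: str) -> Message:
    return Message("user", content)

def assistant(content: str) -> Message:
    return Message("assistant", content)

def to_chat_payload(messages: list[Message]) -> list[dict]:
    # Convert to the OpenAI-style messages array the post compares against.
    return [{"role": m.role, "content": m.content} for m in messages]

history = [user("Capital of France?"), assistant("Paris"), user("Germany?")]
payload = to_chat_payload(history)
```

Any externally stored transcript that can be mapped into this role/content shape can be replayed into model.prompt() the same way.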
Typed Streaming Handles Mixed Response Parts
Iterate response.stream_events (sync) or astream_events (async) to process text, tool calls, reasoning, images, or audio as they arrive—crucial for models like Claude that interleave reasoning before tools.
Example with tool:
def describe_dog(name: str, bio: str) -> str:
    return f"{name}: {bio}"

response = model.prompt(
    "Invent 3 cool dogs, first talk about your motivations",
    tools=[describe_dog]
)

for event in response.stream_events:
    if event.type == "text":
        print(event.chunk, end="", flush=True)
    elif event.type == "tool_call_name":
        print(f"\nTool call: {event.chunk}(", end="", flush=True)
    elif event.type == "tool_call_args":
        print(event.chunk, end="", flush=True)
Output shows motivations as text, then three describe_dog calls with JSON args like {"name": "Nova Jetpaw", "bio": "..."}. Post-stream, run response.execute_tool_calls() or response.reply("Tell me about the dogs") to loop tools back to the model.
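To make the event shapes concrete, here is a minimal self-contained sketch of how a consumer could assemble streamed tool_call_name and tool_call_args chunks into complete calls and dispatch them. The Event class and run_tool_calls helper are hypothetical scaffolding for illustration; the real library handles this via response.execute_tool_calls().

```python
import json
from dataclasses import dataclass

# Hypothetical stand-in for a streamed event: a type tag plus a chunk of text.
@dataclass
class Event:
    type: str
    chunk: str

def describe_dog(name: str, bio: str) -> str:
    return f"{name}: {bio}"

TOOLS = {"describe_dog": describe_dog}

def run_tool_calls(events):
    """Accumulate tool_call_name / tool_call_args chunks, then dispatch
    each completed call to the matching registered function."""
    results, name, args = [], None, ""
    for ev in events:
        if ev.type == "tool_call_name":
            if name is not None:  # previous call is now complete
                results.append(TOOLS[name](**json.loads(args)))
            name, args = ev.chunk, ""
        elif ev.type == "tool_call_args":
            args += ev.chunk  # JSON arrives in fragments
    if name is not None:
        results.append(TOOLS[name](**json.loads(args)))
    return results

events = [
    Event("text", "Here are my dogs. "),
    Event("tool_call_name", "describe_dog"),
    Event("tool_call_args", '{"name": "Nova Jetpaw", '),
    Event("tool_call_args", '"bio": "a rocket-powered retriever"}'),
]
print(run_tool_calls(events))  # ['Nova Jetpaw: a rocket-powered retriever']
```

The key design point is that argument JSON streams in fragments and can only be parsed once the call is complete, which is why the accumulator waits for the next tool_call_name (or end of stream) before dispatching.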
The CLI gains -R/--no-reasoning to suppress reasoning tokens, which otherwise stream to stderr in a different color. The release also supports server-side tools such as OpenAI's code interpreter and Anthropic's web search, plus emerging multimodal outputs.
Trade-off: this is more granular than the old for chunk in response loop, but it unlocks tool and reasoning parsing without custom plugins.
Serialize Responses for Custom Storage
Convert any response to JSON via response.to_dict() (a TypedDict), store anywhere, then reconstruct with Response.from_dict(serializable). Replaces rigid SQLite conversation persistence, letting you build pluggable backends.
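The round trip is straightforward to sketch. The Response class and its fields below are illustrative assumptions, not the library's actual schema; the point is that to_dict() yields plain JSON-serializable data you can park in any backend and later feed back to from_dict().

```python
import json
from dataclasses import dataclass

# Illustrative stand-in for the library's Response; real fields will differ.
@dataclass
class Response:
    model: str
    prompt: str
    text: str

    def to_dict(self) -> dict:
        # Plain dict, so json.dumps() works directly.
        return {"model": self.model, "prompt": self.prompt, "text": self.text}

    @classmethod
    def from_dict(cls, data: dict) -> "Response":
        return cls(**data)

r = Response("example-model", "Germany?", "Berlin")
blob = json.dumps(r.to_dict())  # store in a file, Redis, S3, anywhere
restored = Response.from_dict(json.loads(blob))
assert restored == r
```

Because the serialized form is just JSON, storage backends become pluggable: the same blob works in a key-value store, an object store, or a custom database schema.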
Future: graph-based SQLite logging for deduplicated chat histories (planned for 0.32 or 0.33). The alpha is being tested against plugins such as llm-anthropic for Claude Sonnet 4.6 streaming.