Harmony Format Powers gpt-oss Prompting Like Responses API

Roles Establish Instruction Hierarchy and Message Types

gpt-oss models process messages via five roles forming a strict hierarchy: system > developer > user > assistant > tool. This resolves conflicts by prioritizing higher roles. system sets reasoning effort, knowledge cutoff, and built-in tools. developer delivers core instructions (traditional system prompt) and function tools. user captures inputs. assistant outputs responses, tool calls, or reasoning, often tied to channels. tool feeds back results, using the tool name as the role.

"These roles also represent the information hierarchy that the model applies in case there are any instruction conflicts: system > developer > user > assistant > tool"

Role	Purpose
`system`	Reasoning effort, meta info like knowledge cutoff, built-in tools
`developer`	Instructions and function tools
`user`	Model input
`assistant`	Tool calls or messages, channel-specific
`tool`	Tool outputs

This setup ensures models follow developer intent over user queries, critical for reliable agentic flows.

Channels Separate User Output from Internal Reasoning

Assistant messages route to three channels: final for end-user responses, analysis for chain-of-thought reasoning (unsafe for users), and commentary for function tool calls or preambles. Built-in tools favor analysis; custom functions use commentary. Channels prevent leaking internal thoughts to users.

"Messages in the analysis channel do not adhere to the same safety standards as final messages do. Avoid showing these to end-users."

"Any function tool call will typically be triggered on the commentary channel while built-in tools will normally be triggered on the analysis channel."

Channels mimic Responses API separation, enabling safe streaming of final content while hiding analysis traces that boost reasoning but risk hallucinations or unsafe content.

Harmony Renderer Library Handles Tokenization and Parsing

The openai-harmony library (PyPI/crates.io) automates formatting messages into tokens using o200k_harmony encoding (tiktoken-compatible). Construct SystemContent, DeveloperContent with ToolDescriptions, and Conversation from Messages, then render for completion.

Key workflow:

Load encoding: encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
Build system: SystemContent.new().with_reasoning_effort(ReasoningEffort.HIGH).with_conversation_start_date("2025-06-28")
Add developer instructions/tools: DeveloperContent.new().with_instructions("Always respond in riddles").with_function_tools([ToolDescription.new("get_current_weather", params_schema)])
Assemble conversation with user/assistant/tool messages, assign channels/recipients/content types.
Render: tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
Parse response: parsed_response = encoding.parse_messages_from_completion_tokens(new_tokens, Role.ASSISTANT)

For streaming, StreamableParser decodes tokens incrementally, exposing current_role, current_channel, last_content_delta, etc., ideal for real-time UIs handling JSON or unicode.

Example stream output tracks shifts: analysis reasoning → commentary tool call → final response.

"gpt-oss should not be used without using the harmony format, as it will not work correctly."

This library abstracts special tokens like <|type|>, ensuring compatibility without manual prompt engineering.

Custom Renderers Must Mimic Responses API with Special Tokens

Format emulates Responses API familiarity: conversation history → assistant completion. Include ReasoningEffort (LOW/MED/HIGH) in system for compute trade-offs. Conversation start date aids recency awareness.

Without the library, manually tokenize messages respecting hierarchy and channels—error-prone, hence the recommendation to use openai-harmony.

Production Implications: Safety, Streaming, and Model Limits

Harmony enforces safety by isolating analysis (weaker safeguards) from final. High reasoning effort trades latency for accuracy in complex tasks. For APIs/providers, inference handles formatting; direct gpt-oss needs explicit Harmony. Avoid raw gpt-oss sans format—degrades to incoherent outputs.

Integrates with function calling: JSON schemas in ToolDescription, results as Author(Role.TOOL, tool_name). Streams parse mid-generation for low-latency apps.

Key Takeaways

Always pair gpt-oss with Harmony format; skip it and models fail on structured tasks.
Prioritize system for reasoning effort (HIGH for agents) and date cutoffs to ground responses.
Route assistant outputs: final to users, analysis internally, commentary for tools.
Install openai-harmony via PyPI/crates.io—renders conversations to tokens, parses streams.
Define tools in developer role with JSON schemas; echo results as tool role in commentary.
Use StreamableParser for real-time decoding: track channels/deltas without full tokens.
Leverage role hierarchy to override user instructions reliably.
Test with o200k_harmony encoding in tiktoken for custom setups.
Hide analysis channel from users—lacks full safety filters.

Roles Establish Instruction Hierarchy and Message Types

Channels Separate User Output from Internal Reasoning

Harmony Renderer Library Handles Tokenization and Parsing

Custom Renderers Must Mimic Responses API with Special Tokens

Production Implications: Safety, Streaming, and Model Limits

Key Takeaways

More on Edge

Guarantee LLM Outputs Match Exact Taxonomies with Tries

Rebuild GPT-5.5 Prompts from Scratch: Minimal Wins Over Legacy Detail

KERNEL Framework Delivers 340% AI Accuracy Gains

7 Prompts to Stop AI Sycophancy