Roles Establish Instruction Hierarchy and Message Types

gpt-oss models process messages via five roles forming a strict hierarchy: system > developer > user > assistant > tool. This resolves conflicts by prioritizing higher roles. system sets reasoning effort, knowledge cutoff, and built-in tools. developer delivers core instructions (traditional system prompt) and function tools. user captures inputs. assistant outputs responses, tool calls, or reasoning, often tied to channels. tool feeds back results, using the tool name as the role.

"These roles also represent the information hierarchy that the model applies in case there are any instruction conflicts: system > developer > user > assistant > tool"

RolePurpose
systemReasoning effort, meta info like knowledge cutoff, built-in tools
developerInstructions and function tools
userModel input
assistantTool calls or messages, channel-specific
toolTool outputs

This setup ensures models follow developer intent over user queries, critical for reliable agentic flows.

Channels Separate User Output from Internal Reasoning

Assistant messages route to three channels: final for end-user responses, analysis for chain-of-thought reasoning (unsafe for users), and commentary for function tool calls or preambles. Built-in tools favor analysis; custom functions use commentary. Channels prevent leaking internal thoughts to users.

"Messages in the analysis channel do not adhere to the same safety standards as final messages do. Avoid showing these to end-users."

"Any function tool call will typically be triggered on the commentary channel while built-in tools will normally be triggered on the analysis channel."

Channels mimic Responses API separation, enabling safe streaming of final content while hiding analysis traces that boost reasoning but risk hallucinations or unsafe content.

Harmony Renderer Library Handles Tokenization and Parsing

The openai-harmony library (PyPI/crates.io) automates formatting messages into tokens using o200k_harmony encoding (tiktoken-compatible). Construct SystemContent, DeveloperContent with ToolDescriptions, and Conversation from Messages, then render for completion.

Key workflow:

  1. Load encoding: encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
  2. Build system: SystemContent.new().with_reasoning_effort(ReasoningEffort.HIGH).with_conversation_start_date("2025-06-28")
  3. Add developer instructions/tools: DeveloperContent.new().with_instructions("Always respond in riddles").with_function_tools([ToolDescription.new("get_current_weather", params_schema)])
  4. Assemble conversation with user/assistant/tool messages, assign channels/recipients/content types.
  5. Render: tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
  6. Parse response: parsed_response = encoding.parse_messages_from_completion_tokens(new_tokens, Role.ASSISTANT)

For streaming, StreamableParser decodes tokens incrementally, exposing current_role, current_channel, last_content_delta, etc., ideal for real-time UIs handling JSON or unicode.

Example stream output tracks shifts: analysis reasoning → commentary tool call → final response.

"gpt-oss should not be used without using the harmony format, as it will not work correctly."

This library abstracts special tokens like <|type|>, ensuring compatibility without manual prompt engineering.

Custom Renderers Must Mimic Responses API with Special Tokens

For self-built inference (e.g., Ollama bypass), replicate Harmony using o200k_harmony encoding. Special tokens structure inputs: <|system|>, <|developer|>, etc., with channels via <|final|>, <|analysis|>, <|commentary|>. Tool calls specify <|recipient|functions.tool_name|>, <|constrain|>json>. Preambles in commentary precede multi-calls.

Format emulates Responses API familiarity: conversation history → assistant completion. Include ReasoningEffort (LOW/MED/HIGH) in system for compute trade-offs. Conversation start date aids recency awareness.

Without the library, manually tokenize messages respecting hierarchy and channels—error-prone, hence the recommendation to use openai-harmony.

Production Implications: Safety, Streaming, and Model Limits

Harmony enforces safety by isolating analysis (weaker safeguards) from final. High reasoning effort trades latency for accuracy in complex tasks. For APIs/providers, inference handles formatting; direct gpt-oss needs explicit Harmony. Avoid raw gpt-oss sans format—degrades to incoherent outputs.

Integrates with function calling: JSON schemas in ToolDescription, results as Author(Role.TOOL, tool_name). Streams parse mid-generation for low-latency apps.

Key Takeaways

  • Always pair gpt-oss with Harmony format; skip it and models fail on structured tasks.
  • Prioritize system for reasoning effort (HIGH for agents) and date cutoffs to ground responses.
  • Route assistant outputs: final to users, analysis internally, commentary for tools.
  • Install openai-harmony via PyPI/crates.io—renders conversations to tokens, parses streams.
  • Define tools in developer role with JSON schemas; echo results as tool role in commentary.
  • Use StreamableParser for real-time decoding: track channels/deltas without full tokens.
  • Leverage role hierarchy to override user instructions reliably.
  • Test with o200k_harmony encoding in tiktoken for custom setups.
  • Hide analysis channel from users—lacks full safety filters.