OpenInference: Standard LLM Span Kinds & Attributes

Defines 10 span kinds (LLM, AGENT, TOOL, etc.) and 60+ reserved attributes for inputs, outputs, tokens, and costs, standardizing OpenTelemetry tracing of LLM apps, chains, retrievers, and agents.

Classify Operations with 10 Required Span Kinds

Every OpenInference span requires openinference.span.kind to categorize the operation, helping backends assemble traces correctly. Use these exact values:

  • LLM: Traces calls to models like OpenAI chat completions or Llama text generation.
  • EMBEDDING: Captures embedding generation, e.g., OpenAI ada for retrieval.
  • CHAIN: Marks entry points or links between steps, like passing retriever context to LLM.
  • RETRIEVER: Logs data fetches from vector stores or databases.
  • RERANKER: Tracks reranking of input documents by relevance score, returning the top K (e.g., via cross-encoders).
  • TOOL: Records external calls like calculators or weather APIs invoked by LLMs.
  • AGENT: Wraps LLM-guided tool reasoning blocks.
  • GUARDRAIL: Monitors jailbreak protection, modifying/rejecting unsafe LLM outputs.
  • EVALUATOR: Measures LLM output quality like relevance or correctness.
  • PROMPT: Tracks prompt template rendering with variables.

Set this attribute on all spans to enable visualization of execution graphs via graph.node.id, graph.node.name, and graph.node.parent_id.
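As a minimal sketch, the span-kind and graph attributes above can be assembled as a plain attribute map before being set on a span (the helper name and node ids here are illustrative, not part of the spec):

```python
def span_kind_attributes(kind, node_id, node_name, parent_id=None):
    """Build the OpenInference span-kind and graph attributes for one span."""
    attrs = {
        "openinference.span.kind": kind,  # one of the 10 required values
        "graph.node.id": node_id,
        "graph.node.name": node_name,
    }
    if parent_id is not None:  # root spans have no parent
        attrs["graph.node.parent_id"] = parent_id
    return attrs

attrs = span_kind_attributes("RETRIEVER", "retriever-1", "vector-search", "chain-1")
```

Each key-value pair would then be applied with the tracing SDK's set-attribute call.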

Track Inputs, Outputs, and Documents Uniformly

Populate spans with these reserved attributes for consistency across SDKs:

  • Documents: document.content (string), document.id (string/int), document.metadata (JSON), document.score (float, e.g., 0.98).
  • Embeddings: embedding.model_name (e.g., "BERT-base"), embedding.text (input), embedding.vector (float list), embedding.embeddings (list of objects), embedding.invocation_parameters (JSON excluding input).
  • Inputs/Outputs: input.value (string), input.mime_type (e.g., "text/plain"), output.value, output.mime_type.
  • Messages: llm.input_messages and llm.output_messages as flattened lists, e.g., llm.input_messages.0.message.role="user", llm.input_messages.0.message.content="hello". Supports multimodal via message.contents.0.message_content.type="image", message_content.image.url.
  • Prompts/Completions (legacy): llm.prompts.0.prompt.text, llm.choices.0.completion.text.
  • Retriever/Reranker: retrieval.documents (list), reranker.query, reranker.top_k (int, e.g., 3), reranker.input_documents/output_documents.
  • Tools: llm.tools.0.tool.json_schema (full schema), tool.name, tool.description. For calls: message.tool_calls.0.tool_call.id="call_62136355", tool_call.function.name="get_weather", tool_call.function.arguments (JSON).
  • Sessions/Users: session.id, user.id (UUIDs).

Use metadata (JSON) for extras, tag.tags (string list) for categorization.
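A sketch of how retrieved documents map onto the flattened retrieval.documents keys described above (the helper name is hypothetical; metadata is serialized to JSON, matching the document.metadata type):

```python
import json

def retrieval_document_attributes(documents):
    """Flatten retrieved documents into retrieval.documents.{i}.document.* keys."""
    attrs = {}
    for i, doc in enumerate(documents):
        base = f"retrieval.documents.{i}.document"
        attrs[f"{base}.id"] = doc["id"]
        attrs[f"{base}.content"] = doc["content"]
        attrs[f"{base}.score"] = doc["score"]
        attrs[f"{base}.metadata"] = json.dumps(doc.get("metadata", {}))
    return attrs

attrs = retrieval_document_attributes([
    {"id": "doc-1", "content": "hello", "score": 0.98, "metadata": {"source": "kb"}},
])
```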

Monitor Tokens, Costs, and Model Details

Capture usage precisely for optimization:

  • Tokens: llm.token_count.prompt (int, e.g., 10), llm.token_count.completion (15), llm.token_count.total (20). Granular detail keys: prompt_details.cache_read (e.g., OpenAI cached_tokens, 5), prompt_details.cache_write (e.g., Anthropic cache-creation tokens), plus prompt_details.audio, completion_details.audio, and completion_details.reasoning.
  • Costs (USD floats): llm.cost.prompt (0.0021), llm.cost.completion (0.0045), llm.cost.total (0.0066). Details like prompt_details.cache_read (0.0003), completion_details.reasoning (0.0024).

Identify models: llm.model_name (e.g., "gpt-3.5-turbo"), llm.system (well-known: "openai", "anthropic", "cohere", etc.), llm.provider (e.g., "azure", "groq"). For embeddings, use embedding.model_name only—no llm.system/provider.

Exceptions: exception.type, exception.message, exception.stacktrace, exception.escaped (bool).

Flatten Nested Lists for OpenTelemetry

Convert lists/objects to flat keys with zero-based indexing: {base}.{index}.{nested.path}.

Examples:

  • Messages: llm.input_messages.0.message.role="user".
  • Multimodal: llm.input_messages.0.message.contents.1.message_content.image.url.
  • Tools: llm.tools.0.tool.json_schema="{...}", llm.output_messages.0.message.tool_calls.0.tool_call.function.arguments="{...}".

Example flattening loops, assuming each message is a flat dict of simple values (e.g., role, content). Python:

for i, obj in enumerate(messages):
    for key, value in obj.items():
        # Produces e.g. llm.input_messages.0.message.role
        span.set_attribute(f"llm.input_messages.{i}.message.{key}", value)

JS/TS:

const messages = [...];
for (const [i, obj] of messages.entries()) {
    for (const [key, value] of Object.entries(obj)) {
        // Produces e.g. llm.input_messages.0.message.role
        span.setAttribute(`llm.input_messages.${i}.message.${key}`, value);
    }
}

Continue flattening until every value is a simple type (str, int, float, bool, or a list thereof).
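The loops above handle one fixed level of nesting; a generic recursive flattener (a sketch, not part of the spec) applies the {base}.{index}.{nested.path} rule to arbitrary depth:

```python
def flatten(base, value):
    """Flatten nested dicts/lists into {base}.{index}.{path} keys,
    stopping at simple types (str, int, float, bool, lists thereof)."""
    if isinstance(value, dict):
        attrs = {}
        for key, sub in value.items():
            attrs.update(flatten(f"{base}.{key}", sub))
        return attrs
    if isinstance(value, list) and not all(
        isinstance(v, (str, int, float, bool)) for v in value
    ):
        attrs = {}
        for i, item in enumerate(value):
            attrs.update(flatten(f"{base}.{i}", item))  # zero-based indexing
        return attrs
    return {base: value}  # simple value: emit as-is

attrs = flatten("llm.input_messages", [
    {"message": {"role": "user", "content": "hello"}},
])
# attrs == {"llm.input_messages.0.message.role": "user",
#           "llm.input_messages.0.message.content": "hello"}
```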

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge