Classify Operations with 10 Required Span Kinds

Every OpenInference span requires the openinference.span.kind attribute to categorize the operation, so backends can render and analyze traces correctly. Use these exact values:

  • LLM: Traces calls to models like OpenAI chat completions or Llama text generation.
  • EMBEDDING: Captures embedding generation, e.g., OpenAI ada for retrieval.
  • CHAIN: Marks entry points or links between steps, like passing retriever context to LLM.
  • RETRIEVER: Logs data fetches from vector stores or databases.
  • RERANKER: Tracks reordering of input documents by relevance score, e.g., returning the top K from a cross-encoder.
  • TOOL: Records external calls like calculators or weather APIs invoked by LLMs.
  • AGENT: Wraps reasoning blocks in which an LLM decides which tools to invoke.
  • GUARDRAIL: Captures safety checks, such as jailbreak detection, that modify or reject unsafe LLM inputs or outputs.
  • EVALUATOR: Measures LLM output quality like relevance or correctness.
  • PROMPT: Tracks prompt template rendering with variables.

Set this attribute on every span; combined with graph.node.id, graph.node.name, and graph.node.parent_id, it lets backends visualize execution graphs.

Track Inputs, Outputs, and Documents Uniformly

Populate spans with these reserved attributes for consistency across SDKs:

  • Documents: document.content (string), document.id (string/int), document.metadata (JSON), document.score (float, e.g., 0.98).
  • Embeddings: embedding.model_name (e.g., "BERT-base"), embedding.text (input), embedding.vector (float list), embedding.embeddings (list of objects), embedding.invocation_parameters (JSON excluding input).
  • Inputs/Outputs: input.value (string), input.mime_type (e.g., "text/plain"), output.value, output.mime_type.
  • Messages: llm.input_messages and llm.output_messages as flattened lists, e.g., llm.input_messages.0.message.role="user", llm.input_messages.0.message.content="hello". Supports multimodal via message.contents.0.message_content.type="image", message_content.image.url.
  • Prompts/Completions (legacy): llm.prompts.0.prompt.text, llm.choices.0.completion.text.
  • Retriever/Reranker: retrieval.documents (list), reranker.query, reranker.top_k (int, e.g., 3), reranker.input_documents/output_documents.
  • Tools: llm.tools.0.tool.json_schema (full schema), tool.name, tool.description. For calls: message.tool_calls.0.tool_call.id="call_62136355", tool_call.function.name="get_weather", tool_call.function.arguments (JSON).
  • Sessions/Users: session.id, user.id (typically UUIDs or other stable identifiers).

Use metadata (JSON) for extras, tag.tags (string list) for categorization.
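Putting the reserved document attributes together, a retriever span's results might be flattened as in this sketch (the helper name and sample documents are illustrative, not part of the spec):

```python
import json

# Illustrative helper: flattens retrieved documents into the
# reserved retrieval.documents.* attributes described above.
def retrieval_document_attributes(documents: list[dict]) -> dict:
    attrs = {}
    for i, doc in enumerate(documents):
        base = f"retrieval.documents.{i}.document"
        attrs[f"{base}.id"] = doc["id"]
        attrs[f"{base}.content"] = doc["content"]
        attrs[f"{base}.score"] = doc.get("score", 0.0)
        # document.metadata is serialized to a JSON string.
        attrs[f"{base}.metadata"] = json.dumps(doc.get("metadata", {}))
    return attrs

docs = [{"id": "d1", "content": "hello", "score": 0.98,
         "metadata": {"source": "faq"}}]
print(retrieval_document_attributes(docs))
```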

Monitor Tokens, Costs, and Model Details

Capture usage precisely for optimization:

  • Tokens: llm.token_count.prompt (int, e.g., 10), llm.token_count.completion (15), llm.token_count.total (25). Granular: prompt_details.cache_read (OpenAI cached_tokens, e.g., 5), prompt_details.cache_write (Anthropic cache writes), prompt_details.audio/completion_details.audio/reasoning.
  • Costs (USD floats): llm.cost.prompt (0.0021), llm.cost.completion (0.0045), llm.cost.total (0.0066). Details like prompt_details.cache_read (0.0003), completion_details.reasoning (0.0024).
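The token and cost attributes above can be derived from raw counts, as in this sketch. The helper name and per-token prices are made-up examples, not values from the spec:

```python
# Sketch: derives llm.token_count.* and llm.cost.* attributes from
# raw counts. The per-token USD prices are hypothetical.
def usage_attributes(prompt_tokens: int, completion_tokens: int,
                     prompt_price: float, completion_price: float) -> dict:
    prompt_cost = prompt_tokens * prompt_price
    completion_cost = completion_tokens * completion_price
    return {
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
        "llm.cost.prompt": round(prompt_cost, 6),
        "llm.cost.completion": round(completion_cost, 6),
        "llm.cost.total": round(prompt_cost + completion_cost, 6),
    }

print(usage_attributes(10, 15, 0.00021, 0.0003))
```

Note that llm.token_count.total is the sum of prompt and completion tokens, and llm.cost.total the sum of the two costs.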

Identify models: llm.model_name (e.g., "gpt-3.5-turbo"), llm.system (well-known: "openai", "anthropic", "cohere", etc.), llm.provider (e.g., "azure", "groq"). For embeddings, use embedding.model_name only—no llm.system/provider.

Exceptions: exception.type, exception.message, exception.stacktrace, exception.escaped (bool).
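A caught exception can be mapped to these attributes as in this sketch (the helper name is illustrative, not a library API):

```python
import traceback

# Sketch: maps a caught exception to the exception.* attributes above.
def exception_attributes(exc: BaseException, escaped: bool = False) -> dict:
    return {
        "exception.type": type(exc).__name__,
        "exception.message": str(exc),
        "exception.stacktrace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        # escaped=True means the exception propagated past the span boundary.
        "exception.escaped": escaped,
    }

try:
    raise ValueError("bad prompt")
except ValueError as e:
    print(exception_attributes(e))
```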

Flatten Nested Lists for OpenTelemetry

Convert lists/objects to flat keys with zero-based indexing: {base}.{index}.{nested.path}.

Examples:

  • Messages: llm.input_messages.0.message.role="user".
  • Multimodal: llm.input_messages.0.message.contents.1.message_content.image.url.
  • Tools: llm.tools.0.tool.json_schema="{...}", llm.output_messages.0.message.tool_calls.0.tool_call.function.arguments="{...}".

For example, in Python:

# Each message maps dotted paths (e.g., "message.role") to simple values.
for i, obj in enumerate(messages):
    for key, value in obj.items():
        span.set_attribute(f"llm.input_messages.{i}.{key}", value)

And in JS/TS:

// Each message maps dotted paths (e.g., "message.role") to simple values.
const messages = [...];
for (const [i, obj] of messages.entries()) {
    for (const [key, value] of Object.entries(obj)) {
        span.setAttribute(`llm.input_messages.${i}.${key}`, value);
    }
}

Flatten until values are simple types (str, int, float, bool, or lists thereof).
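The rule above can be sketched as a generic recursive flattener (illustrative, not a library API):

```python
def flatten(value, prefix=""):
    """Recursively flatten dicts and lists of dicts into dotted keys,
    stopping at simple types (str, int, float, bool, lists thereof)."""
    if isinstance(value, dict):
        for key, v in value.items():
            yield from flatten(v, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
        # Lists of objects get zero-based indices; lists of simple
        # types are emitted as-is below.
        for i, v in enumerate(value):
            yield from flatten(v, f"{prefix}.{i}")
    else:
        yield prefix, value

messages = [{"message": {"role": "user", "content": "hello"}}]
print(dict(flatten(messages, "llm.input_messages")))
# → {"llm.input_messages.0.message.role": "user",
#    "llm.input_messages.0.message.content": "hello"}
```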