Tracing
GenAI Processors includes a powerful hierarchical tracing mechanism for monitoring and debugging processor pipelines. Traces capture a time-stamped log of processor executions, including input/output parts, nested processor calls, errors, and cancellations. This is invaluable for understanding complex data flows, diagnosing issues, analyzing timing, and inspecting multimodal data at each stage of a pipeline.
Overview
When tracing is enabled, every call to a Processor or PartProcessor can be
recorded. A trace logs:
- Input/Output: Each ProcessorPart that enters or leaves a processor.
- Timing: Start and end times for each processor execution.
- Hierarchy: Nested calls are captured as sub-traces, showing how processors call each other (e.g., in a chain or parallel operation).
- Content: Multimodal content like text, images, and audio can be inspected. Metadata, function calls, tool responses, and code execution are also captured.
- Errors: Exceptions raised by a processor are logged in the trace, along with stack traces.
- Cancellations: If a processor's task is cancelled, this is marked in the trace.
How Tracing Works
The tracing mechanism is built into the Processor and PartProcessor base
classes and integrates with Python's asyncio and contextvars.
Trace Context
Tracing is activated when a processor is executed within an active trace.Trace
asynchronous context. The root Trace object manages trace collection for its
scope.
When a Processor is called, it checks for an active trace:
- If tracing is active, it creates a sub-trace representing its own execution and attaches it to the parent trace.
- If no trace is active, the processor runs without tracing.
Certain internal or debugging processor modules (e.g., genai_processors.debug)
are excluded by default to reduce noise in traces. You can view or extend this
list via trace.EXCLUDED_TRACE_MODULES.
Event Logging
Within a trace, each input part consumed and output part produced by a processor
is logged as a TraceEvent with a timestamp. If a processor calls another
processor, this call is also logged as an event containing a nested sub-trace.
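As a rough illustration of per-part event logging, the sketch below wraps the input and output streams of a toy processor and records one timestamped event per part. SketchEvent and logged are hypothetical names; the fields of the library's actual TraceEvent are not specified here and may differ.

```python
import asyncio
import time
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass


@dataclass
class SketchEvent:
    """Hypothetical event record; the real TraceEvent may carry other fields."""
    timestamp: float
    direction: str  # "input" or "output"
    part: str


async def logged(
    stream: AsyncIterable[str], direction: str, events: list
) -> AsyncIterator[str]:
    """Passes parts through unchanged, appending one event per part."""
    async for part in stream:
        events.append(SketchEvent(time.time(), direction, part))
        yield part


async def source(parts: list) -> AsyncIterator[str]:
    for part in parts:
        yield part


async def upper(stream: AsyncIterable[str]) -> AsyncIterator[str]:
    """Toy processor: upper-cases each text part."""
    async for part in stream:
        yield part.upper()


async def run_logged(parts: list):
    events: list = []
    inputs = logged(source(parts), "input", events)
    outputs = logged(upper(inputs), "output", events)
    results = [part async for part in outputs]
    return results, events
```

Because the streams are consumed lazily, input and output events interleave in the log in the order in which the parts actually flowed through the processor.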
Enabling Tracing
Tracing is enabled by wrapping the processor pipeline execution in a Trace
context manager. You do not need to modify your processor implementations.
The library provides SyncFileTrace for file-based trace logging.
import asyncio
from collections.abc import AsyncIterable

from genai_processors import content_api
from genai_processors import processor
from genai_processors.dev import trace_file


@processor.processor_function
async def my_pipeline(
    content: content_api.ProcessorStream,
) -> AsyncIterable[content_api.ProcessorPartTypes]:
    async for part in content:
        yield part.text.upper()


async def main():
    # When my_pipeline() is called, its execution will be traced
    # because it's inside the SyncFileTrace context.
    async with trace_file.SyncFileTrace(
        trace_dir='/tmp/traces', name='my_pipeline_trace'
    ):
        result = await my_pipeline('my input').text()
        print(result)


asyncio.run(main())
This will run my_pipeline, record its execution trace, and save it to the
/tmp/traces directory.
File-Based Tracing: SyncFileTrace
SyncFileTrace is the default tracing backend. It saves traces to disk when its
context exits. For each traced execution, it generates two files in the
specified trace_dir:
- .json: A JSON file containing all trace events, parts, and metadata. This file can be loaded for programmatic analysis using trace_file.SyncFileTrace.load(path).
- .html: An interactive HTML trace viewer that can be opened in a browser for visual inspection of the processor execution timeline, nested calls, and multimodal data.
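After a run, it can be handy to locate the generated files. The helper below is a minimal sketch using only the standard library. It assumes that the .json and .html files of one trace share the same file stem, which is not stated above, and find_trace_files is a hypothetical name.

```python
from pathlib import Path


def find_trace_files(trace_dir):
    """Groups the .json and .html files in trace_dir by file stem.

    Assumption: both files of one trace share a stem (e.g. x.json and x.html).
    """
    found = {}
    for path in sorted(Path(trace_dir).iterdir()):
        if path.suffix in (".json", ".html"):
            found.setdefault(path.stem, {})[path.suffix] = path
    return found
```

Each returned entry maps a suffix to its path, so the .json path can be passed to trace_file.SyncFileTrace.load(path) and the .html path opened in a browser.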
Trace Viewer
The HTML trace viewer provides an interactive interface for exploring traces:
- A hierarchical view of nested processor calls on the left panel.
- A detailed, time-ordered log of inputs and outputs for the selected processor on the right panel.
- Inline rendering for text, images, and audio parts.
- Formatted display for function calls, function responses, executable code, and file data.
- Playback controls for audio streams.
- Metadata inspection for each part.
Configuration
SyncFileTrace can be configured with options like:
- trace_dir: Directory to save trace files.
- name: A name for the trace, used in filenames and the viewer title.
- image_size: A (width, height) tuple to resize images to for saving space in traces, e.g., (200, 200). Set to None to disable resizing.
- max_size_bytes: If set, trace part content will be omitted if the total trace size exceeds this limit, to prevent excessive memory usage or huge trace files. Metadata is still kept. This is handy for real-time agents that take video input and can therefore generate very large traces quickly.
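The effect of max_size_bytes can be pictured with the standalone sketch below: once the byte budget is used up, content is dropped but metadata is still recorded. SizeCappedLog is a hypothetical class, and the exact size accounting used by SyncFileTrace is an assumption here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SizeCappedLog:
    """Hypothetical log that omits content once a byte budget is exhausted."""
    max_size_bytes: Optional[int] = None  # None means no limit.
    total_bytes: int = 0
    records: list = field(default_factory=list)

    def add(self, content: bytes, metadata: dict) -> None:
        fits = (
            self.max_size_bytes is None
            or self.total_bytes + len(content) <= self.max_size_bytes
        )
        if fits:
            self.total_bytes += len(content)
        # Metadata is always kept; content only while it fits the budget.
        self.records.append(
            {"metadata": metadata, "content": content if fits else None}
        )
```

With a 10-byte budget, a first 8-byte part is stored in full, while a second 8-byte part is recorded with its metadata only.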
Custom Trace Backends
To send traces to a different backend (e.g., a database, or a streaming
service), you can implement a custom trace class by inheriting from
trace.Trace and implementing its abstract methods for handling inputs,
outputs, sub-traces, errors, and finalization.
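The general shape of such a backend might look like the sketch below. It deliberately does not subclass trace.Trace: the abstract method names of the real class are not listed here, so SketchTraceBackend and every method name on it are hypothetical placeholders. Check the trace module for the actual names before adapting this.

```python
import abc


class SketchTraceBackend(abc.ABC):
    """Hypothetical interface; the real trace.Trace method names may differ."""

    @abc.abstractmethod
    def on_input(self, part): ...

    @abc.abstractmethod
    def on_output(self, part): ...

    @abc.abstractmethod
    def on_sub_trace(self, name): ...

    @abc.abstractmethod
    def on_error(self, exc): ...

    @abc.abstractmethod
    def finalize(self): ...


class InMemoryBackend(SketchTraceBackend):
    """Collects events in a list; a real backend might write to a database."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def on_input(self, part):
        self.events.append(("input", part))

    def on_output(self, part):
        self.events.append(("output", part))

    def on_sub_trace(self, name):
        child = InMemoryBackend(name)
        self.events.append(("sub_trace", child))
        return child

    def on_error(self, exc):
        self.events.append(("error", repr(exc)))

    def finalize(self):
        # Flatten the tree into plain dicts, ready to ship elsewhere.
        return {
            "name": self.name,
            "events": [
                (kind, payload.finalize() if kind == "sub_trace" else payload)
                for kind, payload in self.events
            ],
        }
```

Keeping the backend to a handful of narrow hooks means the storage target can change without touching processor code.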
Guidelines
- Development: Tracing is invaluable during development and debugging. Enable SyncFileTrace to understand how data flows through your pipeline and to diagnose issues.
- Production: Tracing adds some overhead due to data collection and serialization. For production environments, consider disabling tracing or implementing a sampling mechanism or a more lightweight trace backend if needed.
- Multimodal Data: The trace viewer is especially useful for pipelines that handle images and audio, allowing you to see or hear the data at each stage.
- Errors and Cancellations: If a processor raises an exception or is cancelled, the trace will record this state, which is useful for debugging failures in complex asynchronous pipelines.