Skip to content

Built-in Processors

The genai_processors.core package contains a rich set of processors for building AI agents and pipelines. Beyond model interactions, it provides tools for handling I/O, data fetching, text manipulation, function calling, and more.

For model-specific processors like GenaiModel, see Supported Models.

Speech and Audio

Processors for handling voice input and output.

SpeechToText

Transcribes audio streams into text using Google Cloud Speech-to-Text, and generates speech events like StartOfSpeech and EndOfSpeech.

from genai_processors.core import speech_to_text

stt = speech_to_text.SpeechToText(project_id='your-gcp-project-id')

See: speech_to_text.py

TextToSpeech

Converts text streams into audible speech using Google Cloud Text-to-Speech.

from genai_processors.core import text_to_speech

tts = text_to_speech.TextToSpeech(project_id='your-gcp-project-id')

See: text_to_speech.py

Audio I/O

Use PyAudioIn to capture microphone input and PyAudioOut to play audio to speakers.

import pyaudio
from genai_processors.core import audio_io

pya = pyaudio.PyAudio()
mic_input = audio_io.PyAudioIn(pya)
speaker_output = audio_io.PyAudioOut(pya)

pipeline = mic_input + stt + model + tts + speaker_output

See: audio_io.py

RateLimitAudio

For streaming TTS, RateLimitAudio splits audio into small chunks and yields them at their natural playback speed, allowing for smoother playback and interruption.

from genai_processors.core import rate_limit_audio

rate_limiter = rate_limit_audio.RateLimitAudio(sample_rate=24000)
pipeline = model + tts + rate_limiter + speaker_output

See: rate_limit_audio.py

Video and Document

Processors for handling video streams and document formats like PDF.

VideoIn

Captures video frames from a camera or screen recording as a stream of images.

from genai_processors.core import video

camera = video.VideoIn(video_mode=video.VideoMode.CAMERA, substream_name='realtime')
screen = video.VideoIn(video_mode=video.VideoMode.SCREEN, substream_name='realtime')

See: video.py

PDFExtract

Extracts text and images from PDF files. For pages containing images, it renders the page as an image; otherwise, it extracts text.

from genai_processors.core import pdf

pdf_extractor = pdf.PDFExtract()

See: pdf.py

EventDetection

An advanced processor that uses a GenAI model to detect events in a stream of images (e.g., from a live video feed). It identifies state transitions (e.g., from "no object" to "object detected") and injects corresponding event notifications into the stream. This is useful for building agents that need to react to visual changes in real-time. For more details on realtime, see Realtime Processing.

from genai_processors.core import event_detection

detector = event_detection.EventDetection(...)

See: event_detection.py

Text, Templating, and Output

Processors for manipulating text, extracting information, and parsing model outputs.

Preamble / Suffix

Adds fixed content to the beginning (Preamble) or end (Suffix) of a stream. Useful for adding system prompts or instructions.

from genai_processors.core import preamble

add_prompt = preamble.Preamble("You are a helpful assistant.")
add_suffix = preamble.Suffix("Answer in one sentence.")

See: preamble.py

JinjaTemplate

Renders Jinja templates, allowing dynamic prompt generation with multimodal content.

from genai_processors.core import jinja_template

template = jinja_template.JinjaTemplate(
    template_str='Summary of {{ doc_name }}: {{ content }}',
    doc_name='Annual Report',
    content_varname='content'
)

See: jinja_template.py

StructuredOutputParser

If a model is prompted to return JSON, this processor parses the streamed JSON text and converts it into ProcessorsParts holding Python dataclass or Enum instances based on a provided schema.

from genai_processors.core import constrained_decoding

# and schema is a dataclass or enum
parser = constrained_decoding.StructuredOutputParser(schema=MyDataClass)
pipeline = model + parser

See: constrained_decoding.py

UrlExtractor & HtmlCleaner

UrlExtractor finds URLs in text and replaces them with FetchRequest parts. HtmlCleaner strips HTML tags to produce clean HTML or plain text.

from genai_processors.core import text

url_extractor = text.UrlExtractor()
html_cleaner = text.HtmlCleaner(cleaning_mode='plain')

See: text.py

MatchProcessor

Finds and extracts text matching a regex pattern from the stream. This is useful for intercepting and handling specific text patterns in model output, such as commands or structured data embedded in text before it is returned to the user. It can also be used to detect unsafe keywords.

from genai_processors.core import text

matcher = text.MatchProcessor(pattern=r'\[command:.*\]', substream_output='commands')

See: text.py

Data Fetching

Processors to fetch content from various sources like Google Drive, GitHub, or web URLs.

UrlFetch

Fetches content from URLs contained in FetchRequest parts.

from genai_processors.core import text
from genai_processors.core import web

url_fetcher = web.UrlFetch()
pipeline = text.UrlExtractor() + url_fetcher + text.HtmlCleaner()

See: web.py

Drive

Fetches content from Google Docs, Sheets, or Slides as PDF or CSV.

from genai_processors.core import drive

docs = drive.Docs()
sheets = drive.Sheets()
slides = drive.Slides()

See: drive.py

GitHub

Fetches file content from GitHub URLs.

from genai_processors.core import github

github_fetcher = github.GithubProcessor(api_key='your-github-api-key')

See: github.py

Filesystem

Reads local files matching a glob pattern.

from genai_processors.core import filesystem

file_loader = filesystem.GlobSource(pattern='**/*.txt', base_dir='./docs')

See: filesystem.py

Function Calling

The FunctionCalling processor automates tool use by intercepting function calls from a model, executing the corresponding Python functions, and feeding results back to the model.

from genai_processors.core import function_calling
from google.genai import types as genai_types

def get_weather(city: str) -> str:
    # ... implementation ...
    return f"Weather in {city} is sunny."

tools = [get_weather]
model_with_tools = genai_model.GenaiModel(
    ...,
    generate_content_config=genai_types.GenerateContentConfig(
        tools=tools,
        automatic_function_calling=genai_types.AutomaticFunctionCallingConfig(
            disable=True
        ),
    ),
)

agent = function_calling.FunctionCalling(model=model_with_tools, fns=tools)

See: Function Calling Concept and function_calling.py

Stream Manipulation

Window

Invokes a processor on a sliding window of content, useful for processing long streams or video.

from genai_processors.core import window

# Apply model to windows of 3 turns
windowed_processor = window.Window(
    model,
    compress_history=window.keep_last_n_turns(3)
)

See: window.py

Timestamp

Adds timestamp parts to a stream, typically after image frames in a video.

from genai_processors.core import timestamp

ts = timestamp.add_timestamps(with_ms=True)

See: timestamp.py