PAPER-2026-001

Agent SDK Gemini Tools Integration

Documenting the integration of bash and file_read tools within the Agent SDK's Gemini provider, focusing on implementation, safety, and agentic loop patterns.

Technical Paper | 15 min read | Advanced

Abstract

LLMs generate plausible but often inaccurate content when they can't access the actual codebase. This paper documents the integration of bash and file_read tools into the Agent SDK's Gemini provider. We cover: how the tools work, the allowlist-based safety controls, and the agentic loop that enables iterative tool use. The result: papers that reference real file paths, actual line numbers, and verifiable code. Tradeoff: higher token usage. For technical documentation, the precision is worth the cost.

I. Introduction: The Unseen Codebase

"How can an AI agent generate research papers that are not only coherent but also factually grounded in a dynamic codebase?"

LLMs hallucinate. Without access to your actual code, they generate plausible but unverifiable content. They'll describe a feature in general terms but miss the exact file paths, function names, or configuration values that define the current implementation. For CREATE SOMETHING's technical papers, this is a problem: generic advice isn't useful; we need papers grounded in real code.

This paper documents GeminiToolsProvider, an Agent SDK extension that gives Gemini controlled access to the monorepo via bash and file_read tools. The model can search code with grep, read specific files, and use what it finds in its output. The result: papers that cite actual line numbers and file paths.

II. Problem & Context: Bridging the Epistemic Gap

GeminiToolsProvider addresses a basic limitation: an LLM asked to write about a codebase needs current, specific details, but without direct access its knowledge is confined to its training data, which goes stale quickly in an actively developed repository.

The test script packages/agent-sdk/experiments/test-gemini-tools.py (line 5) states the expected outcome: "Baseline (no tools) - generic content" versus "Tools (bash, file_read) - codebase-grounded content." Tools are what move a paper from generic descriptions to verifiable facts.

With Tools: Grounded Research

  • Real file paths and line numbers
  • Actual metrics from the codebase
  • Specific code examples
  • Grounded philosophical claims

Source: test-gemini-tools.py:8-11

Without Tools: Generic Content

  • Vague references and assumptions
  • Estimated or absent metrics
  • Abstract code patterns
  • Theoretical or unverified claims

Source: test-gemini-tools.py:5

What We Did

  • Identified the core limitation of LLMs in accessing dynamic codebase information.
  • Prioritized the development of a tool-augmented provider for Gemini within the Agent SDK.
  • Defined clear objectives for "codebase-grounded content" to guide implementation.

The outcome: GeminiToolsProvider with bash and file_read tools. Papers become verifiable against the actual codebase.

III. Methodology: The GeminiToolsProvider Architecture

"How do we give Gemini shell and file access without breaking the monorepo?"

The GeminiToolsProvider is implemented in packages/agent-sdk/src/create_something_agents/providers/gemini_tools.py and serves as a specialized AgentProvider that extends Gemini's capabilities with custom tool definitions. Unlike a generic Gemini provider, this implementation directly injects bash and file_read as callable functions, allowing the model to interact with the monorepo.

The provider's _build_tools method (lines 100-109) constructs FunctionDeclaration objects for both tools, making them available to the Gemini model for function calling. This allows the LLM to dynamically decide when and how to use these tools based on the task at hand.

Tool Schema Definitions

# From gemini_tools.py (Lines 14-26)
BASH_TOOL_SCHEMA = {
    "name": "bash",
    "description": "Execute a bash command in the monorepo. Use for searching code (grep), listing files, or running simple commands. Do NOT use for destructive operations.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The bash command to execute."
            }
        },
        "required": ["command"]
    }
}

# Lines 28-45
FILE_READ_TOOL_SCHEMA = {
    "name": "file_read",
    "description": "Read the contents of a file. Use to examine source code, configuration files, or documentation.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path to the file relative to monorepo root."
            },
            "start_line": { "type": "integer" },
            "end_line": { "type": "integer" }
        },
        "required": ["path"]
    }
}
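A schema like the two above can also be used on the receiving side, to reject a malformed tool call before anything is executed. The following is a minimal sketch of that idea, not code from gemini_tools.py: TOOL_SCHEMAS is a trimmed copy of the schemas above, and validate_tool_args is a hypothetical helper name introduced only for illustration.

```python
from typing import Any

# Trimmed copies of the schemas above: only the fields this sketch needs.
TOOL_SCHEMAS = {
    "bash": {
        "required": ["command"],
        "properties": {"command": {"type": "string"}},
    },
    "file_read": {
        "required": ["path"],
        "properties": {
            "path": {"type": "string"},
            "start_line": {"type": "integer"},
            "end_line": {"type": "integer"},
        },
    },
}

# Map JSON-schema type names to Python types.
_JSON_TYPES = {"string": str, "integer": int}


def validate_tool_args(name: str, args: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means well-formed)."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    problems = []
    for key in schema["required"]:
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"argument {key} should be {spec['type']}")
    return problems
```

Returning the problems as text, rather than raising, lets the provider hand them back to the model as a tool result so it can correct the call on the next turn.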

What We Did

  • Defined explicit BASH_TOOL_SCHEMA and FILE_READ_TOOL_SCHEMA for Gemini's function calling interface.
  • Implemented _execute_bash and _execute_file_read methods within the provider.
  • Integrated these tools into the Gemini client via FunctionDeclaration objects.

The tools are part of Gemini's reasoning process: the model decides when to search and when to read files. The outcome: a provider that generates papers with real file paths and line numbers.

IV. Safety Controls: Guarding the Monorepo

"How are potentially destructive operations mitigated when granting an AI agent shell access?"

Shell access creates security risks. Unrestricted bash commands could delete files, modify config, or execute arbitrary code. The GeminiToolsProvider mitigates this with an allowlist for bash commands and path validation for file reads.

The _is_command_safe method (lines 99-111) is central to bash safety. It checks if a command starts with an allowed prefix from ALLOWED_BASH_PREFIXES and ensures it does not contain any BLOCKED_PATTERNS. This dual-layer approach prevents destructive commands while permitting safe inspection.
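The two-layer check can be sketched as follows. This is an illustration under stated assumptions, not the contents of gemini_tools.py: the lists hold only the example values named in this paper, the real ALLOWED_BASH_PREFIXES and BLOCKED_PATTERNS may differ, and is_command_safe here is a standalone function rather than the provider's method.

```python
# Illustrative values only: the example entries mentioned in this paper.
# The actual lists in gemini_tools.py may be longer or shaped differently.
ALLOWED_BASH_PREFIXES = ("grep", "find", "ls", "cat")
BLOCKED_PATTERNS = ("rm ", "mv ", "sudo ", ">")


def is_command_safe(command: str) -> bool:
    """Sketch of an allowlist-then-blocklist check for a shell command."""
    stripped = command.strip()
    # Layer 1: the command must be an allowed program, alone or followed by a space.
    allowed = any(
        stripped == prefix or stripped.startswith(prefix + " ")
        for prefix in ALLOWED_BASH_PREFIXES
    )
    if not allowed:
        return False
    # Layer 2: reject the command if any blocked pattern appears anywhere in it.
    return not any(pattern in stripped for pattern in BLOCKED_PATTERNS)
```

Substring blocklists are deliberately blunt. In this sketch a harmless search whose text happens to contain a blocked pattern (for example a quoted string containing ">") is rejected too, which errs on the side of safety at the cost of some false positives.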

| Safety Mechanism  | Description                                                | Example                                     |
|-------------------|------------------------------------------------------------|---------------------------------------------|
| Bash Allowlist    | Only commands starting with specific prefixes are allowed. | grep, find, ls, cat                         |
| Bash Blocklist    | Commands containing destructive patterns are forbidden.    | rm, mv, sudo, >                             |
| Path Validation   | Ensures file paths do not escape the working directory.    | Resolved path must start with monorepo root |
| Output Truncation | Limits output size to prevent context overflow.            | Bash: 10,000 chars; File: 15,000 chars      |
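Path validation and the optional start_line / end_line parameters of file_read could look roughly like this. It is a sketch, not the provider's _execute_file_read: resolve_inside_root and read_line_range are hypothetical names, and it uses Path.is_relative_to (Python 3.9+) rather than a string prefix comparison, since a plain "starts with" test would also accept a sibling directory such as /repo-other when the root is /repo.

```python
from pathlib import Path
from typing import Optional


def resolve_inside_root(root: Path, relative_path: str) -> Path:
    """Resolve relative_path against root and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes the monorepo root: {relative_path}")
    return candidate


def read_line_range(
    path: Path, start_line: Optional[int] = None, end_line: Optional[int] = None
) -> str:
    """Return lines start_line..end_line (1-indexed, inclusive) of a text file."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max((start_line or 1) - 1, 0)
    end = end_line if end_line is not None else len(lines)
    return "\n".join(lines[start:end])
```

An absolute path such as /etc/passwd is rejected by the same check, because joining an absolute path onto the root discards the root entirely.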

What We Did

  • Defined ALLOWED_BASH_PREFIXES to restrict shell access to safe, read-only operations.
  • Established BLOCKED_PATTERNS to forbid common destructive shell commands.
  • Implemented path resolution and validation in _execute_file_read to prevent directory traversal.
  • Introduced output truncation for both tools to manage context window usage.

The outcome: the agent can search and read, but can't delete, write, or execute arbitrary commands.

V. The Agentic Loop: Iterative Grounding

"How does the agent leverage these tools to iteratively refine its understanding and generate high-quality output?"

GeminiToolsProvider implements an agentic loop: a multi-turn pattern in which the model calls a tool, receives the result, and uses that result to decide on its next call. Instead of a single prompt and response, the agent iterates: search for a pattern, read a promising file, search within that file, synthesize findings.

The execute method (lines 190-280) orchestrates this loop. It sends the initial task to Gemini. If the model decides to call the bash or file_read tool, the provider intercepts the call, executes the tool, and feeds the result back into the conversation history. The cycle repeats for at most max_tool_calls iterations (default 20).

  1. Run an initial bash search (e.g., grep -r "pattern" packages/).
  2. Call file_read on promising files found in step 1.
  3. Issue further grep or cat commands based on file content.
  4. Synthesize the findings into the final paper.
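The loop described above can be sketched without any SDK by treating the model as a function that, given the history so far, returns either a tool call or a final answer. Everything named here (ToolCall, FinalAnswer, run_agentic_loop, and the fallback message when the budget runs out) is an assumption for illustration; the real execute method works against the Gemini client and may handle budget exhaustion differently.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class FinalAnswer:
    text: str


ModelTurn = Union[ToolCall, FinalAnswer]


def run_agentic_loop(
    model_step: Callable[[list], ModelTurn],
    tools: dict[str, Callable[..., str]],
    task: str,
    max_tool_calls: int = 20,
) -> str:
    """Call the model repeatedly, executing requested tools, until it answers."""
    history = [("user", task)]
    for _ in range(max_tool_calls):
        turn = model_step(history)
        if isinstance(turn, FinalAnswer):
            return turn.text
        handler = tools.get(turn.name)
        # Unknown tools produce an error result instead of crashing the loop.
        result = handler(**turn.args) if handler else f"Error: unknown tool {turn.name}"
        # Feed the call and its result back so the next turn can build on them.
        history.append(("tool_call", f"{turn.name}({turn.args})"))
        history.append(("tool_result", result))
    return "Stopped: tool-call budget exhausted before a final answer."
```

Passing the model in as a plain callable keeps the loop testable with a scripted sequence of turns, with no network access required.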

"Tools recede into transparent use—the hammer disappears when hammering."

— CLAUDE.md:171

What We Did

  • Implemented a multi-turn execute loop to facilitate iterative tool use and reasoning.
  • Configured max_tool_calls to prevent infinite loops and manage execution time.
  • Enabled Gemini's thinking_config with a thinking_budget (default 8192) to support complex reasoning.

The outcome: papers cite real file paths and line numbers. The agent found them by searching.

VI. Cost & Quality Tradeoffs

With tools, papers include "Real file paths instead of generic references," "Actual metrics from the codebase," "Specific code examples," and "Grounded philosophical claims." The difference: generic vs. specific, theoretical vs. verifiable.

This capability has a cost. Each tool call, and the model's "thinking" between calls, consumes tokens. GeminiToolsProvider tracks total_input_tokens, total_output_tokens, and total_thinking_tokens (lines 196-198) so that cost can be broken down by category. A baseline model can produce a paper in a single, cheaper turn; a tool-augmented agent may need many tool calls and reasoning steps. For high-fidelity research papers, the gain in verifiability and accuracy justifies the cost.
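To make the cost breakdown concrete, the three counters can be accumulated per turn and converted to an estimate. This is a sketch under assumptions: the TokenUsage class and its methods are hypothetical, prices are supplied by the caller rather than hard-coded, and thinking tokens are assumed to be billed at the output rate, which should be checked against current pricing.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    # Field names mirror the counters named in the paper; the class is hypothetical.
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_thinking_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> None:
        """Accumulate the usage reported for one model turn."""
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        self.total_thinking_tokens += thinking_tokens

    def estimated_cost(
        self, input_price_per_million: float, output_price_per_million: float
    ) -> float:
        """Estimate cost; ASSUMES thinking tokens are billed at the output rate."""
        billed_output = self.total_output_tokens + self.total_thinking_tokens
        return (
            self.total_input_tokens / 1_000_000 * input_price_per_million
            + billed_output / 1_000_000 * output_price_per_million
        )
```

Comparing the estimate for a single-turn baseline run against a multi-turn tool run on the same task gives a direct measure of what the added grounding costs.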

VII. Limitations & Future Directions

What Doesn't Work Yet

The bash allowlist rules out complex scenarios: you can grep, but you cannot run arbitrary analysis scripts. Output truncation (10,000 characters for bash, 15,000 for file_read) discards data from large files and broad searches.
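One way to soften the truncation problem is to tell the model that output was cut, so it can narrow the search or request a line range. The sketch below uses the two limits stated above; the constant names, the truncate_output helper, and the wording of the marker are assumptions, not the provider's actual behavior.

```python
# Limits as stated in this paper; the constant names are hypothetical.
BASH_OUTPUT_LIMIT = 10_000
FILE_OUTPUT_LIMIT = 15_000


def truncate_output(text: str, limit: int) -> str:
    """Cut text to limit characters and append a note saying how much was dropped."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return text[:limit] + f"\n[truncated: {omitted} characters omitted]"
```

With an explicit marker, a follow-up file_read call using start_line and end_line can recover the part of a large file that the first call dropped.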

Future Work

Planned work includes dynamic tool approval for edge cases, ripgrep integration for context-aware search, and patterns for multi-repository access.

References

packages/agent-sdk/src/create_something_agents/providers/gemini_tools.py — GeminiToolsProvider implementation

packages/agent-sdk/experiments/test-gemini-tools.py — Test script and quality validation

CLAUDE.md — CREATE SOMETHING development philosophy