Authentication, authorization, rate limiting, input validation, output sanitization, audit logging, and sandboxing: seven layers of defense for AI tool use.
The Prompt Engineering Project February 25, 2025 12 min read
Quick Answer
AI tool security involves protecting systems where LLMs invoke external tools, APIs, or code execution. Key defenses include strict input validation, least-privilege permissions, sandboxed execution environments, output filtering, and audit logging. Without these safeguards, prompt injection attacks can escalate tool access into data exfiltration or unauthorized actions.
The moment you give an AI agent the ability to call tools, you have moved from a system that generates text to a system that takes actions. It can read databases, write files, send emails, execute code, and interact with external services. Every one of those capabilities is a potential attack surface. Every tool invocation is a trust decision. And the default security posture of most AI agent frameworks is approximately zero.
This is not a theoretical concern. Production AI agents today have access to customer databases, internal APIs, cloud infrastructure, and communication channels. A prompt injection that convinces an agent to call a tool with attacker-controlled parameters is not a hypothetical -- it is an incident report waiting to be written. Securing tool use requires the same defense-in-depth thinking you apply to any system that executes untrusted input. Here are the seven layers that matter.
Layer 1: Authentication
Authentication answers the question: who is making this request? In a tool-use system, the answer is never simply "the AI agent." The agent is acting on behalf of a user, a service account, or a scheduled job. The identity that matters is the upstream principal -- the human or system that initiated the conversation that led to the tool call.
Every tool invocation must carry the identity of the originating principal, not just the agent's service credentials. This means propagating authentication context through the entire call chain. When an agent calls a database tool, the database tool should know which user's session triggered the query, not just that "the agent service" is asking.
auth-middleware.ts

```typescript
interface ToolContext {
  userId: string
  sessionId: string
  roles: string[]
  tokenExpiry: number
  sourceIp: string
  verified?: boolean
  verifiedAt?: number
}

type ToolHandler = (
  params: unknown,
  context: ToolContext
) => Promise<unknown>

// AuthError and authProvider are assumed to be defined elsewhere
function authMiddleware(handler: ToolHandler): ToolHandler {
  return async (params, context: ToolContext) => {
    // Verify the session token has not expired
    if (Date.now() > context.tokenExpiry) {
      throw new AuthError('Session expired', 401)
    }
    // Verify the user identity against the auth provider
    const verified = await authProvider.verify(
      context.userId,
      context.sessionId
    )
    if (!verified) {
      throw new AuthError('Invalid session', 401)
    }
    // Attach verified identity to the tool execution
    return handler(params, {
      ...context,
      verified: true,
      verifiedAt: Date.now(),
    })
  }
}
```
Three authentication patterns work in practice. API keys are the simplest: each user or service has a unique key that is passed with every tool invocation. They are easy to implement but hard to rotate and impossible to scope. OAuth tokens are better for user-facing systems: they carry scopes, expire automatically, and can be revoked without changing credentials. Session-based authentication works when the agent operates within a web application context where a session cookie or JWT already establishes identity.
The critical rule is that agent credentials and user credentials are never the same thing. The agent may have its own service account for accessing tool infrastructure, but that service account should never be used to determine what the user is allowed to do. Conflating agent identity with user identity is the most common authentication mistake in AI systems.
The agent is not the principal. It is a proxy. Every tool call must carry the identity of the human or system that initiated the request.
Layer 2: Authorization
Authentication tells you who is asking. Authorization tells you what they are allowed to do. In a tool-use system, authorization operates at three levels: which tools can this user access, what parameters can they pass, and which resources can those parameters reference?
Tool-level permissions are the coarsest control. An admin user might have access to the database_query tool while a regular user does not. This is necessary but insufficient. A user with access to the database query tool should not automatically be able to query every table. Resource scoping limits what a tool can touch based on who is asking.
Default-deny is not optional. If a tool does not have an explicit permission entry for the requesting user's role, the call must fail. Never fall through to "allow" as a default.
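A default-deny policy with resource scoping can be sketched as a permission table consulted before every call. The names here (`ToolPolicy`, `toolPermissions`, `authorize`) are illustrative, not from any particular framework; the important property is that every lookup miss returns false.

```typescript
// Default-deny, role-based tool authorization sketch.
// Every path that finds no explicit permission entry denies the call.
interface ToolPolicy {
  allowedRoles: string[]
  // Optional resource scoping: which resources each role may touch
  allowedResources?: Record<string, string[]>
}

const toolPermissions: Record<string, ToolPolicy> = {
  database_query: {
    allowedRoles: ['admin', 'analyst'],
    allowedResources: {
      analyst: ['orders', 'products'],
      admin: ['users', 'orders', 'products', 'analytics'],
    },
  },
}

function authorize(
  toolId: string,
  roles: string[],
  resource?: string
): boolean {
  const policy = toolPermissions[toolId]
  if (!policy) return false // default-deny: unknown tool
  const role = roles.find((r) => policy.allowedRoles.includes(r))
  if (!role) return false // default-deny: no matching role
  if (resource && policy.allowedResources) {
    const scoped = policy.allowedResources[role] ?? []
    return scoped.includes(resource) // default-deny: unscoped resource
  }
  return true
}
```

Note that the three levels from the paragraph above map directly onto the three checks: tool existence, role membership, and resource scope.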
Layer 3: Rate Limiting
AI agents are capable of generating tool calls at machine speed. Without rate limiting, a confused or compromised agent can exhaust API quotas, overwhelm databases, or rack up unbounded costs in seconds. Rate limiting is not just about protecting external services -- it is about protecting your own infrastructure from your own agents.
Effective rate limiting for AI tool use requires two dimensions: per-tool limits and per-session limits. Per-tool limits prevent any single tool from being called too frequently, regardless of which agent or user is making the request. Per-session limits prevent any single conversation from consuming a disproportionate share of resources.
A particularly dangerous scenario is the agent loop: the agent calls a tool, receives an error, and retries immediately in an infinite loop. Each retry consumes tokens for the tool call and tokens for the model's reasoning about why it should retry. Without per-session rate limiting, a single stuck conversation can generate hundreds of tool calls in minutes. Set a hard ceiling on tool calls per conversation and force the agent to stop when it hits that ceiling.
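Both dimensions, plus the hard per-session ceiling, fit in a small in-memory limiter. This is a sketch under the assumption of a single process; a production system would back the counters with Redis or similar shared storage. The class and config names are illustrative.

```typescript
// Minimal rate limiter: per-tool sliding one-minute window plus a hard
// per-session ceiling that stops a stuck agent loop.
interface LimiterConfig {
  perToolPerMinute: number
  maxCallsPerSession: number
}

class ToolRateLimiter {
  private toolCalls = new Map<string, number[]>() // toolId -> call timestamps
  private sessionCounts = new Map<string, number>() // sessionId -> total calls

  constructor(private config: LimiterConfig) {}

  check(toolId: string, sessionId: string, now = Date.now()): void {
    // Per-session hard ceiling: the agent is forced to stop here
    const total = this.sessionCounts.get(sessionId) ?? 0
    if (total >= this.config.maxCallsPerSession) {
      throw new Error(`Session ${sessionId} exceeded its tool-call ceiling`)
    }
    // Per-tool sliding window: drop timestamps older than one minute
    const windowStart = now - 60_000
    const recent = (this.toolCalls.get(toolId) ?? []).filter(
      (t) => t > windowStart
    )
    if (recent.length >= this.config.perToolPerMinute) {
      throw new Error(`Tool ${toolId} rate limit exceeded`)
    }
    recent.push(now)
    this.toolCalls.set(toolId, recent)
    this.sessionCounts.set(sessionId, total + 1)
  }
}
```

Calling `check` before every tool invocation makes the limit a throw, not a silent drop, so the agent receives an explicit error it can surface to the user.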
Layer 4: Input Validation
This is where most AI tool-use systems are weakest. The model generates parameters for a tool call, and the system passes those parameters directly to the tool without validation. This is the equivalent of passing user input to a SQL query without sanitization. The model is an untrusted input source. Its parameters must be validated with the same rigor you apply to any external input.
Schema validation is the first line of defense. Every tool should have a strict JSON Schema that defines the exact types, ranges, formats, and constraints for every parameter. If the model produces a parameter that does not match the schema, the call fails before it reaches the tool implementation.
input-validation.ts

```typescript
import { z } from 'zod'

const DatabaseQuerySchema = z.object({
  query: z
    .string()
    .max(2000)
    .refine(
      (q) => !q.toLowerCase().includes('drop'),
      'DROP statements are not allowed'
    )
    .refine(
      (q) => !q.toLowerCase().includes('delete'),
      'DELETE statements are not allowed'
    ),
  table: z.enum(['users', 'orders', 'products', 'analytics']),
  limit: z.number().int().min(1).max(1000).default(100),
  offset: z.number().int().min(0).default(0),
})

const FileWriteSchema = z.object({
  path: z
    .string()
    .refine(
      (p) => !p.includes('..'),
      'Path traversal is not allowed'
    )
    .refine(
      (p) => p.startsWith('/workspace/'),
      'Writes must be within /workspace/'
    ),
  content: z.string().max(100_000),
  encoding: z.enum(['utf-8', 'base64']).default('utf-8'),
})

// ValidationError is assumed to be defined elsewhere
function validateToolInput<T>(
  schema: z.ZodSchema<T>,
  input: unknown
): T {
  const result = schema.safeParse(input)
  if (!result.success) {
    throw new ValidationError(
      'Invalid tool parameters',
      result.error.issues
    )
  }
  return result.data
}
```
Beyond schema validation, certain tools require semantic validation. A file write tool should check that the path does not traverse outside its sandbox. A database query tool should check that the query does not contain destructive operations. An email tool should check that the recipient is within the allowed domain. These checks cannot be expressed in a JSON Schema alone -- they require custom validation logic that understands what the tool does and what the risks are.
The model is an untrusted input source. Its parameters must be validated with the same rigor you apply to any user-submitted form data.
Layer 5: Output Sanitization
Tool outputs flow back to the model as context for its next response. If a tool returns sensitive data -- API keys, passwords, personal information, internal system details -- that data becomes part of the model's context and may be included in the response to the user. Output sanitization ensures that tool results are filtered before they reach the model.
The most common failure here is database tools that return full rows including columns the user should never see. A customer lookup tool that returns the full customer record -- including internal notes, credit card tokens, and support flags -- is a data leak waiting to happen. The tool should return only the fields that the requesting user is authorized to see.
output-sanitizer.ts

```typescript
interface SanitizationRule {
  toolId: string
  redactFields: string[]
  maskPatterns: { pattern: RegExp; replacement: string }[]
  maxOutputLength: number
}

const rules: SanitizationRule[] = [
  {
    toolId: 'customer_lookup',
    redactFields: [
      'ssn',
      'credit_card',
      'internal_notes',
      'password_hash',
    ],
    maskPatterns: [
      {
        pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
        replacement: '***-**-****',
      },
      {
        pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
        replacement: '[email redacted]',
      },
    ],
    maxOutputLength: 5000,
  },
]

function sanitizeOutput(
  toolId: string,
  output: any,
  userRoles: string[]
): any {
  const rule = rules.find((r) => r.toolId === toolId)
  if (!rule) return output
  // In a fuller implementation, userRoles would select role-specific rules
  const sanitized = structuredClone(output)
  // Remove redacted fields
  if (typeof sanitized === 'object' && sanitized !== null) {
    for (const field of rule.redactFields) {
      delete sanitized[field]
    }
  }
  // Apply pattern masking to the serialized output
  let text = JSON.stringify(sanitized)
  for (const { pattern, replacement } of rule.maskPatterns) {
    text = text.replace(pattern, replacement)
  }
  // Enforce the maximum output length. Slicing a JSON string mid-document
  // would make it unparseable, so return a truncation notice instead
  if (text.length > rule.maxOutputLength) {
    return {
      truncated: true,
      preview: text.slice(0, rule.maxOutputLength),
    }
  }
  return JSON.parse(text)
}
```
Sanitization rules should be defined per-tool and per-role. An admin may see fields that a regular user cannot. The sanitizer must know who is asking, not just which tool is responding.
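One way to make the sanitizer role-aware is to key the redaction list on the caller's role, falling back to the most restrictive list when no role matches. The role names and field lists below are illustrative only.

```typescript
// Hypothetical role-keyed redaction lists: admins keep internal_notes,
// everyone loses secrets. Field names are examples, not a real schema.
const redactionByRole: Record<string, string[]> = {
  admin: ['password_hash', 'credit_card'],
  user: ['password_hash', 'credit_card', 'internal_notes', 'ssn'],
}

function fieldsToRedact(roles: string[]): string[] {
  // Default-deny applies to data too: with no recognized role,
  // use the most restrictive list
  if (roles.includes('admin')) return redactionByRole.admin
  return redactionByRole.user
}
```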
Layer 6: Audit Logging
Every tool invocation must be logged. This is not optional, and it is not sufficient to log only errors. You need a complete record of every tool call: who triggered it, which tool was called, what parameters were passed, what the tool returned, and how long it took. This log is your compliance trail, your debugging tool, and your anomaly detection input.
The audit log should be append-only, tamper-evident, and stored separately from the tool execution environment. If an attacker compromises the agent system, they should not be able to alter the audit log to cover their tracks. In regulated industries, this log may also need to satisfy specific retention and access requirements.
A subtle but important detail: the audit log must redact sensitive parameters before writing them. If a user passes a password to an authentication tool, you do not want that password sitting in your audit log in plaintext. Apply the same redaction logic to log entries that you apply to tool outputs. The goal is a complete record of what happened without creating a new repository of sensitive data.
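The shape of a redacting log entry builder might look like the sketch below. The entry fields and the `SENSITIVE_PARAMS` list are assumptions; the append-only sink is left abstract because it depends on your storage.

```typescript
// Audit entry construction with parameter redaction applied before the
// entry ever reaches the log sink.
interface AuditEntry {
  timestamp: string
  userId: string
  sessionId: string
  toolId: string
  params: Record<string, unknown>
  outcome: 'success' | 'error'
  durationMs: number
}

// Parameter-name fragments that must never be logged in plaintext
const SENSITIVE_PARAMS = ['password', 'token', 'api_key', 'secret']

function redactParams(
  params: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(params)) {
    out[key] = SENSITIVE_PARAMS.some((s) => key.toLowerCase().includes(s))
      ? '[redacted]'
      : value
  }
  return out
}

function buildAuditEntry(
  userId: string,
  sessionId: string,
  toolId: string,
  params: Record<string, unknown>,
  outcome: 'success' | 'error',
  durationMs: number
): AuditEntry {
  return {
    timestamp: new Date().toISOString(),
    userId,
    sessionId,
    toolId,
    params: redactParams(params), // never write raw secrets to the log
    outcome,
    durationMs,
  }
}
```

Redacting at construction time, rather than at write time, means no code path can accidentally flush an unredacted entry.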
Layer 7: Sandboxing
The final layer is execution isolation. When an AI agent calls a tool that executes code, writes files, or interacts with system resources, that execution must happen in an environment where the blast radius of a failure or exploit is contained. A tool that executes user-provided code should not have access to the host filesystem. A tool that makes HTTP requests should not be able to reach internal services on the private network.
Sandboxing strategies range from lightweight to heavy. Process-level isolation uses separate OS processes with restricted permissions. Container-level isolation uses Docker or similar runtimes to create ephemeral environments with constrained filesystem, network, and resource access. VM-level isolation uses lightweight virtual machines like Firecracker for maximum separation at the cost of startup latency.
1. Filesystem isolation. Mount only the directories the tool needs, read-only where possible. Use tmpfs for scratch space that is automatically cleaned up.
2. Network isolation. Whitelist only the external endpoints the tool needs to reach. Block access to internal services, metadata endpoints, and the local network.
3. Resource limits. Set hard caps on CPU time, memory, and disk usage. A runaway tool should be killed by the OS, not by your application code.
4. Time limits. Every tool execution must have a timeout. If a tool has not completed within its timeout, kill the process and return an error to the agent.
5. Ephemeral environments. Create a fresh sandbox for each tool invocation. Never reuse execution environments across different users or sessions. State leakage between invocations is a security vulnerability.
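The timeout, output-cap, and minimal-environment pieces of this checklist can be sketched with Node's built-in child_process. Real isolation comes from containers or microVMs; the paths and limits below are illustrative assumptions.

```typescript
// Time- and resource-limited execution sketch. This shows process-level
// controls only; filesystem and network isolation need an outer sandbox.
import { execFileSync } from 'node:child_process'

function runSandboxed(
  command: string,
  args: string[],
  timeoutMs: number
): string {
  return execFileSync(command, args, {
    timeout: timeoutMs,             // kill the process when the timeout fires
    killSignal: 'SIGKILL',          // a signal the child cannot catch or ignore
    env: { PATH: '/usr/bin:/bin' }, // minimal environment: no inherited secrets
    cwd: '/tmp',                    // scratch directory, not the app root
    maxBuffer: 1024 * 1024,         // cap captured output at 1 MB
    encoding: 'utf-8',              // return stdout as a string
  })
}
```

`execFileSync` throws on timeout or a non-zero exit, which maps cleanly onto "kill the process and return an error to the agent."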
Defense in depth means that no single layer is responsible for security. When one layer fails -- and eventually, one will -- the remaining layers contain the damage.
Putting It Together
These seven layers form a defense-in-depth strategy for AI tool use. No single layer is sufficient on its own. Authentication without authorization is useless. Authorization without input validation is bypassable. Input validation without output sanitization leaks data. Audit logging without sandboxing means you can see the damage but not prevent it.
The practical approach is to implement these as middleware that wraps every tool handler. Each tool call passes through the full stack: authenticate, authorize, rate limit, validate inputs, execute in a sandbox, sanitize outputs, and log everything. The tool implementation itself should be concerned only with its core functionality. Security is an infrastructure concern, not a tool concern.
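The wrapping pattern can be sketched as middleware composition. The layer bodies here are reduced to named stand-ins, and real handlers would be async; the piece that matters is `compose`, which nests the layers so every call passes through each one before the handler runs.

```typescript
// Middleware composition sketch for a tool security stack.
type ToolCtx = { userId: string; sessionId: string }
type Handler = (params: unknown, ctx: ToolCtx) => unknown
type Middleware = (next: Handler) => Handler

function compose(layers: Middleware[], handler: Handler): Handler {
  // reduceRight nests the layers so the first one listed runs first
  return layers.reduceRight((next, layer) => layer(next), handler)
}

// Illustrative layer factory: a real layer would authenticate, authorize,
// rate limit, validate, or sanitize, throwing to block the call.
const namedLayer =
  (name: string, trace: string[]): Middleware =>
  (next) =>
  (params, ctx) => {
    trace.push(name) // record that this layer ran, in order
    return next(params, ctx)
  }
```

With this shape, the tool implementation at the center stays free of security code, and adding a layer is a one-line change to the composed list.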
Start with authentication, authorization, and input validation -- these three layers prevent the most common and most dangerous failures. Add rate limiting once you have agents in production and can observe their calling patterns. Add output sanitization when your tools touch sensitive data. Add sandboxing when your tools execute code or interact with system resources. Add audit logging from day one -- it costs almost nothing and saves everything when you need to investigate an incident.
Key Takeaways
1. AI agents with tool access are action-taking systems, not text generators. Every tool call is a trust decision that requires the same security rigor as any API endpoint.
2. The agent is a proxy, not a principal. Propagate the originating user identity through the entire tool call chain and authorize based on the human, not the agent.
3. Rate limit at multiple dimensions: per-tool, per-session, per-user, and global. Without limits, a confused agent can exhaust resources in seconds.
4. Validate every parameter the model generates with the same discipline you apply to user-submitted form data. The model is an untrusted input source.
5. Sanitize tool outputs before they re-enter the model context. Sensitive data in tool results becomes sensitive data in user-facing responses.
6. Log every tool invocation with full context: who, what, when, and with what parameters. The audit trail is your only debugging tool after an incident.
7. Sandbox tool execution environments. Contain the blast radius. No tool should have access to resources beyond what it strictly needs.