
Building an MCP Server: Architecture Decisions

A technical walkthrough from zero to production.

The Prompt Engineering Project · March 24, 2025 · 14 min read

Quick Answer

To build an MCP server, you define tools with JSON Schema parameters, implement handler functions for each tool, set up a transport layer using stdio or HTTP with server-sent events, add authentication, and deploy as a standalone service. The MCP SDK handles protocol negotiation and message framing. A basic server with one tool can be built in under an hour.

The Model Context Protocol is an open standard for connecting AI models to external tools and data sources. It defines how a model discovers available tools, understands their parameters, invokes them, and processes their responses. Building an MCP server means building the bridge between what a model can reason about and what it can actually do. This article walks through the architecture decisions we made building our production MCP server, from language choice to deployment.

This is not a tutorial that ends at "hello world." We will cover transport layers, tool registration, handler implementation, error handling, security, testing, and deployment. The goal is to give senior engineers a complete technical reference for building MCP servers that operate reliably at production scale.

1. Language Choice: TypeScript

We chose TypeScript for three reasons that compound over the life of the project. First, type safety. MCP is a JSON-RPC protocol. Every message has a defined schema. TypeScript lets us define those schemas as types and validate them at compile time, catching malformed requests and responses before they reach production.

Second, JSON Schema integration. MCP tool definitions include JSON Schema for parameter validation. TypeScript has the best ecosystem for generating, validating, and consuming JSON Schema -- libraries like Zod, AJV, and TypeBox let you define schemas once and derive both runtime validation and static types from the same source.

Third, ecosystem alignment. The MCP specification is maintained by Anthropic, and the reference SDK is TypeScript. The client libraries, testing utilities, and community tooling are all TypeScript-first. Building in TypeScript means building on the path of least resistance.

server.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { z } from 'zod'

const server = new McpServer({
  name: 'pep-mcp-server',
  version: '1.0.0',
  description: 'The Prompt Engineering Project MCP Server',
})

// Tool registration happens here (covered in section 3)

const transport = new StdioServerTransport()
await server.connect(transport)

Python is a viable alternative, especially if your tooling ecosystem is Python-heavy. The trade-off is weaker type safety and less mature MCP SDK support. Go is suitable for high-throughput servers but lacks the JSON Schema ergonomics that TypeScript provides.

2. Transport Layer: stdio vs. HTTP/SSE

MCP supports two transport mechanisms, and the choice between them determines your deployment model, security posture, and operational complexity.

stdio Transport

The stdio transport runs the MCP server as a child process of the client. Communication happens over stdin and stdout using newline-delimited JSON. This is the simpler option: no network configuration, no authentication layer, no TLS certificates. The client spawns the server, sends messages to stdin, and reads responses from stdout.

Use stdio when the server runs locally on the same machine as the client. This is the standard model for IDE integrations, CLI tools, and development environments. It is how Claude Desktop, Cursor, and most local MCP clients expect to communicate.

stdio-transport.ts
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'

const transport = new StdioServerTransport()
await server.connect(transport)

// The server now reads from stdin and writes to stdout.
// The client manages the process lifecycle.
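
Under the hood, each message is a single line of JSON-RPC 2.0. As an illustrative sketch, a tools/list request crosses the pipe like this:

```typescript
// A JSON-RPC 2.0 request as it crosses the stdio pipe:
// one JSON object per line, no embedded newlines.
const request = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/list',
  params: {},
}
process.stdout.write(JSON.stringify(request) + '\n')
```

The SDK handles this framing for you; the point is only that the wire format is plain newline-delimited JSON, which makes stdio servers easy to debug with nothing more than a terminal.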

HTTP/SSE Transport

The HTTP transport exposes the MCP server as an HTTP endpoint. The client sends requests via POST and receives responses (and server-initiated messages) via Server-Sent Events. This is the model for remote servers, multi-tenant deployments, and web applications.

Use HTTP/SSE when the server needs to be accessible over a network, serve multiple clients simultaneously, or integrate with existing web infrastructure. This is how our production /api/mcp endpoint operates: a Next.js API route that handles MCP requests over HTTP with SSE for streaming responses.

http-transport.ts
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js'

// Inside a Next.js API route or Express handler. Note: the SDK's
// handleRequest is built around Node-style request/response objects,
// so fetch-based runtimes may need a small adapter; this is a
// simplified sketch of the flow.
export async function POST(request: Request) {
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => crypto.randomUUID(),
  })

  await server.connect(transport)

  const body = await request.json()
  const response = await transport.handleRequest(body)

  return new Response(JSON.stringify(response), {
    headers: { 'Content-Type': 'application/json' },
  })
}

HTTP transport requires authentication, rate limiting, and input validation that stdio does not. Do not expose an HTTP MCP endpoint without a security layer. We cover this in section 6.

3. Tool Registration

Tools are the core abstraction in MCP. Each tool has a name, a natural-language description, and a parameter schema. The name is how the model references the tool. The description is how the model decides whether to use it. The schema is how the model constructs valid invocations.

Tool names should be verb-noun pairs: search_documentation, create_prompt_version, get_evaluation_results. Avoid generic names like process or handle. The model selects tools based on name and description, so clarity in naming directly affects tool selection accuracy.

tool-registration.ts
server.tool(
  'search_documentation',
  'Search the project documentation by keyword or semantic query. ' +
  'Returns matching sections with titles, content previews, and ' +
  'relevance scores. Use this when users ask about features, ' +
  'architecture, or configuration.',
  {
    query: z.string().describe('The search query, 1-200 characters'),
    limit: z.number().min(1).max(20).default(5)
      .describe('Maximum number of results to return'),
    section: z.enum(['guides', 'api', 'architecture', 'all']).default('all')
      .describe('Documentation section to search within'),
  },
  async ({ query, limit, section }) => {
    const results = await searchDocs(query, { limit, section })

    return {
      content: [{
        type: 'text' as const,
        text: JSON.stringify(results, null, 2),
      }],
    }
  }
)

Three principles for tool descriptions. First, state what the tool does, not how it works internally. The model does not need implementation details. Second, describe when to use it. A sentence like "Use this when users ask about features, architecture, or configuration" gives the model a decision criterion. Third, describe the output format so the model can plan how to use the result.

Parameter descriptions are equally important. Each parameter should have a Zod description that explains its purpose and constraints. The model reads these descriptions to construct valid arguments. A parameter with no description is a parameter the model will guess at.

4. Handler Implementation

Each tool handler is an async function that receives validated parameters and returns a structured response. The handler is where your business logic lives: database queries, API calls, file operations, computations. The MCP SDK handles parameter validation via the Zod schema you defined during registration, so by the time your handler executes, you can trust that the parameters conform to the schema.

handler-pattern.ts
server.tool(
  'create_prompt_version',
  'Create a new version of a prompt template. Validates the ' +
  'prompt structure, assigns a semantic version, and stores it ' +
  'in the version history.',
  {
    promptId: z.string().uuid().describe('The prompt template ID'),
    content: z.string().min(1).max(50000)
      .describe('The full prompt content'),
    changelog: z.string().max(500)
      .describe('Description of what changed in this version'),
    bumpType: z.enum(['major', 'minor', 'patch']).default('patch')
      .describe('Semantic version bump type'),
  },
  async ({ promptId, content, changelog, bumpType }) => {
    // 1. Validate prompt structure
    const validation = validatePromptStructure(content)
    if (!validation.valid) {
      return {
        content: [{
          type: 'text' as const,
          text: JSON.stringify({
            error: 'INVALID_PROMPT_STRUCTURE',
            details: validation.errors,
          }),
        }],
        isError: true,
      }
    }

    // 2. Create the version
    const version = await createVersion({
      promptId,
      content,
      changelog,
      bumpType,
    })

    // 3. Return structured success response
    return {
      content: [{
        type: 'text' as const,
        text: JSON.stringify({
          versionId: version.id,
          version: version.semver,
          createdAt: version.createdAt,
          changelog,
        }, null, 2),
      }],
    }
  }
)

The handler pattern follows a consistent structure: validate inputs beyond what the schema enforces, execute the operation, and return a structured response. Note the isError: true flag on error responses -- this tells the model that the tool invocation failed and the content describes the failure, not a successful result.

Always return structured JSON in tool responses, even for errors. The model processes structured data more reliably than free-form text, and downstream systems can parse it programmatically.

5. Error Handling

Error handling in MCP servers has two audiences: the model and the operator. The model needs structured, parseable error information so it can decide whether to retry, try a different approach, or report the failure to the user. The operator needs logs, metrics, and traces to diagnose and fix problems.

error-handling.ts
// Define a consistent error response type
interface ToolError {
  error: string        // Machine-readable error code
  message: string      // Human-readable description
  retryable: boolean   // Whether the model should retry
  suggestion?: string  // Alternative approach if not retryable
}

function toolError(error: ToolError) {
  return {
    content: [{
      type: 'text' as const,
      text: JSON.stringify(error, null, 2),
    }],
    isError: true,
  }
}

// Usage in a handler
async function handleSearch(params: SearchParams) {
  try {
    const results = await searchIndex(params.query)
    if (results.length === 0) {
      return toolError({
        error: 'NO_RESULTS',
        message: `No documents matched the query "${params.query}".`,
        retryable: false,
        suggestion: 'Try broader search terms or check spelling.',
      })
    }
    return { content: [{ type: 'text' as const, text: JSON.stringify(results) }] }
  } catch (err) {
    logger.error('search_failed', { query: params.query, error: err })
    return toolError({
      error: 'SEARCH_UNAVAILABLE',
      message: 'The search index is temporarily unavailable.',
      retryable: true,
    })
  }
}

The retryable field is critical. When a tool fails due to a transient issue (network timeout, rate limit), the model should know it can retry. When a tool fails due to invalid input or a permanent condition, the model should know to try a different approach. Without this signal, the model will either retry indefinitely or give up prematurely.

The suggestion field gives the model a concrete alternative. Instead of leaving the model to reason about what went wrong, you tell it what to do next. This reduces unnecessary reasoning tokens and improves the user experience by producing faster, more relevant recovery behavior.
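
To illustrate how these signals get consumed, here is a hypothetical client-side retry helper (callWithRetry and its backoff values are illustrative, not part of the MCP SDK):

```typescript
type ToolResult = { isError?: boolean; content: { text: string }[] }

// Retry only when the server marked the failure retryable,
// with capped exponential backoff between attempts.
async function callWithRetry(
  call: () => Promise<ToolResult>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<ToolResult> {
  for (let attempt = 1; ; attempt++) {
    const result = await call()
    if (!result.isError) return result
    const err = JSON.parse(result.content[0].text)
    // Permanent failures and exhausted budgets return as-is.
    if (!err.retryable || attempt >= maxAttempts) return result
    // Exponential backoff: base, 2x base, 4x base, ...
    await new Promise(r => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
  }
}
```

The same decision logic runs implicitly inside the model when it reads your error response; making the fields explicit and machine-readable is what lets both paths work.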

6. Security Model

An MCP server is an attack surface. It accepts structured input from a model that accepts unstructured input from a user. This means that every tool handler is, by definition, processing user-influenced input. Your security model must account for this.

Authentication

For HTTP transport, every request must be authenticated. We use Bearer tokens validated against a server-side session store. The token is issued during an OAuth flow and scoped to a specific user and organization. For stdio transport, authentication is implicit -- the client process runs under the user's OS credentials.

Authorization

Not every authenticated user should have access to every tool. Our server implements tool-level authorization: each tool declares the permissions it requires, and the server checks the user's permissions before executing the handler.

authorization.ts
// Middleware that checks permissions before handler execution
function withAuth(requiredPermission: string, handler: ToolHandler): ToolHandler {
  return async (params, context) => {
    const user = context.session?.user
    if (!user) {
      return toolError({
        error: 'UNAUTHENTICATED',
        message: 'This tool requires authentication.',
        retryable: false,
      })
    }

    if (!user.permissions.includes(requiredPermission)) {
      return toolError({
        error: 'UNAUTHORIZED',
        message: `This tool requires the "${requiredPermission}" permission.`,
        retryable: false,
      })
    }

    return handler(params, context)
  }
}

Input Sanitization

Zod schemas validate structure and types, but they do not sanitize content. A string parameter that passes schema validation can still contain SQL injection, path traversal, or prompt injection payloads. Every string parameter that reaches a database, file system, or external API must be sanitized for that specific context.
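
As one concrete example, a hypothetical guard against path traversal for file-oriented tools (safeResolve is illustrative, not a library function):

```typescript
import path from 'node:path'

// Resolve a user-supplied relative path and reject anything that
// escapes the allowed root directory, including '..' sequences
// that survive schema validation.
function safeResolve(root: string, userPath: string): string {
  const base = path.resolve(root)
  const resolved = path.resolve(base, userPath)
  if (resolved !== base && !resolved.startsWith(base + path.sep)) {
    throw new Error('PATH_TRAVERSAL_REJECTED')
  }
  return resolved
}
```

The same principle applies per context: parameterized queries for SQL, encoding for shell arguments, and delimiting or stripping for strings that get interpolated into prompts.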

Rate Limiting

MCP tool calls can be expensive -- they may query databases, call external APIs, or trigger compute-intensive operations. Rate limiting prevents both abuse and accidental runaway costs. We implement per-user, per-tool rate limits using a sliding window algorithm, with different limits for different tool categories.
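
A minimal in-memory sketch of that sliding-window approach (production would back this with Redis or similar shared storage; names are illustrative):

```typescript
// Per-key sliding-window rate limiter: allow at most `limit` calls
// per `windowMs` milliseconds, keyed by e.g. "userId:toolName".
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>()

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs
    // Drop timestamps that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter(t => t > cutoff)
    if (recent.length >= this.limit) {
      this.hits.set(key, recent)
      return false
    }
    recent.push(now)
    this.hits.set(key, recent)
    return true
  }
}
```

Keying on user plus tool is what lets you give a cheap read-only search tool a generous limit while keeping an expensive write tool on a tight one.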

Never trust that the model will use tools responsibly. The model is following instructions from users, and users can craft inputs that cause the model to invoke tools in unexpected ways. Defense in depth is not optional.

7. Testing Strategy

MCP servers require testing at three levels: unit tests for individual handlers, integration tests for tool chains, and end-to-end tests that simulate model interactions.

Unit Tests

Each tool handler is a pure async function. Test it like any other function: provide valid inputs, assert correct outputs. Provide invalid inputs, assert correct error responses. Mock external dependencies (databases, APIs) to keep tests fast and deterministic.

handler.test.ts
import { describe, it, expect, vi } from 'vitest'
import { handleSearchDocumentation } from './handlers/search'

describe('search_documentation', () => {
  it('returns results for valid queries', async () => {
    const mockSearch = vi.fn().mockResolvedValue([
      { title: 'Getting Started', score: 0.95 },
      { title: 'API Reference', score: 0.87 },
    ])

    const result = await handleSearchDocumentation(
      { query: 'authentication', limit: 5, section: 'all' },
      { searchFn: mockSearch }
    )

    expect(result.isError).toBeUndefined()
    const data = JSON.parse(result.content[0].text)
    expect(data).toHaveLength(2)
    expect(data[0].title).toBe('Getting Started')
  })

  it('returns structured error for empty results', async () => {
    const mockSearch = vi.fn().mockResolvedValue([])

    const result = await handleSearchDocumentation(
      { query: 'nonexistent_topic', limit: 5, section: 'all' },
      { searchFn: mockSearch }
    )

    expect(result.isError).toBe(true)
    const error = JSON.parse(result.content[0].text)
    expect(error.error).toBe('NO_RESULTS')
    expect(error.retryable).toBe(false)
  })
})

Integration Tests

Integration tests verify that tools work together as a chain. A common pattern in MCP is tool chaining: the model calls tool A, uses the result to construct arguments for tool B, and combines both results. Your integration tests should verify these chains by simulating the multi-step interaction.

integration.test.ts
describe('prompt versioning chain', () => {
  it('creates and retrieves a prompt version', async () => {
    // Step 1: Create a version
    const createResult = await callTool('create_prompt_version', {
      promptId: testPromptId,
      content: 'You are a helpful assistant...',
      changelog: 'Initial version',
      bumpType: 'major',
    })

    const created = JSON.parse(createResult.content[0].text)
    expect(created.version).toBe('1.0.0')

    // Step 2: Retrieve the version (as the model would)
    const getResult = await callTool('get_prompt_version', {
      promptId: testPromptId,
      version: created.version,
    })

    const retrieved = JSON.parse(getResult.content[0].text)
    expect(retrieved.content).toBe('You are a helpful assistant...')
  })
})

8. Deployment

Deployment strategy depends on your transport choice and operational requirements. We use three deployment models for different contexts.

1. Docker Container

For HTTP transport in production. The server runs in a container with a health check endpoint, structured logging to stdout, and graceful shutdown handling. This is our primary deployment model for the /api/mcp endpoint. It integrates with existing container orchestration (ECS, Kubernetes) and supports horizontal scaling.

2. npm Package (stdio)

For local development and IDE integration. The server is published as an npm package with a bin entry. Clients install it globally or per-project and spawn it as a child process. No network configuration required. This is how developers interact with the MCP server during local development.

3. Serverless Function

For low-traffic or event-driven use cases. The server runs as a serverless function (Vercel, AWS Lambda) with the HTTP transport. Cold starts add latency to the first request, but subsequent requests are fast. Cost-effective for tools that are invoked infrequently.
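
The container model above mentions graceful shutdown handling. One way to sketch it is as a small idempotent cleanup registry (names are illustrative; a real server would register tasks that close the HTTP listener and MCP transports):

```typescript
// Register cleanup tasks, then run them exactly once on shutdown.
const cleanupTasks: Array<() => Promise<void>> = []

function onShutdown(task: () => Promise<void>) {
  cleanupTasks.push(task)
}

let shuttingDown = false
async function shutdown(): Promise<boolean> {
  if (shuttingDown) return false // idempotent: repeated signals are no-ops
  shuttingDown = true
  for (const task of cleanupTasks) {
    await task().catch(() => {}) // best-effort: one failing task does not block the rest
  }
  return true
}

// In a container, wire the registry to SIGTERM so the orchestrator's
// stop signal drains the server instead of killing it mid-request.
process.on('SIGTERM', () => shutdown().then(() => process.exit(0)))
```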

Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --production=false
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

ENV NODE_ENV=production
EXPOSE 3100

HEALTHCHECK --interval=30s --timeout=5s \
  CMD wget -qO- http://localhost:3100/health || exit 1

CMD ["node", "dist/server.js"]

For production HTTP deployments, always include a health check endpoint that verifies database connectivity, external API reachability, and memory usage. A server that starts but cannot reach its dependencies is worse than a server that does not start.
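
A sketch of such a check runner (the individual check functions are assumed to exist in your codebase):

```typescript
type HealthCheck = () => Promise<boolean>

// Run every dependency check; any failure (or thrown error) marks
// the server unhealthy and maps to a 503 status for the orchestrator.
async function runHealthChecks(checks: Record<string, HealthCheck>) {
  const results: Record<string, boolean> = {}
  for (const [name, check] of Object.entries(checks)) {
    results[name] = await check().catch(() => false)
  }
  const healthy = Object.values(results).every(Boolean)
  return { healthy, status: healthy ? 200 : 503, results }
}
```

Exposing the per-dependency results in the response body, not just the status code, makes a failing deploy diagnosable from the health endpoint alone.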

Building an MCP server is not a weekend project. It is an infrastructure investment that determines how capable your AI integrations can be. The decisions you make about transport, security, error handling, and testing compound over the life of the system. Get them right early, and you have a foundation that scales. Get them wrong, and you have technical debt that compounds with every new tool you add.

The architecture we have outlined here is not theoretical. It is the architecture running behind our /api/mcp endpoint, serving tool calls in production. Every decision described in this article was made under the constraints of real traffic, real security requirements, and real operational complexity.

Key Takeaways

1. TypeScript provides the best combination of type safety, JSON Schema integration, and MCP SDK maturity for building MCP servers.

2. Choose stdio for local/IDE integration and HTTP/SSE for remote, multi-tenant, or web-based deployments. Each has distinct security implications.

3. Tool names, descriptions, and parameter schemas directly affect model behavior. Invest in clear, specific, and well-documented tool definitions.

4. Implement structured error responses with retryable flags and suggestions to give the model actionable recovery paths.

5. Defense in depth is mandatory: authentication, authorization, rate limiting, and input sanitization at every layer.

6. Test at three levels -- unit tests for handlers, integration tests for tool chains, and end-to-end tests for model interaction patterns.
