Lazy loading inverts the dynamic context pattern. Instead of retrieving context before calling the model, you start with minimal context and let the model request additional information as it needs it, via tool calls. The model decides what information is relevant, not the retrieval pipeline.
This pattern is central to how AI agents work. The agent receives a task, considers what information it needs, calls the appropriate tools to fetch that information, reasons about the results, and repeats until it has enough context to produce a final answer.
User Message (minimal initial context)
                │
                ▼
           ┌─────────┐
     ┌────>│   LLM   │──────┐
     │     └─────────┘      │
     │          │           │
     │  Tool call needed?   │
     │     │         │      │
     │    YES        NO ────┼──> Final Response
     │     │                │
     │     ▼                │
     │ ┌───────────┐        │
     │ │  Execute  │        │
     │ │ Tool Call │        │
     │ └─────┬─────┘        │
     │       │              │
     │       ▼              │
     │ ┌───────────┐        │
     │ │  Return   │        │
     └─┤  Results  │        │
       └───────────┘        │
                            │
   (loop until model has    │
    enough context) ◄───────┘
const tools = [
  {
    name: 'get_customer_profile',
    description: 'Retrieve customer details by ID or email',
    parameters: {
      type: 'object',
      properties: {
        identifier: { type: 'string', description: 'Customer ID or email' },
      },
    },
  },
  {
    name: 'search_knowledge_base',
    description: 'Search product documentation and FAQs',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' },
      },
    },
  },
  {
    name: 'get_recent_tickets',
    description: 'Get recent support tickets for a customer',
    parameters: {
      type: 'object',
      properties: {
        customerId: { type: 'string' },
        limit: { type: 'number', description: 'Max tickets to return' },
      },
    },
  },
]
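The loop below dispatches through an `executeTool` helper that the snippet leaves undefined. A minimal sketch, assuming a simple name-to-handler map; the in-memory data and handler bodies here are hypothetical stand-ins for your real database or API layer:

```typescript
// Hypothetical in-memory data, standing in for a real data layer
const customers: Record<string, { id: string; status: string }> = {
  c_1: { id: 'c_1', status: 'active' },
}

// One handler per tool name declared in the `tools` array
const handlers: Record<string, (args: any) => Promise<unknown>> = {
  get_customer_profile: async ({ identifier }) =>
    customers[identifier] ?? { error: 'customer not found' },
  search_knowledge_base: async ({ query }) => [`doc matching "${query}"`],
  get_recent_tickets: async ({ customerId, limit = 10 }) => [],
}

async function executeTool(name: string, args: unknown) {
  const handler = handlers[name]
  // Return errors as results instead of throwing: the model can often
  // recover by rephrasing its arguments or choosing another tool
  if (!handler) return { error: `Unknown tool: ${name}` }
  try {
    return await handler(args)
  } catch (err) {
    return { error: String(err) }
  }
}
```

Returning errors as ordinary results, rather than throwing, keeps the loop alive and gives the model a chance to self-correct.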
async function handleWithLazyLoading(userMessage: string) {
  const messages = [{ role: 'user' as const, content: userMessage }]
  // Loop: let the model request context as needed
  while (true) {
    const response = await callModel({
      system: systemPrompt,
      messages,
      tools,
    })
    // If the model wants to call tools, execute them
    if (response.toolCalls && response.toolCalls.length > 0) {
      // Record the assistant turn once, then one tool result per call
      messages.push({ role: 'assistant' as const, content: response.content })
      for (const call of response.toolCalls) {
        const result = await executeTool(call.name, call.arguments)
        messages.push({ role: 'tool' as const, content: JSON.stringify(result) })
      }
      continue // Let the model reason about the new context
    }
    // No more tool calls: the model has enough context
    return response.content
  }
}
The advantage of lazy loading is token efficiency. The model retrieves only the information it actually needs, which can be dramatically less than what a pre-retrieval pipeline would inject. A customer asks "what is my account status?": the model calls one tool and gets one result. The same question routed through a dynamic context pipeline might have injected five knowledge-base documents that were never needed.
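To make that comparison concrete, here is a back-of-envelope estimate. The payload sizes and the four-characters-per-token heuristic are illustrative assumptions, not measurements:

```typescript
// Rough heuristic: ~4 characters per token for English text
const approxTokens = (text: string) => Math.ceil(text.length / 4)

// Lazy loading: one targeted tool result (hypothetical payload)
const toolResult = JSON.stringify({ status: 'active', plan: 'pro' })

// Pre-retrieval: five knowledge-base documents of ~2,000 characters each,
// injected whether or not the model ends up using them
const injectedDocs: string[] = Array.from({ length: 5 }, () => 'x'.repeat(2000))

const lazyTokens = approxTokens(toolResult)
const eagerTokens = injectedDocs.reduce((sum, doc) => sum + approxTokens(doc), 0)
console.log({ lazyTokens, eagerTokens }) // a handful of tokens vs. thousands
```

The exact numbers depend entirely on your corpus, but the asymmetry is the point: lazy loading pays per fact used, pre-retrieval pays per document fetched.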
The disadvantage is latency. Each tool call is a round trip: the model generates a tool call, your server executes it, the result goes back to the model, and the model generates again. Two tool calls means three model invocations instead of one. For simple questions this overhead is wasteful. For complex questions that genuinely need multiple pieces of information, it is the right architecture.
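One common mitigation: when the model emits several independent tool calls in a single turn (most current tool-calling APIs allow this), execute them concurrently so the turn costs one slowest-call wait rather than the sum of all calls. A sketch, taking the same `executeTool` dispatcher the loop uses as a parameter:

```typescript
type ToolCall = { name: string; arguments: unknown }

// Run every tool call from one assistant turn concurrently. Promise.all
// resolves in request order, so the transcript stays aligned with the
// order in which the model issued the calls.
async function executeToolCalls(
  toolCalls: ToolCall[],
  executeTool: (name: string, args: unknown) => Promise<unknown>,
) {
  const results = await Promise.all(
    toolCalls.map((call) => executeTool(call.name, call.arguments)),
  )
  return results.map((result) => ({
    role: 'tool' as const,
    content: JSON.stringify(result),
  }))
}
```

This only helps within a single turn; calls that depend on a previous call's result still require separate round trips.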