Tool Use: When the Model Calls Something Outside Itself

AI, De-Mystified · Article 10

A language model trained on text knows a lot, but it cannot see today’s weather, run a calculation reliably, or look inside your files. Tool use is what happens when the model decides it needs help from something outside itself and calls that thing into action.

The tool could be a search engine, a calculator, a code runner, a file browser, a database query, or any API. The model does not execute the tool itself. It asks for it, the outside system does the work, and the result is handed back.

Point C1 Tool use extends a language model by letting it invoke external capabilities it does not itself possess.

Plain English Meaning

Think of the model as a person sitting at a desk who cannot leave the room. That person can read, reason, and write. But the room has a phone. With tool use, the person can call someone who can check a fact, run an experiment, or fetch a document.

In practice the sequence is: the model sees a task, decides a tool would help, outputs a structured call, the host system runs it, and the result is added to the conversation.

Existing Concept It Resembles

Tool use is not a new idea. It resembles several older patterns:

A library reference desk. A librarian may call another department to answer a specific question.
A spreadsheet with formulas. You type the formula; the program calls a calculation engine.
A remote procedure call in software. One program asks another program to do work and waits for the answer.
A human using a calculator. You do the reasoning; the device does the arithmetic.

Point C2 Tool use in AI is conceptually similar to delegation, remote procedure calls, and human use of instruments, but it automates the choice of which tool to invoke.

What Is Actually New?

What changed with modern language models is that the caller is not a person writing a script. The model itself decides when to call a tool and what arguments to pass, based on the conversation so far.

This softens the boundary between thinking in language and acting in the world. The same system can switch between writing prose, searching the web, and running code within one task.

That flexibility is powerful, but it comes with a cost: the model can call the wrong tool, pass wrong arguments, or trust a bad result. Tool use adds capability; it does not automatically add reliability.

How It Works In Practice

Here are four common patterns.

1. Search:

user asks about current events → model calls search → snippets return → model summarizes

The model cannot know today’s news from training data alone, so it looks it up.

2. Code execution:

user asks for a calculation → model writes code → runner executes it → model explains

Useful for exact arithmetic and simulations where language reasoning might slip.

3. File or database access:

user asks about a document → model calls file search → text returns → model answers

This overlaps with retrieval-augmented generation, but retrieval is triggered as a tool call.

4. API actions:

user asks to schedule a meeting → model calls calendar API → API confirms

The tool acts on the world. A wrong call can have real consequences.

Point C3 Common tool-use patterns include search, code execution, file or database retrieval, and API calls, each with different reliability and risk profiles.

Where It Helps

Tool use helps when internal knowledge is not enough:

Current information. Training data has a cutoff; search or APIs fetch fresher facts.
Exact computation. Language models are not calculators; a code tool can be exact.
Private data. Your files and databases are not in the training set; a tool can read them with permission.
Action in systems. Booking, ordering, and updating can happen through APIs.

Where It Fails

Wrong tool choice. The model may call search when it already knows the answer, adding latency and cost.

Bad arguments. A calendar API call with the wrong date or time zone can schedule a meeting incorrectly.

Trusting bad results. Outdated search snippets or buggy code output may be repeated.

Overreach. A model with too many tools might call one it should not, such as deleting data or sending a message without confirmation.

Point C4 Tool use introduces failure modes of wrong selection, bad arguments, misplaced trust in tool output, and unwanted actions that must be governed by permissions and checks.

Academic Connections

Tool use connects to several research areas:

Function calling is the engineering pattern that lets models emit structured calls with names and arguments.
Affordances come from design psychology: the perceived actions an object offers.
Action selection is the decision problem of choosing what to do next in an environment with multiple options.
Human-computer interaction studies how people work with tools, including when automation helps and when it introduces errors.

Practical Checklist

When you build or use a tool-using AI system, ask:

Does this task actually need a tool?
Which tools are available, and what can each one do?
What happens if the tool returns nothing, an error, or bad data?
Are there permissions before high-stakes actions?
How is the final answer validated against the tool output?
Is there a cost or latency budget for tool calls?

If you cannot answer these, the system may call tools blindly.

The De-Hype Check

Old name for this idea: delegation, external APIs, subroutine calls, using instruments.
What is genuinely new: language models can decide which tool to call and how to phrase the call based on open-ended conversation.
What gets exaggerated: “The model can do anything now.” In reality, it can only call tools it has been given, with arguments it guesses, and it may misuse or over-trust them.
Who benefits from the hype: Vendors selling all-in-one agent platforms. The truth is more modest: tool use expands possibility, but each tool is a dependency that can fail.

Open Questions

How should a model decide it has enough tool results to answer?
What is the best way to verify a tool result before acting on it?
When should a model answer from memory instead of calling a tool?
How do tool-use failures get communicated clearly to users?

Article guide Important points and sources 4 points Show guide Hide guide

C001 core · high Tool use extends a language model by letting it invoke external capabilities it does not itself possess.
C002 landscape · high Tool use in AI is conceptually similar to delegation, remote procedure calls, and human use of instruments, but it automates the choice of which tool to invoke.
C003 design · medium-high Common tool-use patterns include search, code execution, file or database retrieval, and API calls, each with different reliability and risk profiles.
C004 risk · medium Tool use introduces failure modes of wrong selection, bad arguments, misplaced trust in tool output, and unwanted actions that must be governed by permissions and checks.

Sources Sources used 4 sources Show sources Hide sources

Look closer

Sources and notes

Open details Close details

These notes collect the sources, counterpoints, and review status behind the article's important points. Read the essay first; open this when you want to check something.

Confidence reflects how strongly the sources support the point (low / medium / high). Status describes the point's role (e.g., core, argument, landscape). Sources link to supporting material; counterpoints note boundary conditions or conflicting findings.

C001 high core

Tool use extends a language model by letting it invoke external capabilities it does not itself possess.

Sources (1): “Toolformer trains a language model to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.”
Toolformer: Language Models Can Teach Themselves to Use Tools direct
Counterpoints (1): Tool use assumes the external capability is available and correctly implemented; a missing or broken tool leaves the model no better off.

C002 high landscape

Tool use in AI is conceptually similar to delegation, remote procedure calls, and human use of instruments, but it automates the choice of which tool to invoke.

Sources (1): “An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.”
Russell and Norvig: Artificial Intelligence: A Modern Approach background
Counterpoints (1): Older systems often hard-code when to call a tool; modern models learn or are prompted to decide dynamically, which introduces new error modes.

C003 medium-high design

Common tool-use patterns include search, code execution, file or database retrieval, and API calls, each with different reliability and risk profiles.

Sources (1): “ReAct interleaves reasoning traces and task-specific actions, allowing the model to perform actions such as search over Wikipedia and interact with environments beyond language generation.”
ReAct: Synergizing Reasoning and Acting in Language Models direct
Counterpoints (1): The boundaries between these patterns blur; retrieval can be implemented as a tool call, and code execution can be chained with search in many ways.

C004 medium risk

Tool use introduces failure modes of wrong selection, bad arguments, misplaced trust in tool output, and unwanted actions that must be governed by permissions and checks.

Sources (1): “Affordances are the perceived action possibilities of an object or environment; mismatches between perceived and actual affordances lead to action errors.”
Gibson: The Ecological Approach to Visual Perception indirect
Counterpoints (1): Many deployed systems reduce these risks by hard-coding tool availability, requiring human confirmation for high-stakes actions, or using deterministic validation layers.

Review recordHow this was madeShow detailsHide details

Created 2026-06-29 by human. Policy: policy:default v1.0.0.

✓ Approved hash matches current article

Agent runs

draftingkimi2026-06-29in:00000000…out:be152f2b…
reviewkimi2026-06-29in:00000000…out:be152f2b…

Reviews

agentapproved2026-06-29
Scope: claims, tone, privacy, scope
contentHash: be152f2bda77bbac…
Sibling-agent review against article-proposal-ideation eval-card. Privacy scan passed. No proprietary or personal content detected.
humanapproved2026-06-29
Scope: thesis, examples, tone, safety
contentHash: be152f2bda77bbac…
Human author approved the draft for publication.

Machine-readable files

The same points, sources, and relationships are also available as structured files for agents and tools. The JSON follows the publication record schema.

JSON file Brief (Markdown)