
007: Virtual Tool Calling: A Token-Efficient Alternative

When building AI agents, tool calling is the standard way to give models access to external functionality. Most providers offer native tool calling APIs where you define schemas, and the model outputs structured JSON.

But there's another approach: virtual tool calling. Instead of using the provider's tool calling mechanism, you describe tools in plain text and have the model output tool invocations as delimited text (often XML).
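Concretely, the tool definition lives in the prompt rather than in a tools parameter. It might look something like this (the wording is illustrative; the tool and tag names match the example used later in this post):

You have access to the following tool:

execute_go_code: runs a Go program and returns its output.
Parameters:
  code (string, required): the Go source to execute
  executionTimeout (integer, optional): timeout in seconds, default 60

To call it, respond with:

<execute_go_code>
<code>...your Go code...</code>
<executionTimeout>60</executionTimeout>
</execute_go_code>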

What's the difference?

Traditional tool calling: you register JSON schemas through the provider's tools parameter, and the model emits structured tool_use blocks with JSON-encoded arguments.

Virtual tool calling: you describe the tools in the prompt, and the model emits tool invocations as delimited plain text that your own code parses.

Try it yourself

Here's the same tool call in both formats. You can run these through a token counter to see the difference.

Traditional tool calling (JSON)

The model outputs a tool_use block with JSON-encoded parameters:

{
  "type": "tool_use",
  "id": "tool_1",
  "name": "execute_go_code",
  "input": {
    "code": "package main\n\nimport (\n\t\"bufio\"\n\t\"context\"\n\t\"fmt\"\n\t\"os\"\n\t\"strings\"\n)\n\nfunc Run(ctx context.Context) error {\n\t// Read cities from file and get weather for each\n\tfile, err := os.Open(\"cities.txt\")\n\tif err != nil {\n\t\treturn fmt.Errorf(\"failed to open file: %w\", err)\n\t}\n\tdefer file.Close()\n\n\tvar results []string\n\tscanner := bufio.NewScanner(file)\n\tfor scanner.Scan() {\n\t\tcity := strings.TrimSpace(scanner.Text())\n\t\tif city == \"\" {\n\t\t\tcontinue\n\t\t}\n\n\t\tweather, err := GetWeather(ctx, GetWeatherInput{\n\t\t\tCity: city,\n\t\t\tUnit: \"fahrenheit\",\n\t\t})\n\t\tif err != nil {\n\t\t\treturn fmt.Errorf(\"failed to get weather for %q: %w\", city, err)\n\t\t}\n\n\t\tresults = append(results, fmt.Sprintf(\"%s: %.0f°F\", city, *weather.Temperature))\n\t}\n\n\tif err := scanner.Err(); err != nil {\n\t\treturn err\n\t}\n\n\tfmt.Println(\"Weather Report\")\n\tfmt.Println(\"==============\")\n\tfmt.Println(strings.Join(results, \"\\n\"))\n\treturn nil\n}\n",
    "executionTimeout": 60
  }
}

Notice how every newline becomes \n, every tab becomes \t, and quotes inside the code need escaping.
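On the receiving side, that escaping is undone the moment the input is decoded, so the cost is paid in tokens, not in correctness. A minimal Go sketch of handling the block's input (the struct and field names mirror the example above, not any particular provider SDK):

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// ExecuteGoCodeInput mirrors the "input" object of the tool_use block above.
type ExecuteGoCodeInput struct {
	Code             string `json:"code"`
	ExecutionTimeout int    `json:"executionTimeout"`
}

func main() {
	// The "input" field as it arrives over the wire, with escaped newlines and tabs.
	raw := []byte(`{"code":"package main\n\nfunc main() {\n\tprintln(\"hi\")\n}\n","executionTimeout":60}`)

	var input ExecuteGoCodeInput
	if err := json.Unmarshal(raw, &input); err != nil {
		log.Fatalf("malformed tool input: %v", err)
	}

	// Unmarshalling restores the original source with real newlines and tabs.
	fmt.Print(input.Code)
	fmt.Println("timeout:", input.ExecutionTimeout, "seconds")
}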

Virtual tool calling (XML)

The model outputs plain text with XML delimiters:

<execute_go_code>
<code>
package main

import (
	"bufio"
	"context"
	"fmt"
	"os"
	"strings"
)

func Run(ctx context.Context) error {
	// Read cities from file and get weather for each
	file, err := os.Open("cities.txt")
	if err != nil {
		return fmt.Errorf("failed to open file: %w", err)
	}
	defer file.Close()

	var results []string
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		city := strings.TrimSpace(scanner.Text())
		if city == "" {
			continue
		}

		weather, err := GetWeather(ctx, GetWeatherInput{
			City: city,
			Unit: "fahrenheit",
		})
		if err != nil {
			return fmt.Errorf("failed to get weather for %q: %w", city, err)
		}

		results = append(results, fmt.Sprintf("%s: %.0f°F", city, *weather.Temperature))
	}

	if err := scanner.Err(); err != nil {
		return err
	}

	fmt.Println("Weather Report")
	fmt.Println("==============")
	fmt.Println(strings.Join(results, "\n"))
	return nil
}
</code>
<executionTimeout>60</executionTimeout>
</execute_go_code>

The code appears exactly as written. No escaping needed.
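Since you're bypassing the provider's structured output, parsing the invocation is your job. Here's a minimal sketch in Go, assuming one tool call per response and the tag layout shown above; a production parser would also handle missing or malformed tags and ask the model to retry:

package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	// toolCallRe captures the body of an <execute_go_code>...</execute_go_code> block.
	toolCallRe = regexp.MustCompile(`(?s)<execute_go_code>\s*(.*?)\s*</execute_go_code>`)
	codeRe     = regexp.MustCompile(`(?s)<code>\n?(.*?)</code>`)
	timeoutRe  = regexp.MustCompile(`<executionTimeout>\s*(\d+)\s*</executionTimeout>`)
)

type GoCodeCall struct {
	Code             string
	ExecutionTimeout int
}

// ParseGoCodeCall extracts a virtual tool call from the model's raw text output.
func ParseGoCodeCall(output string) (*GoCodeCall, error) {
	m := toolCallRe.FindStringSubmatch(output)
	if m == nil {
		return nil, errors.New("no <execute_go_code> block found")
	}
	body := m[1]

	code := codeRe.FindStringSubmatch(body)
	if code == nil {
		return nil, errors.New("missing <code> parameter")
	}

	call := &GoCodeCall{Code: code[1], ExecutionTimeout: 60} // default timeout
	if t := timeoutRe.FindStringSubmatch(body); t != nil {
		n, err := strconv.Atoi(t[1])
		if err != nil {
			return nil, fmt.Errorf("invalid executionTimeout: %w", err)
		}
		call.ExecutionTimeout = n
	}
	return call, nil
}

func main() {
	output := "I'll read the file and report the weather.\n\n" +
		"<execute_go_code>\n<code>\npackage main\n\nfunc main() { println(\"hi\") }\n</code>\n" +
		"<executionTimeout>30</executionTimeout>\n</execute_go_code>"

	call, err := ParseGoCodeCall(output)
	if err != nil {
		panic(err)
	}
	fmt.Printf("timeout=%d\ncode:\n%s\n", call.ExecutionTimeout, strings.TrimSpace(call.Code))
}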

The token efficiency finding

I ran token counts on equivalent conversations using both approaches. The same code, the same tool descriptions, the same task.

Virtual tool calling used ~30% fewer tokens.

The savings come from escaping overhead. When you pass code as a JSON string, every newline becomes \n, every quote needs escaping, every backslash doubles. In virtual tool calling, the code sits in plain text between XML tags.

For short strings, this doesn't matter much. For multi-line code blocks, which are common in code-as-tool-call patterns, the overhead adds up.

A note on model providers

The actual token savings will vary depending on your model provider. Each provider post-trains their models differently for tool calling, which affects how tool calls get tokenized and generated.

Some providers may have optimized their tokenizers or generation for native tool calling in ways that reduce the escaping overhead. Others may add additional formatting or structure that increases it. The 30% figure comes from testing with Claude—your results with other models may differ.

The only way to know for sure is to measure with your specific provider and use case.
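With Claude, one way to measure is the token-counting endpoint: render the same content in both formats and compare the reported input_tokens. Here's a rough Go sketch that treats both renderings as plain message text, so it approximates rather than reproduces the provider's internal tool-call tokenization; the endpoint, headers, and model name are assumptions based on the public docs at the time of writing:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// countTokens asks the token-counting endpoint how many input tokens a single
// message containing text would consume. Check the current API reference for
// the exact request and response fields.
func countTokens(text string) (int, error) {
	body, err := json.Marshal(map[string]any{
		"model": "claude-3-5-sonnet-20241022", // substitute the model you actually use
		"messages": []map[string]string{
			{"role": "user", "content": text},
		},
	})
	if err != nil {
		return 0, err
	}

	req, err := http.NewRequest("POST", "https://api.anthropic.com/v1/messages/count_tokens", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("content-type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("count_tokens returned %s: %s", resp.Status, raw)
	}

	var out struct {
		InputTokens int `json:"input_tokens"`
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		return 0, err
	}
	return out.InputTokens, nil
}

func main() {
	jsonCall := `{"type":"tool_use","name":"execute_go_code","input":{"code":"package main\n..."}}`
	xmlCall := "<execute_go_code>\n<code>\npackage main\n...\n</code>\n</execute_go_code>"

	for name, call := range map[string]string{"json": jsonCall, "xml": xmlCall} {
		n, err := countTokens(call)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s form: %d tokens\n", name, n)
	}
}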

Where the savings come from

The token efficiency gains have two sources:

  1. No escaping. In JSON strings, newlines become \n, tabs become \t, quotes become \", and backslashes double up. Each escape sequence adds tokens. In XML-delimited text, these characters appear as-is.

  2. No JSON syntax overhead. Traditional tool calls require braces, colons, quotes around keys, and structural formatting that has nothing to do with the tool name or arguments. Compare:

{"type":"tool_use","id":"tool_1","name":"execute_go_code","input":{"code":"...","executionTimeout":60}}
<execute_go_code>
<code>...</code>
<executionTimeout>60</executionTimeout>
</execute_go_code>

The JSON version needs quotes around every key, colons, commas, and nested braces. These all cost tokens.
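You can see the inflation directly by JSON-encoding a block of source and comparing it to the raw text. A small Go sketch; character counts are only a proxy for tokens, but every escape sequence that adds characters also tends to add tokens:

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// Any multi-line, tab-indented, quote-heavy source will do.
	src := `package main

import "fmt"

func main() {
	fmt.Println("Weather Report")
	fmt.Println("==============")
}
`

	// What the same code looks like as a JSON string value, escapes and all.
	encoded, err := json.Marshal(src)
	if err != nil {
		panic(err)
	}

	fmt.Printf("raw source:   %d chars\n", len(src))
	fmt.Printf("JSON-encoded: %d chars\n", len(encoded))
	fmt.Printf("characters that need escaping: %d newlines, %d tabs, %d quotes\n",
		strings.Count(src, "\n"), strings.Count(src, "\t"), strings.Count(src, `"`))
}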

Tradeoffs

Virtual tool calling is better when:

  - Tool arguments are large, multi-line blocks of code or text, where escaping overhead dominates
  - You're already prompting the model to produce output in a specific format
  - Token cost matters more than guaranteed parseability

Traditional tool calling is better when:

  - A single malformed call would break the workflow, so you want guaranteed schema compliance
  - You want to lean on provider post-training and optimizations for native tool calls
  - You need standardized multi-turn handling of tool results

Implementation note

Most providers don't penalize you for putting tool descriptions in the system prompt instead of the dedicated tools parameter. The tokens count the same either way. The savings come purely from how the model's output gets encoded.

If you're already prompting the model to output in a specific format anyway, virtual tool calling may cost you nothing to try.

Counterpoint: Native Tool Calling May Still Win

Token efficiency isn't everything. There are reasons to prefer native tool calling despite the overhead:

Guaranteed schema compliance. OpenAI's Structured Outputs uses constrained decoding to guarantee 100% valid JSON. Malformed responses are impossible. With XML, you rely on the model to produce valid markup—parsing failures happen.

Models are trained for it. Providers train their models specifically on native tool calling formats. The Berkeley Function Calling Leaderboard distinguishes between native function calling (FC) and prompt-based approaches—FC consistently scores higher. Research like ToolACE (ICLR 2025) shows that models trained on function calling data achieve state-of-the-art results, with 8B parameter models rivaling GPT-4 on tool calling benchmarks.

Provider-specific optimizations. Native tool calling likely uses special tokens and internal representations that the model was trained with. OpenAI's fine-tuning cookbook explicitly supports fine-tuning for function calling when accuracy is critical.

Multi-turn reliability. Native formats handle conversation state and tool result injection in standardized ways. Custom formats must reinvent this, and edge cases accumulate.

The token savings from virtual tool calling are real, but for production systems where a single malformed tool call breaks an entire workflow, the reliability of native tool calling may be worth the cost.

Conclusion

Virtual tool calling isn't universally better. But for agents that generate code or other text-heavy outputs, the token savings are real. Worth measuring for your specific use case.