AI Architecture · 10 min read · By Ravi Shankar

Quick Answer

A technical deep dive into AI function calling and tool use — how it works, implementation patterns, error handling, parallel tool calling, and production best practices.

AI Function Calling and Tool Use: Technical Deep Dive

Function calling — the ability of an LLM to invoke external tools, APIs, and code — is what transforms a language model from a text generator into an agent that can act in the world. Understanding how it works technically is essential for building reliable agentic AI systems.


How Function Calling Works

At a high level, function calling is a structured conversation pattern:

  1. You tell the LLM what tools it has available (function names, descriptions, parameter schemas)
  2. The LLM decides whether to use a tool to answer the user's request
  3. If yes, the LLM returns a structured tool call (function name + arguments)
  4. Your application executes the tool call
  5. You return the result to the LLM
  6. The LLM synthesizes the tool result into a final response

This loop continues until the LLM decides it has enough information to respond or completes the requested task.
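Concretely, with an OpenAI-style chat API the middle of this loop is two message shapes: the assistant turn carrying the structured tool call (step 3), and the tool turn carrying your result (step 5). The field names below follow OpenAI's format; the IDs and values are illustrative:

```python
import json

# Step 3: the model returns a structured tool call instead of text
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # illustrative ID, assigned by the API
        "type": "function",
        "function": {
            "name": "get_customer_order",
            "arguments": '{"order_id": "ORD-12345"}'  # a JSON string, not a dict
        }
    }]
}

# Steps 4-5: your application executes the call and appends the result,
# echoing the call's ID so the model can match result to request
tool_turn = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": '{"status": "shipped", "items": 2}'
}

# Arguments must be parsed before execution
order_args = json.loads(assistant_turn["tool_calls"][0]["function"]["arguments"])
```

Note that `arguments` arrives as a JSON-encoded string; forgetting to parse it is a common first bug.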


Tool Definition

Tools are defined as JSON schemas that the LLM reads to understand what each tool does and what parameters it requires:

{
  "type": "function",
  "function": {
    "name": "get_customer_order",
    "description": "Retrieve a customer's order details by order ID. Returns order status, items, and shipping information.",
    "parameters": {
      "type": "object",
      "properties": {
        "order_id": {
          "type": "string",
          "description": "The unique order identifier, e.g. ORD-12345"
        }
      },
      "required": ["order_id"]
    }
  }
}
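Because the model generates arguments as free-form JSON, it can omit required fields or use the wrong types. Validating arguments against the schema before execution catches this early. A minimal hand-rolled check is sketched below (in production a library such as jsonschema or Pydantic is more typical):

```python
# The schema for get_customer_order's parameters, as defined above
schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"}
    },
    "required": ["order_id"]
}

# JSON Schema type names mapped to Python types
TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "object": dict, "array": list}

def validate_arguments(args: dict, schema: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required parameter: {key}")
    for key, value in args.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            errors.append(f"unexpected parameter: {key}")
        elif not isinstance(value, TYPE_MAP[prop["type"]]):
            errors.append(f"{key}: expected {prop['type']}")
    return errors

validate_arguments({"order_id": "ORD-12345"}, schema)  # []
validate_arguments({}, schema)  # ["missing required parameter: order_id"]
```

Returning the error list to the model as the tool result often lets it correct the arguments and retry on its own.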

Critical: Write excellent descriptions. The LLM decides which tool to call based on the description. A vague description leads to incorrect tool selection or incorrect parameters.


Implementation with OpenAI

from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

def get_weather(city: str) -> dict:
    # Your actual implementation
    return {"city": city, "temperature": 22, "condition": "sunny"}

def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

    # Check if the model wants to call a tool
    while response.choices[0].finish_reason == "tool_calls":
        tool_calls = response.choices[0].message.tool_calls

        # Add the assistant's message (with tool calls) to history
        messages.append(response.choices[0].message)

        # Execute each tool call
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)

            # Route to the appropriate function
            if function_name == "get_weather":
                result = get_weather(**arguments)
            else:
                result = {"error": f"Unknown function: {function_name}"}

            # Add tool result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

        # Get next response
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools
        )

    return response.choices[0].message.content
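The if/elif routing above is fine for one tool, but with many tools a dispatch table keeps routing declarative: adding a tool becomes a one-line registration. A sketch reusing the `get_weather` example:

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in implementation, matching the example above
    return {"city": city, "temperature": 22, "condition": "sunny"}

# Map tool names to implementations; registering a new tool is one line
TOOL_REGISTRY = {
    "get_weather": get_weather,
}

def dispatch(function_name: str, raw_arguments: str) -> dict:
    """Look up the handler and call it with the model's parsed arguments."""
    handler = TOOL_REGISTRY.get(function_name)
    if handler is None:
        return {"error": f"Unknown function: {function_name}"}
    return handler(**json.loads(raw_arguments))

dispatch("get_weather", '{"city": "Paris"}')
# {'city': 'Paris', 'temperature': 22, 'condition': 'sunny'}
```

Inside the loop, the if/else block then collapses to `result = dispatch(function_name, tool_call.function.arguments)`.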

Parallel Tool Calling

Modern LLMs can emit several independent tool calls in a single response; your application can then execute them concurrently — a significant performance optimization for workflows that require multiple independent data lookups:

import asyncio
from dataclasses import dataclass

@dataclass
class ToolCall:
    function: str
    args: dict

# The LLM may return multiple tool calls in one response:
tool_calls = [
    ToolCall(function="get_customer_profile", args={"customer_id": "C123"}),
    ToolCall(function="get_account_balance", args={"account_id": "A456"}),
    ToolCall(function="get_recent_transactions", args={"account_id": "A456"})
]

async def execute_tool(tool_call: ToolCall) -> dict:
    ...  # dispatch to the matching async tool implementation

# Execute in parallel using asyncio rather than awaiting each call in turn
async def execute_tool_calls_parallel(tool_calls):
    tasks = [execute_tool(tc) for tc in tool_calls]
    return await asyncio.gather(*tasks)

Sequential execution of 3 tools taking 500ms each = 1,500ms total. Parallel execution = ~500ms total. For multi-step agent workflows, this matters significantly.
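The latency claim is easy to verify with stand-in tools. The sketch below simulates three independent 100 ms API calls with asyncio.sleep and measures the total wall time:

```python
import asyncio
import time

async def slow_lookup(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a 100 ms API call
    return f"{name}: ok"

async def run_three_lookups() -> float:
    start = time.perf_counter()
    # Three independent lookups run concurrently, not back-to-back
    results = await asyncio.gather(
        slow_lookup("profile"),
        slow_lookup("balance"),
        slow_lookup("transactions"),
    )
    assert len(results) == 3
    return time.perf_counter() - start

elapsed = asyncio.run(run_three_lookups())
# Total wall time is close to one call's latency, not three times it
```

The same effect can be had with a thread pool when the tool implementations are synchronous.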


Error Handling Patterns

Tool calls fail. The LLM needs to know about failures and handle them gracefully:

Pattern 1: Return error details in tool result

# NotFoundException is an application-specific exception (PermissionError is
# built in); execute_function is your application's tool dispatcher
class NotFoundException(Exception):
    pass

def safe_tool_execution(function_name, arguments):
    try:
        result = execute_function(function_name, arguments)
        return {"success": True, "data": result}
    except NotFoundException as e:
        return {"success": False, "error": "not_found", "message": str(e)}
    except PermissionError as e:
        return {"success": False, "error": "permission_denied", "message": str(e)}
    except Exception as e:
        return {"success": False, "error": "execution_failed", "message": str(e)}

Pattern 2: Retry on transient failures

import time
from functools import wraps

class TransientError(Exception):
    """Raised by tools on retryable failures (timeouts, rate limits)."""

class MaxRetriesExceeded(Exception):
    pass

def with_retry(max_attempts=3, backoff_seconds=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt < max_attempts - 1:
                        time.sleep(backoff_seconds * (2 ** attempt))
            raise MaxRetriesExceeded()
        return wrapper
    return decorator
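Applied to a flaky tool, the decorator retries transparently and the caller never sees the transient failures. The demo below repeats the decorator so it runs standalone, with the backoff shortened:

```python
import time
from functools import wraps

class TransientError(Exception):
    pass

class MaxRetriesExceeded(Exception):
    pass

def with_retry(max_attempts=3, backoff_seconds=0.01):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt < max_attempts - 1:
                        time.sleep(backoff_seconds * (2 ** attempt))
            raise MaxRetriesExceeded()
        return wrapper
    return decorator

attempts = {"count": 0}

@with_retry(max_attempts=3)
def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("connection reset")  # fails on the first two calls
    return "ok"

result = flaky_fetch()  # two transparent retries, then success
# result == "ok"; attempts["count"] == 3
```

If the third attempt also failed, the caller would see MaxRetriesExceeded, which safe_tool_execution can translate into a structured error for the model.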

Security Considerations

Tool use is where AI agent security matters most. Tools can modify data, send communications, and take external actions.

Principle of least privilege: Each tool should only have permissions needed for its specific operation. An order lookup tool should not also have order cancellation permissions.

Input sanitization: Never pass raw LLM-generated arguments to database queries or shell commands. Validate and sanitize all tool arguments.

Action confirmation for destructive operations: For tools that modify or delete data, implement confirmation requirements:

def delete_record(record_id: str, confirmed: bool = False) -> dict:
    if not confirmed:
        return {
            "status": "requires_confirmation",
            "message": f"This will delete record {record_id}. Set confirmed=true to proceed."
        }
    # proceed with deletion

Rate limiting: Prevent runaway tool calling from creating unexpected costs or system load.

Audit logging: Log every tool call with timestamp, function name, arguments, and result.
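Rate limiting can be as simple as a per-session call budget that the agent loop checks before each execution. An illustrative sketch (the cap of 25 is an arbitrary example; tune it to your workload):

```python
class ToolCallBudget:
    """Hard cap on tool calls per session to stop runaway agent loops."""

    def __init__(self, max_calls: int = 25):
        self.max_calls = max_calls
        self.calls_made = 0

    def check(self) -> bool:
        """Consume one call from the budget; False means the cap is hit."""
        if self.calls_made >= self.max_calls:
            return False
        self.calls_made += 1
        return True

budget = ToolCallBudget(max_calls=2)
budget.check()  # True
budget.check()  # True
budget.check()  # False — the loop should stop or escalate to a human
```

When the budget is exhausted, returning a "budget exceeded" tool result (rather than silently dropping the call) lets the model explain the situation to the user.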


Choosing Tool Granularity

Too coarse ("search_everything"): LLM can't predict what the tool returns; harder to compose.

Too fine ("get_customer_first_name", "get_customer_last_name"): Excessive tool calls for simple operations.

Right granularity: Tools that map to meaningful business operations and return coherent data units:

  • get_customer_profile (returns all relevant customer attributes)
  • search_products (returns list of matching products with key attributes)
  • create_support_ticket (creates ticket and returns ticket ID and status)

Production Best Practices

  1. Version your tool schemas: When tool parameters change, version the interface and update models accordingly.

  2. Cache tool results: For read-only tools with stable data, cache results within a session or across sessions.

  3. Implement timeouts: Every tool call must have a timeout. An AI agent waiting indefinitely for a slow external API creates poor user experience.

  4. Monitor tool usage: Track which tools are called most frequently, with what arguments, and with what outcomes. This data informs optimization.

  5. Test tool definitions: Use automated tests that verify tool descriptions produce correct tool selection for a set of test queries.
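For item 3, a synchronous tool can be given a hard deadline with the standard library's concurrent.futures. A minimal sketch (run_with_timeout and slow_tool are illustrative names, not part of any SDK):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_with_timeout(func, timeout_seconds: float, *args, **kwargs) -> dict:
    """Run a synchronous tool call with a hard deadline."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func, *args, **kwargs)
        try:
            return {"success": True, "data": future.result(timeout=timeout_seconds)}
        except FutureTimeout:
            # Note: the worker thread keeps running; the timeout only unblocks
            # the caller, so the agent can report the failure and move on
            return {"success": False, "error": "timeout"}

def slow_tool() -> str:
    time.sleep(0.5)  # stand-in for a slow external API
    return "done"

run_with_timeout(slow_tool, timeout_seconds=0.1)
# {'success': False, 'error': 'timeout'}
```

For async tools, asyncio.wait_for gives the same behavior without the thread pool.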


Conclusion

Function calling and tool use are the mechanism that makes AI agents capable of real-world impact. Mastering the implementation — correct tool definition, parallel execution, robust error handling, and security controls — is the engineering foundation for reliable production AI agents.

