AI helped me articulate my thoughts here, and I am planning to capture additional thoughts later on. Specifically, I would like to 1) map model capabilities and constraints to agent features as a framework, and 2) discuss implementation details using the MCP and A2A protocols.
The current wave of AI development is shifting from static large language models (LLMs) to dynamic, goal-driven LLM agents. But to truly understand what this shift entails, we need to start with the foundational idea of agents in artificial intelligence, a concept well articulated in Artificial Intelligence: A Modern Approach by Russell and Norvig.
What Is an Agent?
In the classical AI sense, an agent is defined as anything that can perceive its environment through sensors and act upon that environment through actuators. The core of agent design is the perception-action loop: the agent observes the environment, makes a decision, and performs an action. The sophistication of that decision process distinguishes a reactive thermostat from an autonomous robot.
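A minimal sketch of that loop helps make the distinction concrete. The `Agent`, `Thermostat`, and environment methods below are illustrative stand-ins of my own, not interfaces from the book:

```python
# A minimal perception-action loop. `environment.sense` and
# `environment.apply` are illustrative stand-ins, not a real library.

class Agent:
    def decide(self, percept):
        """Map the latest percept to an action. The sophistication of
        this method is what separates a thermostat from a robot."""
        raise NotImplementedError

class Thermostat(Agent):
    def decide(self, percept):
        # Purely reactive rule: no memory, no planning.
        return "heat_on" if percept["temperature"] < 20.0 else "heat_off"

def run(agent, environment, steps=100):
    for _ in range(steps):
        percept = environment.sense()   # perceive through sensors
        action = agent.decide(percept)  # decide
        environment.apply(action)       # act through actuators
```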
To evaluate how an agent functions, Russell and Norvig define several key dimensions of the environment:
- Fully Observable vs. Partially Observable: Can the agent access the complete state of the environment at any given time?
- Single Agent vs. Multi-Agent: Is it the only decision-maker, or does it operate alongside others (cooperative or competitive)?
- Deterministic vs. Stochastic: Do actions always have predictable outcomes?
- Static vs. Dynamic: Does the environment change while the agent deliberates?
- Discrete vs. Continuous: Is the state space composed of countable, distinct values or infinite, flowing variables?
Each axis shapes the design tradeoffs an intelligent system must make. For instance, a chess-playing agent exists in a fully observable, deterministic, and discrete world—perfect for planning and search. In contrast, a self-driving car must navigate a dynamic, partially observable, stochastic, and continuous world—demanding perception, real-time reaction, and uncertainty modeling.
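These axes are concrete enough to write down as a data structure. Here is an illustrative sketch (the `TaskEnvironment` class and its field names are my own labels, not an established API) classifying the two examples above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskEnvironment:
    """The five environment axes as simple flags."""
    fully_observable: bool
    multi_agent: bool
    deterministic: bool
    static: bool
    discrete: bool

chess = TaskEnvironment(
    fully_observable=True,   # the entire board state is visible
    multi_agent=True,        # one competitive opponent
    deterministic=True,      # a move always has the same effect
    static=True,             # the position waits while the agent deliberates
    discrete=True,           # finite squares, finite legal moves
)

self_driving_car = TaskEnvironment(
    fully_observable=False,  # occlusions and limited sensor range
    multi_agent=True,        # other drivers, cyclists, pedestrians
    deterministic=False,     # braking varies with road and weather
    static=False,            # traffic changes while the car deliberates
    discrete=False,          # continuous positions, speeds, and angles
)
```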
Reframing LLMs as Agents
Traditionally, LLMs have existed in isolation from this framework. They are stateless, reactive systems that produce outputs purely based on input text. Their only environment is the prompt. Their only actuator is language.
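To make "stateless" concrete, here is a sketch in which `complete` stands in for any text-completion API (the name and signature are assumptions, not a specific vendor's SDK); each call sees only the prompt it is given:

```python
# `complete` stands in for any text-completion API; the name and
# signature are illustrative, not a specific vendor's SDK.
def complete(prompt: str) -> str:
    ...

# Two independent calls: the model retains nothing between them.
complete("My name is Ada. Remember that.")
complete("What is my name?")  # has no access to the previous call

# Any "memory" must be re-supplied by the caller inside the prompt:
complete("My name is Ada. Remember that.\nWhat is my name?")
```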
But as we wrap LLMs in agentic scaffolding—allowing them to perceive, reason, plan, and act—they increasingly resemble the agents described by Russell and Norvig. This shift marks the transition from foundation models to LLM agents.
From Completion to Control
A base LLM like GPT-4 or Claude can complete text, summarize documents, or generate code. But when combined with tools like memory, retrieval, code execution, and external APIs, an LLM becomes an agent with a persistent identity and long-term goal-seeking behavior.
In the new framing, LLM agents:
- Perceive their world not through sensors, but via API calls, documents, files, or user input.
- Reason by running chains of thought, performing searches, or querying databases.
- Act by issuing commands, making edits, calling functions, or writing code.
- Learn or Adapt by updating memory or changing strategy over time.
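A rough sketch of that scaffolding follows. Everything here is assumed for illustration: `complete` again stands in for an LLM API, `tools` is a plain dict of callables, and the JSON protocol between the model and the loop is deliberately minimal.

```python
import json

def complete(prompt: str) -> str:
    """Stand-in for an LLM completion API (illustrative only)."""
    ...

class LLMAgent:
    def __init__(self, goal, tools):
        self.goal = goal
        self.tools = tools    # name -> callable, e.g. {"read_file": read_file}
        self.memory = []      # running log of percepts and actions

    def perceive(self, observation):
        # "Sensors" are structured inputs: API results, file contents, user text.
        self.memory.append({"observation": observation})

    def reason(self):
        # Reasoning is a completion over goal + memory that must emit a tool call.
        prompt = (
            f"Goal: {self.goal}\n"
            f"History: {json.dumps(self.memory)}\n"
            'Respond with JSON: {"tool": ..., "args": {...}} or {"done": true}'
        )
        return json.loads(complete(prompt))

    def act(self, decision):
        # "Actuators" are tool invocations: function calls, edits, commands.
        self.memory.append({"action": decision})
        return self.tools[decision["tool"]](**decision["args"])

    def run(self, first_observation, max_steps=10):
        self.perceive(first_observation)
        for _ in range(max_steps):
            decision = self.reason()
            if decision.get("done"):
                break
            self.perceive(self.act(decision))  # feed the result back in
```

The design choice worth noting: the model itself stays stateless; it is the loop around it that accumulates percepts and actions, giving the system its persistent identity and goal-seeking behavior.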
The Environment Revisited
Let’s revisit the environment dimensions in the context of LLM agents:
- Partially Observable: Most LLM agents operate in partially observable spaces (e.g., a file system, API state, or user interface), with incomplete or evolving information.
- Multi-Agent: Many AI systems now coordinate with other agents or human collaborators (e.g., a team of agents working on a software project).
- Stochastic: Outcomes of LLM-generated actions (e.g., calling a buggy API) are often unpredictable.
- Dynamic: The environment may change between actions—files get modified, users give new instructions.
- Discrete + Continuous: While interaction points are often discrete (e.g., button clicks or JSON payloads), the underlying data (e.g., natural language) exists on a continuous semantic spectrum.

In essence, we are embedding a language-based cognitive system into a real environment, forcing it to operate under uncertainty and adapt over time—just like a classical AI agent.
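The stochastic and dynamic axes are what force an agent to verify outcomes rather than assume them. A minimal pattern, with hypothetical `call_api` and `is_valid` helpers, is to observe each result and retry with backoff when it does not match the intent:

```python
import time

def call_api(payload):
    """Hypothetical external action; it may fail or return surprising data."""
    ...

def is_valid(result):
    """Hypothetical check that the observed outcome matches the intent."""
    ...

def act_with_verification(payload, retries=3, backoff=1.0):
    # Stochastic: the same action can produce different outcomes,
    # so observe the result instead of assuming success.
    for attempt in range(retries):
        try:
            result = call_api(payload)
            if is_valid(result):
                return result
        except Exception:
            pass  # a buggy API is part of the environment, not a crash
        # Dynamic: the world may have changed; back off, then try again.
        time.sleep(backoff * 2 ** attempt)
    raise RuntimeError("action failed after retries; replan from fresh percepts")
```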
Why This Matters
This evolution is not just a technical milestone—it redefines what LLMs are. We’re no longer building static prediction engines; we’re building systems that observe, reason, plan, and act. They can write code, test it, debug it, and deploy it. They can search for papers, extract insights, and write reports. And increasingly, they do this autonomously.

By adopting the agent lens, we can better design architectures, evaluate performance, and reason about capabilities. It’s no longer just about token prediction accuracy—it’s about whether an agent can achieve a goal in a messy, partially observable world.