What’s Emerging in AI: Autonomous Multi-Agents and Large Action/Agentic Models (LAMs)

Rupali Patil · Published in Towards AI · 8 min read · Mar 18, 2024


What a thrilling time (or innovation wars) we have been witnessing in AI! Every day, there seems to be a new innovation pushing the boundaries of what machines can do. One of the most exciting areas of development has been the field of Large Language Models (LLMs). These models have been taking center stage, impressing us with their ability to generate human-like text and understand complex language patterns.

But what’s next for LLMs?

Well, one exciting possibility is the integration of LLMs into autonomous agents. Imagine a world where intelligent machines can communicate with us in natural language, understand our needs, and make decisions based on their own knowledge and rules. Yes, that’s the next big thing in AI — LLM-powered Autonomous Agents!

But where do these agents come from? Let's begin by tracing their evolution.

Evolution of Autonomous Agents

An agent is any physical or digital entity capable of perceiving its environment and taking corresponding action. The technological evolution of AI agents has undergone several stages with a steady rise in their abilities:

  • Symbolic or rule-based agents: encode knowledge in logical rules and symbolic representations that support reasoning.
  • Reactive or reflex agents: operate on a sense-act loop, focusing on perceiving and reacting to the environment.
  • Reinforcement learning-based agents: learn by trial and error, interacting with an environment and receiving rewards for desired actions and penalties for undesired ones. Successful domains include gaming, robotics, and self-driving vehicles.
  • Transfer learning and meta-learning agents: transfer learning eases training on new tasks by sharing and migrating knowledge, improving learning efficiency, performance, and generalization; meta-learning enables quick learning from only a small number of samples.
  • LLM-based agents: fueled by LLMs' emergent capabilities (abilities that arise within the model without being explicitly programmed or trained) and linguistic abilities (comprehension, NLP, NLU, sentiment analysis, text generation), these agents show exceptional perception, reasoning, and action abilities through techniques like problem decomposition and Chain-of-Thought (CoT) prompting.

A profound trait of learning agents is their ability to observe, learn, and improve their behavior based on experience, past decision-making, and the results of their actions. It's like self-reflection!

From LLM Constraints to Opportunities

LLMs are trained on massive datasets of text, images, and code from the internet, books, enterprise data, and so on. They are knowledge powerhouses. We all know that! But they do have a few limitations:

  1. Statelessness: LLMs are stateless, meaning they cannot maintain information about previous interactions or make decisions based on past events. This limits their ability to engage in meaningful conversations or tasks requiring contextual awareness (a minimal sketch of the usual workaround follows this list).
  2. No real-time knowledge: LLMs are trained on a fixed dataset. They cannot access new information after training, incorporate it, or adapt to changing circumstances in real time.
  3. No access to tools: LLMs cannot interact directly with tools such as APIs or software applications, which limits their ability to reason, plan, and act.
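
To see the statelessness point concretely, here is a minimal Python sketch of the usual workaround: the application keeps the conversation history and resends it on every call, because the model itself remembers nothing between calls. It assumes the OpenAI Python SDK and an API key in the environment; the model name is an assumption, and any chat-completion API works the same way.

```python
# Minimal sketch: an LLM is stateless, so the caller replays the history.
# Assumes `pip install openai` and OPENAI_API_KEY set; model name is assumed.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful agent."}]

def chat(user_message: str) -> str:
    """Append the user turn, call the model with the full history,
    and store the reply so later turns keep their context."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model name
        messages=history,      # the model itself remembers nothing between calls
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Po."))
print(chat("What is my name?"))  # answered correctly only because we resent the history
```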

Autonomous agents require access to memory, real-time knowledge bases, and tools to operate in the real world.

Here lies an opportunity to level them up…

Remember what Master Shifu says: “If you only do what you can do, you will never be more than who you are.”

Let’s see what an agent can do when powered by these forces, as illustrated in the image below:

Conceptual illustration of an LLM-based agent in action (inspired by the references shared below)

  • Memory: For an autonomous agent to perform effectively in multi-step actions, memory provides context and lets the agent learn from past experiences and apply them to the situation at hand. Short-term memory is achieved by merging the input text with contextually pertinent data for the ongoing task, bounded by the LLM's context length. Long-term memory stores and manages large volumes of knowledge, observed data, and historical records in vectors, graphs, relational databases, files, and folders. Memory retrieval commonly relies on Retrieval-Augmented Generation (RAG), a popular technique for enhancing the accuracy and reliability of LLMs with knowledge retrieved from external sources at inference time (see the retrieval sketch after this list).
  • Knowledge base: For genuine autonomy, they need access to external, real-time knowledge bases like knowledge graphs and databases to continuously learn, reason, and make decisions with up-to-date facts.
  • Tools: LLM agents might need interfaces to connect with sensors, IoT devices, actuators, search engines, and websites. Tools can expand the action space of LLM-based agents, providing access to various external resources and diversifying the modalities of agent actions.
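
Putting the memory bullet into code, here is a hedged sketch of the retrieval step: long-term memories are stored as plain text, the entries most relevant to the current task are retrieved, and only those are packed into the length-limited prompt. TF-IDF similarity stands in for a real embedding model and vector database, and the memory contents are invented for illustration.

```python
# Minimal RAG-style memory retrieval: rank stored memories against the task
# and prepend only the most relevant ones to the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

long_term_memory = [
    "The user prefers vegetarian restaurants.",
    "Last week the agent booked a taxi to the airport at 6 am.",
    "The user's calendar is busiest on Mondays.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k memories most similar to the query (TF-IDF as a stand-in embedding)."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(long_term_memory + [query])
    scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [long_term_memory[i] for i in top]

task = "Book dinner for the user on Friday."
context = "\n".join(retrieve(task))
prompt = f"Relevant memories:\n{context}\n\nTask: {task}"
print(prompt)  # this augmented prompt is what would be sent to the LLM
```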

The characteristics of power-packed agents

  • Autonomy: operate independently without the direct intervention of humans or others.
  • Goal-oriented: pursue a set of goals, with actions directed towards achieving them.
  • Intelligence: reason, plan, learn, and use knowledge to achieve goals.
  • Flexibility: handle various tasks and situations, not just a single task.
  • Adaptivity: learn from their experiences and adapt to new situations and environments.
  • Proactiveness: take the initiative to achieve their goals.
  • Reactiveness: perceive the environment and respond to changes promptly.
  • Mobility: move physically from one place to another in their environment.
  • Social Ability: interact with other agents and humans.
  • Temporal Continuity: act not as one-shot decision makers but as continuous learners, building on their perceptions and actions.

From Single Agent to Multi-Agent System (MAS)

A multi-agent system (MAS) is composed of multiple interacting intelligent agents. With each agent contributing specific domain expertise, a multi-agent system can serve a larger ecosystem spanning multiple domains.

An example of agents from Kung Fu Panda on a complex task

Let’s look at a hypothetical example to understand an ecosystem of multiple agents taking on a complex task, breaking it down into smaller tasks, and carrying them out sequentially or in parallel.

Multiple intelligent agents working to achieve common goal — illustration with Kung Fu Panda
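
To make the decomposition concrete, here is a hedged, framework-free Python sketch of the same pattern: a planner splits the goal into subtasks and routes each one to the specialist agent whose skill matches. The agent names, skills, and planner logic are invented for illustration; a real system would back each agent with an LLM and tools.

```python
# Toy multi-agent orchestration: plan subtasks, then dispatch each to a specialist.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    skill: str

    def run(self, subtask: str) -> str:
        # A real agent would call an LLM and/or tools here.
        return f"{self.name} ({self.skill}) completed: {subtask}"

def plan(goal: str) -> list[tuple[str, str]]:
    """Toy planner: map a goal to (required skill, subtask) pairs."""
    return [
        ("research", f"gather background for '{goal}'"),
        ("writing", f"draft a plan for '{goal}'"),
        ("review", f"check the plan for '{goal}'"),
    ]

team = [Agent("Po", "writing"), Agent("Tigress", "research"), Agent("Shifu", "review")]

goal = "organize the Dragon Warrior festival"
for skill, subtask in plan(goal):                        # executed sequentially here;
    worker = next(a for a in team if a.skill == skill)   # independent subtasks could run in parallel
    print(worker.run(subtask))
```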

Transcending towards Large Action Models (LAMs) and Large Agentic Models (LAMs)

Large Action Model (LAM)

A Large Action Model is an AI model that focuses on taking actions in the real world based on its understanding of user intent and context, or even on observing user actions. It is designed to be flexible and adaptable to different situations. The Large Action Model provides a framework for an autonomous agent to select and execute actions based on its current state and the desired outcome.

Examples: calling a taxi, ordering food, booking appointments or tickets
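
As a rough illustration of that select-and-execute loop, here is a minimal Python sketch; the intent classifier and the action functions are placeholders invented for this example, whereas a real LAM would learn them from observed user behavior.

```python
# Minimal intent-to-action loop: infer intent, pick a matching action, execute it.
def order_food(request: str) -> str:
    return f"Food ordered: {request}"

def call_taxi(request: str) -> str:
    return f"Taxi booked for: {request}"

ACTIONS = {"order_food": order_food, "call_taxi": call_taxi}

def infer_intent(utterance: str) -> str:
    """Placeholder intent detection; a LAM would use a learned model."""
    return "call_taxi" if "taxi" in utterance.lower() else "order_food"

def act(utterance: str) -> str:
    intent = infer_intent(utterance)
    return ACTIONS[intent](utterance)

print(act("Get me a taxi to the airport at 6 am"))
print(act("A margherita pizza, please"))
```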

Large Agentic Model (LAM)

Large Agentic Models encompass models that exhibit agency, acting independently within their environment. Such models learn from interacting with the real world, exhibit capabilities like planning and decision-making, and take action. The large agentic model provides a framework for autonomous agents to interact with each other and the environment, and to adapt their behavior based on feedback and learning.

Examples: agents that can plan, make decisions, and act autonomously, interacting with the environment and even with other agents

Differences between Large Language Model, Large Action Model, and Large Agentic Model
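
In brief, drawing on the definitions above:

  • Large Language Model (LLM): understands and generates natural language, but on its own is stateless, limited to its training data, and has no direct access to tools.
  • Large Action Model (LAM): translates user intent and context into real-world actions, such as calling a taxi or ordering food.
  • Large Agentic Model (LAM): acts autonomously within its environment; it plans, makes decisions, and adapts its behavior based on feedback, interaction, and collaboration with other agents.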

Industry Research and Development

  • Rabbit R1: The Rabbit R1 is a small AI-powered device that can complete tasks like ordering takeout, calling a taxi, or playing music without needing to open individual apps. It is built on Rabbit's proprietary Large Action Model, powered by neuro-symbolic programming, which learns how humans interact with interfaces, mimics the pattern, and automates the process going forward.
  • SuperAGI: SuperAGI focuses on developing Large Agentic Models (LAMs) designed to power autonomous AI agents. Their agentic AI, where models actively take actions within an environment to achieve goals, goes beyond traditional AI models built for tasks like classification or generation. Their open-source Small Agentic Model (SAM) project is an interesting development: lightweight, compact models that nonetheless perform strongly on reasoning benchmarks and task efficiency.
  • Microsoft AutoGen: Microsoft AutoGen is an open-source framework, currently under development, designed to enable the next generation of LLM applications through multi-agent collaboration. Based on their paper, AutoGen works by defining conversable agents powered by LLMs, tools, or humans, specifying their roles and interactions, and automating workflow orchestration (a minimal usage sketch appears below).
  • MetaGPT: Another multi-agent system, MetaGPT, stands out with its structured meta-programming techniques for manipulating, analyzing, and transforming code in real time. It applies Standardized Operating Procedures (SOPs) to LLM-based multi-agent systems, fostering effective collaboration and task decomposition in complex, real-world applications.

These are just a few examples, and research in both multi-agent systems and LAMs is evolving rapidly. Companies and research institutions around the world are actively exploring the potential of these technologies in fields ranging from healthcare and defense to finance, robotics, and autonomous vehicles.
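
As a concrete taste of the multi-agent frameworks above, here is a hedged sketch following AutoGen's published conversable-agent examples; parameter names may differ across versions, and the model name and task are assumptions.

```python
# Hedged sketch of AutoGen's conversable-agent pattern (`pip install pyautogen`).
import autogen

config_list = [{"model": "gpt-4o-mini", "api_key": "YOUR_API_KEY"}]  # assumed model/key

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",                                   # fully automated loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The user proxy starts the conversation; the assistant plans and writes code,
# the proxy executes it and feeds results back until the task is done.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA and TSLA stock prices for the last month.",
)
```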

Ethical Implications of Autonomous Agents and LAMs

There is no doubt that autonomous agents are powerful, but like any other AI system, they raise ethical concerns that need to be considered, including bias and stereotypes arising from their knowledge bases, transparency and explainability of their decisions, accountability and responsibility for their actions, and privacy and security.

It is important to ensure appropriate human oversight of autonomous agents and LAMs, particularly in high-stakes decisions. This can ensure that they are used responsibly and ethically and that their decisions align with human values.

Sustainability and Environmental Impact

Autonomous agents and LAMs can be computationally intensive, requiring significant energy to train and operate. It is important to consider energy efficiency in their design and implementation, using techniques such as model compression, pruning, and quantization to reduce computational requirements. On a similar note, it’s crucial to consider resource utilization, carbon footprint, and environmental impact, ensuring that they are used in a responsible and sustainable manner.

Wrap up

Autonomous agents and LAMs are still early in research and development, and their full potential is yet to be discovered. However, they offer a promising shift in how computers interact with humans and stay aware of their circumstances. AI agents are now acknowledged as a pivotal stride towards Artificial General Intelligence (AGI).

References

Cheng, Y., et al. (2024). Exploring large language model based intelligent agents: Definitions, methods, and prospects. [arXiv preprint arXiv:2401.03428]. Retrieved from https://arxiv.org/pdf/2401.03428v1.pdf

Xi, Z., et al. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. [arXiv preprint arXiv:2309.07864]. Retrieved from https://arxiv.org/pdf/2309.07864.pdf
