24-07-11-OPEA-Agent¶

Agent

Author¶

xuechendi

Status¶

v0.1 team sharing completed(07/10/24)

Objective¶

This RFC introduces a new concept of an “Hierarchical Agent,” which includes two parts.

‘Agent’: Agent refers to a framework that integrates the reasoning capabilities of large language models (LLMs) with the ability to take actionable steps, creating a more sophisticated system that can understand and process information, evaluate situations, take appropriate actions, communicate responses, and track ongoing situations, finally output with result meeting defined goals.

Single Agent Example:

behind the scene

‘Multi Agent’ system: Multi Agents refer to a design that leveraging a Hierarchical Agent Teams to complete sub-tasks through individual agent working groups. Benefits of multi-agents’ design: (1) Grouping tools/responsibilities can give better results. An agent is more likely to succeed on a focused task than if it must select from dozens of tools. (2) Each agent will have their own assets including prompt, llm model, planning strategy and toolsets. (3) User can easily use yaml files or few lines of python to build a ‘Hierarchical Multi Agent’ megaservice by cherry-picking ready-to-use individual agents. (4) For small tasks which can be perfectly performed by single Agent, user can directly use ‘Agent’ microservice with simple/easy resource management.

Multi Agent example:

curl ${ip_addr}:${SUPERVISOR_AGENT_PORT}/v1/chat/completions -X POST \
-d "{'input': 'Generate a Analyst Stock Recommendations by taking an average of all analyst recommendations and classifying them as Strong Buy, Buy, Hold, Underperform or Sell.'}"

Motivation¶

This RFC aims to provide low-code / no-code agents as new microservice / megaservice for Enterprise users who are looking for using their own tools with LLM. Tools includes domain_specific_search, knowledgebase_retrieval, enterprise_servic_api_authorization_required, proprietary_tools, etc.

Persona¶

We use the listed terms to define different persona mentioned in this document.

OPEA developer: OPEA developers describe who will follow current OPEA API SPEC or expand OPEA API SPEC to add new solutions. OPEA developers are expected to use this RFC to understand how this microservice communicates with other microservices and chained in megaflow. OPEA developer develops OPEA agent codes and add new Agent Implementation by extending current Agent library with advanced agent strategies.
Enterprise User (Devops): Devops describe who will follow OPEA yaml configuration format to update settings according to their real need, or tune some of the configuration to get better performance, who will also use their updated configuration to launch all microservices and get functional endpoint and API calling. Devops are expected to use this RFC to understand the keywords, how these keywords works and rules of using this microservice. Devops are expected to follow customer tool template to provide their own tools and register to Agent microservice.
End user: End user describe who writes application which will use OPEA exposed endpoints and API to fulfill task goals. End users are expected to use this RFC to understand API keywords and rules.

Design Proposal¶

Execution Plan¶

v0.8 (PR ready or merge to opea - agent branch)

Agent component v0.1
- Support chat-completion API
Agent example - Insight Assistant v0.1 (IT demo)
- hierarchical multi agents
- includes: research(rag, data_crawler); writer(format); reviewer(rule)
Agent debug system

V0.9

Agent component v0.1
- Support assistants API
- K8s helm chart
Agent Example - Insight Assistant v0.1
- Shared demo with IT
- Establish IT collaboration effort

V1.0

Performance benchmark
Scaling
Concurrency

Part 1. API SPEC¶

Provide two types of API for different client application.

openAI chat completion API.

Reference: https://platform.openai.com/docs/api-reference/chat/create

Advantage and limitation:

Most common API, should be working with any existing client uses openAI.
will not be able to memorize user historical session, human_in_loop agent will not work using this API.

"/v1/chat/completions": {
                      "model": str,
                      "messages": list,
                      "tools": list,
                  }

openAI assistant API

Reference: https://platform.openai.com/docs/api-reference/assistants

Advantage and limitation:

User can create a session thread memorizing previous conversation as long-term memory. And Human-In-Loop agent will only works use this API.
User client application may need codes change to work with this new API.
openAI assistant API is tagged with ‘beta’, not stable

# assistants API is used to create agent runtime instance with a set of tool / append addition instructions
- "/v1/assistants": {
                        "instructions": str,
                        "name": str,
                        "tools": list
                    }

# threads API is to used maintain conversation session with one user. It can be resumed from previous, can tracking long term memories.
- "/v1/threads/ ": { # empty is allowed }


# threads messages API is to add a task content to thread_1 (the thread created by threads API)
- "/v1/threads/thread_1/messages": {
                            "role": str,
                            "content": str
                        }

# threads run API is to start to execute agent thread using run api

- "/v1/threads/thread_1/runs": {
                            'assistant_id': str,
                            'instructions': str,
                        }

Part 2. ‘Agent’ genAI Component definition¶

‘Agent’ genAI Component is regarded as the resource management unit in “Agent” design. It will be launched as one microservice and can be instantiated as ‘Agent’, ‘Planner’ or ‘Executor’ according to configuration. Tools will be registered to ‘Agent’ microservice during launch or runtime.

SPEC for any agent Role - agent, planner, executor¶

"/v1/chat/completions": {
                        "model": str,
                        "messages": list,
                        "tools": list,
                    }
"/v1/assistants": {
                        "instructions": str,
                        "name": str,
                        "tools": list
                    }
"/v1/threads/: {}
"/v1/threads/thread_1/runs": {
                            'assistant_id': str,
                            'instructions': str,
                        }
"/v1/threads/thread_1/messages": {
                            "role": str,
                            "content": str
                        }

Agent Role microservice definition - ‘Agent’:¶

A complete implementation of Agent, which contains LLM endpoint as planner, strategy algorithm for plan execution, Tools, and database handler to keep track of historical state and conversation.

configuration:

strategy: choices([react, planexec, humanInLoopPlanExec])
require_human_feedback: bool
llm_endpoint_url: str
llm_engine: choices([tgi, vllm, openai])
llm_model_id: str
recursion_limit: int
tools: file_path or dict

# Tools definition
[tool_name]:
  description: str
  callable_api: choices([http://xxxx, xxx.py:func_name])
  env: str
  pip_dependencies: str # sep by ,
  args_schema:
    query:
      type: choices([int, str, bool])
      description: str
  return_output: str

Agent Role microservice definition - ‘Planner’:¶

Agent without tools. Planner only contains LLM endpoints as planner, certain strategies to complete an optimized plan.

configuration:

strategy: choices([react, planexec, humanInLoopPlanExec])
require_human_feedback: bool
llm_endpoint_url: str
llm_engine: choices([tgi, vllm, openai])
llm_model_id: str
recursion_limit: int
require_human_feedback: bool

Agent Role microservice definition - ‘Executor’:¶

Tools executors. Executor is used to process input with registered tools.

Configuration:

[tool_name]:
description: str
callable_api: choices([http://xxxx, xxx.py:func_name])
env: str
pip_dependencies: str # sep by ,
args_schema:
  query:
    type: choices([int, str, bool])
    description: str
return_output: str

Any microservcice follow this spec can be registered as role in Part3-graph-based

Part3. ‘Multi Agent’ system overview¶

We planned to provide multi-agent system in two phases.

Phase I: Hierarchical Multi Agents
1. In this design, only top-layer Agent will be exposed to OPEA mega flow. And only ‘Agent’ microservice will be used to compose Hierarchical Multi Agents system.
2. Users are only allowed to use yaml files to provide tools configuration, high-level instructions text and hierarchical relationship between agents.
3. This design simplifies the agent configuration, using simple yaml definition can still be used to compose a multi agent system to handle complex tasks.
Detailed configuration please refer to Part3.1
Phase II: Graph-Based Multi Agent
1. In this design, we provide user a new SDK to compose a graph-based multi agents system with conditional edge to define all strategic rules.
2. Enterprise user will be able to use python code to wrap either ‘agent’, ‘planner’ or tools as ‘Role’ and add conditional edges between them for complex task agent design.
3. This design provides user enough flexibility to handle very complex tasks and also provide flexibility to handle resource management when certain tools are running way slower than others.
Detailed configuration please refer to Part3.2

Part3.1 Hierarchical Multi Agents¶

Example 1: ‘Single Agent megaservice’ Only 1 agent is presented in this configuration.

3 tools are registered to this agent through custom_tools.yaml

Example 2: ‘Hierarchical Multi Agents’ 3 agents are presented in this configuration, 1st layer supervisor agent is the gateway to interact with user, and 1st layer agent will manage 2nd layer worker agents.

Users are expected to register 2nd layer workerAgents to 1st layer supervisor agent through supervisor_agent_custom_tools.yaml file.

User can follow this way to add more layers:

Example 3: ‘Multi Steps Agent megaservice’:

User can also chain agent into a multi-step mega service. audioAgent_megaservice.yaml

Part3.2 Graph-Based Multi Agent¶

In Phase II, we propose to provide a graph-based multi agents system, which enterprise user will be able to define edges and conditional edges between agent nodes, planner nodes and tools for complex task agent design.

The user can build and launch the graph-based message group by the combination of docker image and yaml file:

The yaml file contains the basic config information for each single “Role” in the agent architecture. The user can build a MessageGroup to define the link connection information and the data flow via “edges” and “conditional_edges”. The “edges” mean the output of the head_node is the input of the tail_node. The “conditional_edges” means there is a decision-making among the candidate tail_nodes based on the output of the head_node. The logic of this selection part is defined by the state component “Should_Continue”.

Appending agents/roles in MessageGroup. Define the role class define the action of the role  add edges  recompile the messagegroup

Part 4. Agent Debug System¶

TBD

Part 5. Benchmark¶

TBD