24-07-11-OPEA-Agent

Agent

Author

xuechendi

Status

v0.1 team sharing completed(07/10/24)

Objective

This RFC introduces a new concept of an “Hierarchical Agent,” which includes two parts.

  • ‘Agent’: Agent refers to a framework that integrates the reasoning capabilities of large language models (LLMs) with the ability to take actionable steps, creating a more sophisticated system that can understand and process information, evaluate situations, take appropriate actions, communicate responses, and track ongoing situations, finally output with result meeting defined goals.

Single Agent Example:

image

behind the scene

image

  • ‘Multi Agent’ system: Multi Agents refer to a design that leveraging a Hierarchical Agent Teams to complete sub-tasks through individual agent working groups. Benefits of multi-agents’ design: (1) Grouping tools/responsibilities can give better results. An agent is more likely to succeed on a focused task than if it must select from dozens of tools. (2) Each agent will have their own assets including prompt, llm model, planning strategy and toolsets. (3) User can easily use yaml files or few lines of python to build a ‘Hierarchical Multi Agent’ megaservice by cherry-picking ready-to-use individual agents. (4) For small tasks which can be perfectly performed by single Agent, user can directly use ‘Agent’ microservice with simple/easy resource management.

Multi Agent example:

curl ${ip_addr}:${SUPERVISOR_AGENT_PORT}/v1/chat/completions -X POST \
-d "{'input': 'Generate a Analyst Stock Recommendations by taking an average of all analyst recommendations and classifying them as Strong Buy, Buy, Hold, Underperform or Sell.'}"

image

Motivation

This RFC aims to provide low-code / no-code agents as new microservice / megaservice for Enterprise users who are looking for using their own tools with LLM. Tools includes domain_specific_search, knowledgebase_retrieval, enterprise_servic_api_authorization_required, proprietary_tools, etc.

Persona

We use the listed terms to define different persona mentioned in this document.

  • OPEA developer: OPEA developers describe who will follow current OPEA API SPEC or expand OPEA API SPEC to add new solutions. OPEA developers are expected to use this RFC to understand how this microservice communicates with other microservices and chained in megaflow. OPEA developer develops OPEA agent codes and add new Agent Implementation by extending current Agent library with advanced agent strategies.

  • Enterprise User (Devops): Devops describe who will follow OPEA yaml configuration format to update settings according to their real need, or tune some of the configuration to get better performance, who will also use their updated configuration to launch all microservices and get functional endpoint and API calling. Devops are expected to use this RFC to understand the keywords, how these keywords works and rules of using this microservice. Devops are expected to follow customer tool template to provide their own tools and register to Agent microservice.

  • End user: End user describe who writes application which will use OPEA exposed endpoints and API to fulfill task goals. End users are expected to use this RFC to understand API keywords and rules.

Design Proposal

Execution Plan

v0.8 (PR ready or merge to opea - agent branch)

  • Agent component v0.1

    • Support chat-completion API

  • Agent example - Insight Assistant v0.1 (IT demo)

    • hierarchical multi agents

    • includes: research(rag, data_crawler); writer(format); reviewer(rule)

  • Agent debug system

V0.9

  • Agent component v0.1

    • Support assistants API

    • K8s helm chart

  • Agent Example - Insight Assistant v0.1

    • Shared demo with IT

    • Establish IT collaboration effort

V1.0

  • Performance benchmark

  • Scaling

  • Concurrency

Part 1. API SPEC

Provide two types of API for different client application.

  1. openAI chat completion API.

Reference: https://platform.openai.com/docs/api-reference/chat/create

Advantage and limitation:

  • Most common API, should be working with any existing client uses openAI.

  • will not be able to memorize user historical session, human_in_loop agent will not work using this API.

"/v1/chat/completions": {
                      "model": str,
                      "messages": list,
                      "tools": list,
                  }
  1. openAI assistant API

Reference: https://platform.openai.com/docs/api-reference/assistants

Advantage and limitation:

  • User can create a session thread memorizing previous conversation as long-term memory. And Human-In-Loop agent will only works use this API.

  • User client application may need codes change to work with this new API.

  • openAI assistant API is tagged with ‘beta’, not stable

# assistants API is used to create agent runtime instance with a set of tool / append addition instructions
- "/v1/assistants": {
                        "instructions": str,
                        "name": str,
                        "tools": list
                    }

# threads API is to used maintain conversation session with one user. It can be resumed from previous, can tracking long term memories.
- "/v1/threads/ ": { # empty is allowed }


# threads messages API is to add a task content to thread_1 (the thread created by threads API)
- "/v1/threads/thread_1/messages": {
                            "role": str,
                            "content": str
                        }

# threads run API is to start to execute agent thread using run api

- "/v1/threads/thread_1/runs": {
                            'assistant_id': str,
                            'instructions': str,
                        }

Part 2. ‘Agent’ genAI Component definition

‘Agent’ genAI Component is regarded as the resource management unit in “Agent” design. It will be launched as one microservice and can be instantiated as ‘Agent’, ‘Planner’ or ‘Executor’ according to configuration. Tools will be registered to ‘Agent’ microservice during launch or runetime.

image

SPEC for any agent Role - agent, planner, executor

"/v1/chat/completions": {
                        "model": str,
                        "messages": list,
                        "tools": list,
                    }
"/v1/assistants": {
                        "instructions": str,
                        "name": str,
                        "tools": list
                    }
"/v1/threads/: {}
"/v1/threads/thread_1/runs": {
                            'assistant_id': str,
                            'instructions': str,
                        }
"/v1/threads/thread_1/messages": {
                            "role": str,
                            "content": str
                        }

Agent Role microservice definition - ‘Agent’:

A complete implementation of Agent, which contains LLM endpoint as planner, strategy algorithm for plan execution, Tools, and database handler to keep track of historical state and conversation.

configuration:

strategy: choices([react, planexec, humanInLoopPlanExec])
require_human_feedback: bool
llm_endpoint_url: str
llm_engine: choices([tgi, vllm, openai])
llm_model_id: str
recursion_limit: int
tools: file_path or dict

# Tools definition
[tool_name]:
  description: str
  callable_api: choices([http://xxxx, xxx.py:func_name])
  env: str
  pip_dependencies: str # sep by ,
  args_schema:
    query:
      type: choices([int, str, bool])
      description: str
  return_output: str

Agent Role microservice definition - ‘Planner’:

Agent without tools. Planner only contains LLM endpoints as planner, certain strategies to complete an optimized plan.

configuration:

strategy: choices([react, planexec, humanInLoopPlanExec])
require_human_feedback: bool
llm_endpoint_url: str
llm_engine: choices([tgi, vllm, openai])
llm_model_id: str
recursion_limit: int
require_human_feedback: bool

Agent Role microservice definition - ‘Executor’:

Tools executors. Executor is used to process input with registered tools.

Configuration:

[tool_name]:
description: str
callable_api: choices([http://xxxx, xxx.py:func_name])
env: str
pip_dependencies: str # sep by ,
args_schema:
  query:
    type: choices([int, str, bool])
    description: str
return_output: str

Any microservcice follow this spec can be registered as role in Part3-graph-based

Part3. ‘Multi Agent’ system overview

We planned to provide multi-agent system in two phases.

  • Phase I: Hierarchical Multi Agents

    1. In this design, only top-layer Agent will be exposed to OPEA mega flow. And only ‘Agent’ microservice will be used to compose Hierarchical Multi Agents system.

    2. Users are only allowed to use yaml files to provide tools configuration, high-level instructions text and hierarchical relationship between agents.

    3. This design simplifies the agent configuration, using simple yaml definition can still be used to compose a multi agent system to handle complex tasks.

    Detailed configuration please refer to Part3.1 image

  • Phase II: Graph-Based Multi Agent

    1. In this design, we provide user a new SDK to compose a graph-based multi agents system with conditional edge to define all strategic rules.

    2. Enterprise user will be able to use python code to wrap either ‘agent’, ‘planner’ or tools as ‘Role’ and add conditional edges between them for complex task agent design.

    3. This design provides user enough flexibility to handle very complex tasks and also provide flexibility to handle resource management when certain tools are running way slower than others.

    Detailed configuration please refer to Part3.2 image

Part3.1 Hierarchical Multi Agents

Example 1: ‘Single Agent megaservice’ Only 1 agent is presented in this configuration. image

3 tools are registered to this agent through custom_tools.yaml image

image image

Example 2: ‘Hierarchical Multi Agents’ 3 agents are presented in this configuration, 1st layer supervisor agent is the gateway to interact with user, and 1st layer agent will manage 2nd layer worker agents.

image

Users are expected to register 2nd layer workerAgents to 1st layer supervisor agent through supervisor_agent_custom_tools.yaml file. image

image image

User can follow this way to add more layers: image

Example 3: ‘Multi Steps Agent megaservice’:

User can also chain agent into a multi-step mega service. audioAgent_megaservice.yaml image

Part3.2 Graph-Based Multi Agent

In Phase II, we propose to provide a graph-based multi agents system, which enterprise user will be able to define edges and conditional edges between agent nodes, planner nodes and tools for complex task agent design.

image

The user can build and launch the graph-based message group by the combination of docker image and yaml file: image

The yaml file contains the basic config information for each single “Role” in the agent architecture. The user can build a MessageGroup to define the link connection information and the data flow via “edges” and “conditional_edges”. The “edges” mean the output of the head_node is the input of the tail_node. The “conditional_edges” means there is a decision-making among the candidate tail_nodes based on the output of the head_node. The logic of this selection part is defined by the state component “Should_Continue”. image

Appending agents/roles in MessageGroup. Define the role class define the action of the role  add edges  recompile the messagegroup image

Part 4. Agent Debug System

TBD

Part 5. Benchmark

TBD