Routing Agent RFC

A dynamic routing agent for optimal model selection and orchestration

Author(s)

Haim Barad, Madison Evans

Status

Proposed

Objective

Create an intelligent routing layer that:

  • Analyzes text-based input queries in real time

  • Selects the optimal model based on criteria such as cost, latency, and capability requirements

  • Supports multiple cloud providers and self-hosted models

Motivation

  • Growing complexity of multi-LLM environments

  • Need for cost-efficient inference without sacrificing quality

  • Lack of standardized orchestration patterns

  • Increasing demand for hybrid cloud/on-prem deployments

Design Proposal

Core Components:

  1. Query Analyzer: Performs semantic understanding and intent classification, with support for several known classifiers (matrix factorization, BERT, etc.)

  2. Routing Engine: Provides dynamic model selection based on query complexity

  3. Monitoring: Real-time metrics collection (latency, cost, accuracy)

  4. Implementation basis: The code is based on RouteLLM, available at https://github.com/lm-sys/RouteLLM
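
The first three components can be sketched together as follows. This is a minimal illustration and not the RouteLLM API: `score_complexity` is a hypothetical keyword-and-length heuristic standing in for a trained classifier (matrix factorization, BERT, etc.), and the model names, the threshold value, and the `Metrics` class are assumptions made for the example.

```python
import time
from dataclasses import dataclass, field

# Hypothetical model names; a real deployment would use its own endpoints.
STRONG_MODEL = "strong-model"
WEAK_MODEL = "weak-model"

# Hypothetical keywords that hint at a harder query.
HARD_HINTS = ("prove", "derive", "optimize", "debug", "step by step")


def score_complexity(query: str) -> float:
    """Query Analyzer stand-in: return a complexity score in [0, 1].

    A trained classifier would replace this heuristic.
    """
    text = query.lower()
    length_score = min(len(text.split()) / 50.0, 1.0)
    hint_score = 1.0 if any(h in text for h in HARD_HINTS) else 0.0
    return 0.5 * length_score + 0.5 * hint_score


@dataclass
class Metrics:
    """Monitoring stand-in: keeps per-request routing records in memory."""
    records: list = field(default_factory=list)

    def log(self, model: str, score: float, overhead_ms: float) -> None:
        self.records.append(
            {"model": model, "score": score, "overhead_ms": overhead_ms}
        )


def route(query: str, metrics: Metrics, threshold: float = 0.4) -> str:
    """Routing Engine: queries scoring at or above the threshold go to the strong model."""
    start = time.perf_counter()
    score = score_complexity(query)
    model = STRONG_MODEL if score >= threshold else WEAK_MODEL
    overhead_ms = (time.perf_counter() - start) * 1000.0
    metrics.log(model, score, overhead_ms)
    return model
```

In this sketch, raising `threshold` sends more traffic to the weaker (cheaper) model, which is the basic cost/quality trade-off the Routing Engine is meant to expose.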

Key Features:

  • Dynamic model selection based on query complexity

  • Two delivery modes: either returns the selected model endpoint so that the developer can call the model directly, or routes the request to the chosen model itself so that the selection is invisible to the developer

  • Cost-aware routing policies
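
The two delivery modes and a simple cost-aware policy might look like the sketch below. Everything here is illustrative: `ModelSpec`, its fields, the capability scale, and the `backends` mapping of model names to callables are hypothetical names introduced for the example, not part of RouteLLM or of this proposal's interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelSpec:
    name: str
    endpoint: str               # hypothetical endpoint URL
    cost_per_1k_tokens: float   # illustrative price
    capability: int             # illustrative scale: higher handles harder queries


def select_cheapest(models: List[ModelSpec], required_capability: int) -> ModelSpec:
    """Cost-aware policy: the cheapest model that meets the capability requirement."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model satisfies the capability requirement")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


def resolve_endpoint(models: List[ModelSpec], required_capability: int) -> str:
    """Mode 1: return the endpoint and let the developer call the model."""
    return select_cheapest(models, required_capability).endpoint


def route_and_call(
    query: str,
    models: List[ModelSpec],
    required_capability: int,
    backends: Dict[str, Callable[[str], str]],
) -> str:
    """Mode 2: the router calls the chosen model, hiding the selection from the caller."""
    chosen = select_cheapest(models, required_capability)
    return backends[chosen.name](query)
```

Keeping the policy (`select_cheapest`) separate from the delivery mode means both modes share one selection path, so a policy change does not need to be made twice.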

Miscellaneous

  • Performance: Target of less than 5 ms of added overhead per request

  • Security: Zero-trust authentication between components

  • Staging Plan:

    1. Phase 1: Basic routing MVP

    2. Phase 2: Advanced analytics dashboard

    3. Phase 3: Auto-scaling integration
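
The per-request overhead target listed under Performance could be checked with a small timing harness along these lines. This is a sketch under stated assumptions: `measure_overhead_ms` and `trivial_router` are hypothetical helpers, and the stand-in router does no real classification, so its timings say nothing about a production classifier.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_overhead_ms(
    route_fn: Callable[[str], str],
    queries: List[str],
    repeats: int = 100,
) -> Dict[str, float]:
    """Time only the routing decision (no model call) and summarize in milliseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            route_fn(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style index for the 99th percentile, clamped to the last sample.
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }


def trivial_router(query: str) -> str:
    """Stand-in router: picks a model by word count only."""
    return "strong-model" if len(query.split()) > 20 else "weak-model"
```

Calling `measure_overhead_ms(trivial_router, ["short query", "another query"])` returns the median, 99th-percentile, and maximum routing time, which can then be compared against the 5 ms budget.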