24-08-20-OPEA-001-AI Gateway API

AI Gateway API

Author

daixiang0, zhixie, gyohuangxin, Forrest-zhao, ruijin-intel

Status

Under Review

Objective

Design the API for the AI Gateway.

Motivation

  • Introduce a gateway to provide mTLS, traffic control, observability, and other cross-cutting features

  • Introduce the AI Gateway API so we can reuse existing gateway solutions rather than implementing our own

Design Proposal

The AI Gateway sits in front of all microservices:

graph TD; A(AI Gateway)-->Retrieval; A-->Rerank; A-->LLM; A-->Guardrails; A-->B(Any microservice);

API overall

To make the most of existing resources, we choose to follow the Kubernetes Gateway API, since it is the gateway API standard that mainstream gateways support.

Since the AI-specific features of the Kubernetes Gateway API are still under discussion, we design the AI Gateway API to include the following two parts:

  • Kubernetes Gateway API for the features it already supports (see the sketch after this list)

  • Extension API for all other features
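
For the first part, standard Kubernetes Gateway API resources are used as-is. A minimal sketch, assuming an envoy GatewayClass and an illustrative llm-svc backend (both names are assumptions for this example, not part of the proposal):

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: opea-gateway
spec:
  gatewayClassName: envoy
  listeners:
  - name: http          # plain HTTP listener; mTLS would add TLS config here
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: opea-gateway  # attach this route to the gateway above
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/chat/completions
    backendRefs:
    - name: llm-svc     # the LLM microservice behind the gateway
      port: 8080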

API workflow

graph LR; A(Config using AI Gateway API)-->B(Convert to specific gateway API)

The AI Gateway is not a brand-new gateway implementation; it does only one thing: convert AI Gateway API configuration into the configuration of the specific gateway.
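
As a sketch of what the conversion could produce for gatewayClassName: envoy, each extension's config block is spliced into the generated Envoy HTTP filter chain ahead of the router. Only the http_filters placement reflects this proposal; the listener, route, and cluster details below are illustrative assumptions:

static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: llm-svc }
          http_filters:
          # Each Extension API `config` entry lands here, verbatim.
          - name: envoy.filters.http.guardrails
            typed_config:
              # guardrails config abbreviated; see the full Extension API example below
              "@type": type.googleapis.com/envoy.extensions.filters.http.guardrails.v3.Guardrails
              source: RESPONSE
              action: ALLOW
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: llm-svc
    type: STRICT_DNS
    load_assignment:
      cluster_name: llm-svc
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: llm-svc, port_value: 8080 }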

Extension API

apiVersion: extension.gateway.opea.dev/v1
kind: Gateway
metadata:
  name: extension-example
spec:
  gatewayClassName: envoy
  extensions:
  - name: extension-1
    config:
      extension-1-config: aaa
  - name: extension-2
    config:
      extension-2-config: bbb

  • gatewayClassName: the specific gateway implementation to target (for example, envoy)

  • name: the name of the extension feature; multiple extensions are supported

  • config: the content of the extension configuration, following the API of the specified gateway

Extension API example

apiVersion: extension.gateway.opea.dev/v1
kind: Gateway
metadata:
  name: envoy-extension-example
spec:
  gatewayClassName: envoy
  extensions:
  - name: guardrails
    config:
      name: envoy.filters.http.guardrails
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.guardrails.v3.Guardrails
        inference:
          runtime: envoy.inference_runtime.openvino
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.inference_runtime.openvino.v3.OpenvinoConfig
            backend: CPU
            plugins:
            - /usr/lib/libopenvino_tokenizers.so
          model_path: /home/zhihao/envoy/.project/openvino/models/OTIS-Official-Spam-Model.xml
        source: RESPONSE
        action: ALLOW

Guardrails are an AI-specific feature. Here we use the Extension API to configure Envoy to run inference on CPU with the specified model and perform response checking.

The config field follows the Envoy API.
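
Since spec.extensions is a list, multiple features can be attached to one gateway. A hedged sketch combining the guardrails filter above with Envoy's local rate limit filter standing in for a token-ratelimit extension (the bucket sizes and runtime keys are illustrative assumptions):

apiVersion: extension.gateway.opea.dev/v1
kind: Gateway
metadata:
  name: envoy-multi-extension-example
spec:
  gatewayClassName: envoy
  extensions:
  - name: guardrails
    config:
      name: envoy.filters.http.guardrails
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.guardrails.v3.Guardrails
        inference:
          runtime: envoy.inference_runtime.openvino
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.inference_runtime.openvino.v3.OpenvinoConfig
            backend: CPU
            plugins:
            - /usr/lib/libopenvino_tokenizers.so
          model_path: /home/zhihao/envoy/.project/openvino/models/OTIS-Official-Spam-Model.xml
        source: RESPONSE
        action: ALLOW
  - name: token-ratelimit
    config:
      name: envoy.filters.http.local_ratelimit
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: token_ratelimit
        token_bucket:
          max_tokens: 1000      # burst capacity
          tokens_per_fill: 100  # tokens restored per interval
          fill_interval: 1s
        filter_enabled:         # enable for 100% of requests
          runtime_key: token_ratelimit_enabled
          default_value: { numerator: 100, denominator: HUNDRED }
        filter_enforced:        # enforce (not shadow mode) for 100% of requests
          runtime_key: token_ratelimit_enforced
          default_value: { numerator: 100, denominator: HUNDRED }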