# 24-08-20-OPEA-001-AI Gateway API

AI Gateway API

## Author

[daixiang0](https://github.com/daixiang0), [zhixie](https://github.com/zhxie), [gyohuangxin](https://github.com/gyohuangxin), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)

## Status

Under Review

## Objective

Design the API for AI Gateway.

## Motivation

- Introduce gateway to do mTLS, traffic control, observability and so on
- Introduce AI Gateway API to use existing gateway sloutions rather than implement our own one.

## Design Proposal

The AI gateway is at the front of all microservices:

```mermaid
graph TD;
    A(AI Gateway)-->Retrival;
    A-->Rerank;
    A-->LLM;
    A-->Guardrails;
    A-->B(Any microservice);
```

### API overall

To make the most of current resources, we choose to follow [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) since it is the gateway API standard that all gateways support.

Since AI specific features of Kubernetes Gateway API are still [under discussion](https://docs.google.com/document/d/1FQN_hGhTNeoTgV5Jj16ialzaSiAxC0ozxH1D9ngCVew/edit), We design AI Gateway API including following two parts:

- **Kubernetes Gateway API** for features it already supports
- **Extension API for** all other features

### API workflow

```mermaid
graph LR;
  A(Config using AI Gateway API)-->B(Convert to specific gateway API)
```

AI Gateway is not a brand-new gateway implementation, only does one thing: Convert.

### Extension API

```yaml
apiVersion: extension.gateway.opea.dev/v1
kind: Gateway
metadata:
  name: extension-exmaple
spec:
  gatewayClassName: envoy
  extensions:
  - name: extension-1
    config:
      extension-1-config: aaa
  - name: extension-2
    config:
      extension-2-config: bbb
```

- gatewayClassName: specific gateway implement
- name: the name of extension feature, support multiple extensions
- config: the content of extension config, following specified gateway API

### Extension API example

```yaml

apiVersion: extension.gateway.opea.dev/v1
kind: Gateway
metadata:
  name: envoy-extension-exmaple
spec:
  gatewayClassName: envoy
  extensions:
  - name: token-ratelimit
    config:
      name: envoy.filters.http.guardrails
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.guardrails.v3.Guardrails
        inference:
          runtime: envoy.inference_runtime.openvino
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.inference_runtime.openvino.v3.OpenvinoConfig
            backend: CPU
            plugins:
            - /usr/lib/libopenvino_tokenizers.so
          model_path: /home/zhihao/envoy/.project/openvino/models/OTIS-Official-Spam-Model.xml
        source: RESPONSE
        action: ALLOW
```

**Guardrail** is AI specific feature, here we use Extension API to config Envoy to use CPU to inference with specified model to do response check.

The config field follows the Envoy API.