Benchmarks for agentic applications¶

We collected two benchmarks for evaluating agentic applications:

CRAG (Comprehensive RAG) benchmark for RAG agents
TAG-Bench for SQL agents

These agent benchmarks are enabled on Intel Gaudi systems using vllm as the LLM serving framework. You can choose to serve the models on other hardware with vllm too.

We will add more benchmarks for agents in the future. Stay tuned.