Evaluating GenAI¶
GenAIEval provides evaluation, benchmarking, and scorecards targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination.
We’re building this documentation from content in the GenAIEval GitHub repository.
- GenAIEval
- Legal Information
- Kubernetes Platform Optimization with Resource Management
- OPEA Benchmark Tool
- Auto-Tuning for ChatQnA: Optimizing Resource Allocation in Kubernetes
- Usage
- Auto-Tuning for ChatQnA: Optimizing Accuracy by Tuning Model Related Parameters
- Set up Prometheus and Grafana to visualize microservice metrics
- StressCli
- Locust scripts for OPEA ChatQnA
- HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly
- CRAG Benchmark for Agent QnA systems
- AutoRAG to evaluate the RAG system performance
- 🚀 QuickStart
- Evaluation Methodology
- Metric Card for BLEU
- RAGAAF (RAG assessment - Annotation Free)
- OPEA adaptation of ragas (LLM-as-a-judge evaluation of Retrieval Augmented Generation)