Evaluation of Large Language Models
Learn how to evaluate large language models for accuracy, safety, alignment, and performance using human and automated metrics to ensure reliable, ethical, and high-quality AI systems.
Overview
Rationale for LLM and Agent Evaluation
Components of LLM Evaluation
Tasks and Benchmark Datasets for Evaluation
Challenges in LLM Evaluation
Quiz: LLM Evaluation Fundamentals
Classic and Contextual Embedding Approaches
BLEU, ROUGE, and BERTScore
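As a quick preview of the n-gram metrics covered in this lesson, the sketch below scores one candidate sentence against one reference with sentence-level BLEU and ROUGE. It assumes the `nltk` and `rouge-score` packages are installed; the sentences are illustrative only.

```python
# Minimal sketch of classic n-gram overlap metrics (assumes nltk and rouge-score are installed).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
candidate = "the cat is sitting on the mat"

# Sentence-level BLEU with smoothing, since short texts often have no higher-order n-gram matches.
bleu = sentence_bleu(
    [reference.split()],
    candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-1 and ROUGE-L between the reference and the candidate.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```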
Evaluating RAG-Based Applications
Faithfulness
Answer Relevancy
Context Precision
Context Recall
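As a rough summary of how RAGAS defines these four metrics (paraphrased from the RAGAS documentation; exact formulations vary across versions):

```latex
\text{Faithfulness} = \frac{\text{claims in the answer supported by the retrieved context}}{\text{total claims in the answer}}

\text{Answer Relevancy} = \frac{1}{N}\sum_{i=1}^{N}\cos\big(E(q_i),\, E(q)\big),
\quad q_i:\ \text{questions generated from the answer},\ q:\ \text{the original question}

\text{Context Precision@}K = \frac{\sum_{k=1}^{K}\text{Precision@}k \cdot v_k}{\text{number of relevant chunks in the top }K},
\quad v_k \in \{0,1\}\ \text{indicates whether chunk }k\ \text{is relevant}

\text{Context Recall} = \frac{\text{ground-truth sentences attributable to the retrieved context}}{\text{total sentences in the ground truth}}
```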
Evaluation of RAG Applications Using RAGAS
Evaluating a RAG Application Using RAGAS
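The RAGAS lessons can be previewed with a short script. A minimal sketch, assuming the `ragas` and `datasets` packages are installed and an `OPENAI_API_KEY` is set (RAGAS calls an LLM and an embedding model behind the scenes); column names such as `ground_truth` differ across RAGAS versions, and the sample data below is invented:

```python
# Minimal RAGAS evaluation sketch (column names and API details vary by ragas version).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

data = {
    "question": ["Who developed the theory of relativity?"],
    "answer": ["Albert Einstein developed the theory of relativity."],
    "contexts": [["Albert Einstein published the special theory of relativity in 1905."]],
    "ground_truth": ["Albert Einstein developed the theory of relativity."],
}

dataset = Dataset.from_dict(data)

# Runs LLM- and embedding-based judgments for each metric; requires OPENAI_API_KEY by default.
result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)
```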
LLM-as-a-Judge Evaluation
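LLM-as-a-judge evaluation asks a strong model to grade another model's output against a rubric. A minimal sketch, assuming the `openai` Python SDK with an `OPENAI_API_KEY` in the environment; the judge model name and rubric below are placeholders, not the course's exact setup:

```python
# Minimal LLM-as-a-judge sketch using the OpenAI chat completions API.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are an impartial judge. Rate the answer to the question on a 1-5 scale for "
    "correctness and helpfulness. Respond with JSON: {\"score\": <int>, \"reason\": <string>}."
)

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    """Ask a judge model to grade an answer and return its parsed JSON verdict."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

print(judge("What is the capital of France?", "Paris is the capital of France."))
```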
Classic Evaluation Metrics
Semantic Similarity With BERTScore
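The semantic-similarity lesson can be reproduced with the `bert-score` package. A small sketch (the sentence pair is illustrative; the default English model is downloaded on first use):

```python
# Minimal BERTScore sketch (assumes the bert-score package is installed).
from bert_score import score

candidates = ["The weather is freezing today."]
references = ["It is very cold outside today."]

# Returns precision, recall, and F1 tensors with one entry per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```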
Slide Deck
OpenAI API Key Setup
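The hands-on lessons that call OpenAI models expect an API key to be available before anything runs. One common pattern, shown as a Python check (the shell export in the comment is the usual way to set the key; the key value is a placeholder):

```python
# Verify the OpenAI API key is configured before running the evaluation notebooks.
import os
from openai import OpenAI

# Typically exported beforehand in the shell, e.g.  export OPENAI_API_KEY="sk-..."
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before running the notebooks."

client = OpenAI()                       # picks up OPENAI_API_KEY automatically
print(client.models.list().data[0].id)  # quick connectivity check
```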