Evaluation of Large Language Models
Master the art of evaluating language models beyond accuracy. Gain insight into benchmarks and hands-on evaluation practices that build trust in model performance.
Course Overview
Learning Objectives
Introduction
Necessity of Evaluation
Evaluation Contexts and Challenges
Quiz
Introduction
Benchmark Datasets
Why Does Benchmarking Fail?
Quiz
Introduction
Traditional and Embedding-Based Metrics
RAGAs
Faithfulness
Answer Relevancy
Context Precision
Context Recall
Quiz
In-Class Exercises
Supplementary Exercise
Graded Quiz