Key Insights You'll Gain
Understand why LLMs’ capabilities in reasoning, math, and coding emerged later than their language abilities
Learn what Reinforcement Learning from Human Feedback (RLHF) is and why it’s essential for fine-tuning models
Dive into Group Relative Policy Optimization (GRPO), a popular method for optimizing models on reasoning tasks
Explore practical applications of RLHF and GRPO to make LLMs more reliable in reasoning-intensive scenarios
Meet Our World-Class Speaker
Luis Serrano
Founder, Serrano Academy
Luis Serrano is a renowned AI scientist with a PhD in mathematics and industry experience in AI at companies including Google, Apple, and Cohere.