Direct Preference Optimization (DPO)
Published 2023-11-13