Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
Published 2023-09-02