A Walkthrough of A Mathematical Framework for Transformer Circuits

Published 2022-10-13



