Towards Monosemanticity: Decomposing Language Models Into Understandable Components

Published 2023-10-25

