Sparsity for Efficient Long Sequence Generation of LLMs
Published 2023-12-08