Sparsity for Efficient Long Sequence Generation of LLMs
Published 2023-12-08