Learning Notebook - David Rostcheck
learning_event details
Learning Event ID
Subject
Topic
Program
Length
Institution
Presenter
Format
Recorded Date
Completed Date
Notes
Using window attention with attention sink tokens allows pretrained chat-style LLMs (e.g. Llama, Mistral, MPT, Falcon, and GPT-NeoX/Pythia models) to stay fluent across hundreds of subsequent prompts, unlike the same models loaded with plain transformers. It also keeps memory usage constant, whereas most LLMs loaded with transformers have linear space complexity and eventually run into memory issues. Using this form of attention is as simple as importing your model class from attention_sinks rather than transformers:

from attention_sinks import AutoModel

model = AutoModel.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", device_map="auto")
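Rough multi-prompt sketch (my own, not from the note): it assumes attention_sinks also exposes AutoModelForCausalLM as a drop-in replacement and accepts attention_sink_size / attention_sink_window_size keyword arguments; everything beyond the import swap above is an assumption.

# Hedged sketch, not taken from the note: assumed attention_sinks class
# and keyword arguments, standard transformers tokenizer and generate().
from transformers import AutoTokenizer
from attention_sinks import AutoModelForCausalLM  # assumed drop-in class

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # assumed kwargs: number of "sink" tokens pinned at the start of the
    # KV cache, plus the sliding-window size used for everything else
    attention_sink_size=4,
    attention_sink_window_size=1020,
)

# Feed many prompts in sequence; the sink + window cache is what keeps
# memory constant instead of growing with each new prompt.
prompts = ["What are attention sinks?", "Summarize that in one sentence."]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))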
Personal Notes
Link
Review