Learning Notebook - David Rostcheck
learning_event details
Notes
Welcome back to The Memo. While this is another huge edition (perhaps we could call it the OpenAI edition!), I am relieved to announce a gentle easing in the AI news cycle. After the first half of 2023 saw a record 30 editions of The Memo (that’s more than one edition per week) and 47 new model highlights, including the massive releases of OpenAI GPT-4 and Google PaLM 2, I can see my bed on the distant horizon… While the pace of change is actually increasing, I believe that this ‘eye of the storm’ is a short moment of solace before imminent releases of multi-trillion-parameter models including Google DeepMind Gemini (my link), OpenAI GPT-5 (my link), Anthropic Claude-Next, and many more.

The BIG Stuff

Microsoft LongNet increases sequence length to 1 billion tokens (6/Jul/2023)

From the paper: “Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence… Sequence length, as the last atomic dimension of the neural network, is desirable to be unlimited. Breaking the limitation of sequence length introduces significant advantages. First, it provides large memory and receptive field for models, which is practical for them to interact with human and the world. Second, a longer context contains more complex causality and reasoning paths that models can exploit in training data. In contrast, short dependency has more spurious correlations, which is harmful to generalization. Third, it enables to explore the limits of in-context learning.”

As an interesting aside, all researchers on this paper (there are seven of them) are from the Natural Language Computing Group at Microsoft Research Asia, Beijing, China. The output from both AI labs in China and from Chinese researchers working in Western AI labs is phenomenal.

Read the paper: https://arxiv.org/abs/2307.02486

Salesforce launches XGen 7B (Jul/2023)

XGen was trained on 1.5T tokens to 7B parameters. That’s 215:1, far outpacing the recommended ratio of tokens to parameters proposed by Chinchilla (20:1) and reinforced by recent models like LLaMA (22:1). The Salesforce team collected a significant corpus from recent datasets like RedPajama, The Pile, and C4. Salesforce found that training to 7B parameters gave them a ‘training cost of $150K on 1T tokens under Google Cloud pricing for TPU-v4.’
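To sanity-check those ratios, here is a quick back-of-the-envelope calculation. The token and parameter counts are the headline figures quoted above (with LLaMA taken as the 65B variant at roughly 1.4T tokens); they are rough, not exact reproductions of each paper.

```python
# Rough tokens-to-parameters ratios for the models mentioned above.
# Figures are headline numbers, not exact reproductions of each paper.
def tokens_per_param(tokens: float, params: float) -> float:
    return tokens / params

models = {
    "Chinchilla 70B": (1.4e12, 70e9),  # ~20:1, the 'Chinchilla-optimal' rule of thumb
    "LLaMA 65B":      (1.4e12, 65e9),  # ~22:1
    "XGen 7B":        (1.5e12, 7e9),   # ~215:1
}

for name, (tokens, params) in models.items():
    print(f"{name}: ~{tokens_per_param(tokens, params):.0f}:1")
```

At roughly 214:1, XGen sees about ten times more tokens per parameter than the ~20:1 Chinchilla rule of thumb.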
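Going back to the LongNet item above: the mechanism behind the 1-billion-token claim is dilated attention, which splits the sequence into segments and, within each segment, lets only every r-th position attend to every r-th position, so cost grows linearly with sequence length. Below is a toy, single-head, single-configuration sketch of that idea as I read the paper (not the authors’ code); the real model mixes several segment-length/dilation pairs, weights their outputs, and shifts offsets across heads so every position is covered.

```python
# Toy single-head dilated attention in the spirit of LongNet (illustrative only).
# For each segment of length w, only every r-th position takes part in attention,
# so per-segment cost falls from w^2 to (w/r)^2.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention(q, k, v, w=8, r=2):
    """q, k, v: arrays of shape (n, d); n must be a multiple of w."""
    n, d = q.shape
    out = np.zeros_like(v)
    for start in range(0, n, w):              # split the sequence into segments
        idx = np.arange(start, start + w, r)  # keep every r-th position in the segment
        scores = softmax(q[idx] @ k[idx].T / np.sqrt(d))
        out[idx] = scores @ v[idx]            # scatter the sparse results back
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 16)) for _ in range(3))
print(dilated_attention(q, k, v).shape)  # (32, 16); skipped positions stay zero here
```

In the full method, heads with different offsets fill in the positions this single configuration skips, which is what keeps coverage complete while the cost stays linear.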