Learning Notebook - David Rostcheck
learning_event details
Learning Event ID
Subject
Topic
Program
Length
Institution
Presenter
Format
Recorded Date
Completed Date
Notes
First models for 2024, MosaicML scaling laws, Kepler K1, and much more!

Exclusive: First models for 2024 (Jan/2024)

The first large language models for 2024 are:
- JPMorgan DocLLM (7B, paper), an LLM focused on the spatial layout structure of documents.
- SUTD TinyLlama (1.1B, paper), out of Singapore, which finally finished training from its Sep/2023 start. This model was deliberately overtrained at 2,727 tokens per parameter (see my explanation of Chinchilla data-optimal scaling, and Mosaic scaling later in this edition; a token-ratio sketch follows these notes). The dataset was 1T tokens, run for 3 epochs for 3T total tokens seen.
- Tencent LLaMA Pro (8.3B, paper), which presented expanded blocks, with fine-tuning (actually 'a new post-pretraining method') on 80B tokens of code and math data.

Exclusive: Counting down to the release of GPT-4.5 (11/Jan/2024)

We're counting down to the release of OpenAI's GPT-4.5 model. Will it arrive in the second half of January 2024? Rumors are scarce, though I'm hoping to see another increase in intelligence, as measured by MMLU score. The time between model releases can be months or years, but each successive model has boasted a significant performance increase across this wide benchmark.
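A quick arithmetic sketch of where TinyLlama's 2,727 tokens-per-parameter figure comes from, assuming the commonly cited ~20 tokens-per-parameter Chinchilla data-optimal heuristic (the exact ratio varies with model size; 20 is the usual rule of thumb):

```python
# Check TinyLlama's overtraining ratio against the Chinchilla heuristic.
params = 1.1e9           # TinyLlama parameter count (1.1B)
dataset_tokens = 1e12    # 1T-token dataset
epochs = 3               # trained for 3 epochs

tokens_seen = dataset_tokens * epochs   # 3T total tokens seen
ratio = tokens_seen / params            # tokens per parameter

# Rule-of-thumb Chinchilla data-optimal ratio (~20 tokens/param);
# this constant is an assumption, not a figure from the notes above.
chinchilla_ratio = 20

print(f"tokens per parameter: {ratio:,.0f}")                    # ~2,727
print(f"overtraining factor vs. Chinchilla: {ratio / chinchilla_ratio:.0f}x")
```

Running this gives roughly 2,727 tokens per parameter, about 136x the Chinchilla-optimal data budget, which is what makes TinyLlama "deliberately overtrained."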
Personal Notes
Link
Review