Posts

Showing posts from October, 2024

Learning Language Model from Scratch: Code and Parameters

So I took Sebastian's large language model code as is (note: the copyright is his). But I was curious: what are parameters? See my previous post, but...

# This is a large model, reducing
#LLAMA2_CONFIG_7B = {
#    "vocab_size": 32000,       # Vocabulary size
#    "context_length": 4096,    # Context length
#    "emb_dim": 4096,           # Embedding dimension
#    "n_heads": 32,             # Number of attention heads
#    "n_layers": 32,            # Number of layers
#    "hidden_dim": 11008,       # NEW: Size of the intermediate dimension in FeedForward
#    "dtype": torch.bfloat16    # NEW: Lower-precision dtype to save memory
#}

# Previous run
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB

LLAMA2_CONFIG_7B = {
    "vocab_size": 32000,  # Keeping
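The memory figures above follow from the parameter count and the dtype: float32 stores 4 bytes per value and bfloat16 stores 2, which is why the bfloat16 estimate is half the float32 one. Here is a minimal sketch (not Sebastian's exact function, and counting only the weights themselves, so it may not reproduce the figures above if those also include gradients or buffers), assuming a loaded PyTorch model instance named model:

import torch

def count_parameters(model):
    # Sum the number of elements across every parameter tensor in the model
    return sum(p.numel() for p in model.parameters())

def weights_memory_gb(num_params, dtype=torch.float32):
    # Bytes per element for the chosen dtype: 4 for float32, 2 for bfloat16
    bytes_per_element = torch.tensor([], dtype=dtype).element_size()
    return num_params * bytes_per_element / 1e9

# Hypothetical usage once a model instance exists:
# total = count_parameters(model)   # e.g. 6,738,415,616 for the 7B config
# print(f"float32:  {weights_memory_gb(total, torch.float32):.2f} GB")
# print(f"bfloat16: {weights_memory_gb(total, torch.bfloat16):.2f} GB")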

Calculating Parameters with LLM

And this is data from Sebastian's book on large language models. The 124-million-parameter GPT model config:

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        # Query-Key-Value bias
}

The 1.5-billion-parameter GPT model config:

GPT_CONFIG_1558M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 1600,         # Embedding dimension (change here)
    "n_heads": 25,           # Number of attention heads
    "n_layers": 48,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        #
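For these configs the parameter count can also be worked out by hand from the values shown. Below is a sketch with a hypothetical helper count_gpt_params, assuming the standard GPT-2 layout from the book (learned token and positional embeddings, no QKV bias when qkv_bias is False, biases on the other linear layers, a feed-forward hidden size of 4 * emb_dim, and LayerNorms with scale and shift), and using the GPT_CONFIG_124M dict defined above:

def count_gpt_params(cfg):
    emb = cfg["emb_dim"]
    tok_emb = cfg["vocab_size"] * emb        # token embedding matrix
    pos_emb = cfg["context_length"] * emb    # positional embedding matrix

    attn = 3 * emb * emb                     # W_query, W_key, W_value (qkv_bias=False)
    attn += emb * emb + emb                  # output projection + bias
    ff = (emb * 4 * emb + 4 * emb) + (4 * emb * emb + emb)  # two feed-forward linear layers
    norms = 2 * 2 * emb                      # two LayerNorms per block (scale + shift)
    block = attn + ff + norms

    final_norm = 2 * emb
    out_head = cfg["vocab_size"] * emb       # separate output head, no bias

    return tok_emb + pos_emb + cfg["n_layers"] * block + final_norm + out_head

total = count_gpt_params(GPT_CONFIG_124M)
print(f"{total:,}")   # 163,009,536 with a separate output head
print(f"{total - GPT_CONFIG_124M['vocab_size'] * GPT_CONFIG_124M['emb_dim']:,}")
# 124,412,160 when the output head shares weights with the token embedding

The 124M in the name refers to the weight-tied count, as in the original GPT-2; with a separate output head the same config comes to roughly 163 million parameters.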