Posts

Showing posts from October, 2024

Learning Language Model from Scratch: Code and Parameters

So I took Sebastian's large language model code as is (note: the copyright is his). But I was curious: what are parameters? See my previous post, but...

# This is a large model, reducing
#LLAMA2_CONFIG_7B = {
#    "vocab_size": 32000,       # Vocabulary size
#    "context_length": 4096,    # Context length
#    "emb_dim": 4096,           # Embedding dimension
#    "n_heads": 32,             # Number of attention heads
#    "n_layers": 32,            # Number of layers
#    "hidden_dim": 11008,       # NEW: Size of the intermediate dimension in FeedForward
#    "dtype": torch.bfloat16    # NEW: Lower-precision dtype to save memory
#}

# Previous run
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB

LLAMA2_CONFIG_7B = {
    "vocab_size": 32000,  # Keeping
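The memory figures above follow from the parameter count and the dtype: float32 stores 4 bytes per value and bfloat16 stores 2, which is why the bfloat16 estimate is half the float32 one. Here is a minimal sketch (not Sebastian's exact function, and counting only the weights themselves, so it may not reproduce the figures above if those also include gradients or buffers), assuming a loaded PyTorch model instance named model:

import torch

def count_parameters(model):
    # Sum the number of elements across every parameter tensor in the model
    return sum(p.numel() for p in model.parameters())

def weights_memory_gb(num_params, dtype=torch.float32):
    # Bytes per element for the chosen dtype: 4 for float32, 2 for bfloat16
    bytes_per_element = torch.tensor([], dtype=dtype).element_size()
    return num_params * bytes_per_element / 1e9

# Hypothetical usage once a model instance exists:
# total = count_parameters(model)   # e.g. 6,738,415,616 for the 7B config
# print(f"float32:  {weights_memory_gb(total, torch.float32):.2f} GB")
# print(f"bfloat16: {weights_memory_gb(total, torch.bfloat16):.2f} GB")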

Calculating Parameters with LLM

And this is data from Sebastian's book on large language models. The 124-million-parameter GPT model config:

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        # Query-Key-Value bias
}

The 1.5-billion-parameter GPT model config:

GPT_CONFIG_1558M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 1600,         # Embedding dimension (change here)
    "n_heads": 25,           # Number of attention heads
    "n_layers": 48,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        #
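For these configs the parameter count can also be worked out by hand from the values shown. Below is a sketch with a hypothetical helper count_gpt_params, assuming the standard GPT-2 layout from the book (learned token and positional embeddings, no QKV bias when qkv_bias is False, biases on the other linear layers, a feed-forward hidden size of 4 * emb_dim, and LayerNorms with scale and shift), and using the GPT_CONFIG_124M dict defined above:

def count_gpt_params(cfg):
    emb = cfg["emb_dim"]
    tok_emb = cfg["vocab_size"] * emb        # token embedding matrix
    pos_emb = cfg["context_length"] * emb    # positional embedding matrix

    attn = 3 * emb * emb                     # W_query, W_key, W_value (qkv_bias=False)
    attn += emb * emb + emb                  # output projection + bias
    ff = (emb * 4 * emb + 4 * emb) + (4 * emb * emb + emb)  # two feed-forward linear layers
    norms = 2 * 2 * emb                      # two LayerNorms per block (scale + shift)
    block = attn + ff + norms

    final_norm = 2 * emb
    out_head = cfg["vocab_size"] * emb       # separate output head, no bias

    return tok_emb + pos_emb + cfg["n_layers"] * block + final_norm + out_head

total = count_gpt_params(GPT_CONFIG_124M)
print(f"{total:,}")   # 163,009,536 with a separate output head
print(f"{total - GPT_CONFIG_124M['vocab_size'] * GPT_CONFIG_124M['emb_dim']:,}")
# 124,412,160 when the output head shares weights with the token embedding

The 124M in the name refers to the weight-tied count, as in the original GPT-2; with a separate output head the same config comes to roughly 163 million parameters.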