Learning Large Language Model from Scratch: Code and Parameters
So I took Sebastian's large language model code as is (note: copyright belongs to him). But I was curious: what exactly are the parameters? See my previous post, but here is a closer look...
# This is the full-size model; the reduced configuration follows below
#LLAMA2_CONFIG_7B = {
# "vocab_size": 32000, # Vocabulary size
# "context_length": 4096, # Context length
# "emb_dim": 4096, # Embedding dimension
# "n_heads": 32, # Number of attention heads
# "n_layers": 32, # Number of layers
# "hidden_dim": 11008, # NEW: Size of the intermediate dimension in FeedForward
# "dtype": torch.bfloat16 # NEW: Lower-precision dtype to save memory
#}
# Previous run (original 7B configuration)
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB
LLAMA2_CONFIG_7B = {
"vocab_size": 32000, # Keeping the same vocabulary size
"context_length": 4096, # Keeping the same context length
"emb_dim": 1280, # Reduced embedding dimension (approximately 10% of original 4096)
"n_heads": 4, # Reduced number of attention heads (10% of original 32)
"n_layers": 4, # Reduced number of layers (10% of original 32)
"hidden_dim": 1100, # Reduced intermediate dimension in FeedForward (10% of original 11008)
"dtype": torch.bfloat16 # Keeping lower-precision dtype to save memory
}
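Where does a number like 6,738,415,616 come from? You can estimate it straight from the config. Here is a rough sketch of my own (the helper name estimate_llama2_params is mine, not from the linked script, which presumably just sums p.numel() over the built model's parameters). It adds up the weight matrices in a Llama-2-style model: the token embedding, the Q/K/V/output projections in each layer, the three SwiGLU feed-forward matrices, the two RMSNorm scales per block, the final norm, and the untied output head.
# Rough parameter-count estimate for a Llama-2-style model, based only on the
# config dictionary. This is my own sketch of where the totals come from.
def estimate_llama2_params(cfg):
    emb = cfg["vocab_size"] * cfg["emb_dim"]            # token embedding
    attn = 4 * cfg["emb_dim"] ** 2                      # Q, K, V, output projections (no bias)
    ffn = 3 * cfg["emb_dim"] * cfg["hidden_dim"]        # SwiGLU: gate, up, and down projections
    norms = 2 * cfg["emb_dim"]                          # two RMSNorm scales per block
    per_layer = attn + ffn + norms
    final_norm = cfg["emb_dim"]
    head = cfg["emb_dim"] * cfg["vocab_size"]           # output head (not tied to the embedding)
    return emb + cfg["n_layers"] * per_layer + final_norm + head

print(f"{estimate_llama2_params(LLAMA2_CONFIG_7B):,}")
With the original 7B values this works out to 6,738,415,616, and with the reduced values above to 125,041,920, which matches the two runs shown here.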
And then if you run the reduced configuration (code here:
https://github.com/berlinbrown/berlin-learn-ml-dl-capstone-projects/blob/main/basic-exercises/basic-llm/llama2.py
), the output looks like this:
# New run (reduced configuration)
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 125,041,920
# float32 (PyTorch default): 1.22 GB
# bfloat16: 0.61 GB
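The GB figures are a memory footprint at a given dtype, not just parameter count times bytes. Here is a minimal sketch of how such a number can be computed, assuming the estimate also counts gradients (same size as the parameters during training) and registered buffers such as the RoPE tables and the causal mask; the linked script may compute it somewhat differently, and the function name model_memory_gb is mine.
import torch

# Minimal sketch of a memory estimate for a built PyTorch model. This is an
# assumption about how the GB figures above are produced; details may differ.
def model_memory_gb(model, dtype=torch.float32):
    n_params = sum(p.numel() for p in model.parameters())
    n_grads = sum(p.numel() for p in model.parameters() if p.requires_grad)  # gradients kept during training
    n_buffers = sum(b.numel() for b in model.buffers())  # e.g. RoPE cos/sin tables, causal mask
    bytes_per_element = torch.tensor(0, dtype=dtype).element_size()  # 4 for float32, 2 for bfloat16
    return (n_params + n_grads + n_buffers) * bytes_per_element / (1024 ** 3)
Counting gradients roughly doubles the raw weight memory, and halving the element size (float32 to bfloat16) halves the total, which is why the bfloat16 figure is about half the float32 one in both runs.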
...