Learning Large Language Models from Scratch: Code and Parameters

So I took Sebastian's large language model code as is (note: the copyright is his). But I was curious: what exactly are parameters? See my previous post, but...

# This is a large model, so I'm reducing it (original 7B config kept for reference):
# LLAMA2_CONFIG_7B = {
#     "vocab_size": 32000,     # Vocabulary size
#     "context_length": 4096,  # Context length
#     "emb_dim": 4096,         # Embedding dimension
#     "n_heads": 32,           # Number of attention heads
#     "n_layers": 32,          # Number of layers
#     "hidden_dim": 11008,     # NEW: Size of the intermediate dimension in FeedForward
#     "dtype": torch.bfloat16  # NEW: Lower-precision dtype to save memory
# }

# Previous run (original 7B config)
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB
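So where does 6,738,415,616 come from? Parameters are the learned numbers: every entry of every weight matrix and normalization vector. Here is a minimal sketch that counts them straight from the config, assuming the Llama 2 layout in Sebastian's code: bias-free linear layers, separate query/key/value/output projections (no grouped-query attention), a three-matrix SwiGLU FeedForward, two RMSNorm weight vectors per block, and an output head that is not weight-tied to the token embedding. The causal mask and RoPE tables are buffers, not parameters, so they are left out. `llama2_param_count` is a hypothetical helper, not part of the original script.

```python
# Numeric fields copied from the two configs in this post (dtype omitted)
LLAMA2_7B = {"vocab_size": 32000, "emb_dim": 4096, "n_layers": 32, "hidden_dim": 11008}
REDUCED = {"vocab_size": 32000, "emb_dim": 1280, "n_layers": 4, "hidden_dim": 1100}


def llama2_param_count(cfg):
    """Count trainable parameters, assuming the Llama 2 layout described above."""
    vocab, emb, hidden = cfg["vocab_size"], cfg["emb_dim"], cfg["hidden_dim"]

    token_embedding = vocab * emb       # one emb-sized vector per token
    attention = 4 * emb * emb           # W_query, W_key, W_value, out_proj (no bias)
    feed_forward = 3 * emb * hidden     # fc1, fc2: emb -> hidden; fc3: hidden -> emb
    block_norms = 2 * emb               # two RMSNorm weight vectors per block
    per_block = attention + feed_forward + block_norms

    final_norm = emb                    # RMSNorm before the output head
    output_head = emb * vocab           # separate from (not tied to) the embedding

    return token_embedding + cfg["n_layers"] * per_block + final_norm + output_head


if __name__ == "__main__":
    for name, cfg in [("7B", LLAMA2_7B), ("reduced", REDUCED)]:
        print(f"{name}: {llama2_param_count(cfg):,}")
```

Under these assumptions it prints 6,738,415,616 for the 7B config and 125,041,920 for the reduced one, matching both runs. Note that once the model is small, the two vocabulary-sized matrices (2 × 32000 × emb_dim) dominate: about 82M of the 125M parameters.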

import torch  # needed for torch.bfloat16

# Name kept from the original, but this is now a ~125M-parameter model
LLAMA2_CONFIG_7B = {
    "vocab_size": 32000,     # Keeping the same vocabulary size
    "context_length": 4096,  # Keeping the same context length
    "emb_dim": 1280,         # Reduced embedding dimension (about 31% of the original 4096)
    "n_heads": 4,            # Reduced number of attention heads (12.5% of the original 32)
    "n_layers": 4,           # Reduced number of layers (12.5% of the original 32)
    "hidden_dim": 1100,      # Reduced FeedForward intermediate dimension (about 10% of the original 11008)
    "dtype": torch.bfloat16  # Keeping the lower-precision dtype to save memory
}
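The shrink factors in this config are not uniform, and not every combination would work. A small sketch, assuming two constraints of the kind the attention and RoPE code check: emb_dim must split evenly across n_heads, and the resulting head_dim must be even because RoPE rotates dimensions in pairs. `check_config` and `shrink_ratios` are hypothetical helpers for illustration.

```python
# Only the fields that were shrunk, copied from the two configs in this post
ORIGINAL = {"emb_dim": 4096, "n_heads": 32, "n_layers": 32, "hidden_dim": 11008}
REDUCED = {"emb_dim": 1280, "n_heads": 4, "n_layers": 4, "hidden_dim": 1100}


def check_config(cfg):
    """Return head_dim, raising if the (assumed) attention/RoPE constraints fail."""
    if cfg["emb_dim"] % cfg["n_heads"] != 0:
        raise ValueError("emb_dim must be divisible by n_heads")
    head_dim = cfg["emb_dim"] // cfg["n_heads"]
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for RoPE")
    return head_dim


def shrink_ratios(orig, new):
    """Fraction of the original value kept for each field."""
    return {key: new[key] / orig[key] for key in orig}


if __name__ == "__main__":
    print("original head_dim:", check_config(ORIGINAL))
    print("reduced head_dim:", check_config(REDUCED))
    for key, ratio in shrink_ratios(ORIGINAL, REDUCED).items():
        print(f"{key}: {ratio:.1%} of original")
```

For this config, head_dim grows from 128 to 1280 / 4 = 320, so each of the four heads is actually wider than before, while emb_dim keeps 31.25% of its original size and only hidden_dim is cut to roughly 10%.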


And then, if you run it with the reduced config:

# Reduced config run
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 125,041,920
# float32 (PyTorch default): 1.22 GB
# bfloat16: 0.61 GB
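The reported sizes are bigger than parameters × bytes alone: 125,041,920 × 4 bytes is only about 0.47 GB. Here is one accounting that reproduces both runs' numbers. It is an assumption, not necessarily exactly what the script does: count each parameter twice (weight plus gradient), add per-layer buffers (a context_length × context_length causal mask plus RoPE cos and sin tables of context_length × head_dim each), multiply by bytes per element, and divide by 1024**3. `estimate_memory_gb` is a hypothetical helper.

```python
def estimate_memory_gb(n_params, n_layers, context_length, head_dim, bytes_per_elem):
    """Rough memory estimate in GiB: weights + gradients + per-layer buffers (assumed accounting)."""
    # Per layer (assumption): a context_length x context_length causal mask,
    # plus RoPE cos and sin tables, each context_length x head_dim
    buffers = n_layers * (context_length * context_length + 2 * context_length * head_dim)
    # Each trainable parameter is counted twice: once for the weight, once for its gradient
    total_elements = 2 * n_params + buffers
    return total_elements * bytes_per_elem / 1024**3


if __name__ == "__main__":
    # Parameter counts taken from the runs in this post; head_dim = emb_dim // n_heads
    runs = [("7B", 6_738_415_616, 32, 128), ("reduced", 125_041_920, 4, 320)]
    for name, n_params, n_layers, head_dim in runs:
        fp32 = estimate_memory_gb(n_params, n_layers, 4096, head_dim, bytes_per_elem=4)
        bf16 = estimate_memory_gb(n_params, n_layers, 4096, head_dim, bytes_per_elem=2)
        print(f"{name}: float32 {fp32:.2f} GB, bfloat16 {bf16:.2f} GB")
```

Under these assumptions it prints 52.33 / 26.17 GB for the 7B config and 1.22 / 0.61 GB for the reduced one. Because context_length stayed at 4096, the four 4096 × 4096 masks make up roughly a fifth of the counted elements in the small model; lowering context_length would shrink that part too.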

See code here: https://github.com/berlinbrown/berlin-learn-ml-dl-capstone-projects/blob/main/basic-exercises/basic-llm/llama2.py


...
