Learning Large Language Model from Scratch: Code and Parameters
So I took Sebastian's large language model code as is (note: copyright belongs to him). But I was curious: what exactly are the parameters? See my previous post, but here is a closer look...
# This is the full-size model; the reduced configuration follows below
#LLAMA2_CONFIG_7B = {
# "vocab_size": 32000, # Vocabulary size
# "context_length": 4096, # Context length
# "emb_dim": 4096, # Embedding dimension
# "n_heads": 32, # Number of attention heads
# "n_layers": 32, # Number of layers
# "hidden_dim": 11008, # NEW: Size of the intermediate dimension in FeedForward
# "dtype": torch.bfloat16 # NEW: Lower-precision dtype to save memory
#}
# Previous run (original 7B configuration)
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB
LLAMA2_CONFIG_7B = {
"vocab_size": 32000, # Keeping the same vocabulary size
"context_length": 4096, # Keeping the same context length
"emb_dim": 1280, # Reduced embedding dimension (approximately 10% of original 4096)
"n_heads": 4, # Reduced number of attention heads (10% of original 32)
"n_layers": 4, # Reduced number of layers (10% of original 32)
"hidden_dim": 1100, # Reduced intermediate dimension in FeedForward (10% of original 11008)
"dtype": torch.bfloat16 # Keeping lower-precision dtype to save memory
}
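Where does a number like 6,738,415,616 come from? You can estimate it straight from the config. Here is a rough sketch of my own (the helper name estimate_llama2_params is mine, not from the linked script, which presumably just sums p.numel() over the built model's parameters). It adds up the weight matrices in a Llama-2-style model: the token embedding, the Q/K/V/output projections in each layer, the three SwiGLU feed-forward matrices, the two RMSNorm scales per block, the final norm, and the untied output head.
# Rough parameter-count estimate for a Llama-2-style model, based only on the
# config dictionary. This is my own sketch of where the totals come from.
def estimate_llama2_params(cfg):
    emb = cfg["vocab_size"] * cfg["emb_dim"]            # token embedding
    attn = 4 * cfg["emb_dim"] ** 2                      # Q, K, V, output projections (no bias)
    ffn = 3 * cfg["emb_dim"] * cfg["hidden_dim"]        # SwiGLU: gate, up, and down projections
    norms = 2 * cfg["emb_dim"]                          # two RMSNorm scales per block
    per_layer = attn + ffn + norms
    final_norm = cfg["emb_dim"]
    head = cfg["emb_dim"] * cfg["vocab_size"]           # output head (not tied to the embedding)
    return emb + cfg["n_layers"] * per_layer + final_norm + head

print(f"{estimate_llama2_params(LLAMA2_CONFIG_7B):,}")
With the original 7B values this works out to 6,738,415,616, and with the reduced values above to 125,041,920, which matches the two runs shown here.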
And then if you run the reduced configuration (code here:
https://github.com/berlinbrown/berlin-learn-ml-dl-capstone-projects/blob/main/basic-exercises/basic-llm/llama2.py
), the output looks like this:
# New run (reduced configuration)
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 125,041,920
# float32 (PyTorch default): 1.22 GB
# bfloat16: 0.61 GB
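The GB figures are a memory footprint at a given dtype, not just parameter count times bytes. Here is a minimal sketch of how such a number can be computed, assuming the estimate also counts gradients (same size as the parameters during training) and registered buffers such as the RoPE tables and the causal mask; the linked script may compute it somewhat differently, and the function name model_memory_gb is mine.
import torch

# Minimal sketch of a memory estimate for a built PyTorch model. This is an
# assumption about how the GB figures above are produced; details may differ.
def model_memory_gb(model, dtype=torch.float32):
    n_params = sum(p.numel() for p in model.parameters())
    n_grads = sum(p.numel() for p in model.parameters() if p.requires_grad)  # gradients kept during training
    n_buffers = sum(b.numel() for b in model.buffers())  # e.g. RoPE cos/sin tables, causal mask
    bytes_per_element = torch.tensor(0, dtype=dtype).element_size()  # 4 for float32, 2 for bfloat16
    return (n_params + n_grads + n_buffers) * bytes_per_element / (1024 ** 3)
Counting gradients roughly doubles the raw weight memory, and halving the element size (float32 to bfloat16) halves the total, which is why the bfloat16 figure is about half the float32 one in both runs.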
...