Update on researching LLMs and llama2
So here are some more keywords and updates from learning about LLMs and the like.
So I am running the code from here:
https://github.com/karpathy/llama2.c
Mainly the training piece.
So these ran fine:
python tinystories.py download
python tinystories.py pretokenize
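Out of curiosity, here is a quick way to peek at one of the pretokenized shards. This is just a sketch: it assumes the pretokenize step writes each shard as a flat array of uint16 token ids (which is how the llama2.c code appears to store them), and the data/TinyStories_all_data path is my guess at the default output location.

# Peek at a pretokenized shard.
# Assumptions: shards are flat uint16 token ids, and they live under
# data/TinyStories_all_data/ -- adjust the glob if your layout differs.
import glob
import numpy as np

shards = sorted(glob.glob("data/TinyStories_all_data/*.bin"))
tokens = np.memmap(shards[0], dtype=np.uint16, mode="r")
print(len(tokens), "tokens in", shards[0])
print(tokens[:20])  # first few token ids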
But I had issues with how long this takes to run on a Mac without CUDA installed.
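A quick way to see what PyTorch can actually use on this machine is a check like the one below. The MPS branch only matters on Apple silicon, and whether train.py picks it up depends on its own device config, which I haven't dug into.

# Check which PyTorch backends are available on this Mac.
# Without CUDA, training falls back to MPS (Apple silicon) or plain CPU.
import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print("best available device:", device)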
This took several days to run:
python3 train.py --compile=False --vocab_source=custom --vocab_size=4096
Overriding: compile = False
Overriding: vocab_source = custom
Overriding: vocab_size = 4096
tokens per iteration will be: 131,072
breaks down as: 4 grad accum steps * 1 processes * 128 batch size * 256 max seq len
Initializing a new model from scratch
num decayed parameter tensors: 43, with 7,151,616 parameters
num non-decayed parameter tensors: 13, with 3,744 parameters
using fused AdamW: False
Created a PretokDataset with rng seed 42
Created a PretokDataset with rng seed 42
97 | loss 6.4552 | lr 4.850000e-05 | 1962367.71ms | mfu 0.01%
98 | loss 6.4167 | lr 4.900000e-05 | 109798.40ms | mfu 0.01%
99 | loss 6.3819 | lr 4.950000e-05 | 105129.16ms | mfu 0.01%
100 | loss 6.3815 | lr 5.000000e-05 | 129126.15ms | mfu 0.01%
101 | loss 6.3463 | lr 5.050000e-05 | 142152.52ms | mfu 0.01%
102 | loss 6.3171 | lr 5.100000e-05 | 123678.04ms | mfu 0.01%
..
So still running...
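As a sanity check on the log above, the tokens-per-iteration number is just the product of the override settings, and the per-iteration timings give a rough feel for CPU throughput. These are rough numbers eyeballed from the iterations shown, nothing more.

# Back-of-the-envelope check on the training log above.
grad_accum_steps = 4
processes = 1
batch_size = 128
max_seq_len = 256

tokens_per_iter = grad_accum_steps * processes * batch_size * max_seq_len
print(tokens_per_iter)  # 131072, matching "tokens per iteration will be: 131,072"

# Iterations 98-102 took roughly 105-142 seconds each (97 was an outlier),
# so throughput works out to around a thousand tokens per second.
rough_iter_seconds = 120  # eyeballed midpoint of the logged timings
print(round(tokens_per_iter / rough_iter_seconds))  # ~1092 tokens/sec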
And here are more random links.
Don't forget the LocalLLaMA Discord and the subreddit:
https://www.reddit.com/r/LocalLLaMA/