Update on researching LLMs and llama2

So here are some more keywords and updates from learning about LLMs and the like.


So I am running the code from here:

https://github.com/karpathy/llama2.c


Mainly the training piece.

So these ran fine:

python tinystories.py download

python tinystories.py pretokenize
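To sanity-check the pretokenize step, here is a minimal sketch that peeks at one of the .bin shards it writes. I'm assuming the shards hold raw uint16 token ids back to back; the file path below is just an example, adjust it to wherever the script put the data.

# Peek at a pretokenized shard (path is an example, not necessarily your layout)
import numpy as np

shard = "data/TinyStories_all_data/data00.bin"
tokens = np.memmap(shard, dtype=np.uint16, mode="r")

print(f"{len(tokens):,} tokens in shard")
print("first 20 token ids:", tokens[:20].tolist())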

But I had issues with how long the training step below takes on a Mac machine without CUDA installed.
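Before kicking off a long run on a Mac, a quick generic PyTorch check (not something train.py prints) shows which accelerators the install can actually see:

# Which backends can this PyTorch build use? (the MPS check needs a recent PyTorch)
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())

If neither backend shows up, very long iteration times like the ones below are about what to expect on CPU.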

This took several days to run:

python3 train.py --compile=False --vocab_source=custom --vocab_size=4096

Overriding: compile = False

Overriding: vocab_source = custom

Overriding: vocab_size = 4096

tokens per iteration will be: 131,072

breaks down as: 4 grad accum steps * 1 processes * 128 batch size * 256 max seq len

Initializing a new model from scratch

num decayed parameter tensors: 43, with 7,151,616 parameters

num non-decayed parameter tensors: 13, with 3,744 parameters

using fused AdamW: False

Created a PretokDataset with rng seed 42

Created a PretokDataset with rng seed 42


97 | loss 6.4552 | lr 4.850000e-05 | 1962367.71ms | mfu 0.01%

98 | loss 6.4167 | lr 4.900000e-05 | 109798.40ms | mfu 0.01%

99 | loss 6.3819 | lr 4.950000e-05 | 105129.16ms | mfu 0.01%

100 | loss 6.3815 | lr 5.000000e-05 | 129126.15ms | mfu 0.01%

101 | loss 6.3463 | lr 5.050000e-05 | 142152.52ms | mfu 0.01%

102 | loss 6.3171 | lr 5.100000e-05 | 123678.04ms | mfu 0.01%

..


So still running...
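For reference, the "tokens per iteration" line in that log is just the product of the settings it prints, and the per-iteration times give a rough feel for total wall-clock time. Here is a quick back-of-the-envelope sketch; the max_iters value is a placeholder, not the script's actual setting.

# Numbers pulled from the training log above
grad_accum_steps = 4
processes = 1
batch_size = 128
max_seq_len = 256

tokens_per_iter = grad_accum_steps * processes * batch_size * max_seq_len
print(f"tokens per iteration: {tokens_per_iter:,}")   # 131,072, matches the log

# Rough wall-clock estimate from the ~105-142 second iterations above
avg_ms_per_iter = 120_000
max_iters = 10_000   # placeholder, use whatever your run is configured for
est_days = avg_ms_per_iter * max_iters / 1000 / 3600 / 24
print(f"rough total time: ~{est_days:.1f} days")

At roughly two minutes per iteration, even a modest iteration count adds up fast on CPU, which lines up with the "several days and still running" experience here.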

And here are more random links.


Don't forget the LocalLLaMA Discord and the subreddit:

https://www.reddit.com/r/LocalLLaMA/
