Posts

Showing posts from 2024

Learning Language Models from Scratch: Code and Parameters

So I took Sebastian's large language model code as is (copyright is his). But I was curious: what are parameters? See my previous post, but...

# This is a large model, reducing
# LLAMA2_CONFIG_7B = {
#     "vocab_size": 32000,      # Vocabulary size
#     "context_length": 4096,   # Context length
#     "emb_dim": 4096,          # Embedding dimension
#     "n_heads": 32,            # Number of attention heads
#     "n_layers": 32,           # Number of layers
#     "hidden_dim": 11008,      # NEW: Size of the intermediate dimension in FeedForward
#     "dtype": torch.bfloat16   # NEW: Lower-precision dtype to save memory
# }

# Previous run
# huggingface_hub version: 0.26.1
# sentencepiece version: 0.2.0
# torch version: 2.4.1
# Total number of parameters: 6,738,415,616
# float32 (PyTorch default): 52.33 GB
# bfloat16: 26.17 GB

LLAMA2_CONFIG_7B = {
    "vocab_size": 32000,  # Keeping
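As a sanity check on that 6,738,415,616 figure, here is a minimal sketch that counts parameters straight from the config. It assumes the standard Llama-2 layout (separate Q/K/V/output projections with no biases, a SwiGLU feed-forward with gate/up/down matrices, two RMSNorm weights per layer plus a final norm, and an untied output head); the name count_llama_params is mine, not from Sebastian's code.

# Rough parameter count for the Llama-2 7B config (my own sketch, assuming
# the standard architecture: Q/K/V/O projections, SwiGLU FFN, RMSNorm,
# untied output head).
cfg = {"vocab_size": 32000, "context_length": 4096, "emb_dim": 4096,
       "n_heads": 32, "n_layers": 32, "hidden_dim": 11008}

def count_llama_params(cfg):
    d, h, v = cfg["emb_dim"], cfg["hidden_dim"], cfg["vocab_size"]
    embedding = v * d              # token embedding table
    attention = 4 * d * d          # Wq, Wk, Wv, Wo (no biases)
    feedforward = 3 * d * h        # gate, up, down projections
    norms = 2 * d                  # two RMSNorm weight vectors per layer
    per_layer = attention + feedforward + norms
    final_norm = d
    output_head = v * d            # not tied to the token embedding
    return embedding + cfg["n_layers"] * per_layer + final_norm + output_head

print(f"{count_llama_params(cfg):,}")   # 6,738,415,616 -- matches the run above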

Calculating Parameters with LLM

And this data is from Sebastian's book on large language models. The 124 million parameter GPT model config:

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        # Query-Key-Value bias
}

The 1.5 billion parameter GPT model config:

GPT_CONFIG_1558M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 1600,         # Embedding dimension (change here)
    "n_heads": 25,           # Number of attention heads
    "n_layers": 48,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        #
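To connect those configs to actual parameter counts, here is a small sketch of my own (not from the book). It assumes the usual GPT-2 layout as in Sebastian's implementation: learned token and positional embeddings, per-block Q/K/V and output projections, a 4x GELU feed-forward with biases, two LayerNorms per block, a final LayerNorm, and an output head that may or may not share weights with the token embedding. The head count does not change the total, so it is left out of the small configs below.

# Estimate GPT parameter counts from the configs above (my own sketch,
# assuming the standard GPT-2 layout used in the book's implementation).
configs = {
    "124M":  {"vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
              "n_layers": 12, "qkv_bias": False},
    "1558M": {"vocab_size": 50257, "context_length": 1024, "emb_dim": 1600,
              "n_layers": 48, "qkv_bias": False},
}

def count_gpt_params(cfg):
    d, v, ctx = cfg["emb_dim"], cfg["vocab_size"], cfg["context_length"]
    embeddings = v * d + ctx * d                 # token + positional embeddings
    attention = 3 * d * d                        # Wq, Wk, Wv
    attention += 3 * d if cfg["qkv_bias"] else 0
    attention += d * d + d                       # output projection (with bias)
    feedforward = (d * 4 * d + 4 * d) + (4 * d * d + d)   # two biased linears
    norms = 2 * (2 * d)                          # two LayerNorms (scale + shift)
    per_block = attention + feedforward + norms
    total = embeddings + cfg["n_layers"] * per_block + 2 * d   # final LayerNorm
    return total + v * d                         # untied output head

for name, cfg in configs.items():
    total = count_gpt_params(cfg)
    tied = total - cfg["vocab_size"] * cfg["emb_dim"]   # with weight tying
    print(name, f"total={total:,}", f"tied={tied:,}")
# 124M  -> 163,009,536 total, 124,412,160 with weight tying
# 1558M -> 1,637,792,000 total, 1,557,380,800 with weight tying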

Review of JVM Notebook - JVM Languages

It has been a while, but there are still relevant JVM projects here. See the JVM Notebook: https://github.com/berlinbrown/jvmnotebook/tree/master/jvmnotebook

The Java Virtual Machine (Sun's JVM is called HotSpot) is a Java bytecode interpreter that is fast, portable, and secure. Jython, JRuby, Scala, and ABCL (Common Lisp) are popular language implementations that run on the JVM and allow for the syntactic sugar of their particular languages.

Review of languages:
* http://jruby.codehaus.org/ - JRuby Home
* http://groovy.codehaus.org/ - Groovy Home
* http://clojure.org/ - Clojure
* http://www.scala-lang.org/ - Scala is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way.
* http://www.jython.org - Jython
* http://common-lisp.net/project/armedbear/ - Armed Bear Common Lisp (ABCL) is an implementation of ANSI Common Lisp that runs in a Java virtual machine.
* http://sis

Web Security is Important - Basic SQL Injection Project

Here is a basic SQL injection project test: https://github.com/berlinbrown/zri-banking-forum-injection

A SQL injection attack consists of insertion or “injection” of a SQL query via the input data from the client to the application. A successful SQL injection exploit can read sensitive data from the database, modify database data (Insert/Update/Delete), execute administration operations on the database (such as shutting down the DBMS), recover the content of a given file present on the DBMS file system, and in some cases issue commands to the operating system.

SQL is a standard language for accessing and manipulating databases. A database is an organized collection of structured information, or data, typically stored electronically in a computer system. Databases are used in all types of modern applications, including banking, shopping, and more.
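The banking-forum project above is its own code base; purely as an illustrative sketch (not taken from that repo), here is the core difference between a query built by string concatenation and a parameterized query, shown with Python's sqlite3 module.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

user_input = "' OR '1'='1"   # attacker-controlled value

# VULNERABLE: the input is pasted directly into the SQL text, so the
# OR '1'='1' clause makes the WHERE condition match every row.
query = "SELECT * FROM users WHERE name = '" + user_input + "'"
print(conn.execute(query).fetchall())                  # leaks all rows

# SAFER: a parameterized query treats the input as data, not SQL.
query = "SELECT * FROM users WHERE name = ?"
print(conn.execute(query, (user_input,)).fetchall())   # returns nothing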

Some youtube - vids Atlanta and ML

  https://www.youtube.com/watch?v=bIogyR3aPjs https://www.youtube.com/watch?v=Dg6LMAUBbZY

More Bitcoin related links

Bitcoin:
https://github.com/bitcoinbook/bitcoinbook
https://github.com/dvf/blockchain
https://github.com/protofire/blockchain-learning-path
https://github.com/ndrwnaguib/napster-filesharing-system/blob/master/peer/downloads/downloaded_1.txt
https://github.com/ndrwnaguib/napster-filesharing-system
https://github.com/yjjnls/awesome-blockchain
https://github.com/openblockchains/awesome-blockchains

My post on raygun

#Raygun, thank you for failing. It is OK to fail. Fail, move on. Our own President Biden failed recently. Shane Gillis failed on SNL. George Santos failed. Ingrid Andress failed on the national anthem. Doing well is better, but it is OK to fail. Who else?

Working on an autobiography / personal story book

My Life on the Computer, similar to mom's book. Here is a basic outline:
0. TRS 85
1. Austin
2. Dallas
3. Atlanta
4. City of Atlanta Engineering
5. Enterprise Engineer
6. More thoughts on the future

Random thoughts and media

 

Basic Math and Linear Algebra ML

Vectors, Matrices, Tensors:

Scalar - a single value (zero dimensions), e.g. x = 1.5
Vector - a one-dimensional array of values indexed 0 to N, e.g. x = (x1, x2, x3, ...)
Matrix - a two-dimensional array of values with shape X by Y
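A quick way to see the scalar/vector/matrix/tensor progression is through array shapes; here is a minimal NumPy sketch (my own example, not from the original post):

import numpy as np

scalar = np.array(1.5)                     # 0 dimensions, shape ()
vector = np.array([1.0, 2.0, 3.0])         # 1 dimension,  shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])             # 2 dimensions, shape (2, 3) -- X by Y
tensor = np.zeros((2, 3, 4))               # 3+ dimensions, shape (2, 3, 4)

for a in (scalar, vector, matrix, tensor):
    print(a.ndim, a.shape)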

What we learned about machine learning and deep learning

So we have learned a lot about machine learning and deep learning from Sebastian Raschka. See his blog: https://magazine.sebastianraschka.com/ And the following machine learning and deep learning courses:
https://sebastianraschka.com/blog/2021/ml-course.html
https://sebastianraschka.com/blog/2021/dl-course.html
The first set of courses covers kNN (k-nearest neighbors), decision trees, and popular Python ML libraries.
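As a tiny taste of the kNN material, here is a minimal scikit-learn sketch (my own example, not taken from the courses, which also build the algorithm from scratch):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Classify each test point by a majority vote of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))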

Start of Photography

With the bots taking over, I thought it was time to get creative. I am thinking photography and playing the trumpet. The trumpet I have done before. Here are my first posts. Enjoy. #photography #atlanta #atl

Running the llama2.c training end to end with your custom local raw vocab data - Part 1

This is the first part in a small series. So you have seen recent reports on LLMs, Llama2, chatbots, AI, and ChatGPT. Well, how does that work? How do you build a chatbot? An inference engine. There has been a lot of discussion about building these machine learning based models in the past two years, from 2022 to 2024, but one of the main obstacles to understanding and building the models is the cost tied to training and running them. In my example, with a small subset of data, it took a day or more to actually train on the smallest dataset. Here are some step by step approaches for building the model. The best FULL example, from training the model to the chatbot, comes from this project: https://github.com/karpathy/llama2.c He really has all the components here. Why not use that? You could use this project, but I want to add more ELI5 basic steps here. Also, he has some optimizations; I want to remove all of that and focus on running the system. I am going to continue in the n

AI LLM Fact Checking the June 2024 Debate

I am fact checking the debate. https://x.com/BerlinBrownMech

Random act of povray image

Just a random image, enjoy: an image of a sphere.

So my LLM Llama2 training did work

So it took about 4 hours on my Mac, without GPU acceleration, for this command to run:

python3 train.py --compile=False --vocab_source=custom --vocab_size=4096

And then I could run against the llama2.c code:

./run ./out/model.bin

I got this gibberish:

"pres es eluted эargared copy É Int beforepperunch          KarologfromwayClassistoryork ochidentAr}^ Allet Com easgoogleiden targetegaoman essпиgesscript non behS commandasesasesба amb before pervecnode agcolorkoeln conf Ma Setrat Textema governiowhere ##скогоchange.) respectankön knowPar namesiones неander für enrid muool medcia depBalityви rangehelова () del options ### voando](arget Thereise und descri L`,incless++readble oldredULLockabelutesphaires says буClientIC});viroo test only ser"

Updates on research: LLM and Llama2

So here are some more keywords and updates on learning LLMs and the like. I am running the code from here: https://github.com/karpathy/llama2.c Mainly the training piece. These ran fine:

python tinystories.py download
python tinystories.py pretokenize

But I had an issue with the time it takes to run this on a Mac machine without CUDA installed. This took several days to run:

python3 train.py --compile=False --vocab_source=custom --vocab_size=4096

Overriding: compile = False
Overriding: vocab_source = custom
Overriding: vocab_size = 4096
tokens per iteration will be: 131,072
breaks down as: 4 grad accum steps * 1 processes * 128 batch size * 256 max seq len
Initializing a new model from scratch
num decayed parameter tensors: 43, with 7,151,616 parameters
num non-decayed parameter tensors: 13, with 3,744 parameters
using fused AdamW: False
Created a PretokDataset with rng seed 42
Created a PretokDataset with rng seed 42
97 | loss 6.4552 | lr 4.850000e-05 | 1962367.71ms | mfu 0.01%
98 | loss 6.4167
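The "tokens per iteration" line from that log is just arithmetic over the training settings; here is a quick check of the breakdown the log prints (my own sketch):

# Reproduce the "tokens per iteration" figure from the train.py log above.
grad_accum_steps = 4
num_processes = 1
batch_size = 128
max_seq_len = 256

tokens_per_iter = grad_accum_steps * num_processes * batch_size * max_seq_len
print(f"{tokens_per_iter:,}")   # 131,072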

Ode to Lisp - Blog Entry

 I have never had much affinity for writing in natural language. I grew up enjoying programming, and that passion has continued for over 20 years. However, I felt compelled to write this blog entry because it helped me rethink and reframe my perspective on a particular piece of code. I'll be exploring various concepts, starting with one of the most fundamental data structures in computer science: the linked list. Instead of considering Lisp code as a mere "listing" of elements, try to envision it as a linked list of elements. https://berlinbrowndev.blogspot.com/2008/07/simple-lisp-implementation-in-java-ode.html
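To make the "Lisp code is a linked list" idea concrete, here is a minimal cons-cell sketch in Python (my own illustration, not from the linked 2008 post): each cell holds a value (car) and a pointer to the rest of the list (cdr).

# A Lisp-style cons cell: a value (car) plus a link to the rest (cdr).
def cons(car_value, cdr_value):
    return (car_value, cdr_value)

def car(cell):
    return cell[0]

def cdr(cell):
    return cell[1]

# The list (1 2 3) is just three linked cons cells ending in None (nil).
lst = cons(1, cons(2, cons(3, None)))

node = lst
while node is not None:
    print(car(node))
    node = cdr(node)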

On Wolfram Alpha and Cellular Automata

 When most computer users upload a profile image from their desktop to Facebook, they rarely consider the fundamental binary math rules underpinning digital devices. We know that 4 gigabytes of RAM is more memory than 512 megabytes, but we don't visualize the logic chips involved in an XOR $0x100, EAX operation for a 32-bit CISC processor. Software developers must consider memory management and how a computer's operating system loads their programs into memory. However, they typically don't think about VHDL logic circuit designs, data paths, arithmetic logic units, or the millions of transistors comprising a modern CPU. These low-level details are intentionally abstracted away from the user application developer. While modern CPUs have evolved dramatically over the past decade, early digital computing relied on simple Boolean operations. These fundamental rules were combined and replicated to load programs into memory and execute them. The principles controlling most digita
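Those simple Boolean rules are exactly what Wolfram's elementary cellular automata are built from; here is a minimal Rule 30 sketch (my own example), where each new cell is a table lookup on the Boolean pattern of its three-cell neighborhood.

# Elementary cellular automaton: the next state of each cell depends only
# on the (left, center, right) pattern in the previous row.
RULE = 30          # Wolfram rule number; its 8 bits form the lookup table
WIDTH, STEPS = 31, 15

cells = [0] * WIDTH
cells[WIDTH // 2] = 1              # start with a single "on" cell

for _ in range(STEPS):
    print("".join("#" if c else "." for c in cells))
    cells = [
        (RULE >> (cells[(i - 1) % WIDTH] * 4 +
                  cells[i] * 2 +
                  cells[(i + 1) % WIDTH])) & 1
        for i in range(WIDTH)
    ]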

On Unit Testing Updates

 I have been reading about four or five posts a day on unit testing, an obsession that has persisted for a long time. I've moved beyond the technical and practical considerations of unit testing frameworks and have finished debating whether to use JUnit, Mockito, or Karma. Now, I am more intrigued by the psychology of unit testing—who engages in it, who enjoys it, and who dislikes it? Unit testing is one of those concepts that are easy to learn but hard to master. For example, many people play chess when they are young but remain poor players throughout their lives. I am part of that majority. I have never dedicated hours to playing chess or attempting to master it. I don't recognize common patterns or have a well-developed endgame. I simply play with a basic understanding of the rules. Following good unit testing practices in your software development team is a lot like playing chess: easy to learn but difficult to master. However, there are significant differences—chess is a

Donald Trump Guilty on All Counts

 https://www.cnn.com/politics/live-news/trump-hush-money-trial-05-30-24/index.html

Me working

I am using social media; this is me working.

Llama2 analysis with C and Java

Llama2 is an advanced language model that serves here to demonstrate how large language models can be implemented in different programming languages. In this post, we will delve into the implementation of Llama2 in both C and Java, two popular programming languages with distinct characteristics. We will explore the intricacies of coding in these languages, analyze performance differences, and discuss the implications of these differences in real-world applications.

Background

Before we dive into the implementations, let's briefly touch upon what makes C and Java suitable for different types of projects:

C: Known for its performance and close-to-hardware capabilities, C is widely used in system programming, embedded systems, and situations where performance is critical.

Java: With its object-oriented nature and platform independence due to the Java Virtual Machine (JVM), Java is preferred for enterprise-level applications, Android app development, and large-scale systems.

Implementation Details.

Exploring My GitHub Projects: Java, Scala, and Beyond

As a developer based in the Atlanta area, I have had the pleasure of working on a variety of interesting projects, primarily focusing on Java programming and unit testing. Today, I want to share some of my favorite projects from my GitHub repository and discuss the technologies and methodologies I employed.

Java Projects

Simplest HTTP Server

One of my notable projects is a simple HTTP web server written in Java. This server demonstrates basic networking and multithreading concepts, using ServerSocket and Socket classes to handle client requests. This project is a great example of how to set up a lightweight server for educational purposes or simple applications. You can check out the project here.

Double Buffering Example

This project showcases a simple Java 2D graphics application that implements double buffering to reduce flickering. It’s a foundational concept for game development and interactive graphics. The project illustrates the use of JPanel and Graphics classes to manag

Mastering Unit Testing in Java: Insights from an Atlanta-Based Developer

Introduction

Welcome to my blog! I'm Berlin Brown, a seasoned software engineer from Atlanta with over 15 years of experience. Here, I share my journey and insights on Java programming, unit testing, and more. Explore my GitHub for more code samples and projects.

Key Topics

Introduction to Unit Testing in Java
* Importance of unit testing
* Key frameworks: JUnit and TestNG

Best Practices for Unit Testing
* Writing meaningful test cases
* Mocking dependencies with Mockito
* Ensuring code coverage

Common Pitfalls and How to Avoid Them
* Avoiding brittle tests
* Managing test data
* Refactoring tests

Case Studies and Examples
* Example projects from my GitHub
* Practical applications and tutorials