ON DeepSeek

"They use a Mixture-of-Experts (MoE) architecture, where only 37B parameters are activated for each token out of the total 671B. This sparse"

https://composio.dev/blog/notes-on-new-deepseek-v3/ 
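To make the quoted idea concrete, here is a minimal, hypothetical sketch of sparse MoE routing in PyTorch: a router scores a set of expert MLPs per token and only the top-k are actually run, which is why only a fraction of the total parameters (37B of 671B in DeepSeek V3's case) is active per token. The sizes, class names, and top-k value below are illustrative stand-ins, not DeepSeek's actual implementation.

```python
# Illustrative sparse MoE layer (not DeepSeek's code): route each token to
# top-k experts so only a fraction of the parameters run per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                # run only the selected experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(5, 64)
print(TinyMoE()(tokens).shape)                        # torch.Size([5, 64])
```

With 8 experts and top_k=2, each token touches only a quarter of the expert parameters; scale the same routing idea up and you get the 37B-active-of-671B ratio described in the article.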
