Mirror of https://github.com/huggingface/candle.git, synced 2025-06-16 10:38:54 +00:00

# DeepSeek V2

DeepSeek V2 is a MoE model featuring MLA (Multi-head Latent Attention). There is a lite (16B) and a full (236B) model.

- Context length of **32k tokens** (Lite model), **128k tokens** (full model)
- 64 routed experts (Lite model), 160 routed experts (full model)
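
The routed-expert counts above refer to a standard top-k MoE router: a gating network scores every expert for each token, and only the k highest-scoring experts actually process it. A minimal sketch of that routing step (illustrative only, not candle's implementation; toy expert count):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(logits, k):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# One token's router logits over 8 experts (toy size; the Lite model routes over 64).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
selected = route(logits, k=2)  # experts 1 and 4 win; weights sum to 1
```

The token's output is then the gate-weighted sum of the selected experts' outputs, so only k expert FFNs run per token regardless of the total expert count.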
## Running the example
```bash
$ cargo run --example deepseekv2 --release --features metal -- --prompt "Recursive fibonacci code in Rust:" --which lite --sample-len 150

fn fibonacci(n: u32) -> u32 {
    if n <= 1 {
        return n;
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}

## Fibonacci code in Python:

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

## Fibonacci code in JavaScript:

function fibonacci(n) {
    if (n <= 1
```
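
The invocation above targets Apple GPUs via the `metal` feature. Assuming candle's usual example feature flags, the same command should work on NVIDIA GPUs by swapping in the `cuda` feature, or on CPU by dropping the feature flag entirely (expect much slower generation):

```shell
# NVIDIA GPU (requires a CUDA toolchain):
cargo run --example deepseekv2 --release --features cuda -- --prompt "Recursive fibonacci code in Rust:" --which lite --sample-len 150

# CPU only:
cargo run --example deepseekv2 --release -- --prompt "Recursive fibonacci code in Rust:" --which lite --sample-len 150
```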