candle-gemma: 2b and 7b LLMs from Google DeepMind

Gemma is a collection of lightweight open models published by Google DeepMind, available in 2b and 7b variants.

To use the example below, you first have to accept the license on the Gemma repo on the HuggingFace Hub, and then set up your access token with the HuggingFace CLI login command.
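
One way to do this, assuming the huggingface-cli tool from the huggingface_hub package is installed, is to run the login command and paste your token when prompted:

$ huggingface-cli login

The token is then stored locally so the example can download the model weights from the Hub.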

Running the example

$ cargo run --example gemma --release -- --prompt "fn count_primes(max_n: usize)"
fn count_primes(max_n: usize) -> usize {
    let mut primes = vec![true; max_n];
    for i in 2..=max_n {
        if primes[i] {
            for j in i * i..max_n {
                primes[j] = false;
             }
         }
    }
    primes.len()
}
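
The command above runs on CPU. Assuming this crate exposes cuda and flash-attn Cargo features for GPU builds (the feature names here are an assumption, not taken from the example itself), the same prompt can be run with GPU acceleration and flash attention along these lines:

$ cargo run --example gemma --release --features "cuda,flash-attn" -- --prompt "fn count_primes(max_n: usize)"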