candle-gemma: 2b and 7b LLMs from Google DeepMind

Gemma is a collection of lightweight open models published by Google DeepMind, available in 2b and 7b variants.

To use the example below, you first have to accept the license on the Gemma repo on the HuggingFace Hub, and then set up your access token with the HuggingFace CLI login command.
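
One way to do this, assuming the huggingface-cli tool from the huggingface_hub package is installed, is to run the login command and paste your token when prompted:

$ huggingface-cli login

The token is then stored locally so the example can download the model weights from the Hub.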

Running the example

$ cargo run --example gemma --release -- --prompt "fn count_primes(max_n: usize)"
fn count_primes(max_n: usize) -> usize {
    let mut primes = vec![true; max_n];
    for i in 2..=max_n {
        if primes[i] {
            for j in i * i..max_n {
                primes[j] = false;
             }
         }
    }
    primes.len()
}
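
The command above runs on CPU. Assuming this crate exposes cuda and flash-attn Cargo features for GPU builds (the feature names here are an assumption, not taken from the example itself), the same prompt can be run with GPU acceleration and flash attention along these lines:

$ cargo run --example gemma --release --features "cuda,flash-attn" -- --prompt "fn count_primes(max_n: usize)"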