Commit Graph

12 Commits

Author SHA1 Message Date
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
f9ecc84477 GQA support in the quantized model. (#555)
* GQA support in the quantized model.

* Fix the reshaping.

* Fix the main llama model.

* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
44420d8ae1 Add some llama-v2 variants. (#545) 2023-08-22 08:35:15 +01:00
4300864ce9 Add some optional repeat penalty. (#535) 2023-08-21 09:59:13 +01:00
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
d73ca3d28e Line up the llama.cpp implementation with the candle one. (#518)
* Separate the prompt stats from the post-prompt ones in the quantized example.

* Slightly nicer output printing.

* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
5f30c1e1e0 Add the whisper small model. (#490) 2023-08-17 15:48:34 +01:00
ad7c53953b Add a verbose-prompt mode, similar to llama.cpp. (#489) 2023-08-17 15:26:44 +01:00
d32e8199cd Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.

* Add the rms-norm variant.

* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00