Commit Graph

30 Commits

Author SHA1 Message Date
5a363dbc26 Adds check for 7b-zephyr and uses correct template (#1283)
* Adds check for 7b-zephyr and uses correct template

* Handle zephyr as mistral.

* Disable the protoc bits of the CI.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-06 21:05:39 +01:00
620c94d12e Add support for Zephyr-7b in the quantized model. (#1124) 2023-10-18 17:31:26 +01:00
f6054e9d60 Fix the prompt for mistral when using instruct/interactive mode. (#1013) 2023-10-01 06:44:30 +01:00
328167ec04 Integrate TheBloke quantized mistral weights. (#1012) 2023-09-30 22:39:42 +01:00
cbd36157ac Add a gif to the quantized readme. (#833)
* Add a gif to the quantized readme.

* gif update.
2023-09-13 08:43:52 +01:00
e82fcf1c59 Add more example readmes. (#828)
* Add more readmes.

* Add a readme for dinov2.

* Add some skeleton files for a couple more examples.

* More whisper details.
2023-09-12 17:21:24 +01:00
805bf9ffa7 Implement top_p / nucleus sampling (#819)
* Implement top_p / nucleus sampling

* Update changelog

* rustfmt

* Add tests

* Fix clippy warning

* Fix another clippy error
2023-09-12 18:10:16 +02:00
bb23b90b1d Add a small readme for the quantized example. (#823) 2023-09-12 10:17:31 +01:00
35f72514f5 Move more models to candle-transformers (#796)
* Move dinov2.

* Move efficientnet.

* Move the quantized llama model.

* Move segment-anything.
2023-09-10 10:20:18 +01:00
7cef35c84d Tweak some quantized args (#692)
* Print the args + change the default temp/repeat penalty.

* Minor formatting tweak.
2023-08-31 17:25:21 +01:00
7509c98970 Interactive mode for the quantized model. (#690) 2023-08-31 10:52:42 +01:00
a1a5ab8b0a Neon optimized vecdot (#666)
* Q5k vecdot.

* Add the q3k vecdot.

* Q2k vecdot.

* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
72ebb12bca Remove some dead-code annotations. (#629)
* Remove some dead-code annotations.

* More dead code removal.

* One more.

* CI fix.
2023-08-27 18:52:33 +01:00
6e485f2deb Add some optional repeat penalty. (#623)
* Add some optional repeat penalty.

* Add the missing files.
2023-08-27 10:48:45 +01:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
4ee1cf038a Get the rms epsilon from GGUF. (#565) 2023-08-23 11:40:20 +01:00
0f4ff8a739 Fix the quantized example. (#564) 2023-08-23 11:09:55 +01:00
89a00b56cc add chat models in quantized example (#551)
* add chat models in quantized example

* cargo fmt
2023-08-23 11:05:33 +01:00
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
f9ecc84477 GQA support in the quantized model. (#555)
* GQA support in the quantized model.

* Fix the reshaping.

* Fix the main llama model.

* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
44420d8ae1 Add some llama-v2 variants. (#545) 2023-08-22 08:35:15 +01:00
4300864ce9 Add some optional repeat penalty. (#535) 2023-08-21 09:59:13 +01:00
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
d73ca3d28e Line up the llama.cpp implementation with the candle one. (#518)
* Separate the prompt stats from the post-prompt ones in the quantized example.

* Slightly nicer output printing.

* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
5f30c1e1e0 Add the whisper small model. (#490) 2023-08-17 15:48:34 +01:00
ad7c53953b Add a verbose-prompt mode, similar to llama.cpp. (#489) 2023-08-17 15:26:44 +01:00
d32e8199cd Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.

* Add the rms-norm variant.

* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00