5a363dbc26
Adds a check for 7b-zephyr and uses the correct template ( #1283 )
* Adds a check for 7b-zephyr and uses the correct template
* Handle zephyr as mistral.
* Disable the protoc bits of the CI.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-06 21:05:39 +01:00
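A minimal sketch of the template dispatch these two zephyr commits imply: the weights are loaded as a mistral model, but the prompt must use zephyr's own chat template. The `Which` enum and the exact template strings below are illustrative assumptions, not the example's actual code:

```rust
// Hypothetical sketch: pick a chat template from the model variant.
enum Which {
    Mistral7bInstruct,
    Zephyr7b, // handled as mistral for the weights, not for the prompt
}

fn format_prompt(which: &Which, prompt: &str) -> String {
    match which {
        Which::Mistral7bInstruct => format!("[INST] {prompt} [/INST]"),
        Which::Zephyr7b => {
            format!("<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>")
        }
    }
}

fn main() {
    println!("{}", format_prompt(&Which::Zephyr7b, "Why is the sky blue?"));
}
```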
620c94d12e
Add support for Zephyr-7b in the quantized model. ( #1124 )
2023-10-18 17:31:26 +01:00
f6054e9d60
Fix the prompt for mistral when using instruct/interactive mode. ( #1013 )
2023-10-01 06:44:30 +01:00
328167ec04
Integrate TheBloke quantized mistral weights. ( #1012 )
2023-09-30 22:39:42 +01:00
cbd36157ac
Add a gif to the quantized readme. ( #833 )
* Add a gif to the quantized readme.
* gif update.
2023-09-13 08:43:52 +01:00
e82fcf1c59
Add more example readmes. ( #828 )
* Add more readmes.
* Add a readme for dinov2.
* Add some skeleton files for a couple more examples.
* More whisper details.
2023-09-12 17:21:24 +01:00
805bf9ffa7
Implement top_p / nucleus sampling ( #819 )
* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
2023-09-12 18:10:16 +02:00
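As a rough standalone sketch of what nucleus sampling does (the example itself does this inside its LogitsProcessor; the helper below assumes `probs` is already a softmax output and `r` is a uniform draw in [0, 1)):

```rust
// Top-p (nucleus) sampling: restrict sampling to the smallest set of
// tokens whose cumulative probability reaches top_p, then sample
// proportionally within that set.
fn sample_top_p(probs: &[f32], top_p: f32, r: f32) -> usize {
    // Rank tokens by descending probability.
    let mut indices: Vec<usize> = (0..probs.len()).collect();
    indices.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    // Keep the smallest prefix whose cumulative mass reaches top_p.
    let mut cum = 0f32;
    let mut nucleus = Vec::new();
    for &i in &indices {
        nucleus.push(i);
        cum += probs[i];
        if cum >= top_p {
            break;
        }
    }
    // Inverse-CDF sampling within the renormalized nucleus.
    let mut target = r * cum;
    for &i in &nucleus {
        if probs[i] >= target {
            return i;
        }
        target -= probs[i];
    }
    *nucleus.last().unwrap()
}

fn main() {
    let probs = [0.5, 0.3, 0.1, 0.05, 0.05];
    println!("sampled token: {}", sample_top_p(&probs, 0.9, 0.42));
}
```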
bb23b90b1d
Add a small readme for the quantized example. ( #823 )
2023-09-12 10:17:31 +01:00
35f72514f5
Move more models to candle-transformers ( #796 )
* Move dinov2.
* Move efficientnet.
* Move the quantized llama model.
* Move segment-anything.
2023-09-10 10:20:18 +01:00
7cef35c84d
Tweak some quantized args ( #692 )
* Print the args + change the default temp/repeat penalty.
* Minor formatting tweak.
2023-08-31 17:25:21 +01:00
7509c98970
Interactive mode for the quantized model. ( #690 )
2023-08-31 10:52:42 +01:00
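Interactive mode boils down to a read-eval loop around the generation code; a hypothetical standalone skeleton, with the model call elided:

```rust
use std::io::{self, BufRead, Write};

fn main() {
    let stdin = io::stdin();
    loop {
        print!("> ");
        io::stdout().flush().unwrap();
        let mut prompt = String::new();
        if stdin.lock().read_line(&mut prompt).unwrap() == 0 {
            break; // EOF ends the session
        }
        // ... feed `prompt` to the model and stream tokens back ...
        println!("(model reply for: {})", prompt.trim());
    }
}
```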
a1a5ab8b0a
Neon-optimized vecdot ( #666 )
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
72ebb12bca
Remove some dead-code annotations. ( #629 )
* Remove some dead-code annotations.
* More dead code removal.
* One more.
* CI fix.
2023-08-27 18:52:33 +01:00
6e485f2deb
Add some optional repeat penalty. ( #623 )
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00
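A minimal sketch of what an optional repeat penalty typically does, assuming the llama.cpp convention of damping the logits of recently seen tokens (the helper name and mutable-slice signature are illustrative):

```rust
// Repeat penalty: divide the logits of recently generated tokens by
// `penalty` (> 1.0) to discourage loops. Negative logits are
// multiplied instead, so they become *less* likely either way.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, recent_tokens: &[usize]) {
    for &tok in recent_tokens {
        if let Some(l) = logits.get_mut(tok) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5];
    apply_repeat_penalty(&mut logits, 1.1, &[0, 1]);
    println!("{logits:?}"); // token 0 damped, token 1 pushed further down
}
```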
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
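For reference, a generic (non-SIMD) Q8_0 vecdot looks roughly like this; the block struct mirrors the ggml layout except that the scale is simplified from f16 to f32 here:

```rust
// A Q8_0 block: one scale plus 32 signed bytes of quantized values.
#[allow(non_camel_case_types)]
struct BlockQ8_0 {
    d: f32,       // per-block scale (f16 in the actual format)
    qs: [i8; 32], // 32 quantized values
}

fn vec_dot_q8_0(xs: &[BlockQ8_0], ys: &[BlockQ8_0]) -> f32 {
    xs.iter()
        .zip(ys.iter())
        .map(|(x, y)| {
            // Integer dot product of the two blocks...
            let isum: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&a, &b)| a as i32 * b as i32)
                .sum();
            // ...rescaled by both block scales.
            x.d * y.d * isum as f32
        })
        .sum()
}

fn main() {
    let x = BlockQ8_0 { d: 0.5, qs: [1; 32] };
    let y = BlockQ8_0 { d: 2.0, qs: [3; 32] };
    println!("{}", vec_dot_q8_0(&[x], &[y])); // 0.5 * 2.0 * (32 * 1 * 3) = 96
}
```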
4ee1cf038a
Get the rms epsilon from GGUF. ( #565 )
2023-08-23 11:40:20 +01:00
0f4ff8a739
Fix the quantized example. ( #564 )
2023-08-23 11:09:55 +01:00
89a00b56cc
Add chat models in the quantized example. ( #551 )
* Add chat models in the quantized example
* cargo fmt
2023-08-23 11:05:33 +01:00
508d34daf2
GGUF support in the quantized model. ( #559 )
* GGUF support in the quantized model.
* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
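A rough sketch of what reading a GGUF file with candle's `gguf_file` module looks like (candle-core imported as `candle`, as the examples do; the metadata keys follow the GGUF llama conventions, and the exact API may differ across candle versions):

```rust
use candle::quantized::gguf_file;

fn main() -> candle::Result<()> {
    // Parse the GGUF header: the metadata map plus the tensor index.
    // Assumes a model.gguf file is present in the working directory.
    let mut file = std::fs::File::open("model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;

    // Hyperparameters come from the metadata instead of being
    // hardcoded, e.g. the rms-norm epsilon picked up by #565 above.
    let head_count = content.metadata["llama.attention.head_count"].to_u32()?;
    let rms_eps = content.metadata["llama.attention.layer_norm_rms_epsilon"].to_f32()?;
    println!("heads: {head_count}, rms epsilon: {rms_eps}");
    Ok(())
}
```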
f9ecc84477
GQA support in the quantized model. ( #555 )
* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper GQA factor from the model kind.
2023-08-22 19:41:10 +01:00
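The bookkeeping behind GQA is small enough to sketch: with fewer key/value heads than query heads, each group of query heads shares one kv head, and the repeat factor is inferred from the model kind. The helper below is an illustrative standalone version of that index math:

```rust
// Hypothetical helper: map a query head to the kv head it shares under
// grouped-query attention. n_rep is what the commit infers from the
// model kind (1 for a llama-7b-style model, 8 for a 70b-style one).
fn kv_head_for_query_head(query_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    let n_rep = n_head / n_kv_head;
    query_head / n_rep
}

fn main() {
    // 64 query heads sharing 8 kv heads: heads 0..8 all use kv head 0.
    assert_eq!(kv_head_for_query_head(7, 64, 8), 0);
    assert_eq!(kv_head_for_query_head(8, 64, 8), 1);
}
```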
44420d8ae1
Add some llama-v2 variants. ( #545 )
2023-08-22 08:35:15 +01:00
4300864ce9
Add some optional repeat penalty. ( #535 )
2023-08-21 09:59:13 +01:00
a1812f934f
Add a yolo-v3 example. ( #528 )
* Add a couple of functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
2023-08-20 18:19:37 +01:00
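The newly introduced elementwise ops are used like this (a minimal candle sketch; `maximum`/`minimum` are the tensor methods this PR adds):

```rust
use candle::{Device, Tensor};

fn main() -> candle::Result<()> {
    let a = Tensor::new(&[1f32, 4., 2.], &Device::Cpu)?;
    let b = Tensor::new(&[3f32, 0., 2.], &Device::Cpu)?;
    // Elementwise max/min, the kind of op yolo's box post-processing
    // (e.g. intersection-over-union) relies on.
    println!("max: {}", a.maximum(&b)?);
    println!("min: {}", a.minimum(&b)?);
    Ok(())
}
```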
d73ca3d28e
Line up the llama.cpp implementation with the candle one. ( #518 )
* Separate the prompt stats from the post-prompt ones in the quantized example.
* Slightly nicer output printing.
* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
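Splitting the stats matters because the prompt is processed in one batched forward pass while post-prompt tokens are generated one at a time, so their tokens/sec figures differ. A standalone sketch of the two timers, with sleeps standing in for the model calls:

```rust
use std::time::Instant;

fn main() {
    // Time the batched pass over the whole prompt.
    let start_prompt = Instant::now();
    let prompt_tokens = 25usize;
    std::thread::sleep(std::time::Duration::from_millis(50)); // model forward pass
    let prompt_dt = start_prompt.elapsed();

    // Time the one-token-at-a-time generation loop separately.
    let start_gen = Instant::now();
    let sampled_tokens = 100usize;
    std::thread::sleep(std::time::Duration::from_millis(500)); // generation loop
    let gen_dt = start_gen.elapsed();

    println!(
        "{:4} prompt tokens processed: {:.2} token/s",
        prompt_tokens,
        prompt_tokens as f64 / prompt_dt.as_secs_f64(),
    );
    println!(
        "{:4} tokens generated: {:.2} token/s",
        sampled_tokens,
        sampled_tokens as f64 / gen_dt.as_secs_f64(),
    );
}
```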
c78ce76501
Add a simple Module trait and implement it for the various nn layers ( #500 )
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
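The trait itself is a one-method interface, essentially the following; the toy `Scale` layer is an illustrative stand-in for how candle's nn layers (and `QMatMul`) implement it:

```rust
use candle::{Result, Tensor};

// The simple Module trait: anything mapping a tensor to a tensor.
trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

// A toy layer implementing it, the same shape as Linear/LayerNorm/etc.
struct Scale(f64);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        xs.affine(self.0, 0.)
    }
}

fn main() -> Result<()> {
    let xs = Tensor::new(&[1f32, 2., 3.], &candle::Device::Cpu)?;
    println!("{}", Scale(2.0).forward(&xs)?);
    Ok(())
}
```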
557b2c28dd
Q6K quantization ( #495 )
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
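For orientation, the ggml Q6_K super-block packs 256 weights into 210 bytes, i.e. 6.5625 bits per weight. A sketch of the layout, with the f16 super-block scale kept as raw u16 bits:

```rust
// The ggml Q6_K super-block layout (256 weights per block), roughly:
#[allow(non_camel_case_types)]
#[repr(C)]
struct BlockQ6K {
    ql: [u8; 128],    // lower 4 bits of the 6-bit quants
    qh: [u8; 64],     // upper 2 bits of the 6-bit quants
    scales: [i8; 16], // per-16-element sub-block scales
    d: u16,           // f16 super-block scale, stored as raw bits here
}

fn main() {
    // 210 bytes for 256 weights: 6.5625 bits per weight.
    println!("{} bytes per 256 weights", std::mem::size_of::<BlockQ6K>());
}
```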
5f30c1e1e0
Add the whisper small model. ( #490 )
2023-08-17 15:48:34 +01:00
ad7c53953b
Add a verbose-prompt mode, similar to llama.cpp. ( #489 )
2023-08-17 15:26:44 +01:00
d32e8199cd
Layer norm tweaks ( #482 )
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
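The rms-norm variant mentioned above differs from full layer-norm in that it skips the mean subtraction and the bias; a scalar sketch of the computation:

```rust
// rms-norm: scale by the reciprocal root mean square, then apply the
// learned per-channel weight (no mean centering, no bias).
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|&x| x * x).sum::<f32>() / xs.len() as f32;
    let scale = (mean_sq + eps).sqrt().recip();
    xs.iter().zip(weight).map(|(&x, &w)| x * scale * w).collect()
}

fn main() {
    println!("{:?}", rms_norm(&[1.0, 2.0, 3.0], &[1.0, 1.0, 1.0], 1e-5));
}
```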
d99cac3ec3
Move the avx specific bits to a separate file. ( #481 )
2023-08-17 09:01:06 +01:00