098909de40
Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.
* Add some testing for q8k.
* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
c5f45887dc
Add some tracing to the quantized example. (#473)
2023-08-16 18:49:08 +01:00
2e206e269d
Add the model argument. (#471)
2023-08-16 16:41:06 +01:00
575e88a999
Add a quantized test that uses negative values. (#470)
* Add a quantized test that uses negative values.
* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
a9101700b6
Add a kv-cache to the quantized llama example. (#466)
* Add a kv-cache to the quantized llama example.
* Also print the prompt.
* Bugfix in q6k dequantizing.
* Another bugfix.
2023-08-16 14:28:42 +01:00
3071134788
Get the ggml-based llama to generate some text. (#464)
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval into the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
2023-08-16 12:41:07 +01:00
ca449f9ee1
Add quantized tensors. (#458)
* Add quantized tensors.
* Implement the `Debug` trait for QTensor.
* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c
Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. (#456)
2023-08-15 20:26:27 +01:00
9e7e6e0288
Add dequantization for ggml's `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` (#407)
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`
* Expose `tensor_from_ggml` for external usage
* Bugfixes and example
2023-08-13 23:22:57 +01:00