Commit Graph

16 Commits

Author SHA1 Message Date
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
0fc95c9f0c Add a dequantize command to tensor-tools. (#1565)
* Add a dequantize command to tensor-tools.

* Clippy fixes.
2024-01-11 11:21:01 +01:00
deee7612da Quantized version of mistral. (#1009)
* Quantized version of mistral.

* Integrate the quantized mistral variant.

* Use the quantized weight files.

* Tweak the quantization command.

* Fix the dtype when computing the rotary embeddings.

* Update the readme with the quantized version.

* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00
ccf352f3d1 Use yoke to provide a self-referential container for mmaped safetenso… (#939)
* Use yoke to provide a self-referential container for mmaped safetensor files.

* Add the new self-owned type for safetensor files without removing the previous version.

* Add routing.

* Add an initializer for the case of multiple files.
2023-09-23 15:43:11 +01:00
912a3d63b0 Use the proper block size for quantizing models. (#933)
* Use the proper block size for quantizing models.

* Use the proper dimension.
2023-09-22 21:36:56 +01:00
3b557765e8 T5 quantized example (#922)
* Load gguf files for the quantized t5.

* Add the quantized t5 example.

* Allow for loading local files.

* Add some support for quantizing safetensor files.

* Transpose before quantizing.

* Quantized t5.

* Retrieve the weights from the hub.
2023-09-21 12:33:15 +01:00
be471d50ab Llama quantization. (#625) 2023-08-27 14:08:15 +01:00
7151f2cf63 Add the quantize command. (#624)
* Add the quantize command.

* Bugfix for writing gguf files.

* And add a comment.
2023-08-27 11:35:19 +01:00
2cde0cb74b More pickle support. (#588)
* More pickle support.

* Be more verbose.
2023-08-24 18:45:10 +01:00
0764741cc4 Handle GGUF files in tensor-tools. (#558) 2023-08-23 06:32:07 +01:00
551409092e Small tweaks to tensor-tools. (#517) 2023-08-19 16:50:26 +01:00
6431140250 Retrieve tensor data from PyTorch files. (#516) 2023-08-19 15:57:18 +01:00
607ffb9f1e Retrieve more information from PyTorch checkpoints. (#515)
* Retrieve more information from PyTorch checkpoints.

* Add enough support to load dino-v2 backbone weights.
2023-08-19 15:05:34 +01:00
f861a9df6e Add ggml support to tensor-tools (#512)
* Pickle work-in-progress.

* More unpickling.

* More pickling.

* Proper handling of setitems.

* Clippy.

* Again more pickling.

* Restore the example.

* Add enough pickle support to get the list of tensors.

* Read the data from zip files.

* Retrieve the tensor shape.

* Extract the size and dtype.

* More storage types.

* Improve the destructuring.

* Also support ggml files.
2023-08-19 11:45:22 +01:00
ad33715c61 Preliminary support for importing PyTorch weights. (#511)
* Pickle work-in-progress.

* More unpickling.

* More pickling.

* Proper handling of setitems.

* Clippy.

* Again more pickling.

* Restore the example.

* Add enough pickle support to get the list of tensors.

* Read the data from zip files.

* Retrieve the tensor shape.

* Extract the size and dtype.

* More storage types.

* Improve the destructuring.
2023-08-19 11:26:32 +01:00
90ff04e77e Add the tensor-tools binary. (#510) 2023-08-19 09:06:44 +01:00