20512ba408
Return the metadata in the gguf pyo3 bindings. (#729)
* Return the metadata in the gguf pyo3 bindings.
* Read the metadata in the quantized llama example.
* Get inference to work on gguf files.
2023-09-04 07:07:00 +01:00
21109e1983
Recommend using maturin. (#717)
2023-09-02 16:19:35 +01:00
ad796eb4be
More quantized llama in python. (#716)
* More quantized llama in python.
* Expose a couple more functions.
* Apply the last layer.
* Use the vocab from the ggml files.
2023-09-02 13:41:48 +01:00
e8e33752f4
Sketch a quantized llama using the pyo3 api. (#715)
* Sketch a quantized llama using the pyo3 api.
* Add more ops.
* Expose a few more functions to use in the quantized model.
* RoPE embeddings.
* Get the forward pass to work.
2023-09-02 11:26:05 +01:00