Get the ggml-based llama to generate some text. (#464)

* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
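The "sampling loop" steps above can be sketched as a minimal greedy decoding loop. The `forward` closure below is a toy stand-in, not the quantized-llama forward pass from this commit; argmax over the logits is the simplest sampling strategy and is only an illustration.

```rust
// Minimal greedy sampling-loop sketch. The toy "model" is an assumption for
// the example: it just makes the next token (last + 1) % 4 most likely.

fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Toy "model": the logit is highest at (last_token + 1) % 4.
    let forward = |tokens: &[u32]| -> Vec<f32> {
        let last = *tokens.last().unwrap() as usize;
        let mut logits = vec![0f32; 4];
        logits[(last + 1) % 4] = 1.0;
        logits
    };
    let mut tokens = vec![0u32];
    for _ in 0..5 {
        // Run the model on the tokens so far, pick the best next token,
        // and append it before the next iteration.
        let logits = forward(&tokens);
        tokens.push(argmax(&logits) as u32);
    }
    assert_eq!(tokens, vec![0, 1, 2, 3, 0, 1]);
    println!("{tokens:?}");
}
```

A real loop would also stop on an end-of-sequence token and typically sample from a softmax distribution rather than taking the argmax.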
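The "quantize/dequantize test" mentioned above can be illustrated with a round-trip check. The scheme below is a simple symmetric 8-bit absmax quantization chosen for the example; it is not ggml's actual block format (e.g. q4_0), just an assumption to show the shape of such a test.

```rust
// Hypothetical symmetric 8-bit absmax quantization: scale the largest
// magnitude to 127, store i8 values plus one f32 scale.

fn quantize(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0. { 1. } else { amax / 127. };
    let qs = xs.iter().map(|&x| (x / scale).round() as i8).collect();
    (scale, qs)
}

fn dequantize(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..16).map(|i| (i as f32 - 8.) / 4.).collect();
    let (scale, qs) = quantize(&xs);
    let ys = dequantize(scale, &qs);
    // The round-trip error should stay within half a quantization step.
    let max_err = xs
        .iter()
        .zip(&ys)
        .map(|(x, y)| (x - y).abs())
        .fold(0f32, f32::max);
    assert!(max_err <= scale / 2. + 1e-6);
    println!("max round-trip error: {max_err}");
}
```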
This commit is contained in:
Laurent Mazare
2023-08-16 12:41:07 +01:00
committed by GitHub
parent fec87e86f5
commit 3071134788
7 changed files with 381 additions and 37 deletions


@@ -210,6 +210,10 @@ impl Error {
Self::Wrapped(Box::new(err))
}
pub fn msg(err: impl std::error::Error + Send + Sync + 'static) -> Self {
Self::Msg(err.to_string())
}
pub fn bt(self) -> Self {
let backtrace = std::backtrace::Backtrace::capture();
match backtrace.status() {
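The diff above shows an error helper that stores a message string and optionally attaches a backtrace. A self-contained sketch of that pattern follows; the enum here is a simplified stand-in for the crate's real `Error` type, with a hypothetical `WithBacktrace` variant to hold the captured trace.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

// Simplified stand-in for the crate's Error type.
#[derive(Debug)]
enum Error {
    Msg(String),
    WithBacktrace {
        inner: Box<Error>,
        backtrace: Box<Backtrace>,
    },
}

impl Error {
    // Build a Msg variant from any std error, as in the diff.
    fn msg(err: impl std::error::Error + Send + Sync + 'static) -> Self {
        Self::Msg(err.to_string())
    }

    // Attach a backtrace only when one was actually captured
    // (i.e. RUST_BACKTRACE=1 or RUST_LIB_BACKTRACE=1 is set).
    fn bt(self) -> Self {
        let backtrace = Backtrace::capture();
        match backtrace.status() {
            BacktraceStatus::Captured => Self::WithBacktrace {
                inner: Box::new(self),
                backtrace: Box::new(backtrace),
            },
            _ => self,
        }
    }
}

fn main() {
    let parse_err = "not a number".parse::<i32>().unwrap_err();
    let err = Error::msg(parse_err).bt();
    println!("{err:?}");
}
```

Capturing the backtrace lazily in `bt` keeps the common error path cheap: when backtraces are disabled, `status()` is not `Captured` and the error is returned unchanged.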