64d4038e4f
Mention rwkv v6 in the readmes. ( #1784 )
2024-03-01 08:58:30 +01:00
979deaca07
EfficientVit (MSRA) model ( #1783 )
* Add EfficientVit (Microsoft Research Asia) model.
* Mention models in README
2024-03-01 08:53:52 +01:00
b485e4b6ee
add models of rwkv v6 and quantized rwkv v6 ( #1781 )
* add models of rwkv v6 and quantized rwkv v6
* fix ci clippy fail
2024-03-01 08:37:56 +01:00
2c95b7394a
Handle Q5_0 and Q5_1 quants in cuda.
2024-02-29 10:54:01 +01:00
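For context, Q5_0 in the GGML scheme packs 32 weights per block: an f16 scale `d`, a `u32` of fifth bits `qh`, and 16 bytes of 4-bit low nibbles; Q5_1 is analogous but adds an f16 offset and keeps the values unsigned. A CPU-side reference of the dequantization the new CUDA path mirrors (a sketch following the ggml layout; names are illustrative):

```rust
/// Dequantize one Q5_0 block (32 weights). `d` is the f16 scale widened to
/// f32, `qh` holds each weight's fifth bit, `qs` packs two 4-bit values per byte.
fn dequant_q5_0_block(d: f32, qh: u32, qs: &[u8; 16], out: &mut [f32; 32]) {
    for j in 0..16 {
        let hi0 = ((qh >> j) & 1) << 4;
        let hi1 = ((qh >> (j + 16)) & 1) << 4;
        // Recombine the 5-bit magnitude and re-center it around zero.
        let x0 = ((qs[j] & 0x0f) as u32 | hi0) as i32 - 16;
        let x1 = ((qs[j] >> 4) as u32 | hi1) as i32 - 16;
        out[j] = x0 as f32 * d;
        out[j + 16] = x1 as f32 * d;
    }
}
```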
4fd00b8900
Add the StarCoder2 model. ( #1779 )
* Add the StarCoder2 model.
* Add the example code and get things to work.
* And also tweak the readme.
2024-02-28 21:02:41 +01:00
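Like the other candle LLM examples, the StarCoder2 example boils down to a sampling loop over `LogitsProcessor`. A condensed sketch, assuming `model`, `device`, the prompt `tokens: Vec<u32>`, and `max_new_tokens` are already set up, and a `forward(&input, start_pos)` signature per the usual candle convention:

```rust
use candle_core::Tensor;
use candle_transformers::generation::LogitsProcessor;

let mut logits_processor = LogitsProcessor::new(299792458, Some(0.8), None);
for index in 0..max_new_tokens {
    // After the first step only the newly sampled token is fed; the model's
    // KV cache covers the rest of the context.
    let ctx = if index == 0 { &tokens[..] } else { &tokens[tokens.len() - 1..] };
    let start_pos = tokens.len() - ctx.len();
    let input = Tensor::new(ctx, &device)?.unsqueeze(0)?;
    let logits = model.forward(&input, start_pos)?;
    let next_token = logits_processor.sample(&logits.squeeze(0)?)?;
    tokens.push(next_token);
}
```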
57267cd536
Add a flag to force running the quantized model on CPUs. ( #1778 )
* Add a flag to force running the quantized model on CPUs.
* Add encodec to the readme.
2024-02-28 14:58:42 +01:00
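The flag itself reduces to picking the device up front; the usual shape in the candle examples (`args.cpu` is the assumed clap flag):

```rust
use candle_core::Device;

// --cpu forces CPU execution even when a CUDA device is available.
let device = if args.cpu {
    Device::Cpu
} else {
    Device::cuda_if_available(0)?
};
```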
60ee5cfd4d
Support more modes in the encodec example. ( #1777 )
* Support more modes in the encodec example.
* Remove the old encodec model from the musicgen bits.
2024-02-28 09:22:33 +01:00
56e44aabe3
Make some dependencies optional in the examples. ( #1776 )
2024-02-28 07:17:03 +01:00
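Making an example dependency optional is standard Cargo feature gating; the pattern looks like this (crate and feature names are illustrative, not the exact ones from the PR):

```toml
[dependencies]
# Heavy decoder that only some examples need.
symphonia = { version = "0.5", optional = true }

[features]
# Examples that need it enable the feature explicitly.
symphonia = ["dep:symphonia"]
```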
d0aca6c3c6
Encodec encoding demo. ( #1775 )
2024-02-28 06:49:03 +01:00
15e8644149
Apply dilations in the encodec model. ( #1772 )
* Apply dilations in the encodec model.
* Add some encoding bits.
2024-02-27 23:26:35 +01:00
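Dilation in candle's 1d convolution is a config field, so the encodec residual blocks can set it per layer. A sketch, with `vb` a `VarBuilder` and the channel/kernel sizes assumed (the actual encodec code grows the dilation with block depth):

```rust
use candle_nn::{conv1d, Conv1dConfig};

// A dilated conv reads every third sample here, widening the receptive
// field without adding parameters.
let cfg = Conv1dConfig { dilation: 3, ..Default::default() };
let conv = conv1d(in_channels, out_channels, kernel_size, cfg, vb.pp("conv"))?;
```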
0c49e95dfb
Encodec model. ( #1771 )
* Encodec model.
* Fixes.
* Add the padding functions.
* Get the LSTM bit to work.
* Get the encodec model to generate some tokens (decoder only for now).
* Minor tweak.
* Minor tweak.
2024-02-27 22:59:40 +01:00
205767f9de
Avoid tensor copying in the quantized example. ( #1770 )
2024-02-27 20:32:30 +01:00
5e526abc8c
Bump the version number to 0.4.1. ( #1768 )
* Fix the block size for some cuda kernels.
* Bump the version number to 0.4.1.
2024-02-27 14:19:59 +01:00
6400e1b0a0
Fix the block size for some cuda kernels. ( #1767 )
2024-02-27 14:08:33 +01:00
32544a2ad6
Add an option to split the prompt. ( #1766 )
2024-02-27 11:24:11 +01:00
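Splitting the prompt bounds the size of any single forward pass (and its peak memory) by feeding the prompt in chunks while the KV cache carries context across them. The gist, with names assumed rather than taken from the PR:

```rust
use candle_core::Tensor;

let chunk_size = 256; // illustrative; the new option makes this configurable
let mut logits = None;
for (i, chunk) in tokens.chunks(chunk_size).enumerate() {
    let input = Tensor::new(chunk, &device)?.unsqueeze(0)?;
    // start_pos keeps earlier chunks visible through the KV cache.
    logits = Some(model.forward(&input, i * chunk_size)?);
}
```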
badf886583
Cuda kernel for dequantizing q8k. ( #1760 )
* Cuda kernel for dequantizing q8k.
* Clippy lints.
2024-02-26 08:42:44 +01:00
918136ba46
add quantized rwkv v5 model ( #1743 )
* add quantized rwkv v5 model
* Integrate the quantized rwkv model in the initial example.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-25 21:43:40 +01:00
1a6043af51
Tweak the VarMap set type. ( #1758 )
2024-02-25 20:50:08 +01:00
2f22afd80e
Cuda acceleration for quantized model. ( #1754 )
* Boilerplate for the quantized cuda support.
* More basic cuda support.
* More cuda quantization (quantize on cpu for now).
* Add the dequantization bit.
* Start adding some dedicated cuda kernels from llama.cpp.
* Move the kernel code.
* Start interfacing with the kernel.
* Tweak the kernel launch params.
* Bugfix for quantized metal.
* Fix some clippy lints.
* Tweak the launch parameters.
* Tweak cuda basics to perform a quantized matmul.
* Perform the dequantization on the cpu + use cublas for matmul.
* Add the dequantization kernel.
* Test the qmatmul.
* More kernels.
* Matmul-vec kernel.
* Add a couple kernels.
* More dequantization kernels.
2024-02-25 18:11:47 +01:00
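The user-visible entry point for all of the above is `QMatMul` in `candle_core::quantized`: quantize a weight once, then matmuls dispatch to the quantized kernels on whatever device the tensor lives on. A minimal sketch (the exact constructor signatures have shifted slightly across candle versions):

```rust
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Tensor};

let device = Device::new_cuda(0)?;
let weight = Tensor::randn(0f32, 1., (4096, 4096), &device)?;
// Quantize once, then reuse; with this PR the matmul runs on the GPU.
let qweight = QTensor::quantize(&weight, GgmlDType::Q4_0)?;
let qmatmul = QMatMul::from_qtensor(qweight)?;
let xs = Tensor::randn(0f32, 1., (1, 4096), &device)?;
let ys = qmatmul.forward(&xs)?;
```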
8d04f70f4d
Fix the eos token for gemma. ( #1753 )
2024-02-24 11:07:02 +01:00
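This matters because the generation loop stops on the eos id; gemma's tokenizer calls it `<eos>`, so the lookup and check run along these lines (a sketch, error handling simplified):

```rust
// Resolve gemma's end-of-sequence token once, before the sampling loop.
let eos_token = tokenizer
    .token_to_id("<eos>")
    .expect("no <eos> token in the gemma tokenizer");
// ...inside the loop, after sampling:
if next_token == eos_token {
    break; // stop decoding
}
```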
eeb7e2b683
Apply rustfmt to the newly added tests. ( #1749 )
2024-02-23 06:48:28 +01:00
11ea7aac4d
tests ( #1724 )
2024-02-23 06:35:46 +01:00
32eb56d6b3
Fix typo in README ( #1740 )
2024-02-22 12:35:26 +01:00
28057781aa
Make the cache for the llama model explicit too. ( #1745 )
2024-02-22 12:04:33 +01:00
544018b6d0
Explicit caching in llama2.c.
2024-02-22 10:22:03 +01:00
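Explicit caching means the KV cache is threaded through `forward` as a parameter instead of living behind interior mutability inside the model, which makes resets and multi-sequence use straightforward. A sketch of the shape (names assumed):

```rust
use candle_core::{Result, Tensor};

/// One (key, value) slot per layer, owned by the caller.
pub struct Cache {
    kvs: Vec<Option<(Tensor, Tensor)>>,
}

impl Cache {
    pub fn new(num_layers: usize) -> Self {
        Self { kvs: vec![None; num_layers] }
    }

    /// Append this step's k/v along the sequence dim and return the full views.
    pub fn append(&mut self, layer: usize, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
        let (k, v) = match &self.kvs[layer] {
            None => (k.clone(), v.clone()),
            Some((pk, pv)) => (
                Tensor::cat(&[pk, k], 2)?.contiguous()?,
                Tensor::cat(&[pv, v], 2)?.contiguous()?,
            ),
        };
        self.kvs[layer] = Some((k.clone(), v.clone()));
        Ok((k, v))
    }
}
```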
c753f72c85
Support for attention bias in gemma + refactor things a bit. ( #1744 )
* Support for attention bias in gemma + refactor things a bit.
* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
8013b50829
Add grads for interpolate1d ( #1742 )
* add backprop for interpolate1d
* fix clippy lint
* correct fix clippy lint
2024-02-22 08:44:01 +01:00
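With the backward pass in place, gradients flow through 1d interpolation like any other op. A small end-to-end check, assuming `upsample_nearest1d` is the op the PR refers to as interpolate1d:

```rust
use candle_core::{Device, Result, Tensor, Var};

fn main() -> Result<()> {
    // A (batch, channels, length) signal tracked for gradients.
    let x = Var::from_tensor(&Tensor::randn(0f32, 1., (1, 4, 8), &Device::Cpu)?)?;
    let y = x.upsample_nearest1d(16)?; // 1d nearest-neighbor interpolation
    let loss = y.sum_all()?;
    let grads = loss.backward()?; // now populates a gradient for x
    let grad_x = grads.get(&x).expect("gradient for x");
    println!("{:?}", grad_x.shape());
    Ok(())
}
```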
45d5322d62
Add the Gemma models. ( #1741 )
* Add the Gemma models.
* Add the gemma example.
* Adapt the RmsNorm.
* Get the 2b model to work.
* 7b support.
* Use the config head dim.
* Yet another fix.
* Make the matrices contiguous.
* Also get the 7b model to work.
* And add to the readme.
2024-02-21 22:02:50 +01:00
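The "Adapt the RmsNorm" step refers to gemma scaling by `1 + weight` rather than `weight` directly; a sketch of that forward (candle's stock RmsNorm omits the `+ 1`):

```rust
use candle_core::{D, Result, Tensor};

// Gemma-style rms-norm: normalize by the rms, then scale by (1 + weight).
fn gemma_rms_norm(xs: &Tensor, weight: &Tensor, eps: f64) -> Result<Tensor> {
    let variance = xs.sqr()?.mean_keepdim(D::Minus1)?;
    let xs = xs.broadcast_div(&(variance + eps)?.sqrt()?)?;
    xs.broadcast_mul(&(weight + 1.0)?)
}
```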
a2cb2edead
Add a couple backtraces on cpu errors. ( #1738 )
2024-02-20 19:54:13 +01:00
fc67d878bb
Bugfix for conv-transpose1d ( #1734 )
* Add a currently broken test.
* Bugfix + fix test.
2024-02-19 09:04:49 +01:00
3ba37443e5
Bugfix for applying the bias in conv1d-transpose. ( #1732 )
2024-02-18 22:51:20 +01:00
1fb728772d
Support for groups in conv-transpose1d. ( #1731 )
* Groups support in conv-transpose-1d.
* Remove dangling file.
2024-02-18 21:28:07 +01:00
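With groups support, each group of input channels is deconvolved independently (depthwise when groups equals the channel count). Usage would look like this, with `vb` a `VarBuilder` and the sizes assumed (field names per candle's conv configs):

```rust
use candle_nn::{conv_transpose1d, ConvTranspose1dConfig};

let cfg = ConvTranspose1dConfig {
    stride: 2,
    groups: 2, // the new knob: two independent channel groups
    ..Default::default()
};
let upconv = conv_transpose1d(in_channels, out_channels, kernel_size, cfg, vb.pp("up"))?;
```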
cb86b0c82c
Fix float unpickling. ( #1730 )
2024-02-18 19:33:55 +01:00
6284ad784c
Module implementation for options. ( #1728 )
2024-02-18 14:12:55 +01:00
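A `Module` impl for `Option<M>` lets optional layers (a bias, a norm) be called unconditionally, with `None` acting as identity. The impl plausibly reads like this; it has to live in candle itself because of the orphan rule, and this is a sketch of the idea rather than necessarily the exact code:

```rust
use candle_core::{Module, Result, Tensor};

// Inside candle, next to the Module trait definition:
impl<M: Module> Module for Option<M> {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        match self {
            Some(m) => m.forward(xs), // delegate when the layer is present
            None => Ok(xs.clone()),   // identity otherwise
        }
    }
}
```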
678d44a7f6
Expose the weights and biases in transposed convolutions. ( #1727 )
2024-02-18 10:35:01 +01:00
41416d2376
Expose more conv1d functions/structs. ( #1726 )
2024-02-17 18:50:55 +01:00
5ebcfeaf0f
Make the r, k, v tensors contiguous. ( #1719 )
2024-02-16 09:17:35 +01:00
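Background for fixes like this: a transpose in candle only adjusts strides, while some kernels (matmul included) require contiguous memory, so views get materialized explicitly:

```rust
// Force a contiguous layout after the transpose before handing the tensor
// to kernels that assume one.
let k = k.transpose(1, 2)?.contiguous()?;
let att = (q.matmul(&k.t()?)? / (head_dim as f64).sqrt())?;
```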
7c7400fb63
Use the tokenizer-output-stream in the llama example. ( #1715 )
* Use the tokenizer-output-stream in the llama example.
* Also use tokenizer-output-stream for llama2-c.
2024-02-15 16:47:33 +01:00
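`TokenOutputStream` (from candle-examples) wraps the tokenizer so text is emitted incrementally as tokens arrive instead of re-decoding the whole sequence every step; usage is roughly:

```rust
use candle_examples::token_output_stream::TokenOutputStream;

let mut tos = TokenOutputStream::new(tokenizer);
// In the generation loop: print whatever fragment this token completes.
if let Some(text) = tos.next_token(next_token)? {
    print!("{text}");
}
// After the loop: flush anything still buffered.
if let Some(rest) = tos.decode_rest()? {
    print!("{rest}");
}
```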
058a910d0e
Add a readme for rwkv. ( #1712 )
2024-02-14 15:31:33 +01:00
26fe162ab5
Custom tokenizer for rwkv. ( #1711 )
* Custom tokenizer for rwkv.
* Custom tokenizer.
* Getting the tokenizer to work.
2024-02-14 15:11:38 +01:00
121a71e01f
Fix the silu cuda kernel. ( #1710 )
2024-02-14 11:08:18 +01:00
2d5f2a728d
Add the RWKV model (v5). ( #1707 )
* Start adding the RWKV model.
* More of the forward step.
* Handle rescaling.
* FeedForward.
* More work on RWKV.
* Better state tracking.
* Finish a first pass on forward.
* Fix the shape mismatches.
* Do not rescale in f32.
* Rename to rwkv-v5.
* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
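Unlike the attention models above, RWKV is recurrent: generation threads a fixed-size per-layer state through each step instead of growing a KV cache (the "better state tracking" items above). In sketch form, with `State`, `config`, and the forward signature assumed from the usual candle conventions:

```rust
use candle_core::Tensor;

// The recurrent state replaces a KV cache and stays constant-size.
let mut state = State::new(1 /* batch size */, &config, &device)?;
for &token in tokens.iter() {
    let input = Tensor::new(&[[token]], &device)?;
    let logits = model.forward(&input, &mut state)?;
    // ...sample from logits, push the new token...
}
```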
68f7655895
Add ConvNeXt-V2 and smaller model variants. ( #1709 )
2024-02-14 10:53:07 +01:00
b60064780d
feat: add silu activation function ( #1706 )
* feat: add silu activation function
* use silu/arg in grad
* update candle-nn
* use node
2024-02-14 10:27:22 +01:00
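SiLU (a.k.a. swish) is silu(x) = x · sigmoid(x); this commit promotes it to a first-class activation with its own gradient rather than a composed op. Written out by hand it is just:

```rust
use candle_core::{Result, Tensor};

// silu(x) = x * sigmoid(x); shipping it as a single op lets the backward
// pass use the analytic gradient instead of differentiating the composition.
fn silu(xs: &Tensor) -> Result<Tensor> {
    let s = candle_nn::ops::sigmoid(xs)?;
    xs * &s
}
```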
14010a8498
Update our cuda runner. ( #1705 )
* Update our cuda runner.
* Fix install rust.
* Simplify.
* Docker in docker.
* Install curl
* Install curl
* No sudo.
* devel
* Put curl again.
* Add missing deps.
* pkg-config.
* Cleanup.
2024-02-13 19:06:15 +01:00
0de0795220
Qmetal tweaks ( #1704 )
* Add the dummy qmetal backend.
* Fix the metal compilation.
2024-02-13 18:11:17 +01:00
c1b418586c
Fixing quantized llama demo on metal. ( #1703 )
2024-02-13 16:28:56 +01:00
ad73e93da2
Detach the tensors on batch-norm eval. ( #1702 )
* Detach the tensors on batch-norm eval.
* Fix pyo3 bindings.
* Black tweak.
* Formatting.
* Also update the pyo3-onnx formatting.
* Apply black.
2024-02-13 14:26:32 +01:00
13c67226e6
feat: support microphone whisper streaming ( #1678 )
* feat: support microphone whisper streaming
* fix: cleanup print stmts and adjust how input is read
* fix: remove incorrect comment
* feat: split into new example and simplify
* fix: feature flag example file
* fix: fmt fixes
* feat: simplify and remove redundant files
2024-02-12 18:01:21 +01:00
d0aa197b07
ConvTranspose1d cuda support. ( #1697 )
* ConvTranspose1d cuda support.
* Add the conv-transpose1d kernel.
* Remove some unused variables.
2024-02-12 15:03:18 +01:00