Commit Graph

269 Commits

4826a4212e Add support for codellama in the examples.
Codellama requires bf16 for now (converting from bf16 to f16 errors out).
The multiprocess demo is not functional for it because flash-attn only
supports f16 for now.
2023-08-25 09:56:11 +00:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
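The generic q8_0 vecdot above can be sketched in plain Rust. This is an illustration following the usual GGML q8_0 conventions (blocks of 32 int8 values plus a per-block scale; the real format stores the scale as f16, simplified to f32 here), not candle's actual implementation:

```rust
const BLOCK_SIZE: usize = 32;

// One q8_0-style block: dequantized value = scale * qs[i].
struct BlockQ8_0 {
    scale: f32,
    qs: [i8; BLOCK_SIZE],
}

fn quantize_q8_0(xs: &[f32]) -> Vec<BlockQ8_0> {
    xs.chunks(BLOCK_SIZE)
        .map(|chunk| {
            let amax = chunk.iter().fold(0f32, |m, &v| m.max(v.abs()));
            let scale = amax / 127.0;
            let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
            let mut qs = [0i8; BLOCK_SIZE];
            for (q, &v) in qs.iter_mut().zip(chunk) {
                *q = (v * inv).round() as i8;
            }
            BlockQ8_0 { scale, qs }
        })
        .collect()
}

// Generic (non-SIMD) vecdot: accumulate the integer products per block,
// then rescale by the product of the two block scales.
fn vec_dot_q8_0(a: &[BlockQ8_0], b: &[BlockQ8_0]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(ba, bb)| {
            let isum: i32 = ba.qs.iter().zip(&bb.qs)
                .map(|(&x, &y)| x as i32 * y as i32)
                .sum();
            isum as f32 * ba.scale * bb.scale
        })
        .sum()
}

fn main() {
    let xs: Vec<f32> = (0..32).map(|i| i as f32 / 32.0).collect();
    let (qa, qb) = (quantize_q8_0(&xs), quantize_q8_0(&xs));
    let exact: f32 = xs.iter().map(|v| v * v).sum();
    let approx = vec_dot_q8_0(&qa, &qb);
    println!("exact={exact:.4} approx={approx:.4}");
    assert!((exact - approx).abs() < 0.2);
}
```

SIMD backends (AVX, NEON) specialize exactly this inner integer loop.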
189442a0fa Add the pose estimation head for yolo. (#589)
* Add the pose estimation head for yolo.

* Properly handle the added position dimensions.

* Integrate the pose estimation head in the forward pass.

* Renaming.

* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
79916c2edb Use the hub weights for efficientnet. (#573) 2023-08-23 18:20:21 +01:00
431051cc32 Add Efficientnet (#572)
* EfficientNet.

* Complete the efficientnet implementation.

* Improve group handling.

* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
eedd85ffa7 Move the imagenet specific bits to a separate file. (#571) 2023-08-23 16:42:09 +01:00
329f661d9b Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
aba1e90797 Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
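What the group parameter does can be sketched with a plain-Rust 1d convolution (stride 1, no padding): with g groups, the input channels are split into g chunks and each output channel only convolves over the chunk belonging to its own group. This is illustrative, not candle's convolution code:

```rust
fn conv1d_grouped(
    input: &[Vec<f32>],        // [c_in][len]
    kernel: &[Vec<Vec<f32>>],  // [c_out][c_in / groups][k]
    groups: usize,
) -> Vec<Vec<f32>> {
    let (c_in, c_out) = (input.len(), kernel.len());
    assert!(c_in % groups == 0 && c_out % groups == 0);
    let c_in_per_group = c_in / groups;
    let c_out_per_group = c_out / groups;
    let k = kernel[0][0].len();
    let out_len = input[0].len() - k + 1;

    (0..c_out)
        .map(|oc| {
            // Each output channel reads only its group's input slice.
            let in_start = (oc / c_out_per_group) * c_in_per_group;
            (0..out_len)
                .map(|pos| {
                    let mut acc = 0.0;
                    for ic in 0..c_in_per_group {
                        for j in 0..k {
                            acc += input[in_start + ic][pos + j] * kernel[oc][ic][j];
                        }
                    }
                    acc
                })
                .collect()
        })
        .collect()
}

fn main() {
    // 2 input channels, 2 output channels, groups=2: each output channel
    // is a depthwise convolution over a single input channel.
    let input = vec![vec![1.0, 2.0, 3.0], vec![10.0, 20.0, 30.0]];
    let kernel = vec![vec![vec![1.0, 1.0]], vec![vec![1.0, 1.0]]];
    let out = conv1d_grouped(&input, &kernel, 2);
    println!("{out:?}");
    assert_eq!(out, vec![vec![3.0, 5.0], vec![30.0, 50.0]]);
}
```

`groups == c_in` gives a depthwise convolution, the case efficientnet relies on.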
4ee1cf038a Get the rms epsilon from GGUF. (#565) 2023-08-23 11:40:20 +01:00
0f4ff8a739 Fix the quantized example. (#564) 2023-08-23 11:09:55 +01:00
89a00b56cc add chat models in quantized example (#551)
* add chat models in quantized example

* cargo fmt
2023-08-23 11:05:33 +01:00
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
f9ecc84477 GQA support in the quantized model. (#555)
* GQA support in the quantized model.

* Fix the reshaping.

* Fix the main llama model.

* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
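The grouped-query-attention (GQA) mapping inferred from the model kind boils down to one index computation: with n_head query heads and n_kv_head key/value heads, each run of n_head / n_kv_head consecutive query heads shares a single kv head. A minimal sketch (llama-v2 70b uses 64 query heads and 8 kv heads):

```rust
// Map a query head index to the kv head it attends with under GQA.
fn kv_head_for_query_head(q_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    assert!(n_head % n_kv_head == 0);
    q_head / (n_head / n_kv_head)
}

fn main() {
    // With 64 query heads and 8 kv heads, query heads 0..8 share kv head 0.
    assert_eq!(kv_head_for_query_head(0, 64, 8), 0);
    assert_eq!(kv_head_for_query_head(7, 64, 8), 0);
    assert_eq!(kv_head_for_query_head(8, 64, 8), 1);
    assert_eq!(kv_head_for_query_head(63, 64, 8), 7);
    println!("gqa mapping ok");
}
```

In practice this is implemented by repeating each kv head n_head / n_kv_head times before the attention matmul, which is where the reshaping fix above comes in.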
cc22d4db20 Put the transcribe token before the language one. (#553) 2023-08-22 16:46:34 +01:00
9bc811a247 Improve the aspect ratio handling on yolo-v8. (#549)
* Fix the aspect ratio handling in yolo-v8.

* Typo.
2023-08-22 14:55:33 +01:00
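The aspect-ratio handling amounts to scaling the image so its longer side matches the model input size while keeping width/height proportional, then rounding to the network stride. A sketch under the usual yolo conventions (stride 32), not the exact code from the example:

```rust
// Scale (w, h) to fit a `target`-sized input while preserving the aspect
// ratio, rounding each side down to a multiple of 32 (the yolo stride).
fn letterbox_dims(w: usize, h: usize, target: usize) -> (usize, usize) {
    let (out_w, out_h) = if w > h {
        (target, target * h / w)
    } else {
        (target * w / h, target)
    };
    (out_w / 32 * 32, out_h / 32 * 32)
}

fn main() {
    // A 1280x720 image scaled to fit a 640-wide input.
    let (w, h) = letterbox_dims(1280, 720, 640);
    println!("{w}x{h}");
    assert_eq!((w, h), (640, 352));
}
```

Detected bounding boxes then have to be rescaled by the inverse ratio to land on the original image.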
bb69d89e28 Move the yolo shared bits to a common place. (#548)
* Move the yolo shared bits to a common place.

* Share more code.

* Configurable thresholds.
2023-08-22 13:03:07 +01:00
20ce3e9f39 Sketch the yolo wasm example. (#546)
* Sketch the yolo wasm example.

* Web ui.

* Get the web ui to work.

* UI tweaks.

* More UI tweaks.

* Use the natural width/height.

* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
44420d8ae1 Add some llama-v2 variants. (#545) 2023-08-22 08:35:15 +01:00
f16bb97401 Use the yolo-v8 weights from the hub. (#544)
* Use the weights from the hub.

* Add to the readme.
2023-08-21 22:07:36 +01:00
3507e14c0c Yolo v8 fixes (#542)
* Fixes for the yolo-v8 layout.

* Bugfixes.

* Another silly bugfix.

* Remove the hf-hub dependency.

* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
de50e66af1 Add yolo v8 as an example (#541)
* Sketching yolo-v8.

* Get the model to load.

* yolo-v8 forward pass.

* Complete(?) the forward pass.

* Fix some shape issues.

* Add the missing padding.

* Process the predictions.
2023-08-21 18:40:09 +01:00
cc2d6cf2e0 Improve the timestamps support in whisper (#539)
* Timestamp support for whisper.

* Properly display the timestamps.

* Bugfix for the timestamp units.
2023-08-21 12:26:59 +01:00
e3b71851e6 Retrieve the yolo-v3 weights from the hub. (#537) 2023-08-21 10:55:09 +01:00
4300864ce9 Add some optional repeat penalty. (#535) 2023-08-21 09:59:13 +01:00
11c7e7bd67 Some fixes for yolo-v3. (#529)
* Some fixes for yolo-v3.

* Use the running stats for inference in the batch-norm layer.

* Get some proper predictions for yolo.

* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
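The batch-norm fix matters because at inference time the layer must normalize with the running mean/variance accumulated during training, not the statistics of the current batch. A single-channel sketch in plain Rust, purely illustrative:

```rust
// Inference-mode batch norm for one channel: normalize with the stored
// running statistics, then apply the learned scale (gamma) and bias (beta).
fn batch_norm_inference(
    xs: &[f32],
    running_mean: f32,
    running_var: f32,
    gamma: f32,
    beta: f32,
    eps: f32,
) -> Vec<f32> {
    let denom = (running_var + eps).sqrt();
    xs.iter().map(|&x| (x - running_mean) / denom * gamma + beta).collect()
}

fn main() {
    let out = batch_norm_inference(&[1.0, 3.0], 1.0, 4.0, 2.0, 0.5, 0.0);
    // (1-1)/2*2+0.5 = 0.5 and (3-1)/2*2+0.5 = 2.5.
    println!("{out:?}");
    assert_eq!(out, vec![0.5, 2.5]);
}
```

Using per-batch statistics at inference (especially with batch size 1) is exactly the kind of bug that silently wrecks detection quality.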
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
aa207f2dd9 Print some per-step timings in stable-diffusion. (#520)
* Skeleton files for neon support of quantization.

* SIMD version for q4 vecdot.

* Also simdify the q6k multiplication.

* Add some timings to stable-diffusion.
2023-08-20 05:45:12 +01:00
d73ca3d28e Line up the llama.cpp implementation with the candle one. (#518)
* Separate the prompt stats from the post-prompt ones in the quantized example.

* Slightly nicer output printing.

* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
b64e782c2d Use the hub to retrieve dinov2 model weights. (#507) 2023-08-18 18:27:31 +01:00
e5dd5fd1b3 Print the recognized categories in dino-v2. (#506) 2023-08-18 17:32:58 +01:00
cb069d6063 Add the permute op (similar to pytorch). (#504)
* Add the permute op (similar to pytorch).

* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
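The permute op reorders tensor dimensions: permute([2, 0, 1]) maps shape (a, b, c) to (c, a, b) with out[k][i][j] == input[i][j][k]. A sketch of the index arithmetic on a flat row-major buffer (candle does this lazily by permuting strides rather than copying):

```rust
// The output shape is the input shape read through the permutation.
fn permute_shape(shape: &[usize], perm: &[usize]) -> Vec<usize> {
    perm.iter().map(|&d| shape[d]).collect()
}

// Fetch the element at multi-index `idx` of the permuted view.
fn permuted_get(data: &[f32], shape: &[usize], perm: &[usize], idx: &[usize]) -> f32 {
    // Source multi-index: src[perm[i]] = idx[i].
    let mut src = vec![0usize; shape.len()];
    for (i, &d) in perm.iter().enumerate() {
        src[d] = idx[i];
    }
    // Row-major flattening of the source index.
    let flat = src.iter().zip(shape).fold(0, |acc, (&i, &dim)| acc * dim + i);
    data[flat]
}

fn main() {
    // A 2x3 matrix; permute([1, 0]) is a transpose.
    let data = [1., 2., 3., 4., 5., 6.];
    let shape = [2, 3];
    assert_eq!(permute_shape(&shape, &[1, 0]), vec![3, 2]);
    // Transposed element [2][1] == original [1][2] == 6.
    assert_eq!(permuted_get(&data, &shape, &[1, 0], &[2, 1]), 6.0);
    println!("permute ok");
}
```

The backprop mentioned above is just the inverse permutation applied to the output gradient.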
4f1541526c dinov2 - read images from disk and compute the class probabilities (#503)
* Load the image from disk and convert it to a tensor.

* Tweak the function name.
2023-08-18 15:50:33 +01:00
95462c6a2e Add a vision transformer example (dino-v2). (#502)
* Add a vision transformer example (dino-v2).

* Add some documentation + test.

* CI fix.

* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
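The Module idea is a single trait with a forward method that every layer implements, so heterogeneous layers can be chained uniformly. A plain-Rust sketch over `Vec<f32>` (candle's actual trait operates on `Tensor` and returns a `Result`):

```rust
// One shared interface for all layers.
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

struct Scale(f32);
struct Shift(f32);

impl Module for Scale {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x * self.0).collect()
    }
}

impl Module for Shift {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x + self.0).collect()
    }
}

// Because every layer shares the trait, sequential composition is trivial.
fn run(layers: &[&dyn Module], xs: &[f32]) -> Vec<f32> {
    layers.iter().fold(xs.to_vec(), |acc, l| l.forward(&acc))
}

fn main() {
    let out = run(&[&Scale(2.0), &Shift(1.0)], &[1.0, 2.0]);
    println!("{out:?}");
    assert_eq!(out, vec![3.0, 5.0]);
}
```

Implementing the trait for qmatmul, as the last bullet notes, lets quantized and full-precision layers share the same call sites.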
13401df4d1 Add an abstract type for RmsNorm. (#499) 2023-08-18 08:52:14 +01:00
26fd37b348 Use the main branch of the HF repo where possible. (#498)
* Use the main branch of the HF repo where possible.

* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21 Add medium model (#497) 2023-08-18 08:08:59 +01:00
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
3164cd24fa Replicate the sot-token logic from the Python implementation more accurately. (#491)
* Replicate the sot-token logic from the Python implementation more accurately.

* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
5f30c1e1e0 Add the whisper small model. (#490) 2023-08-17 15:48:34 +01:00
ad7c53953b Add a verbose-prompt mode, similar to llama.cpp. (#489) 2023-08-17 15:26:44 +01:00
5d99026fd2 F16 support for stable diffusion (#488)
* F16 support for stable diffusion.

* Keep the attention bits in F32.

* Keep more of the attention bits in F32.

* More mixed precision support.
2023-08-17 13:48:56 +01:00
c3176f0dfb Flash-attention support in stable diffusion (#487)
* Add flash-attention for the stable-diffusion example.

* Change the dtype.

* Silly fix.

* Another fix.

* Revert the dtype back to the query dtype after applying flash-attn.
2023-08-17 12:16:40 +01:00
03be33eea4 Relax the requirements on CustomOp. (#486)
* Relax the requirements on CustomOp.

* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d32e8199cd Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.

* Add the rms-norm variant.

* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
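The rms-norm variant drops the mean subtraction and bias of layer-norm, scaling each vector by its reciprocal root-mean-square before applying a learned per-channel weight. A minimal sketch (the epsilon is the value that, in the quantized llama path, is read from GGUF metadata):

```rust
// RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(x, w)| x * scale * w).collect()
}

fn main() {
    let out = rms_norm(&[3.0, 4.0], &[1.0, 1.0], 0.0);
    // rms of [3, 4] is sqrt(25 / 2), so both values shrink by that factor.
    println!("{out:?}");
    assert!((out[0] - 3.0 / (12.5f32).sqrt()).abs() < 1e-6);
    assert!((out[1] - 4.0 / (12.5f32).sqrt()).abs() < 1e-6);
}
```

Sharing these bits means layer-norm and rms-norm differ only in a couple of configuration flags rather than two separate implementations.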
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00
098909de40 Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.

* Add some testing for q8k.

* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
c5f45887dc Add some tracing to the quantized example. (#473) 2023-08-16 18:49:08 +01:00
fa4590d7fd Merge pull request #469 from huggingface/fix_llama_v1
Fix llama-v1.
2023-08-16 17:47:40 +02:00
2e206e269d Add the model argument. (#471) 2023-08-16 16:41:06 +01:00