5130a7da32
Simd128 version of q6k vec-dot. ( #1015 )
...
* Add a specific function for the simd128 q6k vec-dot.
* Simdification.
* More simdification.
2023-10-01 19:44:12 +01:00
41143db1af
[segment-anything] add multi point logic for demo site ( #1002 )
...
* [segment-anything] add multi point logic for demo site
* [segment-anything] remove libs and update functions
2023-10-01 18:25:22 +01:00
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
f6054e9d60
Fix the prompt for mistral when using instruct/interactive mode. ( #1013 )
2023-10-01 06:44:30 +01:00
328167ec04
Integrate TheBloke quantized mistral weights. ( #1012 )
2023-09-30 22:39:42 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
deee7612da
Quantized version of mistral. ( #1009 )
...
* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00
06207332bc
Streaming mode for reporting the generated tokens ( #1007 )
...
* Token streaming.
* Use the token output stream.
* Flush the output.
* Ensure that the last characters get reported.
2023-09-30 15:04:11 +01:00
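The streaming entry above flushes output and makes sure the last characters get reported; the usual difficulty is that a token's bytes may end mid-way through a multi-byte UTF-8 character. A minimal sketch of that buffering idea (hypothetical, not candle's actual token-output-stream API):

```rust
use std::io::Write;

// Token-streaming sketch: buffer decoded bytes and only print the prefix
// that forms valid UTF-8, keeping any trailing partial character buffered
// until the next token completes it.
struct StreamBuf {
    bytes: Vec<u8>,
    printed: usize, // how many bytes have already been emitted
}

impl StreamBuf {
    fn new() -> Self {
        Self { bytes: Vec::new(), printed: 0 }
    }

    // Append a token's raw bytes; return any text that became printable.
    fn push(&mut self, piece: &[u8]) -> String {
        self.bytes.extend_from_slice(piece);
        match std::str::from_utf8(&self.bytes[self.printed..]) {
            Ok(s) => {
                let out = s.to_string();
                self.printed = self.bytes.len();
                out
            }
            Err(e) => {
                // Only the valid prefix is safe to print; the rest stays buffered.
                let valid = e.valid_up_to();
                let out =
                    std::str::from_utf8(&self.bytes[self.printed..self.printed + valid])
                        .unwrap()
                        .to_string();
                self.printed += valid;
                out
            }
        }
    }
}

fn main() {
    let mut buf = StreamBuf::new();
    // "é" is 0xC3 0xA9: split across two pushes, it is printed only once complete.
    for piece in [&b"caf"[..], &[0xC3][..], &[0xA9][..]] {
        print!("{}", buf.push(piece));
        std::io::stdout().flush().unwrap();
    }
    println!();
}
```

Flushing after every piece is what makes the tokens appear live rather than line-buffered.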
4021272875
Use flash-attn for mistral. ( #1004 )
2023-09-30 12:15:10 +01:00
87e3a4e175
Mistral: exit on eos token. ( #1001 )
...
* Mistral: exit on eos token.
* Print the proper stats.
* Also add a short flag.
2023-09-30 07:07:06 +01:00
6203ced495
Add negative prompts to segment-anything. ( #1000 )
2023-09-30 06:17:42 +01:00
34842fb234
[segment-anything] Print IOU values to help with debugging ( #999 )
2023-09-30 05:44:42 +01:00
d188d6a764
Fix the multiple points case for sam. ( #998 )
2023-09-29 22:39:43 +02:00
0ac2db577b
Add an entry about WSL slowness to the faq. ( #997 )
2023-09-29 17:04:52 +01:00
fc59bc31bf
fix: add missing gpu fill_* ( #996 )
2023-09-29 15:49:30 +01:00
03348e2e6f
Update mistral README.md ( #995 )
2023-09-29 12:24:32 +01:00
49fa184a35
Mistral readme ( #994 )
...
* Mistral: print the generated text.
* Add mistral to the readmes.
2023-09-29 11:50:50 +01:00
6f17ef82be
Mistral: print the generated text. ( #992 )
2023-09-29 10:56:11 +01:00
01b92cd959
fixes slice_scatter dim type ( #988 )
2023-09-29 07:54:45 +01:00
53510ce427
Use a silu activation in mistral. ( #991 )
2023-09-29 07:06:54 +01:00
23b3576c47
Add the sliding window. ( #986 )
2023-09-28 17:26:33 +01:00
716ab2ccdc
Mistral gpu fix ( #985 )
...
* Add the mistral example.
* Use the two model files.
* Adjust the dtype.
* Tweak the weight paths.
* Remove the end of text token.
* Get the mistral model to generate some text.
* Fix when running on the gpu.
* More gpu fixes.
2023-09-28 16:38:13 +01:00
ada8851a23
Add the mistral example. ( #984 )
...
* Add the mistral example.
* Use the two model files.
* Adjust the dtype.
* Tweak the weight paths.
* Remove the end of text token.
* Get the mistral model to generate some text.
2023-09-28 16:19:18 +01:00
c05a348e36
Add the Mistral 7b model ( #983 )
...
* Start sketching the mistral 7b model.
* Add the kv cache.
* Add the decoder layer.
* Add the mistral model.
* Rotary embeddings.
* Add the attention mask.
2023-09-28 14:29:41 +01:00
25657804ef
Simd128 q2k vecdot ( #982 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
2023-09-28 12:16:35 +01:00
5e1c595e00
Optimize the index-select cuda kernel. ( #976 )
2023-09-28 09:05:29 +01:00
8a49e01b9d
Add the remaining quantized tests to the wasm suite. ( #980 )
2023-09-28 08:42:56 +01:00
9cb110c44c
Sketch a simd128 optimized q4k vecdot. ( #977 )
...
* Sketch a simd128 optimized q4k vecdot.
* Simdify.
* More quantization optimizations.
* Again more simdification.
* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173
Simd128 vec-dot for q4_0. ( #974 )
...
* Simd128 vec-dot for q4_0.
* Bugfix.
* Add wasm tests.
* Bugfix for the q40 vecdot.
* More quantization tests.
2023-09-27 14:15:30 +01:00
e59784e353
simd128 optimized q8_0 vecdot ( #972 )
...
* wasm/simd128 version of the quantized q8_0 vecdot.
* Add the missing conversion.
2023-09-27 11:03:20 +01:00
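The q8_0 vec-dot being simd-ified above reduces, in scalar form, to per-block scaled integer dot products; the simd128 work vectorises the inner integer loop. A rough scalar reference, assuming the usual ggml-style q8_0 block layout (32 int8 quants plus one scale per block; the scale is f16 in ggml, f32 here for simplicity):

```rust
// Scalar reference for a q8_0 x q8_0 vec-dot: the result is the sum over
// blocks of d_a * d_b * (integer dot product of the two quant arrays).
const QK8_0: usize = 32;

struct BlockQ8_0 {
    d: f32,          // per-block scale
    qs: [i8; QK8_0], // quantized values
}

fn vec_dot_q8_0(a: &[BlockQ8_0], b: &[BlockQ8_0]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            // Inner integer dot product: this is the loop the simd128
            // version replaces with i16x8/i32x4 lane arithmetic.
            let isum: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&p, &q)| p as i32 * q as i32)
                .sum();
            x.d * y.d * isum as f32
        })
        .sum()
}

fn main() {
    let a = [BlockQ8_0 { d: 0.5, qs: [1; QK8_0] }];
    let b = [BlockQ8_0 { d: 2.0, qs: [3; QK8_0] }];
    // 0.5 * 2.0 * (32 * 1 * 3) = 96
    println!("{}", vec_dot_q8_0(&a, &b));
}
```

Accumulating the integer products in i32 before applying the scales is also why the wasm32 overflow fix mentioned further down matters: narrower accumulators overflow quickly at 32 products per block.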
29bd6b2979
Phi 1.5 wasm module ( #966 )
...
* add phi wasm module
* replace input with textarea
* trim input prompt
* stop on <|endoftext|>
* formatting
* clean up
* add blurb, and syntax highlighting
* add phi-v1.5 wasm
* add note
* hide Options on details
* add first token to generated text
* whitespaces for new line
* fix: abort -> aborted
2023-09-27 06:07:11 +01:00
9571b200c9
fix firstToken, minor ui changes ( #971 )
2023-09-27 06:01:59 +01:00
ce0a4e3a85
Use the gelu-erf activation. ( #969 )
2023-09-26 22:30:21 +01:00
4abc1ea34d
Avoid some overflows on wasm32. ( #968 )
2023-09-26 11:15:38 +01:00
2dd43d6cdd
add eos token to phi example ( #965 )
...
* add eos token to phi example
* rustfmt + get the token directly.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-09-26 09:21:22 +01:00
1fcac4afed
Expose a function to clear the KV cache on mixformers. ( #964 )
2023-09-26 05:41:07 +01:00
a084f65f9a
fix rep penalty min value ( #963 )
2023-09-26 05:23:50 +01:00
c798184c2b
Configurable layer idx for the lstm layer. ( #962 )
2023-09-25 21:31:14 +01:00
c78a294323
Add some repeat penalty to the phi example. ( #961 )
2023-09-25 20:53:30 +01:00
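The repeat penalty added to the phi example above is typically a rescaling of the logits of tokens already present in the context; a minimal sketch of the common llama.cpp-style scheme (an assumption, not necessarily candle's exact implementation):

```rust
// Repeat-penalty sketch: dampen the logits of tokens already generated.
// With penalty > 1.0, positive logits are divided and negative logits
// multiplied, so a repeated token becomes less likely in both cases.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &tok in context {
        if let Some(l) = logits.get_mut(tok as usize) {
            if *l >= 0.0 {
                *l /= penalty;
            } else {
                *l *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0_f32, -1.0, 0.5];
    apply_repeat_penalty(&mut logits, 2.0, &[0, 1]);
    println!("{logits:?}"); // tokens 0 and 1 are penalised, token 2 untouched
}
```

Note that a penalty below 1.0 would amplify repeats instead of suppressing them, which is the kind of edge the "fix rep penalty min value" commit below guards against.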
a36d883254
Use a single flag for the point argument. ( #958 )
2023-09-25 12:53:24 +01:00
7f2bbcf746
[segment-anything] Support multi-point as the prompt input ( #945 )
...
* [sam] Support multi-point prompts
* [segment-anything] Pass points by reference
* [segment-anything] Update example code and image
* Fix clippy lint.
---------
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-09-25 12:14:10 +01:00
dc47224ab9
Override the default cudnn heuristics. ( #957 )
2023-09-25 10:31:53 +01:00
1ce7fe2543
Add more examples to the phi readme. ( #956 )
2023-09-24 18:19:05 +01:00
402ddcfcb4
Add the missing kernel. ( #955 )
2023-09-24 17:21:37 +01:00
f5069dd354
Use the repo for the quantized phi model. ( #954 )
2023-09-24 16:30:26 +01:00
0007ae9c11
Add the quantized mixformer model. ( #953 )
...
* Add the quantized mixformer model.
* Add the quantized option in the phi example.
2023-09-24 15:03:48 +01:00
e15862cfdb
Shared the quantized var-builder code. ( #952 )
...
* Shared the quantized var-builder code.
* Fix compilation.
2023-09-24 12:55:07 +01:00
4aeb449017
Deprecate the VarBuilder::from_safetensors function. ( #951 )
2023-09-24 11:18:17 +01:00
bcb0ed8f1c
Self-contained safetensors for the multiprocess llama example. ( #950 )
2023-09-24 06:54:49 +01:00
7edd755756
Pass directly the buffer ownership. ( #949 )
2023-09-24 06:34:44 +01:00