* Initial commit: model weights working, prediction incorrect
* moved distilbertformaskedlm into distilbert modeling file
* made maskedLM like bert example, still incorrect predictions
* finally not getting NaNs, fixed attention mask
* getting correct output sentences
* get top k predictions
* fixed output formatting slightly
* added default arg for model_id
* lint
* moved masked token example code from distilbertformaskedlm example to distilbert example
* lint
* removed distilbertformaskedlm example
* cleanup
* clippy
* removed embedding normalization from example
* made output and model dependent on args instead of prompt
* lint
* replaced or_ok anyhow error with anyhow context
* changed error message for mask token not found
* links in chinese_clip
* links for clip model
* add mod docs for flux and llava
* module doc for MMDIT and MIMI
* add docs for a few more models
* mod docs for bert naser and beit
* add module docs for convmixer colpali codegeex and chatglm
* add another series of moddocs
* add fastvit-llama2_c
* module docs mamba -> mobileone
* module docs from moondream-phi3
* mod docs for quantized and qwen
* update to yi
* fix long names
* Update llama2_c.rs
* Update llama2_c_weights.rs
* Fix the link for mimi + tweaks
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting
* distilbert files
* Apply various cleanups.
* More cleanups.
* More polish.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>