* start implementing Chinese CLIP
* implement the vision model
* copy code from BERT
* refactor use declarations
* refactor use declarations again
* fix text model
* refactor
* try to fix text model
* tuning
* tune Chinese CLIP
* delete unused code
* revert code
* Clippy fixes.
* Also apply cargo fmt.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>