The best way to deploy UMT5 variants into production with low-latency inference?
#4
by Respair - opened
This is such a neat model, but I don't see it being supported by most frameworks since it uses a different sampling method.
Can you recommend any way to deploy this model (by this, I mean the model we fine-tune on a downstream task) into production, and possibly a trivial way to convert it to ONNX? Optimum doesn't support it just yet. Preferably something that relies on GPUs.
Man, I really wish there was a vLLM for seq2seq models like this. Their potential is so underrated. If this tiny voice of mine can be heard by the big guys at Google: please create a framework that makes it easier to use seq2seq models!
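For reference, here is a minimal sketch of a plain PyTorch baseline on GPU using only Transformers, which sidesteps the ONNX export question entirely. It is not a tuned serving stack, just batched `generate` calls. The names `MODEL_ID`, `chunked`, `load`, and `generate_all`, along with the batch size and `max_new_tokens` values, are illustrative assumptions; `MODEL_ID` would be replaced by the fine-tuned checkpoint.

```python
from typing import List

# Assumption: swap in the path or hub ID of your own fine-tuned checkpoint.
MODEL_ID = "google/umt5-base"


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split inputs into fixed-size batches; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model, moving the model to GPU when one is available."""
    # Heavy imports stay local so the batching helper works without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # dtype="auto" follows the model card snippet; older Transformers
    # releases spell this argument torch_dtype.
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, dtype="auto")
    model.to(device).eval()
    return tokenizer, model, device


def generate_all(texts, tokenizer, model, device,
                 batch_size: int = 8, max_new_tokens: int = 64) -> List[str]:
    """Run batched generation and return the decoded strings in input order."""
    import torch

    outputs: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest item in the batch and move tensors to the device.
        enc = tokenizer(batch, return_tensors="pt",
                        padding=True, truncation=True).to(device)
        with torch.inference_mode():  # no autograd bookkeeping at inference
            ids = model.generate(**enc, max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(ids, skip_special_tokens=True))
    return outputs
```

Calling `generate_all(texts, *load())` runs the whole list through the model in batches. Larger batches generally trade per-request latency for throughput, so the batch size is something to measure on your own hardware rather than take from this sketch.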