The model_type 'pix2struct' is not recognized. It could be a bleeding edge model, or incorrect
#2
by Khaledelsaka - opened
Same issue for me too
+1 same issue.
Did you guys try running it locally by loading the model through a pipeline? Say, in a Google Colab notebook.
The model is only 1.5 GB, so it should load there easily.
It takes four lines of code:
cell1: !pip install -q transformers datasets torch > /dev/null
cell2: from transformers import pipeline
cell3: img2text = pipeline(task='image-to-text',model='google/pix2struct-textcaps-base')
cell4: img2text("/content/your_image.png")
It works; I checked it with a thumbnail of the video below:
https://youtu.be/tjrdb8tdXT4
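If the `image-to-text` pipeline is not available in your transformers version, the four cells above can probably be collapsed into one script that loads the model directly. This is only a rough sketch, not something verified in this thread: it assumes a transformers release that ships the Pix2Struct classes, plus torch and Pillow installed. The function name `caption_image`, the `max_new_tokens=50` default, and the `caption.py` file name are made up for illustration.

```python
import sys

MODEL_ID = "google/pix2struct-textcaps-base"


def caption_image(image_path, max_new_tokens=50):
    """Return a caption for the image at image_path (hypothetical helper)."""
    # Imported lazily so the file can be loaded even where these
    # packages are missing; they are only needed when a caption is requested.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    # Downloads the weights on first use, then reads them from the local cache.
    processor = Pix2StructProcessor.from_pretrained(MODEL_ID)
    model = Pix2StructForConditionalGeneration.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # max_new_tokens is an assumed cap on caption length, not a tuned value.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(caption_image(sys.argv[1]))
    else:
        print("usage: python caption.py /content/your_image.png")
```

In a notebook you would call `caption_image("/content/your_image.png")` in place of cell3 and cell4.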
