Image-Text-to-Text
Transformers
Safetensors
qwen2.5-vl
lora
sft
context-classification
out-of-context-detection
coinco
Instructions to use COinCO/Context_Classification_Models with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use COinCO/Context_Classification_Models with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="COinCO/Context_Classification_Models")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("COinCO/Context_Classification_Models", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use COinCO/Context_Classification_Models with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "COinCO/Context_Classification_Models" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "COinCO/Context_Classification_Models", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/COinCO/Context_Classification_Models
- SGLang
How to use COinCO/Context_Classification_Models with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "COinCO/Context_Classification_Models" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "COinCO/Context_Classification_Models", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "COinCO/Context_Classification_Models" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "COinCO/Context_Classification_Models", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use COinCO/Context_Classification_Models with Docker Model Runner:
docker model run hf.co/COinCO/Context_Classification_Models
| base_model: Qwen/Qwen2.5-VL-3B-Instruct | |
| library_name: transformers | |
| pipeline_tag: image-text-to-text | |
| tags: | |
| - qwen2.5-vl | |
| - lora | |
| - sft | |
| - context-classification | |
| - out-of-context-detection | |
| - coinco | |
| license: cc-by-4.0 | |
| # COinCO Context Classification Models | |
| **Authors:** Tianze Yang\*, Tyson Jordan\*, Ruitong Sun\*, Ninghao Liu, Jin Sun | |
| \*Equal contribution | |
| **Affiliation:** University of Georgia | |
| ## Overview | |
| Fine-grained context classification models for detecting **out-of-context objects** in images. Each model is a fully merged Qwen2.5-VL-3B-Instruct fine-tuned via LoRA on the [COinCO dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset). | |
| The models classify whether an object (marked by a red bounding box) is **in-context** or **out-of-context** based on three criteria: | |
| | Model | Criterion | Description | | |
| |-------|-----------|-------------| | |
| | `co_occurrence/` | Co-occurrence | Whether the object can reasonably appear together with other objects in the scene | | |
| | `location/` | Location | Whether the object is placed in a physically and contextually reasonable position | | |
| | `size/` | Size | Whether the object's size is proportional and realistic relative to other objects | | |
| ## How to Use | |
| ```python | |
| from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor | |
| import torch | |
| # Choose a model: "co_occurrence", "location", or "size" | |
| model_id = "COinCO/Context_Classification_Models" | |
| subfolder = "co_occurrence" # or "location" or "size" | |
| model = Qwen2_5_VLForConditionalGeneration.from_pretrained( | |
| model_id, | |
| subfolder=subfolder, | |
| torch_dtype=torch.float16, | |
| device_map="auto", | |
| ) | |
| processor = AutoProcessor.from_pretrained(model_id, subfolder=subfolder) | |
| ``` | |
| ## Training Details | |
| - **Base Model:** [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | |
| - **Method:** LoRA fine-tuning (merged into base model) | |
| - **Dataset:** [COinCO](https://huggingface.co/datasets/COinCO/COinCO-dataset) inpainted images with multi-model consensus labels | |
| - **Training Data:** ~5,000 samples per criterion from the training split | |
| - **Epochs:** 3 | |
| - **Learning Rate:** 2e-4 | |
| - **LoRA Rank:** See adapter config for details | |
| ## Evaluation Results | |
| ### Inpainted Test Set (binary classification: In-context vs Out-of-context) | |
| | Criterion | Baseline (Qwen2.5-VL-3B) | Fine-tuned | Improvement | | |
| |-----------|--------------------------|------------|-------------| | |
| | Co-occurrence | 75.54% | **80.82%** | +5.28% | | |
| | Location | 74.43% | 71.05% | -3.38% | | |
| | Size | 50.21% | **66.01%** | +15.80% | | |
| ### Real COCO Images (shortcut learning detection, higher = less shortcut reliance) | |
| | Criterion | Baseline | Fine-tuned | Improvement | | |
| |-----------|----------|------------|-------------| | |
| | Co-occurrence | 88.95% | 87.00% | -1.95% | | |
| | Location | 47.55% | **91.35%** | +43.80% | | |
| | Size | 52.55% | **83.20%** | +30.65% | | |
| ## Related Resources | |
| - **Paper:** "Common Inpainted Objects In-N-Out of Context" | |
| - **Dataset:** [COinCO/COinCO-dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset) | |
| - **Code:** [YangTianze009/COinCO](https://github.com/YangTianze009/COinCO) | |
| ## Citation | |
| ```bibtex | |
| @article{yang2025coinco, | |
| title={Common Inpainted Objects In-N-Out of Context}, | |
| author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun}, | |
| year={2025} | |
| } | |
| ``` | |