Training and Prompting Guide for Iconclass VLM
This document provides guidance on training, fine-tuning, and effective prompting for Iconclass Vision-Language Models across different backends.
Supported Backends
1. Ollama (Local)
- Iconclass VLM Model: small-models-for-glam/iconclass-vlm
- Pre-trained vision-language model specialized for Iconclass classification
- Based on fine-tuned VLM architecture
- Optimized for GLAM (Galleries, Libraries, Archives, Museums) domain
- Blog Post: Fine-tuning VLMs for Iconclass with TRL by Daniel van Strien
2. OpenRouter (Cloud)
- Qwen3-VL: Qwen3-VL-235B-A22B-Instruct
- State-of-the-art multimodal model for vision-language tasks
- Accessible via OpenRouter API
- Large context window and strong performance on classification tasks
- Official Repo: QwenLM/Qwen3-VL
- OpenRouter Docs: openrouter.ai/docs
Choosing a Backend
| Feature | Ollama (Local) | OpenRouter (Cloud) |
|---|---|---|
| Cost | Free (local compute) | Pay per API call |
| Privacy | Complete data privacy | Data sent to cloud |
| Performance | Depends on local hardware | Consistent, high performance |
| Setup | Requires Ollama + model pull | Just API key |
| Best for | Large batches, sensitive data | Quick testing, smaller batches |
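As a rough setup sketch for each backend (the exact pull tag and the environment-variable name are assumptions; check the project README for the authoritative steps):

```bash
# Local: pull the Iconclass VLM into Ollama
# (assumes a GGUF build is published on the Hugging Face Hub under this repo)
ollama pull hf.co/small-models-for-glam/iconclass-vlm

# Cloud: supply an OpenRouter API key
# (variable name is an assumption; the pipeline's config may differ)
export OPENROUTER_API_KEY="sk-or-..."
```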
Data Filtering Best Practices
Only Classify Children Objects
Important: When working with the Basel dataset or similar hierarchical data:
- DO classify: Objects with the `m` prefix (children objects)
- DO NOT classify: Objects with the `abb` prefix (parent/aggregate objects)
Parent objects (`abb`) represent aggregations or collections and should not be classified individually. The pipeline automatically filters these out.
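If you want to sanity-check the dataset composition yourself, here is a quick sketch. It assumes the metadata is a JSON array of records with an `id` field, which is not verified; adjust the filter to the real schema:

```bash
# Count children (m-prefixed) vs. parent (abb-prefixed) records
# NOTE: assumes metadata.json is a JSON array of objects with an "id" field -- not verified
curl -s https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json |
  jq '[.[].id] | {children: map(select(startswith("m"))) | length,
                  parents:  map(select(startswith("abb"))) | length}'
```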
Sampling Modes
The pipeline supports three sampling modes:
- Full (`--sampling-mode full`): Process all children objects
- Random (`--sampling-mode random --sampling-size N --sampling-seed 42`): Random sample of N objects with a fixed seed for reproducibility
- Fixed (`--sampling-mode fixed --fixed-ids-file ids.txt`): Process specific objects listed in a file
Example:

```bash
python -m iconclass_classification classify \
  --source https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json \
  --sampling-mode random \
  --sampling-size 100 \
  --sampling-seed 42
```

Prompt Engineering
Available Prompt Templates
The pipeline includes three prompt templates optimized for different scenarios:
1. Default (`--prompt-template default`)
Simple, direct instruction for the model.
Best for: Quick testing, general-purpose classification
2. Instruction (`--prompt-template instruction`)
Detailed instructions with explicit guidance and a NONE fallback.
Best for: Improved accuracy, handling edge cases
Features:
- Step-by-step instructions
- Explicit NONE output for unclear images
- More structured format
3. Few-Shot (`--prompt-template few_shot`)
Includes example classifications to guide the model.
Best for: Complex images, improving consistency
Features:
- Example input-output pairs
- Demonstrates expected format
- May improve model understanding
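For illustration only, a few-shot prompt might be structured along these lines. This is an invented sketch, not the pipeline's actual template, and the example codes are placeholders:

```
You assign Iconclass codes to images.

Example 1: engraving of a city view with a river -> 25I12
Example 2: portrait of a seated clergyman -> 61B2

Now classify the image below. Output only Iconclass codes, separated by
spaces, or NONE if no code applies.
```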
Usage Example

```bash
python -m iconclass_classification classify \
  --source https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json \
  --prompt-template instruction \
  --sampling-mode random \
  --sampling-size 50
```

Troubleshooting Empty Outputs
Common Causes
- Image Quality Issues
  - Low resolution or heavily degraded images
  - Solution: Check `image_sha256` in the details log and review the source image (see the example below)
- Model Uncertainty
  - Image content doesn’t clearly match known Iconclass categories
  - Solution: Try different prompt templates, especially `instruction` or `few_shot`
- Processing Errors
  - Image format or conversion issues
  - Solution: Check pipeline logs for warnings during image processing
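For the image-quality case, you can hash the locally processed image and compare it against the `image_sha256` recorded in the details log (the object ID here is the one from the debug example below):

```bash
# Hash the processed image and compare with the logged image_sha256 value
sha256sum data/m10039.jpg
```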
Debugging Empty Classifications
When the pipeline encounters empty classifications, it automatically logs debug information:
```
WARNING Empty classification for m10039: model returned no valid codes
DEBUG Object ID: m10039
DEBUG Prompt template: default
DEBUG Image SHA256: abc123...
DEBUG Raw response length: 45 chars
DEBUG Raw response preview: 'The image shows...'
```
Steps to Debug
- Check the logs: Look in `runs/<timestamp>/logs/pipeline.log`
- Review raw responses: Check `classify/<objectid>_response.json` (a sketch for listing empty results follows this list)
- Inspect the image: Find it in `data/<objectid>.jpg`
- Try different prompts: Experiment with `instruction` or `few_shot` templates
- Adjust temperature: Lower temperature (0.0) for consistency, higher (0.3-0.7) for creativity
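As a first pass over a finished run, the sketch below lists objects whose classification came back empty. It assumes each line of the details file is a JSON record with an object ID and a list of codes; the field names are guesses, so verify them against a real record first:

```bash
# List object IDs whose classification came back empty
# NOTE: "object_id" and "codes" are assumed field names -- not verified against the pipeline
jq -r 'select((.codes // []) | length == 0) | .object_id' \
  runs/<timestamp>/results/iconclass_details.jsonl
```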
Experimenting with Prompts
To test different prompts on a fixed sample:
```bash
# Create a file with specific object IDs
echo "m10039" > test_ids.txt
echo "m10040" >> test_ids.txt

# Test default prompt
python -m iconclass_classification classify \
  --source https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json \
  --sampling-mode fixed \
  --fixed-ids-file test_ids.txt \
  --prompt-template default \
  --output runs/test-default

# Test instruction prompt
python -m iconclass_classification classify \
  --source https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json \
  --sampling-mode fixed \
  --fixed-ids-file test_ids.txt \
  --prompt-template instruction \
  --output runs/test-instruction

# Compare results
diff runs/test-default/*/results/iconclass_details.jsonl \
  runs/test-instruction/*/results/iconclass_details.jsonl
```

Model Parameters
Temperature
Controls randomness in model outputs:
- `0.0`: Deterministic, consistent results (recommended for production)
- `0.3-0.5`: Slight variation, may improve recall
- `0.7-1.0`: More creative, less consistent
Context Window (`num_ctx`)
- Default: 4096 tokens
- Increase if using very detailed prompts or few-shot examples
- Decrease to speed up inference
Prediction Length (`num_predict`)
- Default: 128 tokens
- Usually sufficient for 5-10 Iconclass codes
- Increase if expecting many codes per image
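For reference, these parameters correspond to Ollama's standard request options when calling the local backend directly. A minimal sketch against Ollama's `/api/generate` endpoint; the model name and prompt are placeholders, and the pipeline may expose its own flags instead:

```bash
# Sketch: setting temperature / num_ctx / num_predict via Ollama's options field
# (model name and prompt are placeholders -- not the pipeline's actual values)
curl http://localhost:11434/api/generate -d '{
  "model": "iconclass-vlm",
  "prompt": "List the Iconclass codes for this image.",
  "images": ["<base64-encoded image>"],
  "options": {"temperature": 0.0, "num_ctx": 4096, "num_predict": 128},
  "stream": false
}'
```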
Best Practices
For Production Use
- Use full dataset: `--sampling-mode full`
- Fixed temperature: `--temperature 0.0`
- Consistent prompt: Stick with one template after testing (a combined example follows this list)
- Log everything: Keep all artifacts for auditability
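Putting these together, using only flags shown elsewhere in this guide:

```bash
python -m iconclass_classification classify \
  --source https://forschung.stadtgeschichtebasel.ch/assets/data/metadata.json \
  --sampling-mode full \
  --prompt-template instruction \
  --temperature 0.0
```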
For Experimentation
- Small samples: `--sampling-mode random --sampling-size 10-100`
- Fixed seed: Always use the same seed for reproducibility
- Try all templates: Compare results across prompts
- Adjust parameters: Test different temperature and context settings
For Debugging
- Enable debug logging: Check logs in `runs/*/logs/pipeline.log`
- Review artifacts: Inspect `classify/*_response.json` files
- Visual inspection: Check processed images in `data/*.jpg`
- Fixed samples: Use `--sampling-mode fixed` with problematic objects
Performance Optimization
Speed
- Use smaller context windows if possible
- Process in batches during off-peak hours
- Consider parallel processing (future enhancement)
Quality
- Start with the `instruction` template for best accuracy
- Use `few_shot` for complex iconographic content
- Review and iterate on empty or unexpected classifications
- Consider manual review of low-confidence results
Support
For questions or issues:
- Check the main README
- Review the USAGE guide
- Open an issue on GitHub