Installation & Setup Guide
This guide walks you through installing and setting up the omeka2dsp system for data migration from Omeka to DSP.
Prerequisites
Access Requirements
Before installation, ensure you have:
- Omeka API Access:
- Omeka instance URL
- API key identity and credential
- Collection/item set ID to migrate
- DSP Instance Access:
- DSP API endpoint URL
- DSP user account with appropriate permissions
- Existing DSP project with data model
- Permissions:
- Read access to Omeka collections and media
- Create/update permissions in DSP project
- File upload permissions to DSP storage
Installation
We recommend using GitHub Codespaces for a reproducible setup.
Getting Started
Fork or clone this repository to your GitHub account.
Click the green <> Code button at the top right of this repository. Select the "Codespaces" tab and click "Create codespace on main". GitHub will now build a container that includes:

- ✅ Python with uv
- ✅ Node.js with pnpm
- ✅ All project dependencies pre-installed
Once the Codespace is ready, configure your environment:
# Copy example environment file
cp example.env .env
# Edit .env with your credentials using the built-in editor
Test the installation:
# Test Omeka API access
uv run python scripts/api_get_project.py
Note: All dependencies (Python via uv, Node.js via pnpm) are pre-installed in the Codespace.
Local Setup Prerequisites
- uv (Python package and version manager)
- pnpm
- Node.js (latest LTS)
Note: uv installs and manages the correct Python version automatically.
Local Setup Steps
# 1. Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp
# 2. Install Node.js dependencies
pnpm install
pnpm run prepare
# 3. Setup Python environment
uv sync
# 4. Configure environment
cp example.env .env
# Edit .env with your specific settings
# 5. Test installation
uv run python scripts/api_get_project.py
Configuration
1. Environment File Setup
Configure your environment variables:
# Copy example environment file
cp example.env .env
2. Edit Environment Variables
Update the .env file with your specific settings:
# Omeka Configuration
OMEKA_API_URL=https://omeka.unibe.ch/api/
KEY_IDENTITY=your_api_key_identity
KEY_CREDENTIAL=your_api_key_credential
ITEM_SET_ID=your_collection_id
# DSP Configuration
PROJECT_SHORT_CODE=0712
API_HOST=https://api.dasch.swiss
INGEST_HOST=https://ingest.dasch.swiss
DSP_USER=your_dsp_username
DSP_PWD=your_dsp_password
# Optional Configuration
PREFIX=StadtGeschichteBasel_v1:
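Before running any migration script, it can help to fail fast when a setting is missing. The sketch below is not part of the project's code; it simply checks the variable names from example.env against an environment mapping:

```python
# Minimal sketch: verify that all required settings from example.env are
# present and non-empty. Variable names match the configuration above.
REQUIRED_VARS = [
    "OMEKA_API_URL", "KEY_IDENTITY", "KEY_CREDENTIAL", "ITEM_SET_ID",
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
]

def missing_settings(env: dict) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it with `missing_settings(dict(os.environ))` after loading the .env file (for example with python-dotenv) and abort if the result is non-empty.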
3. Validate Configuration
Test your configuration with a simple API call:
# Test Omeka API access
uv run python scripts/api_get_project.py
# Expected output: Project information saved to ../data/project_data.json
DSP Project Setup
1. Data Model Requirements
Your DSP project must have a compatible data model. The system expects:
Required Resource Classes:
- sgb_OBJECT: Main metadata objects
- sgb_MEDIA_IMAGE: Image files
- sgb_MEDIA_ARCHIV: Archive files (PDFs, documents)
Required Properties:
- identifier: Unique item identifier
- title: Resource title
- description: Resource description
- creator: Creator information
- date: Date information
- subject: Subject classifications
- type: Resource type (mapped to lists)
- format: File format (mapped to lists)
- language: Language (mapped to lists)
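As an illustration of how these properties come together, here is a hedged sketch (not the project's actual mapping code) that assembles a minimal sgb_OBJECT value dictionary from flat Omeka metadata, using the property names above and the PREFIX from example.env:

```python
# Hypothetical helper for illustration only: map flat Omeka fields onto
# the prefixed DSP property names used by the data model above.
PREFIX = "StadtGeschichteBasel_v1:"

def build_object_values(item: dict) -> dict:
    """Return {prefixed property: value} for the fields present in `item`."""
    mapping = {
        "identifier": "identifier",
        "title": "title",
        "description": "description",
        "creator": "creator",
        "date": "date",
    }
    return {
        PREFIX + dsp_prop: item[omeka_field]
        for omeka_field, dsp_prop in mapping.items()
        if item.get(omeka_field)  # skip absent or empty fields
    }
```

The real migration in data_2_dasch.py also handles list-valued properties (type, format, language); this sketch only shows the plain text fields.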
2. Create Data Model
If you need to create the data model, use the provided JSON:
# Using DSP-Tools (requires system administrator rights)
dsp-tools create -s your-dsp-host -u admin@example.com -p password data/data_model_dasch.json
3. Verify Lists Setup
Ensure your project has the required value lists:
# Fetch current lists
uv run python scripts/api_get_lists.py
# Get detailed list information
uv run python scripts/api_get_lists_detailed.py
# Verify lists in data/data_lists_detail.json
Required lists include:
- DCMI Type Vocabulary: For resource types
- Internet Media Type: For file formats
- ISO 639 Language Codes: For languages
Directory Structure Setup
The system expects the following directory structure:
omeka2dsp/
├── data/                           # Configuration and cache files
│   ├── data_model_dasch.json       # DSP data model definition
│   ├── data_lists.json             # DSP lists summary (generated)
│   ├── data_lists_detail.json      # Detailed lists (generated)
│   ├── project_data.json           # Project info (generated)
│   └── media_files/                # Downloaded media cache
├── docs/                           # Documentation
├── scripts/                        # Python scripts
│   ├── data_2_dasch.py             # Main migration script
│   ├── process_data_from_omeka.py  # Omeka API utilities
│   ├── api_get_project.py          # Project info fetcher
│   ├── api_get_lists.py            # Lists fetcher
│   └── api_get_lists_detailed.py   # Detailed lists fetcher
├── .env                            # Environment configuration
├── .gitignore                      # Git ignore rules
├── README.md                       # Basic documentation
└── requirements.txt                # Python dependencies (if created)
Create any missing directories, e.g.:
mkdir -p data/media_files
mkdir -p logs
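If you prefer to do this from Python (for example inside a setup script), the same directories can be created with pathlib; this is an optional convenience, equivalent to the mkdir commands above:

```python
# Optional pathlib equivalent of `mkdir -p data/media_files` and `mkdir -p logs`.
from pathlib import Path

def ensure_dirs(root: str = ".") -> list[str]:
    """Create the cache directories if missing; return the paths touched."""
    created = []
    for rel in ("data/media_files", "logs"):
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(str(path))
    return created
```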
Verification Steps
1. Test Omeka Connection
uv run python -c "
import os
from dotenv import load_dotenv
load_dotenv()  # load .env before importing scripts that read the environment

from scripts.process_data_from_omeka import get_items_from_collection

try:
    items = get_items_from_collection(os.getenv('ITEM_SET_ID'))[:5]
    print(f'Successfully connected to Omeka. Found {len(items)} test items.')
except Exception as e:
    print(f'Omeka connection failed: {e}')
"
2. Test DSP Authentication
uv run python -c "
import os
from dotenv import load_dotenv
load_dotenv()

from scripts.data_2_dasch import login

try:
    token = login(os.getenv('DSP_USER'), os.getenv('DSP_PWD'))
    print('DSP authentication successful')
except Exception as e:
    print(f'DSP authentication failed: {e}')
"
3. Verify Project Setup
uv run python -c "
import os
from dotenv import load_dotenv
load_dotenv()

from scripts.data_2_dasch import login, get_project, get_lists

try:
    token = login(os.getenv('DSP_USER'), os.getenv('DSP_PWD'))
    project_iri = get_project()
    lists = get_lists(project_iri)
    print(f'Project setup verified. Found {len(lists)} lists.')
except Exception as e:
    print(f'Project verification failed: {e}')
"
Common Installation Issues
Python Import Errors
Issue: ModuleNotFoundError: No module named 'requests'
Solution:
# Ensure dependencies are installed via uv
uv sync
Environment Variable Issues
Issue: KeyError: 'PROJECT_SHORT_CODE'
Solution:
# Check .env file exists and contains required variables
cat .env | grep PROJECT_SHORT_CODE
# Load environment variables into the current shell
# (safer than `export $(cat .env | xargs)`, which breaks on comments and spaces)
set -a; source .env; set +a
API Connection Issues
Issue: Connection timeouts or SSL errors
Solutions:
# Test basic connectivity
curl -k https://your-dsp-host.com/health
# Check firewall and proxy settings
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
Permission Issues
Issue: DSP API returns 403 Forbidden
Solution:
- Verify user account has correct project permissions
- Check if user account is active and not locked
- Confirm API endpoints are correct
File System Permissions
Issue: Cannot write to data directory
Solution:
# Fix directory permissions
chmod 755 data/
mkdir -p data/media_files
chmod 755 data/media_files
Development Environment Setup
For development and contribution:
1. Install Development Tools
# Install pre-commit hooks
pnpm install
pnpm run prepare
# Install Python development tools (if available)
uv sync --dev
2. Code Formatting
# Check code formatting
pnpm run check
# Auto-format code
pnpm run format
3. Commit Standards
# Use conventional commits
pnpm run commit
# Or use git directly with conventional format
git commit -m "feat: add new synchronization feature"
Performance Optimization
1. Large Dataset Handling
For large collections (>1000 items):
# Use sample mode for testing
uv run python scripts/data_2_dasch.py -m sample_data
# Process in smaller batches
# Edit NUMBER_RANDOM_OBJECTS in data_2_dasch.py
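A generic way to bound memory and limit the blast radius of a failure is to iterate in fixed-size batches. The helper below is a sketch, not part of data_2_dasch.py:

```python
# Sketch: yield fixed-size batches from any iterable, so one failed batch
# can be retried without reprocessing the whole collection.
from itertools import islice
from typing import Iterable, Iterator

def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of up to `size` items; the last batch may be shorter."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

For example, `for batch in chunked(all_items, 50): migrate(batch)` processes a collection 50 items at a time.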
2. Memory Management
# Monitor memory usage during processing
htop
# For very large datasets, consider running on a server with more RAM
3. Network Optimization
# For slow connections, increase timeout values in scripts
# Edit timeout parameters in API calls
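Beyond raising timeouts, transient network failures are often best handled by retrying with exponential backoff. This generic wrapper is a sketch (not part of the project scripts) that you could apply around any API call:

```python
# Sketch: retry a flaky callable with exponential backoff.
import time

def with_retries(call, attempts: int = 3, base_delay: float = 1.0):
    """Invoke call(); on exception, wait base_delay * 2**attempt and retry.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Usage: `with_retries(lambda: requests.get(url, timeout=30))`. Only retry operations that are safe to repeat (GETs, or idempotent updates).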
Security Considerations
1. Credential Management
- Never commit the .env file to version control
- Use secure credential storage for production
- Rotate API keys regularly
2. File Permissions
# Secure environment file
chmod 600 .env
# Secure log files
chmod 600 *.log
3. Network Security
- Use HTTPS for all API communications
- Consider VPN for sensitive data transfers
- Monitor API access logs
Next Steps
After successful installation:
- Read the Configuration Guide for detailed configuration options
- Follow the Usage Guide for running your first migration
- Review the Architecture Documentation to understand the system
- Check the Troubleshooting Guide for common issues
Support
If you encounter issues during installation:
- Check the Troubleshooting Guide
- Review the logs in data_2_dasch.log
- Create an issue on the GitHub repository
- Include system information, error messages, and configuration details (without sensitive data)
Installation is now complete! You're ready to configure and run your first data migration.