Installation & Setup Guide

This guide walks you through installing and setting up the omeka2dsp system, which migrates data from Omeka to the DaSCH Service Platform (DSP).

Prerequisites

Access Requirements

Before installation, ensure you have the following (a quick Omeka access check is sketched after this list):

  1. Omeka API Access:
    • Omeka instance URL
    • API key identity and credential
    • Collection/item set ID to migrate
  2. DSP Instance Access:
    • DSP API endpoint URL
    • DSP user account with appropriate permissions
    • Existing DSP project with data model
  3. Permissions:
    • Read access to Omeka collections and media
    • Create/update permissions in DSP project
    • File upload permissions to DSP storage

Installation

We recommend using GitHub Codespaces for a reproducible setup.

Getting Started

  1. Fork this repository to your GitHub account, or open the original repository directly if you have access.

  2. Click the green <> Code button at the top right of this repository.

  3. Select the "Codespaces" tab and click "Create codespace on main". GitHub will now build a container that includes:

    • ✅ Python with uv
    • ✅ Node.js with pnpm
    • ✅ All project dependencies pre-installed
  4. Once the Codespace is ready, configure your environment:

    # Copy example environment file
    cp example.env .env
    # Edit .env with your credentials using the built-in editor
  5. Test the installation:

    # Fetch project information to verify API access
    uv run python scripts/api_get_project.py

Note: All dependencies (Python via uv, Node.js via pnpm) are pre-installed in the Codespace.

Local Installation

Prerequisites: git, pnpm, and uv installed locally. Note: uv installs and manages the correct Python version automatically.

Local Setup Steps

# 1. Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp

# 2. Install Node.js dependencies
pnpm install
pnpm run prepare

# 3. Setup Python environment
uv sync

# 4. Configure environment
cp example.env .env
# Edit .env with your specific settings

# 5. Test installation
uv run python scripts/api_get_project.py

Configuration

1. Environment File Setup

Configure your environment variables:

# Copy example environment file
cp example.env .env

2. Edit Environment Variables

Update the .env file with your specific settings:

# Omeka Configuration
OMEKA_API_URL=https://omeka.unibe.ch/api/
KEY_IDENTITY=your_api_key_identity
KEY_CREDENTIAL=your_api_key_credential
ITEM_SET_ID=your_collection_id

# DSP Configuration
PROJECT_SHORT_CODE=0712
API_HOST=https://api.dasch.swiss
INGEST_HOST=https://ingest.dasch.swiss
DSP_USER=your_dsp_username
DSP_PWD=your_dsp_password

# Optional Configuration
PREFIX=StadtGeschichteBasel_v1:
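
The scripts read these values from the process environment. As a rough sketch of how they are typically loaded with python-dotenv (the variable names match the .env above; the check itself is illustrative, not part of the codebase):

import os
from dotenv import load_dotenv

# Read variables from .env into the process environment
load_dotenv()

REQUIRED = [
    "OMEKA_API_URL", "KEY_IDENTITY", "KEY_CREDENTIAL", "ITEM_SET_ID",
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
]

# Fail early with a clear message if anything is missing
missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing variables in .env: {', '.join(missing)}")
print("All required environment variables are set.")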

3. Validate Configuration

Test your configuration with a simple API call:

# Fetch project information to verify API access
uv run python scripts/api_get_project.py

# Expected output: Project information saved to ../data/project_data.json

DSP Project Setup

1. Data Model Requirements

Your DSP project must have a compatible data model. The system expects:

Required Resource Classes:

  • sgb_OBJECT: Main metadata objects
  • sgb_MEDIA_IMAGE: Image files
  • sgb_MEDIA_ARCHIV: Archive files (PDFs, documents)

Required Properties (see the sketch after this list):

  • identifier: Unique item identifier
  • title: Resource title
  • description: Resource description
  • creator: Creator information
  • date: Date information
  • subject: Subject classifications
  • type: Resource type (mapped to lists)
  • format: File format (mapped to lists)
  • language: Language (mapped to lists)
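
In the DSP ontology these property names carry the project prefix from the configuration. A minimal sketch of building the fully qualified names from the PREFIX variable (illustrative only; the authoritative definitions live in data/data_model_dasch.json):

import os

# e.g. "StadtGeschichteBasel_v1:" from .env
prefix = os.getenv("PREFIX", "StadtGeschichteBasel_v1:")

properties = [
    "identifier", "title", "description", "creator",
    "date", "subject", "type", "format", "language",
]

# Fully qualified property names as they appear in DSP payloads
qualified = {name: f"{prefix}{name}" for name in properties}
print(qualified["title"])  # -> StadtGeschichteBasel_v1:title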

2. Create Data Model

If you need to create the data model, use the provided JSON:

# Using DSP-Tools (requires system administrator rights)
dsp-tools create -s your-dsp-host -u admin@example.com -p password data/data_model_dasch.json
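
Before pushing the model, it can be worth checking locally that the JSON parses and contains the expected resource classes. A sketch, assuming the standard dsp-tools project layout (project → ontologies → resources):

import json

# Parse the project definition and list its resource classes
with open("data/data_model_dasch.json") as f:
    model = json.load(f)

for onto in model["project"]["ontologies"]:
    names = [res["name"] for res in onto["resources"]]
    print(f"Ontology {onto['name']}: {names}")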

3. Verify Lists Setup

Ensure your project has the required value lists:

# Fetch current lists
uv run python scripts/api_get_lists.py

# Get detailed list information
uv run python scripts/api_get_lists_detailed.py

# Verify lists in data/data_lists_detail.json

Required lists include (see the lookup sketch below):

  • DCMI Type Vocabulary: For resource types
  • Internet Media Type: For file formats
  • ISO 639 Language Codes: For languages
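
During migration, values for type, format, and language must be resolved to list node IRIs. Here is a rough sketch of such a lookup against the generated data/data_lists_detail.json; the field names (labels, value, id, children) are assumptions about the file structure, so adjust them to what the generated file actually contains:

import json

def find_node_iri(nodes, label):
    """Recursively search list nodes for a matching label (structure assumed)."""
    for node in nodes:
        if any(l.get("value") == label for l in node.get("labels", [])):
            return node.get("id")
        found = find_node_iri(node.get("children", []), label)
        if found:
            return found
    return None

with open("data/data_lists_detail.json") as f:
    lists = json.load(f)

# Hypothetical usage: resolve the DCMI type label "Image" to its node IRI
print(find_node_iri(lists, "Image"))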

Directory Structure Setup

The system expects the following directory structure:

omeka2dsp/
├── data/                          # Configuration and cache files
│   ├── data_model_dasch.json      # DSP data model definition
│   ├── data_lists.json            # DSP lists summary (generated)
│   ├── data_lists_detail.json     # Detailed lists (generated)
│   ├── project_data.json          # Project info (generated)
│   └── media_files/               # Downloaded media cache
├── docs/                          # Documentation
├── scripts/                       # Python scripts
│   ├── data_2_dasch.py            # Main migration script
│   ├── process_data_from_omeka.py # Omeka API utilities
│   ├── api_get_project.py         # Project info fetcher
│   ├── api_get_lists.py           # Lists fetcher
│   └── api_get_lists_detailed.py  # Detailed lists fetcher
├── .env                           # Environment configuration
├── .gitignore                     # Git ignore rules
├── README.md                      # Basic documentation
└── requirements.txt               # Python dependencies (if created)

Create any missing directories, e.g.:

mkdir -p data/media_files
mkdir -p logs

Verification Steps

1. Test Omeka Connection

uv run python -c "
import os

# Load environment variables from .env before importing project code
from dotenv import load_dotenv
load_dotenv()

from scripts.process_data_from_omeka import get_items_from_collection

try:
    items = get_items_from_collection(os.getenv('ITEM_SET_ID'))[:5]
    print(f'Successfully connected to Omeka. Found {len(items)} test items.')
except Exception as e:
    print(f'Omeka connection failed: {e}')
"

2. Test DSP Authentication

uv run python -c "
import os

# Load environment variables from .env
from dotenv import load_dotenv
load_dotenv()

from scripts.data_2_dasch import login

try:
    token = login(os.getenv('DSP_USER'), os.getenv('DSP_PWD'))
    print('DSP authentication successful')
except Exception as e:
    print(f'DSP authentication failed: {e}')
"

3. Verify Project Setup

uv run python -c "
import os

# Load environment variables from .env
from dotenv import load_dotenv
load_dotenv()

from scripts.data_2_dasch import login, get_project, get_lists

try:
    token = login(os.getenv('DSP_USER'), os.getenv('DSP_PWD'))
    project_iri = get_project()
    lists = get_lists(project_iri)
    print(f'Project setup verified. Found {len(lists)} lists.')
except Exception as e:
    print(f'Project verification failed: {e}')
"

Common Installation Issues

Python Import Errors

Issue: ModuleNotFoundError: No module named 'requests'

Solution:

# Ensure dependencies are installed via uv
uv sync

Environment Variable Issues

Issue: KeyError: 'PROJECT_SHORT_CODE'

Solution:

# Check that the .env file exists and contains the required variables
grep PROJECT_SHORT_CODE .env

# Load the environment variables into your current shell if needed
set -a; source .env; set +a

API Connection Issues

Issue: Connection timeouts or SSL errors

Solutions:

# Test basic connectivity (-k skips TLS certificate verification; debugging only)
curl -k https://your-dsp-host.com/health

# Check firewall and proxy settings
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=https://proxy.example.com:8080

Permission Issues

Issue: DSP API returns 403 Forbidden

Solution:

  • Verify user account has correct project permissions
  • Check if user account is active and not locked
  • Confirm API endpoints are correct

File System Permissions

Issue: Cannot write to data directory

Solution:

# Fix directory permissions
chmod 755 data/
mkdir -p data/media_files
chmod 755 data/media_files

Development Environment Setup

For development and contribution:

1. Install Development Tools

# Install pre-commit hooks
pnpm install
pnpm run prepare

# Install Python development tools (if available)
uv sync --dev

2. Code Formatting

# Check code formatting
pnpm run check

# Auto-format code
pnpm run format

3. Commit Standards

# Use conventional commits
pnpm run commit

# Or use git directly with conventional format
git commit -m "feat: add new synchronization feature"

Performance Optimization

1. Large Dataset Handling

For large collections (>1000 items):

# Use sample mode for testing
uv run python scripts/data_2_dasch.py -m sample_data

# Process in smaller batches
# Edit NUMBER_RANDOM_OBJECTS in data_2_dasch.py
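
If you process in batches, a generic chunking pattern like the one below can help. This is a sketch only; omeka2dsp itself controls sample size via NUMBER_RANDOM_OBJECTS rather than through this helper:

from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical usage with a list of item identifiers
items = list(range(2500))
for batch in chunked(items, 500):
    print(f"Processing {len(batch)} items...")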

2. Memory Management

# Monitor memory usage during processing
htop

# For very large datasets, consider running on a server with more RAM

3. Network Optimization

# For slow connections, increase timeout values in scripts
# Edit timeout parameters in API calls
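
With the Python requests library, timeouts are set per call. A sketch of raising them for slow links (the tuple is connect and read seconds; the endpoint is a placeholder):

import requests

# Allow 10 s to connect and up to 300 s to read the response
response = requests.get(
    "https://api.dasch.swiss/health",  # placeholder endpoint
    timeout=(10, 300),
)
print(response.status_code)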

Security Considerations

1. Credential Management

  • Never commit .env file to version control
  • Use secure credential storage for production
  • Rotate API keys regularly

2. File Permissions

# Secure environment file
chmod 600 .env

# Secure log files
chmod 600 *.log

3. Network Security

  • Use HTTPS for all API communications
  • Consider VPN for sensitive data transfers
  • Monitor API access logs

Next Steps

After successful installation:

  1. Read the Configuration Guide for detailed configuration options
  2. Follow the Usage Guide for running your first migration
  3. Review the Architecture Documentation to understand the system
  4. Check the Troubleshooting Guide for common issues

Support

If you encounter issues during installation:

  1. Check the Troubleshooting Guide
  2. Review the logs in data_2_dasch.log
  3. Create an issue on the GitHub repository
  4. Include system information, error messages, and configuration details (without sensitive data)

Installation is now complete! You're ready to configure and run your first data migration.
