classDiagram
    class MigrationOrchestrator {
        +main()
        +process_items()
        +handle_item()
    }
    class OmekaRepository {
        +get_items_from_collection()
        +get_media()
        +extract_property()
    }
    class DSPRepository {
        +login()
        +get_project()
        +get_resource_by_id()
        +create_resource()
    }
    class PayloadBuilder {
        +construct_payload()
        +map_properties()
        +extract_list_values()
    }
    class SynchronizationService {
        +check_values()
        +sync_resource()
        +update_value()
    }
    MigrationOrchestrator --> OmekaRepository
    MigrationOrchestrator --> DSPRepository
    MigrationOrchestrator --> PayloadBuilder
    MigrationOrchestrator --> SynchronizationService
Development Guide
Guide for developers contributing to or extending the omeka2dsp system.
Development Environment Setup
We recommend using GitHub Codespaces for development.
Fork this repository to your GitHub account.
Click the green <> Code button and select "Codespaces".
Click "Create codespace on main". This provides:
- Python with uv
- Node.js with pnpm
- Pre-configured development environment
- All dependencies pre-installed
Create your development configuration:
# Create development environment file
cp example.env .env.dev
# Edit with your test instance credentials
Create a development branch:
git checkout -b feature/your-feature-name
Prerequisites
- uv (Python manager)
- pnpm
- Node.js (latest LTS)
- Git
- Access to Omeka and DSP test instances
Initial Setup
# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp
# Create development branch
git checkout -b feature/your-feature-name
# Setup Python environment with dev dependencies
uv sync --dev
# Install Node.js development tools
pnpm install
pnpm run prepare
Development Configuration
Create a development environment file:
cp example.env .env.dev
Edit .env.dev with your test instance credentials:
# Development configuration
OMEKA_API_URL=https://test-omeka.example.com/api/
KEY_IDENTITY=test_key_identity
KEY_CREDENTIAL=test_key_credential
ITEM_SET_ID=test_collection_id
# DSP test instance
PROJECT_SHORT_CODE=TEST
API_HOST=https://test-api.dasch.swiss
INGEST_HOST=https://test-ingest.dasch.swiss
DSP_USER=test@example.com
DSP_PWD=test_password
# Development settings
DEBUG_MODE=true
LOG_LEVEL=DEBUG
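The settings above could be checked before a run along the following lines. This is a minimal sketch, not the project's actual loader: `parse_env_text`, `missing_settings`, and the `REQUIRED` tuple are hypothetical names, and the parser only understands plain `KEY=value` lines and whole-line `#` comments.

```python
from typing import Dict, List

# Hypothetical list of required settings, copied from the example file above.
REQUIRED = (
    "OMEKA_API_URL", "KEY_IDENTITY", "KEY_CREDENTIAL", "ITEM_SET_ID",
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
)


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(settings: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty, in REQUIRED order."""
    return [key for key in REQUIRED if not settings.get(key)]


# Deliberately incomplete sample to show what the check reports.
sample = """
# Development configuration
OMEKA_API_URL=https://test-omeka.example.com/api/
KEY_IDENTITY=test_key_identity
DEBUG_MODE=true
"""
settings = parse_env_text(sample)
print("Missing settings:", missing_settings(settings))
```

Running a check like this first makes a forgotten credential show up as a short list of names rather than as a failed request halfway through a migration.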
Code Architecture
Module Structure
scripts/
├── data_2_dasch.py              # Main migration orchestrator
├── process_data_from_omeka.py   # Omeka API interface
├── api_get_project.py           # DSP project utilities
├── api_get_lists.py             # DSP lists utilities
└── api_get_lists_detailed.py    # Detailed list utilities
Also refer to the documentation on the Pipeline Architecture.
Key Design Patterns
- Repository Pattern: Data access abstraction
- Builder Pattern: Payload construction
- Strategy Pattern: Different resource types
- Command Pattern: Update operations
Core Components
Data Flow
sequenceDiagram
    participant Main as main()
    participant Omeka as OmekaRepository
    participant Builder as PayloadBuilder
    participant DSP as DSPRepository
    participant Sync as SyncService

    Main->>Omeka: get_items_from_collection()
    Omeka->>Main: items[]

    loop For each item
        Main->>Omeka: get_media(item_id)
        Omeka->>Main: media[]
        Main->>DSP: get_resource_by_id()
        DSP->>Main: existing_resource or None

        alt Resource doesn't exist
            Main->>Builder: construct_payload()
            Builder->>Main: payload
            Main->>DSP: create_resource(payload)
        else Resource exists
            Main->>Sync: check_values()
            Sync->>Main: changes[]
            Main->>Sync: apply_updates()
        end
    end
Also refer to the documentation on the Pipeline Workflow.
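The create-or-update loop in the data flow can be sketched with in-memory stand-ins. Everything below is illustrative: `FakeOmekaRepository`, `FakeDSPRepository`, `build_payload`, and `diff_values` are hypothetical simplifications, and their signatures do not match the real functions in `scripts/`. Unlike the diagram, the sketch builds the payload up front so that it can also be used for the comparison.

```python
from typing import Dict, List, Optional


class FakeOmekaRepository:
    """Stand-in for the Omeka side; returns canned items instead of calling an API."""

    def get_items_from_collection(self) -> List[Dict]:
        return [
            {"o:id": 1, "title": "New item"},
            {"o:id": 2, "title": "Changed title"},
        ]

    def get_media(self, item_id: int) -> List[Dict]:
        return []


class FakeDSPRepository:
    """Stand-in for the DSP side; keeps resources in a dict keyed by item id."""

    def __init__(self) -> None:
        self.resources: Dict[int, Dict] = {
            2: {"id": 2, "title": "Old title", "media": []},
        }

    def get_resource_by_id(self, item_id: int) -> Optional[Dict]:
        return self.resources.get(item_id)

    def create_resource(self, payload: Dict) -> None:
        self.resources[payload["id"]] = payload


def build_payload(item: Dict, media: List[Dict]) -> Dict:
    """Hypothetical, heavily simplified payload builder."""
    return {"id": item["o:id"], "title": item["title"], "media": media}


def diff_values(payload: Dict, existing: Dict) -> List[str]:
    """Return the payload keys whose values differ from the existing resource."""
    return [key for key in payload if payload[key] != existing.get(key)]


def process_items(omeka: FakeOmekaRepository, dsp: FakeDSPRepository) -> Dict[str, List[int]]:
    """Create missing resources and update changed ones; report what happened."""
    summary: Dict[str, List[int]] = {"created": [], "updated": []}
    for item in omeka.get_items_from_collection():
        item_id = item["o:id"]
        payload = build_payload(item, omeka.get_media(item_id))
        existing = dsp.get_resource_by_id(item_id)
        if existing is None:
            dsp.create_resource(payload)
            summary["created"].append(item_id)
            continue
        changes = diff_values(payload, existing)
        for key in changes:
            existing[key] = payload[key]
        if changes:
            summary["updated"].append(item_id)
    return summary


dsp = FakeDSPRepository()
summary = process_items(FakeOmekaRepository(), dsp)
print(summary)
```

Because updates are driven by a diff, running the loop a second time against the same data makes no further changes.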
Contributing Guidelines
Git Workflow
Create Feature Branch
git checkout -b feature/description-of-feature
Make Changes
# Make your changes
git add .
git commit -m "feat: add new synchronization feature"
Update Documentation
# Update relevant documentation
# Add tests for new features
Submit Pull Request
git push origin feature/description-of-feature
# Create PR on GitHub
Commit Message Convention
Follow Conventional Commits:
# Types
feat: new feature
fix: bug fix
docs: documentation changes
style: code style changes
refactor: code refactoring
test: test additions/changes
chore: maintenance tasks
# Examples
feat: add support for video files
fix: handle missing media files gracefully
docs: update API documentation
refactor: extract payload building to separate module
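A commit subject can be checked against the types listed above with a small script. This is a sketch under assumptions rather than a tool the repository ships: `is_conventional` is a hypothetical helper, and the pattern accepts an optional `(scope)` and `!` marker, as the Conventional Commits format allows, but restricts the type to the seven listed here.

```python
import re

# Types taken from the list above.
TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional (scope), optional "!", then ": " and a non-empty description.
SUBJECT_PATTERN = re.compile(
    r"^(?:%s)(?:\([\w-]+\))?!?: \S.*$" % "|".join(TYPES)
)


def is_conventional(subject: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    return SUBJECT_PATTERN.match(subject) is not None


for subject in (
    "feat: add support for video files",
    "fix(sync): handle missing media files gracefully",
    "updated some stuff",
):
    print(f"{subject!r}: {is_conventional(subject)}")
```

A check like this could be wired into a commit-msg hook, though whether the project's existing hooks already do so is not stated here.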
Code Review Checklist
Debugging
Debugging Configuration
# Enable debug logging
import logging

logging.basicConfig(level=logging.DEBUG)

# Add debug prints
def debug_item_processing(item):
    print(f"Processing item: {item.get('o:id')}")
    print(f"Title: {extract_property(item.get('dcterms:title', []), 1)}")
    print(f"Identifier: {extract_property(item.get('dcterms:identifier', []), 10)}")
Common Debugging Techniques
Inspect API Responses
import json

import requests

# Pretty print API responses (url is the endpoint you want to inspect)
response = requests.get(url)
print(json.dumps(response.json(), indent=2))
Test Individual Functions
# Test payload construction
test_item = {...}  # Sample Omeka item
payload = construct_payload(test_item, "sgb_OBJECT", "project_iri", [], "", "")
print(json.dumps(payload, indent=2))
Validate Data Transformations
# Check property extraction
props = item.get("dcterms:subject", [])
subjects = extract_combined_values(props)
print(f"Extracted subjects: {subjects}")
Debug Mode
Add debug mode to main script:
DEBUG_MODE = os.getenv('DEBUG_MODE', 'false').lower() == 'true'

if DEBUG_MODE:
    # Enable verbose logging
    logging.getLogger().setLevel(logging.DEBUG)
    # Add debug breakpoints
    import pdb; pdb.set_trace()
    # Save intermediate data
    with open('debug_payload.json', 'w') as f:
        json.dump(payload, f, indent=2)
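The "save intermediate data" step can be wrapped in a helper so that it does nothing unless debug mode is on. This is one possible shape, not the script's actual code: `debug_enabled` and `dump_debug_payload` are hypothetical names, and the sample payload is invented.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


def debug_enabled() -> bool:
    """Read DEBUG_MODE from the environment on each call."""
    return os.getenv("DEBUG_MODE", "false").lower() == "true"


def dump_debug_payload(
    payload: Any, directory: str, name: str = "debug_payload.json"
) -> Optional[Path]:
    """Write payload as indented JSON when debug mode is on; otherwise do nothing."""
    if not debug_enabled():
        return None
    path = Path(directory) / name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
    return path


# Demonstration in a temporary directory so no file is left behind.
os.environ["DEBUG_MODE"] = "true"
with tempfile.TemporaryDirectory() as tmp:
    written = dump_debug_payload({"o:id": 42}, tmp)
    restored = json.loads(written.read_text(encoding="utf-8"))
    print(written.name, restored)
```

Keeping the check inside the helper means call sites stay the same whether or not debugging is enabled.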
Code Style
Python Style Guidelines
Follow PEP 8 with these specific guidelines:
# Imports
import logging
import os
from typing import Dict, List, Optional

import requests

# Constants
MAX_RETRIES = 3
API_TIMEOUT = 30

# Function definitions
def extract_property(props: List[Dict], prop_id: int, as_uri: bool = False) -> str:
    """Extract property value from Omeka property array.

    Args:
        props: List of property dictionaries
        prop_id: Numerical property ID to find
        as_uri: Return as formatted URI link

    Returns:
        Property value as string or empty string if not found
    """
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""

# Error handling
try:
    result = api_call()
except requests.RequestException as e:
    logging.error(f"API call failed: {e}")
    raise
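To see how `extract_property` behaves, it can be run against a hand-written item. The function is repeated here so the snippet is self-contained; the sample item is invented for illustration and only mimics the keys the function reads (`property_id`, `@value`, `o:label`, `@id`).

```python
from typing import Dict, List


def extract_property(props: List[Dict], prop_id: int, as_uri: bool = False) -> str:
    """Return the first matching property value, or "" if nothing matches."""
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""


# Invented sample; real Omeka items carry many more keys.
item = {
    "o:id": 42,
    "dcterms:title": [{"property_id": 1, "@value": "Sample title"}],
    "dcterms:identifier": [
        {
            "property_id": 10,
            "o:label": "Catalogue entry",
            "@id": "https://example.org/id/42",
        }
    ],
}

title = extract_property(item.get("dcterms:title", []), 1)
link = extract_property(item.get("dcterms:identifier", []), 10, as_uri=True)
missing = extract_property(item.get("dcterms:subject", []), 3)

print(title)          # plain value
print(link)           # Markdown-style link built from o:label and @id
print(repr(missing))  # empty string when the property is absent
```

Returning an empty string instead of raising keeps call sites simple, at the cost of not distinguishing a missing property from an empty one.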
Code Formatting Tools
# Development dependencies are managed via uv
uv sync --dev # Installs black, flake8, isort if configured
# Format code
uv run black scripts/
uv run isort scripts/
# Check style
uv run flake8 scripts/
# Pre-commit hooks (configured via pnpm)
pnpm run pre-commit -- run --all-files
Documentation Standards
def complex_function(param1: str, param2: Optional[Dict] = None) -> List[str]:
    """Brief description of function purpose.

    Detailed description if needed. Explain complex logic,
    assumptions, and important behaviors.

    Args:
        param1: Description of first parameter
        param2: Description of optional parameter with default behavior

    Returns:
        Description of return value and its structure

    Raises:
        ValueError: When param1 is invalid
        RequestException: When API calls fail

    Example:
        >>> result = complex_function("test", {"key": "value"})
        >>> print(result)
        ['processed', 'values']
    """
    # Implementation here
    pass
Release Process
Version Management
# Update version in setup files
# Follow semantic versioning: MAJOR.MINOR.PATCH
# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0
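The MAJOR.MINOR.PATCH rule can be made concrete with a small helper that computes the next version. This is an illustrative sketch only: `bump_version` is a hypothetical function, not part of the repository, and it ignores pre-release and build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semantic version for a 'major', 'minor', or 'patch' bump."""
    # Accept tag-style input such as "v1.2.0" as well as "1.2.0".
    if version.startswith("v"):
        version = version[1:]
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"Unknown version part: {part!r}")


for part in ("patch", "minor", "major"):
    print(part, "->", bump_version("v1.2.0", part))
```

The result can then be used in the tag command above, for example as `v` followed by the returned string.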
Release Checklist
Deployment
# Production deployment checklist
- [ ] Backup current production data
- [ ] Deploy to staging environment
- [ ] Run integration tests
- [ ] Monitor staging for 24 hours
- [ ] Deploy to production
- [ ] Monitor production deployment