classDiagram
class MigrationOrchestrator {
+main()
+process_items()
+handle_item()
}
class OmekaRepository {
+get_items_from_collection()
+get_media()
+extract_property()
}
class DSPRepository {
+login()
+get_project()
+get_resource_by_id()
+create_resource()
}
class PayloadBuilder {
+construct_payload()
+map_properties()
+extract_list_values()
}
class SynchronizationService {
+check_values()
+sync_resource()
+update_value()
}
MigrationOrchestrator --> OmekaRepository
MigrationOrchestrator --> DSPRepository
MigrationOrchestrator --> PayloadBuilder
MigrationOrchestrator --> SynchronizationService
Development Guide
Guide for developers contributing to or extending the omeka2dsp system.
Development Environment Setup
We recommend using GitHub Codespaces for development.
Fork this repository to your GitHub account.
Click the green "<> Code" button and select "Codespaces".
Click "Create codespace on main". This provides:
- Python with uv
- Node.js with pnpm
- A pre-configured development environment
- All dependencies pre-installed
Create your development configuration:
# Create development environment file
cp example.env .env.dev
# Edit with your test instance credentials
Create a development branch:
git checkout -b feature/your-feature-name
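Before you run anything against the test instances, it helps to confirm that .env.dev defines every variable used in the example configuration under Development Configuration. The following is a minimal stdlib-only sketch. It assumes simple KEY=VALUE lines. The helper names (parse_env_text, load_env_file, missing_vars, REQUIRED_VARS) are hypothetical. This guide does not say how the scripts themselves load the file.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical list: variable names copied from the example .env.dev in this guide.
REQUIRED_VARS = [
    "OMEKA_API_URL", "KEY_IDENTITY", "KEY_CREDENTIAL", "ITEM_SET_ID",
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
]


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line # comments.

    Surrounding quotes are removed; inline comments after a value are not.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path: str) -> Dict[str, str]:
    """Read and parse an env file such as .env.dev."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_vars(values: Dict[str, str]) -> List[str]:
    """Return required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]
```

For example, `missing_vars(load_env_file(".env.dev"))` returns a list of keys. If that list is not empty, fill in those keys before starting a migration run.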
Prerequisites
- uv (Python package and project manager)
- pnpm
- Node.js (latest LTS)
- Git
- Access to Omeka and DSP test instances
Initial Setup
# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp
# Create development branch
git checkout -b feature/your-feature-name
# Setup Python environment with dev dependencies
uv sync --dev
# Install Node.js development tools
pnpm install
pnpm run prepare
Development Configuration
Create a development environment file:
cp example.env .env.dev
Edit .env.dev with your test instance credentials:
# Development configuration
OMEKA_API_URL=https://test-omeka.example.com/api/
KEY_IDENTITY=test_key_identity
KEY_CREDENTIAL=test_key_credential
ITEM_SET_ID=test_collection_id
# DSP test instance
PROJECT_SHORT_CODE=TEST
API_HOST=https://test-api.dasch.swiss
INGEST_HOST=https://test-ingest.dasch.swiss
DSP_USER=test@example.com
DSP_PWD=test_password
# Development settings
DEBUG_MODE=true
LOG_LEVEL=DEBUG
Code Architecture
Module Structure
scripts/
├── data_2_dasch.py              # Main migration orchestrator
├── process_data_from_omeka.py   # Omeka API interface
├── api_get_project.py           # DSP project utilities
├── api_get_lists.py             # DSP lists utilities
└── api_get_lists_detailed.py    # Detailed list utilities
Also refer to the documentation on the Pipeline Architecture.
Key Design Patterns
- Repository Pattern: Data access abstraction
- Builder Pattern: Payload construction
- Strategy Pattern: Different resource types
- Command Pattern: Update operations
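The Repository pattern means that calling code depends on an interface, not on HTTP details. That lets tests swap in canned data. The sketch below is a minimal version built on typing.Protocol. The method names get_items_from_collection() and get_media() come from the class diagram. Their signatures, and the names ItemSource, InMemoryItemSource and count_media, are assumptions for illustration.

```python
from typing import Dict, List, Protocol


class ItemSource(Protocol):
    """Hypothetical read-only interface for the Omeka side of the pipeline."""

    def get_items_from_collection(self, collection_id: str) -> List[Dict]: ...

    def get_media(self, item_id: int) -> List[Dict]: ...


class InMemoryItemSource:
    """Test double that serves canned Omeka-style items without network access."""

    def __init__(self, items: Dict[str, List[Dict]], media: Dict[int, List[Dict]]):
        self._items = items
        self._media = media

    def get_items_from_collection(self, collection_id: str) -> List[Dict]:
        return list(self._items.get(collection_id, []))

    def get_media(self, item_id: int) -> List[Dict]:
        return list(self._media.get(item_id, []))


def count_media(source: ItemSource, collection_id: str) -> int:
    """Caller code only sees the interface, so any ItemSource works here."""
    items = source.get_items_from_collection(collection_id)
    return sum(len(source.get_media(item["o:id"])) for item in items)
```

A real implementation would issue the Omeka API requests inside the same two methods. count_media() would not change.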
Core Components
Data Flow
sequenceDiagram
participant Main as main()
participant Omeka as OmekaRepository
participant Builder as PayloadBuilder
participant DSP as DSPRepository
participant Sync as SyncService
Main->>Omeka: get_items_from_collection()
Omeka->>Main: items[]
loop For each item
Main->>Omeka: get_media(item_id)
Omeka->>Main: media[]
Main->>DSP: get_resource_by_id()
DSP->>Main: existing_resource or None
alt Resource doesn't exist
Main->>Builder: construct_payload()
Builder->>Main: payload
Main->>DSP: create_resource(payload)
else Resource exists
Main->>Sync: check_values()
Sync->>Main: changes[]
Main->>Sync: apply_updates()
end
end
Also refer to the documentation on the Pipeline Workflow.
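The create-or-sync loop in the sequence diagram can be expressed as a single function. The sketch injects each collaborator as a plain callable so that it can run without Omeka or DSP. The step names follow the diagram. The callable signatures, the process_items() parameter list and the returned counts are assumptions. They are not the actual interface of data_2_dasch.py.

```python
from typing import Callable, Dict, List, Optional


def process_items(
    items: List[Dict],
    get_media: Callable[[int], List[Dict]],
    get_resource_by_id: Callable[[Dict], Optional[Dict]],
    construct_payload: Callable[[Dict, List[Dict]], Dict],
    create_resource: Callable[[Dict], None],
    check_values: Callable[[Dict, Dict], List[Dict]],
    apply_updates: Callable[[Dict, List[Dict]], None],
) -> Dict[str, int]:
    """Create missing DSP resources and sync existing ones; count each outcome."""
    stats = {"created": 0, "updated": 0, "unchanged": 0}
    for item in items:
        media = get_media(item["o:id"])
        existing = get_resource_by_id(item)
        if existing is None:
            # Resource doesn't exist yet: build the full payload and create it.
            create_resource(construct_payload(item, media))
            stats["created"] += 1
            continue
        # Resource exists: compute the differing values and apply only those.
        changes = check_values(item, existing)
        if changes:
            apply_updates(existing, changes)
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1
    return stats
```

check_values() returns a list of changes that are applied later. This loosely mirrors the Command pattern listed above, so the changes can be logged or inspected before anything is written to DSP.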
Contributing Guidelines
Git Workflow
Create Feature Branch
git checkout -b feature/description-of-feature
Make Changes
# Make your changes
git add .
git commit -m "feat: add new synchronization feature"
Update Documentation
# Update relevant documentation
# Add tests for new features
Submit Pull Request
git push origin feature/description-of-feature
# Create PR on GitHub
Commit Message Convention
Follow Conventional Commits:
# Types
feat: new feature
fix: bug fix
docs: documentation changes
style: code style changes
refactor: code refactoring
test: test additions/changes
chore: maintenance tasks
# Examples
feat: add support for video files
fix: handle missing media files gracefully
docs: update API documentation
refactor: extract payload building to separate module
Code Review Checklist
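Reviewers can check commit messages against the Commit Message Convention automatically. The sketch below validates the header line only. It accepts just the seven types listed in that convention, plus the optional "(scope)" and "!" markers from the Conventional Commits specification. The names COMMIT_TYPES, HEADER_RE and is_conventional are hypothetical and are not part of the project's existing hooks.

```python
import re

# Types from the convention in this guide; scope and "!" (breaking change) are optional.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of the commit message follows the convention."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_RE.match(header) is not None
```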
Debugging
Debugging Configuration
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Add debug prints
# extract_property() is the helper shown under Code Style
def debug_item_processing(item):
    print(f"Processing item: {item.get('o:id')}")
    print(f"Title: {extract_property(item.get('dcterms:title', []), 1)}")
    print(f"Identifier: {extract_property(item.get('dcterms:identifier', []), 10)}")
Common Debugging Techniques
Inspect API Responses
import json
import requests
# Pretty print API responses (url is the endpoint you are inspecting)
response = requests.get(url, timeout=30)
print(json.dumps(response.json(), indent=2))
Test Individual Functions
# Test payload construction
test_item = {...}  # Sample Omeka item
payload = construct_payload(test_item, "sgb_OBJECT", "project_iri", [], "", "")
print(json.dumps(payload, indent=2))
Validate Data Transformations
# Check property extraction
props = item.get("dcterms:subject", [])
subjects = extract_combined_values(props)
print(f"Extracted subjects: {subjects}")
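The manual checks above can be turned into repeatable tests. The sketch below is written in pytest style, on the assumption that pytest is available in your dev environment. So that the block runs on its own, it repeats the extract_property() function from Code Style; in the repository you would import it instead. SAMPLE_ITEM is hypothetical data, shaped like Omeka S JSON-LD property values.

```python
from typing import Dict, List


def extract_property(props: List[Dict], prop_id: int, as_uri: bool = False) -> str:
    # Same logic as the function in the Code Style section.
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""


# Hypothetical sample item; IDs 1 (title) and 10 (identifier) match the debug example.
SAMPLE_ITEM = {
    "o:id": 42,
    "dcterms:title": [{"type": "literal", "property_id": 1, "@value": "Stadtplan"}],
    "dcterms:identifier": [{"type": "literal", "property_id": 10, "@value": "abb123"}],
    "dcterms:source": [
        {"type": "uri", "property_id": 11, "@id": "https://example.org/src", "o:label": "Source"}
    ],
}


def test_extracts_literal_value():
    assert extract_property(SAMPLE_ITEM["dcterms:title"], 1) == "Stadtplan"
    assert extract_property(SAMPLE_ITEM["dcterms:identifier"], 10) == "abb123"


def test_formats_uri_as_markdown_link():
    props = SAMPLE_ITEM["dcterms:source"]
    assert extract_property(props, 11, as_uri=True) == "[Source](https://example.org/src)"


def test_missing_property_returns_empty_string():
    assert extract_property(SAMPLE_ITEM.get("dcterms:subject", []), 3) == ""
```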
Debug Mode
Add debug mode to main script:
import json
import logging
import os
DEBUG_MODE = os.getenv('DEBUG_MODE', 'false').lower() == 'true'
if DEBUG_MODE:
    # Enable verbose logging
    logging.getLogger().setLevel(logging.DEBUG)
    # Pause in the debugger (breakpoint() calls pdb.set_trace() by default)
    breakpoint()
    # Save intermediate data (payload is the dict returned by construct_payload())
    with open('debug_payload.json', 'w') as f:
        json.dump(payload, f, indent=2)
Code Style
Python Style Guidelines
Follow PEP 8 with these specific guidelines:
# Imports (standard library first, then third-party)
import logging
import os
from typing import Dict, List, Optional
import requests
# Constants
MAX_RETRIES = 3
API_TIMEOUT = 30
# Function definitions
def extract_property(props: List[Dict], prop_id: int, as_uri: bool = False) -> str:
    """Extract property value from Omeka property array.

    Args:
        props: List of property dictionaries
        prop_id: Numerical property ID to find
        as_uri: Return a Markdown link "[label](uri)" instead of the literal value

    Returns:
        Property value as string or empty string if not found
    """
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""
# Error handling
try:
    # api_call() stands for any function that issues an HTTP request
    result = api_call()
except requests.RequestException as e:
    logging.error("API call failed: %s", e)
    raise
Code Formatting Tools
# Development dependencies are managed via uv
uv sync --dev # Installs black, flake8, isort if configured
# Format code
uv run black scripts/
uv run isort scripts/
# Check style
uv run flake8 scripts/
# Pre-commit hooks (configured via pnpm)
pnpm run pre-commit -- run --all-files
Documentation Standards
def complex_function(param1: str, param2: Optional[Dict] = None) -> List[str]:
    """Brief description of function purpose.

    Detailed description if needed. Explain complex logic,
    assumptions, and important behaviors.

    Args:
        param1: Description of first parameter
        param2: Description of optional parameter with default behavior

    Returns:
        Description of return value and its structure

    Raises:
        ValueError: When param1 is invalid
        requests.RequestException: When API calls fail

    Example:
        >>> result = complex_function("test", {"key": "value"})
        >>> print(result)
        ['processed', 'values']
    """
    # Implementation here
    pass
Release Process
Version Management
# Update version in setup files
# Follow semantic versioning: MAJOR.MINOR.PATCH
# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0
Release Checklist
Deployment
# Production deployment checklist
- [ ] Backup current production data
- [ ] Deploy to staging environment
- [ ] Run integration tests
- [ ] Monitor staging for 24 hours
- [ ] Deploy to production
- [ ] Monitor production deployment