Development Guide

Modified: August 29, 2025

Guide for developers contributing to or extending the omeka2dsp system.

Development Environment Setup

We recommend using GitHub Codespaces for development.

  1. Fork this repository to your GitHub account.

  2. Click the green <> Code button and select “Codespaces”.

  3. Click “Create codespace on main”. This provides:

    • ✅ Python with uv
    • ✅ Node.js with pnpm
    • ✅ Pre-configured development environment
    • ✅ All dependencies pre-installed
  4. Create your development configuration:

    # Create development environment file
    cp example.env .env.dev
    # Edit with your test instance credentials
  5. Create a development branch:

    git checkout -b feature/your-feature-name

Prerequisites

Initial Setup

# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp

# Create development branch
git checkout -b feature/your-feature-name

# Setup Python environment with dev dependencies
uv sync --dev

# Install Node.js development tools
pnpm install
pnpm run prepare

Development Configuration

Create a development environment file:

cp example.env .env.dev

Edit .env.dev with test instance credentials:

# Development configuration
OMEKA_API_URL=https://test-omeka.example.com/api/
KEY_IDENTITY=test_key_identity
KEY_CREDENTIAL=test_key_credential
ITEM_SET_ID=test_collection_id

# DSP test instance
PROJECT_SHORT_CODE=TEST
API_HOST=https://test-api.dasch.swiss
INGEST_HOST=https://test-ingest.dasch.swiss
DSP_USER=test@example.com
DSP_PWD=test_password

# Development settings
DEBUG_MODE=true
LOG_LEVEL=DEBUG
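Assuming the scripts read these settings from environment variables (as the `os.getenv('DEBUG_MODE', ...)` call under Debug Mode suggests), a quick way to catch a missing credential before a run is to load the file and check for empty keys. This is a minimal stdlib-only sketch for a plain `KEY=VALUE` file; `load_env_file`, `missing_keys` and `REQUIRED_KEYS` are hypothetical names, and the project may already use a library such as python-dotenv instead.

```python
import os
from pathlib import Path

# Hypothetical list of keys the migration needs; adjust to the actual scripts.
REQUIRED_KEYS = [
    "OMEKA_API_URL", "KEY_IDENTITY", "KEY_CREDENTIAL", "ITEM_SET_ID",
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
]


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Parse a simple KEY=VALUE file and export the values to os.environ.

    Blank lines, comment lines and lines without '=' are skipped.
    Surrounding quotes are stripped from values. Variables already set
    in the environment are kept unless override=True.
    """
    loaded: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


def missing_keys(keys: list[str] = REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are unset or empty."""
    return [k for k in keys if not os.environ.get(k)]
```

Usage: call `load_env_file(".env.dev")` at startup and abort if `missing_keys()` returns a non-empty list.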

Code Architecture

Module Structure

scripts/
├── data_2_dasch.py              # Main migration orchestrator
├── process_data_from_omeka.py   # Omeka API interface
├── api_get_project.py           # DSP project utilities
├── api_get_lists.py             # DSP lists utilities
└── api_get_lists_detailed.py    # Detailed list utilities

Also refer to the documentation on the Pipeline Architecture.

Key Design Patterns

  1. Repository Pattern: Data access abstraction
  2. Builder Pattern: Payload construction
  3. Strategy Pattern: Different resource types
  4. Command Pattern: Update operations
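To illustrate how the Strategy pattern can separate the handling of different resource types, here is a minimal sketch. It dispatches on the Omeka S `@type` value (`o:Item`, `o:Media`); the strategy classes, the `sgb_MEDIA` class name and the payload keys are hypothetical. Only `sgb_OBJECT` appears elsewhere in this guide, and the real payload structure is defined by `construct_payload()` in the scripts.

```python
from dataclasses import dataclass
from typing import Protocol


class PayloadStrategy(Protocol):
    """Builds a DSP payload for one kind of Omeka resource."""

    def build(self, item: dict) -> dict: ...


@dataclass
class ObjectStrategy:
    resource_class: str = "sgb_OBJECT"

    def build(self, item: dict) -> dict:
        # Hypothetical minimal payload; real payloads carry many more properties.
        return {"@type": self.resource_class, "label": item.get("o:title", "")}


@dataclass
class MediaStrategy:
    resource_class: str = "sgb_MEDIA"  # hypothetical class name

    def build(self, item: dict) -> dict:
        return {
            "@type": self.resource_class,
            "label": item.get("o:title", ""),
            "filename": item.get("o:source", ""),
        }


# One strategy per Omeka resource type; adding a type means adding an entry.
STRATEGIES: dict[str, PayloadStrategy] = {
    "o:Item": ObjectStrategy(),
    "o:Media": MediaStrategy(),
}


def select_strategy(item: dict, strategies: dict[str, PayloadStrategy]) -> PayloadStrategy:
    """Pick the strategy registered for the item's @type."""
    kind = item.get("@type")
    if kind not in strategies:
        raise ValueError(f"No payload strategy for {kind!r}")
    return strategies[kind]
```

The orchestrator then only calls `select_strategy(item, STRATEGIES).build(item)` and never branches on resource types itself.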

Core Components

classDiagram
    class MigrationOrchestrator {
        +main()
        +process_items()
        +handle_item()
    }
    
    class OmekaRepository {
        +get_items_from_collection()
        +get_media()
        +extract_property()
    }
    
    class DSPRepository {
        +login()
        +get_project()
        +get_resource_by_id()
        +create_resource()
    }
    
    class PayloadBuilder {
        +construct_payload()
        +map_properties()
        +extract_list_values()
    }
    
    class SynchronizationService {
        +check_values()
        +sync_resource()
        +update_value()
    }
    
    MigrationOrchestrator --> OmekaRepository
    MigrationOrchestrator --> DSPRepository
    MigrationOrchestrator --> PayloadBuilder
    MigrationOrchestrator --> SynchronizationService
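The `OmekaRepository` in the diagram can be sketched as a thin class over the Omeka S REST API. This sketch assumes the standard Omeka S query parameters (`item_set_id`, `item_id`, `page`, `per_page`, `key_identity`, `key_credential`). The injected `fetch_json` callable is a hypothetical seam that keeps the class testable without network access, and the actual code in `process_data_from_omeka.py` may be structured differently.

```python
import json
from typing import Any, Callable
from urllib.parse import urlencode
from urllib.request import urlopen

FetchJson = Callable[[str], Any]


def http_fetch_json(url: str, timeout: int = 30) -> Any:
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


class OmekaRepository:
    """Read-only access to an Omeka S instance (Repository pattern)."""

    def __init__(self, api_url: str, key_identity: str, key_credential: str,
                 fetch_json: FetchJson = http_fetch_json, per_page: int = 100):
        self.api_url = api_url.rstrip("/") + "/"
        self.auth = {"key_identity": key_identity, "key_credential": key_credential}
        self.fetch_json = fetch_json
        self.per_page = per_page

    def _get_all(self, resource: str, **params: Any) -> list[dict]:
        """Request successive pages until an empty page comes back.

        Stopping on an empty page costs one extra request but does not
        depend on the server honouring the requested page size.
        """
        results: list[dict] = []
        page = 1
        while True:
            query = urlencode({**params, **self.auth, "page": page, "per_page": self.per_page})
            batch = self.fetch_json(f"{self.api_url}{resource}?{query}")
            if not batch:
                return results
            results.extend(batch)
            page += 1

    def get_items_from_collection(self, item_set_id: str) -> list[dict]:
        return self._get_all("items", item_set_id=item_set_id)

    def get_media(self, item_id: int) -> list[dict]:
        return self._get_all("media", item_id=item_id)
```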

Data Flow

sequenceDiagram
    participant Main as main()
    participant Omeka as OmekaRepository
    participant Builder as PayloadBuilder
    participant DSP as DSPRepository
    participant Sync as SyncService
    
    Main->>Omeka: get_items_from_collection()
    Omeka->>Main: items[]
    
    loop For each item
        Main->>Omeka: get_media(item_id)
        Omeka->>Main: media[]
        
        Main->>DSP: get_resource_by_id()
        DSP->>Main: existing_resource or None
        
        alt Resource doesn't exist
            Main->>Builder: construct_payload()
            Builder->>Main: payload
            Main->>DSP: create_resource(payload)
        else Resource exists
            Main->>Sync: check_values()
            Sync->>Main: changes[]
            Main->>Sync: apply_updates()
        end
    end

Also refer to the documentation on the Pipeline Workflow.
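The create-or-update branch of the sequence diagram can be sketched with each component passed in as a plain object. The method names follow the diagrams, but their arguments, the use of the Omeka `o:id` as the DSP lookup key, and the returned counts are assumptions for illustration, not the signatures in `data_2_dasch.py`.

```python
from typing import Any, Optional, Protocol


class Omeka(Protocol):
    def get_items_from_collection(self, item_set_id: str) -> list[dict]: ...
    def get_media(self, item_id: int) -> list[dict]: ...


class DSP(Protocol):
    def get_resource_by_id(self, identifier: str) -> Optional[dict]: ...
    def create_resource(self, payload: dict) -> None: ...


class Builder(Protocol):
    def construct_payload(self, item: dict, media: list[dict]) -> dict: ...


class Sync(Protocol):
    def check_values(self, item: dict, resource: dict) -> list[Any]: ...
    def apply_updates(self, resource: dict, changes: list[Any]) -> None: ...


def process_items(item_set_id: str, omeka: Omeka, dsp: DSP,
                  builder: Builder, sync: Sync) -> dict[str, int]:
    """Create missing DSP resources, update changed ones, and count outcomes."""
    stats = {"created": 0, "updated": 0, "unchanged": 0}
    for item in omeka.get_items_from_collection(item_set_id):
        media = omeka.get_media(item["o:id"])
        # Hypothetical lookup key: the Omeka item ID as a string.
        existing = dsp.get_resource_by_id(str(item["o:id"]))
        if existing is None:
            dsp.create_resource(builder.construct_payload(item, media))
            stats["created"] += 1
            continue
        changes = sync.check_values(item, existing)
        if changes:
            sync.apply_updates(existing, changes)
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1
    return stats
```

Because the loop depends only on these small interfaces, it can be exercised with in-memory fakes instead of live Omeka and DSP instances.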

Contributing Guidelines

Git Workflow

  1. Create Feature Branch

    git checkout -b feature/description-of-feature
  2. Make Changes

    # Make your changes
    git add .
    git commit -m "feat: add new synchronization feature"
  3. Update Documentation

    # Update relevant documentation
    # Add tests for new features
  4. Submit Pull Request

    git push origin feature/description-of-feature
    # Create PR on GitHub

Commit Message Convention

Follow Conventional Commits:

# Types
feat: new feature
fix: bug fix
docs: documentation changes
style: code style changes
refactor: code refactoring
test: test additions/changes
chore: maintenance tasks

# Examples
feat: add support for video files
fix: handle missing media files gracefully
docs: update API documentation
refactor: extract payload building to separate module
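A hook or CI step could check commit headers against the types listed above. A minimal sketch, assuming the optional `(scope)` and `!` breaking-change marker from the Conventional Commits specification are allowed; `is_conventional` is a hypothetical helper, not part of the repository's tooling.

```python
import re

# Types accepted by this guide's convention.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>[optional (scope)][optional !]: <description>
HEADER_RE = re.compile(
    rf"^(?:{'|'.join(COMMIT_TYPES)})(?:\([\w.\-/ ]+\))?!?: \S.*$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line (the header) of a commit message."""
    header = message.splitlines()[0] if message else ""
    return bool(HEADER_RE.match(header))
```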

Code Review Checklist

Debugging

Debugging Configuration

# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

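# Add debug prints (extract_property is shown under Python Style Guidelines;
# in a default Omeka S install, property ID 1 is dcterms:title and 10 is dcterms:identifier)
def debug_item_processing(item):
    print(f"Processing item: {item.get('o:id')}")
    print(f"Title: {extract_property(item.get('dcterms:title', []), 1)}")
    print(f"Identifier: {extract_property(item.get('dcterms:identifier', []), 10)}")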

Common Debugging Techniques

  1. Inspect API Responses

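    import json
    
    import requests
    
    # Pretty print an API response (example URL based on .env.dev)
    url = "https://test-omeka.example.com/api/items/1"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2, ensure_ascii=False))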
  2. Test Individual Functions

    # Test payload construction
    test_item = {...}  # Sample Omeka item
    payload = construct_payload(test_item, "sgb_OBJECT", "project_iri", [], "", "")
    print(json.dumps(payload, indent=2))
  3. Validate Data Transformations

    # Check property extraction
    props = item.get("dcterms:subject", [])
    subjects = extract_combined_values(props)
    print(f"Extracted subjects: {subjects}")
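The `test_item = {...}` placeholder can be replaced by a small fixture. A minimal sketch, assuming the Omeka S value shape (`property_id`, `@value`, `o:label`, `@id`) that `extract_property` reads; the fixture values are invented, `SAMPLE_ITEM` is a hypothetical name, and `extract_property` is copied from the style guidelines so the block runs on its own.

```python
# Copy of extract_property from the style guidelines, so this block runs alone.
def extract_property(props: list[dict], prop_id: int, as_uri: bool = False) -> str:
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""


# Hypothetical fixture shaped like an Omeka S item (values are invented).
SAMPLE_ITEM = {
    "o:id": 123,
    "dcterms:title": [
        {"type": "literal", "property_id": 1, "@value": "Stadtplan von Basel"}
    ],
    "dcterms:identifier": [
        {"type": "literal", "property_id": 10, "@value": "abb12345"}
    ],
    "dcterms:subject": [
        {"type": "uri", "property_id": 3, "@id": "https://example.org/subject/1",
         "o:label": "Stadtentwicklung"}
    ],
}


def test_extract_property() -> None:
    """pytest-style checks; pytest collects functions named test_*."""
    assert extract_property(SAMPLE_ITEM["dcterms:title"], 1) == "Stadtplan von Basel"
    assert extract_property(SAMPLE_ITEM["dcterms:identifier"], 10) == "abb12345"
    # A non-matching property ID yields an empty string rather than an error.
    assert extract_property(SAMPLE_ITEM["dcterms:title"], 10) == ""
    assert (
        extract_property(SAMPLE_ITEM["dcterms:subject"], 3, as_uri=True)
        == "[Stadtentwicklung](https://example.org/subject/1)"
    )
```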

Debug Mode

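Add a debug mode to the main script. The `debug_checkpoint` helper is illustrative; call it wherever a payload has been built:

import json
import logging
import os

DEBUG_MODE = os.getenv("DEBUG_MODE", "false").lower() == "true"

if DEBUG_MODE:
    # Enable verbose logging
    logging.getLogger().setLevel(logging.DEBUG)


def debug_checkpoint(payload: dict) -> None:
    """Pause and save a payload for inspection when DEBUG_MODE is on."""
    if not DEBUG_MODE:
        return
    # Drop into the debugger (equivalent to `import pdb; pdb.set_trace()`)
    breakpoint()
    # Save intermediate data
    with open("debug_payload.json", "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)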
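`.env.dev` also sets `LOG_LEVEL`. A minimal sketch of honouring it, assuming the scripts use the standard `logging` module; `configure_logging` is a hypothetical helper that lets `DEBUG_MODE=true` force `DEBUG` and falls back to `INFO` for unknown level names.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure the root logger from the LOG_LEVEL and DEBUG_MODE variables.

    Returns the numeric level that was applied.
    """
    name = os.getenv("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        # getLevelName returns the string "Level X" for unknown names.
        level = logging.getLevelName(default.upper())
    if os.getenv("DEBUG_MODE", "false").lower() == "true":
        level = logging.DEBUG
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    # basicConfig does nothing if handlers already exist, so set the level too.
    logging.getLogger().setLevel(level)
    return level
```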

Code Style

Python Style Guidelines

Follow PEP 8 with these specific guidelines:

# Imports: standard library first, then third-party packages
import logging
import os
from typing import Dict, List, Optional

import requests

# Constants
MAX_RETRIES = 3
API_TIMEOUT = 30

# Function definitions
def extract_property(props: List[Dict], prop_id: int, as_uri: bool = False) -> str:
    """Extract property value from Omeka property array.
    
    Args:
        props: List of property dictionaries
        prop_id: Numerical property ID to find
        as_uri: Return as formatted URI link
        
    Returns:
        Property value as string or empty string if not found
    """
    for prop in props:
        if prop.get("property_id") == prop_id:
            if as_uri:
                return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
            return prop.get("@value", "")
    return ""

# Error handling: log with context, then re-raise
try:
    result = api_call()  # placeholder for any requests-based call
except requests.RequestException as e:
    logging.error("API call failed: %s", e)
    raise
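The `MAX_RETRIES` constant suggests retrying transient API failures. A minimal sketch of a retry wrapper with exponential backoff; `with_retries` is a hypothetical helper, the exceptions to retry are a parameter (defaulting to the built-in `ConnectionError` and `TimeoutError`), and `sleep` is injectable so tests do not wait.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 3


def with_retries(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_retries: int = MAX_RETRIES,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    Makes at most max_retries attempts, waiting base_delay, 2*base_delay,
    4*base_delay, ... between them. Re-raises the last exception.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_retries:
                logging.error("Giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

With requests, a call might look like `with_retries(lambda: requests.get(url, timeout=API_TIMEOUT), retry_on=(requests.RequestException,))`. Note that `requests.get` does not raise on HTTP error statuses unless the lambda also calls `raise_for_status()`.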

Code Formatting Tools

# Development dependencies are managed via uv
uv sync --dev  # Installs black, flake8, isort if configured

# Format code
uv run black scripts/
uv run isort scripts/

# Check style
uv run flake8 scripts/

# Pre-commit hooks (configured via pnpm)
pnpm run pre-commit -- run --all-files

Documentation Standards

def complex_function(param1: str, param2: Optional[Dict] = None) -> List[str]:
    """Brief description of function purpose.
    
    Detailed description if needed. Explain complex logic,
    assumptions, and important behaviors.
    
    Args:
        param1: Description of first parameter
        param2: Description of optional parameter with default behavior
        
    Returns:
        Description of return value and its structure
        
    Raises:
        ValueError: When param1 is invalid
        RequestException: When API calls fail
        
    Example:
        >>> result = complex_function("test", {"key": "value"})
        >>> print(result)
        ['processed', 'values']
    """
    # Implementation here
    pass
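Example sections written in this `>>>` form can double as tests through the standard-library `doctest` module (`python -m doctest file.py` also works from the shell). A minimal sketch; `normalize_identifier` is a hypothetical function used only to show the mechanics.

```python
import doctest


def normalize_identifier(raw: str) -> str:
    """Normalize an identifier by trimming whitespace and lowercasing it.

    Args:
        raw: Identifier as read from the source record

    Returns:
        The cleaned identifier

    Example:
        >>> normalize_identifier("  ABB12345 ")
        'abb12345'
    """
    return raw.strip().lower()


if __name__ == "__main__":
    # Runs every >>> example in this module; prints nothing when all pass.
    doctest.testmod()
```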

Release Process

Version Management

# Update the version in the project configuration files (e.g. pyproject.toml)
# Follow semantic versioning: MAJOR.MINOR.PATCH

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0
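Choosing the next tag under MAJOR.MINOR.PATCH can be scripted. A minimal sketch; `bump_version` is a hypothetical helper that accepts an optional leading `v` (matching tags such as `v1.2.0`) and does not handle pre-release or build metadata, which semantic versioning also allows.

```python
import re

SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str) -> str:
    """Return the next version for part in {"major", "minor", "patch"}.

    Bumping a part resets every lower part to 0, as semantic versioning requires.
    """
    match = SEMVER_RE.match(version)
    if not match:
        raise ValueError(f"Not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"Unknown part: {part!r}")
    prefix = "v" if version.startswith("v") else ""
    return f"{prefix}{major}.{minor}.{patch}"
```

For example, `bump_version("v1.2.0", "patch")` gives `"v1.2.1"`.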

Release Checklist

Deployment

# Production deployment checklist
- [ ] Backup current production data
- [ ] Deploy to staging environment
- [ ] Run integration tests
- [ ] Monitor staging for 24 hours
- [ ] Deploy to production
- [ ] Monitor production deployment