omeka2dsp

Long-Term Archival Pipeline for Stadt.Geschichte.Basel

Modified: August 29, 2025

This repository contains the pipeline and data model for the long-term preservation of the research data of Stadt.Geschichte.Basel (SGB) on the DaSCH Service Platform (DSP).
It enables the transfer of metadata and media files from the SGB Omeka S instance to the DSP. The pipeline detects changes, updates existing records, and ensures reproducible and open research.


📚 Documentation

Comprehensive documentation is available in the docs/ directory:

📒 Quick Start Guides

⚡ Quick Installation

We recommend using GitHub Codespaces for a reproducible setup.

Local Installation

# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp

# Install dependencies
pnpm install         # Node.js development tools
uv sync             # Python dependencies with uv

# Configure environment
cp example.env .env
# Edit .env with your credentials

# Test installation
uv run python scripts/api_get_project.py
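
Before running the test script, it can help to confirm that `.env` is filled in. The following is a minimal sketch, not part of the repository: the variable names in `REQUIRED_KEYS` are hypothetical placeholders, so replace them with the keys actually listed in `example.env`.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty, in the given order."""
    return [key for key in required if not values.get(key)]


# Hypothetical key names for illustration only; see example.env for the real ones.
REQUIRED_KEYS = ["OMEKA_API_URL", "DSP_API_URL", "DSP_USER", "DSP_PASSWORD"]

sample = (
    "OMEKA_API_URL=https://omeka.example.org/api\n"
    "# credentials\n"
    'DSP_USER="alice"\n'
    "DSP_PASSWORD=\n"
)
print(missing_keys(parse_env_text(sample), REQUIRED_KEYS))
# prints ['DSP_API_URL', 'DSP_PASSWORD']
```

To check a real file, pass `Path(".env").read_text(encoding="utf-8")` to `parse_env_text` instead of the sample string.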

🚀 Quick Usage

# Run sample data migration (recommended first test)
uv run python scripts/data_2_dasch.py -m sample_data

# Run full migration
uv run python scripts/data_2_dasch.py -m all_data

# Run test data migration
uv run python scripts/data_2_dasch.py -m test_data

Processing Modes

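| Mode        | Description              | Use Case               |
|-------------|--------------------------|------------------------|
| all_data    | Process entire collection | Production migrations  |
| sample_data | Process random subset    | Testing and validation |
| test_data   | Process predefined items | Development, debugging |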
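
The three modes differ only in which items they select. The sketch below shows one way such a selection could work; it is not the code in `scripts/data_2_dasch.py`. Only the `-m` flag and the three mode names come from this README. The `--mode` long option, the default value, `sample_size`, `test_ids`, and `seed` are assumptions made for illustration.

```python
import argparse
import random

MODES = ("all_data", "sample_data", "test_data")


def select_items(item_ids, mode, sample_size=5, test_ids=(), seed=None):
    """Return the item IDs to process for the given mode."""
    item_ids = list(item_ids)
    if mode == "all_data":
        return item_ids
    if mode == "sample_data":
        # Random subset; a fixed seed makes a test run repeatable.
        rng = random.Random(seed)
        return rng.sample(item_ids, min(sample_size, len(item_ids)))
    if mode == "test_data":
        # Keep only the predefined IDs, in their original order.
        wanted = set(test_ids)
        return [item_id for item_id in item_ids if item_id in wanted]
    raise ValueError(f"unknown mode: {mode!r}")


def build_parser():
    """Command-line parser; the long option and default are assumptions."""
    parser = argparse.ArgumentParser(description="Select items to migrate")
    parser.add_argument("-m", "--mode", choices=MODES, default="all_data")
    return parser


args = build_parser().parse_args(["-m", "test_data"])
print(select_items(range(1, 11), args.mode, test_ids=[3, 7]))
# prints [3, 7]
```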

🏗️ System Architecture

graph LR
    A[Omeka API] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[DSP Upload]
    D --> E[DSP API]

    F[Configuration] --> B
    F --> C
    F --> D

    style A fill:#e1f5fe
    style E fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#fff3e0
    style D fill:#fff3e0
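
The diagram can be read as three stages that share one configuration object. The sketch below illustrates that flow with in-memory stand-ins instead of real API calls. All function names, the `Config` fields, and the example URLs are hypothetical and do not reflect the repository's actual modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Config:
    """Shared settings passed to every stage (field names are assumptions)."""
    omeka_api_url: str
    dsp_api_url: str


def run_pipeline(
    config: Config,
    extract: Callable[[Config], Iterable[dict]],
    transform: Callable[[Config, dict], dict],
    upload: Callable[[Config, dict], Any],
) -> list:
    """Run extract -> transform -> upload for every source item."""
    return [upload(config, transform(config, item)) for item in extract(config)]


# Stand-in stages so the sketch runs without network access.
def fake_extract(config):
    return [{"id": 1, "title": " Map of Basel "}, {"id": 2, "title": "Letter"}]


def fake_transform(config, item):
    return {"source_id": item["id"], "label": item["title"].strip()}


def fake_upload(config, payload):
    # Placeholder URL pattern, not a real DSP endpoint.
    return f"{config.dsp_api_url}/resources/{payload['source_id']}"


config = Config("https://omeka.example.org/api", "https://dsp.example.org")
print(run_pipeline(config, fake_extract, fake_transform, fake_upload))
```

Passing the stages in as functions keeps each one testable on its own, which is one reason to structure a pipeline this way.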

Features

  • ✅ Automated synchronization: detects and applies only necessary changes
  • ✅ Media file handling: transfers and processes associated files
  • ✅ Data validation: ensures data integrity throughout the process
  • ✅ Error recovery: robust error handling and retry mechanisms
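
Applying "only necessary changes" implies comparing the source record with what is already stored in the target. A minimal sketch of such a comparison follows, assuming each record can be reduced to a mapping from property name to a list of string values. The function name and this simplified record shape are hypothetical and are not taken from the repository.

```python
def diff_values(source: dict[str, list[str]], target: dict[str, list[str]]) -> dict:
    """Return, per property, the values to add to and remove from the target."""
    changes = {}
    for prop in sorted(set(source) | set(target)):
        wanted = set(source.get(prop, []))
        existing = set(target.get(prop, []))
        to_add = sorted(wanted - existing)
        to_remove = sorted(existing - wanted)
        # Properties that already match produce no entry, so no update is sent.
        if to_add or to_remove:
            changes[prop] = {"add": to_add, "remove": to_remove}
    return changes


omeka_record = {"title": ["Market Square"], "date": ["1900"]}
dsp_record = {"title": ["Marketsquare"]}
print(diff_values(omeka_record, dsp_record))
# prints {'date': {'add': ['1900'], 'remove': []}, 'title': {'add': ['Market Square'], 'remove': ['Marketsquare']}}
```

An empty result means the target is already up to date and the record can be skipped.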

📂 Repository Structure

This repository follows the Turing Way advanced structure:

  • assets/ – images, logos, etc.
  • data/ – data files
  • docs/ – documentation of the repository and data
  • project-management/ – project management documents
  • scripts/ – source code (migration scripts, utilities)
  • report.qmd – report describing the analysis of the data

📊 Data Model

The omeka2dsp system transforms data from Omeka’s metadata structure to the DaSCH Service Platform (DSP) using a specialized data model developed by Stadt.Geschichte.Basel’s research data management team.

Key Components

  • Resource Classes: Maps Omeka item types to DSP ontology classes (e.g., sgb_PHOTO, sgb_DOCUMENT)
  • Property Mappings: Converts Omeka metadata fields to DSP property values with appropriate data types
  • Value Transformations: Handles text values, URIs, dates, and linked resources according to DSP specifications
  • Media Integration: Processes and uploads associated files while maintaining metadata relationships
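
The first three components can be pictured as lookup tables plus a small transformation step. The sketch below is illustrative only. The class names `sgb_PHOTO` and `sgb_DOCUMENT` come from this README, while the Omeka type labels, the DSP property names, and the output shape are assumptions and do not match the real DSP API payload format.

```python
# Omeka item type -> DSP resource class (the keys are hypothetical labels).
RESOURCE_CLASS_MAP = {
    "Image": "sgb_PHOTO",
    "Text": "sgb_DOCUMENT",
}

# Omeka field -> (DSP property name, value type); property names are hypothetical.
PROPERTY_MAP = {
    "dcterms:title": ("hasTitle", "text"),
    "dcterms:date": ("hasDate", "date"),
    "dcterms:source": ("hasSource", "uri"),
}


def transform_item(item: dict) -> dict:
    """Map one simplified Omeka item to a simplified DSP-style resource."""
    resource_class = RESOURCE_CLASS_MAP[item["type"]]
    values = {}
    for omeka_field, raw_values in item["metadata"].items():
        if omeka_field not in PROPERTY_MAP:
            continue  # Unmapped fields are skipped in this sketch.
        dsp_property, value_type = PROPERTY_MAP[omeka_field]
        values[dsp_property] = [{"type": value_type, "value": v} for v in raw_values]
    return {"class": resource_class, "values": values}


item = {
    "type": "Image",
    "metadata": {
        "dcterms:title": ["Market Square"],
        "dcterms:date": ["1900"],
        "dcterms:creator": ["Unknown"],
    },
}
print(transform_item(item))
```

Keeping the mappings as data rather than branching logic makes it easy to review them against the data model documentation.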

Standards Compliance

The data model follows the manual for creating non-discriminatory metadata for historical sources and research data, ensuring inclusive and accessible metadata practices.

For detailed data model documentation, see Data Model Reference.

🛠️ Support

This project is maintained by Stadt.Geschichte.Basel. Support is provided publicly through GitHub.

| Type                        | Platform           |
|-----------------------------|--------------------|
| 🚨 Bug Reports              | GitHub Issues      |
| 📊 Report bad data          | GitHub Issues      |
| 📚 Docs Issue               | GitHub Issues      |
| 🎁 Feature Requests         | GitHub Issues      |
| 🛡 Security vulnerabilities | SECURITY.md        |
| 💬 General Questions        | GitHub Discussions |

🗺 Roadmap

No changes are currently planned.

🤝 Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines. If you find errors, propose new features, or want to extend the dataset, open an issue or a pull request.

🔖 Versioning

We use Semantic Versioning. Available versions are listed in the tags.

✍️ Authors and Acknowledgment

See also the list of contributors.

📜 License

  • Code: GNU Affero General Public License v3.0 – see LICENSE-AGPL.md
  • Data: Creative Commons Attribution 4.0 International (CC BY 4.0) – see LICENSE-CCBY.md

By using this repository, you agree to provide appropriate credit and share modifications under the same license terms.
