# omeka2dsp

Long-Term Archival Pipeline for Stadt.Geschichte.Basel

```mermaid
graph LR
    A[Omeka API] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[DSP Upload]
    D --> E[DSP API]
    F[Configuration] --> B
    F --> C
    F --> D
    style A fill:#e1f5fe
    style E fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#fff3e0
    style D fill:#fff3e0
```
This repository contains the pipeline and data model for the long-term preservation of the research data of Stadt.Geschichte.Basel (SGB) on the DaSCH Service Platform (DSP).
It transfers metadata and media files from the SGB Omeka S instance to the DSP, detecting changes and updating existing records, in support of reproducible and open research.
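
At a glance, each run extracts item metadata from Omeka S, transforms it into the SGB data model, and creates or updates the corresponding resources on the DSP. The sketch below illustrates that control flow only; the function names and signature are hypothetical and do not mirror the actual interface of the scripts:

```python
# Illustrative control flow of one pipeline run. The callables are passed in
# because the real function names in scripts/ are not shown here.
from typing import Callable, Iterable, Optional

def run_pipeline(
    omeka_items: Iterable[dict],
    transform: Callable[[dict], dict],
    find_existing: Callable[[dict], Optional[dict]],
    create: Callable[[dict], None],
    update: Callable[[dict, dict], None],
    has_changed: Callable[[dict, dict], bool],
) -> None:
    """Extract -> transform -> upload, touching only records that need it."""
    for item in omeka_items:            # metadata extracted from Omeka S
        resource = transform(item)      # mapped onto the SGB data model
        existing = find_existing(resource)
        if existing is None:
            create(resource)            # new record: create it on the DSP
        elif has_changed(existing, resource):
            update(existing, resource)  # changed record: update it in place
```
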
## 📚 Documentation

Comprehensive documentation is available in the `docs/` directory:

- 📖 Complete Documentation – Full system documentation
- 🏗️ Architecture Overview – System design and components
- 🔄 Workflows – Data migration workflows with Mermaid diagrams
- 🔧 API Reference – Python function documentation
- 🧩 Data Model – Data model documentation

## 📒 Quick Start Guides

- ⚡ Installation & Setup
- ⚙️ Configuration
- 📋 Usage
- 🛠️ Development
- 🔍 Troubleshooting

## ⚡ Quick Installation

We recommend using GitHub Codespaces for a reproducible setup.

### GitHub Codespaces (Recommended)

- Click the green **<> Code** button → “Codespaces” → “Create codespace on `main`”
- Configure environment: `cp example.env .env` and edit with your credentials
- Test installation: `uv run python scripts/api_get_project.py`

### Local Installation

```bash
# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp

# Install dependencies
pnpm install  # Node.js development tools
uv sync       # Python dependencies with uv

# Configure environment
cp example.env .env
# Edit .env with your credentials

# Test installation
uv run python scripts/api_get_project.py
```
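
Before running a migration, it can help to confirm that the values in `.env` are actually picked up. The snippet below is a minimal, stand-alone check; it assumes the `python-dotenv` package is available, and the variable names are placeholders, so use the keys defined in `example.env`:

```python
# Minimal .env sanity check (assumes python-dotenv; the variable names below
# are placeholders, use the keys from example.env).
import os

from dotenv import load_dotenv

REQUIRED = ["OMEKA_API_URL", "DSP_API_URL", "DSP_USER", "DSP_PASSWORD"]  # placeholders

load_dotenv(".env")
missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing settings in .env: {', '.join(missing)}")
print("All required settings are set.")
```
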
## 🚀 Quick Usage

```bash
# Run sample data migration (recommended first test)
python scripts/data_2_dasch.py -m sample_data

# Run full migration
python scripts/data_2_dasch.py -m all_data

# Run test data migration
python scripts/data_2_dasch.py -m test_data
```

### Processing Modes

| Mode          | Description               | Use Case               |
| ------------- | ------------------------- | ---------------------- |
| `all_data`    | Process entire collection | Production migrations  |
| `sample_data` | Process random subset     | Testing and validation |
| `test_data`   | Process predefined items  | Development, debugging |
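
For orientation, the snippet below shows how such a `-m` flag is typically parsed and how the three modes from the table could drive item selection. It is a hedged sketch, not the actual argument handling in `scripts/data_2_dasch.py`; the `is_test_item` marker in particular is a made-up placeholder:

```python
# Sketch of mode selection for the -m flag; not the actual implementation
# in scripts/data_2_dasch.py.
import argparse
import random

def select_items(mode: str, items: list[dict], sample_size: int = 10) -> list[dict]:
    """Choose which items to process for the given mode."""
    if mode == "all_data":
        return items                                               # entire collection
    if mode == "sample_data":
        return random.sample(items, min(sample_size, len(items)))  # random subset
    return [item for item in items if item.get("is_test_item")]    # placeholder marker

parser = argparse.ArgumentParser(description="Mode selection sketch")
parser.add_argument("-m", "--mode", choices=["all_data", "sample_data", "test_data"],
                    default="sample_data")
args = parser.parse_args()
print(f"Running in {args.mode} mode")
```
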
## 🏗️ System Architecture

### Features

- ✅ Automated synchronization: detects and applies only necessary changes (see the sketch after this list)
- ✅ Media file handling: transfers and processes associated files
- ✅ Data validation: ensures data integrity throughout the process
- ✅ Error recovery: robust error handling and retry mechanisms
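
The synchronization feature comes down to comparing what is already on the DSP with what Omeka currently delivers, and updating only when something differs. The comparison below illustrates that idea; the field names are placeholders and the real diff logic in the scripts may work differently:

```python
# Illustrative change detection: update only when a compared field differs.
# Field names are placeholders, not the project's actual metadata fields.
COMPARED_FIELDS = ["title", "description", "date"]

def needs_update(existing: dict, incoming: dict) -> bool:
    """Return True if any compared field differs between the two records."""
    return any(existing.get(f) != incoming.get(f) for f in COMPARED_FIELDS)

# Example: only the description changed, so an update would be triggered.
old = {"title": "Stadtplan 1862", "description": "", "date": "1862"}
new = {"title": "Stadtplan 1862", "description": "Stadtplan von 1862", "date": "1862"}
print(needs_update(old, new))  # True
```
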
## 📂 Repository Structure

This repository follows the Turing Way advanced structure:

- `assets/` – images, logos, etc.
- `data/` – data files
- `docs/` – documentation of the repository and data
- `project-management/` – project management documents
- `scripts/` – source code (migration scripts, utilities)
- `report.qmd` – report describing the analysis of the data

## 📊 Data Model

The omeka2dsp system transforms data from Omeka’s metadata structure to the DaSCH Service Platform (DSP) using a specialized data model developed by Stadt.Geschichte.Basel’s research data management team.

### Key Components

- Resource Classes: Maps Omeka item types to DSP ontology classes (e.g., `sgb_PHOTO`, `sgb_DOCUMENT`)
- Property Mappings: Converts Omeka metadata fields to DSP property values with appropriate data types (see the sketch after this list)
- Value Transformations: Handles text values, URIs, dates, and linked resources according to DSP specifications
- Media Integration: Processes and uploads associated files while maintaining metadata relationships
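
As a rough illustration of the first two components, the mapping can be pictured as a pair of lookup tables plus a small translation step. `sgb_PHOTO` and `sgb_DOCUMENT` are taken from the list above; every other identifier below (the Omeka type keys and the `sgb_*` property names) is a hypothetical placeholder, not the project's actual ontology:

```python
# Illustrative mapping tables: sgb_PHOTO / sgb_DOCUMENT come from the data
# model above; all other identifiers are hypothetical placeholders.
RESOURCE_CLASS_MAP = {
    "photo": "sgb_PHOTO",        # Omeka item type -> DSP resource class
    "document": "sgb_DOCUMENT",
}

PROPERTY_MAP = {
    "dcterms:title": "sgb_title",            # placeholder property names
    "dcterms:identifier": "sgb_identifier",
    "dcterms:date": "sgb_date",
}

def map_item(item_type: str, metadata: dict) -> dict:
    """Translate one Omeka item into a DSP-style resource payload."""
    return {
        "resource_class": RESOURCE_CLASS_MAP.get(item_type, "sgb_DOCUMENT"),
        "values": {
            PROPERTY_MAP[key]: value
            for key, value in metadata.items()
            if key in PROPERTY_MAP
        },
    }
```
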
### Standards Compliance

The data model follows the manual for creating non-discriminatory metadata for historical sources and research data, ensuring inclusive and accessible metadata practices.
For detailed data model documentation, see Data Model Reference.

## 🛠️ Support

This project is maintained by Stadt.Geschichte.Basel. Support is provided publicly through GitHub.

| Type                        | Platform           |
| --------------------------- | ------------------ |
| 🚨 Bug Reports              | GitHub Issues      |
| 📊 Report bad data          | GitHub Issues      |
| 📚 Docs Issue               | GitHub Issues      |
| 🎁 Feature Requests         | GitHub Issues      |
| 🛡 Security vulnerabilities | SECURITY.md        |
| 💬 General Questions        | GitHub Discussions |

## 🗺 Roadmap

No changes are currently planned.

## 🤝 Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines. If you find errors, want to propose new features, or would like to extend the dataset, open an issue or a pull request.

## 🔖 Versioning

We use Semantic Versioning. Available versions are listed in the tags.

## 📜 License

- Code: GNU Affero General Public License v3.0 – see LICENSE-AGPL.md
- Data: Creative Commons Attribution 4.0 International (CC BY 4.0) – see LICENSE-CCBY.md
By using this repository, you agree to provide appropriate credit and share modifications under the same license terms.