```mermaid
graph LR
    A[Omeka API] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[DSP Upload]
    D --> E[DSP API]
    F[Configuration] --> B
    F --> C
    F --> D
    style A fill:#86bbd8
    style E fill:#dbfe87
    style B fill:#ffe880
    style C fill:#ffe880
    style D fill:#ffe880
```
# omeka2dsp
Long-Term Archival Pipeline for Stadt.Geschichte.Basel
This repository contains the pipeline and data model for the long-term preservation of the research data of Stadt.Geschichte.Basel (SGB) on the DaSCH Service Platform (DSP).
It enables the transfer of metadata and media files from the SGB Omeka S instance to the DSP. The pipeline detects changes, updates existing records, and ensures reproducible and open research.
## 📚 Documentation

Comprehensive documentation is available in the `docs/` directory:
- 📖 Complete Documentation – Full system documentation
- 🏗️ Architecture Overview – System design and components
- 🔄 Workflows – Data migration workflows with Mermaid diagrams
- 🔧 API Reference – Python function documentation
- 🧩 Data Model – Data model documentation
## 📒 Quick Start Guides
- ⚡ Installation & Setup
- ⚙️ Configuration
- 📋 Usage
- 🛠️ Development
- 🔍 Troubleshooting
## ⚡ Quick Installation
We recommend using GitHub Codespaces for a reproducible setup.
### GitHub Codespaces (Recommended)

- Click the green “<> Code” button → “Codespaces” → “Create codespace on main”
- Configure environment: `cp example.env .env` and edit the file with your credentials (see the sketch below)
- Test installation: `uv run python scripts/api_get_project.py`
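The `.env` file supplies the credentials the migration scripts need for both APIs. As a minimal sketch of how a script might read these values once the environment is loaded, assuming illustrative variable names (check `example.env` for the actual keys used by this repository):

```python
import os

# Illustrative variable names; the real keys are defined in example.env.
OMEKA_API_URL = os.environ["OMEKA_API_URL"]            # e.g. the Omeka S API endpoint
OMEKA_KEY_IDENTITY = os.environ["OMEKA_KEY_IDENTITY"]
OMEKA_KEY_CREDENTIAL = os.environ["OMEKA_KEY_CREDENTIAL"]

DSP_API_URL = os.environ["DSP_API_URL"]                # e.g. the DSP API endpoint
DSP_USER = os.environ["DSP_USER"]
DSP_PASSWORD = os.environ["DSP_PASSWORD"]
```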
### Local Installation

```bash
# Clone repository
git clone https://github.com/Stadt-Geschichte-Basel/omeka2dsp.git
cd omeka2dsp

# Install dependencies
pnpm install  # Node.js development tools
uv sync       # Python dependencies with uv

# Configure environment
cp example.env .env
# Edit .env with your credentials

# Test installation
uv run python scripts/api_get_project.py
```

## 🚀 Quick Usage
```bash
# Run sample data migration (recommended first test)
python scripts/data_2_dasch.py -m sample_data

# Run full migration
python scripts/data_2_dasch.py -m all_data

# Run test data migration
python scripts/data_2_dasch.py -m test_data
```

### Processing Modes
| Mode | Description | Use Case |
|---|---|---|
| `all_data` | Process entire collection | Production migrations |
| `sample_data` | Process random subset | Testing and validation |
| `test_data` | Process predefined items | Development, debugging |
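To make the mode switch concrete, here is a minimal sketch of how a `-m`/`--mode` flag with these three values could be wired up and used to pick items; the actual argument handling and item selection in `scripts/data_2_dasch.py` may differ:

```python
import argparse
import random

def select_items(items, mode, sample_size=20, test_ids=()):
    """Pick which Omeka items to process based on the chosen mode."""
    if mode == "all_data":
        return items                                                # entire collection
    if mode == "sample_data":
        return random.sample(items, min(sample_size, len(items)))   # random subset
    return [i for i in items if i.get("o:id") in test_ids]          # predefined items

parser = argparse.ArgumentParser(description="Migrate Omeka items to DSP")
parser.add_argument("-m", "--mode",
                    choices=["all_data", "sample_data", "test_data"],
                    default="sample_data")
args = parser.parse_args()
```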
## 🏗️ System Architecture

### Features

- ✅ Automated synchronization: detects and applies only necessary changes (see the sketch after this list)
- ✅ Media file handling: transfers and processes associated files
- ✅ Data validation: ensures data integrity throughout the process
- ✅ Error recovery: robust error handling and retry mechanisms
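As a conceptual sketch of the synchronization decision, assuming records are compared field by field to decide between create, update, and skip (illustrative only, not the repository's actual implementation):

```python
def sync_action(omeka_record, dsp_record):
    """Decide whether a record must be created, updated, or left unchanged."""
    if dsp_record is None:
        return "create", omeka_record            # not yet present on the DSP
    changed = {
        key: value
        for key, value in omeka_record.items()
        if dsp_record.get(key) != value          # field differs between the systems
    }
    return ("update", changed) if changed else ("skip", {})
```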
## 📂 Repository Structure

This repository follows the Turing Way advanced structure:

- `assets/` – images, logos, etc.
- `data/` – data files
- `docs/` – documentation of the repository and data
- `project-management/` – project management documents
- `scripts/` – source code (migration scripts, utilities)
- `report.qmd` – report describing the analysis of the data
## 📊 Data Model
The omeka2dsp system transforms data from Omeka’s metadata structure to the DaSCH Service Platform (DSP) using a specialized data model developed by Stadt.Geschichte.Basel’s research data management team.
### Key Components

- Resource Classes: Maps Omeka item types to DSP ontology classes (e.g., `sgb_PHOTO`, `sgb_DOCUMENT`)
- Property Mappings: Converts Omeka metadata fields to DSP property values with appropriate data types (see the sketch after this list)
- Value Transformations: Handles text values, URIs, dates, and linked resources according to DSP specifications
- Media Integration: Processes and uploads associated files while maintaining metadata relationships
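To make the mapping concrete, here is a minimal sketch of turning a simplified Omeka item into a DSP-style resource payload. The class names `sgb_PHOTO` and `sgb_DOCUMENT` come from the list above; the Dublin Core field names, the DSP property names (`hasTitle`, `hasDate`, `hasSource`), the fallback class, and the payload structure are illustrative assumptions rather than the project's actual code:

```python
# Illustrative mapping from Omeka item types to DSP ontology classes.
RESOURCE_CLASS_MAP = {
    "Photo": "sgb_PHOTO",
    "Document": "sgb_DOCUMENT",
}

# Illustrative mapping from Omeka metadata fields to DSP properties and value types.
PROPERTY_MAP = {
    "dcterms:title": ("hasTitle", "text"),
    "dcterms:date": ("hasDate", "date"),
    "dcterms:source": ("hasSource", "uri"),
}

def to_dsp_resource(item):
    """Build a DSP-style resource payload from a simplified Omeka item dict."""
    resource = {
        "resource_class": RESOURCE_CLASS_MAP.get(item.get("template"), "sgb_DOCUMENT"),  # illustrative fallback
        "values": {},
    }
    for omeka_field, (dsp_property, value_type) in PROPERTY_MAP.items():
        if omeka_field in item:
            resource["values"][dsp_property] = {"type": value_type, "value": item[omeka_field]}
    return resource
```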
### Standards Compliance
The data model follows the manual for creating non-discriminatory metadata for historical sources and research data, ensuring inclusive and accessible metadata practices.
For detailed data model documentation, see Data Model Reference.
## 🛠️ Support
This project is maintained by Stadt.Geschichte.Basel. Support is provided publicly through GitHub.
| Type | Platform |
|---|---|
| 🚨 Bug Reports | GitHub Issues |
| 📊 Report bad data | GitHub Issues |
| 📚 Docs Issue | GitHub Issues |
| 🎁 Feature Requests | GitHub Issues |
| 🛡 Security vulnerabilities | SECURITY.md |
| 💬 General Questions | GitHub Discussions |
## 🗺 Roadmap
No changes are currently planned.
## 🤝 Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines. If you find errors, want to propose new features, or would like to extend the dataset, please open an issue or a pull request.
## 🔖 Versioning
We use Semantic Versioning. Available versions are listed in the tags.
## 📜 License
- Code: GNU Affero General Public License v3.0 – see LICENSE-AGPL.md
- Data: Creative Commons Attribution 4.0 International (CC BY 4.0) – see LICENSE-CCBY.md
By using this repository, you agree to provide appropriate credit and share modifications under the same license terms.