A Long-Term Archival Pipeline for the Forschungsdatenplattform Stadt.Geschichte.Basel
Talk
This talk is part of the morning session at DaSCHCon 2025 at the Museum für Kommunikation in Bern.
Abstract
The Forschungsdatenplattform by Stadt.Geschichte.Basel provides open access to diverse historical materials relating to the city of Basel, including texts, images, and statistical and geospatial data. While technically robust and publicly available through GitHub Pages using CollectionBuilder-CSV, our current infrastructure faces critical sustainability and scalability limitations.
At present, metadata and files are curated and managed using the Omeka-S instance provided by the University of Bern, from which we extract the data for display via CollectionBuilder. However, this setup introduces significant risks for long-term availability:
- The Omeka-S instance is not under our institutional control, and its hosting may not be funded indefinitely.
- GitHub Pages is not suitable for serving large files or guaranteeing persistence.
- CollectionBuilder lacks built-in support for versioning and persistent identifiers at the level required by research data infrastructures.
To address these challenges, we are implementing a transition pipeline to archive all metadata and associated files with DaSCH, leveraging its infrastructure for versioned, durable, and FAIR-compliant research data publication. The new pipeline includes:
- Metadata harvesting and transformation from Omeka to the DaSCH data model.
- Automated deposit and update routines using the DaSCH REST API, including support for versioning existing records.
- Writing back stable DaSCH identifiers and access URLs to our public-facing platform, ensuring transparency and citability.
- Preservation of hierarchical relationships and media metadata, aligned with our custom metadata model, which incorporates principles of anti-discriminatory description practices.
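As an illustration of the first step, the metadata crosswalk can be sketched as a small transformation function. The input property names follow the Omeka-S JSON-LD REST API (`o:id`, `dcterms:*`); the output field names are illustrative placeholders for a target record, not the actual DaSCH/DSP data model:

```python
def omeka_to_dasch(item: dict) -> dict:
    """Map one Omeka-S JSON-LD item to a flat record for DaSCH deposit.

    Omeka-S serializes each Dublin Core property as a list of value
    objects with an "@value" key; we pull out the literal values.
    The target keys ("label", "sourceId", ...) are placeholders for
    whatever the project's DaSCH data model actually defines.
    """
    def literals(prop: str) -> list[str]:
        # Keep only literal values; linked resources have no "@value".
        return [v["@value"] for v in item.get(prop, []) if "@value" in v]

    return {
        "label": (literals("dcterms:title") or ["Untitled"])[0],
        "description": literals("dcterms:description"),
        "creators": literals("dcterms:creator"),
        # Record the Omeka item id so updates can be matched to
        # existing DaSCH records instead of creating duplicates.
        "sourceId": f"omeka:{item.get('o:id')}",
    }
```

In the actual pipeline, records produced this way would then be deposited or updated via the DaSCH REST API, and the returned persistent identifiers written back to the CollectionBuilder metadata.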
This transition allows us to decouple the archival backend from the front-end presentation, ensuring long-term data accessibility, citability, and semantic interoperability. In our presentation, we will share the architectural overview, code-level considerations, and our reflections on working with DaSCH APIs in a real-world context, including:
- Lessons learned in metadata crosswalks and transformation logic.
- Technical caveats in version control, file transfers, and identifier management.
- Challenges in aligning minimal computing approaches (CollectionBuilder) with robust backend infrastructures (DSP).
This case study illustrates how lightweight digital publishing environments can be effectively combined with national research infrastructures to deliver sustainable, standards-based access to historical research data. It also raises broader questions about infrastructural independence, scalability of humanities platforms, and the practical challenges of implementing FAIR principles in community-developed software ecosystems.