Workflow

Repository Structure, Software and Data Model

Author: Moritz Twente
Affiliation: Universität Basel
Modified: October 7, 2025

This repository stores data and R code used by the Team for Research Data Management and Public History of the Stadt.Geschichte.Basel project to create figures published in the nine-volume book series.

To support open research with FAIR data, the RDM team developed a research data platform (Mähr, Görlich, and Twente 2024) ensuring open, long-term access to sources and research data regarding the history of Basel. The platform facilitates access to the data behind the publication and features rich metadata annotation (cf. Data Model).

Using raw data provided by the individual authors of Stadt.Geschichte.Basel, the RDM team created maps, diagrams, and other types of visualizations (Mähr 2022) for publication in the print and online (OA) versions of the book series. The sgb-figures repository stores code and data for charts and diagrams only (referred to as plots or figures). For other types of data visualization in the context of Stadt.Geschichte.Basel, refer to the extensive RDM Documentation (in German).

Visualization Workflow

The starting point for each plot is the data that the authors of Stadt.Geschichte.Basel provided to the RDM team. Using R (cf. Software), we tidy the data and assemble multiple sources into one dataset for visualization. We then write that dataset to a csv file and describe it in an accompanying json metadata file structured according to the W3C standard for tabular data and metadata on the web (W3C 2022).

Data stewards in Stadt.Geschichte.Basel’s RDM team then create visualizations for the book series, often in multiple iterations and in close cooperation with the researchers (Twente and Mähr 2025; Münch et al. 2023). Dedicated design principles and color schemes are implemented to ensure a common visual identity across all products. The finalized products are then processed for print production and long-term archival (Figure 1).

 
flowchart LR
    A(Author) -->|Request| B(RDM Team)
    B -->|Drafts| A

    B -->|"Illustration (pdf)"| C(Typesetter)
    C ---> D("Printed<br>Book Series")

    B --->|"Illustration (pdf)"| E("Open Research<br>Data Platform")
    B --->|"Dataset<br>(csv, geojson)"| E
    B --->|Metadata| E

    B --->|"Code (R)"| F(sgb-figures)
    B --->|"Dataset (csv)"| F
    B --->|"Metadata (json)"| F

    F <-.-> E

    style B fill:#3a1e3e,color:#fff,stroke:#3a1e3e
    style D fill:#86bbd8,stroke:#86bbd8
    style E fill:#86bbd8,stroke:#86bbd8
    style F fill:#ffe880

    click B href "https://dokumentation.stadtgeschichtebasel.ch/team.html" "Research Data Management Team"
    click D href "https://emono.unibas.ch/stadtgeschichtebasel/" "Open Access Version"
    click E href "https://forschung.stadtgeschichtebasel.ch/" "Research Data Platform"
    click F href "https://github.com/stadt-geschichte-basel/sgb-figures/" "sgb-figures GitHub Repository"
Figure 1: The workflow comprises several iterations with a range of actors.

The generated pdf file for each plot, the corresponding legend and the csv dataset files are uploaded to the Research Data Platform. The R scripts used to clean the data and generate the plot are not uploaded there; instead, each figure’s metadata links to them in the sgb-figures repository on GitHub.

Repository Structure

The datasets and the R scripts that generate the plots are available in this repository, together with a reproducible R environment that facilitates further work with the code. The repository is based on the Open Research Template and manages research data according to the best practices outlined in The Turing Way. This structured approach includes automated release management, integrated archiving with Zenodo, structured documentation via Quarto, and long-term accessibility through GitHub Pages (Mähr and Twente 2025).

R scripts, built plots, processed data and metadata files are sorted according to the following folder structure:

  • build/ – helper scripts used to build the plots
  • data/ – data files
  • docs/ – documentation for the data and the repository
  • output/ – generated PDF files
  • src/ – source code for data processing and building plots
sgb-figures/                        
├── .devcontainer            <- GitHub Codespace settings
│   └── devcontainer.json
├── .github
│   └── workflows/
│       └── *.yaml           <- GitHub workflow settings
├── assets                   <- images, background data etc.
├── build                    <- helper scripts to build plots
├── data
│   ├── clean/
│   │   └── <volID>/
│   │       └── <media_id>/
│   │           ├── *.csv    <- dataset file
│   │           └── *.json   <- metadata file
│   └── raw/
│       └── <volID>/
│           └── <media_id>/
│               └── *.xlsx   <- raw data file
├── docs
│   ├── design/
│   │   └── index.qmd        <- visual identity description
│   ├── plots/
│   │   └── *.qmd            <- plot and data summary
│   ├── plots.qmd            <- list of all plots
│   └── workflow/
│       └── index.qmd        <- workflow and data model description
├── output
│   └── <volID>/
│       └── <media_id>/
│           └── *.pdf        <- built plot files
├── renv                     <- R package library
├── src                      
│   └── <media_id>/          <- source files
├── _quarto.yml              <- Quarto project settings 
├── .gitignore               <- files excluded from git version control
├── .prettierrc              <- linting settings
├── CHANGELOG.md             <- list of changes
├── CITATION.cff             <- citation specification
├── CODE_OF_CONDUCT.md       <- Code of Conduct for community projects
├── CONTRIBUTING.md          <- Contribution guideline for collaborators
├── index.qmd                <- .qmd file including README.md for Quarto
├── LICENSE-*.md             <- software and data licenses
├── package-lock.json        <- representation of the dependency tree
├── package.json             <- npm package definition and scripts
├── README.md                <- information about the repo
├── renv.lock                <- R environment settings
├── SECURITY.md              <- security policy
├── sgb-figures.Rproj        <- R project root
├── styles.css               <- custom CSS styles
└── ...                      <- other files
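
The per-figure directories in this layout follow directly from the volume and media ID. As a minimal sketch in base R, the helper below derives them; `figure_dirs()` and its arguments are hypothetical and not part of the repository, which uses the here package to locate the project root.

```r
# Hypothetical helper: derive the per-figure directories from the tree above.
# `root` stands in for the project root (the repository relies on `here`).
figure_dirs <- function(vol_id, media_id, root = ".") {
  list(
    raw    = file.path(root, "data", "raw", vol_id, media_id),
    clean  = file.path(root, "data", "clean", vol_id, media_id),
    output = file.path(root, "output", vol_id, media_id),
    src    = file.path(root, "src", media_id)
  )
}

# "Band<n>" is kept as the placeholder used in this documentation.
dirs <- figure_dirs(vol_id = "Band<n>", media_id = "01313")
print(dirs$clean)
# [1] "./data/clean/Band<n>/01313"
```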

Build Workflow

From the raw data as input to the published figure, a number of objects play a role in the build process. The environment is illustrated here with a made-up example figure abb01313 published in Stadt.Geschichte.Basel volume <n>.

Software

All plots are produced using R. In addition to ggplot2 (Wickham 2016) and other parts of the tidyverse, this project uses several packages for data processing and visualization, including here (Müller and Bryan 2020) and renv (Ushey and Wickham 2025) for creating a reproducible environment as well as csvwr (Gower 2022) for writing metadata files.

Code, data and documentation are checked into version control and stored in a GitHub repository. Code is formatted and linted with prettier (Long 2025) and, for R code, with styler (Müller and Walthert 2024) and lintr (Hester et al. 2025). The R environment including all necessary packages can be restored with the renv.lock file (see footnote 1). The documentation is rendered with Quarto (Allaire et al. 2022) and hosted on GitHub Pages.

Data Processing

The first step of the workflow is to process the raw data and create an annotated dataset that is ready both for publication and for use as input to a figure (Figure 2). The script src/01313/01313_clean.R loads the raw data file data/raw/Band<n>/01313/01313_Data_raw.xlsx into the R environment as data01313, processes the dataset (reformatting columns, transforming absolute into relative values, etc.) and exports the cleaned data to data/clean/Band<n>/01313/01313_3_Data.csv. Additionally, a metadata list object meta01313 is created in R and written to data/clean/Band<n>/01313/01313_3_Data.csv-metadata.json (see Data Model).

flowchart LR
        clean_script <--->|load data| rawdata([01313_Data_raw.xlsx])
        clean_script([01313_clean.R]) -->|clean data| data01313(data01313)
        clean_script -->|annotate<br>data| meta01313(meta01313)
        data01313 -->|export data| csv01313([01313_3_Data.csv])
        meta01313 -->|export<br>metadata| json01313([01313_3_Data.csv-metadata.json])

    style csv01313 fill:#ffe880
    style json01313 fill:#fff3e0
    style clean_script fill:#86bbd8,stroke:#86bbd8
    style data01313 fill:#c0dceb,stroke:#c0dceb
    style meta01313 fill:#c0dceb,stroke:#c0dceb
Figure 2: The first steps include processing the raw data to produce an annotated dataset.
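
The cleaning step could look roughly like the following base R sketch. It is an illustration under assumptions, not the repository's script: a made-up data frame stands in for the xlsx file, the column names are invented, and the output goes to a temporary directory instead of data/clean/.

```r
# Made-up raw data standing in for 01313_Data_raw.xlsx (columns are hypothetical).
raw01313 <- data.frame(
  year  = c(1900, 1900, 1910, 1910),
  group = c("A", "B", "A", "B"),
  count = c(30, 70, 45, 55)
)

# Transform absolute into relative values: share of each group within its year.
data01313 <- raw01313
data01313$share <- data01313$count /
  ave(data01313$count, data01313$year, FUN = sum)

# Export the cleaned data. A temporary directory stands in for the
# data/clean/ folder so that the sketch does not touch the repository.
clean_dir <- file.path(tempdir(), "01313")
dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)
csv_path <- file.path(clean_dir, "01313_3_Data.csv")
write.csv(data01313, csv_path, row.names = FALSE)
```

Keeping the export at the end of the script means that any script sourcing it always regenerates the csv file from the raw data.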

Plotting

After the dataset is processed, it can be used for building a plot (Figure 3). This is done by executing the script src/01313/01313_plot.R. The plot script first sources the cleaning script, so that the plot is always drawn from up-to-date files. The data is loaded into R as data01313 again. If necessary, further transformations are applied (e.g. creating custom labels, aggregating columns, manipulating data for better readability, etc.). The plot object plot01313 is created using ggplot2. For technical reasons, plot and legend must be exported to separate files (see footnote 2). To this end, a separate_legend object is created in R using ggpubr (Kassambara 2025). Both objects are then saved as the pdf files 01313_1_Plot.pdf and 01313_2_Legende.pdf in output/Band<n>/01313. Plots built with the scripts in this repository use a generic system font rather than the project’s signature font family.

For the print edition of Stadt.Geschichte.Basel, the RDM team lightly post-processed the figures in Adobe Illustrator before delivering them to the typesetter, incorporating further technical requirements, last-minute changes by authors, etc.

flowchart LR
        plot_script([01313_plot.R]) <--> |source to<br>load data| clean_script([01313_clean.R])
        plot_script --> |transform data| data01313(data01313)
        data01313 -->|create plot<br>object| plot01313(plot01313)
        plot01313 --->|export plot| export01313([01313_1_Plot.pdf])
        plot01313 -->|extract legend| legend01313(separate_legend)
        legend01313 -->|export legend| exportlegend([01313_2_Legende.pdf])
        plot_script ----->|build preview| infopage([01313.qmd])

style clean_script fill:#86bbd8,stroke:#86bbd8
style plot_script fill:#86bbd8,stroke:#86bbd8
style data01313 fill:#c0dceb,stroke:#c0dceb
style plot01313 fill:#c0dceb,stroke:#c0dceb
style legend01313 fill:#c0dceb,stroke:#c0dceb
style export01313 fill:#ffe880
style exportlegend fill:#ffe880
style infopage fill:#fff3e0
Figure 3: The plot is built with the processed dataset, producing PDF files and preview documentation.
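
The plotting step can be sketched as follows. The data, aesthetics and file sizes are made up, and the output goes to a temporary directory. Using `get_legend()` and `as_ggplot()` from ggpubr is one way to obtain a standalone legend; whether the repository's scripts use exactly these calls is an assumption.

```r
library(ggplot2)
library(ggpubr)

# Made-up data standing in for the cleaned dataset data01313.
data01313 <- data.frame(
  year  = rep(c(1900, 1910), each = 2),
  group = rep(c("A", "B"), times = 2),
  share = c(0.30, 0.70, 0.45, 0.55)
)

# Plot object including its legend.
plot01313 <- ggplot(data01313, aes(x = factor(year), y = share, fill = group)) +
  geom_col() +
  labs(x = "Year", y = "Share", fill = "Group")

# Extract the legend and wrap it as a standalone plot object.
separate_legend <- as_ggplot(get_legend(plot01313))

# Save the plot (legend suppressed) and the legend as separate PDF files.
out_dir <- file.path(tempdir(), "01313")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
plot_path <- file.path(out_dir, "01313_1_Plot.pdf")
legend_path <- file.path(out_dir, "01313_2_Legende.pdf")
ggsave(plot_path,
       plot = plot01313 + theme(legend.position = "none"),
       width = 16, height = 10, units = "cm")
ggsave(legend_path, plot = separate_legend,
       width = 5, height = 5, units = "cm")
```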

In addition to the plot itself, the 01313_plot.R script builds a separate qmd file in docs/plots/. This page is rendered when deploying the sgb-figures repository to GitHub Pages and provides previews of plot and data. The Quarto file contains a preview rendering of the figure itself, a table displaying the dataset used for creating the plot, as well as selected metadata on the dataset parsed from the json file. Metadata describing the plot (media object) itself is only available on the Research Data Platform.
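
How such a preview page might be generated is sketched below in base R. The helper `write_preview_qmd()` and the page content are hypothetical; the actual pages also include a rendering of the figure and selected metadata, which are omitted here.

```r
# Hypothetical helper: write a minimal Quarto page that previews one dataset.
write_preview_qmd <- function(media_id, csv_path, docs_dir) {
  dir.create(docs_dir, recursive = TRUE, showWarnings = FALSE)
  qmd_path <- file.path(docs_dir, paste0(media_id, ".qmd"))
  fence <- strrep("`", 3)  # code fence for the embedded R chunk
  lines <- c(
    "---",
    paste0("title: \"Plot ", media_id, "\""),
    "---",
    "",
    "## Data",
    "",
    paste0(fence, "{r}"),
    "#| echo: false",
    paste0("knitr::kable(read.csv(\"", csv_path, "\"))"),
    fence
  )
  writeLines(lines, qmd_path)
  invisible(qmd_path)
}

# A temporary directory stands in for docs/plots/.
qmd_path <- write_preview_qmd(
  media_id = "01313",
  csv_path = "01313_3_Data.csv",
  docs_dir = file.path(tempdir(), "docs", "plots")
)
```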

A range of npm scripts makes it easier to build the plots directly from the command line. Running npm run list prints an overview of all available plots with their media IDs. Using such an ID, the plot, metadata and qmd files can be generated with npm run plot <ID>. The README lists all available npm scripts.

Data Model

Metadata for research data of Stadt.Geschichte.Basel is provided according to a data model developed by the Stadt.Geschichte.Basel Research Data Management Team to meet the requirements of the wide range of sources used in the project. The model and the subsequent annotation process follow the Manual for Creating Non-Discriminatory Metadata for Historical Sources and Research Data (Mähr and Schnegg 2024).

For the data in sgb-figures, the model was slightly adapted to align with the requirements of publishing tabular data. To this end, recommendations as outlined in the W3C standard for tabular data and metadata on the web (W3C 2022) were consulted and implemented using the csvwr R package.

Metadata for the csv datasets used for creating the figures is stored in a separate json file. Again using the example metadata object abb01313, Figure 4 illustrates how this annotation integrates with the Stadt.Geschichte.Basel data model used for the research data platform. In this example, abb01313 has one child media object m01313. m01313 is an sgb-figures plot built using the dataset 01313_3, which in turn is described by the metadata file 01313_3_Data.csv-metadata.json.

Each json metadata file annotates one csv dataset. Each dataset is used for one figure, but a figure may be built taking multiple datasets as input. Each figure is represented as one media object on the Stadt.Geschichte.Basel Research Data Platform. This (child) media object is then part of a parent metadata object, alongside zero or more other child media objects. If a parent metadata object has more than one child media object, their id values – as well as the id values of the corresponding dataset(s) and metadata file(s) – are numbered consecutively (m01313_1, m01313_2, etc.).
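
The numbering rule can be expressed as a small base R sketch. The helper `child_media_ids()` is hypothetical and only mirrors the convention described above; it is not a function from the repository.

```r
# Hypothetical helper: ids for the child media objects of one parent object.
child_media_ids <- function(parent_id, n_children) {
  base <- sub("^abb", "m", parent_id)     # abb01313 -> m01313
  if (n_children == 1) {
    return(base)                          # a single child carries no suffix
  }
  paste0(base, "_", seq_len(n_children))  # m01313_1, m01313_2, ...
}

child_media_ids("abb01313", 1)
# [1] "m01313"
child_media_ids("abb01313", 3)
# [1] "m01313_1" "m01313_2" "m01313_3"
```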

classDiagram
    direction LR

    class metadata["metadata json<br><i>describing csv file</i>"]
    class metadata2["metadata object<br><i>describing media object(s)</i>"]
    class media["media object"]
    class csv["csv dataset"]

    metadata "1" --> "1" csv
    csv "1" --> "*" media

    media "m" <-- "n" metadata2

    class csv {
        id &lpar;01313_3&rpar;
    }

    class metadata {
        id &lpar;01313_3_Data.csv-metadata&rpar;
        url &lpar;01313_3_Data.csv&rpar;
        [columns] &lpar;title, datatype, description&rpar;
        media_id &lpar;m01313_3&rpar;
        [isPartOf]
        title
        description
        [creator] &lpar;incl. ORCID&rpar;
        [contributor] &lpar;incl. ORCID&rpar;
        publisher
        date &lpar;EDTF&rpar;
        coverage
        type
        format
        source
        language &lpar;ISO 639-2 code&rpar;
        [relation]
        rights
        license
        modified &lpar;ISO 8601&rpar;
        bibliographicCitation
    }

%%| no label for namespaces, see https://github.com/mermaid-js/mermaid-live-editor/issues/1452
    namespace sgb_datamodel {

        class media {
            id &lpar;m01313&rpar;
            title
            [subject;subject] &lpar;keywords from GenderOpen Index&rpar;
            description
            [abstract] &lpar;alt attribute for alternative text&rpar;
            [creator] &lpar;incl. link to Wikidata&rpar;
            [publisher] &lpar;incl. link to Wikidata&rpar;
            date
            temporal
            type
            format
            extent
            [source] &lpar;Source and catalogue link&rpar;
            language &lpar;ISO 639-2 code&rpar;
            [relation] &lpar;internal links to other items, link to GitHub, further information&rpar;
            rights
            license
        }

    class metadata2 {
        id &lpar;abb01313&rpar;
        title
        [subject;subject]
        description
        temporal
        [isPartOf;isPartOf] &lpar;Data DOIs&rpar;
    }
    }

style csv fill:#F7CB45,stroke:#777
style metadata fill:#fff3e0,stroke:#777
style media fill:#FFFFFF,stroke:#777,color:#3A1E3E

click media href "https://dokumentation.stadtgeschichtebasel.ch/products/coding/plattform/#datenmodell" "Main Data Model Documentation"
click metadata2 href "https://dokumentation.stadtgeschichtebasel.ch/products/coding/plattform/#datenmodell" "Main Data Model Documentation"
Figure 4: Relation of the sgb-figures annotation with the Stadt.Geschichte.Basel data model.
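
To make the json annotation more concrete, the sketch below builds a small metadata list and writes it to a json file. It is hand-built with jsonlite rather than with csvwr, follows the W3C tabular metadata layout only loosely, and uses invented placeholder values; the keys in the repository's actual files may differ.

```r
library(jsonlite)

# Column annotations: one entry per csv column (names and texts are made up).
columns <- list(
  list(name = "year", titles = "Year", datatype = "integer",
       description = "Year of observation"),
  list(name = "share", titles = "Share", datatype = "number",
       description = "Relative share per year")
)

# Metadata list object with a few of the fields shown in Figure 4.
meta01313 <- list(
  `@context` = "http://www.w3.org/ns/csvw",
  url = "01313_3_Data.csv",
  media_id = "m01313_3",
  title = "Example dataset (placeholder title)",
  language = "eng",
  modified = format(Sys.Date(), "%Y-%m-%d"),
  tableSchema = list(columns = columns)
)

# A temporary directory stands in for the data/clean/ folder.
json_path <- file.path(tempdir(), "01313_3_Data.csv-metadata.json")
write_json(meta01313, json_path, auto_unbox = TRUE, pretty = TRUE)
```

Setting `auto_unbox = TRUE` writes single values as json strings instead of one-element arrays, which keeps the file closer to the layout described in the W3C model.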

References

Allaire, J. J., Charles Teague, Yihui Xie, and Christophe Dervieux. 2022. “Quarto.” Zenodo. https://doi.org/10.5281/ZENODO.5960048.
Gower, Robin. 2022. “Csvwr: Read and Write CSV on the Web (CSVW) Tables and Metadata.” https://robsteranium.github.io/csvwr/.
Hester, Jim, Florent Angly, Russ Hyde, Michael Chirico, Kun Ren, Alexander Rosenstock, and Indrajeet Patil. 2025. “Lintr: A ’Linter’ for R Code.” https://lintr.r-lib.org.
Kassambara, Alboukadel. 2025. “Ggpubr: ’Ggplot2’ Based Publication Ready Plots.” https://rpkgs.datanovia.com/ggpubr/.
Long, James. 2025. “Prettier: An Opinionated Code Formatter.” https://prettier.io/.
Mähr, Moritz. 2022. “Research Data Management in (Public) History.” Keynote. Istituto Svizzero di Roma. https://doi.org/10.5281/zenodo.6637118.
Mähr, Moritz, Nico Görlich, and Moritz Twente. 2024. “Stadt.Geschichte.Basel Research Data Platform.” https://github.com/Stadt-Geschichte-Basel/forschung.stadtgeschichtebasel.ch.
Mähr, Moritz, and Noëlle Schnegg. 2024. “Handbuch Zur Erstellung Diskriminierungsfreier Metadaten Für Historische Quellen Und Forschungsdaten: Erfahrungen Aus Dem Geschichtswissenschaftlichen Forschungsprojekt Stadt.Geschichte.Basel.” Basel: Zenodo. https://doi.org/10.5281/ZENODO.11124720.
Mähr, Moritz, and Moritz Twente. 2025. “One Template to Rule Them All: Interactive Research Data Documentation with Quarto.” Digital Humanities Tech Symposium, Universidade NOVA de Lisboa. https://maehr.github.io/one-template-to-rule-them-all/.
Müller, Kirill, and Jenny Bryan. 2020. “Here: A Simpler Way to Find Your Files.” https://doi.org/10.32614/CRAN.package.here.
Müller, Kirill, and Lorenz Walthert. 2024. “Styler: Non-Invasive Pretty Printing of R Code.” https://styler.r-lib.org.
Münch, Cristina, Nico Görlich, Moritz Mähr, and Moritz Twente. 2023. “Karten Als "Boundary Objects" Oder Wie Man Mit Geodaten Historische Thesen Bildet.” Poster. Digital History-Tagung, Humboldt-Universität zu Berlin. https://doi.org/10.5281/zenodo.7960744.
Twente, Moritz, and Moritz Mähr. 2025. “Navigating Disconcertment in Map-Making: How to Turn Conflict and Collaboration into Accessible Geodata.” Digital Humanities 2025, Universidade NOVA de Lisboa. https://doi.org/10.5281/zenodo.16042822.
Ushey, Kevin, and Hadley Wickham. 2025. “Renv: Project Environments.” https://rstudio.github.io/renv/.
W3C. 2022. “Model for Tabular Data and Metadata on the Web.” https://w3c.github.io/csvw/syntax/.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer. https://ggplot2.tidyverse.org.

Footnotes

  1. Run npm run setup for a shortcut that will install R dependencies.

  2. For technical reasons, the actual plot and the corresponding legend are often stored in separate PDF files during the book production process. These two files are represented as one object programmatically (plot01313) and are collectively referred to as one object in this context.