Validation Reports

Overview

This page provides access to validation report CSV files generated by the sgb-data-validator. These reports help identify and correct data quality issues in the Stadt.Geschichte.Basel Omeka S instance.

Available Reports

The validator generates three types of CSV reports:

1. Items Validation Report

File: validation_reports/items_validation.csv

This report contains validation results for all items with issues. Each row represents one item, and each column represents a validation field.

Columns:

  • resource_id: The Omeka item ID
  • edit_link: Direct link to edit the item in the Omeka admin interface
  • Field columns: One column per validated field (e.g., dcterms:identifier, dcterms:title, dcterms:subject)

Cell content interpretation:

  • Empty cell: Field is valid, no issues detected
  • error: <message>: Validation error - field must be corrected
  • warning: <message>: Validation warning - field should be reviewed (informational)

Example:

resource_id,edit_link,dcterms:identifier,dcterms:description,o:title
121200,https://omeka.unibe.ch/admin/items/121200,error: Field is required,error: Field is required,
121201,https://omeka.unibe.ch/admin/items/121201,,warning: Missing field,
121202,https://omeka.unibe.ch/admin/items/121202,,,

In this example:

  • Item 121200 has errors in the dcterms:identifier and dcterms:description fields
  • Item 121201 has a warning about a missing dcterms:description field
  • Item 121202 is valid (all cells are empty, so no issues were found)
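
The cell rules above can be applied programmatically. The following is a minimal sketch, not part of the validator itself: it reads the example rows with Python's csv module and groups the affected field names by severity. The function name classify_rows and the META_COLUMNS set are illustrative assumptions.

```python
import csv
import io

# Sample data copied from the example above; a real run would read
# validation_reports/items_validation.csv instead.
SAMPLE = """\
resource_id,edit_link,dcterms:identifier,dcterms:description,o:title
121200,https://omeka.unibe.ch/admin/items/121200,error: Field is required,error: Field is required,
121201,https://omeka.unibe.ch/admin/items/121201,,warning: Missing field,
121202,https://omeka.unibe.ch/admin/items/121202,,,
"""

# Columns that identify the resource rather than report on a field.
META_COLUMNS = {"resource_id", "edit_link"}


def classify_rows(lines):
    """Map each resource_id to {'error': [fields], 'warning': [fields]}."""
    result = {}
    for row in csv.DictReader(lines):
        issues = {"error": [], "warning": []}
        for column, cell in row.items():
            # Skip metadata columns, malformed extra cells, and empty cells
            # (an empty cell means the field is valid).
            if column in META_COLUMNS or not isinstance(cell, str) or not cell:
                continue
            severity = cell.split(":", 1)[0].strip()
            if severity in issues:
                issues[severity].append(column)
        result[row["resource_id"]] = issues
    return result


report = classify_rows(io.StringIO(SAMPLE))
for resource_id, issues in report.items():
    print(resource_id, issues)
```

To run this against a real report, pass an open file handle instead of the io.StringIO object.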

2. Media Validation Report

File: validation_reports/media_validation.csv

This report contains validation results for all media objects with issues. The structure is identical to the items report.

Columns:

  • resource_id: The Omeka media ID
  • edit_link: Direct link to edit the media in the Omeka admin interface
  • Field columns: One column per validated field (e.g., dcterms:identifier, dcterms:creator, o:media_type)

Cell content interpretation: Same as items report (empty = valid, error:/warning: = issue)

Example:

resource_id,edit_link,dcterms:identifier,dcterms:creator,o:media_type
10778,https://omeka.unibe.ch/admin/media/10778,,,
10779,https://omeka.unibe.ch/admin/media/10779,error: Field is required,warning: Missing field,

3. Validation Summary Report

File: validation_reports/validation_summary.csv

This report provides aggregate statistics about the validation run.

Metrics:

  • items_validated: Total number of items validated
  • media_validated: Total number of media objects validated
  • total_errors: Total count of validation errors
  • total_warnings: Total count of validation warnings
  • items_with_issues: Number of items that have at least one error or warning
  • media_with_issues: Number of media objects that have at least one error or warning
  • uris_checked: Total number of URIs checked (if URI checking was enabled)
  • failed_uris: Number of URIs that failed validation (if URI checking was enabled)

Example:

metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,8
media_with_issues,15
uris_checked,450
failed_uris,3

Understanding Validation Messages

Common Error Messages

Message | Meaning | Action Required
error: Field is required | A required field is missing | Add the field to the resource
error: Invalid Iconclass code: <code> | The Iconclass notation is invalid | Verify and correct the code against the Iconclass vocabulary
error: Invalid license URI | The license URI doesn’t match the controlled vocabulary | Use a valid license URI from the vocabulary
error: Invalid MIME type | The media type is not in the controlled vocabulary | Correct the MIME type
error: Duplicate identifier '<id>' found | Multiple resources have the same identifier | Make identifiers unique

Common Warning Messages

Message | Meaning | Action Recommended
warning: Missing field | An optional but recommended field is missing | Consider adding the field for completeness
warning: URI is not reachable: <url> | A referenced URL returns an error (404, 500, etc.) | Verify and correct the URL
warning: URL redirects to different domain | A URL redirects to an unexpected domain | Verify the redirect is intentional
warning: Literal field contains URL | A plain text field contains a URL (should use URI type) | Consider using a URI field instead

Generating Reports

To generate these CSV reports, run the validator with the --export-csv flag:

# Generate CSV reports in default directory (validation_reports/)
uv run python validate.py --export-csv

# Specify custom output directory
uv run python validate.py --export-csv --csv-output my_reports/

# Generate reports with URI checking enabled
uv run python validate.py --export-csv --check-uris

See the README for more usage examples and configuration options.

Data Quality Best Practices

When reviewing validation reports:

  1. Prioritize errors over warnings: Errors indicate data that doesn’t conform to the schema and should be corrected first
  2. Review warnings systematically: Warnings indicate potential quality issues or missing recommended fields
  3. Use edit links efficiently: The direct links save time when making corrections
  4. Validate incrementally: After making corrections, run the validator again to verify fixes
  5. Track progress: Use the summary report to monitor overall data quality improvements
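
Tracking progress (step 5) can be as simple as comparing the summary CSVs of two runs. The sketch below assumes every value in the summary is an integer; the helper names load_summary and summary_delta and the numbers for the two runs are hypothetical, not real results.

```python
import csv
import io


def load_summary(lines):
    """Read a metric,value CSV into a dict of integers."""
    return {row["metric"]: int(row["value"]) for row in csv.DictReader(lines)}


def summary_delta(before, after):
    """Return after - before for every metric present in both runs."""
    return {m: after[m] - before[m] for m in before if m in after}


# Hypothetical numbers for two consecutive runs (not real data).
previous = load_summary(io.StringIO("metric,value\ntotal_errors,12\ntotal_warnings,45\n"))
current = load_summary(io.StringIO("metric,value\ntotal_errors,7\ntotal_warnings,45\n"))

for metric, change in summary_delta(previous, current).items():
    # A negative change for an error or warning count is an improvement.
    print(f"{metric}: {change:+d}")
```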

Technical Details

CSV Format Specifications

  • Encoding: UTF-8
  • Delimiter: Comma (,)
  • Quote character: Double quote (") for fields containing commas or newlines
  • Newline: LF (\n) or CRLF (\r\n)
  • Header row: Always present (first row)
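
These specifications match the defaults of Python's csv module, so no special dialect should be needed. The sketch below shows one way to read a report so that both LF and CRLF line endings and quoted fields are handled; the sample row and the helper names parse_report and read_report_file are made up for illustration.

```python
import csv
import io


def parse_report(handle):
    """Return (header, rows) from an open report handle."""
    reader = csv.reader(handle)  # defaults: delimiter ",", quote character '"'
    header = next(reader)        # the header row is always present
    return header, list(reader)


def read_report_file(path):
    """Open a report as UTF-8; newline="" leaves line endings to the csv module."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_report(handle)


# Illustrative content: the quoted message contains a comma.
lf_text = 'resource_id,dcterms:title\n1,"warning: Example message, with a comma"\n'
crlf_text = lf_text.replace("\n", "\r\n")

lf_result = parse_report(io.StringIO(lf_text, newline=""))
crlf_result = parse_report(io.StringIO(crlf_text, newline=""))
print(lf_result == crlf_result)
print(lf_result[1][0][1])
```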

Empty Cells

Empty cells in validation reports have a specific meaning: in field columns, an empty cell means the field is valid (no issues). This is by design, so that issues are easy to spot at a glance.

Multiple Issues per Field

If a field has multiple validation issues, they may be combined in a single cell, separated by semicolons.
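
A combined cell can be split back into individual issues. The exact layout of combined cells is not specified here, so the sketch below assumes that each issue repeats its "error:" or "warning:" prefix and that issues are separated by semicolons. A fragment without a known prefix is treated as a continuation of the previous message. The function name split_issues is hypothetical.

```python
KNOWN_SEVERITIES = ("error", "warning")


def split_issues(cell):
    """Split a report cell into a list of (severity, message) pairs."""
    issues = []
    for part in cell.split(";"):
        text = part.strip()
        if not text:
            continue
        # Split on the first colon only: messages may contain colons
        # themselves (for example "Invalid Iconclass code: XYZ123").
        severity, sep, message = text.partition(":")
        if sep and severity.strip() in KNOWN_SEVERITIES:
            issues.append((severity.strip(), message.strip()))
        elif issues:
            # No known prefix: assume the semicolon belonged to the message.
            prev_severity, prev_message = issues[-1]
            issues[-1] = (prev_severity, f"{prev_message}; {text}")
        else:
            issues.append(("unknown", text))
    return issues


print(split_issues("error: Field is required; warning: Literal field contains URL"))
```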

Support

For questions or issues with validation reports:

Validation Output

Below are the actual validation reports generated during the last site build by running:

uv run python validate.py --output validation_report.txt \
  --check-uris --check-redirects --profile \
  --profile-output analysis/ --export-csv --csv-output analysis/

Note: Live Validation Data

The reports shown below are actual validation results from the Stadt.Geschichte.Basel Omeka S instance, automatically generated during each site build. These reports reflect the current state of the data.

Download live validation files:

  • items_validation.csv: Items with validation issues
  • media_validation.csv: Media with validation issues
  • validation_summary.csv: Summary statistics
  • validation_report.txt: Text report with errors and warnings

Sample/example files are also available in the repository at examples/sample_reports/ for reference.

Text Report Output

Example format (actual report available for download above):

================================================================================
VALIDATION REPORT
================================================================================
Items validated: 150
Media validated: 300
Total errors: 12
Total warnings: 45
URIs checked: 450
Failed URIs: 3
================================================================================

ERRORS:
  [Item 10780] dcterms:identifier: Field is required
  [Item 10782] dcterms:subject[0]: Invalid Iconclass code: XYZ123
  [Item 10783] dcterms:language[0]: Invalid language code (must be valid ISO 639-1 two-letter code): xyz
  [Media 10779] dcterms:identifier: Field is required
  [Media 10779] o:media_type: Invalid MIME type: application/unknown

WARNINGS (informational):
  [Item 10781] dcterms:description: Missing field
  [Media 10778] dcterms:creator: Missing field
  [Item 10777] dcterms:description: Literal field contains URL: https://example.com/...
  [Item 10785] dcterms:source: URI is not reachable: https://broken-link.example.com (404 Not Found)

Report saved to: analysis/validation_report.txt
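
The text report can also be processed line by line. The sketch below is based only on the example layout shown above, so the regular expression and the section markers ("ERRORS:" and "WARNINGS") are assumptions about the format; parse_text_report is a hypothetical helper, not part of the validator.

```python
import re

# Matches lines such as "  [Item 10780] dcterms:identifier: Field is required".
ENTRY = re.compile(r"^\s*\[(Item|Media) (\d+)\] (\S+): (.*)$")

SAMPLE = """\
ERRORS:
  [Item 10780] dcterms:identifier: Field is required
  [Item 10782] dcterms:subject[0]: Invalid Iconclass code: XYZ123

WARNINGS (informational):
  [Media 10778] dcterms:creator: Missing field
  [Item 10785] dcterms:source: URI is not reachable: https://broken-link.example.com (404 Not Found)
"""


def parse_text_report(text):
    """Return one dict per issue line, tagged with its section's severity."""
    entries = []
    severity = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ERRORS:"):
            severity = "error"
        elif stripped.startswith("WARNINGS"):
            severity = "warning"
        else:
            match = ENTRY.match(line)
            if match and severity:
                kind, resource_id, field, message = match.groups()
                entries.append({
                    "severity": severity,
                    "kind": kind,
                    "id": int(resource_id),
                    "field": field,
                    "message": message,
                })
    return entries


entries = parse_text_report(SAMPLE)
for entry in entries:
    print(entry)
```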

Items Validation CSV

File: analysis/items_validation.csv

Example format (download the file linked above for actual live data):

resource_id,edit_link,dcterms:identifier,dcterms:description,dcterms:temporal,dcterms:subject,dcterms:language
10780,https://omeka.unibe.ch/admin/items/10780,error: Field is required,,,,
10781,https://omeka.unibe.ch/admin/items/10781,,warning: Missing field,,,
10782,https://omeka.unibe.ch/admin/items/10782,,,,error: Invalid Iconclass code: XYZ123,
10783,https://omeka.unibe.ch/admin/items/10783,,,,,error: Invalid language code (must be valid ISO 639-1 two-letter code): xyz

Key observations (from example):

  • Item 10780 is missing the required dcterms:identifier (error)
  • Item 10781 is missing dcterms:description (warning, optional field)
  • Item 10782 has an invalid Iconclass code in dcterms:subject
  • Item 10783 has an invalid ISO 639-1 language code

Media Validation CSV

File: analysis/media_validation.csv

Example format (download the file linked above for actual live data):

resource_id,edit_link,dcterms:identifier,dcterms:creator,dcterms:license,o:media_type
10778,https://omeka.unibe.ch/admin/media/10778,,warning: Missing field,,
10779,https://omeka.unibe.ch/admin/media/10779,error: Field is required,,,error: Invalid MIME type: application/unknown

Key observations (from example):

  • Media 10778 is missing dcterms:creator (warning, recommended field)
  • Media 10779 has both a missing required identifier and an invalid MIME type

Validation Summary CSV

File: analysis/validation_summary.csv

Example format (download the file linked above for actual live data):

metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,4
media_with_issues,2
uris_checked,450
failed_uris,3

Summary insights (from example):

  • 150 items and 300 media objects validated
  • 12 errors require immediate attention
  • 45 warnings suggest improvements
  • Only 4 items and 2 media have issues (high quality: 97.3% items, 99.3% media)
  • 3 of 450 URIs checked are unreachable (99.3% URI health)
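
The percentages in these insights follow directly from the summary metrics: the share of clean resources is (total - flagged) / total. The sketch below recomputes them from the example values; clean_share is a hypothetical helper, and integer metric values are assumed.

```python
import csv
import io

# Example values copied from the summary CSV above.
SUMMARY = """\
metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,4
media_with_issues,2
uris_checked,450
failed_uris,3
"""

metrics = {row["metric"]: int(row["value"]) for row in csv.DictReader(io.StringIO(SUMMARY))}


def clean_share(total, flagged):
    """Percentage of resources without issues, rounded to one decimal place."""
    if total == 0:
        return None  # nothing was checked, so no percentage can be given
    return round(100 * (total - flagged) / total, 1)


items_ok = clean_share(metrics["items_validated"], metrics["items_with_issues"])
media_ok = clean_share(metrics["media_validated"], metrics["media_with_issues"])
uris_ok = clean_share(metrics["uris_checked"], metrics["failed_uris"])
print(items_ok, media_ok, uris_ok)
```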

Tip: Live Data

Download the actual validation summary CSV from the download list in the note above to see current statistics for the Stadt.Geschichte.Basel collection.

Interpreting the Results

Prioritization

  1. Fix errors first: Review items and media with validation errors in the downloaded CSV files
  2. Review warnings: Check resources with missing optional fields
  3. Check failed URIs: Verify and update unreachable URLs

Data Quality Metrics

The validation provides comprehensive data quality metrics. Download the summary CSV to see current statistics for your collection, including error counts, warning counts, and overall data quality percentages.
