Validation Reports
Overview
This page provides access to validation report CSV files generated by the sgb-data-validator. These reports help identify and correct data quality issues in the Stadt.Geschichte.Basel Omeka S instance.
Available Reports
The validator generates three types of CSV reports:
1. Items Validation Report
File: validation_reports/items_validation.csv
This report contains validation results for all items with issues. Each row represents one item, and each column represents a validation field.
Columns:
- resource_id: The Omeka item ID
- edit_link: Direct link to edit the item in the Omeka admin interface
- Field columns: One column per validated field (e.g., dcterms:identifier, dcterms:title, dcterms:subject)
Cell content interpretation:
- Empty cell: Field is valid, no issues detected
- error: <message>: Validation error, field must be corrected
- warning: <message>: Validation warning, field should be reviewed (informational)
Example:
resource_id,edit_link,dcterms:identifier,dcterms:description,o:title
121200,https://omeka.unibe.ch/admin/items/121200,error: Field is required,error: Field is required,
121201,https://omeka.unibe.ch/admin/items/121201,,warning: Missing field,
121202,https://omeka.unibe.ch/admin/items/121202,,,
In this example:
- Item 121200 has errors in the dcterms:identifier and dcterms:description fields
- Item 121201 has a warning about a missing dcterms:description field
- Item 121202 is valid (all empty cells mean no issues)
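The cell conventions described above can be read programmatically. The following is a minimal sketch, not part of the validator itself: the function name `collect_issues` is hypothetical, and the sample rows are copied from the example above. It assumes every non-empty field cell has the form `severity: message`.

```python
import csv
import io

# Sample rows copied from the example above; a real run would open
# validation_reports/items_validation.csv instead of using a string.
SAMPLE = """resource_id,edit_link,dcterms:identifier,dcterms:description,o:title
121200,https://omeka.unibe.ch/admin/items/121200,error: Field is required,error: Field is required,
121201,https://omeka.unibe.ch/admin/items/121201,,warning: Missing field,
121202,https://omeka.unibe.ch/admin/items/121202,,,
"""

# Columns that describe the resource rather than a validated field.
FIXED_COLUMNS = {"resource_id", "edit_link"}


def collect_issues(csv_file):
    """Map each resource_id to a list of (field, severity, message) tuples."""
    issues = {}
    for row in csv.DictReader(csv_file):
        found = []
        for field, cell in row.items():
            if field is None or field in FIXED_COLUMNS or not cell:
                continue  # metadata column, or empty cell (field is valid)
            # Split only at the first ": " so messages may contain colons.
            severity, _, message = cell.partition(": ")
            found.append((field, severity, message))
        issues[row["resource_id"]] = found
    return issues


issues = collect_issues(io.StringIO(SAMPLE))
for resource_id, found in issues.items():
    print(resource_id, found or "valid")
```

An empty list for a resource corresponds to a row whose field cells are all empty.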
2. Media Validation Report
File: validation_reports/media_validation.csv
This report contains validation results for all media objects with issues. The structure is identical to the items report.
Columns:
- resource_id: The Omeka media ID
- edit_link: Direct link to edit the media in the Omeka admin interface
- Field columns: One column per validated field (e.g., dcterms:identifier, dcterms:creator, o:media_type)
Cell content interpretation: Same as items report (empty = valid, error:/warning: = issue)
Example:
resource_id,edit_link,dcterms:identifier,dcterms:creator,o:media_type
10778,https://omeka.unibe.ch/admin/media/10778,,,
10779,https://omeka.unibe.ch/admin/media/10779,error: Field is required,warning: Missing field,
3. Validation Summary Report
File: validation_reports/validation_summary.csv
This report provides aggregate statistics about the validation run.
Metrics:
- items_validated: Total number of items validated
- media_validated: Total number of media objects validated
- total_errors: Total count of validation errors
- total_warnings: Total count of validation warnings
- items_with_issues: Number of items that have at least one error or warning
- media_with_issues: Number of media objects that have at least one error or warning
- uris_checked: Total number of URIs checked (if URI checking was enabled)
- failed_uris: Number of URIs that failed validation (if URI checking was enabled)
Example:
metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,8
media_with_issues,15
uris_checked,450
failed_uris,3
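Because the summary is a two-column metric,value file, it loads naturally into a dictionary. The sketch below is illustrative only: `load_summary` is a hypothetical helper, and it assumes every value is an integer, as in the example above.

```python
import csv
import io

# Sample content copied from the example above; a real run would open
# validation_reports/validation_summary.csv instead.
SAMPLE = """metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,8
media_with_issues,15
uris_checked,450
failed_uris,3
"""


def load_summary(csv_file):
    """Return the summary as a dict of metric name -> integer value."""
    return {row["metric"]: int(row["value"]) for row in csv.DictReader(csv_file)}


summary = load_summary(io.StringIO(SAMPLE))
print(summary["total_errors"], summary["total_warnings"])
```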
Using the Edit Links
Each validation report includes an edit_link column that provides direct access to the Omeka admin interface for correcting issues:
- Items: https://omeka.unibe.ch/admin/items/<item_id>
- Media: https://omeka.unibe.ch/admin/media/<media_id>
Click these links to quickly navigate to the resource and make corrections.
Understanding Validation Messages
Common Error Messages
| Message | Meaning | Action Required |
|---|---|---|
| error: Field is required | A required field is missing | Add the field to the resource |
| error: Invalid Iconclass code: <code> | The Iconclass notation is invalid | Verify and correct the code against the Iconclass vocabulary |
| error: Invalid license URI | The license URI doesn’t match the controlled vocabulary | Use a valid license URI from the vocabulary |
| error: Invalid MIME type | The media type is not in the controlled vocabulary | Correct the MIME type |
| error: Duplicate identifier '<id>' found | Multiple resources have the same identifier | Make identifiers unique |
Common Warning Messages
| Message | Meaning | Action Recommended |
|---|---|---|
| warning: Missing field | An optional but recommended field is missing | Consider adding the field for completeness |
| warning: URI is not reachable: <url> | A referenced URL returns an error (404, 500, etc.) | Verify and correct the URL |
| warning: URL redirects to different domain | A URL redirects to an unexpected domain | Verify the redirect is intentional |
| warning: Literal field contains URL | A plain text field contains a URL (should use URI type) | Consider using a URI field instead |
Generating Reports
To generate these CSV reports, run the validator with the --export-csv flag:
# Generate CSV reports in default directory (validation_reports/)
uv run python validate.py --export-csv
# Specify custom output directory
uv run python validate.py --export-csv --csv-output my_reports/
# Generate reports with URI checking enabled
uv run python validate.py --export-csv --check-uris
See the README for more usage examples and configuration options.
Data Quality Best Practices
When reviewing validation reports:
- Prioritize errors over warnings: Errors indicate data that doesn’t conform to the schema and should be corrected first
- Review warnings systematically: Warnings indicate potential quality issues or missing recommended fields
- Use edit links efficiently: The direct links save time when making corrections
- Validate incrementally: After making corrections, run the validator again to verify fixes
- Track progress: Use the summary report to monitor overall data quality improvements
Technical Details
CSV Format Specifications
- Encoding: UTF-8
- Delimiter: Comma (,)
- Quote character: Double quote (") for fields containing commas or newlines
- Newline: LF (\n) or CRLF (\r\n)
- Header row: Always present (first row)
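These settings match the defaults of Python's standard csv module, so no custom dialect should be needed. The sketch below uses a hypothetical row (the message text and item ID 1 are invented for illustration) to show that a quoted cell containing a comma stays in one column, with CRLF line endings.

```python
import csv
import io

# Hypothetical row: the message contains a comma, so the cell is quoted.
raw = (
    "resource_id,edit_link,dcterms:title\r\n"
    '1,https://omeka.unibe.ch/admin/items/1,"warning: Example message, with a comma"\r\n'
)

# newline="" leaves line endings untouched so the csv module can handle
# both LF and CRLF itself. For a file on disk the equivalent call is:
#   open(path, newline="", encoding="utf-8")
rows = list(csv.reader(io.StringIO(raw, newline="")))

header, first = rows
print(header)
print(first[2])
```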
Empty Cells
Empty cells in validation reports have special meaning:
- In field columns: The field is valid (no issues)
- This is by design to make it easy to spot issues at a glance
Multiple Issues per Field
If a field has multiple validation issues, they may be combined in a single cell, separated by semicolons.
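A combined cell can be split back into individual issues. This is a sketch under stated assumptions: `split_issues` is a hypothetical helper, the sample cell is invented, and the code assumes that messages do not themselves contain semicolons (a URL with a semicolon in it would be split incorrectly).

```python
def split_issues(cell):
    """Split one report cell into a list of (severity, message) pairs."""
    pairs = []
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue  # empty cell or trailing separator
        # Split only at the first ": " so messages may contain colons.
        severity, _, message = part.partition(": ")
        pairs.append((severity, message))
    return pairs


# Hypothetical combined cell; the exact spacing around ";" is an assumption.
cell = "error: Field is required; warning: Literal field contains URL"
print(split_issues(cell))
```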
Support
For questions or issues with validation reports:
Validation Output
Below are the actual validation reports generated during the last site build by running:
uv run python validate.py --output validation_report.txt \
--check-uris --check-redirects --profile \
--profile-output analysis/ --export-csv --csv-output analysis/
The reports shown below are actual validation results from the Stadt.Geschichte.Basel Omeka S instance, automatically generated during each site build. These reports reflect the current state of the data.
Download live validation files:
- items_validation.csv - Items with validation issues
- media_validation.csv - Media with validation issues
- validation_summary.csv - Summary statistics
- validation_report.txt - Text report with errors and warnings
Sample/example files are also available in the repository at examples/sample_reports/ for reference.
Text Report Output
Example format (actual report available for download above):
================================================================================
VALIDATION REPORT
================================================================================
Items validated: 150
Media validated: 300
Total errors: 12
Total warnings: 45
URIs checked: 450
Failed URIs: 3
================================================================================
ERRORS:
[Item 10780] dcterms:identifier: Field is required
[Item 10782] dcterms:subject[0]: Invalid Iconclass code: XYZ123
[Item 10783] dcterms:language[0]: Invalid language code (must be valid ISO 639-1 two-letter code): xyz
[Media 10779] dcterms:identifier: Field is required
[Media 10779] o:media_type: Invalid MIME type: application/unknown
WARNINGS (informational):
[Item 10781] dcterms:description: Missing field
[Media 10778] dcterms:creator: Missing field
[Item 10777] dcterms:description: Literal field contains URL: https://example.com/...
[Item 10785] dcterms:source: URI is not reachable: https://broken-link.example.com (404 Not Found)
Report saved to: analysis/validation_report.txt
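The text report can also be processed line by line. The sketch below infers the line layout from the example above only (`[Item <id>] <field>: <message>` under an ERRORS or WARNINGS heading); the actual report may contain other line shapes, and `parse_report` is a hypothetical helper, not part of the validator.

```python
import re

# Excerpt copied from the example report above.
SAMPLE = """\
ERRORS:
[Item 10780] dcterms:identifier: Field is required
[Item 10782] dcterms:subject[0]: Invalid Iconclass code: XYZ123
WARNINGS (informational):
[Media 10778] dcterms:creator: Missing field
"""

# Resource kind, numeric ID, field name (ends at the first ": "), message.
LINE = re.compile(r"^\[(Item|Media) (\d+)\] (\S+?): (.*)$")


def parse_report(text):
    """Return (severity, kind, resource_id, field, message) tuples."""
    severity = None
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ERRORS"):
            severity = "error"
        elif line.startswith("WARNINGS"):
            severity = "warning"
        else:
            match = LINE.match(line)
            if match and severity:
                kind, resource_id, field, message = match.groups()
                entries.append((severity, kind, int(resource_id), field, message))
    return entries


entries = parse_report(SAMPLE)
for entry in entries:
    print(entry)
```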
Items Validation CSV
File: analysis/items_validation.csv
Example format (download the file from the links above for actual live data):
resource_id,edit_link,dcterms:identifier,dcterms:description,dcterms:temporal,dcterms:subject,dcterms:language
10780,https://omeka.unibe.ch/admin/items/10780,error: Field is required,,,,
10781,https://omeka.unibe.ch/admin/items/10781,,warning: Missing field,,,
10782,https://omeka.unibe.ch/admin/items/10782,,,,error: Invalid Iconclass code: XYZ123,
10783,https://omeka.unibe.ch/admin/items/10783,,,,,error: Invalid language code (must be valid ISO 639-1 two-letter code): xyz
Key observations (from example):
- Item 10780 is missing required dcterms:identifier (error)
- Item 10781 has missing dcterms:description (warning - optional field)
- Item 10782 has an invalid Iconclass code in dcterms:subject
- Item 10783 has an invalid ISO 639-1 language code
Media Validation CSV
File: analysis/media_validation.csv
Example format (download the file from the links above for actual live data):
resource_id,edit_link,dcterms:identifier,dcterms:creator,dcterms:license,o:media_type
10778,https://omeka.unibe.ch/admin/media/10778,,warning: Missing field,,
10779,https://omeka.unibe.ch/admin/media/10779,error: Field is required,,,error: Invalid MIME type: application/unknown
Key observations (from example):
- Media 10778 has missing dcterms:creator (warning - recommended field)
- Media 10779 has both a missing required identifier and an invalid MIME type
Validation Summary CSV
File: analysis/validation_summary.csv
Example format (download the file from the links above for actual live data):
metric,value
items_validated,150
media_validated,300
total_errors,12
total_warnings,45
items_with_issues,4
media_with_issues,2
uris_checked,450
failed_uris,3
Summary insights (from example):
- 150 items and 300 media objects validated
- 12 errors require immediate attention
- 45 warnings suggest improvements
- Only 4 items and 2 media have issues (high quality: 97.3% items, 99.3% media)
- 3 of 450 URIs checked are unreachable (99.3% URI health)
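The percentages above follow directly from the summary metrics. As a worked check, using the example values and a hypothetical helper `percent_ok`:

```python
# Example values copied from the summary CSV above.
summary = {
    "items_validated": 150,
    "media_validated": 300,
    "items_with_issues": 4,
    "media_with_issues": 2,
    "uris_checked": 450,
    "failed_uris": 3,
}


def percent_ok(total, with_problems):
    """Share of resources without problems, in percent (one decimal)."""
    if total == 0:
        return None  # nothing was checked, so no meaningful percentage
    return round(100 * (total - with_problems) / total, 1)


items_ok = percent_ok(summary["items_validated"], summary["items_with_issues"])
media_ok = percent_ok(summary["media_validated"], summary["media_with_issues"])
uris_ok = percent_ok(summary["uris_checked"], summary["failed_uris"])
print(items_ok, media_ok, uris_ok)  # 97.3 99.3 99.3
```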
Download the actual validation summary CSV from the download links above to see current statistics for the Stadt.Geschichte.Basel collection.
Interpreting the Results
Prioritization
- Fix errors first: Review items and media with validation errors in the downloaded CSV files
- Review warnings: Check resources with missing optional fields
- Check failed URIs: Verify and update unreachable URLs
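One way to apply this ordering to a downloaded CSV is to rank rows by their number of error cells, then warning cells. This is only a sketch: `triage` is a hypothetical helper, the sample rows are taken from the items example above, and it counts cells rather than individual messages, so a cell combining several issues counts once.

```python
import csv
import io

# Two rows taken from the items example above, deliberately listed with
# the warning-only item first.
SAMPLE = """resource_id,edit_link,dcterms:identifier,dcterms:description
10781,https://omeka.unibe.ch/admin/items/10781,,warning: Missing field
10780,https://omeka.unibe.ch/admin/items/10780,error: Field is required,
"""

FIXED_COLUMNS = ("resource_id", "edit_link")


def triage(csv_file):
    """Return (errors, warnings, edit_link) tuples, most urgent first."""
    ranked = []
    for row in csv.DictReader(csv_file):
        cells = [v for k, v in row.items() if k not in FIXED_COLUMNS and v]
        errors = sum(c.startswith("error:") for c in cells)
        warnings = sum(c.startswith("warning:") for c in cells)
        ranked.append((errors, warnings, row["edit_link"]))
    # Most errors first; ties are broken by the number of warnings.
    ranked.sort(key=lambda r: (-r[0], -r[1]))
    return ranked


ranked = triage(io.StringIO(SAMPLE))
for errors, warnings, link in ranked:
    print(errors, warnings, link)
```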
Using Edit Links
Click the edit_link values in the downloaded CSV files to navigate directly to the Omeka admin interface for quick corrections.
Example edit links:
- Items: https://omeka.unibe.ch/admin/items/<item_id>
- Media: https://omeka.unibe.ch/admin/media/<media_id>
Data Quality Metrics
The validation provides comprehensive data quality metrics. Download the summary CSV to see current statistics for your collection, including error counts, warning counts, and overall data quality percentages.