API Reference Documentation

Modified: August 29, 2025

This document provides comprehensive documentation for all Python modules and functions in the omeka2dsp system.

Module Overview

The omeka2dsp system consists of five main Python modules:

graph TD
    A[data_2_dasch.py<br/>Main Migration Script] --> B[process_data_from_omeka.py<br/>Omeka Data Extraction]
    A --> C[api_get_project.py<br/>DSP Project Info]
    A --> D[api_get_lists.py<br/>DSP Lists]
    A --> E[api_get_lists_detailed.py<br/>Detailed List Data]

    click A href "#data_2_dasch.py" "Jump to data_2_dasch.py"
    click B href "#process_data_from_omeka.py" "Jump to process_data_from_omeka.py"
    click C href "#api_get_project.py" "Jump to api_get_project.py"
    click D href "#api_get_lists.py" "Jump to api_get_lists.py"
    click E href "#api_get_lists_detailed.py" "Jump to api_get_lists_detailed.py"
    
    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#f3e5f5
    style E fill:#f3e5f5

data_2_dasch.py

The main migration script that orchestrates the entire data transfer process from Omeka to DSP.

Core Functions

main() -> None

Purpose: Entry point that orchestrates the entire migration process.

Workflow:

  1. Parse command-line arguments for processing mode
  2. Fetch and filter data based on mode (all_data, sample_data, test_data)
  3. Authenticate with DSP and retrieve project information
  4. Process each item: create new or synchronize existing resources
  5. Handle associated media files

Parameters: None (uses command-line arguments)

Returns: None

Example Usage:

uv run python scripts/data_2_dasch.py -m sample_data

parse_arguments() -> Namespace

Purpose: Parses command-line arguments for processing mode selection.

Parameters: None (reads from sys.argv)

Returns:

  • Namespace: Contains parsed arguments with mode attribute

Available Modes:

  • all_data: Process entire collection
  • sample_data: Process random sample (configurable size)
  • test_data: Process predefined test dataset

Example:

args = parse_arguments()
print(args.mode)  # 'sample_data'
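
A minimal sketch of how such a parser might be built with argparse. The -m flag and the three mode names come from this document; the long option --mode, the default value, and the argv parameter (added so the sketch can be exercised without touching sys.argv) are assumptions, not confirmed details of the script.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the processing mode; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Migrate Omeka items to DSP")
    parser.add_argument(
        "-m", "--mode",
        choices=["all_data", "sample_data", "test_data"],
        default="all_data",  # assumed default, not confirmed by the source
        help="which subset of the collection to process",
    )
    return parser.parse_args(argv)


args = parse_arguments(["-m", "sample_data"])
print(args.mode)
```

Restricting the flag with choices lets argparse reject an unknown mode before any API call is made.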

Authentication & Project Functions

login(email: str, password: str) -> str

Purpose: Authenticates with DSP API and retrieves JWT token.

Parameters:

  • email: DSP user email address
  • password: DSP user password

Returns:

  • str: JWT authentication token

Raises:

  • requests.RequestException: On authentication failure
  • KeyError: If response format is unexpected

Example:

token = login("user@example.com", "password")

get_project() -> str

Purpose: Retrieves project information from DSP API using project shortcode.

Parameters: None (uses PROJECT_SHORT_CODE environment variable)

Returns:

  • str: Project IRI/identifier

Side Effects: Logs project information

Example:

project_iri = get_project()
# Returns: "http://rdfh.ch/projects/IbwoJlv8SEa6L13vXyCzMg"

get_lists(project_iri: str) -> list

Purpose: Retrieves all list configurations for a DSP project.

Parameters:

  • project_iri: The project IRI to fetch lists for

Returns:

  • list: Array of complete list objects with nodes and values

Process:

  1. Fetches list summary from /admin/lists/
  2. For each list, fetches detailed information from /v2/lists/{id}
  3. Returns complete list data for mapping operations

Example:

lists = get_lists(project_iri)
for list_obj in lists:
    print(f"List: {list_obj['listinfo']['name']}")

Resource Management Functions

get_full_resource(token: str, resource_iri: str) -> dict

Purpose: Retrieves complete resource data from DSP API.

Parameters:

  • token: JWT authentication token
  • resource_iri: URL-encoded resource IRI

Returns:

  • dict: Complete resource JSON object

Usage: Used for synchronization to compare existing DSP data with Omeka data.

Example:

import urllib.parse  # needed for quote(); safe='' also encodes "/" and ":"
resource_data = get_full_resource(token, urllib.parse.quote(resource_iri, safe=''))

get_resource_by_id(token: str, object_class: str, identifier: str) -> dict

Purpose: Finds a resource by its identifier using SPARQL query.

Parameters:

  • token: JWT authentication token
  • object_class: DSP resource class (e.g., “sgb_OBJECT”)
  • identifier: Unique identifier to search for

Returns:

  • dict: Resource data if found, empty dict if not found

SPARQL Query: Constructs and executes a SPARQL query to find resources by identifier.

Example:

resource = get_resource_by_id(token, f"{PREFIX}sgb_OBJECT", "abb13025")
if resource:
    print(f"Found resource: {resource['@id']}")

create_resource(payload: dict, token: str) -> None

Purpose: Creates a new resource in DSP using the provided payload.

Parameters:

  • payload: Complete DSP resource payload (JSON-LD format)
  • token: JWT authentication token

Returns: None

Side Effects:

  • Creates resource in DSP
  • Logs creation success/failure

Example:

payload = construct_payload(omeka_item, f"{PREFIX}sgb_OBJECT", project_iri, lists, "", "")
create_resource(payload, token)

Data Extraction & Transformation Functions

extract_dasch_propvalue(item: dict, prop: str) -> str

Purpose: Extracts a single property value from a DSP resource.

Parameters:

  • item: DSP resource data
  • prop: Property name (without prefix)

Returns:

  • str: Property value or empty string if not found

Supported Value Types: TextValue, ListValue, LinkValue, UriValue

Example:

title = extract_dasch_propvalue(dsp_resource, "title")

extract_dasch_propvalue_multiple(item: dict, prop: str) -> list

Purpose: Extracts multiple values for a property from a DSP resource.

Parameters:

  • item: DSP resource data
  • prop: Property name (without prefix)

Returns:

  • list: Array of property values

Usage: For properties that can have multiple values (arrays).

Example:

subjects = extract_dasch_propvalue_multiple(dsp_resource, "subject")
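
JSON-LD responses commonly carry a single value as one object and several values as a list of objects, so a multi-value extractor usually normalizes both cases first. The sketch below illustrates that idea for text values only. The helper name extract_text_values is hypothetical, and the assumption that the property key is the ontology prefix followed by the property name is inferred from the "without prefix" note above.

```python
PREFIX = "StadtGeschichteBasel_v1:"  # default ontology prefix from this document


def extract_text_values(item, prop):
    """Return all text values of a property as a list (hypothetical helper)."""
    raw = item.get(f"{PREFIX}{prop}", [])
    # Normalize: a single value arrives as a dict, several values as a list.
    entries = raw if isinstance(raw, list) else [raw]
    return [
        entry["knora-api:valueAsString"]
        for entry in entries
        if "knora-api:valueAsString" in entry
    ]
```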

extract_value_from_entry(entry: dict) -> str

Purpose: Extracts the actual value from a DSP property entry based on its type.

Parameters:

  • entry: DSP property entry with @type and value fields

Returns:

  • str | None: Extracted value, or None when no value can be extracted

Supported Types:

  • knora-api:TextValue: Returns knora-api:valueAsString
  • knora-api:ListValue: Returns node IRI from knora-api:listValueAsListNode
  • knora-api:LinkValue: Returns target IRI from knora-api:linkValueHasTargetIri
  • knora-api:UriValue: Returns URI from knora-api:uriValueAsUri

Example:

value = extract_value_from_entry({
    "@type": "knora-api:TextValue",
    "knora-api:valueAsString": "Example text"
})
# Returns: "Example text"
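
The dispatch over the four supported types could look roughly like the sketch below. The field names are the ones listed above; the nested shapes (an "@id" object for list and link values, an "@value" object for URI values) are assumptions about the JSON-LD layout and should be checked against real DSP responses.

```python
def extract_value_from_entry(entry):
    """Return the plain value for the four supported knora-api value types."""
    value_type = entry.get("@type")
    if value_type == "knora-api:TextValue":
        return entry.get("knora-api:valueAsString")
    if value_type == "knora-api:ListValue":
        # Assumed shape: {"knora-api:listValueAsListNode": {"@id": "<node IRI>"}}
        return entry.get("knora-api:listValueAsListNode", {}).get("@id")
    if value_type == "knora-api:LinkValue":
        # Assumed shape: {"knora-api:linkValueHasTargetIri": {"@id": "<IRI>"}}
        return entry.get("knora-api:linkValueHasTargetIri", {}).get("@id")
    if value_type == "knora-api:UriValue":
        # Assumed shape: {"knora-api:uriValueAsUri": {"@value": "<URI>"}}
        return entry.get("knora-api:uriValueAsUri", {}).get("@value")
    return None  # unsupported or missing type
```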

construct_payload(item: dict, type: str, project_iri: str, lists: list, parent_iri: str, internalMediaFilename: str) -> dict

Purpose: Converts Omeka item data into DSP-compatible JSON-LD payload.

Parameters:

  • item: Omeka item data
  • type: DSP resource type (e.g., “sgb_OBJECT”, “sgb_MEDIA_IMAGE”)
  • project_iri: Project IRI for resource association
  • lists: DSP lists for value mapping
  • parent_iri: Parent resource IRI for linking
  • internalMediaFilename: Internal filename for media resources

Returns:

  • dict: Complete DSP resource payload in JSON-LD format

Key Transformations:

| Omeka Property     | DSP Property | Value Type      | Notes                |
|--------------------|--------------|-----------------|----------------------|
| dcterms:title      | rdfs:label   | String          | Required field       |
| dcterms:identifier | identifier   | TextValue       | Unique identifier    |
| dcterms:description| description  | TextValue       | Item description     |
| dcterms:creator    | creator      | TextValue       | Creator information  |
| dcterms:date       | date         | TextValue       | Date information     |
| dcterms:subject    | subject      | TextValue Array | Subject tags         |
| dcterms:type       | type         | ListValue       | Mapped to DSP lists  |
| dcterms:format     | format       | ListValue       | Media format mapping |
| dcterms:language   | language     | ListValue       | Language mapping     |
| dcterms:rights     | rights       | TextValue       | Rights information   |
| dcterms:license    | license      | UriValue        | License URL          |

Example:

payload = construct_payload(
    item=omeka_item,
    type=f"{PREFIX}sgb_OBJECT",
    project_iri=project_iri,
    lists=project_lists,
    parent_iri="",
    internalMediaFilename=""
)
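
To make the shape of the result more concrete, here is a reduced sketch that maps only the title, identifier, and description rows of the table. The function name construct_minimal_payload and both helpers are hypothetical; the sketch assumes Omeka values arrive as lists of objects with an "@value" key, and it omits the JSON-LD @context, list values, and media handling that a full payload would need.

```python
PREFIX = "StadtGeschichteBasel_v1:"  # default ontology prefix from this document


def first_value(item, key):
    """Return the first @value of an Omeka property, or '' if absent."""
    values = item.get(key, [])
    return values[0].get("@value", "") if values else ""


def text_value(text):
    """Wrap a string as a DSP text value object."""
    return {"@type": "knora-api:TextValue", "knora-api:valueAsString": text}


def construct_minimal_payload(item, resource_type, project_iri):
    """Build a partial resource payload (illustrative subset of the mapping)."""
    payload = {
        "@type": resource_type,
        "rdfs:label": first_value(item, "dcterms:title"),
        "knora-api:attachedToProject": {"@id": project_iri},
    }
    mapping = [
        ("dcterms:identifier", "identifier"),
        ("dcterms:description", "description"),
    ]
    for omeka_key, dsp_prop in mapping:
        text = first_value(item, omeka_key)
        if text:  # skip empty values instead of sending empty strings
            payload[f"{PREFIX}{dsp_prop}"] = text_value(text)
    return payload
```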

extract_listvalueiri_from_value(value: str, list_label: str, lists: list) -> str

Purpose: Maps an Omeka value to a DSP list node IRI.

Parameters:

  • value: Value to map (e.g., “image/jpeg”)
  • list_label: Name of the DSP list to search in
  • lists: Array of DSP list objects

Returns:

  • str: DSP list node IRI if found, empty string otherwise

Process:

  1. Finds the list with matching label
  2. Searches through list nodes for matching value
  3. Returns the node IRI for API operations

Example:

format_iri = extract_listvalueiri_from_value(
    "image/jpeg", 
    "Internet Media Type", 
    project_lists
)
# Returns: "http://rdfh.ch/lists/IbwoJlv8SEa6L13vXyCzMg/image-jpeg"
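
The lookup itself is a two-level search. The sketch below runs against a deliberately simplified, hypothetical list shape (a "label" plus a flat "nodes" array with "label" and "iri" keys); real DSP list objects are nested differently, so the key names here are assumptions chosen only to illustrate the three steps.

```python
def extract_listvalueiri_from_value(value, list_label, lists):
    """Return the IRI of the node labelled `value` in the list `list_label`."""
    for list_obj in lists:
        if list_obj.get("label") != list_label:
            continue  # step 1: find the list with the matching label
        for node in list_obj.get("nodes", []):
            if node.get("label") == value:  # step 2: find the matching node
                return node.get("iri", "")  # step 3: return its IRI
    return ""  # nothing matched


# Hypothetical, simplified list data for illustration only.
project_lists = [
    {
        "label": "Internet Media Type",
        "nodes": [
            {
                "label": "image/jpeg",
                "iri": "http://rdfh.ch/lists/IbwoJlv8SEa6L13vXyCzMg/image-jpeg",
            }
        ],
    }
]
```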

Synchronization Functions

check_values(dasch_item: dict, omeka_item: dict, lists: list) -> list

Purpose: Compares DSP and Omeka data to identify changes that need synchronization.

Parameters:

  • dasch_item: Current DSP resource data
  • omeka_item: Current Omeka item data
  • lists: DSP lists for value mapping

Returns:

  • list: Array of change operations (create, update, delete)

Change Detection:

  • Compares each property between systems
  • Identifies additions, deletions, and modifications
  • Handles both single values and arrays

Example:

changes = check_values(dsp_resource, omeka_item, project_lists)
for change in changes:
    print(f"Action: {change['type']}, Field: {change['field']}")

sync_value(prop: str, prop_type: str, dasch_value: str, omeka_value: str) -> list

Purpose: Generates sync operations for single-value properties.

Parameters:

  • prop: Property name
  • prop_type: DSP property type (TextValue, ListValue, etc.)
  • dasch_value: Current value in DSP
  • omeka_value: Current value in Omeka

Returns:

  • list: Array of change operations

Logic:

  • If DSP has a value but Omeka doesn’t, creates a delete operation
  • If Omeka has a value but DSP doesn’t, creates a create operation
  • If both systems have a value and the values differ, creates an update operation
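
The three rules can be sketched as a small decision function. The "type" and "field" keys match the check_values example earlier in this document; the "field_type" and "value" keys, and the choice to treat empty strings and None alike as "no value", are assumptions.

```python
def sync_value(prop, prop_type, dasch_value, omeka_value):
    """Return at most one change operation for a single-value property."""
    # Nothing to do when both sides agree or both are empty.
    if dasch_value == omeka_value or not (dasch_value or omeka_value):
        return []
    if dasch_value and not omeka_value:
        change_type, value = "delete", dasch_value
    elif omeka_value and not dasch_value:
        change_type, value = "create", omeka_value
    else:
        change_type, value = "update", omeka_value
    return [{
        "type": change_type,
        "field": prop,
        "field_type": prop_type,  # assumed key name
        "value": value,           # assumed key name
    }]
```

Checking the one-sided cases before the generic "values differ" case keeps a missing value from being reported as an update.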

sync_array_value(prop: str, prop_type: str, dasch_array: list, omeka_array: list) -> list

Purpose: Generates sync operations for multi-value properties.

Parameters:

  • prop: Property name
  • prop_type: DSP property type
  • dasch_array: Current values in DSP
  • omeka_array: Current values in Omeka

Returns:

  • list: Array of change operations

Algorithm:

  1. Converts arrays to sets for comparison
  2. Calculates additions (in Omeka but not DSP)
  3. Calculates deletions (in DSP but not Omeka)
  4. Generates corresponding create/delete operations
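
The four steps above can be sketched with plain set arithmetic. The operation dictionary keys other than "type" and "field" are assumptions, and the results are sorted only to make the output deterministic.

```python
def sync_array_value(prop, prop_type, dasch_array, omeka_array):
    """Return create/delete operations that make DSP match Omeka."""
    dasch_set, omeka_set = set(dasch_array), set(omeka_array)
    additions = sorted(omeka_set - dasch_set)  # in Omeka but not in DSP
    deletions = sorted(dasch_set - omeka_set)  # in DSP but not in Omeka

    def operation(change_type, value):
        return {
            "type": change_type,
            "field": prop,
            "field_type": prop_type,  # assumed key name
            "value": value,           # assumed key name
        }

    changes = [operation("create", value) for value in additions]
    changes += [operation("delete", value) for value in deletions]
    return changes
```

Because sets are used, a changed entry shows up as one delete plus one create rather than as an update.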

update_value(token: str, item: dict, value: str, field: str, field_type: str, type_of_change: str) -> None

Purpose: Executes a single value update operation via DSP API.

Parameters:

  • token: JWT authentication token
  • item: DSP resource data
  • value: New value to set
  • field: Property name
  • field_type: DSP value type
  • type_of_change: Operation type (“create”, “update”, “delete”)

Returns: None

Side Effects:

  • Modifies DSP resource via API
  • Logs operation results

File Upload Functions

upload_file_from_url(file_url: str, token: str, zip: bool = False) -> str

Purpose: Downloads a file from Omeka and uploads it to DSP storage.

Parameters:

  • file_url: URL of file in Omeka
  • token: JWT authentication token
  • zip: Whether to compress file before upload

Returns:

  • str: Internal filename assigned by DSP

Process:

  1. Downloads file from Omeka URL
  2. Saves to temporary file
  3. Optionally creates ZIP archive
  4. Uploads to DSP via multipart form
  5. Returns DSP internal filename

Example:

internal_filename = upload_file_from_url(
    "https://omeka.example.com/files/image.jpg",
    token,
    zip=False
)

specify_mediaclass(media_type: str) -> str

Purpose: Determines appropriate DSP media class based on MIME type.

Parameters:

  • media_type: MIME type string (e.g., “image/jpeg”)

Returns:

  • str: DSP media class name

Mapping:

  • image/* → sgb_MEDIA_IMAGE
  • application/pdf, text/*, application/zip → sgb_MEDIA_ARCHIV
  • All others → sgb_MEDIA_ARCHIV (default)

Example:

media_class = specify_mediaclass("image/jpeg")
# Returns: "StadtGeschichteBasel_v1:sgb_MEDIA_IMAGE"
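
Since every non-image type in the mapping above ends up in the same class, the rule reduces to a single prefix check. This is a sketch under that reading of the mapping; the PREFIX constant uses the documented default, and the real function may distinguish more cases.

```python
PREFIX = "StadtGeschichteBasel_v1:"  # default ontology prefix from this document


def specify_mediaclass(media_type):
    """Map a MIME type to a DSP media class name."""
    if media_type.startswith("image/"):
        return f"{PREFIX}sgb_MEDIA_IMAGE"
    # PDFs, text files, ZIP archives, and anything else share one class.
    return f"{PREFIX}sgb_MEDIA_ARCHIV"
```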

Utility Functions

arrays_equal(array1: list, array2: list) -> bool

Purpose: Compares two arrays for equality, ignoring order.

Parameters:

  • array1: First array to compare
  • array2: Second array to compare

Returns:

  • bool: True if arrays contain the same elements

Usage: Used in synchronization to detect array changes.
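
One way to compare without regard to order is a multiset comparison, sketched below with collections.Counter. This variant treats duplicates as significant and requires hashable elements; whether the actual function counts duplicates or compares plain sets is not stated here, so that behavior is an assumption.

```python
from collections import Counter


def arrays_equal(array1, array2):
    """True if both arrays hold the same elements, regardless of order."""
    # Counter compares element counts, so ["a", "a"] differs from ["a"].
    return Counter(array1) == Counter(array2)
```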

process_data_from_omeka.py

Handles data extraction and processing from the Omeka API.

Core Functions

get_items_from_collection(collection_id: str) -> list

Purpose: Retrieves all items from a specified Omeka collection with pagination handling.

Parameters:

  • collection_id: Omeka collection/item set ID

Returns:

  • list: Array of all items in the collection

Features:

  • Automatic pagination handling
  • Rate limiting compliance
  • Error recovery for temporary failures

Example:

items = get_items_from_collection("10780")
print(f"Found {len(items)} items")

get_media(item_id: str) -> list

Purpose: Retrieves all media files associated with a specific Omeka item.

Parameters:

  • item_id: Omeka item ID

Returns:

  • list: Array of media objects with metadata and file URLs

Example:

media_files = get_media("12345")
for media in media_files:
    print(f"Media: {media.get('o:filename')}")

get_paginated_items(url: str, params: dict) -> list

Purpose: Generic function to handle paginated API requests.

Parameters:

  • url: Base API endpoint URL
  • params: Query parameters for first request

Returns:

  • list: Combined results from all pages

Features:

  • Follows pagination links automatically
  • Handles rate limiting
  • Error recovery
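
The link-following loop can be sketched independently of any HTTP library. Here collect_paginated and the fetch_page callable are hypothetical: fetch_page(url, params) is assumed to return a pair of (items, next_url), with next_url set to None on the last page. With the requests library, the next URL would typically be read from the parsed Link header (response.links), assuming the API sends one.

```python
def collect_paginated(fetch_page, url, params=None):
    """Collect items from every page by following next-page URLs."""
    items = []
    while url:
        page_items, url = fetch_page(url, params)
        items.extend(page_items)
        # Follow-up URLs already carry their own query string.
        params = None
    return items
```

Passing the fetch function in as an argument keeps rate limiting and retries in one place and makes the loop easy to test with canned pages.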

Data Extraction Functions

extract_property(props: list, prop_id: int, as_uri: bool = False, only_label: bool = False) -> str

Purpose: Extracts a specific property value from Omeka property array.

Parameters:

  • props: Array of Omeka property objects
  • prop_id: Numerical ID of property to extract
  • as_uri: Return as formatted URI link (default: False)
  • only_label: Return only the label (default: False)

Returns:

  • str: Property value in requested format

Formats:

  • Default: Returns @value field
  • as_uri=True: Returns [label](uri) markdown format
  • only_label=True: Returns o:label field only

Example:

title = extract_property(item.get("dcterms:title", []), 1)
creator_link = extract_property(item.get("dcterms:creator", []), 2, as_uri=True)
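
A sketch of the three output formats follows. It assumes each Omeka property object carries a numeric "property_id" next to "@value", "@id", and "o:label", and that an empty string is returned when nothing matches; both points are assumptions rather than confirmed behavior.

```python
def extract_property(props, prop_id, as_uri=False, only_label=False):
    """Return the first value of the property with the given ID."""
    for prop in props:
        if prop.get("property_id") != prop_id:
            continue
        if as_uri:
            # Markdown link built from the label and the referenced URI.
            return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"
        if only_label:
            return prop.get("o:label", "")
        return prop.get("@value", "")
    return ""  # assumed fallback when the property is absent
```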

extract_combined_values(props: list) -> list

Purpose: Combines text values and URI references from properties into a single array.

Parameters:

  • props: Array of Omeka property objects

Returns:

  • list: Combined array of text values and formatted URI links

Process:

  1. Extracts all @value text fields
  2. Formats URI references as HTML links
  3. Escapes semicolons to prevent conflicts
  4. Returns combined array

Example:

subjects = extract_combined_values(item.get("dcterms:subject", []))
# Returns: ["History", "Basel", "<a href='...'>Authority Record</a>"]

Utility Functions

is_valid_url(url: str) -> bool

Purpose: Validates if a string is a properly formatted URL.

Parameters:

  • url: URL string to validate

Returns:

  • bool: True if URL is valid

Example:

valid = is_valid_url("https://example.com/file.jpg")
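
A common way to implement such a check uses urllib.parse from the standard library. The sketch below accepts only http and https URLs that include a host; the actual validation rules of the script may be looser or stricter.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """True for http(s) URLs that include a network location."""
    try:
        parts = urlparse(url)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 hosts
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```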

download_file(url: str, dest_path: str) -> None

Purpose: Downloads a file from URL to local path.

Parameters:

  • url: Source file URL
  • dest_path: Destination file path

Returns: None

Features:

  • Creates directories as needed
  • Streaming download for large files
  • Error handling and logging

api_get_project.py

Standalone script to fetch DSP project information.

get_project() -> None

Purpose: Fetches project data from DSP API and saves to file.

Environment Variables:

  • PROJECT_SHORT_CODE: DSP project shortcode
  • API_HOST: DSP API base URL

Output: Saves project data to ../data/project_data.json

Example Usage:

export PROJECT_SHORT_CODE="0123"
export API_HOST="https://api.dasch.swiss"
uv run python scripts/api_get_project.py

api_get_lists.py

Standalone script to fetch DSP list configurations.

get_lists() -> None

Purpose: Fetches list summary from DSP API and saves to file.

Configuration:

  • Hardcoded project IRI (should be updated for different projects)
  • Fixed API host (should be made configurable)

Output: Saves list data to ../data/data_lists.json

Example Usage:

uv run python scripts/api_get_lists.py

api_get_lists_detailed.py

Standalone script to fetch detailed DSP list information.

get_complete_list(list_id: str) -> dict

Purpose: Fetches complete list data for a single list ID.

Parameters:

  • list_id: DSP list IRI

Returns:

  • dict: Complete list object with all nodes and values

Process:

  1. URL-encodes the list IRI
  2. Requests detailed list data from /v2/lists/{id}
  3. Returns complete list structure
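
Because a list IRI is itself a URL, it has to be fully percent-encoded before it can be used as a path segment. The sketch below shows only the URL construction; build_list_url is a hypothetical helper name, and the HTTP request is left out.

```python
from urllib.parse import quote


def build_list_url(api_host, list_id):
    """Build the detail URL for a list IRI."""
    # safe='' makes quote() encode "/" and ":" as well.
    encoded = quote(list_id, safe="")
    return f"{api_host.rstrip('/')}/v2/lists/{encoded}"
```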

Main Script Logic:

  1. Loads list summary from data_lists.json
  2. Iterates through each list
  3. Fetches detailed information for each
  4. Saves all detailed lists to data_lists_detail.json

Configuration Constants

Environment Variables

| Variable           | Type   | Required | Description              | Default                        |
|--------------------|--------|----------|--------------------------|--------------------------------|
| ITEM_SET_ID        | string | No       | Omeka collection ID      | "10780"                        |
| PROJECT_SHORT_CODE | string | Yes      | DSP project shortcode    | None                           |
| API_HOST           | string | Yes      | DSP API base URL         | None                           |
| INGEST_HOST        | string | Yes      | DSP ingest service URL   | None                           |
| DSP_USER           | string | Yes      | DSP username             | None                           |
| DSP_PWD            | string | Yes      | DSP password             | None                           |
| PREFIX             | string | No       | Ontology prefix          | "StadtGeschichteBasel_v1:"     |
| OMEKA_API_URL      | string | No       | Omeka API base URL       | "https://omeka.unibe.ch/api/"  |
| KEY_IDENTITY       | string | Yes      | Omeka API key identity   | None                           |
| KEY_CREDENTIAL     | string | Yes      | Omeka API key credential | None                           |
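
A loader that enforces the required variables and applies the documented defaults might look like the sketch below. The variable names and defaults come from the table; the load_config function, its env parameter, and the choice of RuntimeError are hypothetical.

```python
import os

REQUIRED = [
    "PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST",
    "DSP_USER", "DSP_PWD", "KEY_IDENTITY", "KEY_CREDENTIAL",
]
DEFAULTS = {
    "ITEM_SET_ID": "10780",
    "PREFIX": "StadtGeschichteBasel_v1:",
    "OMEKA_API_URL": "https://omeka.unibe.ch/api/",
}


def load_config(env=None):
    """Read settings from the environment and fail fast on missing ones."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    return config
```

Reporting all missing variables at once saves a round trip for each unset value.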

Processing Constants

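| Constant              | Value              | Description                     |
|-----------------------|--------------------|---------------------------------|
| NUMBER_RANDOM_OBJECTS | 2                  | Number of items for sample mode |
| TEST_DATA             | Set of identifiers | Specific items for test mode    |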
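
How these constants could drive the three processing modes is sketched below. The select_items function and its get_identifier callback are hypothetical, and the single identifier in TEST_DATA is a placeholder borrowed from an earlier example in this document, not the project's real test set.

```python
import random

NUMBER_RANDOM_OBJECTS = 2
TEST_DATA = {"abb13025"}  # placeholder identifiers for illustration


def select_items(items, mode, get_identifier):
    """Filter the fetched items according to the processing mode."""
    if mode == "all_data":
        return list(items)
    if mode == "sample_data":
        # Never ask for more items than are available.
        count = min(NUMBER_RANDOM_OBJECTS, len(items))
        return random.sample(list(items), count)
    if mode == "test_data":
        return [item for item in items if get_identifier(item) in TEST_DATA]
    raise ValueError(f"Unknown mode: {mode}")
```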

Error Handling

Exception Types

The system handles several types of errors:

  1. Authentication Errors: Invalid credentials, expired tokens
  2. Network Errors: Connection timeouts, API unavailability
  3. Data Validation Errors: Invalid payloads, missing required fields
  4. Rate Limiting: API quota exceeded
  5. File System Errors: Permission issues, disk space

Logging Configuration

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),  # Console output
        logging.FileHandler("data_2_dasch.log", mode='w')  # File output; mode='w' overwrites the log on each run
    ]
)

Error Recovery Strategies

  1. Retry with Exponential Backoff: For temporary network issues
  2. Skip and Continue: For individual item processing errors
  3. Fail Fast: For critical configuration or authentication errors
  4. Graceful Degradation: Continue with reduced functionality when possible
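
The first strategy, retry with exponential backoff, can be sketched as a small wrapper. The function name, the attempt count, the base delay, and the set of retried exception types are all assumptions; with the requests library one would likely pass retry_on=(requests.RequestException,) instead of the built-in exceptions used here.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       sleep=time.sleep,
                       retry_on=(ConnectionError, TimeoutError)):
    """Call operation(), waiting 1x, 2x, 4x ... base_delay between failures."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep function is injectable so that the wrapper can be tested without real waiting.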

Common Error Scenarios

| Error                  | Cause                        | Recovery Strategy       |
|------------------------|------------------------------|-------------------------|
| Authentication failure | Invalid credentials          | Re-authenticate or exit |
| Resource not found     | Item doesn’t exist in DSP    | Create new resource     |
| Rate limit exceeded    | Too many API requests        | Wait and retry          |
| Invalid payload        | Data format error            | Log error, skip item    |
| Network timeout        | Connection issues            | Retry with backoff      |
| File upload failure    | File system or network issue | Retry or skip media     |