API Reference Documentation

Modified: October 15, 2025

This document provides comprehensive documentation for all Python modules and functions in the omeka2dsp system.

Module Overview

The omeka2dsp system consists of five main Python modules:

```mermaid
graph TD
    A[data_2_dasch.py<br/>Main Migration Script] --> B[process_data_from_omeka.py<br/>Omeka Data Extraction]
    A --> C[api_get_project.py<br/>DSP Project Info]
    A --> D[api_get_lists.py<br/>DSP Lists]
    A --> E[api_get_lists_detailed.py<br/>Detailed List Data]

    click A href "#data_2_dasch.py" "Jump to data_2_dasch.py"
    click B href "#process_data_from_omeka.py" "Jump to process_data_from_omeka.py"
    click C href "#api_get_project.py" "Jump to api_get_project.py"
    click D href "#api_get_lists.py" "Jump to api_get_lists.py"
    click E href "#api_get_lists_detailed.py" "Jump to api_get_lists_detailed.py"

    style A fill:#86bbd8
    style B fill:#ffe880
    style C fill:#3a1e3e,color:#fff
    style D fill:#3a1e3e,color:#fff
    style E fill:#3a1e3e,color:#fff
```

data_2_dasch.py

The main migration script that orchestrates the entire data transfer process from Omeka to DSP.

Core Functions

`main() -> None`

Purpose: Entry point that orchestrates the entire migration process.

Workflow:

Parse command-line arguments for processing mode
Fetch and filter data based on mode (all_data, sample_data, test_data)
Authenticate with DSP and retrieve project information
Process each item: create new or synchronize existing resources
Handle associated media files

Parameters: None (uses command-line arguments)

Returns: None

Example Usage:

```bash
uv run python scripts/data_2_dasch.py -m sample_data
```
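The mode-dependent filtering in step 2 of the workflow can be sketched as below. This is a minimal illustration, not the script's actual code: the helper names `select_items` and `item_identifier`, the assumption that an item's identifier sits in `dcterms:identifier[0]["@value"]`, and the use of `random.sample` are all assumptions. Only the mode names, `NUMBER_RANDOM_OBJECTS`, and the idea of a `TEST_DATA` identifier set come from this reference.

```python
import random
from typing import Iterable

NUMBER_RANDOM_OBJECTS = 2  # sample size documented under Processing Constants


def item_identifier(item: dict) -> str:
    """Hypothetical helper: first dcterms:identifier value of an Omeka item (assumed layout)."""
    values = item.get("dcterms:identifier", [])
    return values[0].get("@value", "") if values else ""


def select_items(items: list, mode: str, test_ids: Iterable[str] = (),
                 sample_size: int = NUMBER_RANDOM_OBJECTS) -> list:
    """Hypothetical helper: reduce the collection according to the processing mode."""
    if mode == "all_data":
        return list(items)
    if mode == "sample_data":
        # Cap the sample size, otherwise random.sample raises ValueError on small collections
        return random.sample(items, min(sample_size, len(items)))
    if mode == "test_data":
        wanted = set(test_ids)
        return [item for item in items if item_identifier(item) in wanted]
    raise ValueError(f"Unknown mode: {mode}")
```

In `test_data` mode the sketch preserves the collection order, which keeps test runs reproducible.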

`parse_arguments() -> Namespace`

Purpose: Parses command-line arguments for processing mode selection.

Parameters: None (reads from sys.argv)

Returns:

Namespace: Contains parsed arguments with mode attribute

Available Modes:

all_data: Process entire collection
sample_data: Process random sample (configurable size)
test_data: Process predefined test dataset

Example:

```python
args = parse_arguments()
print(args.mode)  # 'sample_data'
```
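A minimal `argparse` sketch of this parser follows. The short flag `-m` is taken from the usage example above; the long form `--mode`, the default of `all_data`, and the optional `argv` parameter (added so the parser can be exercised without touching `sys.argv`) are assumptions.

```python
import argparse
from argparse import Namespace
from typing import Optional, Sequence

MODES = ("all_data", "sample_data", "test_data")


def parse_arguments(argv: Optional[Sequence[str]] = None) -> Namespace:
    """Parse the processing mode; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Migrate Omeka items to DSP.")
    parser.add_argument(
        "-m", "--mode",
        choices=MODES,          # argparse rejects any other value with a usage error
        default="all_data",     # assumption: the real default may differ
        help="which subset of the collection to process",
    )
    return parser.parse_args(argv)
```

Using `choices` means an invalid mode fails at startup instead of partway through a migration.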

Authentication & Project Functions

`login(email: str, password: str) -> str`

Purpose: Authenticates with DSP API and retrieves JWT token.

Parameters:

email: DSP user email address
password: DSP user password

Returns:

str: JWT authentication token

Raises:

requests.RequestException: On authentication failure
KeyError: If response format is unexpected

Example:

```python
token = login("user@example.com", "password")
```
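DSP-API issues JWTs from `POST /v2/authentication` with a JSON body containing `email` and `password`, and returns the token in a `token` field. The sketch below shows that exchange using only the standard library. It is not the script's implementation: it swaps `requests` for `urllib`, so failures surface as `urllib.error.HTTPError`/`URLError` rather than `requests.RequestException`, and it takes `api_host` as an explicit parameter instead of reading `API_HOST`.

```python
import json
import urllib.request


def build_login_request(api_host: str, email: str, password: str) -> urllib.request.Request:
    """Build the POST request for DSP's /v2/authentication endpoint."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_host.rstrip('/')}/v2/authentication",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token(response_body: bytes) -> str:
    """Extract the JWT; raises KeyError if the response has no 'token' field."""
    return json.loads(response_body)["token"]


def login(api_host: str, email: str, password: str, timeout: float = 30.0) -> str:
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, e.g. bad credentials
    request = build_login_request(api_host, email, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_token(response.read())
```

The returned token is then sent as `Authorization: Bearer <token>` on subsequent calls.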

`get_project() -> str`

Purpose: Retrieves project information from DSP API using project shortcode.

Parameters: None (uses PROJECT_SHORT_CODE environment variable)

Returns:

str: Project IRI/identifier

Side Effects: Logs project information

Example:

```python
project_iri = get_project()
# Returns: "http://rdfh.ch/projects/IbwoJlv8SEa6L13vXyCzMg"
```

`get_lists(project_iri: str) -> list`

Purpose: Retrieves all list configurations for a DSP project.

Parameters:

project_iri: The project IRI to fetch lists for

Returns:

list: Array of complete list objects with nodes and values

Process:

Fetches list summary from /admin/lists/
For each list, fetches detailed information from /v2/lists/{id}
Returns complete list data for mapping operations

Example:

```python
lists = get_lists(project_iri)
for list_obj in lists:
    print(f"List: {list_obj['listinfo']['name']}")
```
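The two-step process above (summary first, then one detail request per list) can be sketched as follows. Assumptions: the summary is requested as `/admin/lists?projectIri=<IRI>` and answers with `{"lists": [{"id": ...}, ...]}`, and the HTTP call is injected as a `fetch_json` callable so the URL handling can be shown without network access. The extra `api_host` and `fetch_json` parameters are not part of the documented signature.

```python
import urllib.parse
from typing import Callable

FetchJson = Callable[[str], dict]  # takes a URL, returns the decoded JSON body


def list_summary_url(api_host: str, project_iri: str) -> str:
    """Summary route filtered by project (query parameter name is an assumption)."""
    query = urllib.parse.urlencode({"projectIri": project_iri})
    return f"{api_host.rstrip('/')}/admin/lists?{query}"


def list_detail_url(api_host: str, list_iri: str) -> str:
    # The list IRI is a single path segment, so ':' and '/' must be percent-encoded too
    return f"{api_host.rstrip('/')}/v2/lists/{urllib.parse.quote(list_iri, safe='')}"


def get_lists(api_host: str, project_iri: str, fetch_json: FetchJson) -> list:
    summary = fetch_json(list_summary_url(api_host, project_iri))
    return [fetch_json(list_detail_url(api_host, entry["id"]))
            for entry in summary.get("lists", [])]
```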

Resource Management Functions

`get_full_resource(token: str, resource_iri: str) -> dict`

Purpose: Retrieves complete resource data from DSP API.

Parameters:

token: JWT authentication token
resource_iri: URL-encoded resource IRI

Returns:

dict: Complete resource JSON object

Usage: Used for synchronization to compare existing DSP data with Omeka data.

Example:

```python
resource_data = get_full_resource(token, urllib.parse.quote(resource_iri, safe=''))
```

`get_resource_by_id(token: str, object_class: str, identifier: str) -> dict`

Purpose: Finds a resource by its identifier using SPARQL query.

Parameters:

token: JWT authentication token
object_class: DSP resource class (e.g., “SGB:Parent”)
identifier: Unique identifier to search for

Returns:

dict: Resource data if found, empty dict if not found

SPARQL Query: Constructs and executes a SPARQL query to find resources by identifier.

Example:

```python
resource = get_resource_by_id(token, f"{PREFIX}Parent", "abb13025")
if resource:
    print(f"Found resource: {resource['@id']}")
```
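DSP searches by property value through Gravsearch, its SPARQL-based query language sent to the `/v2/searchextended` route. The sketch below builds such a query for an identifier lookup. Assumptions: the identifier is stored in `hasIdentifier` (per the mapping table under `construct_payload`), the ontology IRI is supplied by the caller, and the helper names `escape_literal` and `build_identifier_query` are hypothetical. The real script's query may be shaped differently.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes so text is safe inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_identifier_query(ontology_iri: str, prefix: str, class_name: str, identifier: str) -> str:
    """Gravsearch CONSTRUCT query returning resources of one class with a given identifier."""
    return f"""PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX {prefix}: <{ontology_iri}>
CONSTRUCT {{
  ?resource knora-api:isMainResource true .
  ?resource {prefix}:hasIdentifier ?id .
}} WHERE {{
  ?resource a {prefix}:{class_name} .
  ?resource {prefix}:hasIdentifier ?id .
  ?id knora-api:valueAsString "{escape_literal(identifier)}" .
}}"""
```

Escaping the literal matters because identifiers come from Omeka and are not guaranteed to be free of quotes.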

`create_resource(payload: dict, token: str) -> None`

Purpose: Creates a new resource in DSP using the provided payload.

Parameters:

payload: Complete DSP resource payload (JSON-LD format)
token: JWT authentication token

Returns: None

Side Effects:

Creates resource in DSP
Logs creation success/failure

Example:

```python
payload = construct_payload(omeka_item, f"{PREFIX}Parent", project_iri, lists, "", "")
create_resource(payload, token)
```

Data Extraction & Transformation Functions

`extract_dasch_propvalue(item: dict, prop: str) -> str`

Purpose: Extracts a single property value from a DSP resource.

Parameters:

item: DSP resource data
prop: Property name (without prefix)

Returns:

str: Property value or empty string if not found

Supported Value Types: TextValue, ListValue, LinkValue, UriValue

Example:

```python
title = extract_dasch_propvalue(dsp_resource, "title")
```

`extract_dasch_propvalue_multiple(item: dict, prop: str) -> list`

Purpose: Extracts multiple values for a property from a DSP resource.

Parameters:

item: DSP resource data
prop: Property name (without prefix)

Returns:

list: Array of property values

Usage: For properties that can have multiple values (arrays).

Example:

```python
subjects = extract_dasch_propvalue_multiple(dsp_resource, "subject")
```

`extract_value_from_entry(entry: dict) -> str`

Purpose: Extracts the actual value from a DSP property entry based on its type.

Parameters:

entry: DSP property entry with @type and value fields

Returns:

str | None: Extracted value, or None if no value could be extracted from the entry

Supported Types:

knora-api:TextValue: Returns knora-api:valueAsString
knora-api:ListValue: Returns node IRI from knora-api:listValueAsListNode
knora-api:LinkValue: Returns target IRI from knora-api:linkValueHasTargetIri
knora-api:UriValue: Returns URI from knora-api:uriValueAsUri

Example:

```python
value = extract_value_from_entry({
    "@type": "knora-api:TextValue",
    "knora-api:valueAsString": "Example text"
})
# Returns: "Example text"
```
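The type dispatch described above can be written as a lookup table. In DSP's complex JSON-LD schema, text is a plain string, list nodes and link targets are wrapped as `{"@id": ...}`, and URIs are typed literals `{"@type": "xsd:anyURI", "@value": ...}`, so the sketch unwraps both forms. Returning `None` for unsupported types is an assumption about the real function's behaviour.

```python
from typing import Optional

# Where each supported DSP value type keeps its payload (v2 complex schema)
_VALUE_KEYS = {
    "knora-api:TextValue": "knora-api:valueAsString",
    "knora-api:ListValue": "knora-api:listValueAsListNode",
    "knora-api:LinkValue": "knora-api:linkValueHasTargetIri",
    "knora-api:UriValue": "knora-api:uriValueAsUri",
}


def extract_value_from_entry(entry: dict) -> Optional[str]:
    key = _VALUE_KEYS.get(entry.get("@type", ""))
    if key is None or key not in entry:
        return None  # unsupported value type or missing payload
    raw = entry[key]
    if isinstance(raw, dict):
        # IRIs arrive as {"@id": ...}; typed literals such as URIs as {"@value": ...}
        return raw.get("@id", raw.get("@value"))
    return raw
```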

`construct_payload(item: dict, type: str, project_iri: str, lists: list, parent_iri: str, internalMediaFilename: str) -> dict`

Purpose: Converts Omeka item data into DSP-compatible JSON-LD payload.

Parameters:

item: Omeka item data
type: DSP resource type (e.g., “SGB:Parent”, “SGB:Image”, “SGB:Document”)
project_iri: Project IRI for resource association
lists: DSP lists for value mapping
parent_iri: Parent resource IRI for linking
internalMediaFilename: Internal filename for media resources

Returns:

dict: Complete DSP resource payload in JSON-LD format

Key Transformations:

Omeka Property	DSP Property	Value Type	Notes
`dcterms:title`	`rdfs:label`	String	Resource label
`dcterms:identifier`	`hasIdentifier`	TextValue	Unique identifier
`dcterms:description`	`hasDescription`	TextValue	Item description
`dcterms:creator`	`hasCreator`	TextValue Array	Multi-valued creator entries
`dcterms:date`	`hasDate`	TextValue	EDTF date string
`dcterms:subject`	`hasSubjectList`	ListValue Array	Iconclass subject list
`dcterms:type`	`hasTypeList`	ListValue	DCMI Type vocabulary
`dcterms:format`	`hasFormatList`	ListValue	Internet media type list
`dcterms:language`	`hasLanguageList`	ListValue	ISO 639-1 codes
`dcterms:source`	`hasSource`	TextValue Array	Provenance/source notes
`dcterms:relation`	`hasRelation`	TextValue Array	Related resources
`dcterms:rights`	`hasRights`	TextValue	Rights statement
`dcterms:license`	`hasLicenseList`	ListValue	Controlled license list (CC/Rights)

Example:

```python
payload = construct_payload(
    item=omeka_item,
    type=f"{PREFIX}Parent",
    project_iri=project_iri,
    lists=project_lists,
    parent_iri="",
    internalMediaFilename=""
)
```
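To make the mapping table concrete, here is a reduced sketch that covers the label, two single-valued text properties, one multi-valued text property, and an optional list value. It follows the DSP v2 JSON-LD shape for resource creation (`@type`, `rdfs:label`, `knora-api:attachedToProject`, typed values, `@context`). The function name `construct_payload_sketch`, the `ontology_iri` and `type_node_iri` parameters, and the assumption that Omeka literals carry `@value` are illustrative only; list lookup, media and parent links are omitted.

```python
def text_value(text: str) -> dict:
    """DSP TextValue in the v2 complex JSON-LD schema."""
    return {"@type": "knora-api:TextValue", "knora-api:valueAsString": text}


def list_value(node_iri: str) -> dict:
    """DSP ListValue pointing at a list node IRI."""
    return {"@type": "knora-api:ListValue", "knora-api:listValueAsListNode": {"@id": node_iri}}


def omeka_values(item: dict, term: str) -> list:
    """All literal '@value' strings of an Omeka property such as 'dcterms:creator'."""
    return [v["@value"] for v in item.get(term, []) if "@value" in v]


SINGLE_TEXT = {"dcterms:identifier": "hasIdentifier", "dcterms:description": "hasDescription"}
MULTI_TEXT = {"dcterms:creator": "hasCreator"}


def construct_payload_sketch(item: dict, resource_type: str, project_iri: str,
                             ontology_iri: str, type_node_iri: str = "") -> dict:
    prefix = resource_type.split(":", 1)[0]  # "SGB:Parent" -> "SGB"
    titles = omeka_values(item, "dcterms:title")
    payload = {
        "@type": resource_type,
        "rdfs:label": titles[0] if titles else "",
        "knora-api:attachedToProject": {"@id": project_iri},
        "@context": {
            "knora-api": "http://api.knora.org/ontology/knora-api/v2#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            prefix: ontology_iri,
        },
    }
    for term, prop in SINGLE_TEXT.items():
        values = omeka_values(item, term)
        if values:  # omit empty properties instead of sending empty strings
            payload[f"{prefix}:{prop}"] = text_value(values[0])
    for term, prop in MULTI_TEXT.items():
        values = omeka_values(item, term)
        if values:
            payload[f"{prefix}:{prop}"] = [text_value(v) for v in values]
    if type_node_iri:  # resolved beforehand, e.g. via a list-node lookup
        payload[f"{prefix}:hasTypeList"] = list_value(type_node_iri)
    return payload
```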

`extract_listvalueiri_from_value(value: str, list_label: str, lists: list) -> str`

Purpose: Maps an Omeka value to a DSP list node IRI.

Parameters:

value: Value to map (e.g., “image/jpeg”)
list_label: Name of the DSP list to search in
lists: Array of DSP list objects

Returns:

str: DSP list node IRI if found, empty string otherwise

Process:

Finds the list with matching label
Searches through list nodes for matching value
Returns the node IRI for API operations

Example:

```python
format_iri = extract_listvalueiri_from_value(
    "image/jpeg",
    "Internet Media Type",
    project_lists
)
# Returns: "http://rdfh.ch/lists/IbwoJlv8SEa6L13vXyCzMg/image-jpeg"
```
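The three-step lookup can be sketched as a depth-first search. The list structure is an assumption modelled on the `listinfo` key used in the `get_lists` example: each list object has `listinfo` (with `name` and `labels`) and `children`, and every node has `id`, `name`, `labels` and optional nested `children`. Matching on either the node name or any label is also an assumption.

```python
from typing import Iterator


def iter_nodes(nodes: list) -> Iterator[dict]:
    """Depth-first walk over list nodes and their sub-nodes."""
    for node in nodes:
        yield node
        yield from iter_nodes(node.get("children", []))


def node_matches(node: dict, value: str) -> bool:
    labels = [label.get("value") for label in node.get("labels", [])]
    return node.get("name") == value or value in labels


def extract_listvalueiri_from_value(value: str, list_label: str, lists: list) -> str:
    for list_obj in lists:
        info = list_obj.get("listinfo", {})
        list_names = {info.get("name")} | {lbl.get("value") for lbl in info.get("labels", [])}
        if list_label not in list_names:
            continue  # step 1: skip lists whose name/label does not match
        for node in iter_nodes(list_obj.get("children", [])):
            if node_matches(node, value):  # step 2: search nodes at any depth
                return node["id"]          # step 3: the node IRI used in API payloads
    return ""
```

Searching recursively matters for hierarchical vocabularies such as Iconclass, where the wanted node is rarely at the top level.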

Synchronization Functions

`check_values(dasch_item: dict, omeka_item: dict, lists: list) -> list`

Purpose: Compares DSP and Omeka data to identify changes that need synchronization.

Parameters:

dasch_item: Current DSP resource data
omeka_item: Current Omeka item data
lists: DSP lists for value mapping

Returns:

list: Array of change operations (create, update, delete)

Change Detection:

Compares each property between systems
Identifies additions, deletions, and modifications
Handles both single values and arrays

Example:

```python
changes = check_values(dsp_resource, omeka_item, project_lists)
for change in changes:
    print(f"Action: {change['type']}, Field: {change['field']}")
```

`sync_value(prop: str, prop_type: str, dasch_value: str, omeka_value: str) -> list`

Purpose: Generates sync operations for single-value properties.

Parameters:

prop: Property name
prop_type: DSP property type (TextValue, ListValue, etc.)
dasch_value: Current value in DSP
omeka_value: Current value in Omeka

Returns:

list: Array of change operations

Logic:

If DSP has a value but Omeka does not, creates a delete operation
If Omeka has a value but DSP does not, creates a create operation
If both have values and they differ, creates an update operation
If the values are equal, no operation is generated
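A sketch of this decision logic follows. The operation dictionaries use the `type` and `field` keys seen in the `check_values` example; the extra `field_type` and `value` keys, and normalizing `None` to an empty string, are assumptions. Note that the missing-value cases must be tested before the generic "values differ" case, since they also differ.

```python
from typing import Optional


def sync_value(prop: str, prop_type: str, dasch_value: Optional[str],
               omeka_value: Optional[str]) -> list:
    """Return at most one change operation for a single-valued property."""
    dasch_value, omeka_value = dasch_value or "", omeka_value or ""  # treat None as empty
    if dasch_value == omeka_value:
        return []
    if not omeka_value:
        change, value = "delete", dasch_value  # only DSP has a value
    elif not dasch_value:
        change, value = "create", omeka_value  # only Omeka has a value
    else:
        change, value = "update", omeka_value  # both present, different
    return [{"type": change, "field": prop, "field_type": prop_type, "value": value}]
```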

`sync_array_value(prop: str, prop_type: str, dasch_array: list, omeka_array: list) -> list`

Purpose: Generates sync operations for multi-value properties.

Parameters:

prop: Property name
prop_type: DSP property type
dasch_array: Current values in DSP
omeka_array: Current values in Omeka

Returns:

list: Array of change operations

Algorithm:

Converts arrays to sets for comparison
Calculates additions (in Omeka but not DSP)
Calculates deletions (in DSP but not Omeka)
Generates corresponding create/delete operations
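The set-difference algorithm above maps directly onto Python sets. This sketch reuses the operation shape assumed for `sync_value` (`type`, `field`, `field_type`, `value`); sorting the differences is an addition that makes the output order deterministic. Because sets are used, duplicate values and ordering are ignored, and a changed entry shows up as one create plus one delete rather than an update.

```python
def sync_array_value(prop: str, prop_type: str, dasch_array: list, omeka_array: list) -> list:
    dasch_set, omeka_set = set(dasch_array or []), set(omeka_array or [])
    changes = [{"type": "create", "field": prop, "field_type": prop_type, "value": v}
               for v in sorted(omeka_set - dasch_set)]   # in Omeka but not in DSP
    changes += [{"type": "delete", "field": prop, "field_type": prop_type, "value": v}
                for v in sorted(dasch_set - omeka_set)]  # in DSP but not in Omeka
    return changes
```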

`update_value(token: str, item: dict, value: str, field: str, field_type: str, type_of_change: str) -> None`

Purpose: Executes a single value update operation via DSP API.

Parameters:

token: JWT authentication token
item: DSP resource data
value: New value to set
field: Property name
field_type: DSP value type
type_of_change: Operation type (“create”, “update”, “delete”)

Returns: None

Side Effects:

Modifies DSP resource via API
Logs operation results
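DSP-API v2 creates values with `POST /v2/values`, updates them with `PUT /v2/values` (the body carries the IRI of the value being replaced), and deletes them with `POST /v2/values/delete`. The sketch below builds the method, path and JSON-LD body for each case without sending anything. It is a hypothetical helper, not `update_value` itself: it takes the resource IRI, resource class and existing value IRI as explicit arguments instead of reading them from `item`, and only handles text and list values.

```python
from typing import Optional


def value_content(field_type: str, value: str) -> dict:
    if field_type == "knora-api:TextValue":
        return {"knora-api:valueAsString": value}
    if field_type == "knora-api:ListValue":
        return {"knora-api:listValueAsListNode": {"@id": value}}
    raise ValueError(f"unsupported value type in this sketch: {field_type}")


def build_value_request(resource_iri: str, resource_class: str, field: str, field_type: str,
                        type_of_change: str, value: str = "", value_iri: str = "",
                        context: Optional[dict] = None) -> tuple:
    """Return (HTTP method, path, JSON-LD body) for a DSP v2 value operation."""
    prop: dict = {"@type": field_type}
    if type_of_change in ("update", "delete"):
        prop["@id"] = value_iri  # the existing value to replace or remove
    if type_of_change in ("create", "update"):
        prop.update(value_content(field_type, value))
    body = {"@id": resource_iri, "@type": resource_class, field: prop}
    if context:
        body["@context"] = context
    routes = {"create": ("POST", "/v2/values"),
              "update": ("PUT", "/v2/values"),
              "delete": ("POST", "/v2/values/delete")}
    method, path = routes[type_of_change]  # KeyError for an unknown change type
    return method, path, body
```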

File Upload Functions

`upload_file_from_url(file_url: str, token: str, zip: bool = False) -> str`

Purpose: Downloads a file from Omeka and uploads it to DSP storage.

Parameters:

file_url: URL of file in Omeka
token: JWT authentication token
zip: Whether to compress file before upload

Returns:

str: Internal filename assigned by DSP

Process:

Downloads file from Omeka URL
Saves to temporary file
Optionally creates ZIP archive
Uploads to DSP via multipart form
Returns DSP internal filename

Example:

```python
internal_filename = upload_file_from_url(
    "https://omeka.example.com/files/image.jpg",
    token,
    zip=False
)
```
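The optional ZIP step can be sketched with the standard `zipfile` module. This covers only the compression step of the process; the download and the multipart upload to the ingest service are left out because their exact endpoints are not documented here. The helper name `zip_for_upload` and the `<file>.zip` naming scheme are assumptions.

```python
import os
import tempfile
import zipfile


def zip_for_upload(src_path: str) -> str:
    """Wrap one downloaded file in a ZIP archive next to it; returns the archive path."""
    zip_path = src_path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its base name so the archive has no directory prefix
        archive.write(src_path, arcname=os.path.basename(src_path))
    return zip_path


if __name__ == "__main__":
    # Demo: zip a throwaway file inside a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.jpg")
        with open(path, "wb") as fh:
            fh.write(b"fake image bytes")
        with zipfile.ZipFile(zip_for_upload(path)) as zf:
            print(zf.namelist())  # ['image.jpg']
```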

`specify_mediaclass(media_type: str) -> str`

Purpose: Determines appropriate DSP media class based on MIME type.

Parameters:

media_type: MIME type string (e.g., “image/jpeg”)

Returns:

str: DSP media class name

Mapping:

image/* → SGB:Image
application/pdf, text/*, archives → SGB:Document

Example:

```python
media_class = specify_mediaclass("image/jpeg")
# Returns: "SGB:Image"
```
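A sketch of the mapping follows. The set of MIME types that count as "archives", stripping parameters such as `; charset=utf-8`, the `prefix` argument, and raising `ValueError` for unmapped types are all assumptions; the real function may fall back to a default class instead.

```python
ARCHIVE_TYPES = {"application/zip", "application/x-tar", "application/gzip",
                 "application/x-7z-compressed"}  # assumption: which archives are accepted


def specify_mediaclass(media_type: str, prefix: str = "SGB") -> str:
    media_type = media_type.split(";")[0].strip().lower()  # drop parameters such as charset
    if media_type.startswith("image/"):
        return f"{prefix}:Image"
    if media_type == "application/pdf" or media_type.startswith("text/") or media_type in ARCHIVE_TYPES:
        return f"{prefix}:Document"
    raise ValueError(f"no media class mapped for {media_type!r}")
```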

Utility Functions

`arrays_equal(array1: list, array2: list) -> bool`

Purpose: Compares two arrays for equality, ignoring order.

Parameters:

array1: First array to compare
array2: Second array to compare

Returns:

bool: True if arrays contain the same elements

Usage: Used in synchronization to detect array changes.
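One way to compare "ignoring order" is multiset equality with `collections.Counter`, sketched below. Whether the real function counts duplicates is not documented; a `set`-based comparison would treat `["a", "a", "b"]` and `["a", "b"]` as equal, which this version does not. Elements must be hashable in either approach.

```python
from collections import Counter


def arrays_equal(array1: list, array2: list) -> bool:
    """Order-insensitive comparison that still counts duplicates (multiset equality)."""
    return Counter(array1) == Counter(array2)
```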

process_data_from_omeka.py

Handles data extraction and processing from the Omeka API.

Core Functions

`get_items_from_collection(collection_id: str) -> list`

Purpose: Retrieves all items from a specified Omeka collection with pagination handling.

Parameters:

collection_id: Omeka collection/item set ID

Returns:

list: Array of all items in the collection

Features:

Automatic pagination handling
Rate limiting compliance
Error recovery for temporary failures

Example:

```python
items = get_items_from_collection("10780")
print(f"Found {len(items)} items")
```

`get_media(item_id: str) -> list`

Purpose: Retrieves all media files associated with a specific Omeka item.

Parameters:

item_id: Omeka item ID

Returns:

list: Array of media objects with metadata and file URLs

Example:

```python
media_files = get_media("12345")
for media in media_files:
    print(f"Media: {media.get('o:filename')}")
```

`get_paginated_items(url: str, params: dict) -> list`

Purpose: Generic function to handle paginated API requests.

Parameters:

url: Base API endpoint URL
params: Query parameters for first request

Returns:

list: Combined results from all pages

Features:

Follows pagination links automatically
Handles rate limiting
Error recovery
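The Omeka S REST API pages results through the `page` and `per_page` query parameters (it also advertises the next page in a `Link` header). The sketch below uses the simpler page-counter approach rather than following `Link` headers, and stops at the first short or empty page. The injected `fetch_page` callable and the default `per_page=50` are assumptions made so the loop can be shown without network access.

```python
import urllib.parse
from typing import Callable

FetchPage = Callable[[str], list]  # takes a URL, returns the decoded JSON array for that page


def get_paginated_items(url: str, params: dict, fetch_page: FetchPage, per_page: int = 50) -> list:
    """Request page 1, 2, ... and concatenate the results."""
    results: list = []
    page = 1
    while True:
        query = urllib.parse.urlencode({**params, "page": page, "per_page": per_page})
        batch = fetch_page(f"{url}?{query}")
        results.extend(batch)
        if len(batch) < per_page:  # a short or empty page is the last one
            return results
        page += 1
```

Retries and rate limiting would wrap `fetch_page`, which keeps the paging loop itself simple.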

Data Extraction Functions

`extract_property(props: list, prop_id: int, as_uri: bool = False, only_label: bool = False) -> str`

Purpose: Extracts a specific property value from Omeka property array.

Parameters:

props: Array of Omeka property objects
prop_id: Numerical ID of property to extract
as_uri: Return as formatted URI link (default: False)
only_label: Return only the label (default: False)

Returns:

str: Property value in requested format

Formats:

Default: Returns @value field
as_uri=True: Returns [label](uri) markdown format
only_label=True: Returns o:label field only

Example:

```python
title = extract_property(item.get("dcterms:title", []), 1)
creator_link = extract_property(item.get("dcterms:creator", []), 2, as_uri=True)
```
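A sketch of the three output formats, assuming Omeka S value objects that carry `property_id`, plus `@value` for literals or `@id`/`o:label` for URI values. Returning the first matching value, giving `as_uri` precedence when both flags are set, and returning an empty string when nothing matches are assumptions.

```python
def extract_property(props: list, prop_id: int, as_uri: bool = False, only_label: bool = False) -> str:
    """Return the first value whose property_id matches, in the requested format."""
    for prop in props:
        if prop.get("property_id") != prop_id:
            continue
        if as_uri:
            return f"[{prop.get('o:label', '')}]({prop.get('@id', '')})"  # markdown link
        if only_label:
            return prop.get("o:label", "")
        return prop.get("@value", "")
    return ""
```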

`extract_combined_values(props: list) -> list`

Purpose: Combines text values and URI references from properties into a single array.

Parameters:

props: Array of Omeka property objects

Returns:

list: Combined array of text values and formatted URI links

Process:

Extracts all @value text fields
Formats URI references as HTML links
Escapes semicolons to prevent conflicts
Returns combined array

Example:

```python
subjects = extract_combined_values(item.get("dcterms:subject", []))
# Returns: ["History", "Basel", "<a href='...'>Authority Record</a>"]
```
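The steps above can be sketched as follows. How the real function escapes semicolons is not documented; backslash-escaping (`;` becomes `\;`) is an assumption, as is using the `@id` as link text when `o:label` is missing. Text values are listed before links, matching the order in the example output.

```python
def escape_semicolons(text: str) -> str:
    """Assumption: backslash-escape semicolons so they are not read as separators later."""
    return text.replace(";", "\\;")


def extract_combined_values(props: list) -> list:
    texts = [escape_semicolons(p["@value"]) for p in props if "@value" in p]
    links = [
        f"<a href='{p['@id']}'>{escape_semicolons(p.get('o:label') or p['@id'])}</a>"
        for p in props if "@value" not in p and "@id" in p
    ]
    return texts + links
```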

Utility Functions

`is_valid_url(url: str) -> bool`

Purpose: Validates if a string is a properly formatted URL.

Parameters:

url: URL string to validate

Returns:

bool: True if URL is valid

Example:

```python
valid = is_valid_url("https://example.com/file.jpg")
```
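"Properly formatted" is not defined precisely above; the sketch assumes it means an absolute `http` or `https` URL with a host, checked with `urllib.parse.urlparse`. `urlparse` raises `ValueError` for some malformed input (for example an unterminated IPv6 host), which the sketch treats as invalid.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```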

`download_file(url: str, dest_path: str) -> None`

Purpose: Downloads a file from URL to local path.

Parameters:

url: Source file URL
dest_path: Destination file path

Returns: None

Features:

Creates directories as needed
Streaming download for large files
Error handling and logging
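The first two features can be sketched with the standard library: `os.makedirs` creates missing directories, and `shutil.copyfileobj` streams the response in fixed-size chunks so large files never sit fully in memory. Using `urllib` instead of `requests`, and the `chunk_size`/`timeout` defaults, are assumptions; logging is left out.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def download_file(url: str, dest_path: str, chunk_size: int = 1 << 16, timeout: float = 60.0) -> None:
    """Stream url to dest_path chunk by chunk instead of reading it into memory."""
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create missing directories
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)


if __name__ == "__main__":
    # Offline demo: urlopen also understands file:// URLs
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "source.bin")
        src.write_bytes(b"x" * 100_000)
        dest = os.path.join(tmp, "nested", "copy.bin")
        download_file(src.as_uri(), dest)
        print(os.path.getsize(dest))  # 100000
```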

api_get_project.py

Standalone script to fetch DSP project information.

`get_project() -> None`

Purpose: Fetches project data from DSP API and saves to file.

Environment Variables:

PROJECT_SHORT_CODE: DSP project shortcode
API_HOST: DSP API base URL

Output: Saves project data to ../data/project_data.json

Example Usage:

```bash
export PROJECT_SHORT_CODE="0123"
export API_HOST="https://api.dasch.swiss"
uv run python scripts/api_get_project.py
```

api_get_lists.py

Standalone script to fetch DSP list configurations.

`get_lists() -> None`

Purpose: Fetches list summary from DSP API and saves to file.

Configuration:

Hardcoded project IRI (should be updated for different projects)
Fixed API host (should be made configurable)

Output: Saves list data to ../data/data_lists.json

Example Usage:

```bash
uv run python scripts/api_get_lists.py
```

api_get_lists_detailed.py

Standalone script to fetch detailed DSP list information.

`get_complete_list(list_id: str) -> dict`

Purpose: Fetches complete list data for a single list ID.

Parameters:

list_id: DSP list IRI

Returns:

dict: Complete list object with all nodes and values

Process:

URL-encodes the list IRI
Requests detailed list data from /v2/lists/{id}
Returns complete list structure

Main Script Logic:

Loads list summary from data_lists.json
Iterates through each list
Fetches detailed information for each
Saves all detailed lists to data_lists_detail.json
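The script logic above can be sketched as follows. Assumptions: the summary file has the shape `{"lists": [{"id": ...}, ...]}`, the detail file is written next to it in `../data/`, the API host is passed in rather than hardcoded, and the HTTP call is injectable so the per-list loop can run offline. The list IRI is URL-encoded with `safe=""` so it forms a single path segment.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

FetchJson = Callable[[str], dict]


def fetch_json_http(url: str) -> dict:
    """Plain unauthenticated GET returning the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def get_complete_list(api_host: str, list_id: str, fetch_json: FetchJson = fetch_json_http) -> dict:
    encoded = urllib.parse.quote(list_id, safe="")  # the IRI becomes one path segment
    return fetch_json(f"{api_host.rstrip('/')}/v2/lists/{encoded}")


def collect_list_details(summary: dict, api_host: str, fetch_json: FetchJson = fetch_json_http) -> list:
    return [get_complete_list(api_host, entry["id"], fetch_json) for entry in summary.get("lists", [])]


def main(api_host: str, summary_path: str = "../data/data_lists.json",
         detail_path: str = "../data/data_lists_detail.json") -> None:
    with open(summary_path, encoding="utf-8") as fh:
        summary = json.load(fh)
    details = collect_list_details(summary, api_host)
    with open(detail_path, "w", encoding="utf-8") as fh:
        json.dump(details, fh, ensure_ascii=False, indent=2)
```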

Configuration Constants

Environment Variables

Variable	Type	Required	Description	Default
`ITEM_SET_ID`	string	No	Omeka collection ID	“10780”
`PROJECT_SHORT_CODE`	string	Yes	DSP project shortcode	None
`API_HOST`	string	Yes	DSP API base URL	None
`INGEST_HOST`	string	Yes	DSP ingest service URL	None
`DSP_USER`	string	Yes	DSP username	None
`DSP_PWD`	string	Yes	DSP password	None
`ONTOLOGY_NAME`	string	No	Ontology name	“SGB”
`OMEKA_API_URL`	string	No	Omeka API base URL	“https://omeka.unibe.ch/api/”
`KEY_IDENTITY`	string	Yes	Omeka API key identity	None
`KEY_CREDENTIAL`	string	Yes	Omeka API key credential	None
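A loader for this table might look like the sketch below: required variables are checked up front (failing fast, as recommended under Error Recovery Strategies), and optional ones fall back to the listed defaults. The function name `load_config`, returning a plain dict, and treating an empty string as unset are assumptions.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("PROJECT_SHORT_CODE", "API_HOST", "INGEST_HOST", "DSP_USER", "DSP_PWD",
            "KEY_IDENTITY", "KEY_CREDENTIAL")
DEFAULTS = {"ITEM_SET_ID": "10780", "ONTOLOGY_NAME": "SGB",
            "OMEKA_API_URL": "https://omeka.unibe.ch/api/"}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast: starting a migration without credentials only wastes API calls
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    config = {name: env[name] for name in REQUIRED}
    config.update({name: env.get(name) or default for name, default in DEFAULTS.items()})
    return config
```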

Processing Constants

Constant	Value	Description
`NUMBER_RANDOM_OBJECTS`	2	Number of items for sample mode
`TEST_DATA`	Set of identifiers	Specific items for test mode

Error Handling

Exception Types

The system handles several types of errors:

Authentication Errors: Invalid credentials, expired tokens
Network Errors: Connection timeouts, API unavailability
Data Validation Errors: Invalid payloads, missing required fields
Rate Limiting: API quota exceeded
File System Errors: Permission issues, disk space

Logging Configuration

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),  # Console output
        logging.FileHandler("data_2_dasch.log", mode="w"),  # File output, overwritten on each run
    ],
)
```

Error Recovery Strategies

Retry with Exponential Backoff: For temporary network issues
Skip and Continue: For individual item processing errors
Fail Fast: For critical configuration or authentication errors
Graceful Degradation: Continue with reduced functionality when possible
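The first strategy, retry with exponential backoff, can be sketched as a small wrapper. The delay doubles on each attempt up to a cap, with random jitter so parallel clients do not retry in lockstep. The function name, the default retry count and delays, and the default `retry_on` exceptions are assumptions; with `requests` or `urllib` you would pass their exception classes (e.g. `urllib.error.URLError`) explicitly. The injectable `sleep` exists only to make the behaviour testable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T], retries: int = 5, base_delay: float = 1.0,
                       max_delay: float = 60.0,
                       retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying transient failures with exponentially growing, jittered delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: the caller decides to skip the item or fail fast
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
    raise AssertionError("unreachable")
```

Non-transient errors such as invalid credentials are deliberately not in `retry_on`, matching the fail-fast strategy for authentication problems.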

Common Error Scenarios

Error	Cause	Recovery Strategy
Authentication failure	Invalid credentials	Re-authenticate or exit
Resource not found	Item doesn’t exist in DSP	Create new resource
Rate limit exceeded	Too many API requests	Wait and retry
Invalid payload	Data format error	Log error, skip item
Network timeout	Connection issues	Retry with backoff
File upload failure	File system or network issue	Retry or skip media