Troubleshooting Guide

Modified: August 29, 2025

Common issues and solutions for the omeka2dsp system.

Quick Diagnostics

System Health Check

Run this comprehensive health check to identify common issues:

#!/bin/bash
# health_check.sh
echo "=== omeka2dsp Health Check ==="

# Check Python version
echo "Python version:"
python --version

# Check required modules
echo -e "\nChecking Python modules:"
python -c "import requests; print('✓ requests module available')" || echo "✗ requests module missing"
python -c "import json; print('✓ json module available')" || echo "✗ json module missing"

# Check environment file
echo -e "\nChecking configuration:"
if [ -f ".env" ]; then
    echo "✓ .env file exists"
    
    # Check required variables
    source .env
    [ -n "$OMEKA_API_URL" ] && echo "✓ OMEKA_API_URL set" || echo "✗ OMEKA_API_URL missing"
    [ -n "$DSP_USER" ] && echo "✓ DSP_USER set" || echo "✗ DSP_USER missing"
    [ -n "$PROJECT_SHORT_CODE" ] && echo "✓ PROJECT_SHORT_CODE set" || echo "✗ PROJECT_SHORT_CODE missing"
else
    echo "✗ .env file missing"
fi

# Check network connectivity
echo -e "\nChecking network connectivity:"
curl -s --connect-timeout 5 https://api.dasch.swiss/health > /dev/null && echo "✓ DSP API reachable" || echo "✗ DSP API not reachable"

# Check directory structure
echo -e "\nChecking directory structure:"
[ -d "data" ] && echo "✓ data directory exists" || echo "✗ data directory missing"
[ -d "scripts" ] && echo "✓ scripts directory exists" || echo "✗ scripts directory missing"

# Check permissions
echo -e "\nChecking permissions:"
[ -w "." ] && echo "✓ Current directory writable" || echo "✗ Current directory not writable"
[ -w "data" ] && echo "✓ Data directory writable" || echo "✗ Data directory not writable"

echo -e "\n=== Health Check Complete ==="

Run the health check:

chmod +x health_check.sh
./health_check.sh

Log Analysis Script

Quickly analyze log files for common issues:

#!/bin/bash
# analyze_logs.sh
echo "=== Log Analysis ==="

if [ -f "data_2_dasch.log" ]; then
    echo "Errors found:"
    grep -c "ERROR" data_2_dasch.log
    
    echo -e "\nMost recent errors:"
    grep "ERROR" data_2_dasch.log | tail -5
    
    echo -e "\nSuccess statistics:"
    echo "Items processed: $(grep -c 'Processing item:' data_2_dasch.log)"
    echo "Resources created: $(grep -c 'Resource created successfully' data_2_dasch.log)"
    echo "Resources updated: $(grep -c 'Resource updated successfully' data_2_dasch.log)"
    
    echo -e "\nCommon issues:"
    grep -q "Authentication failed" data_2_dasch.log && echo "- Authentication failures detected"
    grep -q "Network timeout" data_2_dasch.log && echo "- Network timeout issues detected"
    grep -q "File not found" data_2_dasch.log && echo "- Missing file issues detected"
else
    echo "No log file found (data_2_dasch.log)"
fi
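When `grep` is unavailable, the same counts can be gathered in Python (a sketch; the pattern strings match the log messages referenced in the script above):

```python
from collections import Counter

def summarize_log(lines):
    """Count known patterns in an iterable of log lines."""
    patterns = {
        "errors": "ERROR",
        "items_processed": "Processing item:",
        "resources_created": "Resource created successfully",
        "resources_updated": "Resource updated successfully",
    }
    counts = Counter()
    for line in lines:
        for key, needle in patterns.items():
            if needle in line:
                counts[key] += 1
    return dict(counts)

# Usage: with open("data_2_dasch.log") as fh: print(summarize_log(fh))
```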

Authentication Issues

Issue: “Authentication failed” or “Login unsuccessful”

Symptoms:

ERROR - Authentication failed
ERROR - Failed to retrieve project. Status code: 401

Diagnostic Steps:

  1. Verify Credentials

    # Test credentials manually (export the .env variables first,
    # e.g. set -a; source .env; set +a)
    python -c "
    import os, requests
    response = requests.post('$API_HOST/v2/authentication',
        json={'email': '$DSP_USER', 'password': '$DSP_PWD'})
    print(f'Status: {response.status_code}')
    print(f'Response: {response.text}')
    "
  2. Check Account Status

    • Verify account is not locked
    • Confirm account has necessary permissions
    • Check if password has expired

Solutions:

Problem                     Solution
-------------------------   ----------------------------------------------
Invalid credentials         Update .env with correct username/password
Account locked              Contact DSP administrator to unlock
Password expired            Reset password through DSP interface
Insufficient permissions    Request project access from administrator
Wrong API endpoint          Verify API_HOST in configuration

Issue: “Token expired” during migration

Symptoms:

ERROR - Request failed with status 401
WARNING - Token may have expired

Solutions:

  1. Increase Token Lifetime (if possible)

    # Add token refresh logic to data_2_dasch.py
    def refresh_token_if_needed(token):
        # Check token expiration
        # Re-authenticate if necessary
        pass
  2. Implement Auto-Retry

    # Add retry with re-authentication
    def api_call_with_retry(url, token, max_retries=3):
        for attempt in range(max_retries):
            try:
                response = requests.get(url, headers={'Authorization': f'Bearer {token}'})
                if response.status_code == 401 and attempt < max_retries - 1:
                    token = login(DSP_USER, DSP_PWD)  # Re-authenticate
                    continue
                return response
            except Exception as e:
                if attempt == max_retries - 1:
                    raise
        return None
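A simple alternative to inspecting the token itself is to cache it with a fixed lifetime and re-authenticate once it goes stale. The sketch below assumes a `login_fn` callable (for example, the project's `login()` wrapped with its credentials); the 30-minute default is an assumption, not the documented DSP token lifetime:

```python
import time

def make_token_provider(login_fn, lifetime_seconds=1800):
    """Return a callable that caches a token and re-authenticates
    once the cached token is older than lifetime_seconds."""
    state = {"token": None, "issued_at": 0.0}

    def get_token():
        stale = time.time() - state["issued_at"] > lifetime_seconds
        if state["token"] is None or stale:
            state["token"] = login_fn()
            state["issued_at"] = time.time()
        return state["token"]

    return get_token

# Usage sketch: get_token = make_token_provider(lambda: login(DSP_USER, DSP_PWD))
```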

Network and Connectivity

Issue: Connection timeouts or network errors

Symptoms:

ERROR - Connection timeout
ERROR - Network unreachable
requests.exceptions.ConnectionError

Diagnostic Steps:

  1. Test Basic Connectivity

    # Test DSP API
    curl -v https://api.dasch.swiss/health
    
    # Test Omeka API
    curl -v "$OMEKA_API_URL"
    
    # Check DNS resolution
    nslookup api.dasch.swiss
  2. Test API Endpoints

    # Test specific endpoints
    curl -X POST https://api.dasch.swiss/v2/authentication \
      -H "Content-Type: application/json" \
      -d '{"email":"test","password":"test"}'

Solutions:

Problem                     Solution
-------------------------   ----------------------------------------------
Firewall blocking           Configure firewall to allow HTTPS (443)
Proxy issues                Set HTTP_PROXY and HTTPS_PROXY variables
DNS issues                  Use IP addresses or different DNS servers
SSL certificate problems    Update certificates or bypass verification (testing only)
Network instability         Increase timeout values, implement retry logic

Issue: SSL Certificate Verification Failed

Symptoms:

requests.exceptions.SSLError: certificate verify failed

Solutions:

  1. Update Certificates

    # Update system certificates
    sudo apt-get update && sudo apt-get install ca-certificates
    
    # Or on macOS
    brew install ca-certificates
  2. Temporary Bypass (development only)

    # Suppress the warnings triggered by verify=False
    import urllib3
    urllib3.disable_warnings()
    
    # In API calls, add verify=False (development only, never in production)
    import requests
    response = requests.get(url, verify=False)

Data Processing Errors

Issue: “KeyError” or missing property errors

Symptoms:

KeyError: 'dcterms:title'
ERROR - Required field missing

Diagnostic Steps:

  1. Inspect Problematic Item

    # Debug specific item
    from scripts.process_data_from_omeka import get_items_from_collection
    import json
    
    items = get_items_from_collection('10780') # SGB Item Set ID
    problem_item = items[0]  # Adjust index as needed
    print(json.dumps(problem_item, indent=2))
  2. Check Property Mappings

    # Verify property structure
    item = get_problem_item()
    print("Available properties:")
    for key in item.keys():
        if key.startswith('dcterms:'):
            print(f"  {key}: {item[key]}")

Solutions:

  1. Add Null Checks

    # Defensive programming
    def safe_extract_property(item, property_name, property_id):
        props = item.get(property_name, [])
        if not props:
            return ""
        return extract_property(props, property_id)
  2. Update Property Mappings

    # Handle different property structures
    def flexible_property_extract(item, property_names, property_id):
        for prop_name in property_names:
            if prop_name in item and item[prop_name]:
                return extract_property(item[prop_name], property_id)
        return ""
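For context, Omeka S serializes each `dcterms:` property as a list of value objects. A minimal version of the `extract_property` helper used above (assuming the standard `@value` and `property_id` keys of the Omeka S JSON-LD representation) looks like this:

```python
def extract_property(props, property_id):
    """Return the first @value whose property_id matches, else ''."""
    for value_obj in props:
        if value_obj.get("property_id") == property_id and "@value" in value_obj:
            return value_obj["@value"]
    return ""
```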

Issue: Invalid payload or validation errors

Symptoms:

ERROR - Invalid payload structure
ERROR - Required property missing in DSP

Solutions:

  1. Payload Validation

    def validate_payload(payload):
        required_fields = ['@context', '@type', 'rdfs:label']
        for field in required_fields:
            if field not in payload:
                raise ValueError(f"Missing required field: {field}")
    
        # Validate field types
        if not isinstance(payload.get('rdfs:label'), str):
            raise ValueError("rdfs:label must be a string")
    
        return True
  2. Schema Compliance

    # Ensure payload matches DSP expectations
    def ensure_dsp_compliance(payload):
        # Remove empty fields
        cleaned_payload = {k: v for k, v in payload.items() if v}
    
        # Ensure required structure
        if '@context' not in cleaned_payload:
            cleaned_payload['@context'] = get_default_context()
    
        return cleaned_payload
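The two steps compose naturally: clean first, then validate. A self-contained sketch combining them (the `default_context` argument stands in for the project's real JSON-LD context):

```python
def clean_and_validate(payload, default_context):
    """Drop empty fields, ensure @context, then check required keys."""
    cleaned = {k: v for k, v in payload.items() if v}
    cleaned.setdefault("@context", default_context)

    for field in ("@context", "@type", "rdfs:label"):
        if field not in cleaned:
            raise ValueError(f"Missing required field: {field}")

    if not isinstance(cleaned["rdfs:label"], str):
        raise ValueError("rdfs:label must be a string")

    return cleaned
```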

File Upload Problems

Issue: Media files fail to upload

Symptoms:

ERROR - File upload failed
ERROR - File not found: filename.jpg
requests.exceptions.RequestException

Diagnostic Steps:

  1. Test File Accessibility

    # Test file URL directly
    curl -I "https://omeka.unibe.ch/files/original/98d8559515187ec4a710347c7b9e6cda0bdd58d2.tif"
    
    # Check file size
    curl -sI "URL" | grep -i content-length
  2. Test Upload Process

    # Test file download and upload separately
    import requests
    
    # Download test
    response = requests.get(file_url)
    print(f"Download status: {response.status_code}")
    print(f"Content length: {len(response.content)}")
    
    # Upload test
    files = {'file': ('test.jpg', response.content, 'image/jpeg')}
    upload_response = requests.post(upload_url, files=files, headers=headers)
    print(f"Upload status: {upload_response.status_code}")

Solutions:

Problem                     Solution
-------------------------   ----------------------------------------------
File not accessible         Check Omeka permissions and URL
File too large              Implement file compression or chunked upload
Timeout during upload       Increase timeout, implement retry logic
Unsupported format          Add format conversion or skip unsupported files
Insufficient storage        Clean up temporary files, check disk space
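For large files such as TIFFs, streaming the download keeps memory use flat instead of loading the whole file at once (a sketch using `requests`' standard `stream=True` option; the chunk size is an arbitrary choice):

```python
import requests

def download_to_file(url, dest_path, chunk_size=1024 * 1024, timeout=60):
    """Stream a remote file to disk chunk by chunk."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest_path
```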

Issue: File format not supported

Solutions:

  1. Add Format Support

    def specify_mediaclass(media_type: str) -> str:
        """Enhanced media type detection"""
        type_mapping = {
            'image/jpeg': f'{PREFIX}sgb_MEDIA_IMAGE',
            'image/png': f'{PREFIX}sgb_MEDIA_IMAGE',
            'image/gif': f'{PREFIX}sgb_MEDIA_IMAGE',
            'image/tiff': f'{PREFIX}sgb_MEDIA_IMAGE',
            'application/pdf': f'{PREFIX}sgb_MEDIA_ARCHIV',
            'text/plain': f'{PREFIX}sgb_MEDIA_ARCHIV',
            'audio/mpeg': f'{PREFIX}sgb_MEDIA_AUDIO',  # New
            'video/mp4': f'{PREFIX}sgb_MEDIA_VIDEO'    # New
        }
    
        return type_mapping.get(media_type, f'{PREFIX}sgb_MEDIA_ARCHIV')
  2. Format Conversion

    import os
    from PIL import Image
    
    def convert_image_format(file_path, target_format='JPEG'):
        """Convert image to supported format"""
        with Image.open(file_path) as img:
            if img.format != target_format:
                # Build the new path from the file extension, not the
                # format name (".tif" does not contain "tiff")
                base, _ = os.path.splitext(file_path)
                converted_path = f"{base}.{target_format.lower()}"
                img.convert('RGB').save(converted_path, target_format)
                return converted_path
        return file_path

Performance Issues

Issue: Migration running very slowly

Symptoms:

  • Processing takes much longer than expected
  • High memory usage
  • Network timeouts
  • System becomes unresponsive

Diagnostic Steps:

  1. Performance Monitoring

    # Monitor system resources
    htop
    
    # Monitor network usage
    iftop
    
    # Check disk I/O
    iostat -x 1
    
    # Monitor Python process
    py-spy top --pid $(pgrep -f data_2_dasch.py)
  2. Profile the Code

    import cProfile
    import pstats
    
    # Profile the migration
    cProfile.run('main()', 'migration_profile.prof')
    
    # Analyze results
    stats = pstats.Stats('migration_profile.prof')
    stats.sort_stats('cumulative').print_stats(10)

Solutions:

  1. Batch Processing

    import gc
    import time
    
    def process_items_in_batches(items, batch_size=50):
        """Process items in smaller batches"""
        for i in range(0, len(items), batch_size):
            batch = items[i:i+batch_size]
            for item in batch:
                process_item(item)
    
            # Small delay between batches
            time.sleep(1)
    
            # Free memory before the next batch
            gc.collect()
  2. Optimize API Calls

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry
    
    def create_optimized_session():
        session = requests.Session()
    
        # Connection pooling
        adapter = HTTPAdapter(
            pool_connections=10,
            pool_maxsize=20,
            max_retries=Retry(
                total=3,
                backoff_factor=0.3,
                status_forcelist=[500, 502, 504]
            )
        )
    
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        return session
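The batching loop above can also be factored into a small reusable generator, which keeps the slicing logic separate from the per-item work:

```python
def batched(items, batch_size=50):
    """Yield successive batch_size-sized slices of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Usage sketch:
# for batch in batched(all_items, batch_size=50):
#     for item in batch:
#         process_item(item)
```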

Configuration Problems

Issue: Environment variables not loading

Symptoms:

KeyError: 'PROJECT_SHORT_CODE'
ERROR - Configuration variable not found

Solutions:

  1. Verify .env File

    # Check file exists and has content
    ls -la .env
    cat .env
    
    # Check for hidden characters
    hexdump -C .env | head
  2. Load Environment Explicitly

    # Add explicit environment loading
    from dotenv import load_dotenv
    import os
    
    # Load .env file
    load_dotenv()
    
    # Verify variables
    required_vars = ['OMEKA_API_URL', 'DSP_USER', 'PROJECT_SHORT_CODE']
    for var in required_vars:
        if not os.getenv(var):
            raise EnvironmentError(f"Required environment variable {var} not set")

Issue: Incorrect API endpoints

Symptoms:

ERROR - 404 Not Found
ERROR - Invalid API endpoint

Solutions:

  1. Endpoint Validation

    def validate_endpoints():
        """Validate API endpoints are reachable"""
        endpoints = {
            'DSP API': f"{API_HOST}/health",
            'Omeka API': f"{OMEKA_API_URL}items?per_page=1"
        }
    
        for name, url in endpoints.items():
            try:
                response = requests.get(url, timeout=10)
                print(f"✓ {name}: {response.status_code}")
            except Exception as e:
                print(f"✗ {name}: {e}")
  2. Dynamic Endpoint Discovery

    def discover_api_version():
        """Discover correct API version"""
        base_url = API_HOST.rstrip('/')
        versions = ['/v2', '/v1', '']
    
        for version in versions:
            try:
                url = f"{base_url}{version}/health"
                response = requests.get(url, timeout=5)
                if response.status_code == 200:
                    return f"{base_url}{version}"
            except requests.RequestException:
                continue
    
        raise Exception("No valid API endpoint found")

DSP-Specific Issues

Issue: “Project not found” or invalid project shortcode

Solutions:

  1. Verify Project Information

    # Test project endpoint
    curl -H "Authorization: Bearer $TOKEN" \
      "$API_HOST/admin/projects/shortcode/$PROJECT_SHORT_CODE"
  2. List Available Projects

    def list_available_projects(token):
        """List projects user has access to"""
        response = requests.get(
            f"{API_HOST}/admin/projects",
            headers={'Authorization': f'Bearer {token}'}
        )
    
        if response.status_code == 200:
            projects = response.json().get('projects', [])
            for project in projects:
                print(f"Shortcode: {project['shortcode']}, Name: {project['shortname']}")
        else:
            print(f"Failed to fetch projects: {response.status_code}")

Issue: List value mapping failures

Symptoms:

WARNING - List value not found for: unknown_value
ERROR - Invalid list node IRI

Solutions:

  1. Debug List Mappings

    def debug_list_mappings(lists, search_value):
        """Debug list value mappings"""
        print(f"Searching for: {search_value}")
    
        for list_obj in lists:
            list_name = list_obj['listinfo']['name']
            print(f"\nList: {list_name}")
    
            nodes = list_obj['list']['children']
            for node in nodes:
                labels = node['labels']
                for label in labels:
                    if search_value.lower() in label['value'].lower():
                        print(f"  Found match: {label['value']} -> {node['id']}")
  2. Fuzzy Matching

    from difflib import get_close_matches
    
    def find_closest_list_value(value, list_values, cutoff=0.6):
        """Find closest matching list value"""
        matches = get_close_matches(value, list_values, n=1, cutoff=cutoff)
        if matches:
            print(f"Fuzzy match: '{value}' -> '{matches[0]}'")
            return matches[0]
        return None
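A quick usage sketch with illustrative list labels (real labels come from the project's DSP lists):

```python
from difflib import get_close_matches

# Illustrative labels; a near-miss Omeka value still resolves correctly.
list_values = ["Fotografie", "Plakat", "Zeitungsartikel"]
matches = get_close_matches("Fotografien", list_values, n=1, cutoff=0.6)
print(matches)  # ['Fotografie']
```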

Getting Help

Log Information to Include

When reporting issues, include:

  1. System Information

    python --version
    uname -a
    pip list | grep requests
  2. Configuration (sanitized)

    # Remove sensitive data first
    sed 's/PASSWORD=.*/PASSWORD=***/' .env
  3. Error Logs

    # Last 50 lines of log
    tail -50 data_2_dasch.log
    
    # All error messages
    grep "ERROR" data_2_dasch.log
  4. Network Diagnostics

    curl -v https://api.dasch.swiss/health
    nslookup api.dasch.swiss

Creating Effective Bug Reports

Use this template:

## Bug Description
Brief description of the issue

## Steps to Reproduce
1. Step 1
2. Step 2
3. Error occurs

## Expected Behavior
What should happen

## Actual Behavior
What actually happens

## Environment
- Python version: 3.9.0
- OS: Ubuntu 20.04
- Network: Corporate/Home/University

## Configuration
(Sanitized .env contents)

## Logs
```
Error log contents here
```

## Additional Context
Any other relevant information

Support Channels

  1. GitHub Issues: For bugs and feature requests
  2. Documentation: Check all documentation first
  3. Community: DSP community forums
  4. Email: info@stadtgeschichtebasel.ch (sensitive security issues only)

This troubleshooting guide should help resolve the most common issues encountered when using the omeka2dsp system.
