Troubleshooting Guide
Common issues and solutions for the omeka2dsp system.
Quick Diagnostics
System Health Check
Run this comprehensive health check to identify common issues:
#!/bin/bash
# health_check.sh
echo "=== omeka2dsp Health Check ==="
# Check Python version
echo "Python version:"
python --version
# Check required modules
echo -e "\nChecking Python modules:"
python -c "import requests; print('✓ requests module available')" || echo "✗ requests module missing"
python -c "import json; print('✓ json module available')" || echo "✗ json module missing"
# Check environment file
echo -e "\nChecking configuration:"
if [ -f ".env" ]; then
    echo "✓ .env file exists"
    # Check required variables
    source .env
    [ -n "$OMEKA_API_URL" ] && echo "✓ OMEKA_API_URL set" || echo "✗ OMEKA_API_URL missing"
    [ -n "$DSP_USER" ] && echo "✓ DSP_USER set" || echo "✗ DSP_USER missing"
    [ -n "$PROJECT_SHORT_CODE" ] && echo "✓ PROJECT_SHORT_CODE set" || echo "✗ PROJECT_SHORT_CODE missing"
else
    echo "✗ .env file missing"
fi
# Check network connectivity
echo -e "\nChecking network connectivity:"
curl -s --connect-timeout 5 https://api.dasch.swiss/health > /dev/null && echo "✓ DSP API reachable" || echo "✗ DSP API not reachable"
# Check directory structure
echo -e "\nChecking directory structure:"
[ -d "data" ] && echo "✓ data directory exists" || echo "✗ data directory missing"
[ -d "scripts" ] && echo "✓ scripts directory exists" || echo "✗ scripts directory missing"
# Check permissions
echo -e "\nChecking permissions:"
[ -w "." ] && echo "✓ Current directory writable" || echo "✗ Current directory not writable"
[ -w "data" ] && echo "✓ Data directory writable" || echo "✗ Data directory not writable"
echo -e "\n=== Health Check Complete ==="
Run the health check:
chmod +x health_check.sh
./health_check.sh
Log Analysis Script
Quickly analyze log files for common issues:
#!/bin/bash
# analyze_logs.sh
echo "=== Log Analysis ==="
if [ -f "data_2_dasch.log" ]; then
    echo "Errors found:"
    grep -c "ERROR" data_2_dasch.log
    echo -e "\nMost recent errors:"
    grep "ERROR" data_2_dasch.log | tail -5
    echo -e "\nSuccess statistics:"
    echo "Items processed: $(grep -c 'Processing item:' data_2_dasch.log)"
    echo "Resources created: $(grep -c 'Resource created successfully' data_2_dasch.log)"
    echo "Resources updated: $(grep -c 'Resource updated successfully' data_2_dasch.log)"
    echo -e "\nCommon issues:"
    # grep -q prints nothing itself, so only the summary line appears on a match
    grep -q "Authentication failed" data_2_dasch.log && echo "- Authentication failures detected"
    grep -q "Network timeout" data_2_dasch.log && echo "- Network timeout issues detected"
    grep -q "File not found" data_2_dasch.log && echo "- Missing file issues detected"
else
    echo "No log file found (data_2_dasch.log)"
fi
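For a breakdown that goes beyond raw counts, the same log can be summarised in Python. The sketch below assumes each log line contains the level name (such as `ERROR`) followed by the message; the function name `summarize_log` and the sample lines are illustrative and not part of omeka2dsp.

```python
from collections import Counter

def summarize_log(lines):
    """Count log lines per level and tally the distinct ERROR messages."""
    levels = Counter()
    error_messages = Counter()
    for line in lines:
        for level in ("ERROR", "WARNING", "INFO"):
            if level in line:
                levels[level] += 1
                if level == "ERROR":
                    # Keep only the text after the level name (assumed layout)
                    message = line.split("ERROR", 1)[1].strip(" :-")
                    error_messages[message] += 1
                break
    return levels, error_messages

# Made-up sample lines; a real run would pass an open file instead
sample = [
    "2024-01-01 10:00:00 INFO Processing item: abc",
    "2024-01-01 10:00:01 ERROR Authentication failed",
    "2024-01-01 10:00:02 ERROR Authentication failed",
    "2024-01-01 10:00:03 ERROR File not found",
]
levels, errors = summarize_log(sample)
print(levels["ERROR"], errors.most_common(1))
```

To run it against the real log, pass the open file object: `with open("data_2_dasch.log") as f: summarize_log(f)`.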
Authentication Issues
Issue: “Authentication failed” or “Login unsuccessful”
Symptoms:
Diagnostic Steps:
Verify Credentials
# Test credentials manually (run `source .env` first so the shell can expand the variables)
python -c "
import requests
response = requests.post('$API_HOST/v2/authentication', json={'email': '$DSP_USER', 'password': '$DSP_PWD'})
print(f'Status: {response.status_code}')
print(f'Response: {response.text}')
"
Check Account Status
- Verify account is not locked
- Confirm account has necessary permissions
- Check if password has expired
Solutions:
Problem | Solution |
---|---|
Invalid credentials | Update .env with correct username/password |
Account locked | Contact DSP administrator to unlock |
Password expired | Reset password through DSP interface |
Insufficient permissions | Request project access from administrator |
Wrong API endpoint | Verify API_HOST in configuration |
Issue: “Token expired” during migration
Symptoms:
Solutions:
Increase Token Lifetime (if possible)
# Add token refresh logic to data_2_dasch.py
def refresh_token_if_needed(token):
    # Check token expiration
    # Re-authenticate if necessary
    pass
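One way to fill in the expiration check, assuming the DSP token is a standard JWT carrying an `exp` claim (an assumption worth confirming against a real token). The helper `token_expires_soon` is hypothetical. It reads the payload without verifying the signature, which is enough to decide when to log in again but not to trust the token.

```python
import base64
import json
import time

def token_expires_soon(token, margin_seconds=60, now=None):
    """Return True if the JWT's exp claim is within margin_seconds of now.

    The signature is NOT verified; only the expiry timestamp is read.
    Tokens that cannot be parsed are treated as expired.
    """
    now = time.time() if now is None else now
    try:
        payload_b64 = token.split(".")[1]
        # JWTs use unpadded URL-safe base64, so restore the padding first
        payload_b64 += "=" * (-len(payload_b64) % 4)
        payload = json.loads(base64.urlsafe_b64decode(payload_b64))
        return payload["exp"] - now <= margin_seconds
    except (AttributeError, IndexError, KeyError, TypeError, ValueError):
        return True
```

A caller could then re-authenticate only when `token_expires_soon(token)` is true, instead of waiting for a 401 response.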
Implement Auto-Retry
# Add retry with re-authentication
import requests

def api_call_with_retry(url, token, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, headers={'Authorization': f'Bearer {token}'})
            if response.status_code == 401 and attempt < max_retries - 1:
                token = login(DSP_USER, DSP_PWD)  # Re-authenticate
                continue
            return response
        except requests.RequestException:
            # Give up only after the final attempt
            if attempt == max_retries - 1:
                raise
    return None
Network and Connectivity
Issue: Connection timeouts or network errors
Symptoms:
Diagnostic Steps:
Test Basic Connectivity
# Test DSP API
curl -v https://api.dasch.swiss/health
# Test Omeka API
curl -v "$OMEKA_API_URL"
# Check DNS resolution
nslookup api.dasch.swiss
Test API Endpoints
# Test specific endpoints
curl -X POST https://api.dasch.swiss/v2/authentication \
  -H "Content-Type: application/json" \
  -d '{"email":"test","password":"test"}'
Solutions:
Problem | Solution |
---|---|
Firewall blocking | Configure firewall to allow HTTPS (443) |
Proxy issues | Set HTTP_PROXY and HTTPS_PROXY variables |
DNS issues | Use IP addresses or different DNS servers |
SSL certificate problems | Update certificates or bypass verification (testing only) |
Network instability | Increase timeout values, implement retry logic |
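For the "network instability" row, a retry wrapper with exponential backoff is a common pattern. The following is a minimal sketch, not the project's implementation; `retry_with_backoff` and its parameters are illustrative names, and the `sleep` argument exists only so the delay can be swapped out in tests.

```python
import time

def retry_with_backoff(func, max_retries=3, base_delay=1.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and try again.

    Re-raises the last exception once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except exceptions:
            if attempt == max_retries - 1:
                raise
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With `requests`, a call might look like `retry_with_backoff(lambda: requests.get(url, timeout=30), exceptions=(requests.ConnectionError, requests.Timeout))`, so that only network-level failures are retried.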
Issue: SSL Certificate Verification Failed
Symptoms:
Solutions:
Update Certificates
# Update system certificates
sudo apt-get update && sudo apt-get install ca-certificates
# Or on macOS
brew install ca-certificates
Temporary Bypass (development only)
# Add SSL verification bypass
import requests
requests.packages.urllib3.disable_warnings()
# In API calls, add verify=False
response = requests.get(url, verify=False)
Data Processing Errors
Issue: “KeyError” or missing property errors
Symptoms:
Diagnostic Steps:
Inspect Problematic Item
# Debug specific item
from scripts.process_data_from_omeka import get_items_from_collection
import json

items = get_items_from_collection('10780')  # SGB Item Set ID
problem_item = items[0]  # Adjust index as needed
print(json.dumps(problem_item, indent=2))
Check Property Mappings
# Verify property structure
item = get_problem_item()
print("Available properties:")
for key in item.keys():
    if key.startswith('dcterms:'):
        print(f"  {key}: {item[key]}")
Solutions:
Add Null Checks
# Defensive programming
def safe_extract_property(item, property_name, property_id):
    props = item.get(property_name, [])
    if not props:
        return ""
    return extract_property(props, property_id)
Update Property Mappings
# Handle different property structures
def flexible_property_extract(item, property_names, property_id):
    for prop_name in property_names:
        if prop_name in item and item[prop_name]:
            return extract_property(item[prop_name], property_id)
    return ""
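Before adding fallbacks, it can help to see which expected properties are actually absent from an item. A small sketch, assuming items are plain dictionaries keyed by property name as in the snippets above; `find_missing_properties` and the sample item are hypothetical.

```python
def find_missing_properties(item, expected):
    """Return the expected property names that are absent or empty in item."""
    return [name for name in expected if not item.get(name)]

# Illustrative item, not real Omeka data
sample_item = {
    "dcterms:title": [{"@value": "Example"}],
    "dcterms:creator": [],  # present but empty
}
missing = find_missing_properties(
    sample_item, ["dcterms:title", "dcterms:creator", "dcterms:date"]
)
print(missing)
```

Running this over a whole item set and counting the results shows whether a `KeyError` is a one-off data gap or a systematic mapping problem.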
Issue: Invalid payload or validation errors
Symptoms:
Solutions:
Payload Validation
def validate_payload(payload):
    required_fields = ['@context', '@type', 'rdfs:label']
    for field in required_fields:
        if field not in payload:
            raise ValueError(f"Missing required field: {field}")
    # Validate field types
    if not isinstance(payload.get('rdfs:label'), str):
        raise ValueError("rdfs:label must be a string")
    return True
Schema Compliance
# Ensure payload matches DSP expectations
def ensure_dsp_compliance(payload):
    # Remove empty fields
    cleaned_payload = {k: v for k, v in payload.items() if v}
    # Ensure required structure
    if '@context' not in cleaned_payload:
        cleaned_payload['@context'] = get_default_context()
    return cleaned_payload
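When many items fail validation, raising on the first problem hides the rest. The variant below returns every problem it finds, under the same assumptions as the validation snippet above: the three required keys are taken from that snippet, and `collect_payload_errors` is an illustrative name.

```python
REQUIRED_FIELDS = ("@context", "@type", "rdfs:label")

def collect_payload_errors(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [
        f"Missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in payload
    ]
    # Only check the type when the label is present, to avoid a duplicate report
    if "rdfs:label" in payload and not isinstance(payload["rdfs:label"], str):
        errors.append("rdfs:label must be a string")
    return errors
```

Logging the returned list per item gives a complete picture of what needs fixing before the next migration run.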
File Upload Problems
Issue: Media files fail to upload
Symptoms:
Diagnostic Steps:
Test File Accessibility
# Test file URL directly
curl -I "https://omeka.unibe.ch/files/original/98d8559515187ec4a710347c7b9e6cda0bdd58d2.tif"
# Check file size
curl -sI "URL" | grep -i content-length
Test Upload Process
# Test file download and upload separately
import requests

# Download test
response = requests.get(file_url)
print(f"Download status: {response.status_code}")
print(f"Content length: {len(response.content)}")

# Upload test
files = {'file': ('test.jpg', response.content, 'image/jpeg')}
upload_response = requests.post(upload_url, files=files, headers=headers)
print(f"Upload status: {upload_response.status_code}")
Solutions:
Problem | Solution |
---|---|
File not accessible | Check Omeka permissions and URL |
File too large | Implement file compression or chunked upload |
Timeout during upload | Increase timeout, implement retry logic |
Unsupported format | Add format conversion or skip unsupported files |
Insufficient storage | Clean up temporary files, check disk space |
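For large files and upload timeouts, reading the download in fixed-size chunks keeps memory use flat and lets the migration stop early once a size limit is exceeded. The sketch works on generic file-like objects; `copy_in_chunks`, the chunk size, and the limit are illustrative assumptions, not DSP requirements.

```python
import io

def copy_in_chunks(src, dst, chunk_size=1024 * 1024, max_bytes=None):
    """Copy src to dst chunk by chunk and return the number of bytes written.

    Raises ValueError as soon as more than max_bytes have been read.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if max_bytes is not None and total > max_bytes:
            raise ValueError(f"File exceeds limit of {max_bytes} bytes")
        dst.write(chunk)

# In-memory demonstration; real use would copy to a temporary file
written = copy_in_chunks(io.BytesIO(b"x" * 2500), io.BytesIO(), chunk_size=1000)
print(written)
```

With `requests`, `requests.get(file_url, stream=True, timeout=60)` exposes the body as `response.raw`, which can serve as `src`; note that `response.raw` delivers the bytes without decoding any content encoding.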
Issue: File format not supported
Solutions:
Add Format Support
def specify_mediaclass(media_type: str) -> str:
    """Enhanced media type detection"""
    type_mapping = {
        'image/jpeg': f'{PREFIX}sgb_MEDIA_IMAGE',
        'image/png': f'{PREFIX}sgb_MEDIA_IMAGE',
        'image/gif': f'{PREFIX}sgb_MEDIA_IMAGE',
        'image/tiff': f'{PREFIX}sgb_MEDIA_IMAGE',
        'application/pdf': f'{PREFIX}sgb_MEDIA_ARCHIV',
        'text/plain': f'{PREFIX}sgb_MEDIA_ARCHIV',
        'audio/mpeg': f'{PREFIX}sgb_MEDIA_AUDIO',  # New
        'video/mp4': f'{PREFIX}sgb_MEDIA_VIDEO'  # New
    }
    return type_mapping.get(media_type, f'{PREFIX}sgb_MEDIA_ARCHIV')
Format Conversion
import os
from PIL import Image

def convert_image_format(file_path, target_format='JPEG'):
    """Convert image to supported format"""
    with Image.open(file_path) as img:
        if img.format != target_format:
            # Swap the file extension instead of string-replacing the format
            # name, which silently fails for extensions such as .tif or .jpg
            base, _ = os.path.splitext(file_path)
            converted_path = f"{base}.{target_format.lower()}"
            img.convert('RGB').save(converted_path, target_format)
            return converted_path
    return file_path
Performance Issues
Issue: Migration running very slowly
Symptoms:
Diagnostic Steps:
Performance Monitoring
# Monitor system resources
htop
# Monitor network usage
iftop
# Check disk I/O
iostat -x 1
# Monitor Python process
py-spy top --pid $(pgrep -f data_2_dasch.py)
Profile the Code
import cProfile
import pstats

# Profile the migration
cProfile.run('main()', 'migration_profile.prof')

# Analyze results
stats = pstats.Stats('migration_profile.prof')
stats.sort_stats('cumulative').print_stats(10)
Solutions:
Batch Processing
import gc
import time

def process_items_in_batches(items, batch_size=50):
    """Process items in smaller batches"""
    for i in range(0, len(items), batch_size):
        batch = items[i:i+batch_size]
        for item in batch:
            process_item(item)
        # Small delay between batches
        time.sleep(1)
        # Memory cleanup
        gc.collect()
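If items arrive from a paginated API or a generator rather than a list, slicing by index is not possible. Below is a sketch of a generator that batches any iterable; `batched` is an illustrative name and the function is not part of omeka2dsp.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice consumes at most batch_size items without loading the rest
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

for batch in batched(range(5), 2):
    print(batch)
```

Because only one batch is held in memory at a time, this combines well with a per-batch pause or cleanup step.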
Optimize API Calls
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_optimized_session():
    session = requests.Session()
    # Connection pooling
    adapter = HTTPAdapter(
        pool_connections=10,
        pool_maxsize=20,
        max_retries=Retry(
            total=3,
            backoff_factor=0.3,
            status_forcelist=[500, 502, 504]
        )
    )
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
Configuration Problems
Issue: Environment variables not loading
Symptoms:
Solutions:
Verify .env File
# Check file exists and has content
ls -la .env
cat .env
# Check for hidden characters
hexdump -C .env | head
Load Environment Explicitly
# Add explicit environment loading
from dotenv import load_dotenv
import os

# Load .env file
load_dotenv()

# Verify variables
required_vars = ['OMEKA_API_URL', 'DSP_USER', 'PROJECT_SHORT_CODE']
for var in required_vars:
    if not os.getenv(var):
        raise EnvironmentError(f"Required environment variable {var} not set")
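The hidden characters that `hexdump` reveals are usually a byte-order mark, Windows line endings, or stray spaces around the `=` sign. The sketch below flags these in the raw file text; `find_env_problems` is a hypothetical helper and its checks are heuristics, not a full `.env` parser.

```python
def find_env_problems(text):
    """Return (line_number, description) pairs for suspicious .env lines."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append((1, "file starts with a byte-order mark"))
        text = text[1:]
    for number, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            problems.append((number, "Windows line ending (CRLF)"))
            line = line[:-1]
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        if "=" not in stripped:
            problems.append((number, "no '=' found"))
            continue
        key, value = stripped.split("=", 1)
        if key != key.strip() or value != value.strip():
            problems.append((number, "spaces around '='"))
        if not value.strip():
            problems.append((number, f"{key.strip()} is empty"))
    return problems
```

Read the file with `open(".env", encoding="utf-8", newline="")` before passing its contents in; the default newline handling would otherwise convert CRLF endings and hide them from the check.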
Issue: Incorrect API endpoints
Symptoms:
Solutions:
Endpoint Validation
def validate_endpoints():
    """Validate API endpoints are reachable"""
    endpoints = {
        'DSP API': f"{API_HOST}/health",
        'Omeka API': f"{OMEKA_API_URL}items?per_page=1"
    }
    for name, url in endpoints.items():
        try:
            response = requests.get(url, timeout=10)
            print(f"✓ {name}: {response.status_code}")
        except Exception as e:
            print(f"✗ {name}: {e}")
Dynamic Endpoint Discovery
def discover_api_version():
    """Discover correct API version"""
    base_url = API_HOST.rstrip('/')
    versions = ['/v2', '/v1', '']
    for version in versions:
        try:
            url = f"{base_url}{version}/health"
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                return f"{base_url}{version}"
        except requests.RequestException:
            # Try the next candidate; a bare except would also swallow Ctrl+C
            continue
    raise Exception("No valid API endpoint found")
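A frequent cause of wrong endpoints is a missing or doubled slash when a base URL and a path are concatenated. A pattern such as `f"{OMEKA_API_URL}items"` only works if the variable ends with `/`. The helper below tolerates both forms; `join_url` is an illustrative name, not an existing project function.

```python
def join_url(base, *parts):
    """Join URL segments with exactly one slash between them."""
    segments = [base.rstrip("/")]
    # Skip segments that are empty once their slashes are removed
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)

print(join_url("https://api.dasch.swiss/", "/v2/", "authentication"))
print(join_url("https://example.org/api", "items?per_page=1"))
```

Plain joining is used here instead of `urllib.parse.urljoin` because `urljoin` replaces the last path segment of a base URL that lacks a trailing slash, which is rarely what is wanted for API roots.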
DSP-Specific Issues
Issue: “Project not found” or invalid project shortcode
Solutions:
Verify Project Information
# Test project endpoint
curl -H "Authorization: Bearer $TOKEN" \
  "$API_HOST/admin/projects/shortcode/$PROJECT_SHORT_CODE"
List Available Projects
def list_available_projects(token):
    """List projects user has access to"""
    response = requests.get(
        f"{API_HOST}/admin/projects",
        headers={'Authorization': f'Bearer {token}'}
    )
    if response.status_code == 200:
        projects = response.json().get('projects', [])
        for project in projects:
            print(f"Shortcode: {project['shortcode']}, Name: {project['shortname']}")
    else:
        print(f"Failed to fetch projects: {response.status_code}")
Issue: List value mapping failures
Symptoms:
Solutions:
Debug List Mappings
def debug_list_mappings(lists, search_value):
    """Debug list value mappings"""
    print(f"Searching for: {search_value}")
    for list_obj in lists:
        list_name = list_obj['listinfo']['name']
        print(f"\nList: {list_name}")
        nodes = list_obj['list']['children']
        for node in nodes:
            labels = node['labels']
            for label in labels:
                if search_value.lower() in label['value'].lower():
                    print(f"  Found match: {label['value']} -> {node['id']}")
Fuzzy Matching
from difflib import get_close_matches

def find_closest_list_value(value, list_values, cutoff=0.6):
    """Find closest matching list value"""
    matches = get_close_matches(value, list_values, n=1, cutoff=cutoff)
    if matches:
        print(f"Fuzzy match: '{value}' -> '{matches[0]}'")
        return matches[0]
    return None
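Many near-misses differ only in letter case or surrounding whitespace, which lowers the similarity score unnecessarily. The sketch below normalises both sides before calling `get_close_matches` and returns the original label; `find_normalized_match` is a hypothetical name and the sample labels are made up.

```python
from difflib import get_close_matches

def find_normalized_match(value, list_values, cutoff=0.6):
    """Match value against list_values, ignoring case and outer whitespace."""
    # Map each normalised label back to its original spelling
    normalized = {v.strip().casefold(): v for v in list_values}
    matches = get_close_matches(
        value.strip().casefold(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[matches[0]] if matches else None

print(find_normalized_match("  fotografie ", ["Fotografie", "Zeichnung"]))
```

If two labels collapse to the same normalised form, the later one wins in this sketch, so lists with such duplicates need a closer look.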
Getting Help
Log Information to Include
When reporting issues, include:
System Information
python --version
uname -a
pip list | grep requests
Configuration (sanitized)
# Remove sensitive data first (masks DSP_PWD as well as any *PASSWORD variable)
sed -E 's/(PWD|PASSWORD)=.*/\1=***/' .env
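Where `sed` is not available, the same masking can be done in Python. The sketch below hides the value of any variable whose name looks like a secret; `sanitize_env` and the list of name fragments are assumptions and should be extended to match the actual `.env` file.

```python
import re

# Name fragments treated as secrets (an assumption; adjust as needed)
SECRET_NAME = re.compile(r"(PWD|PASSWORD|TOKEN|SECRET|KEY)", re.IGNORECASE)

def sanitize_env(text):
    """Replace the values of secret-looking variables with ***."""
    cleaned = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        # Commented-out secrets are masked too, since they are still sensitive
        if sep and SECRET_NAME.search(key):
            cleaned.append(f"{key}=***")
        else:
            cleaned.append(line)
    return "\n".join(cleaned)

print(sanitize_env("DSP_USER=me\nDSP_PWD=hunter2"))
```

Review the output by eye before attaching it to a report, since a pattern-based filter cannot recognise every secret.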
Error Logs
# Last 50 lines of log
tail -50 data_2_dasch.log
# All error messages
grep "ERROR" data_2_dasch.log
Network Diagnostics
curl -v https://api.dasch.swiss/health
nslookup api.dasch.swiss
Creating Effective Bug Reports
Use this template:
## Bug Description
Brief description of the issue
## Steps to Reproduce
1. Step 1
2. Step 2
3. Error occurs
## Expected Behavior
What should happen
## Actual Behavior
What actually happens
## Environment
- Python version: 3.9.0
- OS: Ubuntu 20.04
- Network: Corporate/Home/University
## Configuration
(Sanitized .env contents)
## Logs
```
Error log contents here
```
## Additional Context
Any other relevant information
Support Channels
- GitHub Issues: For bugs and feature requests
- Documentation: Check all documentation first
- Community: DSP community forums
- Email: info@stadtgeschichtebasel.ch, for sensitive security issues only
This troubleshooting guide should help resolve the most common issues encountered when using the omeka2dsp system.