ADR-003: Comprehensive Error Handling Strategy

Status

Accepted - Implemented across all layers with production-grade resilience and monitoring

Context

A robust error handling strategy was needed to:

  • Prevent cascading failures and system crashes
  • Provide consistent error reporting and user feedback
  • Enable comprehensive debugging and monitoring
  • Maintain application stability under error conditions
  • Preserve error context and stack traces for troubleshooting
  • Support production monitoring and alerting
  • Handle race conditions and concurrent operation failures

Decision

We will implement a comprehensive multi-layered error handling strategy with shared utilities, consistent patterns, and production-grade resilience across frontend and backend.

1. Enhanced Shared Error Handling Utility

The withErrorHandling() function provides unified error handling with context-aware overloads:

// Frontend usage with store integration
await withErrorHandling(async () => {
  return await performAsyncOperation();
}, errorStore);

// Backend usage with logger integration and operation naming
await withErrorHandling(
  async () => {
    return await performAsyncOperation();
  },
  {
    logger,
    operationName: "database-operation",
    correlationId: "op-12345",
  }
);

// Utility operations with minimal overhead
await withUtilityErrorHandling(
  async () => doUtilityWork(),
  "utility-operation",
  undefined,
  false // silent mode for non-critical operations
);
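
The overloads above could be structured roughly as follows. This is a minimal sketch, not the project's implementation: the `ErrorStore` and `BackendContext` shapes, the helper names `runWithStore` and `runWithLogger`, and the log message format are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real store and logger interfaces may differ.
interface ErrorStore {
  clearError(): void;
  setError(message: string): void;
  setLoading(loading: boolean): void;
}

interface BackendContext {
  logger: { error(message: string, error: unknown): void };
  operationName?: string;
  correlationId?: string;
}

// Distinguish the two contexts by a method only the store has.
function isErrorStore(ctx: ErrorStore | BackendContext): ctx is ErrorStore {
  return "setError" in ctx;
}

async function runWithStore<T>(operation: () => Promise<T>, store: ErrorStore): Promise<T> {
  store.clearError();
  store.setLoading(true);
  try {
    return await operation();
  } catch (error) {
    store.setError(error instanceof Error ? error.message : String(error));
    throw error; // re-throw the original error object
  } finally {
    store.setLoading(false);
  }
}

async function runWithLogger<T>(operation: () => Promise<T>, ctx: BackendContext): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    const name = ctx.operationName ?? "operation";
    const suffix = ctx.correlationId ? ` [${ctx.correlationId}]` : "";
    ctx.logger.error(`${name} failed${suffix}`, error);
    throw error; // re-throw the original error object
  }
}

// Overload signatures: one for frontend stores, one for backend contexts.
function withErrorHandling<T>(operation: () => Promise<T>, store: ErrorStore): Promise<T>;
function withErrorHandling<T>(operation: () => Promise<T>, context: BackendContext): Promise<T>;
async function withErrorHandling<T>(
  operation: () => Promise<T>,
  ctx: ErrorStore | BackendContext
): Promise<T> {
  return isErrorStore(ctx) ? runWithStore(operation, ctx) : runWithLogger(operation, ctx);
}
```

In both branches the caller still receives the original error; the wrapper only adds the side effect (store update or log entry) appropriate to its layer.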

2. Production-Grade Operational Hooks

All database operations use withDatabaseOperation() which provides:

  • Exponential backoff retry logic with jitter
  • Event emission for operation lifecycle and monitoring
  • Consistent error handling across all database operations
  • Performance monitoring and comprehensive logging
  • Correlation tracking for distributed debugging
  • Circuit breaker pattern for failing operations

return withDatabaseOperation(
  async () => this.databaseService.executeTransaction(operation),
  "SiteRepository.deleteAll"
);
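
To illustrate the retry behaviour listed above, here is a minimal sketch of exponential backoff with jitter. It is not the actual `withDatabaseOperation()` implementation: `withRetry`, `backoffDelay`, and `RetryOptions` are hypothetical names, and the "full jitter" variant (a random delay between zero and the exponential ceiling) is only one common choice.

```typescript
// Hypothetical options; the real retry configuration may differ.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay ceiling for the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Full jitter: a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
// `random` is injectable so the calculation can be tested deterministically.
function backoffDelay(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(operation: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError; // the original error from the final attempt
}
```

Jitter spreads retries from concurrent callers over time, so they do not all hit the database again at the same instant.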

3. Frontend Store Error Protection

Frontend stores implement safe error handling to prevent UI crashes:

function safeStoreOperation(storeOperation: () => void, operationName: string) {
  try {
    storeOperation();
  } catch (error) {
    console.warn("Store operation failed for:", operationName, error);
  }
}

4. Error Preservation Principle

All error handling utilities preserve original errors:

  • Stack traces are maintained
  • Error types are preserved
  • Error properties remain intact
  • Errors are re-thrown after logging or handling
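
The following sketch shows what preservation means in practice: re-throwing the same error object keeps its class, custom properties, and stack trace. `DatabaseBusyError`, its `code` value, and `logAndRethrow` are hypothetical names used only for this illustration.

```typescript
// Hypothetical custom error carrying an extra property.
class DatabaseBusyError extends Error {
  readonly code = "DB_BUSY"; // illustrative code, not taken from a real driver

  constructor(message: string) {
    super(message);
    this.name = "DatabaseBusyError";
  }
}

// Logs the failure, then re-throws the very same object it caught.
function logAndRethrow<T>(
  operation: () => T,
  log: (message: string, error: unknown) => void
): T {
  try {
    return operation();
  } catch (error) {
    log("Operation failed", error);
    throw error; // same object: type, properties, and stack are untouched
  }
}

// Usage: the caller can still inspect the custom property.
const busy = new DatabaseBusyError("database is locked");
try {
  logAndRethrow(() => { throw busy; }, () => { /* logging omitted */ });
} catch (error) {
  const sameObject = error === busy;               // identity is preserved
  const code = (error as DatabaseBusyError).code;  // custom property survives
  void sameObject;
  void code;
}
```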

Error Handling Layers

Layer 1: Utility Level

// Frontend utilities
export async function withUtilityErrorHandling<T>(
  operation: () => Promise<T>,
  operationName: string,
  fallbackValue?: T
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    logger.error(`${operationName} failed`, error);
    if (fallbackValue !== undefined) {
      return fallbackValue;
    }
    throw error;
  }
}

Layer 2: Repository Level

// Database operations with transaction safety
public async deleteAll(): Promise<void> {
  return withDatabaseOperation(async () => {
    return this.databaseService.executeTransaction((db) => {
      this.deleteAllInternal(db);
      return Promise.resolve();
    });
  }, "Repository.deleteAll");
}

Layer 3: Service Level

// Service operations with event emission
async performOperation() {
  try {
    const result = await this.repository.operation();
    await this.eventBus.emitTyped('operation:completed', { result });
    return result;
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    await this.eventBus.emitTyped('operation:failed', { error: message });
    throw error;
  }
}

Layer 4: UI Level

// Frontend operations with store integration
const handleAction = async () => {
  await withErrorHandling(async () => {
    const result = await window.electronAPI.sites.addSite(siteData);
    // Success handling
    return result;
  }, errorStore);
};

Error Categories and Handling

1. Database Errors

  • Transaction rollback on failure
  • Retry logic for transient failures
  • Event emission for monitoring
  • Structured logging with operation context

2. Network Errors

  • Timeout handling with configurable limits
  • Retry strategies based on error type
  • Fallback mechanisms for offline scenarios
  • Connection state tracking
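
Timeout handling with a configurable limit can be sketched as below. `withTimeout` and `TimeoutError` are hypothetical names, and this is only one way to do it: the sketch races the operation against a timer and does not cancel the underlying work, which would need something like an `AbortSignal` passed into the operation.

```typescript
// Hypothetical error type for timed-out operations.
class TimeoutError extends Error {
  constructor(operationName: string, timeoutMs: number) {
    super(`${operationName} timed out after ${timeoutMs}ms`);
    this.name = "TimeoutError";
  }
}

async function withTimeout<T>(
  operation: () => Promise<T>,
  timeoutMs: number,
  operationName: string
): Promise<T> {
  let cancelTimer = (): void => {};
  const timeout = new Promise<never>((_resolve, reject) => {
    const timer = setTimeout(
      () => reject(new TimeoutError(operationName, timeoutMs)),
      timeoutMs
    );
    cancelTimer = () => clearTimeout(timer);
  });
  try {
    // Whichever settles first wins; a late result from `operation` is ignored.
    return await Promise.race([operation(), timeout]);
  } finally {
    cancelTimer(); // avoid leaving a pending timer behind
  }
}
```

A retry strategy can then treat `TimeoutError` as transient while failing fast on other error types.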

3. Validation Errors

  • Type-safe validation at boundaries
  • User-friendly error messages
  • Field-specific error reporting
  • Prevention of invalid state propagation
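
As an illustration of boundary validation with field-specific reporting, the sketch below validates untrusted input and returns either a typed value or a list of per-field errors. The `SiteInput` shape, the `validateSiteInput` name, and the specific rules are assumptions; the project's real schemas are not shown in this document.

```typescript
// Hypothetical input shape used only for this example.
interface SiteInput {
  name: string;
  url: string;
}

interface FieldError {
  field: string;
  message: string; // user-friendly, no technical details
}

type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: FieldError[] };

function validateSiteInput(input: unknown): ValidationResult<SiteInput> {
  const record: Record<string, unknown> =
    typeof input === "object" && input !== null
      ? (input as Record<string, unknown>)
      : {};

  const rawName = record["name"];
  const rawUrl = record["url"];
  const name = typeof rawName === "string" ? rawName.trim() : "";
  const url = typeof rawUrl === "string" ? rawUrl.trim() : "";

  const errors: FieldError[] = [];
  if (name === "") {
    errors.push({ field: "name", message: "Name is required." });
  }
  if (!/^https?:\/\//.test(url)) {
    errors.push({ field: "url", message: "URL must start with http:// or https://." });
  }

  // Invalid data never leaves the boundary as a SiteInput.
  return errors.length > 0 ? { ok: false, errors } : { ok: true, value: { name, url } };
}
```

Because the result is a discriminated union, callers must check `ok` before they can touch `value`, which keeps invalid state from propagating further into the application.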

4. UI Errors

  • Error boundaries for component isolation
  • Graceful degradation with fallback UI
  • User notification without technical details
  • State recovery mechanisms

Monitoring and Observability

Event-Driven Error Tracking

// Automatic error event emission
await eventBus.emitTyped("database:error", {
  operation: "query",
  error: error.message,
  correlationId: generateId(),
  timestamp: Date.now(),
});

Advanced Memory Safety and Resource Management

Error handling utilities ensure proper resource cleanup:

// Automatic cleanup in event handlers
const cleanup = window.electronAPI.events.onMonitorStatusChanged((data) => {
  try {
    handleStatusChange(data);
  } catch (error) {
    logger.error("Status change handler failed", error);
    // Handler failure doesn't affect cleanup
  }
});

// React calls cleanup on unmount, whether or not the handler ever threw
useEffect(() => cleanup, []);

Race Condition Protection

// Operation correlation prevents race conditions
const operationId = this.operationRegistry.initiateCheck(monitorId);
try {
  const result = await performCheck();
  // Validate operation still active before updating state
  if (this.operationRegistry.validateOperation(operationId)) {
    await updateMonitorStatus(result);
  }
} finally {
  this.operationRegistry.completeOperation(operationId);
}
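
A registry with the three methods used above might look like the sketch below. The method names come from the snippet; everything else is an assumption, in particular the rule that a newer check for the same monitor supersedes any older one that is still in flight.

```typescript
// Minimal sketch of an operation registry; internals are hypothetical.
class OperationRegistry {
  // monitorId -> id of the most recent operation for that monitor
  private readonly latest = new Map<string, string>();
  // operationId -> monitorId, for operations that are still valid
  private readonly owners = new Map<string, string>();
  private counter = 0;

  initiateCheck(monitorId: string): string {
    this.counter += 1;
    const operationId = `${monitorId}-${this.counter}`;
    const previous = this.latest.get(monitorId);
    if (previous !== undefined) {
      this.owners.delete(previous); // the older operation is now stale
    }
    this.latest.set(monitorId, operationId);
    this.owners.set(operationId, monitorId);
    return operationId;
  }

  // True only while this operation is the newest one for its monitor.
  validateOperation(operationId: string): boolean {
    const monitorId = this.owners.get(operationId);
    return monitorId !== undefined && this.latest.get(monitorId) === operationId;
  }

  completeOperation(operationId: string): void {
    const monitorId = this.owners.get(operationId);
    if (monitorId === undefined) {
      return; // already superseded or completed
    }
    this.owners.delete(operationId);
    if (this.latest.get(monitorId) === operationId) {
      this.latest.delete(monitorId);
    }
  }
}
```

With this rule, a slow check that finishes after a newer one has started fails `validateOperation()` and its result is discarded instead of overwriting fresher state.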

Correlation ID Tracking

All operations include correlation IDs for distributed tracing:

const correlationId = generateCorrelationId();
await this.eventBus.emitTyped("operation:started", {
  operationId,
  correlationId,
  timestamp: Date.now(),
});

Production Monitoring Integration

Operations include comprehensive metrics for observability:

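const startTime = performance.now();
try {
  const result = await operation();
  metrics.recordSuccess(operationName, performance.now() - startTime);
  return result;
} catch (error) {
  // `error` is `unknown`, so narrow before reading the constructor name
  const errorType = error instanceof Error ? error.constructor.name : typeof error;
  metrics.recordFailure(operationName, errorType);
  throw error;
}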
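
The `metrics` object used above is not defined in this document. As a rough sketch of what sits behind `recordSuccess` and `recordFailure`, an in-memory recorder could look like this; `InMemoryMetrics`, `OperationStats`, and `snapshot` are hypothetical names, and a production setup would more likely forward these values to a monitoring backend.

```typescript
// Hypothetical per-operation counters.
interface OperationStats {
  successes: number;
  totalDurationMs: number;
  failuresByType: Record<string, number>;
}

class InMemoryMetrics {
  private readonly stats = new Map<string, OperationStats>();

  private statsFor(operationName: string): OperationStats {
    let entry = this.stats.get(operationName);
    if (entry === undefined) {
      entry = { successes: 0, totalDurationMs: 0, failuresByType: {} };
      this.stats.set(operationName, entry);
    }
    return entry;
  }

  recordSuccess(operationName: string, durationMs: number): void {
    const entry = this.statsFor(operationName);
    entry.successes += 1;
    entry.totalDurationMs += durationMs;
  }

  recordFailure(operationName: string, errorType: string): void {
    const entry = this.statsFor(operationName);
    entry.failuresByType[errorType] = (entry.failuresByType[errorType] ?? 0) + 1;
  }

  // Returns a copy so callers cannot mutate internal counters.
  snapshot(operationName: string): OperationStats | undefined {
    const entry = this.stats.get(operationName);
    return entry && { ...entry, failuresByType: { ...entry.failuresByType } };
  }
}
```

Grouping failures by error type is what makes alerting rules such as "more than N timeouts per minute" possible later on.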

Consequences

Positive

  • Enhanced system stability - Errors don't cascade or crash the application
  • Superior debugging capability - Rich error context and correlation tracking
  • Optimal user experience - Graceful error handling with appropriate messaging
  • Comprehensive monitoring - Error tracking, metrics, and observability
  • Excellent maintainability - Consistent error handling patterns across all layers
  • Memory safety - Proper resource cleanup and leak prevention
  • Race condition protection - Operation correlation keeps stale results from corrupting state
  • Production readiness - Circuit breakers and retry mechanisms

Negative

  • Moderate complexity increase - Multiple error handling layers require understanding
  • Minimal performance overhead - Error handling adds negligible processing time
  • Learning curve - Developers need to understand comprehensive error handling patterns
  • Debugging complexity - Rich error context requires proper tooling to interpret

Quality Assurance

Memory Management

  • Automatic cleanup: All error handlers ensure resource cleanup
  • Event listener management: Cleanup functions prevent memory leaks
  • Resource disposal: Failed operations properly dispose of allocated resources

Concurrency Safety

  • Operation correlation: Prevents race conditions in async operations
  • State validation: Operations validate state before making changes
  • Atomic operations: Critical sections use proper synchronization

Production Monitoring

  • Error classification: Errors categorized by severity and type
  • Metric collection: Performance and failure metrics for alerting
  • Distributed tracing: Correlation IDs enable cross-service debugging

Implementation Guidelines

1. Always Preserve Errors

// ✅ Good - preserves original error
try {
  return await operation();
} catch (error) {
  logger.error("Operation failed", error);
  throw error; // Re-throw original error
}

// ❌ Bad - loses error context
try {
  return await operation();
} catch (error) {
  throw new Error("Operation failed"); // Loses original error
}

2. Use Appropriate Error Handling Level

  • Utilities: withUtilityErrorHandling()
  • Database: withDatabaseOperation()
  • Frontend: withErrorHandling() with store
  • Backend: withErrorHandling() with logger

3. Emit Events for Failures

All significant operations should emit failure events for monitoring.
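
A minimal sketch of this guideline follows, using a small typed event bus. Only the `emitTyped` method name comes from this document; `TypedEventBus`, `OperationEvents`, `runAndReport`, and the payload fields are assumptions made for the example.

```typescript
// Hypothetical event map: event name -> payload type.
interface OperationEvents {
  "operation:failed": {
    operationName: string;
    error: string;
    correlationId: string;
    timestamp: number;
  };
}

type Listener<P> = (payload: P) => void | Promise<void>;

class TypedEventBus<Events> {
  private readonly listeners = new Map<keyof Events, Set<Listener<unknown>>>();

  // Registers a listener and returns a function that removes it again.
  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): () => void {
    const set = this.listeners.get(event) ?? new Set<Listener<unknown>>();
    set.add(listener as Listener<unknown>);
    this.listeners.set(event, set);
    return () => {
      set.delete(listener as Listener<unknown>);
    };
  }

  async emitTyped<K extends keyof Events>(event: K, payload: Events[K]): Promise<void> {
    const set = this.listeners.get(event);
    if (set === undefined) return;
    for (const listener of Array.from(set)) {
      await listener(payload);
    }
  }
}

// Runs an operation, emits a failure event if it throws, then re-throws.
async function runAndReport<T>(
  bus: TypedEventBus<OperationEvents>,
  operationName: string,
  correlationId: string,
  operation: () => Promise<T>
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    await bus.emitTyped("operation:failed", {
      operationName,
      error: error instanceof Error ? error.message : String(error),
      correlationId,
      timestamp: Date.now(),
    });
    throw error; // monitoring is a side effect; the caller still sees the failure
  }
}
```

The event carries only a message string for monitoring, while the original error object continues up the call stack unchanged.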

Compliance

All layers implement this error handling strategy:

  • Repository operations use withDatabaseOperation()
  • Frontend operations use withErrorHandling() with stores
  • Utilities provide fallback mechanisms
  • Services emit error events