Inventory EDI Stack Overflow Fix

Problem Description

The inventory EDI system raised "stack level too deep" exceptions (SystemStackError) when processing certain catalog items. The cause was infinite recursion in the next_available method for kit items whose structure contains circular references.

Root Cause

The issue occurred in the StoreItem#next_available method when processing kit items:

  1. Kit Processing: When next_available is called on a kit item, it iterates through the kit's components
  2. Recursive Calls: For each component, it calls next_available again, and a component that is itself a kit repeats the process
  3. Circular References: If there are circular relationships in the kit structure (e.g., Kit A contains Kit B, and Kit B contains Kit A), this causes infinite recursion
  4. Stack Overflow: Eventually, the call stack exceeds the system limit, causing a SystemStackError
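
The failure mode above can be reproduced in isolation. The following is a minimal sketch, not the actual StoreItem code: NaiveKit is a hypothetical stand-in class, and treating a kit's availability as the minimum over its components is an assumption made only for illustration.

```ruby
# Hypothetical stand-in for a kit item; the real StoreItem is an application model.
class NaiveKit
  attr_reader :name, :stock
  attr_accessor :components

  def initialize(name, stock: nil)
    @name = name
    @stock = stock
    @components = []
  end

  # Naive recursion with no guard (assumed logic: minimum over components).
  def next_available
    return stock if components.empty?

    components.map(&:next_available).min
  end
end

kit_a = NaiveKit.new("Kit A")
kit_b = NaiveKit.new("Kit B")
kit_a.components = [kit_b]
kit_b.components = [kit_a] # circular reference: A contains B, B contains A

outcome =
  begin
    kit_a.next_available
  rescue SystemStackError => e
    e.class.name
  end

puts outcome
```

Note that SystemStackError descends from Exception rather than StandardError, so a bare rescue does not catch it; it has to be named explicitly.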

Solution Implemented

1. Exception Handling and Batch Continuity

  • Individual Item Error Handling: Added begin/rescue blocks around individual catalog item processing in InventoryMessageProcessor
  • Batch Continuity: One bad item no longer stops the entire batch; its error is logged and processing continues with the next item
  • Enhanced Logging: Each error is logged to Rollbar with detailed context
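
The per-item handling described above could look roughly like the sketch below. The process_batch helper and its return shape are hypothetical, not the actual InventoryMessageProcessor code; in the real processor the collected error would be reported to Rollbar rather than stored in an array.

```ruby
# Hypothetical helper: runs the block for each item and collects failures
# instead of letting one exception abort the whole batch.
def process_batch(items)
  results = []
  errors = []

  items.each_with_index do |item, index|
    results << yield(item)
  rescue SystemStackError, StandardError => e
    # SystemStackError is not a StandardError, so it is listed explicitly.
    errors << { index: index, item: item, error_class: e.class.name, message: e.message }
  end

  { results: results, errors: errors }
end

# The third item raises ZeroDivisionError; the others still complete.
outcome = process_batch([1, 2, 0, 4]) { |n| 100 / n }
puts outcome[:results].inspect
puts outcome[:errors].map { |err| err[:error_class] }.inspect
```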

2. Infinite Recursion Prevention

  • Depth-Limited Methods: Created next_available_with_depth_limit and next_available_by_warehouse_with_depth_limit methods
  • Maximum Depth: Set default maximum depth to 10 levels to prevent infinite recursion
  • Graceful Degradation: When depth limit is reached, the method returns nil instead of crashing
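
A depth-limited traversal of this kind might be sketched as follows. DepthLimitedKit is a hypothetical stand-in, and both the minimum-over-components logic and the choice to ignore components that hit the limit are assumptions; only the method name, the max_depth and current_depth parameters, the default of 10, and the nil return come from this document.

```ruby
# Hypothetical stand-in for a model with a depth-limited availability lookup.
class DepthLimitedKit
  attr_reader :stock
  attr_accessor :components

  def initialize(stock: nil)
    @stock = stock
    @components = []
  end

  def next_available_with_depth_limit(max_depth: 10, current_depth: 0)
    # Graceful degradation: give up with nil instead of recursing forever.
    return nil if current_depth >= max_depth
    return stock if components.empty?

    values = components.map do |component|
      component.next_available_with_depth_limit(
        max_depth: max_depth,
        current_depth: current_depth + 1
      )
    end
    # Assumption: components that hit the limit (nil) are ignored.
    values.compact.min
  end
end

# A well-formed kit resolves normally.
kit = DepthLimitedKit.new
kit.components = [DepthLimitedKit.new(stock: 5), DepthLimitedKit.new(stock: 3)]
normal_result = kit.next_available_with_depth_limit

# A circular kit returns nil instead of raising SystemStackError.
kit_a = DepthLimitedKit.new
kit_b = DepthLimitedKit.new
kit_a.components = [kit_b]
kit_b.components = [kit_a]
circular_result = kit_a.next_available_with_depth_limit
```

A depth limit bounds the recursion but does not tell a circular kit apart from a legitimately deep one, which is why the limit is configurable.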

3. Enhanced Rollbar Context

  • Detailed Error Context: Added comprehensive context information including:
    • Catalog item ID, SKU, and index
    • Partner and orchestrator information
    • Batch size and position
    • Worker details (JID, host, timestamp)
    • Error type classification
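
As an illustration of the context listed above, the sketch below assembles such a hash. The build_error_context helper, its parameter names, and the hash keys are hypothetical; the error type strings are the ones listed under Monitoring. In the application, a hash like this would typically be passed as the extra data argument when reporting the exception to Rollbar.

```ruby
require "socket"
require "time"

# Hypothetical helper that gathers the context fields for one failed item.
# `item` is a plain hash here; the real code would read from a CatalogItem.
def build_error_context(item:, index:, batch_size:, partner:, orchestrator:, jid:, error:)
  {
    catalog_item_id: item[:id],
    sku: item[:sku],
    index: index,
    batch_size: batch_size,
    partner: partner,
    orchestrator: orchestrator,
    worker: {
      jid: jid,
      host: Socket.gethostname,
      timestamp: Time.now.utc.iso8601
    },
    # Classify the error so it can be filtered in Rollbar.
    error_type: error.is_a?(SystemStackError) ? "stack_level_too_deep" : "catalog_item_processing_error"
  }
end

context = build_error_context(
  item: { id: 42, sku: "KIT-001" },     # illustrative values
  index: 7,
  batch_size: 100,
  partner: "example_partner",           # illustrative value
  orchestrator: "ExampleOrchestrator",  # illustrative value
  jid: "abc123",                        # illustrative value
  error: SystemStackError.new("stack level too deep")
)
```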

Files Modified

Core Models

  • app/models/catalog_item.rb - Added depth-limited methods
  • app/models/store_item.rb - Added depth-limited methods
  • app/models/order.rb - Updated to use depth-limited methods

EDI Services

  • app/services/edi/mft_gateway/inventory_message_processor.rb - Added exception handling and depth-limited calls
  • app/services/edi/commercehub/inventory_message_processor.rb - Added exception handling and depth-limited calls
  • app/services/edi/walmart/inventory_message_processor.rb - Added exception handling and depth-limited calls
  • app/services/edi/base_orchestrator.rb - Enhanced error handling and context

Workers

  • app/workers/edi_inventory_flow_worker.rb - Enhanced Rollbar context

Tools

  • lib/tasks/identify_circular_kit_references.rake - Diagnostic tools
  • doc/inventory_edi_stack_overflow_fix.md - Complete documentation

Usage

Running the Fix

The fix is automatically applied when processing inventory flows. The system will:

  1. Continue Processing: Individual item failures won't stop the batch
  2. Log Errors: All errors are logged to Rollbar with detailed context
  3. Prevent Recursion: Depth-limited methods prevent stack overflow
  4. Provide Fallbacks: When the depth limit is reached, the depth-limited methods return nil instead of raising

Identifying Problematic Kits

Use the provided Rake tasks to identify kits with potential circular references:

# Identify circular references in kit structures
bundle exec rake kit:identify_circular_references

# Test next_available method for all kits
bundle exec rake kit:test_next_available
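
The kind of check such a diagnostic performs can be sketched as a depth-first search that tracks which kits are currently on the traversal path. This is not the contents of the Rake task; the method names and the plain-hash input (kit id mapped to its direct component ids) are assumptions for illustration.

```ruby
# kits: Hash mapping a kit id to an array of its direct component ids.
# Returns an array of cycles, each given as a path that ends where it started.
def circular_kit_paths(kits)
  state = {} # id => :visiting while on the current path, :done afterwards
  cycles = []
  kits.each_key { |id| walk_kit(kits, id, [], state, cycles) }
  cycles
end

def walk_kit(kits, id, path, state, cycles)
  if state[id] == :visiting
    # Reached a kit that is already on the current path: a cycle.
    cycles << (path.drop(path.index(id)) + [id])
    return
  end
  return if state[id] == :done

  state[id] = :visiting
  kits.fetch(id, []).each { |child| walk_kit(kits, child, path + [id], state, cycles) }
  state[id] = :done
end

kits = {
  "A" => ["B"],
  "B" => ["A"], # circular: A -> B -> A
  "C" => ["D"],
  "D" => []
}
cycles = circular_kit_paths(kits)
puts cycles.inspect
```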

Monitoring

Monitor Rollbar for the following error types:

  • stack_level_too_deep - Infinite recursion detected
  • catalog_item_processing_error - General catalog item processing errors
  • orchestrator_execution_error - Orchestrator-level execution errors
  • inventory_flow_execution_error - Worker-level execution errors

Configuration

Depth Limits

The default maximum depth is set to 10 levels. This can be adjusted by modifying the max_depth parameter in the depth-limited methods:

# In inventory_message_processor.rb
next_available_by_warehouse = ci.next_available_by_warehouse_with_depth_limit(
  use_alternate_warehouse: true,
  max_depth: 15  # Adjust as needed
)

Error Handling

Error handling can be customized by modifying the rescue blocks around individual item processing in each InventoryMessageProcessor.

Testing

Unit Tests

Test the depth-limited methods to ensure they prevent infinite recursion:

# Test that the depth limit is respected
store_item = StoreItem.find(123)
# The method may return a value or nil, but must not raise SystemStackError
expect {
  store_item.next_available_with_depth_limit(max_depth: 5, current_depth: 0)
}.not_to raise_error

Integration Tests

Test the full inventory flow to ensure errors are handled gracefully:

# Test that one bad item doesn't stop the batch
processor = Edi::MftGateway::InventoryMessageProcessor.new
# Should complete even when individual items fail
expect { processor.process }.not_to raise_error

Rollback Plan

If issues arise, the changes can be rolled back by:

  1. Reverting Method Calls: Change next_available_by_warehouse_with_depth_limit back to next_available_by_warehouse
  2. Removing Exception Handling: Remove the begin/rescue blocks around individual item processing
  3. Removing Depth-Limited Methods: Remove the new depth-limited methods from models

Future Improvements

1. Circular Reference Detection

Implement database-level constraints to prevent circular references from being created in the first place.

2. Performance Optimization

Cache next_available results to avoid recalculating the same values multiple times.
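
One possible shape for such a cache is a per-batch memo keyed by item id, sketched below. AvailabilityCache is a hypothetical class, not existing code. It uses key? so that nil results (for example, when a depth limit was hit) are cached as well and not recomputed.

```ruby
# Hypothetical per-batch memo for availability lookups.
class AvailabilityCache
  def initialize
    @values = {}
  end

  # Returns the cached value for item_id, computing it with the block once.
  def fetch(item_id)
    return @values[item_id] if @values.key?(item_id)

    @values[item_id] = yield
  end
end

cache = AvailabilityCache.new
computations = 0

3.times do
  cache.fetch(101) do
    computations += 1
    nil # a nil result is still cached
  end
end

value = cache.fetch(202) { 7 }
```

A cache like this would need to be scoped to a single batch run, since availability changes over time.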

3. Monitoring and Alerting

Set up proactive monitoring to detect when depth limits are being reached, indicating potential data quality issues.

4. Data Validation

Add validation rules to prevent circular references during kit creation and updates.

Support

For questions or issues related to this fix:

  1. Check Rollbar logs for detailed error context
  2. Run the diagnostic Rake tasks to identify problematic kits
  3. Review the error handling logs for specific failure patterns
  4. Contact the development team with specific error details