Resilience Quick Start
This guide provides quick-start checklists and patterns for the two most common resilience scenarios: node restarts and retry delay strategies.
⚠️ RESILIENCE READINESS CHECKLIST
Before deploying any pipeline with resilience features, you MUST satisfy all three of these requirements:
- ResilientExecutionStrategy Configured: Node is wrapped with ResilientExecutionStrategy()
- MaxNodeRestartAttempts > 0: Set to a positive number (typically 2-3) in PipelineRetryOptions
- MaxMaterializedItems is Set: Configured to a positive bounded value (recommended: 100-1,000), NEVER null
If ANY of these is missing or incorrect, your resilience configuration is silently incomplete and your pipeline will fail in production instead of recovering gracefully.
Use the NPipeline analyzers to catch these issues at build time:
- NP9001: Detects incomplete restart configuration (error severity recommended)
- NP9501: Detects unbounded materialization (error severity required)
Enable these in your .editorconfig to make them blocking errors in your build pipeline.
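For example, using the standard Roslyn analyzer severity keys (rule IDs as listed above):
[*.cs]
dotnet_diagnostic.NP9001.severity = error
dotnet_diagnostic.NP9501.severity = error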
Node Restart - Quick Start Checklist
⚠️ CRITICAL: READ THIS BEFORE USING PipelineErrorDecision.RestartNode
Node restart is a powerful resilience feature, but it requires three mandatory prerequisites. Missing even one silently disables the feature, and your entire pipeline fails.
If you've experienced mysterious pipeline failures where restart seemed enabled but didn't work, one of these requirements was missing.
Good news: The NPipeline build-time analyzer (NP9001) automatically detects incomplete restart configurations at compile time. Enable NP9001 in your .editorconfig to catch these issues before deployment.
The Three-Step Mandatory Checklist
STEP 1: Apply ResilientExecutionStrategy
Your node must be wrapped with ResilientExecutionStrategy. This enables the restart capability at the node level.
What it does: Allows the pipeline to restart the node when an error occurs.
Without it: Restart decisions are ignored; the node cannot recover.
How to configure:
var nodeHandle = builder
.AddTransform<MyTransform, Input, Output>("myNode")
.WithExecutionStrategy(
builder,
new ResilientExecutionStrategy(new SequentialExecutionStrategy())
);
Learn more: Execution Strategies
STEP 2: Configure Maximum Restart Attempts
Set MaxNodeRestartAttempts > 0 in PipelineRetryOptions. This tells the pipeline how many times to attempt restarting a failed node.
What it does: Limits how many restart attempts the pipeline will make before giving up.
Without it: No restarts will be attempted.
How to configure:
var options = new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2, // ← At least 1
MaxMaterializedItems: 1000 // (see Step 3!)
);
var context = PipelineContext.WithRetry(options);
Recommended starting values:
- Transient failures (network, temporary service issues): 2-3 attempts
- Persistent issues (resource exhaustion): 3-5 attempts
- Critical nodes: 5+ attempts
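For example, a critical node might use a higher ceiling (the values below are illustrative, not prescriptive):
// Illustrative values for a critical node; tune to your workload
var criticalOptions = new PipelineRetryOptions(
    MaxItemRetries: 3,
    MaxNodeRestartAttempts: 5, // critical node: more restart attempts
    MaxMaterializedItems: 2000 // larger replay buffer (see Step 3)
);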
Learn more: Retry Configuration
STEP 3: Enable Input Materialization (⚠️ CRITICAL)
Set MaxMaterializedItems to a non-null, positive number on the input to the node you want to be restartable. This is the replay buffer.
What it does: Buffers items from the input source so the node can be replayed from a known state if it fails.
🚨 CRITICAL ISSUE: If MaxMaterializedItems is null (unbounded), the system silently falls back to FailPipeline, even if you've configured restart logic. Your entire pipeline crashes instead of just restarting the node.
How to configure:
var options = new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2,
MaxMaterializedItems: 1000 // ← Must be set! Not null!
);
Choosing a value:
| Scenario | Recommended Value | Rationale |
|---|---|---|
| Small objects (< 100 bytes) | 5,000-10,000 | Minimal memory overhead |
| Medium objects (100-1KB) | 1,000-5,000 | Balanced buffer size |
| Large objects (1KB+) | 500-1,000 | Respect memory constraints |
| Critical/high-priority node | 2,000-5,000 | More replay buffer = higher restart success rate |
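As a back-of-envelope check, the worst-case buffer footprint is roughly MaxMaterializedItems × average item size. A minimal sketch (the item size here is an illustrative assumption):
// Rough worst-case estimate of the replay buffer per restartable node
const int maxMaterializedItems = 1_000;
const int avgItemSizeBytes = 512; // assumed medium-sized objects
long approxBufferBytes = (long)maxMaterializedItems * avgItemSizeBytes;
Console.WriteLine($"~{approxBufferBytes / 1024} KB per node"); // ~500 KB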
Conservative Starting Point
Here's a good default configuration:
var options = new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2,
MaxMaterializedItems: 1000 // ← Good default: buffers ~1000 items
);
Learn more: Materialization and Buffering
⚠️ CRITICAL WARNING: Unbounded Materialization
Never set MaxMaterializedItems to null:
// WRONG - This disables restart silently!
var options = new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2,
MaxMaterializedItems: null // ← DANGER!
);
If you do:
- Your pipeline will NOT restart on failures
- The system silently falls back to FailPipeline
- Your entire pipeline will crash (not just the failing node)
- Risk of OutOfMemoryException with unbounded data streams
Symptom: You've configured restart logic, but when an error occurs, the pipeline fails completely instead of restarting the node.
Why Unbounded Memory Buffers Break Resilience Guarantees
Unbounded materialization (MaxMaterializedItems: null) creates a fundamental contradiction in the resilience model:
- Memory Safety vs. Recovery Trade-off: Unbounded buffers can consume all available memory, causing an OutOfMemoryException that cannot be recovered from. This defeats the purpose of resilience.
- Silent Failure Mode: When the system detects unbounded materialization with a RestartNode decision, it cannot safely buffer items for replay. Instead of risking memory exhaustion, it silently falls back to FailPipeline to protect the system.
- Unpredictable Behavior: In production, unbounded buffers lead to unpredictable memory usage patterns that can cause cascading failures across the entire system.
- Resource Contention: Unbounded buffers compete with other processes for memory, potentially causing system-wide instability.
The Design Philosophy: NPipeline prioritizes system stability over incomplete recovery. An unbounded buffer represents an undefined recovery boundary, making safe restart impossible. By requiring explicit buffer limits, NPipeline ensures that restart operations have predictable memory footprints and can be safely executed.
Choosing Not to Set a Memory Cap = Choosing Complete Pipeline Failure
When you set MaxMaterializedItems: null, you are making an explicit choice to sacrifice restart capability in favor of unlimited buffering. This means:
- You accept that RestartNode will not work
- You accept that your pipeline will fail completely on node errors
- You accept the risk of OutOfMemoryException
If you need node restart functionality, you must set a memory cap. The system cannot provide resilience guarantees without defined resource boundaries.
Complete Configuration Example
Here's a complete example with all three requirements:
// Steps 2 & 3: Create retry options with ALL three settings
var options = new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2, // ← Step 2: Restart attempts
MaxMaterializedItems: 1000 // ← Step 3: Replay buffer
);
var context = PipelineContext.WithRetry(options);
// Build the pipeline with resilient nodes (Step 1 happens inside the definition below)
var definition = new MyPipelineDefinition();
definition.Define(builder, context);
// In MyPipelineDefinition.Define():
//
// Step 1: Wrap node with ResilientExecutionStrategy
// var nodeHandle = builder
// .AddTransform<MyTransform, Input, Output>("risky")
// .WithExecutionStrategy(
// builder,
// new ResilientExecutionStrategy(
// new SequentialExecutionStrategy()
// )
// );
//
// Add error handler that returns RestartNode
// builder.AddPipelineErrorHandler<MyErrorHandler>();
//
// In MyErrorHandler.HandleNodeFailureAsync():
//
// return error switch
// {
// TimeoutException => PipelineErrorDecision.RestartNode,
// _ => PipelineErrorDecision.FailPipeline
// };
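For completeness, here is a minimal sketch of what MyErrorHandler might look like. The method signature and return type are assumptions inferred from the comments above; consult the Error Handling guide for NPipeline's actual error-handler contract.
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the parameter list and return type are assumptions;
// NPipeline's error-handler interface defines the real signature.
public sealed class MyErrorHandler
{
    public ValueTask<PipelineErrorDecision> HandleNodeFailureAsync(
        string nodeId,
        Exception error,
        CancellationToken cancellationToken)
    {
        // Restart the node on transient timeouts; fail the pipeline otherwise.
        var decision = error switch
        {
            TimeoutException => PipelineErrorDecision.RestartNode,
            _ => PipelineErrorDecision.FailPipeline
        };

        return ValueTask.FromResult(decision);
    }
}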
Retry Delay Quick Start
Retry delays are essential for building resilient pipelines that can recover from transient failures without overwhelming downstream services.
Quick Start: Common Retry Patterns
Web API Calls (Most Common)
// Recommended for external API calls
context.UseExponentialBackoffDelay(
baseDelay: TimeSpan.FromSeconds(1),
multiplier: 2.0,
maxDelay: TimeSpan.FromMinutes(1));
Why this pattern:
- Exponential backoff gives services time to recover
- Prevents thundering herd problems
- Balanced for typical API rate limits
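Assuming the conventional backoff formula delay = min(baseDelay × multiplier^(attempt − 1), maxDelay), these settings produce delays of roughly 1s, 2s, 4s, 8s, ..., capped at 60s. A quick sketch of that schedule (NPipeline's exact schedule, e.g. any jitter, may differ):
// Prints the delay schedule implied by baseDelay=1s, multiplier=2.0, maxDelay=60s
for (var attempt = 1; attempt <= 8; attempt++)
{
    var seconds = Math.Min(Math.Pow(2.0, attempt - 1), 60.0);
    Console.WriteLine($"attempt {attempt}: {seconds}s"); // 1, 2, 4, 8, 16, 32, 60, 60
}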
Database Operations
// Recommended for database retry scenarios
context.UseLinearBackoffDelay(
baseDelay: TimeSpan.FromMilliseconds(100),
increment: TimeSpan.FromMilliseconds(200),
maxDelay: TimeSpan.FromSeconds(5));
Why this pattern:
- Linear growth is predictable for database connection pools
- Shorter delays work well for transient lock contention
- Conservative max delay prevents long-running transactions
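Assuming delay = baseDelay + increment × (attempt − 1), these values yield delays of 100 ms, 300 ms, 500 ms, 700 ms, ..., capped at 5 s.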
File Processing
// Recommended for file system operations
context.UseFixedDelay(TimeSpan.FromSeconds(2));
Why this pattern:
- File system recovery is typically immediate
- Fixed delay provides predictable behavior
- Simple and effective for I/O operations
Decision Flow for Choosing Retry Strategy
- Calling an external web API or rate-limited service? → Exponential backoff
- Retrying database operations (connection pools, lock contention)? → Linear backoff
- File system or other I/O that recovers quickly? → Fixed delay
- Purely in-memory operation? → Short fixed delay (exponential backoff just adds latency)
Implementation Examples
Basic Pipeline with Retry Delays
using NPipeline;
using NPipeline.Pipeline;
public sealed class RetryQuickstartPipeline : IPipelineDefinition
{
public void Define(PipelineBuilder builder, PipelineContext context)
{
// Configure retry delay strategy based on operation type
context.UseExponentialBackoffDelay(
baseDelay: TimeSpan.FromSeconds(1),
multiplier: 2.0,
maxDelay: TimeSpan.FromMinutes(1));
var source = builder.AddSource<ApiSource, ApiResponse>("api-source");
var transform = builder.AddTransform<DataTransform, ApiResponse, ProcessedData>("transform");
var sink = builder.AddSink<DataSink, ProcessedData>("sink");
builder.Connect(source, transform);
builder.Connect(transform, sink);
// Configure retry options
builder.WithRetryOptions(new PipelineRetryOptions(
MaxItemRetries: 3,
MaxNodeRestartAttempts: 2,
MaxMaterializedItems: 1000, // required for node restart (see checklist above)
MaxSequentialNodeAttempts: 5
));
}
}
Per-Node Retry Configuration
public void Define(PipelineBuilder builder, PipelineContext context)
{
var source = builder.AddSource<DataSource, string>("source");
// Fast retries for in-memory operations
context.UseFixedDelay(TimeSpan.FromMilliseconds(50));
var memoryTransform = builder.AddTransform<MemoryTransform, string, string>("memory-ops");
// Slower retries for external API calls
context.UseExponentialBackoffDelay(
baseDelay: TimeSpan.FromSeconds(1),
multiplier: 2.0,
maxDelay: TimeSpan.FromMinutes(1));
var apiTransform = builder.AddTransform<ApiTransform, string, string>("api-ops");
// Default retries for other operations
var sink = builder.AddSink<DataSink, string>("sink");
builder.Connect(source, memoryTransform);
builder.Connect(memoryTransform, apiTransform);
builder.Connect(apiTransform, sink);
}
Common Pitfalls to Avoid
⚠️ Common Mistake: Not configuring retry delays at all
Without retry delays, failed operations retry immediately, which can overwhelm recovering services.
⚠️ Common Mistake: Using exponential backoff for in-memory operations
In-memory operations typically recover immediately, so exponential backoff adds unnecessary delays.
⚠️ Common Mistake: Setting max delay too high
Very high max delays can cause long recovery times. Consider your SLA requirements.
Pro Tip: Always test retry behavior in development
Use fixed delays in tests for predictable behavior, then switch to appropriate backoff in production.
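For instance, a test setup might pin retries to a tiny fixed delay so assertions stay deterministic (the value is illustrative):
// Test setup: deterministic, near-instant retries
context.UseFixedDelay(TimeSpan.FromMilliseconds(1));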
Next Steps
- Error Handling - Comprehensive error handling strategies
- Retry Configuration - Detailed configuration options and advanced patterns
- Circuit Breakers - Preventing cascading failures
- Troubleshooting - Common issues and solutions