Pipeline Performance
Optimization techniques and best practices for fast, efficient pipelines.
Performance Fundamentals
What Makes Pipelines Fast or Slow?
Fast pipelines:
- Small datasets (< 1MB)
- Simple transformations (filter, pick fields)
- Few steps (< 10)
- Efficient utilities
Slow pipelines:
- Large datasets (> 10MB)
- Complex transformations (aggregate, sort)
- Many steps (> 20)
- Inefficient utility combinations
Key insight: Pipeline execution time is mostly about data size and step complexity.
Optimization Strategies
1. Filter Early
✨ Best practice: Reduce dataset size as early as possible.
Before (slow):
Input → Aggregate → Filter → Output
Aggregates the entire dataset, then filters. Wasteful.
After (fast):
Input → Filter → Aggregate → Output
Filters first, aggregates the smaller dataset. Much faster.
Impact: Can reduce execution time by 50-90%.
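The before/after shapes above can be sketched with plain array operations (the dataset and field names here are hypothetical, not part of the tool):

```javascript
// Hypothetical dataset: three orders with a region and a total.
const orders = [
  { region: "EU", total: 120 },
  { region: "US", total: 80 },
  { region: "EU", total: 40 },
];

// Fast shape: filter first, so the aggregate only sees matching items.
const euOrders = orders.filter((o) => o.region === "EU");
const euSum = euOrders.reduce((sum, o) => sum + o.total, 0);

console.log(euSum); // 160
```

The aggregate does the same arithmetic either way; filtering first simply means it runs over fewer items.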
2. Pick Fields First
Remove unnecessary fields early:
Input → Pick Fields → [process with fewer fields] → Output
Why it helps:
- Fewer fields = less data to process
- Less memory usage
- Faster downstream operations
Impact: 20-40% faster for wide datasets (50+ fields).
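A minimal sketch of the same idea in plain JavaScript: project away fields no later step reads (the field names are illustrative):

```javascript
// Hypothetical wide rows: only `id` and `name` are needed downstream.
const wideRows = [
  { id: 1, name: "a", debugInfo: "verbose", payload: "x".repeat(1000) },
  { id: 2, name: "b", debugInfo: "verbose", payload: "x".repeat(1000) },
];

// Keep only the fields the rest of the pipeline actually uses.
const narrowRows = wideRows.map(({ id, name }) => ({ id, name }));

console.log(Object.keys(narrowRows[0])); // ["id", "name"]
```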
3. Avoid Redundant Steps
Don't process the same data multiple times:
❌ Bad:
Input → Filter Fields → Remove Fields → Sort → Output
(keeps A,B) (removes C) (sorts)

✅ Good:
Input → Filter Fields → Sort → Output
(keeps A,B) (sorts)

Why: Two field-filtering steps are redundant. Combine them.
4. Use Efficient Utilities
Some utilities are faster than others:
| Utility | Speed | Use Case |
|---|---|---|
| Pick Fields | Very fast | Keep specific fields |
| Filter Fields | Fast | Remove specific fields |
| Clean JSON | Fast | Basic cleanup |
| Format Values | Moderate | Type conversions |
| Aggregate | Slow | Calculations on arrays |
| Sort | Slow | Reorder arrays |
Tip: Use faster utilities when possible.
5. Split Large Pipelines
Break into multiple smaller pipelines:
Pipeline 1: Input → Filter → Clean → Output
Pipeline 2: [Pipeline 1 output] → Transform → Output
Pipeline 3: [Pipeline 2 output] → Aggregate → Output
Benefits:
- Easier to debug
- Can optimize each pipeline separately
- Easier to reuse components
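One way to think about wiring smaller pipelines together is to treat each one as a data-in, data-out function (the stages below are hypothetical, not the tool's API):

```javascript
// Hypothetical stages: each smaller pipeline is just a function.
const stage1 = (rows) => rows.filter((r) => r.active); // Filter + Clean
const stage2 = (rows) => rows.map((r) => ({ ...r, name: r.name.trim() }));

// Chain stages so each one consumes the previous stage's output.
const runStages = (...stages) => (input) =>
  stages.reduce((data, stage) => stage(data), input);

const run = runStages(stage1, stage2);
const out = run([
  { active: true, name: " a " },
  { active: false, name: "b" },
]);

console.log(out); // [{ active: true, name: "a" }]
```

Because each stage stands alone, you can test, time, and reuse them independently.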
Data Size Optimization
Estimate Execution Time
Rough estimates:
| Dataset Size | Simple Pipeline | Complex Pipeline |
|---|---|---|
| < 1MB | < 1 sec | 1-5 sec |
| 1-10MB | 1-5 sec | 5-30 sec |
| 10-50MB | 5-30 sec | 30-120 sec |
| > 50MB | 30+ sec | 2+ min |
Your mileage may vary based on step complexity.
Reduce Data Size
Strategies:
- Sample data for development:
  Input → Filter (take first 100 items) → [dev pipeline]
- Filter early:
  Input → Filter (by date range) → [process recent data]
- Pick essential fields:
  Input → Pick Fields (only what you need) → [process]
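The sampling strategy can be as simple as slicing off the first N items before the rest of the pipeline runs (the dataset here is hypothetical):

```javascript
// Hypothetical full dataset: 5,000 rows.
const fullDataset = Array.from({ length: 5000 }, (_, i) => ({ id: i }));

// Development sample: cap the input before any expensive steps run.
const sampleFirst = (rows, n) => rows.slice(0, n);
const devInput = sampleFirst(fullDataset, 100);

console.log(devInput.length); // 100
```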
Utility-Specific Optimization
Aggregation
Aggregating large arrays is slow. Optimize:
1. Filter before aggregating:
❌ Input → Aggregate (sum of all) → Output
✅ Input → Filter (by condition) → Aggregate (sum of filtered) → Output
2. Pick fields before aggregating:
❌ Input → Aggregate (on all 50 fields) → Output
✅ Input → Pick Fields (only 5 needed) → Aggregate → Output
3. Use specific aggregation operations:
❌ Aggregate (get stats, then extract sum)
✅ Aggregate (operation: sum) → faster
Sorting
Sorting is slow for large arrays.
Optimizations:
- Sort filtered data:
  Input → Filter (reduce size) → Sort → Output
- Sort once, not multiple times:
  ❌ Input → Sort → [process] → Sort → Output
  ✅ Input → Sort → [process] → Output
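Both sort optimizations fit in one sketch: filter first, then sort a single time (the data is illustrative):

```javascript
// Hypothetical rows: filtering shrinks the array before the slow sort.
const rows = [
  { score: 5, keep: true },
  { score: 1, keep: false },
  { score: 3, keep: true },
];

const kept = rows.filter((r) => r.keep);
// slice() copies first so the sort doesn't mutate the filtered array.
const sorted = kept.slice().sort((a, b) => a.score - b.score);

console.log(sorted.map((r) => r.score)); // [3, 5]
```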
Find & Replace
Optimizing find/replace:
- Be specific → Narrow the search scope
- Use case-sensitive matching → Faster than case-insensitive
- Limit scope → Use target paths when possible
Memory Optimization
Memory Usage Patterns
Per execution:
- Input data: 1x dataset size
- Each step output: 1x dataset size (temporarily)
- Peak usage: 2-3x dataset size
Example:
- 10MB input → ~20-30MB peak memory usage
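The 2-3x rule is easy to apply as back-of-envelope arithmetic (the multipliers come from the pattern above, not from a measurement of your browser):

```javascript
// Rough peak-memory estimate: 2-3x the input size, per the rule above.
const estimatePeakMB = (inputMB) => ({ low: inputMB * 2, high: inputMB * 3 });

console.log(estimatePeakMB(10)); // { low: 20, high: 30 }
```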
Reduce Memory Usage
Strategies:
- Simplify pipeline → Fewer steps = less memory
- Filter early → Reduce data size sooner
- Avoid multiple outputs → Don't branch if not needed
- Close other tabs → Free up browser memory
Execution Optimization
Worker Efficiency
Web Worker Architecture:
Main Thread (UI) ↔ Worker (Execution) ↔ Storage Worker (Persistence)
Optimizations:
- Worker reuse → The worker stays alive, avoiding recreation
- Efficient messaging → Minimize data transfer between threads
- Lazy loading → Load step outputs on demand, not all at once
Step Execution Order
Topological sort determines execution order:
Input → A → C → Output
      ↘ B ↗
Execution order: Input → A → (B, C in parallel) → Output
Optimization:
- Place expensive steps later in pipeline
- Only if they don't affect upstream filtering
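A depth-first topological sort over a step graph might look like this (the step ids and edges are hypothetical, not the tool's actual scheduler):

```javascript
// Hypothetical step graph: each step lists the steps it feeds into.
const graph = {
  input: ["a"],
  a: ["b", "c"],
  b: ["output"],
  c: ["output"],
  output: [],
};

// Depth-first topological sort: a step appears before everything it feeds.
function topoSort(g) {
  const order = [];
  const seen = new Set();
  const visit = (node) => {
    if (seen.has(node)) return;
    seen.add(node);
    for (const next of g[node]) visit(next);
    order.unshift(node); // reversed post-order = topological order
  };
  for (const node of Object.keys(g)) visit(node);
  return order;
}

const order = topoSort(graph);
console.log(order); // ["input", "a", "c", "b", "output"]
```

Note that `b` and `c` have no edge between them, which is what makes them candidates for parallel execution.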
Caching Strategies
Step Output Caching
How it works:
- Step outputs stored during execution
- Loaded on-demand when viewed
- Cached in memory for fast access
- Cleared when pipeline changes
Benefits:
- Faster initial execution (no rendering overhead)
- Lower memory usage (only load viewed steps)
- Better UX for large pipelines
Manual cache control:
- Refresh page → clears cache
- Pipeline change → clears cache
- Can manually clear cache in DevTools
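The cache behavior described above can be sketched as a map keyed by step id (the function and step names are hypothetical):

```javascript
// Hypothetical step-output cache, cleared whenever the pipeline changes.
const cache = new Map();

function getStepOutput(stepId, compute) {
  if (!cache.has(stepId)) cache.set(stepId, compute()); // compute once
  return cache.get(stepId); // later views are cache hits
}

function onPipelineChanged() {
  cache.clear(); // stale outputs must not survive an edit
}

let runs = 0;
getStepOutput("filter-1", () => { runs += 1; return [1, 2, 3]; });
getStepOutput("filter-1", () => { runs += 1; return [1, 2, 3]; });
console.log(runs); // 1 — the second call was a cache hit
```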
Browser Performance
Browser Differences
Fastest: Chrome, Edge (Blink engine)
- OPFS support (fastest storage)
- Efficient worker implementation
- Good performance
Moderate: Firefox (Gecko engine)
- No OPFS (uses IndexedDB)
- Slower storage
- Still good performance
Slowest: Safari (WebKit engine)
- No OPFS (uses IndexedDB)
- Slower worker performance
- Higher overhead
Mobile Performance
Mobile browsers are slower:
- Less memory available
- Slower JavaScript execution
- Limited storage
Optimizations for mobile:
- Use smaller datasets
- Simplify pipelines
- Use desktop for complex work
Monitoring Performance
Measure Execution Time
In the UI:
- Check step duration labels where available
- Look for slow steps
In the console:
// Enable performance logging
localStorage.debug = 'pipeline:*';
Identify Bottlenecks
Find the slowest step:
- Run pipeline
- Review each completed stepβs duration
- Note execution times
- Focus on optimizing slowest step
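If step durations aren't visible in the UI, you can time a transformation yourself with `performance.now()`, which is available in modern browsers and Node (the step below is hypothetical):

```javascript
// Hypothetical slow step: sorting a large numeric array.
const slowStep = (rows) => rows.slice().sort((a, b) => a - b);

const data = Array.from({ length: 100000 }, () => Math.random());

const t0 = performance.now();
const result = slowStep(data);
const t1 = performance.now();

console.log(`slowStep took ${(t1 - t0).toFixed(1)} ms`);
```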
Common bottlenecks:
- Aggregation on large arrays
- Sorting large arrays
- Complex transformations (restructure, compute)
Performance Testing
Test with Sample Data
Development workflow:
- Test small → Use a 10-100 item sample
- Verify logic → Ensure correctness
- Test medium → Use 1K-10K items
- Measure performance → Check execution time
- Optimize if needed → Apply optimizations
- Test large → Use full dataset (if needed)
Benchmark Your Pipeline
Create performance baseline:
- Run pipeline with representative data
- Record execution time
- Document baseline performance
- Re-measure after changes
- Compare to baseline
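A baseline comparison can be a few lines: record a time once, then flag runs that drift past a tolerance (the numbers here are placeholders, not measurements):

```javascript
// Placeholder baseline recorded from an earlier representative run.
const baselineMs = 120;

// Flag a regression when the current run exceeds baseline + tolerance.
function compareToBaseline(currentMs, tolerance = 0.2) {
  const limit = baselineMs * (1 + tolerance);
  return currentMs <= limit ? "within baseline" : "regression";
}

console.log(compareToBaseline(130)); // "within baseline" (limit is 144 ms)
console.log(compareToBaseline(200)); // "regression"
```

A tolerance band matters because browser timings are noisy; comparing raw numbers run-to-run produces false alarms.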
Performance Trade-offs
Speed vs. Completeness
Faster (less complete):
- Fewer steps
- Simpler transformations
- Less data validation
Slower (more complete):
- More steps
- Complex transformations
- Thorough validation
Choose based on needs:
- Development → Faster is better
- Production → Completeness matters
Speed vs. Memory
Less memory (slower):
- Stream processing (future feature)
- Incremental loading
- Frequent garbage collection
More memory (faster):
- Load everything at once
- Cache intermediate results
- Less processing overhead
Advanced Optimization
Pipeline Parallelization
Future feature: Execute independent steps in parallel
Input → Filter → Sort → Output
              ↘ Aggregate ↗
Currently sequential, could be parallel in future.
Incremental Execution
Future feature: Only re-run changed steps
Benefit: Much faster iteration during development.
Streaming Execution
Future feature: Process data in chunks
Benefit: Start showing results before completion.
Performance Checklist
Use this checklist to optimize your pipelines:
Data Size:
- Dataset < 10MB for optimal performance
- Filter early to reduce size
- Pick fields early to reduce width
Pipeline Structure:
- Less than 20 steps
- No redundant steps
- Efficient utility combinations
- Simple before complex
Execution:
- Fast steps first, slow steps last
- Minimal branching
- No circular dependencies
Memory:
- Close unnecessary tabs
- Clear cache periodically
- Use desktop for complex pipelines
Next Steps
- Troubleshooting → Common issues
- Building Basics → Creating pipelines
- Execution → Running pipelines