
Pipeline Performance

Optimization techniques and best practices for fast, efficient pipelines.


Performance Fundamentals

What Makes Pipelines Fast or Slow?

Fast pipelines:

  • Small datasets (< 1MB)
  • Simple transformations (filter, pick fields)
  • Few steps (< 10)
  • Efficient utilities

Slow pipelines:

  • Large datasets (> 10MB)
  • Complex transformations (aggregate, sort)
  • Many steps (> 20)
  • Inefficient utility combinations

Key insight: Pipeline execution time is mostly about data size and step complexity.


Optimization Strategies

1. Filter Early

✨ Best practice: Reduce dataset size as early as possible.

Before (slow):

Input → Aggregate → Filter → Output

Aggregates entire dataset, then filters. Wasteful.

After (fast):

Input → Filter → Aggregate → Output

Filters first, aggregates smaller dataset. Much faster.

Impact: Can reduce execution time by 50-90%.
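The reordering above can be sketched in plain JavaScript (the array and its `region`/`amount` fields are hypothetical stand-ins for pipeline data, not the tool's internals):

```javascript
// Hypothetical dataset: 10,000 order records.
const orders = Array.from({ length: 10000 }, (_, i) => ({
  region: i % 4 === 0 ? 'EU' : 'US',
  amount: i % 100,
}));

// Fast shape: Filter first, so Aggregate only sees 2,500 EU items
// instead of all 10,000.
const euTotal = orders
  .filter((o) => o.region === 'EU')       // Filter step
  .reduce((sum, o) => sum + o.amount, 0); // Aggregate step (sum)
```

Here the expensive Aggregate step touches a quarter of the data; the earlier the Filter runs, the less every later step has to process.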


2. Pick Fields First

Remove unnecessary fields early:

Input → Pick Fields → [process with fewer fields] → Output

Why it helps:

  • Fewer fields = less data to process
  • Less memory usage
  • Faster downstream operations

Impact: 20-40% faster for wide datasets (50+ fields).
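A Pick Fields step amounts to a projection over each row; a minimal sketch in plain JavaScript (field names here are hypothetical):

```javascript
// Hypothetical wide rows: only `id` and `price` are needed downstream.
const rows = [
  { id: 1, name: 'a', price: 10, createdAt: '2024-01-01', notes: 'free text' },
  { id: 2, name: 'b', price: 20, createdAt: '2024-01-02', notes: 'free text' },
];

// Keep only the listed fields, shrinking every row before further steps.
const pickFields = (items, fields) =>
  items.map((item) => Object.fromEntries(fields.map((f) => [f, item[f]])));

const slim = pickFields(rows, ['id', 'price']);
```

Every step after the projection now copies, compares, and serializes two fields per row instead of five.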


3. Avoid Redundant Steps

Don't process the same data multiple times:

❌ Bad: Input → Filter Fields (keeps A,B) → Remove Fields (removes C) → Sort → Output

✅ Good: Input → Filter Fields (keeps A,B) → Sort → Output

Why: Two field-filtering steps are redundant. Combine them.


4. Use Efficient Utilities

Some utilities are faster than others:

Utility         Speed       Use Case
Pick Fields     Very fast   Keep specific fields
Filter Fields   Fast        Remove specific fields
Clean JSON      Fast        Basic cleanup
Format Values   Moderate    Type conversions
Aggregate       Slow        Calculations on arrays
Sort            Slow        Reorder arrays

Tip: Use faster utilities when possible.


5. Split Large Pipelines

Break into multiple smaller pipelines:

Pipeline 1: Input → Filter → Clean → Output
Pipeline 2: [Pipeline 1 output] → Transform → Output
Pipeline 3: [Pipeline 2 output] → Aggregate → Output

Benefits:

  • Easier to debug
  • Can optimize each pipeline separately
  • Easier to reuse components

Data Size Optimization

Estimate Execution Time

Rough estimates:

Dataset Size   Simple Pipeline   Complex Pipeline
< 1MB          < 1 sec           1-5 sec
1-10MB         1-5 sec           5-30 sec
10-50MB        5-30 sec          30-120 sec
> 50MB         30+ sec           2+ min

Your mileage may vary based on step complexity.

Reduce Data Size

Strategies:

  1. Sample data for development:

    Input → Filter (take first 100 items) → [dev pipeline]
  2. Filter early:

    Input → Filter (by date range) → [process recent data]
  3. Pick essential fields:

    Input → Pick Fields (only what you need) → [process]
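Strategy 1 is just a take-first-N filter; a minimal sketch (the dataset and helper name are hypothetical):

```javascript
// Hypothetical full dataset of 50,000 items.
const fullDataset = Array.from({ length: 50000 }, (_, i) => ({ id: i }));

// Dev-time sample: take the first 100 items so each run stays fast.
const takeFirst = (items, n) => items.slice(0, n);
const devData = takeFirst(fullDataset, 100);
```

Develop against `devData`, then swap the full dataset back in once the logic is verified.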

Utility-Specific Optimization

Aggregation

Aggregating large arrays is slow. Optimize:

1. Filter before aggregating:

❌ Input → Aggregate (sum of all) → Output
✅ Input → Filter (by condition) → Aggregate (sum of filtered) → Output

2. Pick fields before aggregating:

❌ Input → Aggregate (on all 50 fields) → Output
✅ Input → Pick Fields (only 5 needed) → Aggregate → Output

3. Use specific aggregation operations:

❌ Aggregate (get stats, then extract sum)
✅ Aggregate (operation: sum), which is faster
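The third point can be sketched like this (`fullStats` and `sumOnly` are hypothetical implementations, not the tool's actual code): computing one statistic does strictly less work than computing all of them and discarding most.

```javascript
const values = [3, 1, 4, 1, 5, 9];

// Slow shape: compute every statistic, then keep only the sum.
const fullStats = (xs) => ({
  sum: xs.reduce((a, b) => a + b, 0),
  min: Math.min(...xs),
  max: Math.max(...xs),
  mean: xs.reduce((a, b) => a + b, 0) / xs.length,
});

// Fast shape: run only the operation you need.
const sumOnly = (xs) => xs.reduce((a, b) => a + b, 0);

const total = sumOnly(values);
```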

Sorting

Sorting is slow for large arrays.

Optimizations:

  1. Sort filtered data:

    Input → Filter (reduce size) → Sort → Output
  2. Sort once, not multiple times:

    ❌ Input → Sort → [process] → Sort → Output
    ✅ Input → Sort → [process] → Output
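Sorting once is safe whenever the steps after the sort preserve order; a minimal sketch (field names hypothetical):

```javascript
const items = [{ score: 2 }, { score: 9 }, { score: 5 }];

// Sort once (the slow step), on a copy so the input stays untouched.
const sorted = [...items].sort((a, b) => b.score - a.score);

// A field rename is order-preserving, so no second sort is needed.
const processed = sorted.map(({ score }) => ({ points: score }));
```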

Find & Replace

Optimizing find/replace:

  1. Be specific — Narrow the search scope
  2. Use case-sensitive matching — Faster than case-insensitive
  3. Limit scope — Use target paths when possible

Memory Optimization

Memory Usage Patterns

Per execution:

  • Input data: 1x dataset size
  • Each step output: 1x dataset size (temporarily)
  • Peak usage: 2-3x dataset size

Example:

  • 10MB input → ~20-30MB peak memory usage

Reduce Memory Usage

Strategies:

  1. Simplify pipeline — Fewer steps = less memory
  2. Filter early — Reduce data size sooner
  3. Avoid multiple outputs — Don't branch if not needed
  4. Close other tabs — Free up browser memory

Execution Optimization

Worker Efficiency

Web Worker Architecture:

Main Thread (UI) ←→ Worker (Execution) ←→ Storage Worker (Persistence)

Optimizations:

  1. Worker reuse — Worker stays alive, avoiding recreation
  2. Efficient messaging — Minimize data transfer between threads
  3. Lazy loading — Load step outputs on demand, not all at once

Step Execution Order

Topological sort determines execution order:

Input → A → C → Output
         ↘ B ↗

Execution order: Input → A → (B, C in parallel) → Output
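A minimal sketch of how such an order can be computed (Kahn's algorithm; the step graph below is a hypothetical example, not the tool's internal representation):

```javascript
// Edges point from each step to the steps that consume its output.
const graph = {
  Input: ['A'],
  A: ['B', 'C'],
  B: ['Output'],
  C: ['Output'],
  Output: [],
};

function topoOrder(g) {
  // Count incoming edges for every node.
  const indegree = Object.fromEntries(Object.keys(g).map((n) => [n, 0]));
  for (const targets of Object.values(g))
    for (const t of targets) indegree[t] += 1;

  // Start from nodes with no inputs; whenever several nodes become ready
  // at once (B and C here), those are the ones that could run in parallel.
  const ready = Object.keys(g).filter((n) => indegree[n] === 0);
  const order = [];
  while (ready.length) {
    const n = ready.shift();
    order.push(n);
    for (const t of g[n]) if (--indegree[t] === 0) ready.push(t);
  }
  return order;
}

const order = topoOrder(graph);
```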

Optimization:

  • Place expensive steps later in the pipeline
  • Only if they don't affect upstream filtering

Caching Strategies

Step Output Caching

How it works:

  • Step outputs stored during execution
  • Loaded on-demand when viewed
  • Cached in memory for fast access
  • Cleared when pipeline changes

Benefits:

  • Faster initial execution (no rendering overhead)
  • Lower memory usage (only load viewed steps)
  • Better UX for large pipelines

Manual cache control:

  • Refresh page → clears cache
  • Pipeline change → clears cache
  • Can manually clear cache in DevTools

Browser Performance

Browser Differences

Fastest: Chrome, Edge (Blink engine)

  • OPFS support (fastest storage)
  • Efficient worker implementation
  • Good performance

Moderate: Firefox (Gecko engine)

  • No OPFS (uses IndexedDB)
  • Slower storage
  • Still good performance

Slowest: Safari (WebKit engine)

  • No OPFS (uses IndexedDB)
  • Slower worker performance
  • Higher overhead

Mobile Performance

Mobile browsers are slower:

  • Less memory available
  • Slower JavaScript execution
  • Limited storage

Optimizations for mobile:

  • Use smaller datasets
  • Simplify pipelines
  • Use desktop for complex work

Monitoring Performance

Measure Execution Time

In the UI:

  • Check step duration labels where available
  • Look for slow steps

In the console:

// Enable performance logging
localStorage.debug = 'pipeline:*';

Identify Bottlenecks

Find the slowest step:

  1. Run pipeline
  2. Review each completed step's duration
  3. Note execution times
  4. Focus on optimizing slowest step

Common bottlenecks:

  • Aggregation on large arrays
  • Sorting large arrays
  • Complex transformations (restructure, compute)

Performance Testing

Test with Sample Data

Development workflow:

  1. Test small — Use a 10-100 item sample
  2. Verify logic — Ensure correctness
  3. Test medium — Use 1K-10K items
  4. Measure performance — Check execution time
  5. Optimize if needed — Apply optimizations
  6. Test large — Use the full dataset (if needed)

Benchmark Your Pipeline

Create performance baseline:

  1. Run pipeline with representative data
  2. Record execution time
  3. Document baseline performance
  4. Re-measure after changes
  5. Compare to baseline
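A rough way to record that baseline from the console is a small timing harness (the sketch below is hypothetical, built on the standard `performance.now()` timer):

```javascript
// Time a pipeline-like function over a dataset and report the duration.
function benchmark(label, pipelineFn, data) {
  const start = performance.now();
  const result = pipelineFn(data);
  const durationMs = performance.now() - start;
  return { label, durationMs, result };
}

// Hypothetical pipeline: filter even values and count them.
const data = Array.from({ length: 1000 }, (_, i) => ({ value: i }));
const run = benchmark('baseline', (items) =>
  items.filter((x) => x.value % 2 === 0).length, data);
```

Record `run.durationMs` with representative data before optimizing, then re-run and compare after each change.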

Performance Trade-offs

Speed vs. Completeness

Faster (less complete):

  • Fewer steps
  • Simpler transformations
  • Less data validation

Slower (more complete):

  • More steps
  • Complex transformations
  • Thorough validation

Choose based on needs:

  • Development → Faster is better
  • Production → Completeness matters

Speed vs. Memory

Less memory (slower):

  • Stream processing (future feature)
  • Incremental loading
  • Frequent garbage collection

More memory (faster):

  • Load everything at once
  • Cache intermediate results
  • Less processing overhead

Advanced Optimization

Pipeline Parallelization

Future feature: Execute independent steps in parallel

Input → Filter → Sort → Output
              ↘ Aggregate ↗

Currently sequential, could be parallel in future.

Incremental Execution

Future feature: Only re-run changed steps

Benefit: Much faster iteration during development.

Streaming Execution

Future feature: Process data in chunks

Benefit: Start showing results before completion.


Performance Checklist

Use this checklist to optimize your pipelines:

Data Size:

  • Dataset < 10MB for optimal performance
  • Filter early to reduce size
  • Pick fields early to reduce width

Pipeline Structure:

  • Less than 20 steps
  • No redundant steps
  • Efficient utility combinations
  • Simple before complex

Execution:

  • Fast steps first, slow steps last
  • Minimal branching
  • No circular dependencies

Memory:

  • Close unnecessary tabs
  • Clear cache periodically
  • Use desktop for complex pipelines
