Performance Guide

Tips and benchmarks for optimal PyBGPKITStream performance.

Performance Overview

Compared with PyBGPStream, PyBGPKITStream is substantially faster at processing updates but currently slower at processing RIBs:

Updates: 3–10× faster than PyBGPStream
RIBs: Currently 3–4× slower (optimization in progress)
Memory: Minimal due to lazy loading

For detailed benchmark results, see perf.md in the repository.
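
To check figures like these against your own workload, a small timing harness is usually enough. The sketch below is a generic helper and not part of PyBGPKITStream: `measure_throughput` is a hypothetical name, and the generator is only a stand-in for a real stream. Because the helper accepts any iterable, a stream object can be passed in its place.

```python
import time
from typing import Iterable, Tuple


def measure_throughput(elements: Iterable) -> Tuple[int, float]:
    """Consume an iterable and return (element count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in elements:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Stand-in for a stream: any iterable works here.
fake_stream = (i for i in range(100_000))
count, elapsed = measure_throughput(fake_stream)
rate = count / elapsed if elapsed > 0 else float("inf")
print(f"{count} elements in {elapsed:.3f}s ({rate:,.0f} elements/s)")
```

Running the same harness once per parser or configuration gives a like-for-like comparison on your own data.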

Parser Selection

The biggest performance factor is parser choice:

# Fastest: bgpkit-parser (about 10× faster than the pybgpkit baseline)
stream = BGPKITStream.from_config(config, parser_name="bgpkit")

# Slow but no dependencies: pybgpkit (baseline)
stream = BGPKITStream.from_config(config, parser_name="pybgpkit")

# Fast: bgpdump
stream = BGPKITStream.from_config(config, parser_name="bgpdump")

Recommendation: Install bgpkit-parser for production use.
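
One way to apply this recommendation without hard-coding a parser is to pick the fastest one whose external tool is installed. The sketch below is an illustration only: `pick_parser_name` is a hypothetical helper, and it assumes the "bgpkit" and "bgpdump" parsers rely on executables named `bgpkit-parser` and `bgpdump` being on `PATH`.

```python
import shutil
from typing import Callable, Optional


def pick_parser_name(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the fastest parser name whose executable is on PATH.

    Assumption: the "bgpkit" and "bgpdump" parsers rely on executables
    named "bgpkit-parser" and "bgpdump"; adjust if your install differs.
    """
    candidates = [("bgpkit", "bgpkit-parser"), ("bgpdump", "bgpdump")]
    for parser_name, executable in candidates:
        if which(executable) is not None:
            return parser_name
    # No external tool found: fall back to the dependency-free parser.
    return "pybgpkit"


parser_name = pick_parser_name()
print(f"Selected parser: {parser_name}")
```

The result can then be passed as `parser_name=parser_name` to `BGPKITStream.from_config`.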

Filtering for Performance

Applying filters reduces data processed and improves speed:

# Unfiltered: processes all elements
stream = BGPKITStream(
    collectors=["route-views.wide"],
    data_type=["updates"],
    ts_start=1283203200,
    ts_end=1283289600,
)

# Filtered: processes fewer elements
stream = BGPKITStream(
    collectors=["route-views.wide"],
    data_type=["updates"],
    ts_start=1283203200,
    ts_end=1283289600,
    filters=FilterOptions(origin_asn=2497),  # Reduces dataset
)
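
The time window also bounds how much data is processed, so it helps to build `ts_start` and `ts_end` from readable dates rather than hand-typed integers. The sketch below assumes these parameters are Unix timestamps in seconds (UTC); `to_epoch` is a hypothetical helper, not a PyBGPKITStream function.

```python
from datetime import datetime, timezone


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole Unix seconds."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(dt.timestamp())


# Assumption: ts_start/ts_end are interpreted as UTC epoch seconds.
ts_start = to_epoch(datetime(2010, 8, 30, 21, 20, tzinfo=timezone.utc))
ts_end = to_epoch(datetime(2010, 8, 31, 21, 20, tzinfo=timezone.utc))
print(ts_start, ts_end, ts_end - ts_start)
```

Under that assumption, this reproduces the values 1283203200 and 1283289600 used above, which span exactly 24 hours (86400 seconds).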

Concurrent Downloads

Control the number of parallel downloads with max_concurrent_downloads:

# Default: 10 concurrent downloads
stream = BGPKITStream(..., max_concurrent_downloads=10)

# For memory-constrained systems
stream = BGPKITStream(..., max_concurrent_downloads=5)

# For high-throughput systems with plenty of memory
stream = BGPKITStream(..., max_concurrent_downloads=20)

RAM Disk Usage

Use a RAM disk (if available) for temporary file storage:

# Automatic: uses /dev/shm (Linux) or /Volumes/RAMDisk (macOS)
stream = BGPKITStream(..., ram_fetch=True)

# Disable to use system temp directory
stream = BGPKITStream(..., ram_fetch=False)

Performance benefit: 2–3× faster I/O on systems with sufficient free RAM.
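
Since the benefit depends on free RAM, it can be worth checking the RAM disk before enabling it. The sketch below is one possible check using only the standard library: `has_free_space` is a hypothetical helper, and the 2 GiB threshold is an arbitrary assumption to tune for your workload.

```python
import os
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if path is a directory with at least required_bytes free."""
    if not os.path.isdir(path):
        return False
    return shutil.disk_usage(path).free >= required_bytes


# Assumption: 2 GiB of headroom is enough for the files in flight.
REQUIRED_BYTES = 2 * 1024**3
use_ram_disk = has_free_space("/dev/shm", REQUIRED_BYTES)
print(f"ram_fetch={use_ram_disk}")
```

The resulting boolean can be passed as `ram_fetch=use_ram_disk`, falling back to the system temp directory when the RAM disk is missing or nearly full.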

Caching Strategy

Reuse cached files to avoid re-downloading:

# Use persistent cache
stream1 = BGPKITStream(
    ...,
    cache_dir="/data/bgp_cache",
)

# Later: same data is reused from cache
stream2 = BGPKITStream(
    ...,
    cache_dir="/data/bgp_cache",
)

Benefit: Subsequent runs skip downloads entirely.
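
The idea behind a persistent cache can be sketched as a simple "download only if missing" check. The example below is illustrative and does not describe PyBGPKITStream's real cache layout: `cached_path`, `needs_download`, the URL, and the host/path directory scheme are all assumptions.

```python
import os
import tempfile
from urllib.parse import urlparse


def cached_path(cache_dir: str, url: str) -> str:
    """Map a URL to a file path under cache_dir (hypothetical layout)."""
    parsed = urlparse(url)
    relative = os.path.join(parsed.netloc, parsed.path.lstrip("/"))
    return os.path.join(cache_dir, relative)


def needs_download(cache_dir: str, url: str) -> bool:
    """Return True when no non-empty cached copy of the URL exists."""
    path = cached_path(cache_dir, url)
    return not (os.path.isfile(path) and os.path.getsize(path) > 0)


with tempfile.TemporaryDirectory() as cache_dir:
    url = "https://example.invalid/updates/updates.20100830.2120.bz2"
    print(needs_download(cache_dir, url))  # first run: nothing cached yet

    # Simulate a completed download by writing the file into the cache.
    target = cached_path(cache_dir, url)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(b"placeholder")
    print(needs_download(cache_dir, url))  # second run: cache hit
```

Treating empty files as missing guards against reusing a download that was interrupted before any data was written.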

Chunk Time Settings

Set the archive prefetch/parse interval with chunk_time (in seconds):

# Default: 2 hours per chunk
stream = BGPKITStream(..., chunk_time=7200)

# Smaller chunks: more requests but finer control
stream = BGPKITStream(..., chunk_time=1800)  # 30 minutes

# Larger chunks: fewer requests, more data at once
stream = BGPKITStream(..., chunk_time=86400)  # 1 day
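
To see how the chunk size trades request count against data per chunk, the sketch below splits a time range into consecutive windows. It is a simplified model: `chunk_windows` is a hypothetical helper, and the library's real chunk boundaries and alignment may differ.

```python
from typing import Iterator, Tuple


def chunk_windows(
    ts_start: int, ts_end: int, chunk_time: int
) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, end) windows of at most chunk_time seconds."""
    if chunk_time <= 0:
        raise ValueError("chunk_time must be positive")
    current = ts_start
    while current < ts_end:
        upper = min(current + chunk_time, ts_end)
        yield current, upper
        current = upper


ts_start, ts_end = 1283203200, 1283289600  # 24-hour window
for chunk_time in (1800, 7200, 86400):
    n = sum(1 for _ in chunk_windows(ts_start, ts_end, chunk_time))
    print(f"chunk_time={chunk_time}: {n} chunks")
```

For this 24-hour window, the model gives 48 chunks at 30 minutes, 12 at the 2-hour default, and a single chunk at 1 day.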

Memory Optimization

For very large datasets, minimize memory usage:

# Process in streaming fashion, not storing all elements
element_count = 0
for elem in stream:
    # Process element immediately
    process(elem)
    element_count += 1

    # Don't accumulate elements in lists
    # elements.append(elem)  # AVOID!

print(f"Processed {element_count} elements")

Benefit: Constant memory regardless of dataset size.
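
When the goal is a summary rather than the elements themselves, running totals keep memory flat. The sketch below shows that pattern on synthetic data: `fake_elements` and its (collector, type) tuples are placeholders, and real stream elements have a different shape.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def fake_elements(n: int) -> Iterator[Tuple[str, str]]:
    """Yield (collector, element type) pairs lazily, one at a time."""
    for i in range(n):
        yield ("route-views.wide", "A" if i % 3 else "W")


def count_by_type(elements: Iterable[Tuple[str, str]]) -> Counter:
    """Keep only running totals; each element is discarded after use."""
    totals: Counter = Counter()
    for _collector, elem_type in elements:
        totals[elem_type] += 1
    return totals


totals = count_by_type(fake_elements(9))
print(totals)
```

Memory use here depends on the number of distinct keys, not on the number of elements processed.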

Troubleshooting Performance

Slow Downloads

Check network connectivity
Reduce max_concurrent_downloads if bandwidth is limited
Use cache_dir to avoid re-downloading
Consider using smaller time windows

High Memory Usage

Reduce max_concurrent_downloads to 5 or fewer
Set ram_fetch=False to use disk instead of RAM
Process elements in streaming fashion; don't accumulate them in lists
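
To confirm where memory is going, the standard library's `tracemalloc` can compare the peak allocation of an accumulating loop with a streaming one. The sketch below uses synthetic records as a stand-in for stream elements; `records`, `peak_bytes`, and the record count are illustrative choices.

```python
import tracemalloc
from typing import Callable, Iterator


def records(n: int) -> Iterator[tuple]:
    """Lazily yield n small synthetic records."""
    for i in range(n):
        yield (i, f"prefix-{i}")


def peak_bytes(work: Callable[[], object]) -> int:
    """Run work() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        work()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def accumulate() -> int:
    return len(list(records(50_000)))  # holds every record at once


def stream_only() -> int:
    return sum(1 for _ in records(50_000))  # holds one record at a time


accumulating_peak = peak_bytes(accumulate)
streaming_peak = peak_bytes(stream_only)
print(f"accumulating: {accumulating_peak} B, streaming: {streaming_peak} B")
```

The same `peak_bytes` wrapper can be placed around a real processing loop to check whether elements are being retained unintentionally.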

Slow Parsing

Switch the parser from the default pybgpkit to bgpkit or bgpdump
Use more specific filters to reduce parsing load