Snailtrail: Generalizing critical paths for online analysis of distributed dataflows

In this post, we'll look at Snailtrail, a tool for diagnosing latency issues in distributed dataflows, developed in the Systems Group at ETH Zurich. It helps answer the question: where are the potential latency bottlenecks in a distributed streaming dataflow computation? Snailtrail can be applied to many distributed streaming applications. It requires only a lightweight stream of trace data; we'll go into the details later. Snailtrail does the hard work of constructing an activity graph over time-based windows and ranking activities by their critical participation, a novel metric we introduce. In this post, we'll walk through the concepts of activity graphs, time-based windows, and critical participation. Snailtrail currently supports Flink, Spark, Heron, TensorFlow, and Timely dataflow.

Snailtrail was presented at NSDI'18.

Last modified: 2018/09/18 10:11 by moritz
