Luckow, Andre; Chantzialexiou, George; Jha, Shantenu (2018): Pilot-Streaming: A Stream Processing Framework for High-Performance Computing. In: 2018 IEEE 14th International Conference on e-Science (e-Science 2018): pp. 177-188
Full text not available from 'Open Access LMU'.


An increasing number of scientific applications utilize stream processing to analyze data feeds from scientific instruments, sensors, and simulations. In this paper, we study the streaming and data processing requirements of light source experiments, which are projected to generate data at 20 GB/sec in the near future. As beamtimes available to users are typically short, it is essential that processing and analysis can be conducted in a streaming mode. The development and deployment of streaming applications is a complex task and requires the integration of heterogeneous, distributed infrastructure, frameworks, middleware, and application components written in different languages and abstractions. Streaming applications may be extremely dynamic due to factors such as variable data rates and network congestion, and to application-specific characteristics such as adaptive sampling and differing processing techniques. Consequently, streaming systems are often subject to back-pressure and instabilities, requiring additional infrastructure to mitigate these issues. We propose Pilot-Streaming, a framework for supporting streaming applications and their resource management needs on HPC infrastructure. Underlying Pilot-Streaming is a unifying architecture that decouples important concerns and functions, such as message brokering, transport and communication, and processing. Pilot-Streaming simplifies the deployment of stream processing frameworks, such as Kafka and Spark Streaming, while providing a high-level abstraction for managing streaming infrastructure, e.g., adding and removing resources as required by the application at runtime. This capability is critical for balancing complex streaming pipelines. To address the complexity in the development of streaming applications, we present the Streaming Mini-Apps, which support different pluggable algorithms for data generation and processing, e.g., for reconstructing light source images using different techniques. We use the Streaming Mini-Apps to evaluate the Pilot-Streaming framework, demonstrating its suitability for different use cases and workloads.
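
The runtime balancing described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions and does not use the Pilot-Streaming API: the names `PipelineState`, `step`, and `simulate`, the fixed per-worker processing rate, and the high/low water marks are all hypothetical, chosen to show how a growing broker backlog (back-pressure) might trigger adding a processing worker and how an empty backlog might release one.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical pipeline state; not part of Pilot-Streaming."""
    workers: int       # processing workers currently allocated
    backlog: int = 0   # messages waiting in the broker


def step(state, produced, per_worker_rate, high_water, low_water, min_workers=1):
    """Advance one time step, then rebalance resources based on the backlog."""
    capacity = state.workers * per_worker_rate
    # Messages that could not be processed in this step stay in the broker.
    state.backlog = max(0, state.backlog + produced - capacity)
    if state.backlog > high_water:
        state.workers += 1   # scale out: the backlog signals back-pressure
    elif state.backlog < low_water and state.workers > min_workers:
        state.workers -= 1   # scale in: release an idle worker
    return state


def simulate(rates, per_worker_rate=100, high_water=150, low_water=10):
    """Run the toy pipeline over a sequence of per-step production rates."""
    state = PipelineState(workers=1)
    history = []
    for produced in rates:
        step(state, produced, per_worker_rate, high_water, low_water)
        history.append((state.workers, state.backlog))
    return history


if __name__ == "__main__":
    # A burst of 300 messages per step followed by a return to 100 per step.
    for workers, backlog in simulate([100, 300, 300, 300, 100, 100]):
        print(f"workers={workers} backlog={backlog}")
```

In this sketch the worker count rises during the burst and falls once the backlog drains. A real deployment would have to account for resource start-up latency on the HPC scheduler, which this simple threshold rule ignores.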