The exploitation of throughput in a parallel application that processes an input data stream is a difficult challenge. For typical coarse-grain applications, where the computation time of tasks is greater than their communication time, the maximum achievable throughput is determined by the maximum task computation time. Thus, the improvement in throughput above this maximum would eventually require the modification of the source code of the tasks. In this work, we address the improvement of throughput by proposing two task replication methodologies that have the target throughput to be achieved as an input parameter. They proceed by generating a new task graph structure that permits the target throughput to be achieved. The first replication mechanism, named DPRM (Data Parallel Replication Mechanism), exploits the inner task data parallelism. The second mechanism, named TCRM (Task Copy Replication Mechanism), creates new execution paths inside the application task graph structure that allows more than one instance of data to be processed concurrently. We evaluate the effectiveness of these mechanisms with three real applications executed in a cluster system: the MPEG2 video compressor, the IVUS (Intra-Vascular Ultra-Sound) medical image application and the BASIZ (Bright and SAtured Images Zone) video processing application. In all these cases, the obtained throughput was greater after applying the proposed replication mechanism than what the application could provide with the original implementation. © 2013 Elsevier Inc. All rights reserved.
|Journal||Journal of Parallel and Distributed Computing|
|Publication status||Published - 1 Jan 2013|
- Data parallelism
- Pipeline execution
- Streaming applications
- Task parallelism
- Task replication mechanisms