Special Issue Abstract

The Special Issue on Many-Task Computing (MTC) will provide the scientific community with a dedicated forum, within the prestigious IEEE Transactions on Parallel and Distributed Systems journal, for presenting new research, development, and deployment efforts involving loosely coupled, large-scale applications on large clusters, Grids, supercomputers, and Cloud Computing infrastructure. MTC, the focus of the special issue, encompasses loosely coupled applications, which are generally composed of many tasks (both independent and dependent) that together achieve some larger application goal. The special issue will cover challenges that can hamper efficiency and utilization when running applications on large-scale systems, such as local resource manager scalability and granularity, efficient utilization of the raw hardware, parallel file system contention and scalability, data management, I/O management, reliability at scale, and application scalability. We welcome paper submissions on all topics related to MTC on large-scale systems. For more information, please see http://www2.computer.org/portal/c/document_library/get_file?uuid=c6bfaa2c-eeec-4278-839c-cf4a806f905f&groupId=808735.


News

April 27th, 2011 Proceedings are online at http://www.computer.org/portal/web/csdl/abs/trans/td/2011/06/ttd201106toc.htm.
February 25th, 2011 I. Raicu, I. Foster, Y. Zhao, "Guest Editors' Introduction: Special Issue on Many-Task Computing", IEEE Transactions on Parallel and Distributed Systems, 2011
September 23rd, 2010 New Workshop: 3rd IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS) 2010, co-located with IEEE/ACM Supercomputing 2010 
September 23rd, 2010 New Workshop: 2nd Workshop on Scientific Cloud Computing (ScienceCloud) 2011, co-located with ACM HPDC 2011
August 20th, 2010 New Workshop: 1st International Workshop on Data Intensive Computing in the Clouds (DataCloud) 2011, co-located with IEEE IPDPS 2011 
August 20th, 2010 Final decisions have been made, with an acceptance rate of 24% (10 papers accepted out of 42 submissions)
May 5th, 2010 Important dates have been updated
May 4th, 2010 Final first-round decisions have been made:
2 papers under review with minor revisions
9 papers under review with major revisions
2 papers to be resubmitted as new
29 papers rejected
March 15th, 2010 Some decisions have been made and announced:
1 paper under review with minor revisions
3 papers under review with major revisions
29 papers rejected
9 papers pending further reviews
January 4th, 2010 42 full paper submissions received
December 2nd, 2009: New submission guidelines for the initial abstract; see the Paper Submission and Publication section below for details.
December 1st, 2009: Issues with the Manuscript Central submission system have been resolved.
November 30th, 2009: Abstract submission deadline extended to December 14th, 2009, due to issues with the online submission system (Manuscript Central at https://mc.manuscriptcentral.com/tpds-cs). We expect to have the issues resolved in the next few days, and we will add another post to this news section when they are. Don't hesitate to contact us at mtc@computer.org if you have any questions.


Overview

This special issue will focus on the ability to manage and execute large-scale applications on today's largest clusters, Grids, and supercomputers. Clusters with tens of thousands of processor cores, Grids (e.g. TeraGrid) with a dozen sites and over 100K processors, and supercomputers with up to 200K processors (e.g. IBM BlueGene/L and BlueGene/P, Cray XT5, Sun Constellation) are all now available to the broader scientific community for open science research. Large clusters and supercomputers have traditionally been high-performance computing (HPC) systems, as they are efficient at executing tightly coupled parallel jobs within a particular machine over low-latency interconnects; such applications typically use the Message Passing Interface (MPI) for the needed inter-process communication. Grids, on the other hand, have been the preferred platform for more loosely coupled applications that tend to be managed and executed through workflow systems, and that fit the high-throughput computing (HTC) paradigm.

Many-task computing (MTC) aims to bridge the gap between two computing paradigms, HTC and HPC. MTC is reminiscent of HTC, but it differs in its emphasis on using many computing resources over short periods of time to accomplish many computational tasks (both dependent and independent), where the primary metrics are measured per second (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations (e.g. jobs) per month. MTC denotes high-performance computations comprising multiple distinct activities, coupled via file system operations. Tasks may be small or large, uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large. MTC includes loosely coupled applications that are generally communication-intensive but not naturally expressed in the standard Message Passing Interface (MPI) commonly found in HPC, drawing attention to the many computations that are heterogeneous but not "happily" parallel.
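
To make the per-second metrics concrete, here is a minimal sketch in Python (the compute_task function is a hypothetical stand-in for any short-lived task; this is an illustration, not code from any particular MTC system) that dispatches many small, independent tasks to a local worker pool and reports throughput in tasks/s:

    # Minimal MTC-style workload sketch: many short, independent tasks
    # dispatched to a worker pool, with throughput measured in tasks/s.
    import time
    from concurrent.futures import ProcessPoolExecutor

    def compute_task(i):
        # Hypothetical stand-in for one short, independent computation,
        # e.g. a single point of a parameter sweep.
        return sum(x * x for x in range(10000)) + i

    if __name__ == "__main__":
        n_tasks = 10000
        start = time.time()
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(compute_task, range(n_tasks), chunksize=64))
        elapsed = time.time() - start
        # The primary MTC metric: tasks completed per second.
        print(f"{n_tasks} tasks in {elapsed:.1f}s -> {n_tasks / elapsed:.0f} tasks/s")

At MTC scales the worker pool would span thousands of nodes rather than the cores of one machine, and the dispatcher itself becomes a bottleneck, which is why task dispatch rates measured in tasks/s are a primary metric.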

There is more to HPC than tightly coupled MPI, and more to HTC than embarrassingly parallel, long-running jobs. Like science itself, applications are becoming increasingly complex, opening doors to many opportunities for applying HPC in new ways if we broaden our perspective. Some applications have so many simple tasks that merely managing them is hard. Applications that operate on or produce large amounts of data need sophisticated data management in order to scale. There exist applications that involve many tasks, each itself composed of tightly coupled MPI processes. Loosely coupled applications often have dependencies among tasks, and typically use files for inter-process communication. Efficient support for these sorts of applications on existing large-scale systems involves substantial technical challenges and will have a large impact on science.
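
As a minimal illustration of file-based inter-process communication, the sketch below (again Python, with hypothetical task bodies and file names, not any particular workflow system) runs a two-stage pipeline in which each second-stage task depends on a file written by a first-stage task; the dependency is honored simply by not starting stage two until stage one's outputs exist:

    # Sketch of a loosely coupled two-stage pipeline whose tasks exchange
    # data through files rather than MPI messages. Task bodies and file
    # names are hypothetical placeholders.
    import os
    from concurrent.futures import ProcessPoolExecutor

    def produce(i):
        # Stage 1: each producer writes its partial result to a file.
        path = f"part_{i}.txt"
        with open(path, "w") as f:
            f.write(str(i * i))
        return path

    def consume(path):
        # Stage 2: each consumer reads exactly one stage-1 output file.
        with open(path) as f:
            return int(f.read()) + 1

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            paths = list(pool.map(produce, range(8)))   # stage 1 completes...
            results = list(pool.map(consume, paths))    # ...before stage 2 starts
        print(results)
        for p in paths:
            os.remove(p)  # clean up intermediate files

Run at large scale, this pattern turns every inter-task dependency into shared file system traffic, which is one reason parallel file system contention appears among the challenges listed below.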

Today's existing HPC systems are a viable platform on which to host MTC applications. However, running large-scale applications on large-scale systems raises challenges that can hamper the efficiency and utilization of these systems. These challenges range from local resource manager scalability and granularity, efficient utilization of the raw hardware, parallel file system contention and scalability, data management, I/O management, reliability at scale, and application scalability, to understanding the limitations of HPC systems in order to identify good candidate MTC applications. Furthermore, owing to its loosely coupled nature, the MTC paradigm applies naturally to the emerging Cloud Computing paradigm, which industry is adopting as the next wave of technological advancement to reduce operational costs while improving efficiency in large-scale infrastructures.

For an interesting discussion by Ian Foster on the difference between MTC and HTC, please see his blog post at http://ianfoster.typepad.com/blog/2008/07/many-tasks-comp.html. The guest editors have also published several papers highly relevant to this special issue. One paper, "Toward Loosely Coupled Programming on Petascale Systems", was published at the IEEE/ACM Supercomputing 2008 (SC08) conference; a second, "Many-Task Computing for Grids and Supercomputers", was published at the IEEE Workshop on Many-Task Computing on Grids and Supercomputers 2008 (MTAGS08). For last year's workshop program agenda, accepted papers, and presentations, please see http://dsl.cs.uchicago.edu/MTAGS08/. For this year's workshop web site, see http://dsl.cs.uchicago.edu/MTAGS09/.


Topics

The topics of interest include, but are not limited to:

Compute Resource Management in large scale clusters, large Grids, Supercomputers, and Cloud Computing infrastructures

Data Management in large scale Grid and Supercomputer environments

Large-Scale Workflow Systems

Large-Scale Many-Task Applications


Paper Submission and Publication

Authors are invited to submit papers presenting unpublished, original work of no more than 14 pages of double-column text, single-spaced in 9.5-point type on 8.5 x 11 inch pages with 0.5 inch margins (http://www2.computer.org/portal/c/document_library/get_file?uuid=02e1509b-5526-4658-afb2-fe8b35044552&groupId=525767). Papers will be peer-reviewed, and accepted papers will be published in the IEEE digital library. Submitted articles must not have been previously published or be currently submitted for journal publication elsewhere. As an author, you are responsible for understanding and adhering to the submission guidelines at http://www.computer.org/mc/tpds/author.htm; please read them thoroughly before submitting your manuscript.

Please submit the following information by email to mtc@computer.org by December 14th, 2009 for the abstract submission.

Subject: [TPDS MTC] new abstract submission

Title:

Author Names:

Author Affiliations:

Author Emails:

Abstract:


Your completed and final paper should be submitted to Manuscript Central at https://mc.manuscriptcentral.com/tpds-cs. Please feel free to contact the Peer Review Publications Coordinator, Annissia Bryant at tpds@computer.org or the guest editors at mtc@computer.org if you have any questions. For more information on this special issue, please see http://dsl.cs.uchicago.edu/TPDS_MTC/.


Important Dates

Abstract Due:                                        December 14th, 2009 (extended from December 1st, 2009)

Papers Due:                                          January 4th, 2010 (extended from December 21st, 2009)

First Round Decisions:                          May 4th, 2010 (revised from February 22nd, 2010)

Major Revisions if needed:                   July 2nd, 2010 (revised from April 19th, 2010)

Second Round Decisions:                     July 30th, 2010 (revised from May 24th, 2010)

Minor Revisions if needed:                   August 13th, 2010 (revised from June 7th, 2010)

Final Decision:                                      August 21st, 2010 (revised from June 21st, 2010)

Publication Date:                                  December, 2010 (revised from November, 2010)


Special Issue Guest Editors

Ian Foster, University of Chicago & Argonne National Laboratory

Ioan Raicu, Illinois Institute of Technology

Yong Zhao, Microsoft


Dr. Ian Foster is Associate Division Director and a Senior Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory, where he leads the Distributed Systems Laboratory, and is the Arthur Holly Compton Professor in the Department of Computer Science at the University of Chicago. He is also involved with both the Open Grid Forum and the Globus Alliance as an open source strategist. In 2006, he was appointed director of the Computation Institute, a joint project between the University of Chicago and Argonne. An earlier project, Strand, received the British Computer Society Award for technical innovation. His research has resulted in the development of techniques, tools, and algorithms for high-performance distributed and parallel computing; as a result, he is often called "the father of the Grid". Foster led research and development of software for the I-WAY wide-area distributed computing experiment, which connected supercomputers, databases, and other high-end resources at 17 sites across North America in 1995. His own lab, the Distributed Systems Laboratory, is the nexus of the multi-institute Globus Project, a research and development effort that enables collaborative computing by providing the advances necessary for engineering, business, and other fields. The Computation Institute, in turn, addresses many of the most challenging computational and communications problems facing Grid implementations today. In 2004, he founded Univa Corporation, which merged with United Devices in 2007 and operates under the name Univa UD. Foster's honors include the Lovelace Medal of the British Computer Society and the Gordon Bell Prize for high-performance computing (2001), among others. He was elected a Fellow of the American Association for the Advancement of Science in 2003. Dr. Foster also serves as PI or Co-PI on projects connected to the DOE global change program, the National Computational Science Alliance, the NASA Information Power Grid project, the NSF Grid Physics Network, the GRIDS Center, and the International Virtual Data Grid Laboratory, among other DOE and NSF programs. His research is supported by DOE, NSF, NASA, Microsoft, and IBM.

Dr. Ioan Raicu is an assistant professor in the Department of Computer Science at Illinois Institute of Technology. He was an NSF/CRA Computing Innovation Fellow at Northwestern University in 2009-2010, and obtained his Ph.D. in Computer Science from the University of Chicago in 2009 under the guidance of Dr. Ian Foster. He received the NASA GSRP Fellowship from NASA Ames Research Center for three years. His research interests lie in the general area of distributed systems, with a focus on the relatively new paradigm of Many-Task Computing (MTC), which aims to bridge the gap between the two predominant paradigms in distributed systems, High-Throughput Computing (HTC) and High-Performance Computing (HPC). His work has focused on defining and exploring both the theory and the practical aspects of realizing MTC across a wide range of large-scale distributed systems. He is particularly interested in efficient task dispatch and execution systems, resource provisioning, data management, scheduling, and performance evaluation in distributed systems, with applications in many-task computing, data-intensive computing, cloud computing, grid computing, and many-core computing. His work has been funded by the NASA Ames Research Center, the DOE Office of Advanced Scientific Computing Research, and the NSF/CRA CIFellows program. He is a member of the ACM and IEEE.

Dr. Yong Zhao obtained his Ph.D. in Computer Science from The University of Chicago under Dr. Ian Foster's supervision, and is best known for the GriPhyN Virtual Data System (VDS), a data and workflow management system for data-intensive science collaborations. VDS plays a fundamental role in various Data Grid projects such as iVDGL (International Virtual Data Grid Laboratory), PPDG (Particle Physics Data Grid), and OSG (Open Science Grid). The system has been applied to scientific applications in various disciplines, such as the high energy physics experiments CMS and ATLAS, the astrophysics project Sloan Digital Sky Survey, the QuarkNet science education project, and various neuroscience and bioinformatics projects. He also developed the Swift system, a programming tool for fast, scalable, and reliable loosely coupled parallel computation. Swift comprises a simple scripting language called SwiftScript for representing complex scientific workflows, and a scalable runtime system for scheduling hundreds of thousands of jobs onto distributed and parallel computing resources. The Angle cyber-infrastructure protection project, the SC'07 Analytics Challenge first-place winner, is based on the Swift system. He has also been actively involved in the Falkon project, a lightweight task execution framework for high-throughput computing. He now works at Microsoft on Business Intelligence projects that leverage large-scale storage and computing infrastructures for Web analytics and behavioral targeting. He is a member of the ACM and IEEE.


