DataSys: Data-Intensive Distributed Systems Laboratory

Illinois Institute of Technology
Department of Computer Science

CFP (TXT, PDF) | News | Topics | Dates | Submission | Organization | Program | Sponsors

7th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2014

Co-located with Supercomputing/SC 2014
In cooperation with ACM SIGHPC 
New Orleans, Louisiana -- November 16th, 2014

Panel - Many-Task Computing in a Big Data World

Many-Task Computing (MTC) encompasses loosely coupled applications, which are generally composed of many tasks working toward some larger application goal. MTC in computational science is an approach to parallel computing that aims to bridge the gap between two computing paradigms: high-throughput computing (HTC) and high-performance computing (HPC). MTC denotes high-performance computations comprising multiple distinct activities, coupled via file-system or in-memory operations. Rapid advances in digital sensors, networks, storage, and computation, along with their availability at low cost, are leading to the creation of huge collections of data -- dubbed Big Data. This data has the potential to enable new insights that can change the way businesses, science, and governments deliver services to their consumers, and it can impact society as a whole. This has led to the emergence of the Big Data Computing paradigm, which focuses on sensing, collection, storage, management, and analysis of data from a variety of sources to enable new value and insights.

The key questions this panel will address are:

  1. Is Big Data just a buzzword? Or is there something really new that is going to turn what we know about data-intensive computing on its head? Since we have industry, national labs, and academia on the panel, it would be great to hear each of these perspectives on Big Data.
  2. Many-task computing and big data computing are both computing paradigms. How important are the underlying storage systems for these computing paradigms?
  3. Why is Big Data hard? What can we do to bridge the gap and bring Big Data capabilities to the masses?

We are aiming for a very interactive panel discussion, so please bring your questions regarding many-task computing and big data computing.

Panelists:


Michael Wilde is a software architect in the Mathematics and Computer Science Division, Argonne National Laboratory, and a Senior Fellow of the University of Chicago/Argonne National Laboratory Computation Institute. His research focus is the application of parallel scripting to enhance scientific productivity by making parallel and distributed computing systems easier to use. He also conducts research into data provenance to record and query the history and metadata of scientific computations and datasets. His work centers on development and application of the Swift parallel scripting language, http://swift-lang.org.  
Owen O'Malley is a co-founder and software architect at Hortonworks, a rapidly growing company (25 to 525 employees in 3.5 years) that develops the completely open-source Hortonworks Data Platform (HDP). HDP includes Hadoop and the large ecosystem of big data tools that enterprises need for their data analytics. Owen has been working on Hadoop since the beginning of 2006 at Yahoo, was the first committer added to the project, and used Hadoop to set the Gray sort benchmark in 2008 and 2009. In the last 8 years, he has been the architect of MapReduce, Security, and now Hive. Recently he has been driving the development of the ORC file format and adding ACID transactions to Hive. Before working on Hadoop, he worked on Yahoo Search's WebMap project, which was the original motivation for Yahoo to work on Hadoop. Prior to Yahoo, he wandered between testing (UCI), static analysis (Reasoning), configuration management (Sun), and software model checking (NASA). He received his PhD in Software Engineering from the University of California, Irvine.
Matei Ripeanu is an Associate Professor at the University of British Columbia (Electrical and Computer Engineering Department). Matei is broadly interested in experimental parallel and distributed systems research, with a focus on massively parallel accelerators, data analytics, and storage systems. Please check the Networked Systems Laboratory website (netsyslab.ece.ubc.ca) for an up-to-date overview of the projects he works on together with a fantastic group of students.