DataSys: Data-Intensive Distributed Systems Laboratory

Illinois Institute of Technology
Department of Computer Science

NSF CAREER: Avoiding Achilles’ Heel in Exascale Computing with Distributed File Systems 

Exascale (i.e. 10^18 operations/sec) computers will enable the unraveling of significant scientific mysteries, covering many domains (e.g. weather modeling, national security, energy, and drug discovery). Predictions are that exascales will be reached in 2019, with millions of compute-nodes and billions of threads of execution. The current state-of-the-art storage in high-end computing (HEC), in which storage is segregated from compute-nodes and connected by a network (e.g. parallel filesystems), will not scale with the expected exponential growth in concurrency. At exascales, basic functionality (e.g. booting, check-pointing, metadata/data access) at high concurrency levels will suffer poor performance, and combined with system mean-time-to-failure in hours, will lead to a performance collapse.

The investigator envisions future HEC systems to be designed with non-volatile memory on every compute node, and every node to actively participate in the metadata and data management. This work aims to: 1) design, analyze, and implement a distributed data structure (ZHT) optimized for HEC, to be used for distributed metadata management; 2) design, analyze, and implement a distributed filesystem (FusionFS) optimized for a subset of important high-performance computing (HPC) as well as many-task computing (MTC) workloads, and scalable to millions of nodes; and 3) evaluate work with real workloads, applications, and simulations up to exascales. The results of this work have the potential to make exascale computing more tractable, touching virtually all disciplines in HEC, fueling scientific discovery and economic development at the national level. The HEC knowledgebase will extend into commodity systems as the fastest machines generally become mainstream systems in five to seven years. This work can also open doors for research in radical parallel programming paradigms (e.g. MTC) that rely on scalable storage infrastructure.

Award: $590K, 01/2011 - 06/2016; for more details, see the NSF description

Collaborators:

Students (* denotes partial funding, ** denotes full funding):

PhD:

MS:

 

Bachelor:

 

High School:

 

Projects:

 

Proposals / Theses / Dissertations:

  1. Itua Ijagbone, Ioan Raicu (advisor). "Scalable Indexing and Searching on Distributed File Systems", Department of Computer Science, Illinois Institute of Technology, MS Thesis, 2016

  2. Tonglin Li, Ioan Raicu. "Distributed NoSQL Storage for Extreme-Scale System Services in Supercomputers and Clouds", Illinois Institute of Technology, Computer Science Department, PhD Dissertation, 2015

  3. Dongfang Zhao, Ioan Raicu. "Big Data System Infrastructure at Extreme Scales", Illinois Institute of Technology, Computer Science Department, PhD Dissertation, 2015

  4. Ke Wang, Ioan Raicu. "Scalable Resource Management System Software for Extreme-Scale Distributed Systems", Illinois Institute of Technology, Computer Science Department, PhD Dissertation, 2015

  5. Jason Arnold, Boris Glavic, Ioan Raicu. "HRDBMS: Combining the Best of Modern and Traditional Relational Databases", Illinois Institute of Technology, Department of Computer Science, PhD Oral Qualifier, 2015  

  6. Tonglin Li, Ioan Raicu. "A Convergence of NoSQL Storage Systems from Clouds to Supercomputers", Illinois Institute of Technology, Computer Science Department, PhD Proposal, 2014 

  7. Dongfang Zhao, Ioan Raicu. "Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems," Illinois Institute of Technology, Computer Science Department, PhD Proposal, 2014

  8. Tonglin Li, Ioan Raicu. "ZHT: a Zero-hop DHT for High-End Computing Environment", Illinois Institute of Technology, Department of Computer Science, PhD Oral Qualifier, 2012

  9. Dongfang Zhao, Ioan Raicu. "HyCache: A Hybrid User-Level File System with SSD Caching", Illinois Institute of Technology, Department of Computer Science, PhD Oral Qualifier, 2012

 

Peer Reviewed Publications:

  1. Dongfang Zhao, Ke Wang, Kan Qiao, Tonglin Li, Iman Sadooghi, Ioan Raicu. "Toward High-performance Key-value Stores through GPU Encoding and Locality-aware Encoding", Journal of Parallel and Distributed Computing, 2016 

  2. Dongfang Zhao, Kan Qiao, Zhou Zhou, Tonglin Li, Xiaobing Zhou, Ioan Raicu. “Exploiting Multi-cores for Efficient Interchange of Large Messages in Distributed Systems”, Concurrency and Computation: Practice and Experience (CCPE), 2015 (Impact Factor 1.0)

  3. Thomas Dubucq, Tony Forlini, Virgile Landeiro Dos Reis, Isabelle Santos, Ke Wang, Ioan Raicu. “Benchmarking State-of-the-art Many-Task Computing Runtime Systems”, ACM HPDC 2015

  4. Xiaobing Zhou, Tonglin Li, Ke Wang, Dongfang Zhao, Iman Sadooghi, Ioan Raicu. "MHT: A Light-weight Scalable Zero-hop MPI Enabling Distributed Hash Table", IEEE Big Data 2015 (poster)  

  5. Iman Sadooghi, Ke Wang, Dharmit Patel, Dongfang Zhao, Tonglin Li, Shiva Srivastava, Ioan Raicu. “FaBRiQ: Leveraging Distributed Hash Tables towards Distributed Publish-Subscribe Message Queues”, IEEE/ACM BDC 2015

  6. Tonglin Li, Ke Wang, Shiva Srivastava, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu. "A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud", IEEE Big Data 2015

  7. Tonglin Li, Ioan Raicu. "Distributed NoSQL Storage for Extreme-Scale System Services", Doctoral Showcase, IEEE/ACM Supercomputing/SC 2015  

  8. Jason Arnold, Boris Glavic, Ioan Raicu. "HRDBMS: A NewSQL Database for Analytics", IEEE Cluster 2015 

  9. Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ke Wang, Dongfang Zhao, Iman Sadooghi, Ioan Raicu. "GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System", IEEE Cluster 2015 (poster)

  10. Ke Wang, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Tonglin Li, Michael Lang, Ioan Raicu. "Load-balanced and locality-aware scheduling for data-intensive workloads at extreme scales", Concurrency and Computation: Practice and Experience (CCPE) Journal 2015

  11. Dongfang Zhao, Kan Qiao, Jian Yin, and Ioan Raicu. "Dynamic Virtual Chunks: On Supporting Efficient Accesses to Compressed Scientific Data", IEEE Transactions on Services Computing (TSC) Journal 2015, Special Issue on Big Data

  12. Dongfang Zhao, Ning Liu, Dries Kimpe, Robert Ross, Xian-He Sun, and Ioan Raicu. "Towards Exploring Data-Intensive Scientific Applications at Extreme Scales through Systems and Simulations", IEEE Transactions on Parallel and Distributed Systems (TPDS) Journal 2015

  13. Tonglin Li, Xiaobing Zhou, Ke Wang, Dongfang Zhao, Iman Sadooghi, Zhao Zhang, Ioan Raicu. "A Convergence of Key-Value Storage Systems from Clouds to Supercomputers", Concurrency and Computation: Practice and Experience (CCPE) Journal 2015

  14. Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Michael Lang, Xian-He Sun, Ioan Raicu. "Overcoming Hadoop Scaling Limitations through Distributed Task Execution", IEEE Cluster 2015; 24% acceptance rate

  15. Ben Walters, Alex Ballmer, Andrei Dumitru, Adnan Haider, Serapheim Dimitropoulos, Ariel Young, William Scullin, Ben Allen, Ioan Raicu. "15 TFlops Haswell vs. 60 TFlops Knight Landing for HPC Scientific Computing Applications", Student Cluster Competition (SCC), IEEE/ACM Supercomputing/SC 2015

  16. Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, and Ioan Raicu. "Exploring the Design Tradeoffs for Extreme-Scale High-Performance Computing System Software", IEEE Transactions on Parallel and Distributed Systems (TPDS) 2015

  17. Dongfang Zhao, Xu Yang, Iman Sadooghi, Gabriele Garzoglio, Steven Timm, Ioan Raicu. "High-Performance Storage Support for Scientific Applications on the Cloud", Invited Paper, ACM ScienceCloud 2015

  18. Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu. "A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks", Invited Paper, ACM ScienceCloud 2015

  19. Iman Sadooghi, Jesús Hernández Martin, Tonglin Li, Kevin Brandstatter, Ketan Maheshwari, Tiago Pais Pitta de Lacerda Ruivo, Gabriele Garzoglio, Steven Timm, Yong Zhao, Ioan Raicu. “Understanding the Performance and Potential of Cloud Computing for Scientific Applications”, IEEE Transactions on Cloud Computing (TCC) 2015

  20. Dongfang Zhao, Kan Qiao, Ioan Raicu. "Towards Cost-Effective and High-Performance Caching Middleware for Distributed Systems", International Journal of Big Data Intelligence (IJBDI) 2015, Special Issue on High-Performance Data Intensive Computing

  21. Dongfang Zhao and Ioan Raicu. "Storage Support for Data-Intensive Scientific Applications on the Cloud", NSFCloud Workshop on Experimental Support for Cloud Computing 2014

  22. Dongfang Zhao and Ioan Raicu. "Storage Support for Data-Intensive Applications on Extreme-Scale HPC Systems", Doctoral Showcase, IEEE/ACM Supercomputing/SC 2014

  23. Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu. “A Cloud-based Interactive Data Infrastructure for Sensor Networks”, IEEE/ACM Supercomputing/SC 2014

  24. Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, and Ioan Raicu. "FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems", IEEE International Conference on Big Data 2014; 18% acceptance rate

  25. Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu. "Optimizing Load Balancing and Data-Locality with Data-aware Scheduling", IEEE International Conference on Big Data 2014; 18% acceptance rate

  26. Dongfang Zhao, Jian Yin, Kan Qiao, Ioan Raicu. "Virtual Chunks: On Supporting Random Accesses to Scientific Data in Compressible Storage Systems", IEEE International Conference on Big Data 2014; 18% acceptance rate

  27. Kevin Brandstatter, Jason DiBabbo, Daniel Gordon, Ben Walters, Alex Ballmer, Lauren Ribordy, Ioan Raicu. "Delivering 3.5 Double Precision GFlops/Watt and 200Gb/sec Bi-Section Bandwidth with Intel Xeon Phi-based Cisco Servers", Student Cluster Competition (SCC), IEEE/ACM Supercomputing/SC 2014

  28. Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan. "Scalable State Management for Scientific Applications in the Cloud", IEEE BigData 2014; 19% acceptance rate

  29. Dongfang Zhao, Kan Qiao, Ioan Raicu. “HyCache+: Towards Scalable High-Performance Caching Middleware for Parallel File Systems”, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2014; 19% acceptance rate

  30. Dongfang Zhao, Jian Yin, Ioan Raicu. “Improving the I/O Throughput for Data-Intensive Scientific Applications with Efficient Compression Mechanisms”, IEEE/ACM Supercomputing 2013

  31. Dongfang Zhao, Chen Shou, Tanu Malik, Ioan Raicu. “Distributed Data Provenance for Large-Scale Data-Intensive Computing”, IEEE Cluster 2013; 31% acceptance rate

  32. Dongfang Zhao, Kent Burlingame, Corentin Debains, Pedro Alvarez-Tabio, Ioan Raicu. “Towards High-Performance and Cost-Effective Distributed Storage Systems with Information Dispersal Algorithms”, IEEE Cluster 2013; 31% acceptance rate

  33. Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, Dongfang Zhao, Ke Wang, Anupam Rajendran, Zhao Zhang, Ioan Raicu. “ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table”, IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013; 21% acceptance rate

  34. Dongfang Zhao, Da Zhang, Ke Wang, Ioan Raicu. “Exploring Reliability of Exascale Systems through Simulations”, ACM HPC 2013

  35. Chen Shou, Dongfang Zhao, Tanu Malik, Ioan Raicu. “Towards a Provenance-Aware Distributed File System”, USENIX TaPP 2013

  36. Dongfang Zhao, Ioan Raicu. “HyCache: A User-Level Caching Middleware for Distributed File Systems”, IEEE HPDIC 2013

  37. Dongfang Zhao, Ioan Raicu. “Distributed File Systems for Exascale Computing”, Doctoral Showcase, IEEE/ACM Supercomputing/SC 2012 (poster)

  38. Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu. “Exploring Distributed Hash Tables in High-End Computing”, ACM Performance Evaluation Review (PER), 2011

  39. I. Raicu, P. Beckman, I. Foster. “Making a Case for Distributed File Systems at Exascale”, ACM Workshop on Large-scale System and Application Performance (LSAP), 2011

 

Technical Reports:

  1. Itua Ijagbone, Shivakumar Vinayagam, David Pisanski, Kevin Brandstatter, Dongfang Zhao, Ioan Raicu. "Towards Scalable Searching of Distributed File Systems", GCASR 2016    

  2. Mermer Dupree, Mike Wilde, Justin Wozniak, Ioan Raicu. "Optimizing Data Locality with Swift/T and FusionFS", GCASR 2016 

  3. David Pisanski, Kevin Brandstatter, Dongfang Zhao, Calin Segarceanu, Ioan Raicu. "Enabling Distributed Data Indexing and Search in the FusionFS Distributed File System", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2015

  4. Ariel Young, Ioan Raicu. "HPC Power Management on Haswell CPU Architecture", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2015

  5. Mermer Dupree, Justin M. Wozniak, Michael Wilde, Ioan Raicu. "Optimizing Data Locality between the Swift Parallel Programming System and the FusionFS Distributed File System", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2015  

  6. Tonglin Li, Chaoqi Ma, Jiabao Li, Ioan Raicu. "ZHT+: A Graph Database On ZHT", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2015

  7. Sughosh Divanji, Raghav Kapoor, Dongfang Zhao, Ioan Raicu. "PVFS simulation using CODES/ROSS simulator", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2015

  8. Kevin Brandstatter, Ben Walters, Alexander Ballmer, Adnan Haider, Andrei Dumitru, Serapheim Dimitropoulos, Ariel Young, William Scullin, Ben Allen, Ioan Raicu. "Experiences in Optimizing Cluster Performance For Scientific Applications: Controlling Configuration, Utilization, and Power Consumption", GCASR 2015

  9. Kiran Ramamurthy, Ioan Raicu. "Exploring Distributed HPC Scheduling with Randomized Resource Stealing", 3rd Greater Chicago Area System Research Workshop (GCASR), 2014 (poster)

  10. Ke Wang, Ioan Raicu. "Achieving Data-Aware Load Balancing through Distributed Queues and Key/Value Stores", 3rd Greater Chicago Area System Research Workshop (GCASR), 2014 (poster)

  11. Ioan Raicu. "Towards Data-Intensive Extreme-Scale Computing", NSF CyberBridges Workshop, 2014 (poster)

  12. Dongfang Zhao, Ioan Raicu. "Exploring Data Compression in Distributed File Systems", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013

  13. Kun Feng, Tianyang Che, Tonglin Li, Ioan Raicu. "OHT: Hierarchical Distributed Hash Tables", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013

  14. Shukun Xie, Ran Xin, Tonglin Li, Ioan Raicu. "Exploring Eventual Consistency Support in ZHT", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013

  15. Dongfang Zhao, Ioan Raicu. "Supporting Large Scale Data-Intensive Computing with the FusionFS Distributed File System", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013

  16. Ioan Raicu. “Distributed Storage Systems for Extreme-Scale Data-Intensive Computing”, NSF CyberBridges Workshop, 2013 (poster)

  17. Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, Ioan Raicu. "Distributed Key-Value Store on HPC and Cloud Systems", 2nd Greater Chicago Area System Research Workshop (GCASR), 2013 (poster)

  18. Dongfang Zhao, Chen Shou, Zhao Zhang, Iman Sadooghi, Xiaobing Zhou, Tonglin Li, Ioan Raicu. "FusionFS: a distributed file system for large scale data-intensive computing", 2nd Greater Chicago Area System Research Workshop (GCASR), 2013 (poster)

  19. Kevin Brandstatter, Tonglin Li, Xiaobing Zhou, Ioan Raicu. "NoVoHT: a Lightweight Dynamic Persistent NoSQL Key/Value Store", 2nd Greater Chicago Area System Research Workshop (GCASR), 2013 (poster)

  20. Chen Shou, Dongfang Zhao, Tanu Malik, Ioan Raicu. "Towards a Provenance-aware Distributed Filesystem", 2nd Greater Chicago Area System Research Workshop (GCASR), 2013 (poster)

  21. Corentin Debains, Pedro Alvarez-Tabio, Dongfang Zhao, Kent Burlingame, Ioan Raicu. "IStore: Towards High Efficiency, Performance, and Reliability in Distributed Data Storage with Information Dispersal Algorithms", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2013 

  22. Kevin Brandstatter, Ioan Raicu. “CiteSearcher: A Google Scholar frontend for Mobile Devices”, Illinois Institute of Technology Research Day, 2012 (poster)

  23. Ioan Raicu. “Building Blocks for Scalable Distributed Storage Systems”, NSF CyberBridges Workshop, 2012 (poster)

  24. Tonglin Li, Antonio Perez De Tejada, Kevin Brandstatter, Zhao Zhang, Ioan Raicu. “ZHT: a Zero-hop DHT for High-End Computing Environment”, 1st Greater Chicago Area System Research Workshop, 2012 (poster)

  25. Corentin Debains, Pedro Manuel Alvarez-tabio Togores, Ioan Raicu. “Evaluating Information Dispersal Algorithms”, 1st Greater Chicago Area System Research Workshop, 2012 (poster)

  26. Dongfang Zhao, Ioan Raicu. “HyCache: A Hybrid User-Level File System with SSD Caching”, 1st Greater Chicago Area System Research Workshop, 2012 (poster)

  27. Da Zhang, Ioan Raicu. "SimHEC: Simulator for High-End Computing Systems", 1st Greater Chicago Area System Research Workshop, 2012 (poster)

  28. Dongfang Zhao, Ioan Raicu. "HyCache: A Hybrid User-Level File System with SSD Caching", Illinois Institute of Technology, Department of Computer Science, PhD Oral Qualifier, 2012

  29. Tonglin Li, Ioan Raicu. "ZHT: a Zero-hop DHT for High-End Computing Environment", Illinois Institute of Technology, Department of Computer Science, PhD Oral Qualifier, 2012

  30. Jesús Hernández Martin, Ioan Raicu. "Performance evaluation of AWS: Exploring storage alternatives in Amazon Web Services", Illinois Institute of Technology, Department of Computer Science, Technical Report, 2012