DataSys: Data-Intensive Distributed Systems Laboratory

Illinois Institute of Technology
Department of Computer Science

BigDataX: From theory to practice in Big Data computing at eXtreme scales

This award aims to establish a Research Experiences for Undergraduates (REU) site named BigDataX, which will focus on undergraduate research in both the theory and practice of big data computing at extreme scales. The primary objective of this award is to promote a data-centric view of scientific and technical computing, at the intersection of distributed systems theory and practice. This award has four mentors at the Illinois Institute of Technology and University of Chicago, with complementary expertise ranging from theory to programming languages to distributed systems. This work includes a comprehensive educational plan that pairs ten undergraduate students with senior PhD students and sets incremental, manageable goals, aimed at allowing undergraduate students to achieve publishable results within the ten-week summer program. The group maintains a public social network presence through a LinkedIn group at https://www.linkedin.com/groups/8301753. For REU BigDataX program highlights from prior years, see the BigDataX Overview. You will find the official 2024 announcement online.

Apply to the 2024 Summer REU BigData Program online (deadline March 4th, 2024).

BigDataX Program Highlights

The BigDataX program will host 10 students spending 10 weeks each summer doing research in one of two interdisciplinary laboratories: 1) the DataSys Lab in CS at IIT (5 students) or 2) the Systems Group in the CI/CS at UChicago (5 students). The two labs are physically close (4 miles apart), and weekly activities will bring all students together in a single lab to strengthen the cohort experience. The PIs will aim to recruit most of the students from outside of the host institutions, focusing on recruiting students from institutions without research opportunities as well as women and minorities.

Each 10-week program will begin in the 3rd or 4th week of May and aims to end by the end of July or early August. Students will be encouraged to work in teams of two to help reinforce the positive aspects of teamwork, and each team will be assigned both a graduate student and a faculty mentor. The BigDataX program will emphasize collaborative research teams in which graduate students are formally involved in the mentoring activities of the REU students. Each REU student will be involved in conducting research as part of a larger group and in presenting research results to the entire BigDataX group as well as to other audiences such as the DataSys lab and Globus Labs. The students will learn both to read research papers and to write up their research work and results in the format of a conference/journal article. The significant collaboration between the mentors over the past eight years, coupled with advanced graduate students working with the mentors at both institutions, will enable REU students to become productive faster on more complex research problems than is typically possible in such a short timeframe. The mentors will work with the students during the summer, and continue refining the research output towards publishing the students’ summer work. The mentors will also help students prepare for presentations, and help them with the application process to graduate schools. Students will also be encouraged to continue their work at their respective institutions as part of an undergraduate thesis.

Pictures over the years

Below is Jamison Kerney (REU 2023 student), who won 3rd place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2023 conference for their work "Supercharging Scientific Serverless: Slashing Cold Starts with Python UniKernels".

SC23 awards

Below are 6 REU students along with 5 mentors attending the IEEE/ACM Supercomputing/SC 2023 conference, where the students presented their summer research.

BigDataX @ SC23

Below is Kathryn Leung (REU 2019 student), who won 2nd place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2019 conference for their work "Walking the cost-accuracy tightrope: balancing trade-offs in data-intensive genomics".

SC19 awards

Below are Luann Jung and Brendan Whitaker (REU 2018 students), who won 1st place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2018 conference. See the news article for more information.

SC18 awards

Below are REU and DataSys students at the IEEE/ACM Supercomputing/SC 2018 conference in Dallas, Texas, in November 2018.

REU BigDataX 2018 

Below is the BigDataX REU 2017 site along with the MEDIX REU 2017 site at the summer picnic in July 2017.

REU BigDataX 2017 

Below are Blue Keleher and Emily Herron (REU 2017 students), who won 2nd and 3rd place respectively in the ACM Undergraduate Student Research Competition at the IEEE/ACM SC 2017 conference.

SC17 awards

Below is the BigDataX REU 2016 site along with the MEDIX REU 2016 site visiting Argonne National Laboratory in June 2016.

REU BigDataX 2016 

Below are REU and DataSys students at the IEEE/ACM Supercomputing/SC 2016 conference in Salt Lake City, Utah, in November 2016.

SC16 students

Below is William Agnew (REU 2016 student) receiving the ACM Undergraduate Student Research Award at the IEEE/ACM SC 2016 conference.

SC16 awards

Below is the BigDataX REU 2015 site along with the MEDIX REU 2015 site visiting Argonne National Laboratory in July 2015.

BigDataX REU 2015 Group Visiting Argonne National Laboratory

 

Award: $404K, 07/2022 - 07/2025:

  • NSF award 2150500
  • NSF award 2150501

Award: $370K, 03/2018 - 02/2021:

  • NSF award 1757964
  • NSF award 1757970

Award: $288K, 03/2015 - 02/2018:

  • NSF award 1461260

Funding Period: 2015 - 2025

    Institutions:

    IIT is a private Ph.D.-granting research university. Computer science at IIT goes back to 1959, when computers were a part of a physical chemistry course. The department was founded in 1971. Graduates of the department and IIT have created such things as the SPMD (single program, multiple data) model for parallel execution of applications on multiprocessors (Frederica Darema, M.S. PHYS ’72); Linksys networks (Victor Tsao, M.S. CS ’80); key technologies supporting Twitter (Abdur Chowdhury, Ph.D. CS ’01); Intel Pentium microprocessor architecture (Rajeev Chandrasekhar, M.S. CS ’88); and much more. The site will be hosted in the DataSys Laboratory, of which PI Raicu is the director. The DataSys lab conducts research in various areas of distributed systems with an emphasis on designing, implementing, and evaluating systems, protocols, and middleware. The lab's mission is to investigate challenging, high-impact research projects to support data-intensive distributed computing on a variety of systems, from many-core systems and clusters to grids, clouds, and supercomputers. The lab has been recognized as a CUDA Teaching/Research Center, and is at the center of a new NSF-funded testbed, Mystic. The lab's students range from high school students to undergraduate and graduate students.

    UChicago is a private research university that is affiliated with 89 Nobel Laureates, 49 Rhodes Scholars and 9 Fields Medalists. The Computation Institute (CI), in which two of the UChicago mentors are members, was established in 2000 as a joint initiative between UChicago and Argonne to advance science through innovative computational approaches. The CI is home to over 100 researchers and staff who have active collaborations with over 50 prestigious academic and research institutions across the globe. Current research is targeted at solving complex system-level problems in many disciplines such as bioinformatics, biomedicine, materials science, chemistry, and sociology. The UChicago mentors are members of Globus Labs at UChicago, which is a research group led by Prof. Ian Foster and Dr. Kyle Chard that spans the Computation Institute, Department of Computer Science, and Math and Computer Science Division at the University of Chicago and Argonne National Laboratory. The lab’s modest goal is to realize a world in which all research data are reliably, rapidly, and securely accessible, discoverable, and usable. To this end, the lab works on a broad range of research problems in data-intensive computing and research data management. This work is made possible by support from the National Science Foundation, National Institutes of Health, Department of Energy, National Institute of Standards and Technology, and other sources, and in addition to computer science, engages fields as diverse as materials science, biology, archaeology, climate policy, and social sciences. The lab works closely with the team developing the Globus research data management platform. Globus Labs has 11 members, including 1 faculty member, 3 research staff members, 2 postdoctoral fellows, 6 PhD students, and several undergraduate students.

    Mentors (bigdatax-group@iit.edu):

    Dr. Ioan Raicu is an associate professor in CS at Illinois Institute of Technology (IIT), as well as a guest research faculty member in MCS at Argonne National Laboratory (ANL). He received his PhD in 2009 from University of Chicago (UChicago) under the guidance of Ian Foster. His research interests are in distributed systems, with particular interests in resource management in large-scale systems. He has co-authored over 100 peer-reviewed articles, which have received 8618 citations, with an H-index of 39. His work has been funded by NASA, DOE, and NSF. He was the PI for the past BigDataX program from 2015 to 2021, and has mentored 12 REU students.

    Dr. Kyle Hale is an assistant professor in CS at Illinois Institute of Technology (IIT). He has extensive experience in OS research and development, particularly across layers and at the hardware/software boundary. He has developed a custom specialized OS kernel framework called Nautilus intended to support high-performance parallel applications and runtime systems. He also has experience in NoC architectures for massively multicore machines. He will focus on low-level experimentation at the NoC level and will investigate new OS abstractions and mechanisms for leveraging the NoC fabric and for bridging inter- and intra-chip communication. He was a mentor in 2017 in the REU BigDataX program, and has been the co-PI for the IIT site from 2018 to 2021. He has mentored 8 students since 2017.

    Dr. Kyle Chard is a Senior Researcher and Fellow in the Computation Institute (CI) at UChicago and ANL. His research focuses on applying computational and data-intensive approaches to solve scientific problems. He is particularly interested in the application of autonomic computing and cost-aware provisioning to make use of on-demand infrastructure; service-oriented science, as part of the Globus project; and information extraction and data analytics. He received his Ph.D. from Victoria University of Wellington in 2011 under the supervision of Dr. Kris Bubendorfer and Prof. Peter Komisarczuk. He was an REU BigDataX mentor in 2016 and 2017, and has been the PI for the UChicago site from 2018 to 2021. He has mentored 10 students since 2016.

    Dr. Aaron J. Elmore is an assistant professor in CS at UChicago. Aaron was previously a Postdoctoral Associate at MIT working with Mike Stonebraker on elastic database systems, and Sam Madden on the DataHub project. Aaron's thesis on Elasticity Primitives for Database-as-a-Service was completed at the University of California, Santa Barbara under Divy Agrawal and Amr El Abbadi. Prior to receiving a PhD, Aaron received his MS from UChicago and spent several years in industry. At MIT Aaron mentored five undergraduates, and at the University of Chicago he has mentored two undergraduate students. He has been a mentor since 2018, and has worked with 6 REU students.

    Dr. Gruia Calinescu is an associate professor in CS at IIT. He received his PhD in 1998 from Georgia Institute of Technology, has been a postdoctoral fellow at DIMACS, and has held visiting positions at University of Waterloo (Canada), Max-Planck-Institut für Informatik (Germany) and University of Wisconsin - Milwaukee. His research work is in the areas of algorithms and networking. He has published 38 journal papers, 49 conference papers, five book chapters, and one patent, with an H-index of 20 according to Google Scholar. His work has been funded by NSF. He was the co-PI of the REU BigDataX program from 2015 to 2017, and mentored 6 students in that timeframe.

    Dr. Justin M. Wozniak is a Computer Scientist at ANL and a research staff member at the CI at UChicago. He received his Ph.D. in 2008 from the University of Notre Dame. His research focuses on HPC, language and runtime development, data management and storage, and scientific applications; he is the lead developer of the Swift parallel programming system that is used on HPC systems such as the Cray XE6 and IBM Blue Gene/Q. He has guided the research work of many postdocs, graduate students, and undergraduates, and was a Google Summer of Code mentor. He was an REU BigDataX mentor all three years from 2015 to 2017, and mentored 8 REU students.

    Evaluator:

    Publications

    The publications below have been co-authored by REU students (highlighted in bold) at workshops, conferences, and journals.

    Current Students

    2024

    Alumni

    2023

    2021

    2019

    2018

    2017

    2016

    2015