BigDataX: From theory to practice in Big Data computing at eXtreme scales
This award aims to establish a Research Experiences for Undergraduates (REU) site named BigDataX, which will focus on undergraduate research in both the theory and practice of big data computing at extreme scales. The primary objective of this award is to promote a data-centric view of scientific and technical computing at the intersection of distributed systems theory and practice. The award has four mentors at the Illinois Institute of Technology and the University of Chicago, with complementary expertise ranging from theory to programming languages to distributed systems. This work includes a comprehensive educational plan that pairs ten undergraduate students with senior PhD students and sets incremental, manageable goals, with the aim of allowing undergraduate students to achieve publishable results within the ten-week summer program. The group maintains a public social network presence through a LinkedIn group at https://www.linkedin.com/groups/8301753. For REU BigDataX program highlights from prior years, see the BigDataX Overview. You will find the official 2024 announcement online.
Apply to the 2024 Summer REU BigDataX Program online (deadline March 4th, 2024).
The BigDataX program will host 10 students, each spending 10 weeks over the summer doing research in one of two interdisciplinary laboratories: 1) the DataSys Lab in CS at IIT (5 students) or 2) the Systems Group in the CI/CS at UChicago (5 students). The two labs are physically close (4 miles apart), and students will have weekly activities that bring all of them together in a single lab to strengthen the cohort experience. The PIs will aim to recruit most of the students from outside the host institutions, focusing on students from institutions without research opportunities as well as on women and minorities.
Each 10-week program will begin in the third or fourth week of May and aim to end by late July or early August. Students will be encouraged to work in teams of two to reinforce the positive aspects of teamwork, and each team will be assigned both a graduate student mentor and a faculty mentor. The BigDataX program will emphasize collaborative research teams in which graduate students are formally involved in mentoring the REU students. Each REU student will conduct research as part of a larger group and present research results to the entire BigDataX group, as well as to other audiences such as the DataSys lab and Globus Labs. The students will learn both to read research papers and to write up their research work and results in the format of a conference or journal article. The significant collaboration between the mentors over the past eight years, coupled with the advanced graduate students working with the mentors at both institutions, will enable REU students to become productive faster on more complex research problems than is typically possible in such a short timeframe. The mentors will work with the students during the summer and continue refining the research output toward publishing the students’ summer work. The mentors will also help students prepare presentations and apply to graduate schools. Students will also be encouraged to continue their work at their respective institutions as part of an undergraduate thesis.
Below is Jamison Kerney (REU 2023 student), who won 3rd place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2023 conference for the work "Supercharging Scientific Serverless: Slashing Cold Starts with Python UniKernels".
Below are 6 REU students along with 5 mentors attending the IEEE/ACM Supercomputing/SC 2023 conference to present their summer research.
Below is Kathryn Leung (REU 2019 student), who won 2nd place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2019 conference for the work "Walking the cost-accuracy tightrope: balancing trade-offs in data-intensive genomics".
Below are Luann Jung and Brendan Whitaker (REU 2018 students), who won 1st place in the ACM Undergraduate Student Research Competition at the IEEE/ACM Supercomputing/SC 2018 conference. See the news article for more information.
Below are REU and DataSys students at the IEEE/ACM Supercomputing/SC 2018 conference in Dallas, Texas, in November 2018.
Below is the BigDataX REU 2017 Site, along with the MEDIX REU 2017 Site, at the summer picnic in July 2017.
Below are Blue Keleher and Emily Herron (REU 2017 students), who won 2nd and 3rd place respectively in the ACM Undergraduate Student Research Competition at the IEEE/ACM SC 2017 conference.
Below is the BigDataX REU 2016 Site, along with the MEDIX REU 2016 Site, visiting Argonne National Laboratory in June 2016.
Below are REU and DataSys students at the IEEE/ACM Supercomputing/SC 2016 conference in Salt Lake City, Utah, in November 2016.
Below is William Agnew (REU 2016 student) receiving the ACM Undergraduate Student Research Award at the IEEE/ACM SC 2016 conference.
Below is the BigDataX REU 2015 Site, along with the MEDIX REU 2015 Site, visiting Argonne National Laboratory in July 2015.
Award: $404K, 07/2022 - 07/2025
Award: $370K, 03/2018 - 02/2021
Award: $288K, 03/2015 - 02/2018
Funding Period: 2015 - 2025
- Summer 2024 (05-28-2024 to 08-02-2024)
  - Final Reports from 2024 projects
  - Publications from 2024 summer
  - Pictures from 2024 summer program
  - Call for Applications
- Summer 2023 (05-30-2023 to 08-04-2023)
  - Final Reports from 2023 projects
  - Publications from 2023 summer
  - Pictures from 2023 summer program
  - Call for Applications
- Virtual Summer 2021 (05-24-2021 to 07-30-2021)
  - Final Reports from 2021 projects
  - Publications from 2021 summer
  - Pictures from 2021 summer program
  - Call for Applications
- Summer 2020
- Summer 2019 (05-28-2019 to 08-02-2019)
- Summer 2018 (05-29-2018 to 08-03-2018)
- Summer 2017 (05-22-2017 to 07-28-2017)
  - Final Reports from 2017 projects
  - Publications from 2017 summer
  - Pictures from 2017 summer program
- Summer 2016 (05-23-2016 to 07-29-2016)
  - Final Reports from 2016 projects
  - Publications from 2016 summer
  - Pictures from 2016 summer program
- Summer 2015 (06-15-2015 to 08-21-2015)
  - Final Reports from 2015 projects
  - Publications from 2015 summer
  - Pictures from 2015 summer program
Institutions:
IIT is a private Ph.D.-granting research university. Computer science at IIT goes back to 1959, when computers were part of a physical chemistry course. The department was founded in 1971. Graduates of the department and of IIT have created such innovations as the SPMD (single program, multiple data) model for parallel execution of applications on multiprocessors (Frederica Darema, M.S. PHYS ’72); Linksys networks (Victor Tsao, M.S. CS ’80); key technologies supporting Twitter (Abdur Chowdhury, Ph.D. CS ’01); the Intel Pentium microprocessor architecture (Rajeev Chandrasekhar, M.S. CS ’88); and much more. The site will be hosted in the DataSys Laboratory, of which PI Raicu is the director. The DataSys lab conducts research in various areas of distributed systems, with an emphasis on designing, implementing, and evaluating systems, protocols, and middleware. The lab's mission is to investigate challenging, high-impact research projects that support data-intensive distributed computing on a variety of systems, ranging from many-core systems to clusters, grids, clouds, and supercomputers. The lab has been recognized as a CUDA Teaching/Research Center and is at the center of a new NSF-funded testbed, Mystic. The lab's students range from the high school level to undergraduate and graduate students.
UChicago is a private research university that is affiliated with 89 Nobel Laureates, 49 Rhodes Scholars, and 9 Fields Medalists. The Computation Institute (CI), of which two of the UChicago mentors are members, was established in 2000 as a joint initiative between UChicago and Argonne to advance science through innovative computational approaches. The CI is home to over 100 researchers and staff who have active collaborations with over 50 prestigious academic and research institutions across the globe. Current research targets complex system-level problems in many disciplines, such as bioinformatics, biomedicine, materials science, chemistry, and sociology. The UChicago mentors are members of Globus Labs at UChicago, a research group led by Prof. Ian Foster and Dr. Kyle Chard that spans the Computation Institute, the Department of Computer Science, and the Mathematics and Computer Science Division at the University of Chicago and Argonne National Laboratory. The lab’s modest goal is to realize a world in which all research data are reliably, rapidly, and securely accessible, discoverable, and usable. To this end, the lab works on a broad range of research problems in data-intensive computing and research data management. This work is made possible by support from the National Science Foundation, National Institutes of Health, Department of Energy, National Institute of Standards and Technology, and other sources. In addition to computer science, it engages fields as diverse as materials science, biology, archaeology, climate policy, and social sciences. The lab works closely with the team developing the Globus research data management platform. Globus Labs has 11 members, including 1 faculty member, 3 research staff members, 2 postdoctoral fellows, 6 PhD students, and several undergraduate students.
Mentors (bigdatax-group@iit.edu):
- Ioan Raicu (IIT)
- Gruia Calinescu (IIT)
- Kyle Hale (IIT)
- Aaron J. Elmore (UChicago)
- Justin Wozniak (UChicago)
- Kyle Chard (UChicago)
Dr. Ioan Raicu is an associate professor in CS at Illinois Institute of Technology (IIT), as well as a guest research faculty member in MCS at Argonne National Laboratory (ANL). He received his PhD in 2009 from the University of Chicago (UChicago) under the guidance of Ian Foster. His research interests are in distributed systems, with a particular interest in resource management in large-scale systems. He has co-authored over 100 peer-reviewed articles, which have received over 8,618 citations, with an h-index of 39. His work has been funded by NASA, DOE, and NSF. He was the PI for the past BigDataX program from 2015 to 2021 and has mentored 12 REU students.
Dr. Kyle Hale is an assistant professor in CS at Illinois Institute of Technology (IIT). He has extensive experience in OS research and development, particularly across layers and at the hardware/software boundary. He has developed a specialized custom OS kernel framework called Nautilus, intended to support high-performance parallel applications and runtime systems. He also has experience with network-on-chip (NoC) architectures for massively multicore machines. He will focus on low-level experimentation at the NoC level and will investigate new OS abstractions and mechanisms for leveraging the NoC fabric and for bridging inter- and intra-chip communication. He was a mentor in the REU BigDataX program in 2017 and was the co-PI for the IIT site from 2018 to 2021. He has mentored 8 students since 2017.
Dr. Kyle Chard is a Senior Researcher and Fellow in the Computation Institute (CI) at UChicago and ANL. His research focuses on applying computational and data-intensive approaches to solve scientific problems. He is particularly interested in the application of autonomic computing and cost-aware provisioning to make use of on-demand infrastructure; service-oriented science, as part of the Globus project; and information extraction and data analytics. He received his Ph.D. from Victoria University of Wellington in 2011 under the supervision of Dr. Kris Bubendorfer and Prof. Peter Komisarczuk. He was an REU BigDataX mentor in 2016 and 2017 and was the PI for the UChicago site from 2018 to 2021. He has mentored 10 students since 2016.
Dr. Aaron J. Elmore is an assistant professor in CS at UChicago. Aaron was previously a Postdoctoral Associate at MIT, working with Mike Stonebraker on elastic database systems and with Sam Madden on the DataHub project. Aaron completed his thesis on Elasticity Primitives for Database-as-a-Service at the University of California, Santa Barbara under Divy Agrawal and Amr El Abbadi. Before receiving his PhD, Aaron received his MS from UChicago and spent several years in industry. At MIT Aaron mentored five undergraduates, and at the University of Chicago he has mentored two undergraduate students. He has been a mentor since 2018 and has worked with 6 REU students.
Dr. Gruia Calinescu is an associate professor in CS at IIT. He received his PhD in 1998 from the Georgia Institute of Technology, was a postdoctoral fellow at DIMACS, and held visiting positions at the University of Waterloo (Canada), the Max-Planck-Institut für Informatik (Germany), and the University of Wisconsin-Milwaukee. His research is in the areas of algorithms and networking. He has published 38 journal papers, 49 conference papers, five book chapters, and one patent, and has an h-index of 20 according to Google Scholar. His work has been funded by NSF. He was the co-PI of the REU BigDataX program from 2015 to 2017 and mentored 6 students in that timeframe.
Dr. Justin M. Wozniak is a Computer Scientist at ANL and a research staff member at the CI at UChicago. He received his Ph.D. in 2008 from the University of Notre Dame. His research focuses on HPC, language and runtime development, data management and storage, and scientific applications; he is the lead developer of the Swift parallel programming system, which is used on HPC systems such as the Cray XE6 and IBM Blue Gene/Q. He has guided the research work of many postdocs, graduate students, and undergraduates, and was a Google Summer of Code mentor. He was an REU BigDataX mentor in all three years from 2015 to 2017 and mentored 8 REU students.
Evaluator:
- Michael Saelee (IIT) [2018-2021]
- Matthew Bauer (IIT) [2015-2017]
The publications below have been co-authored by REU students (highlighted in bold) at workshops, conferences, and journals.
- Caleb Lehman, Poornima Nookala, Ioan Raicu. "Scalable Load-Balancing Concurrent Queues on Many-Core Architectures", IEEE/ACM Supercomputing 2019 (poster)
- Kathryn Leung, Meghan Kimball, Jason Pitt, Anna Woodard, Kyle Chard. "Walking the cost-accuracy tightrope: balancing trade-offs in data-intensive genomics", IEEE/ACM Supercomputing 2019 (poster)
- Luann Jung, Brendan Whitaker, Kyle Chard, Aaron J. Elmore. "Measuring Swampiness: Quantifying Chaos in Large Heterogeneous Data Repositories", IEEE/ACM Supercomputing 2018 (poster, appendix)
- Samuel Grayson, Kyle Hale. "NautDB: Towards a Hybrid Runtime for Processing Compiled Queries", IEEE/ACM Supercomputing 2018 (poster, appendix)
- Anna Blue Keleher, Kyle Chard, Ian Foster, Alexandru Iulian Orhean, Ioan Raicu. "Finding a Needle in a Field of Haystacks: Metadata Search for Distributed Research Repositories", IEEE/ACM Supercomputing/SC 2017 (poster)
- Andrew Y. Choliy, M.D. Whitmore, Gruia Calinescu. "Multi-Size Optional Offline Caching Algorithms", IEEE/ACM Supercomputing/SC 2017 (poster)
- E. Herron, T.J. Skluzacek, I. Foster, and K. Chard. "Applying Image Feature Extraction to Cluttered Scientific Repositories", IEEE/ACM Supercomputing/SC 2017 (poster)
- M. Baughman, J. Wozniak. "CANDLE/Supervisor: A Workflow Framework for Machine Learning Applied to Cancer Research", Workshop on Computational Approaches for Cancer 2017, IEEE/ACM Supercomputing/SC 2017
- Prajakt Shastry, Daniel Parker, Sanjiv Kapoor, Ioan Raicu. "Exploring Randomized Multipath Routing on Multi-Dimensional Torus Networks", IEEE/ACM Supercomputing/SC 2016 (poster)
- Jennifer A. Steffens, Justin Wozniak. "Parallel Provenance Databases for High Performance Workflows", IEEE/ACM Supercomputing/SC 2016 (poster)
- William Agnew, Michael Fischer, Kyle Chard and Ian Foster. "Touring Dataland? Automated Recommendations for the Big Data Traveler", IEEE/ACM Supercomputing/SC 2016 (poster)
- William Agnew, Michael Fischer, Kyle Chard and Ian Foster. "An Ensemble-based Recommendation Engine for Scientific Data Transfers", IEEE/ACM DataCloud 2016
- Ian Albuquerque Raymundo Da Silva, Gruia Calinescu and Nathan De Graaf. "Faster Compression of Patterns to Rectangle Rule Lists", AAIM 2018
- Ben Walters, Alex Ballmer, Andrei Dumitru, Adnan Haider, Serapheim Dimitropoulos, Ariel Young, William Scullin, Ben Allen, Ioan Raicu. "15 TFlops Haswell vs. 60 TFlops Knights Landing for HPC Scientific Computing Applications", Student Cluster Competition (SCC), IEEE/ACM Supercomputing/SC 2015 (poster)
- Samuel Baugh, Gruia Calinescu, David Rincon-Cruz, Kan Qiao. "Improved Algorithms for Two Energy-Optimal Routing Problems in Ad-Hoc Wireless Networks", BDCloud-SocialCom-SustainCom 2016, pp. 509-516
- Kevin Brandstatter, Ben Walters, Alexander Ballmer, Adnan Haider, Andrei Dumitru, Serapheim Dimitropoulos, Ariel Young, William Scullin, Ben Allen, Ioan Raicu. "Experiences in Optimizing Cluster Performance for Scientific Applications: Controlling Configuration, Utilization, and Power Consumption", GCASR 2015
- Daniel Parker, Sanjiv Kapoor, Ioan Raicu. "Towards the Exploration of Dynamic Multipath Routing in 3D Torus Networks through the CODES/ROSS Simulation Framework", GCASR 2016
- Itua Ijagbone, Shivakumar Vinayagam, David Pisanski, Kevin Brandstatter, Dongfang Zhao, Ioan Raicu. "Towards Scalable Searching of Distributed File Systems", GCASR 2016
- Mermer Dupree, Mike Wilde, Justin Wozniak, Ioan Raicu. "Optimizing Data Locality with Swift/T and FusionFS", GCASR 2016
2024
- TBA
2023
- William Fowler, Tufts University
- Jamison Kerney, Illinois Institute of Technology
- Marelle Leon, Illinois Institute of Technology
- Sean Dudo, University of Texas, Austin
- Nazanin Mahmoudi, Wayne State University
- Adhishree Kathikar, Indiana University, Bloomington
- Aahad Abubaker, DePaul University
- Kyle Zheng, Modesto Junior College
- Jordan Wels, Elon University
- Yilin Yu, New York University
2021
- Shreya Bhatta, University of Texas at Arlington
- Matthew Chen, University of Illinois at Urbana-Champaign
- Kevin Gao, University of California, Berkeley
- Anthony Garcia, Indiana University Northwest – Gary
- Aadiba Haque, New York University
- Frederick Huang, University of Maryland - College Park
- Akhil Kodumuri, University of Illinois Urbana-Champaign
- Jamie Loring, North Carolina State University, Raleigh
- Alexandra Suarez, Southern Illinois University
- Fahim Tran, Georgia State University
2019
- Caleb Lehman, Ohio State University
- Nalin Ranjan, Princeton University, Princeton, New Jersey
- Leopold Ringmayr, Occidental College, Los Angeles, California
- Meghan Kimball, DePaul University
- Kathryn Leung, Princeton University, Princeton, New Jersey
- Anna Kong, Illinois Institute of Technology, Chicago, Illinois
- Harry Gollakota, DePaul University
- Hussain Khajanchi, The College of New Jersey, Ewing, New Jersey
- Justin Goodman, University of Maryland, College Park, MD
- Siddharth Kumar, University of Texas at Dallas, Richardson TX
2018
- Ehson Umrani, Indiana University Northwest, Gary, Indiana
- Monika Worwa, DePauw University, Greencastle, Indiana
- Luann Jung, Massachusetts Institute of Technology, Cambridge, MA
- McKade Umbenhower, University of Wyoming, Laramie, WY
- Samuel Grayson, The University of Texas at Dallas, Texas
- Brendan Whitaker, Ohio State University, Columbus, Ohio
- Patrick Gardner, Washington University in St. Louis, St. Louis, MO
- Zhenye Lin, Williams College
- Jagruti Depan, University of Louisiana at Lafayette, Lafayette, Louisiana
- Destiny Dong, University of North Texas, Denton, Texas
2017
- Andrew Choliy, Rutgers University, New Brunswick, NJ
- Josue Rodriguez Nieves, Inter American University of Puerto Rico, Bayamon, Puerto Rico
- Jonathon Anderson, Concordia University, Nebraska, Seward, NE
- Emily Herron, Mercer University, Macon, GA
- Hannah Wernher, University of Redlands
- Max Whitmore, Brandeis University
- Anna Blue Keleher, University of Maryland, College Park, Maryland
- Matthew Baughman, The Minerva Schools at KGI, San Francisco, CA
2016
- Jennifer Steffens, Drake University, Des Moines, Iowa
- Diptodip Deb, Georgia Institute of Technology, Atlanta, GA
- Adelina Voukadinova, University of Illinois at Chicago, Chicago, Illinois
- Michael Collins, Rutgers, The State University of New Jersey, New Brunswick, NJ
- Nathan De Graaf, Iowa State University, Ames, Iowa
- Jonathan Wu, Washington University in St. Louis
- Michael Fischer, University of Wisconsin Parkside, Kenosha, Wisconsin
- William Agnew, Georgia Institute of Technology, Atlanta, Georgia
2015
- Sam Baugh, University of Chicago, Chicago, Illinois
- Jacob Taylor, Wayne State University, Detroit, Michigan
- Ariel Young, Illinois Institute of Technology, Chicago, Illinois
- Basheer Subei, University of Illinois at Chicago, Chicago, Illinois
- David Pisanski, University of Illinois at Chicago, Chicago, Illinois
- Jonathan Burge, Whitworth University, Spokane, WA
- David Rincon-Cruz, Knox College, Galesburg, IL
- Mermer Dupree, University of Illinois at Chicago, Chicago, Illinois
- Daniel Parker, The University of Chicago, Chicago, Illinois