There is an ever-increasing demand in science and engineering, and arguably all areas of research, for the creation, analysis, archival, and sharing of extremely large data sets, often referred to as “huge data”. For example, the first black hole image was produced from 5 petabytes of data collected by the Event Horizon Telescope over a period of 7 days. Scientific instruments such as confocal and multiphoton microscopes generate images on the order of 10 GB each, and the total data volume grows quickly as the number of images increases. The Large Hadron Collider generates 2000 petabytes of data over a typical 12-hour run. These data sets reside at the high end of the “big data” spectrum and can include data sets that grow continuously without bound. They are often collected from distributed devices (e.g., sensors), potentially processed on-site or in distributed clouds, and may be intentionally placed or duplicated at distributed sites for reasons of reliability, scalability, and/or availability. Data creation through measurement, generation, and transformation across distributed locations is stressing the contemporary computing paradigm. Efficient processing, persistent availability, and timely delivery (especially over wide-area networks) of huge data have become critically important to the success of scientific research.
While research in distributed systems and networking has thoroughly explored the fundamental challenges and solution space for a broad spectrum of distributed computing models operating on large data sets, the sheer size of today's data has far surpassed that assumed in prior work. To date, the majority of computing systems and applications operate on a clear delineation between data movement and data computation: data is moved from one or more data stores to a computing system and then computed “locally” on that system. This paradigm consumes significant storage capacity at each computing system to hold the transferred data and the data generated by the computation, as well as significant time for data transfer before and after the computation. Looking forward, researchers have begun to discuss the potential benefits of a completely new computing paradigm that more efficiently supports “in situ” computation of extremely large data at unprecedented scales across distributed computing systems interconnected by high-speed networks, with high-performance data transfer functions more closely integrated into software (e.g., operating systems) and hardware infrastructure than they have been to date. Such a paradigm has the potential to remove bottlenecks to scientific discoveries and engineering innovations by enabling much faster, more efficient, and more scalable computation across a globally distributed, highly interconnected, and vast collection of data and computation infrastructure.
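To make the contrast concrete, the following minimal sketch illustrates the two paradigms described above. It is not drawn from any particular system discussed at the workshop; the file paths, the 64 MB chunk size, and the byte-sum “analysis” are hypothetical stand-ins for a real data source and computation.

```python
# Illustrative sketch only: contrasts the contemporary copy-then-compute
# workflow with a streaming, in-situ-style alternative. Paths, chunk size,
# and the byte-sum "analysis" are hypothetical placeholders.
import os
import shutil

CHUNK_BYTES = 64 * 1024 * 1024  # hypothetical 64 MB transfer unit


def copy_then_compute(remote_path: str, scratch_path: str) -> int:
    """Contemporary paradigm: stage the entire data set locally, then compute.

    Requires scratch capacity for the full data set and delays all
    computation until the transfer has finished.
    """
    shutil.copyfile(remote_path, scratch_path)   # full transfer first
    total = 0
    with open(scratch_path, "rb") as f:          # then compute "locally"
        while chunk := f.read(CHUNK_BYTES):
            total += sum(chunk)                  # stand-in for real analysis
    os.remove(scratch_path)
    return total


def streaming_compute(remote_path: str) -> int:
    """In-situ/streaming style: compute on each chunk as it arrives.

    Only one chunk is resident at a time, so local storage no longer
    scales with the size of the full data set.
    """
    total = 0
    with open(remote_path, "rb") as f:           # stands in for a network stream
        while chunk := f.read(CHUNK_BYTES):
            total += sum(chunk)                  # partial result per chunk
    return total
```

Both functions produce the same result; the difference is that the first pays the full transfer and storage cost up front, while the second overlaps transfer with computation and holds only one chunk at a time.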
This workshop intends to bring together domain scientists, network and systems researchers, and infrastructure providers to understand the challenges and requirements of “huge-data” science and engineering research, and to explore new paradigms for processing, storing, and transferring huge data. Topics of interest include, but are not limited to:
Individuals interested in attending should submit a 1-2 page white paper that addresses a problem related to huge data transfer and processing. White papers should be submitted as PDF attachments by email to hugedata@netlab.uky.edu no later than February 15, 2020.
A limited number of travel grants are available to authors of accepted white papers to support attendance at the workshop. Registration and travel grant application information can be found under the "Registration/Travel Grant" tab at the top of this page. The deadline is March 1, 2020.
Deadline for submission of white papers: February 15, 2020
Acceptance notification: February 25, 2020
Registration and travel grant application: March 1, 2020
Notification of travel grant approval: March 7, 2020
Workshop dates: April 13-14, 2020