Huge Data Workshop 2020

Large Scale Networking (LSN) Workshop on Huge Data:
A Computing, Networking and Distributed Systems Perspective

Sponsored by the National Science Foundation (NSF)

Chicago, IL, April 13-14, 2020

The workshop has been changed to a virtual meeting held via Zoom.
The online workshop program is now available. Please register for the online workshop by following the Registration link above.

Co-located with the FABRIC Community Visioning Workshop

There is an ever-increasing demand in science and engineering, and arguably in all areas of research, for the creation, analysis, archival and sharing of extremely large data sets, often referred to as “huge data”. For example, the black hole image was produced from 5 petabytes of data collected by the Event Horizon Telescope over a period of 7 days. Scientific instruments such as confocal and multiphoton microscopes generate images on the order of 10 GB each, and the total size grows quickly as the number of images increases. The Large Hadron Collider generates 2000 petabytes of data over a typical 12-hour run. These data sets reside at the high end of the “big data” spectrum and can include data sets that grow continuously without bound. They are often collected from distributed devices (e.g., sensors), potentially processed on-site or in distributed clouds, and can be intentionally placed or duplicated at distributed sites for reliability, scalability and/or availability. Data creation resulting from measurement, generation, and transformation across distributed locations is stressing the contemporary computing paradigm. Efficient processing, persistent availability and timely delivery (especially over wide-area networks) of huge data have become critically important to the success of scientific research.

While distributed systems and networking research has thoroughly explored the fundamental challenges and solution space for a broad spectrum of distributed computing models operating on large data sets, the sheer size of the data in question today far surpasses what prior research assumed. To date, the majority of computing systems and applications operate on a clear delineation between data movement and computation. Data is moved from one or more data stores to a computing system and is then processed “locally” on that system. This paradigm consumes significant storage capacity at each computing system to hold both the transferred data and the data generated by the computation, as well as significant time for data transfer before and after the computation. Looking forward, researchers have begun to discuss the potential benefits of a completely new computing paradigm that more efficiently supports “in situ” computation on extremely large data at unprecedented scales across distributed computing systems interconnected by high-speed networks, with high-performance data transfer functions more closely integrated into software (e.g., operating systems) and hardware infrastructure than they have been so far. Such a paradigm has the potential to remove bottlenecks for scientific discoveries and engineering innovations through much faster, more efficient, and more scalable computation across a globally distributed, highly interconnected and vast collection of data and computation infrastructure.

This workshop intends to bring together domain scientists, network and systems researchers, and infrastructure providers to understand the challenges and requirements of “huge data” science and engineering research, and to explore new paradigms for processing, storing, and transferring huge data. Topics of interest include, but are not limited to:

huge data applications, requirements and challenges
challenges of designing and working with devices for huge data generation
storage systems for huge data
software systems and network protocols for huge data
in-network computing/storage for huge data
software-defined networking and infrastructure for huge data
infrastructure support for huge data
debugging and troubleshooting of huge data infrastructure
AI/ML technologies for huge data
measuring huge data transfer and computation
scientific workflows for huge data
access to (portions of) huge data sets
protecting/securing (portions of) huge data sets

Submission of White Papers

Individuals interested in attending should submit a 1-2 page white paper that addresses a problem related to huge data transfer and processing. White papers should be submitted as PDF attachments by email to hugedata@netlab.uky.edu no later than February 15, 2020.

Registration and Travel Grant

A limited number of travel grants are available for authors of accepted white papers to support attendance at the workshop. Registration and travel grant application information can be found by following the "Registration/Travel Grant" tab at the top of this page. The deadline is March 1, 2020.

Important Dates

Deadline for submission of white papers:	February 15, 2020
Acceptance notification:	February 25, 2020
Registration and travel grant application:	March 1, 2020
Notification of travel grant approval:	March 7, 2020
Workshop dates:	April 13-14, 2020

Organizing Committee

Kuang-Ching Wang, Clemson University
James Griffioen, University of Kentucky
Ronald Hutchins, University of Virginia
Zongming Fei, University of Kentucky

Acknowledgment: The workshop is supported in part by the National Science Foundation (NSF) under grant CNS-1747856 and by the NITRD Large Scale Networking (LSN) Interagency Working Group.
