On the Analysis of Massive Date Sets: Sampling, Sparse Recovery, and Communication Complexity
- 2017-03-09 (Thu.), 10:30 AM
- Recreation Hall, 2F, Institute of Statistical Science
- Mr. Meng-Tsung Tsai
- Dept. of Computer Science, Rutgers University, USA
Abstract
Physical memory is small.? When data sets do not fit into memory, RAM algorithms cannot be used to extract valuable information from unstructured data efficiently. One frequently-used approach to resolve the issue is to draw some random samples for analysis rather than process the entire set of data. Such sampling technique is space-efficient and is usually sufficient to obtain useful information.????? For most cases, data sets are not static, i.e. new data can be added and aged data can be removed. These dynamic updates make the sampling more challenging. To tackle the challenges, we appeal to sparse recovery and demonstrate this technique by showing some applications in graph streaming, such as k-connectivity, degeneracy, etc.???? Sampling techniques are useful. However, for some difficult problems, it is impossible to give a sufficiently good solution using little memory. We show how to prove the impossibility by communication complexity.