Sampling and Summarization for Big Network Data
- 2014-03-17 (Mon.), 10:30 AM
- Recreation Hall, 2F, Institute of Statistical Science
- Dr. Mi-Yen Yeh
- Institute of Information Science, Academia Sinica
Abstract
Nowadays, large amounts of real-world network data are generated by various modern applications at a fast-growing rate. These networks are usually “big” in terms of volume (i.e., the network is large in scale) and variety (i.e., the network is heterogeneous, with various node and link types). For example, the Internet of Things (IoT) is a network comprising hundreds of thousands of different types of devices that can communicate with each other via various protocols; an online social networking application such as Facebook or Twitter comprises billions of vertices such as people, places, posts, and events, where the links among them can be friendship between two people, authorship between people and posts, and “likeship” between people and events. Given a large-scale and heterogeneous network, it is very challenging to understand it in a short time because it is too large to handle with limited computing resources such as CPU and RAM, its structure is too complicated to analyze, and it contains too much semantic information. To efficiently exploit the wealth of information embedded in such big networks, in this talk I will lay out the related research challenges and introduce our recent work on sampling and summarization for big network data.