Ghost data is as natural as real data, and it is everywhere: it is simply data that you cannot see. We need to learn how to handle it, how to model with it, and how to put it to work. Some examples of ghost data are (see Sall, 2017):
(a) Virtual data: it isn’t there until you look at it;
(b) Missing data: there is a slot to hold a value, but the slot is empty;
(c) Pretend data: data that is made up;
(d) Highly sparse data: data whose absence implies a value near zero; and
(e) Simulation data: data generated to answer “what if.”
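The difference between (b) and (d) can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a method from the talk: the variable names and values are hypothetical, NaN is assumed as the marker for an empty slot, and an absent sparse entry is read as exactly zero (the list above says only "near zero").

```python
import math

# (b) Missing data: the slot exists but is empty.
# Assumption: NaN marks the empty slot; the values are made up.
readings = [2.0, math.nan, 4.0]
observed = [x for x in readings if not math.isnan(x)]
# Average over the observed slots only; the empty slot is not treated as 0.
mean_observed = sum(observed) / len(observed)

# (d) Highly sparse data: only nonzero entries are stored.
# Assumption: an absent index is read as exactly 0.
sparse_counts = {3: 5, 7: 1}  # hypothetical index -> count
length = 10
dense = [sparse_counts.get(i, 0) for i in range(length)]
# Here the absent entries do contribute to the average, as zeros.
mean_sparse = sum(dense) / length
```

In the first case the unseen value is left out of the calculation; in the second, its absence is itself read as information.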
The absence of evidence (or data) is not evidence of absence; in fact, it can itself be evidence of something. The idea of ghost data also extends to other existing areas: hidden Markov chains, two-stage least squares estimation, optimization via simulation, partition models, and topological data, to name just a few.
Three movies will be discussed in this talk: (1) “The Sixth Sense” (Bruce Willis): I can see things that you cannot see; (2) “Sherlock Holmes” (Robert Downey Jr.): the absence of expected facts; and (3) “Edge of Tomorrow” (Tom Cruise): how to speed up your learning (AlphaGo Zero will also be discussed). It will be helpful if you watch these movies before coming to my talk. This is an early stage of my research in this area, and any feedback from you is deeply appreciated. Many of the basic ideas are heavily influenced by Mr. John Sall (JMP-SAS).