**Abstract**

In this study, we use a vector functional auto-regressive model to analyze the supply and demand (S-D) curves of a Limit Order Book (LOB) simultaneously. The S-D curves are represented by a linear combination of multi-resolution B-spline basis functions. The corresponding coefficients of the basis functions are shown to follow a vector auto-regressive model, which can be applied to the prediction of future S-D curves. Numerical results indicate that the proposed method has satisfactory performance and the areas under the S-D curves are capable of improving the classification of market trends.
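
As a rough illustration of the coefficient-level modeling described above, the sketch below fits a first-order vector auto-regression to a sequence of basis coefficients by least squares and forecasts the next coefficient vector. It assumes the B-spline coefficients of the supply and demand curves have already been computed and stacked into one row per time point; the order-one model without intercept, the row-vector convention, and the names `fit_var1` and `forecast_next` are assumptions made for illustration, not details taken from the study.

```python
import numpy as np

def fit_var1(C):
    """Least-squares fit of c_{t+1} = c_t @ A for coefficient rows C (T x K).

    Assumption: VAR of order 1 with no intercept, row-vector convention.
    """
    A, *_ = np.linalg.lstsq(C[:-1], C[1:], rcond=None)
    return A

def forecast_next(C, A):
    """One-step-ahead forecast of the next coefficient vector."""
    return C[-1] @ A
```

The forecast coefficients would then be multiplied by the basis functions to obtain the predicted curves.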

**Abstract**

Sufficient dimension reduction (SDR) remains an active research field. When estimating the central subspace (CS), inverse-regression-based SDR methods involve solving a generalized eigenvalue problem, which can be problematic in the large-p-small-n setting. In recent years, new techniques known as randomized algorithms or random sketching have emerged in numerical linear algebra for high-dimensional and large-scale problems. To overcome the large-p-small-n problem in SDR, we combine the idea of statistical inference with random sketching to propose a new SDR method, named integrated random-partition SDR (iRP-SDR). Our method consists of the following steps. (1) Randomly partition the covariates into subsets to construct a low-dimensional envelope subspace. (2) Obtain a sketch estimate of the CS by applying a conventional SDR method in the constructed envelope subspace. (3) Repeat the above two steps multiple times and integrate the resulting sketches to form a final estimate of the CS. The advantageous performance of iRP-SDR is demonstrated via simulation studies and an EEG data analysis. (Joint work with Hung Hung, National Taiwan University.)
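
The three steps can be sketched in code under some loudly stated assumptions, since the abstract does not give the details. Here the conventional SDR method is taken to be sliced inverse regression (SIR), the envelope is built by running SIR within each random block of covariates, and the sketches are integrated by averaging their projection matrices. All of these choices, and the names `sir_directions` and `irp_sdr`, are hypothetical; this is one possible reading of the outline, not the authors' algorithm.

```python
import numpy as np

def sir_directions(X, y, d, n_slices=5):
    """Sliced inverse regression: top-d directions as columns, shape (p, d)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n + 1e-8 * np.eye(p)  # small ridge keeps cov invertible
    # Weighted outer products of the slice means of the centred covariates.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Generalized eigenproblem M b = lambda * cov b, solved via cov^{-1/2}.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    _, V = np.linalg.eigh(inv_sqrt @ M @ inv_sqrt)
    return inv_sqrt @ V[:, -d:]  # eigh sorts ascending, so keep the last d

def irp_sdr(X, y, d, block_size, n_repeats=20, seed=0):
    """Hypothetical iRP-SDR sketch: returns an orthonormal (p, d) basis."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    P_sum = np.zeros((p, p))
    for _ in range(n_repeats):
        # Step 1: random partition, then block-wise SIR builds the envelope.
        perm = rng.permutation(p)
        cols = []
        for start in range(0, p, block_size):
            blk = perm[start:start + block_size]
            B = np.zeros((p, min(d, len(blk))))
            B[blk] = sir_directions(X[:, blk], y, d)
            cols.append(B)
        E = np.hstack(cols)  # envelope basis, far fewer columns than p
        # Step 2: conventional SDR inside the envelope gives one sketch.
        G = sir_directions(X @ E, y, d)
        Q, _ = np.linalg.qr(E @ G)
        P_sum += Q @ Q.T
    # Step 3: integrate the sketches via the averaged projection matrix.
    _, V = np.linalg.eigh(P_sum / n_repeats)
    return V[:, -d:]
```

The block size should be chosen well below the sample size so that each block-wise covariance matrix is well conditioned, which is the point of partitioning in the large-p-small-n setting.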

**Abstract**

In this talk, I will briefly introduce the three latest projects in our lab at Academia Sinica on creative applications in music: the singing voice separation project, the GenMusic (music generation) project, and the DJnet project. The first project is about separating the singing voice from the musical accompaniment, which can be used as a pre-processing step for many music-related applications. The second project is about learning from a massive collection of MIDI files to generate multi-track music with a generative adversarial network (GAN). The generative model can be used for generating music either from scratch or by accompanying a given (instrument) track. The third project is about creating an AI DJ that knows how to manipulate, sample, and sequence musical pieces to create a personalized playlist. The goal of these projects is to enrich the way people create and interact with music in their daily lives, using the latest machine learning (deep learning) techniques.

**Abstract**

The preferential attachment (PA) network is a popular model for social networks, collaboration networks, and similar systems. The PA network model is an evolving network model in which new nodes keep arriving. When a new node arrives, it establishes exactly one connection with an existing node. The existing node is chosen at random from a multinomial distribution whose probability weights are given by a preferential function f of the degrees. The function f maps the natural numbers to the positive real line and is assumed a priori to be non-decreasing, which means that nodes with high degrees are more likely to get new connections, i.e. "the rich get richer". Under sublinear parametric assumptions on the PA function, we propose the maximum likelihood estimator (MLE) of f. We show that the MLE achieves optimal performance and establish its asymptotic normality. Despite its optimality, the MLE depends on the history of the network evolution, which is often difficult to obtain in practice. To avoid this shortcoming, we propose the quasi maximum likelihood estimator (QMLE), a history-free remedy for the MLE. To prove the asymptotic normality of the QMLE, a connection between the PA model and Svante Janson's urn models is exploited. This is partially joint work with Aad van der Vaart.
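
The growth mechanism described above can be simulated in a few lines. The sketch below is a minimal illustration: the two-node starting graph, the function name `simulate_pa_tree`, and the example sublinear choice f(k) = k^0.5 are assumptions made here, not specifications from the talk.

```python
import random

def simulate_pa_tree(n_nodes, f, seed=None):
    """Grow a PA network in which each new node attaches to exactly one
    existing node, chosen with probability proportional to f(degree)."""
    rng = random.Random(seed)
    degrees = [1, 1]   # assumption: start from two nodes joined by one edge
    edges = [(1, 0)]
    for new in range(2, n_nodes):
        # Multinomial choice over existing nodes, weighted by f of the degree.
        weights = [f(k) for k in degrees]
        target = rng.choices(range(new), weights=weights, k=1)[0]
        edges.append((new, target))
        degrees[target] += 1
        degrees.append(1)  # the new node enters with degree one
    return edges, degrees

# Example with a hypothetical sublinear preferential function.
edges, degrees = simulate_pa_tree(200, lambda k: k ** 0.5, seed=1)
```

The returned edge list records the full attachment history, which is exactly the information the MLE requires and the QMLE avoids; only the final `degrees` would be visible in a history-free setting.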

**Abstract**

When studying treatments for psychiatric diseases in a placebo-controlled trial, we may consider using the sequential parallel comparison design (SPCD) to decrease the number of patients needed by reducing the high placebo response rate. Using conditional arguments to remove nuisance parameters, we derive the conditional maximum likelihood estimator (CMLE) for the odds ratio (OR) of responses under the SPCD. We further derive three asymptotic interval estimators and an exact interval estimator for the OR of responses. We employ Monte Carlo simulation to evaluate the performance of these interval estimators in a variety of situations. We find that the asymptotic interval estimators and the exact interval estimator can all perform well. To illustrate the use of the estimators developed here, we use a double-blind, placebo-controlled study assessing the efficacy of a low dose of aripiprazole adjunctive to antidepressant therapy for treating patients with major depressive disorder (MDD).

**Abstract**

Ghost data is as natural as real data and is everywhere; it is simply data that you cannot see. We need to learn how to handle it, how to model with it, and how to put it to work. Some examples of ghost data are (see Sall, 2017):

(a) Virtual data—it isn’t there until you look at it;

(b) Missing data—there is a slot to hold a value, but the slot is empty;

(c) Pretend data—data that is made up;

(d) Highly Sparse Data—whose absence implies a near zero, and

(e) Simulation data—data to answer “what if.”

For example, absence of evidence/data is not evidence of absence. In fact, it can be evidence of something. Moreover, the notion of ghost data extends to other existing areas, including hidden Markov chains, two-stage least squares estimation, optimization via simulation, partition models, and topological data, to name a few.

Three movies will be discussed in this talk: (1) “The Sixth Sense” (Bruce Willis): I can see things that you cannot see; (2) “Sherlock Holmes” (Robert Downey Jr.): absence of expected facts; and (3) “Edge of Tomorrow” (Tom Cruise): how to speed up your learning (AlphaGo Zero will also be discussed). It will be helpful if you watch these movies before coming to my talk. This research is at an early stage, and any feedback is deeply appreciated. Many of the basic ideas are strongly influenced by Mr. John Sall (JMP-SAS).
