**Abstract**

When the data size is large, even performing a standard statistical analysis can become challenging. For instance, in linear regression, when the number of predictors exceeds the sample size, ordinary least squares estimation fails. A common remedy is to perform variable screening followed by variable cleaning. In this talk, we focus on the screening step and propose an algorithm that involves randomization. The proposed method incorporates a random partitioning step, which improves variable screening accuracy. Our simulation results show that the proposed approach performs well.
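The screening idea can be illustrated with a generic randomized variant of marginal screening. The abstract does not specify the algorithm, so everything below — the function name, the half-sample partitioning, and the rank-aggregation rule — is an illustrative assumption, not the talk's actual method:

```python
import numpy as np

def randomized_screening(X, y, n_splits=20, keep=10, rng=None):
    """Illustrative marginal screening with random partitioning.

    On each random half of the samples, rank predictors by absolute
    marginal correlation with y, then aggregate the ranks across the
    splits. (A generic sketch; the talk's algorithm may differ.)
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_splits):
        idx = rng.choice(n, size=n // 2, replace=False)
        ys = y[idx] - y[idx].mean()
        Xc = X[idx] - X[idx].mean(axis=0)
        corr = np.abs(Xc.T @ ys) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(ys) + 1e-12
        )
        # rank 0 = strongest marginal association on this split
        rank_sum += np.argsort(np.argsort(-corr))
    return np.argsort(rank_sum)[:keep]  # indices of retained predictors

# Toy p > n example: only the first 3 of 200 predictors are active
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 200))
y = X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.standard_normal(80)
print(sorted(randomized_screening(X, y, rng=1)))
```

Averaging rankings over random sub-samples makes the screening less sensitive to any single unlucky split, which is the intuition behind adding a random partitioning step.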

**Abstract**

Although increasingly used as a data resource for assembling cohorts, electronic medical records (EMR) pose a number of analytic challenges because they are primarily collected for clinical encounters rather than for research purposes. In particular, a patient's health status influences when and what data are recorded, leading to bias in the collected data. In this paper, we consider recurrent event data analysis leveraging EMR data. Conventional regression methods for event risk analysis usually require the values of covariates to be observed throughout the follow-up period. In EMR databases, time-dependent risk factors are intermittently measured at clinical visits, and the timing of these visits is informative in the sense that it depends on the disease course. Simple methods, such as the last-observation-carried-forward approach, can lead to biased estimation; on the other hand, complex joint models require additional assumptions on the covariate process and cannot be easily extended to handle multiple longitudinal predictors. We present a novel kernel-smoothing estimation procedure for the semiparametric proportional rate model of recurrent events with time-dependent covariates, correcting the sampling bias that results from the informative observation times in EMR-derived clinical data. The proposed method does not require model specifications for the covariate processes and can easily handle multiple time-dependent covariates. The estimator for the regression parameters is asymptotically unbiased and normally distributed with a root-$n$ convergence rate. We further investigate the bias in scenarios where the assumptions on the observation time process are violated. Simulation studies are conducted to evaluate the performance of the proposed estimator. Our method is applied to a kidney transplant study for illustration.
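To make the contrast with last-observation-carried-forward concrete, here is a minimal sketch of kernel-smoothing an intermittently observed covariate at a time of interest. The Epanechnikov kernel, the bandwidth, and the toy measurements are illustrative assumptions; this is not the paper's estimation procedure, which targets the proportional rate model and corrects for informative observation times:

```python
import numpy as np

def kernel_covariate(obs_times, obs_values, t, bandwidth):
    """Kernel-weighted estimate of a covariate's value at time t.

    Smooths intermittent measurements with an Epanechnikov kernel
    instead of carrying the last observed value forward.
    """
    u = (np.asarray(obs_times, dtype=float) - t) / bandwidth
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)  # Epanechnikov
    if w.sum() == 0:
        return float("nan")  # no measurements within one bandwidth of t
    return float(np.dot(w, obs_values) / w.sum())

def locf(obs_times, obs_values, t):
    """Last-observation-carried-forward at time t (for comparison)."""
    past = [v for s, v in zip(obs_times, obs_values) if s <= t]
    return past[-1] if past else float("nan")

# Toy covariate measured at four irregular clinic visits
times = [0.0, 1.0, 2.5, 4.0]
values = [1.0, 1.2, 2.0, 2.6]
print(kernel_covariate(times, values, t=3.0, bandwidth=1.5))  # borrows both sides
print(locf(times, values, t=3.0))  # ignores the upcoming measurement
```

For a covariate trending upward, LOCF systematically lags the true trajectory, which is one source of the bias the paper addresses.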

**Abstract**

Stochastic gradient descent (SGD) is a popular algorithm that can handle extremely large data sets due to its low computational cost per iteration and low memory requirement. Asymptotic distributional results for SGD are well known (Kushner and Yin, 2003). However, a major drawback of SGD is that it does not adapt well to the underlying structure of the solution, such as sparsity or constraints. Thus, many variations of SGD have been developed, and many of them are based on the concept of stochastic mirror descent (SMD). In this paper, we develop a diffusion approximation for SMD with a constant step size, using a theoretical tool termed the “local Bregman divergence”. In particular, we establish a novel continuous-mapping-theorem-type result for a sequence of conjugates of the local Bregman divergence. The diffusion approximation results shed light on how to fine-tune an $\ell_1$-norm-based SMD algorithm to yield an “asymptotically unbiased” estimator that zeros out inactive coefficients.
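The flavor of an $\ell_1$-based stochastic scheme with a constant step size can be sketched with a proximal-style update: one stochastic gradient step followed by soft thresholding, the proximal map of the $\ell_1$ norm, which can set coordinates exactly to zero. This is a generic sketch under assumed data and tuning, not the paper's SMD algorithm or its diffusion analysis:

```python
import numpy as np

def soft_threshold(z, tau):
    """Soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def l1_stochastic(X, y, lam=0.3, step=0.01, n_iter=5000, rng=None):
    """Constant-step stochastic gradient + soft-thresholding sketch.

    Each iteration uses one randomly sampled squared-error gradient,
    then soft-thresholds, so inactive coordinates are pulled toward
    exactly zero. (Illustrative only; lam and step are assumptions.)
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        i = rng.integers(n)
        grad = (X[i] @ beta - y[i]) * X[i]  # single-sample gradient
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Toy sparse regression: 2 active coefficients out of 10
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta_true = np.array([2.0, -1.5] + [0.0] * 8)
y = X @ beta_true + 0.1 * rng.standard_normal(200)
beta_hat = l1_stochastic(X, y, rng=1)
print(np.round(beta_hat, 2))
```

Note the trade-off this toy update exhibits: the thresholding that suppresses inactive coordinates also shrinks the active ones by roughly `lam`, which is the kind of bias the abstract's fine-tuning result aims to remove asymptotically.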
