MicroRNAs (miRNAs) are small non-coding RNAs
that function in the epigenetic control of gene expression and can serve as
useful biomarkers for disease. Anti-NMDA receptor (anti-NMDAR) encephalitis
is an acute autoimmune disorder. Some patients are found to have tumors,
especially teratomas; the disease occurs more often in females than in males,
and most patients recover significantly after tumor resection. This suggests
that the tumor may induce anti-NMDAR encephalitis. In this study, we review
miRNA biomarkers associated with anti-NMDAR encephalitis and with the related
tumors, respectively. To the best of our knowledge, no study in the literature
has investigated the relationship between anti-NMDAR encephalitis and these
tumors through their miRNA biomarkers. We adopt a phylogenetic analysis to
plot phylogenetic trees of the miRNA biomarkers. The results may explain (i)
why there is a relationship between these tumors and anti-NMDAR encephalitis,
and (ii) why the disease occurs more often in females than in males. This
sheds light on exploring the issue through miRNA intervention.
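As a rough illustration of the kind of phylogenetic analysis described above, the sketch below clusters a few hypothetical miRNA seed sequences (names and sequences are invented for illustration, not the biomarkers reviewed here) by edit distance and single-linkage agglomeration, a minimal stand-in for a full phylogenetic-tree method:

```python
import itertools

def lev(a, b):
    # classic dynamic-programming edit distance between two sequences
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# hypothetical miRNA seed sequences, for illustration only
seqs = {
    "miR-A": "UGAGGUAGUAGGUU",
    "miR-B": "UGAGGUAGUAGGUA",
    "miR-C": "CAUUGCACUUGUCU",
    "miR-D": "CAUUGCACUCGUCU",
}
names = list(seqs)
dist = {(p, q): lev(seqs[p], seqs[q]) for p, q in itertools.combinations(names, 2)}

# single-linkage agglomeration: repeatedly merge the closest pair of clusters
clusters = [{n} for n in names]
merges = []
while len(clusters) > 1:
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: min(dist.get((p, q), dist.get((q, p)))
                           for p in clusters[ij[0]] for q in clusters[ij[1]]),
    )
    merges.append((clusters[i], clusters[j]))
    clusters[i] = clusters[i] | clusters[j]
    del clusters[j]
```

The recorded merge order is a crude dendrogram: near-identical sequences (one substitution apart) join first, so biomarkers shared between two diseases would cluster early.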

For the analysis of competing risks data,
three different types of hazard functions have been considered in the
literature, namely the cause-specific hazard, the sub-distribution hazard, and
the marginal hazard function. Accordingly, medical researchers can fit three
different types of the Cox model to estimate the effect of covariates on each
of the hazard function. Many authors studied the difference between the
cause-specific hazard and the sub-distribution hazard. Comparative studies
including the marginal hazard function do not exist due to the difficulties
related to non-identifiability. In this paper, we adopt an assumed copula model
to deal with the model identifiability issue, making it possible to establish a
relationship between the sub-distribution hazard and the marginal hazard
function.
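To see why such an assumption is needed, the toy simulation below (all parameters illustrative) draws dependent latent event times from a Clayton copula with exponential margins. The sub-distribution quantity P(T ≤ t, cause 1) is estimable from the observed data, while the marginal P(T₁ ≤ t) is not, because only the minimum of the latent times is ever observed:

```python
import math, random

random.seed(1)
theta, lam1, lam2, n = 2.0, 1.0, 1.5, 20000  # illustrative parameters

obs = []
for _ in range(n):
    u1, v = random.random(), random.random()
    # conditional-inversion sampling from the Clayton copula
    u2 = (u1 ** (-theta) * (v ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    # dependent latent event times with exponential margins
    t1, t2 = -math.log(u1) / lam1, -math.log(u2) / lam2
    obs.append((min(t1, t2), 1 if t1 <= t2 else 2))  # only the minimum is seen

t = 0.5
# sub-distribution quantity P(T <= t, cause 1): identifiable from the data
cif1 = sum(1 for (x, c) in obs if x <= t and c == 1) / n
# marginal quantity P(T1 <= t): known here only because we simulated the model
marg1 = 1 - math.exp(-lam1 * t)
```

In real data the dependence parameter theta is unknown and cannot be estimated from (T, cause) alone; fixing it via an assumed copula is what makes the marginal quantity recoverable, and hence comparable with the sub-distribution hazard.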

We develop a model diagnostic tool for
comparing the subhazard and marginal hazard models. We then extend our
comparative analysis to clustered competing risks data, which arise frequently
in medical studies. To facilitate the numerical comparison, we implement the
computing algorithm for marginal Cox regression with clustered competing risks
data in the R joint.Cox package and check its performance via simulations. For
illustration, we analyze two survival datasets from lung cancer and bladder
cancer patients. This is joint work with Shih Jia-Han, Il-Do Ha, and Ralf
Wilke.

Mapping of disease incidence has long been
of importance to epidemiology and public health. In this paper, we consider identification
of clusters of spatial units with elevated disease rates and develop a new
approach that estimates the relative disease risk in association with potential
risk factors and simultaneously identifies clusters corresponding to elevated
risks. A heterogeneity measure is proposed to enable the comparison of a
candidate cluster and its complement under a pair of complementary models. A
quasi-likelihood procedure is developed for estimating the model parameters and
identifying the clusters. An advantage of our approach over traditional spatial
clustering methods is the identification of clusters that can have arbitrary
shapes due to abrupt or non-contiguous changes while accounting for risk
factors and spatial correlation. Asymptotic properties of the proposed
methodology are established and a simulation study shows empirically sound
finite-sample properties. The mapping and clustering of enterovirus 71
infection in Taiwan are carried out for illustration.
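For contrast with traditional spatial methods, here is a minimal sketch of Kulldorff-style scan screening on a toy grid (counts and candidate clusters are invented; the quasi-likelihood approach described above additionally adjusts for risk factors and spatial correlation, which this sketch does not):

```python
import math

# toy 4x4 grid of spatial units: observed and expected disease counts
obs = [3, 2, 4, 2,
       3, 15, 14, 3,
       2, 16, 13, 2,
       4, 3, 2, 3]
exp = [4.0] * 16  # equal expected counts, for simplicity

def heterogeneity(cluster):
    """Poisson likelihood-ratio measure comparing a candidate cluster and its
    complement (separate rates) against a single common rate."""
    o_in = sum(obs[i] for i in cluster); e_in = sum(exp[i] for i in cluster)
    o_out = sum(obs) - o_in; e_out = sum(exp) - e_in
    if o_in / e_in <= o_out / e_out:
        return 0.0  # keep only elevated-risk candidates
    o, e = o_in + o_out, e_in + e_out
    return (o_in * math.log(o_in / e_in) + o_out * math.log(o_out / e_out)
            - o * math.log(o / e))

candidates = [{5, 6, 9, 10}, {0, 1, 4, 5}, {10, 11, 14, 15}]
best = max(candidates, key=heterogeneity)
```

The central 2x2 block of elevated counts wins the comparison. Note that scanning a fixed list of compact candidates is exactly the limitation the proposed approach avoids: it can identify arbitrarily shaped, non-contiguous clusters.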

In natural ecological communities, most
species are rare and thus very likely to become extinct. As a consequence, the
prediction and identification of rare species are of enormous value for
conservation purposes. The main research question of interest is: how many
newly found species will be rare in the next field survey? By using
biodiversity information in an ecological sample, we developed an estimator of
the number of new rare species (e.g., singletons, doubletons, and tripletons)
that will be found in an as-yet-unsurveyed sample. A semi-numerical study
showed that the proposed Bayesian-weight estimator predicted the number of new
rare species with low relative bias and low relative root mean squared error,
and accordingly high accuracy. Additionally, in this talk, I will use several
conservation-directed empirical applications to demonstrate the predictive
power of the proposed method.
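The target quantity itself can be defined by brute-force simulation, as in the hedged sketch below: it draws a hypothetical community (abundances invented for illustration), takes two surveys, and counts species that appear as new singletons in the second. This Monte Carlo baseline is what an estimator of the kind described above must predict from the first sample alone:

```python
import random
from collections import Counter

random.seed(7)
S = 200
# hypothetical community: species abundances decay geometrically
weights = [0.9 ** i for i in range(S)]

def survey(n):
    """Sample n individuals from the community (multinomial sampling)."""
    return Counter(random.choices(range(S), weights=weights, k=n))

reps, n1, n2 = 200, 400, 400
new_singletons = []
for _ in range(reps):
    s1, s2 = survey(n1), survey(n2)
    # species unseen in the first survey that are singletons in the second
    new_singletons.append(sum(1 for sp, c in s2.items()
                              if c == 1 and sp not in s1))
mc_target = sum(new_singletons) / reps
```

In practice only the first sample is available, which is precisely why a model-based estimator is needed; the simulation merely makes the research question ("how many newly found species will be rare in the next survey?") concrete.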

We will discuss the problem of finding
principal components for multivariate datasets that lie on an embedded
nonlinear Riemannian manifold within a higher-dimensional space. Our aim is
to extend the geometric interpretation of PCA while capturing non-geodesic
forms of variation in the data. We introduce the concept of a principal
sub-manifold: a manifold that passes through the center of the data and, at
any point, moves in the direction of highest curvature within the space
spanned by the eigenvectors from a local tangent space PCA. We show that the
principal sub-manifold yields the usual principal components in Euclidean
space. We illustrate how to find, use, and interpret the principal
sub-manifold, with which a classification boundary can be defined for data
sets on manifolds.
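The local tangent space PCA ingredient can be sketched as follows for points on the unit sphere (a minimal sketch with synthetic data; the base point uses a normalized Euclidean mean as a simple stand-in for the Fréchet mean, and the full principal sub-manifold construction is considerably more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic points near a great circle on the unit sphere S^2
t = rng.uniform(0, 0.8, size=200)
pts = np.column_stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(200)])
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# base point: normalized Euclidean mean (simple stand-in for the Frechet mean)
p = pts.mean(axis=0)
p /= np.linalg.norm(p)

def log_map(x, p):
    """Spherical log map: pull data back to the tangent plane at p."""
    c = np.clip(x @ p, -1.0, 1.0)
    u = x - np.outer(c, p)
    nu = np.linalg.norm(u, axis=1, keepdims=True)
    return np.arccos(c)[:, None] * u / np.where(nu > 0, nu, 1.0)

V = log_map(pts, p)
# local tangent-space PCA: eigen-decomposition of the tangent covariance
evals, evecs = np.linalg.eigh(np.cov(V.T))
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
```

The leading tangent eigenvector captures the along-circle variation; moving from the base point in directions built from these local eigenvectors is the step the principal sub-manifold iterates.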

The distribution of ranked heights of excursions of a Brownian bridge is given in a paper by Pitman and Yor (2001). In this talk, we consider excursions of a Brownian excursion above a random level. We study the maximum heights of these excursions as Pitman and Yor did for excursions of a Brownian bridge.
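A discretized sketch of the objects involved: the code below approximates a Brownian bridge by a pinned random walk and extracts the ranked maximum heights of its excursions above a level (a fixed level is used here as a simple stand-in for the random level considered in the talk):

```python
import random

random.seed(3)
n = 2000
# random-walk approximation of a Brownian bridge on [0, 1]
steps = [random.gauss(0, 1) for _ in range(n)]
w = [0.0]
for s in steps:
    w.append(w[-1] + s / n ** 0.5)
bridge = [w[i] - (i / n) * w[-1] for i in range(n + 1)]  # both ends pinned at 0

level = 0.0  # replace with a random level to mimic the setting of the talk
heights, cur = [], None
for x in bridge:
    if x > level:
        cur = x if cur is None else max(cur, x)   # track the running maximum
    elif cur is not None:
        heights.append(cur - level)               # excursion ends: record height
        cur = None
if cur is not None:
    heights.append(cur - level)
ranked = sorted(heights, reverse=True)
```

`ranked` is the sample analogue of the ranked excursion heights whose joint distribution Pitman and Yor derived for the bridge; the talk studies the corresponding quantities for a Brownian excursion cut at a random level.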


Detailed interactions between biological
molecules are fundamental to life, and their compromise may cause disease.
Such interactions are exemplified by the well-known antibody-antigen
interaction, among many others. Direct visualization of these interactions at
atomic resolution, i.e., at the level of chemical bonds, has been made
possible by protein X-ray crystallography, which depends on many technical
advances, in particular synchrotron radiation and crystallization screens.
Recent advances in low-temperature electron microscopy (cryo-EM) have
fulfilled the long-awaited promise that protein structures can be revealed at
near-atomic resolution in the absence of crystals. This means that the
structure of a protein under its working conditions is now accessible.
However, it is a misconception that these detailed structures are directly
available in the raw data, as if acquiring a powerful microscope were
sufficient. In this talk, I will first briefly review X-ray crystallography
and the ground truths of protein structure it has established. I will then
use a few detailed structures obtained here to illustrate the process of
extracting ground truths from very noisy cryo-EM data through correct
computational “data averaging”. As is evident, the challenges of reducing
very noisy data present great opportunities for statisticians.
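The effect of “data averaging” on noisy data can be illustrated with a toy one-dimensional analogue (real cryo-EM additionally requires particle alignment and classification before averaging, which this sketch omits): averaging N aligned noisy copies reduces the noise by roughly a factor of √N.

```python
import random, math

random.seed(0)
# toy 1-D "image": a clean signal we pretend is the unknown ground truth
signal = [math.sin(2 * math.pi * k / 64) for k in range(64)]

def noisy_copy():
    """One observation: the signal buried in noise of comparable amplitude."""
    return [s + random.gauss(0, 1.0) for s in signal]

def rmse(est):
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(est, signal)) / len(signal))

single = rmse(noisy_copy())  # one copy: error on the order of the noise sd
# average 1000 aligned copies position by position
avg_1000 = [sum(col) / 1000 for col in zip(*(noisy_copy() for _ in range(1000)))]
err_avg = rmse(avg_1000)     # roughly 1/sqrt(1000) of the single-copy error
```

The hard statistical problems in cryo-EM are upstream of this step: deciding which noisy particles belong to the same view and orientation so that the averaging is "correct" in the first place.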

We propose a nonparametric multiple
imputation approach to recover information for censored observations when
analyzing survival data in the presence of informative censoring. A working
shared frailty model is proposed to estimate the magnitude of informative
censoring, and it is used only to determine the size of the imputing risk set
for each censored subject. We show that the distance between the posterior
means of the frailties is equivalent to the distance between the observed
times. We therefore propose to use the observed times of the subjects at risk
to compute the distance from each censored subject and thereby select an
imputing risk set for each censored subject. In simulations, the
nonparametric multiple imputation approach produces survival estimates
comparable to the target values, with coverage rates close to the nominal 95%
level even under a high degree of informative censoring. We also demonstrate
the approach on the ACTG-175 trial data and develop, based on the approach,
an alternative sensitivity analysis for informative censoring.

Many real time series data sets exhibit
structural changes over time. It is then of interest to both estimate the
(unknown) number of structural break points, together with the parameters of
the statistical model employed to capture the relationships amongst the
variables/features of interest. An additional challenge emerges in the presence
of very large data sets, namely how to accomplish these two objectives in a
computationally efficient manner. In this talk, we outline a novel procedure
which leverages a block segmentation scheme (BSS) that reduces the number of
model parameters to be estimated through a regularized least squares criterion.
Specifically, BSS examines appropriately defined blocks of the available data,
which when combined with a fused lasso based estimation criterion, leads to
significant computational gains without compromising on the statistical
accuracy in identifying the number and location of the structural breaks. This
procedure is further coupled with new local and global screening steps to
consistently estimate the number and location of break points. The procedure is
scalable to large size high-dimensional time series data sets and can provably
achieve significant computational gains. It is further applicable to various
statistical models, including regression, graphical models and
vector-autoregressive models. Extensive numerical work on synthetic data
supports the theoretical findings and illustrates the attractive properties of
the procedure. Applications to neuroimaging data will also be discussed.

This paper is about how we study
statistical methods. As an example, it uses the random regressions model, in
which the intercept and slope of cluster-specific regression lines are modeled
as a bivariate random effect. Maximizing this model's restricted likelihood
often gives a boundary value for the random effect correlation or variances. We
argue that this is a problem; that it is a problem because our discipline has
little understanding of how contemporary models and methods map data to
inferential summaries; that we lack such understanding, even for models as
simple as this, because of a near-exclusive reliance on mathematics as a means
of understanding; and that math alone is no longer sufficient. We then argue
that as a discipline, we can and should break open our black-box methods by
mimicking the five steps that molecular biologists commonly use to break open
Nature's black boxes: design a simple model system, formulate hypotheses using
that system, test them in experiments on that system, iterate as needed to
reformulate and test hypotheses, and finally test the results in an "in
vivo" system. We demonstrate this by identifying conditions under which
the random-regressions restricted likelihood is likely to be maximized at a
boundary value. Resistance to this approach seems to arise from a view that it
lacks the certainty or intellectual heft of mathematics, perhaps because
simulation experiments in our literature rarely do more than measure a new
method's operating characteristics in a small range of situations. We argue
that such work can make useful contributions including, as in molecular
biology, the findings themselves and sometimes the designs used in the five
steps; that these contributions have as much practical value as mathematical
results; and that therefore they merit publication as much as the mathematical
results our discipline esteems so highly.
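The boundary phenomenon at issue can be reproduced in an even simpler model system than random regressions: in the one-way random-intercept model, the moment estimator of the between-cluster variance is frequently negative, and hence truncated to the boundary value zero, when the true variance is small (a hedged sketch; the abstract's model involves a correlated random intercept and slope, not this simpler setup):

```python
import random

random.seed(4)
m, n = 10, 5                      # m clusters, n observations per cluster
sigma2_a, sigma2_e = 0.05, 1.0    # small true between-cluster variance

def one_dataset():
    data = []
    for _ in range(m):
        a = random.gauss(0, sigma2_a ** 0.5)   # shared cluster effect
        data.append([a + random.gauss(0, sigma2_e ** 0.5) for _ in range(n)])
    return data

def between_variance(data):
    """ANOVA moment estimator of the between-cluster variance (can be negative)."""
    means = [sum(c) / n for c in data]
    grand = sum(means) / m
    msb = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    msw = sum((y - mu) ** 2
              for c, mu in zip(data, means) for y in c) / (m * (n - 1))
    return (msb - msw) / n

reps = 500
hits = sum(1 for _ in range(reps) if between_variance(one_dataset()) < 0)
boundary_rate = hits / reps  # fraction of fits driven to the boundary
```

This is the flavor of experiment the five-step program advocates: a simple model system in which hypotheses about when estimates hit the boundary (small true variance, few clusters) can be formulated and tested before returning to the full random-regressions model.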