Regularization and Variable Selection for Data with Interdependent Structures


Variable selection methods are powerful tools for analyzing massive, high-dimensional data. In bioinformatics, they have been applied to the analysis of gene expression microarray data. It is well known that genes sharing a common biological pathway or a similar function can be very highly correlated with one another. However, most available variable selection methods cannot handle such complicated interdependence in the data. We propose two new algorithms, gLars and gRidge, that select groups of highly correlated variables together in regression models. The new approaches are intended to perform grouping and selection at the same time. Simulations and a real-data example show that the proposed methods often outperform existing variable selection methods, including LARS and the elastic net, in terms of both prediction error and preserving sparsity of the representation.
