Institute of Statistical Science, Academia Sinica

Seminar Announcement


Composition and sample size determination for training set in genomic prediction

2023-03-06 (Mon.), 10:30 AM
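Venue: B1 Auditorium, Institute of Statistical Science; tea reception at 10:10 AM.
Held in person with simultaneous online video streaming.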
Prof. Chen-Tuo Liao (廖振鐸教授)
Department of Agronomy, College of Bioresources and Agriculture, National Taiwan University

Abstract

Genomic prediction (GP) is a statistical method used to select for quantitative traits in animal and plant breeding. A GP model is first built from the phenotype and genotype data of a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals that have genotypic data alone. For a specified test set, we develop a highly efficient algorithm to determine an optimal subset of a candidate set whose individuals have been genotyped but not yet phenotyped. The chosen subset serves as the training set to be phenotyped, and the GP model is then built from its phenotype and genotype data. In this study, we propose an optimality criterion, called the r-score, to determine the required training set. The r-score criterion is derived directly from Pearson's correlation between the GEBVs and the phenotypic values of the test set. The proposed method is shown to be advantageous over existing ones, mainly because it fully uses the genomic relationship between the test set and the training set, taking into account both the variance and the bias of the predicted GEBVs. By applying a logistic growth curve to relate the r-score to the training set size, a practical approach is proposed to determine the sample size of the optimal training set. Several real genome datasets are used to illustrate the proposed approach.
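The abstract does not state the exact logistic parameterization or the rule used to read a training-set size off the fitted curve, so the following is only a minimal sketch under stated assumptions. It assumes a three-parameter logistic curve r(n) = a / (1 + exp(-b (n - c))), with plateau a, growth rate b and midpoint c. As a hypothetical stopping rule, it takes the smallest n whose fitted score reaches a chosen fraction p of the plateau, which solves to n = c + ln(p / (1 - p)) / b. It fits plain (size, score) pairs and does not compute the r-score itself, whose formula is not given here; the function names `logistic`, `fit_logistic` and `required_size` are illustrative, not the speaker's code.

```python
import numpy as np


def logistic(n, a, b, c):
    """Three-parameter logistic curve: plateau a, growth rate b, midpoint c."""
    n = np.asarray(n, dtype=float)
    return a / (1.0 + np.exp(-b * (n - c)))


def fit_logistic(n, r, b_grid=None, c_grid=None):
    """Least-squares fit of r ~ logistic(n, a, b, c).

    For fixed (b, c) the model is linear in a, so a has a closed form;
    b and c are chosen by exhaustive grid search (no SciPy needed).
    Assumes at least two distinct training-set sizes in n.
    """
    n = np.asarray(n, dtype=float)
    r = np.asarray(r, dtype=float)
    span = n.max() - n.min()
    if b_grid is None:
        b_grid = np.geomspace(0.1, 50.0, 301) / span
    if c_grid is None:
        c_grid = np.linspace(n.min() - span, n.max() + span, 601)
    b_vals = np.asarray(b_grid, dtype=float)
    c_vals = np.asarray(c_grid, dtype=float)
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        # g[i, j, k]: logistic shape at n[k] for rate b_vals[i], midpoint c_vals[j]
        g = 1.0 / (1.0 + np.exp(-b_vals[:, None, None]
                                * (n[None, None, :] - c_vals[None, :, None])))
        # Closed-form least-squares plateau for every (b, c) pair
        a = (g * r).sum(axis=-1) / (g * g).sum(axis=-1)
        sse = ((r - a[..., None] * g) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.nanargmin(sse), sse.shape)
    return float(a[i, j]), float(b_vals[i]), float(c_vals[j])


def required_size(b, c, fraction=0.95):
    """Smallest integer n with logistic(n, a, b, c) >= fraction * a (needs b > 0).

    Hypothetical rule: the fraction of the plateau is a user choice.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return int(np.ceil(c + np.log(fraction / (1.0 - fraction)) / b))


if __name__ == "__main__":
    # Synthetic, noise-free (size, score) pairs generated from a known curve,
    # only to show that the fit recovers it; these are NOT results from the talk.
    sizes = np.array([25, 50, 100, 150, 200, 300, 400])
    scores = logistic(sizes, 0.6, 0.02, 100.0)
    a, b, c = fit_logistic(sizes, scores,
                           b_grid=np.arange(1, 101) / 1000.0,
                           c_grid=np.arange(0.0, 401.0))
    print(f"plateau={a:.3f}, rate={b:.3f}, midpoint={c:.1f}")
    print("size for 95% of plateau:", required_size(b, c))
```

A gradient-based fitter such as SciPy's `curve_fit` is a natural alternative; the grid search keeps the sketch dependency-light and avoids sensitivity to a poor starting guess, at the cost of resolution limited by the grid spacing.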



Last updated: 2023-02-24 14:04
