Ahdesmaki M, Hill L, Goffard N, McDyer F, Davison T, Proutski V, Bylesjo M
In: ISMB/ECCB. Vienna (Austria); 2011
Biomarker discovery involves identifying variables (e.g. genes) that are related to an endpoint of interest, for instance patient risk stratification or drug response. When used in a classification setting, the biomarker discovery process is commonly aimed at identifying a signature: a panel of variables that together allow prediction of clinical outcome.
When signatures are developed according to best practices, several candidate models that give similar classification performance are often evaluated. When only classification performance is available for each model, the model selection step is difficult and partly arbitrary, as one model can rarely be shown to be significantly better than another.
In this research, we explored the use of additional model properties, such as biological relevance, clinical utility and analytical precision, to make more informed and relevant decisions when selecting the top-ranking models, by building these additional metrics into the model generation step. The methodology was evaluated using gene expression data sets from the MAQC-II project. We generated predictive signatures within cross-validation using several classifiers and feature selection methods, and ranked them by their classification performance across signature lengths. The signatures were simultaneously analysed, also within cross-validation, for functional enrichment, independence from known clinical covariates, significance under permutation tests, and analytical precision. Our proposed extended biomarker analysis gives considerably more insight into model properties that might otherwise be left unaccounted for, and is crucial for selecting a signature that also generalises well and passes external validation.
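One of the analyses listed above, the permutation test, can be illustrated with a minimal sketch. It asks whether a signature's cross-validated accuracy is better than what the same pipeline achieves when the class labels are randomly shuffled. This is not the authors' implementation: the nearest-centroid classifier, the fixed index-modulo-k fold assignment, and the function names (`cv_accuracy`, `permutation_test`) are hypothetical choices made only for illustration, written in plain Python so the sketch is self-contained.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cv_accuracy(X, y, k=5):
    """k-fold cross-validated accuracy (hypothetical: fold = index modulo k)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        # Fit on the training folds only, so the test fold stays unseen.
        centroids = nearest_centroid_fit([X[i] for i in train],
                                         [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i]
                       for i in test)
    return correct / len(X)


def permutation_test(X, y, n_perm=200, k=5, seed=0):
    """Compare observed CV accuracy with accuracies under shuffled labels.

    Returns (observed_accuracy, p_value); the +1 terms count the observed
    labelling as one of the permutations, so p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm, k) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two well-separated groups of samples with two features each.
    X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
    y = [0] * 10 + [1] * 10
    acc, p = permutation_test(X, y)
    print(f"CV accuracy = {acc:.2f}, permutation p-value = {p:.4f}")
```

On this toy data the observed accuracy is 1.0, and shuffled labellings almost never reach it, so the p-value is small. In a real study, feature selection would also need to be repeated inside each fold and for each permutation; otherwise the test is optimistically biased.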
Tags: Biomarker, Gene expression