Classification across gene expression microarray studies
| Item Type |
Journal Article |
| Author |
Andreas Buness |
| Author |
Markus Ruschhaupt |
| Author |
Ruprecht Kuner |
| Author |
Achim Tresch |
| URL |
http://www.biomedcentral.com/1471-2105/10/453
|
| Volume |
10 |
| Issue |
1 |
| Pages |
453 |
| Publication |
BMC Bioinformatics |
| ISSN |
1471-2105 |
| Date |
2009 |
| DOI |
10.1186/1471-2105-10-453 |
| Accessed |
2010-01-01 14:08:47 |
| Library Catalog |
BioMed Central and More |
| Abstract |
BACKGROUND: The increasing number of gene expression microarray studies represents an important resource in
biomedical research. As a result, gene expression based diagnosis has entered clinical practice for patient
stratification in breast cancer. However, the integration and combined analysis of microarray studies remains
a challenge. We assessed the potential benefit of data integration on the classification accuracy and
systematically evaluated the generalization performance of selected methods on four breast cancer studies
comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims
to establish good statistical practice and a graphical way to monitor differences. The classification goal was to
correctly predict estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor
sample in an independent study which was not used for the training. For the classification we chose support
vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs
(kTSP). Guided by considerations relevant to classification across studies, we developed a generalization of
kTSP, which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic
invariance of kTSP with respect to technologies and preprocessing.
RESULTS: For each individual study, the generalization error was benchmarked via complete cross-validation and
was found to be similar for all classification methods. The misclassification rates were substantially higher in
classification across studies, when each single study was used as an independent test set while all remaining
studies were combined for the training of the classifier. However, with an increasing number of independent microarray studies used in the training, the overall classification performance improved. DV performed better
than the average and showed slightly less variance. In particular, the better predictive results of DV in
cross-platform classification indicate higher robustness of the classifier when trained on single-channel data and
applied to gene expression ratios.
CONCLUSIONS: We present a systematic evaluation of strategies for the integration of independent microarray
studies in a classification task. Our findings on classification across studies may guide further research aimed at
the construction of more robust and reliable methods for stratification and diagnosis in clinical practice. |
| Title |
Classification across gene expression microarray studies |
| Date Added |
2009-01-01 09:08 |
| Date Modified |
2009-01-01 09:08 |
Notes and Attachments
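Note: the evaluation scheme described in the abstract (each study held out in turn as an independent test set while all remaining studies are combined for training) can be illustrated with a minimal sketch. This is not the authors' code or their DV method. It assumes a single top scoring pair (kTSP with k = 1) as the classifier, and the function names (`fit_tsp`, `predict_tsp`, `leave_one_study_out`), the study labels and the toy expression values are all hypothetical. The toy studies deliberately use different measurement scales, since a pair-ordering rule depends only on ranks within a sample.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Pick the gene pair (i, j) whose ordering x[i] < x[j] best separates classes 0 and 1.

    Assumes both classes are present in the training data.
    """
    n_genes = len(X[0])
    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            p[label] = sum(x[i] < x[j] for x in rows) / len(rows)
        score = abs(p[1] - p[0])
        if best is None or score > best[0]:
            # Remember whether "x[i] < x[j]" is the ordering typical of class 1.
            best = (score, i, j, p[1] > p[0])
    _, i, j, less_means_one = best
    return i, j, less_means_one


def predict_tsp(model, X):
    """Predict class 1 when the sample shows the ordering typical of class 1."""
    i, j, less_means_one = model
    return [int((x[i] < x[j]) == less_means_one) for x in X]


def leave_one_study_out(studies):
    """studies: dict name -> (X, y). Yields (held-out name, misclassification rate)."""
    for held_out in studies:
        train_X, train_y = [], []
        for name, (X, y) in studies.items():
            if name != held_out:
                train_X.extend(X)
                train_y.extend(y)
        model = fit_tsp(train_X, train_y)
        test_X, test_y = studies[held_out]
        preds = predict_tsp(model, test_X)
        errors = sum(p != t for p, t in zip(preds, test_y))
        yield held_out, errors / len(test_y)


# Hypothetical toy data: three "studies" on different scales, three genes each.
studies = {
    "A": ([[1, 2, 5], [2, 1, 5], [1, 3, 4], [3, 1, 4]], [1, 0, 1, 0]),
    "B": ([[10, 20, 15], [20, 10, 15]], [1, 0]),
    "C": ([[0.1, 0.2, 0.3], [0.2, 0.1, 0.3]], [1, 0]),
}
results = dict(leave_one_study_out(studies))
```

Here `results` maps each held-out study to its misclassification rate. A real analysis would need many more genes and samples, gene identifiers matched across platforms, and feature selection kept inside the training step.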