|
Article details |
【Title】 |
Learning random forests for ranking |
【Journal】 |
Frontiers of Computer Science in China |
【Journal abbreviation】 |
Front. Comput. Sci. China |
【ISSN】 |
1673-7350 |
【EISSN】 |
1673-7466 |
【DOI】 |
10.1007/s11704-010-0388-5 |
【Publisher】 |
Higher Education Press and Springer-Verlag Berlin Heidelberg |
【Publication year】 |
2011 |
【Volume/Issue】 |
Volume 5, Issue 1 |
【Pages】 |
79-86 (8 pages) |
【Author】 |
Liangxiao JIANG |
【Keywords】 |
random forests (RF); decision tree; random selection; class probability estimation; ranking; the area under the receiver operating characteristics curve (AUC) |
【Abstract】 |
The random forests (RF) algorithm, which combines the predictions from an ensemble of random trees, has achieved significant improvements in terms of classification accuracy. In many real-world applications, however, ranking is often required in order to make optimal decisions. Thus, we focus our attention on the ranking performance of RF in this paper. Our experimental results, based on all 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main website of the Weka platform, show that RF does not perform well in ranking and is only about as good as a single C4.4 tree. This fact raises the question of whether improvements to RF can raise its ranking performance. To answer this question, we single out an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, IRF uses the average gain measure and the similarity-weighted estimate. Our experiments show that IRF significantly outperforms all the other compared algorithms in terms of ranking while maintaining the high classification accuracy that characterizes RF. |
|
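The abstract measures ranking performance by AUC, computed from class probability estimates rather than from predicted labels. As a minimal sketch of that metric only (not the paper's code; the function name `auc` and the toy scores are illustrative assumptions), AUC for a two-class problem can be computed as the fraction of positive/negative pairs that the scores order correctly:

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs in which the
    positive instance gets the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical class probability estimates for four instances.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

Because only the ordering of the scores matters, two models with the same classification accuracy can have different AUC values, which is why ranking is evaluated separately from accuracy.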