Building a decision tree is a search problem: we search the space of all possible decision trees for a given dataset for the tree that best fits the data. The "usual" tree-learning algorithms can be viewed as performing a greedy search with one-step look-ahead: at each step they choose the split that is best according to some criterion (e.g. covering the maximum number of positive examples, maximally reducing the entropy, etc.). "Nearly all decision tree induction algorithms create a single decision tree based upon local information of how well a feature partitions the training data" (Murphy and Pazzani, 1994).
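To make the greedy step concrete, here is a minimal sketch of choosing a split by entropy reduction (information gain). It is only an illustration, not Weka code: the class name GreedySplit, the int[][] encoding of nominal attribute values and the int[] class labels are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: picks the attribute with the highest information gain. */
public final class GreedySplit {

    /** Shannon entropy, in bits, of a list of class labels. */
    static double entropy(List<Integer> labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int y : labels) {
            counts.merge(y, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Entropy of the whole set minus the weighted entropy after splitting on attr. */
    static double informationGain(int[][] rows, int[] labels, int attr) {
        List<Integer> all = new ArrayList<>();
        Map<Integer, List<Integer>> parts = new HashMap<>();
        for (int i = 0; i < rows.length; i++) {
            all.add(labels[i]);
            // Group the class labels by the value this row takes on attr.
            parts.computeIfAbsent(rows[i][attr], k -> new ArrayList<>()).add(labels[i]);
        }
        double remainder = 0.0;
        for (List<Integer> part : parts.values()) {
            remainder += (double) part.size() / rows.length * entropy(part);
        }
        return entropy(all) - remainder;
    }

    /** The greedy choice: the single attribute with the largest gain. */
    static int bestAttribute(int[][] rows, int[] labels) {
        int best = -1;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < rows[0].length; a++) {
            double gain = informationGain(rows, labels, a);
            if (gain > bestGain) {
                bestGain = gain;
                best = a;
            }
        }
        return best;
    }
}
```

A greedy learner would call bestAttribute once per node, split the data on the result and recurse, never revisiting an earlier choice. That locality is what the best-first approach requested here is meant to relax.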
The purpose of this project is to code a best-first search algorithm in the space of possible decision trees. This approach has a run time that is exponential in the number of attributes (data object descriptors), so special care will have to be taken with memory management. It will have to be integrated into the open-source Weka platform. Please also use Javadoc.
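For bidders who want a starting point, below is a rough sketch of a generic best-first search with a capped frontier. It is not the Tree-List algorithm itself, which this posting does not spell out. The class name BestFirstSearch, the parameters maxExpansions and maxFrontier, and the trimming strategy are all assumptions. For decision trees, a state would presumably be a partial tree, expand would return the trees obtained by splitting one leaf, and score would be some measure of fit; the toy demo uses integers instead so that it runs on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/** Illustrative only: best-first search that keeps at most maxFrontier open states. */
public final class BestFirstSearch {

    /**
     * Repeatedly expands the highest-scoring open state and returns the
     * best state seen. Stops after maxExpansions or when nothing is left.
     */
    static <S> S search(S start,
                        Function<S, List<S>> expand,
                        ToDoubleFunction<S> score,
                        int maxExpansions,
                        int maxFrontier) {
        Comparator<S> byScore = Comparator.comparingDouble(score);
        Comparator<S> bestFirst = byScore.reversed();
        PriorityQueue<S> frontier = new PriorityQueue<>(bestFirst);
        // Note: this set still grows with every generated state; a real
        // implementation would need to bound or drop it as well.
        Set<S> seen = new HashSet<>();
        frontier.add(start);
        seen.add(start);
        S best = start;
        int expansions = 0;
        while (!frontier.isEmpty() && expansions < maxExpansions) {
            S current = frontier.poll();
            expansions++;
            if (score.applyAsDouble(current) > score.applyAsDouble(best)) {
                best = current;
            }
            for (S next : expand.apply(current)) {
                if (seen.add(next)) {
                    frontier.add(next);
                }
            }
            // Memory guard: keep only the maxFrontier most promising states.
            if (frontier.size() > maxFrontier) {
                PriorityQueue<S> trimmed = new PriorityQueue<>(bestFirst);
                for (int i = 0; i < maxFrontier; i++) {
                    trimmed.add(frontier.poll());
                }
                frontier = trimmed;
            }
        }
        return best;
    }

    /** Toy demo: from 1, apply "+1" or "*2" to get as close to 37 as possible. */
    static int demo() {
        return search(1,
                n -> Arrays.asList(n + 1, n * 2),
                n -> -Math.abs(n - 37),
                50,
                10);
    }
}
```

Trimming the frontier trades completeness for memory: a discarded state is never reconsidered, so the result is no longer guaranteed to be the global optimum. Whether that trade-off is acceptable for this project is for the poster to decide.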
Desired software:
A new version of Weka that offers the Tree-List algorithm among the list of tree classifiers. All other Weka tools must function normally; for example, I should be able to use Weka's cross-validation tool with the Tree-List algorithm.
Requirements: A good knowledge of Java and some understanding of tree-based data-mining algorithms. If you read my algorithm but can't fathom the purpose of the corresponding code, please bid for another project.
If you wonder what a decision tree is: [login to view URL]
Weka is very well documented: [login to view URL]~ml/weka/[login to view URL]
Many classes for Weka are already implemented: [login to view URL]
Hi, I can help you implement this tree algorithm. I have experience with similar work, such as implementing Dijkstra's algorithm for graphs, as well as depth-first search and breadth-first search. Waiting for your reply in PMB. Thanks.