Can a "splitting attribute" appear many times in a decision tree?

Solution 1

Within a single branch, it does not make sense to repeat exactly the same decision: every instance that reaches that point has already passed the test, so the outcome is known.

On different branches, this reasoning does not hold.

Consider the classic XOR(x,y) problem. You can solve it with a two-level decision tree, but you will need to split on the same attribute (y) in both branches.

If x is true:
    If y is true:  return false
    If y is false: return true
If x is false:
    If y is true:  return true
    If y is false: return false
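
As a rough sketch, the tree above can be written as a small Python function and checked against the XOR truth table. The function name xor_tree is only illustrative; it is not taken from any library.

```python
from itertools import product


def xor_tree(x: bool, y: bool) -> bool:
    """Two-level decision tree for XOR: the root tests x, both subtrees test y."""
    if x:
        # Branch x = true: split on y.
        if y:
            return False
        return True
    # Branch x = false: split on y again, with the leaf labels swapped.
    if y:
        return True
    return False


# Compare the tree against Python's own XOR on booleans (x != y).
truth_table = {(x, y): xor_tree(x, y) for x, y in product([True, False], repeat=2)}
matches_xor = all(result == (x != y) for (x, y), result in truth_table.items())
```

The attribute y is tested in both subtrees, but only once on any path from the root to a leaf.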

Another example: assume your data is positive for x in the interval [0, 1] and negative outside it. A good tree would be the following:

If x > 1:      return negative
If x <= 1:
    If x >= 0: return positive
    If x < 0:  return negative

The two tests on x use different thresholds, so they are not the same decision, and it can make sense to use x twice, even within one branch.
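
A minimal Python sketch of this interval tree, assuming x is a plain float and using a made-up function name classify:

```python
def classify(x: float) -> str:
    """Label x as positive inside [0, 1] and negative outside."""
    if x > 1:
        # First test on x: everything above the upper bound is negative.
        return "negative"
    # Same branch (x <= 1), second test on x with a different threshold.
    if x >= 0:
        return "positive"
    return "negative"


labels = [classify(v) for v in (-0.5, 0.0, 0.5, 1.0, 1.5)]
```

The path that ends in "positive" tests x twice, once against 1 and once against 0.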

Solution 2

  1. In general, you can do whatever you want, as long as you keep the structure of a "tree". Trees can be customized in many ways, and while there can be redundancy, it doesn't undermine the tree's validity.

  2. Binary attributes shouldn't appear twice in the same branch; that would be redundant. However, continuous attributes can appear in the same branch several times.

Solution 3

If the attribute is categorical (and each split creates one child per value), it cannot be used as the split attribute more than once within a branch. If the attribute is numerical, it can in principle be used many times, each time with a different threshold, although not every decision tree implementation does this.

The following description is based on the assumption that the attributes are all categorical.

From the explanation perspective, a decision tree is explainable: the label assigned to an instance can be explained by the attributes (and their values) tested on the path from the root to the leaf. Therefore, it does not make sense to have duplicate attributes in one branch of the tree.

From the algorithm perspective, once an attribute has been selected as the split attribute, it has no chance of being selected again further down that branch under the usual attribute selection criteria; for example, its information gain would be 0. This is because all the instances that remain after being filtered by the attribute have the same value for it. Using the attribute again cannot bring more information for classification.
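
To make the zero-gain argument concrete, here is a small sketch that computes information gain by hand on a toy data set. The data, the attribute names (outlook, windy) and the helper functions are all made up for illustration; this is not the code of any particular decision tree library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label="label"):
    """Entropy before the split minus the weighted entropy after it."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data with two categorical attributes.
rows = [
    {"outlook": "sunny", "windy": "yes", "label": "no"},
    {"outlook": "sunny", "windy": "no", "label": "yes"},
    {"outlook": "rainy", "windy": "yes", "label": "no"},
    {"outlook": "rainy", "windy": "no", "label": "no"},
]

# Instances that reach the "outlook = sunny" child after splitting on outlook.
sunny = [r for r in rows if r["outlook"] == "sunny"]

gain_outlook_again = information_gain(sunny, "outlook")  # every row has the same value
gain_windy = information_gain(sunny, "windy")            # still separates the classes
```

Inside the sunny child, splitting on outlook again yields a single group identical to the parent, so its gain is zero, while windy still has positive gain.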

Author: yvetterowe

Updated on June 26, 2022

Comments

  • yvetterowe (almost 2 years ago)

    Just want to clarify one thing: the same attribute can appear in a decision tree many times, as long as the occurrences are in different "branches", right?