When fitting a scikit-learn DecisionTreeClassifier on my data, I am observing some weird behavior. x[54] (a boolean feature) is used to break the 19 samples into 2 and 17 at the top-left node. Then the same feature, with the exact same condition, appears again in its True branch, this time with True and False branches leading to leaf nodes. I am using gini as the split criterion.

My question: since we are in the True branch, how can the same boolean feature generate non-zero entropy or impurity at all? After all, the new set can only contain 0s for that feature, so there should not be any possibility of a split on it. What am I missing?
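For reference, here is a minimal sketch (with made-up data, not my real dataset) of the kind of setup I mean: 19 samples, a boolean column at index 54, and export_text used to print each split condition so the repeated feature/threshold can be seen along a path.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: 19 samples, 60 columns; feature 54 is boolean
# and (in this toy version) perfectly predicts the label.
X = np.zeros((19, 60))
X[:2, 54] = 1.0               # 2 samples with x[54] == 1, 17 with x[54] == 0
y = X[:, 54].astype(int)

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Boolean inputs are treated as numeric, so the split prints as
# "feature_54 <= 0.50"; export_text makes it easy to check whether the
# same feature and threshold really repeat along one branch.
print(export_text(clf))
```

In this toy case the split on feature 54 is perfect, so the tree has a single internal node; the point is just the inspection pattern with export_text.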