What is the problem with the simple calculation of class probability? How can it be solved?

Question

Answer 1

Some of the leaves in a tree might be pure, or might contain too few training examples to deliver reliable estimates. The problem with pure leaves is that the simple relative frequency assigns a probability of 100% to that class, no matter how few examples the leaf contains.

Solution-> Laplace correction

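(n+1)/(n+m+2)

where n is the number of examples in the leaf that belong to the class and m is the number that do not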

-> this affects small sets more than very large sets

E.g.

(1+1)/(1+2+2) = 0.4 [small set; uncorrected: 1/3 = 0.333]

(20+1)/(20+40+2) = 0.339 [large set; uncorrected: 20/60 = 0.333]
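
A minimal Python sketch of the correction, assuming n counts the examples in a leaf that belong to the class and m counts the remaining examples in that leaf. The function names are illustrative and not taken from any library.

```python
def raw_probability(n: int, m: int) -> float:
    """Simple relative frequency of the class in a leaf."""
    return n / (n + m)


def laplace_probability(n: int, m: int) -> float:
    """Laplace-corrected class probability for the two-class case."""
    return (n + 1) / (n + m + 2)


# Compare both estimates for a small leaf, a large leaf, and a pure leaf.
for n, m in [(1, 2), (20, 40), (3, 0)]:
    print(f"n={n}, m={m}: raw={raw_probability(n, m):.3f}, "
          f"laplace={laplace_probability(n, m):.3f}")
```

For the pure leaf with 3 examples, the corrected estimate is 4/5 = 0.8 instead of 1.0, while the estimate for the large leaf barely moves.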
