Data Science

What is the problem with the simple calculation of class probability? How to solve it?

Some of the leaves in a tree might be pure, or might contain too few training instances to deliver reliable estimates. The problem with a pure leaf is that the simple frequency-based calculation yields a probability of 100% of belonging to that class, even if the leaf holds only a handful of instances.
 
Solution: Laplace correction
 
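p = (n+1)/(n+m+2), where n is the number of instances in the leaf that belong to the class and m is the number that do not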
 
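The correction affects small sets more than very large sets: with few instances the estimate is pulled noticeably towards 0.5, while with many instances it stays close to the raw frequency n/(n+m).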
 
E.g.
 
(1+1)/(1+2+2) = 0.4 [small set; raw frequency 1/3 = 0.333]
 
(20+1)/(20+40+2) = 0.339 [large set; raw frequency 20/60 = 0.333]
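
The calculation can be sketched in a few lines of Python. This is only a minimal illustration for the two-class case. The function names raw_frequency and laplace_corrected are made up for this example, and n and m are assumed to be the counts of instances in a leaf that do and do not belong to the class.

```python
def raw_frequency(n: int, m: int) -> float:
    """Simple estimate: share of the n + m instances in the leaf that belong to the class."""
    return n / (n + m)


def laplace_corrected(n: int, m: int) -> float:
    """Laplace-corrected estimate for two classes: (n + 1) / (n + m + 2)."""
    return (n + 1) / (n + m + 2)


# (n, m) pairs: the small and large sets from the example, plus a small pure leaf.
for n, m in [(1, 2), (20, 40), (3, 0)]:
    print(f"n={n}, m={m}: raw={raw_frequency(n, m):.3f}, "
          f"corrected={laplace_corrected(n, m):.3f}")
```

For the pure leaf with n=3 and m=0, the raw frequency is 1.0, while the corrected estimate is (3+1)/(3+0+2) = 0.8, so a tiny pure leaf no longer claims certainty.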
