Data Science

What is entropy?

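Entropy measures how impure (mixed) a set of attribute values is. For a set with two possible values and the base-2 logarithm, entropy ranges between 0 and 1; with k possible values the maximum is log2(k). It is calculated like this: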
 
p(i) = probability of value i occurring in the dataset
 
entropy = - p(i)*log2(p(i)) - p(j)*log2(p(j)) - ...
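
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `entropy` is made up for this example, and it assumes the base-2 logarithm and that p(i) is estimated as the relative frequency of value i in the set.

```python
import math
from collections import Counter

def entropy(values):
    """Entropy in bits of a sequence of attribute values (hypothetical helper)."""
    total = len(values)
    if total == 0:
        return 0.0  # assumption: an empty set is treated as having zero entropy
    counts = Counter(values)
    # Add up -p(i) * log2(p(i)) over every distinct value i,
    # where p(i) is the relative frequency of i in the set.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

mixed = entropy(["yes", "yes", "no", "no"])   # two values, 50/50
pure = entropy(["yes", "yes", "yes", "yes"])  # only one value
four_way = entropy(["a", "b", "c", "d"])      # four equally likely values
```

Here `mixed` is 1.0, `pure` is 0.0, and `four_way` is 2.0, which shows that the upper bound of 1 only holds when there are two possible values.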
 
High entropy means the data set consists of a mix of many different attribute values.
Low entropy means the data set consists (almost) purely of a single attribute value.
 
Entropy is highest when all attribute values in the set are equally probable (e.g. 50/50 for two values).
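
To see the 50/50 claim numerically, the following sketch evaluates the entropy of a two-value set for several splits and picks the split with the largest entropy. The name `binary_entropy` and the list of example splits are assumptions made for this illustration; the base-2 logarithm is assumed as well.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-value set where one value has probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure set; p*log2(p) is taken to be 0 when p is 0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

# Example splits, from a pure set (0.0) through 50/50 (0.5) to pure again (1.0).
splits = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
entropies = {p: binary_entropy(p) for p in splits}

# The split with the highest entropy among the examples.
most_mixed = max(entropies, key=entropies.get)
```

Among these splits `most_mixed` is 0.5, where the entropy is exactly 1.0, and the entropy falls toward 0 as the split approaches a pure set on either side.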
