Given a discrete random variable $X$ with possible outcomes $x_{1},x_{2},\ldots,x_{n}$, which occur with probabilities $P(x_{1}),P(x_{2}),\ldots,P(x_{n})$, the entropy of $X$ is formally defined as $H(X) = -\sum_{i=1}^{n} P(x_{i})\log_{2}P(x_{i})$.
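The following is a minimal sketch of this definition in Python, estimating the probabilities $P(x_{i})$ from empirical frequencies; the function name and the Counter-based counting are illustrative choices, not part of the original text.

from collections import Counter
from math import log2

def entropy(values):
    # H(X) = -sum_i P(x_i) * log2 P(x_i), with P(x_i) estimated as the
    # relative frequency of each distinct value in `values`.
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Example: a class attribute with 9 positive and 5 negative examples
print(entropy(["+"] * 9 + ["-"] * 5))  # ~0.940 bits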
B. ID3 algorithm
Calculate the entropy of every attribute $\alpha$ of the data set $S$.
Partition (“split”) the set $S$ into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, for which the information gain is maximized (see the sketch after these steps).
Make a decision tree node containing that attribute.
Recurse on subsets using the remaining attributes.
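The sketch below illustrates the attribute-selection step: it computes the information gain of each attribute and picks the one that maximizes it (equivalently, minimizes the entropy remaining after the split). It reuses the entropy helper from the previous sketch and assumes, purely for illustration, that examples are represented as dicts and that `target` names the class attribute.

def information_gain(examples, attribute, target):
    # Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy([e[target] for e in subset])
    return base - remainder

def best_attribute(examples, attributes, target):
    # The attribute whose split maximizes information gain.
    return max(attributes, key=lambda a: information_gain(examples, a, target))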
ID3(Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all examples are positive, return the single-node tree Root, with label = +.
    If all examples are negative, return the single-node tree Root, with label = -.
    If the number of predicting attributes is empty, then return the single-node tree Root, with label = most common value of the target attribute in the examples.
    Otherwise Begin
        A ← the attribute that best classifies examples.
        Decision tree attribute for Root = A.
        For each possible value, vi, of A,
            Add a new tree branch below Root, corresponding to the test A = vi.
            Let Examples(vi) be the subset of examples that have the value vi for A.
            If Examples(vi) is empty
                Then below this new branch add a leaf node with label = most common target value in the examples
                Else below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes – {A})
    End
    Return Root
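A hedged Python rendering of the pseudocode above, building on the entropy and best_attribute sketches; the Node class and dict-based example representation are illustrative assumptions. The two positive/negative base cases are folded into a single "all examples share one class" check.

from collections import Counter

class Node:
    def __init__(self, label=None, attribute=None):
        self.label = label          # class label, set only for leaf nodes
        self.attribute = attribute  # splitting attribute, set for internal nodes
        self.children = {}          # attribute value -> subtree

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    # All examples share one class: return a single-node tree with that label.
    if len(set(labels)) == 1:
        return Node(label=labels[0])
    # No predicting attributes left: leaf labelled with the most common class.
    if not attributes:
        return Node(label=Counter(labels).most_common(1)[0][0])
    # Choose the attribute that best classifies the examples.
    a = best_attribute(examples, attributes, target)
    root = Node(attribute=a)
    # Branch on the values of A observed in the examples; the pseudocode's
    # empty-branch case (leaf with the most common label) cannot arise here
    # because every branched-on value has at least one matching example.
    for v in {e[a] for e in examples}:
        subset = [e for e in examples if e[a] == v]
        root.children[v] = id3(subset, target, [x for x in attributes if x != a])
    return root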