How to input knowledge into machines (aka Machine Learning)?!

Learning Algorithms:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

(Mitchell, 1997)

So what are E, T, and P?

Task T:

* Classification
* Regression
* Transcription
* Machine translation
* Anomaly detection
* Synthesis and sampling
* Imputation of missing values
* Denoising

The Performance Measure P:

* Accuracy
* Precision, Recall
* Reconstruction error
* RMSE, MSE, MAE (computed in the sketch below)
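
A quick sketch of computing these measures with scikit-learn's standard sklearn.metrics functions (the label/target arrays below are made-up toy values):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             mean_absolute_error, mean_squared_error)

# toy classification labels
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)

# toy regression targets
y_true_r = [3.0, 5.0, 2.5]
y_pred_r = [2.5, 5.0, 4.0]
mse = mean_squared_error(y_true_r, y_pred_r)
print(mean_absolute_error(y_true_r, y_pred_r), mse, mse ** 0.5)  # MAE, MSE, RMSE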

The Experience E:

* Supervised learning: $p(y \mid x)$
* Unsupervised learning: $p(x)$
* Reinforcement learning (continuous learning)
* A thin line separates supervised from unsupervised learning: whether the labels are known ("arinthaal... arinthaal..." is Tamil for "if you know... if you know"); see the quick contrast in code below
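
To make that contrast concrete, a minimal sketch using standard scikit-learn estimators: the classifier is shown (x, y) pairs and models $p(y \mid x)$, while the clustering model never sees y and only looks for structure in x:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# supervised: learns a mapping x -> y from labelled examples, i.e. p(y|x)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# unsupervised: fits on x alone, grouping similar points, i.e. structure in p(x)
km = KMeans(n_clusters=3, n_init=10).fit(X)

print(clf.predict(X[:3]))  # predicted labels
print(km.labels_[:3])      # cluster assignments (no ground-truth labels used)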

Hint

Generally, the best practice is to construct the dataset/design matrix for your algorithm in such a way that your model is able to see all possible combinations of examples (data points).

What do you want from learning algorithms?

* Generalization

What are all the learning algorithms you know?

We will broadly classify the algorithms into four groups (for simplicity):

* Statistical models
* Linear models
* Tree-based models
* Neural networks

Drawbacks of Linear Models

* Scaling is required
* Normalization is required
* Encoding categorical variables is challenging
* Interpreting the results needs some additional work
* Implementing multi-class classification needs some extra thinking
* The rest are left as an assignment! (a preprocessing sketch follows below)
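
Taking the first three drawbacks together: in practice a linear model is usually wrapped in preprocessing steps. A minimal sketch with standard scikit-learn components (the numeric/categorical column split below is a made-up assumption for illustration):

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# assumption: columns 0-1 are numeric, column 2 encodes a categorical variable
pre = ColumnTransformer([
    ('scale', StandardScaler(), [0, 1]),  # drawbacks 1-2: scaling/normalization
    ('onehot', OneHotEncoder(), [2]),     # drawback 3: categorical encoding
])
pipe = Pipeline([('pre', pre), ('model', LogisticRegression())])

# toy data just to show the pipeline runs end to end
X_demo = np.array([[1.0, 10.0, 0], [2.0, 20.0, 1], [3.0, 30.0, 0], [4.0, 40.0, 1]])
y_demo = np.array([0, 0, 1, 1])
pipe.fit(X_demo, y_demo)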

DECISION TREE

[Figure: a decision tree]

Example of a decision tree:

https://en.akinator.com/game

[Figure: the decision tree idea]

Think about how to get predictions from a tree.

So which feature should we query, and which threshold should we choose? How do we write generalized code to find the best split? (A sketch follows the algorithm below.)

[Figure: node purity]

Consider the following example of splitting red from green:

Which one is the better split?

The Basic Algorithm

1 Start at the root node as the parent node
2 Split the parent node on the feature $x_i$ that minimizes the sum of the child node impurities (i.e., maximizes information gain)
3 Assign the training samples to the new child nodes
4 Stop if the leaf nodes are pure or an early-stopping criterion is satisfied; else repeat steps 2 and 3 with each new child node as the parent
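
To make step 2 concrete, here is a minimal sketch of the generalized best-split code asked about above: scan every feature and every candidate threshold, and keep the split with the lowest weighted sum of child impurities (here the Gini index, defined in the next section). This is illustrative only; scikit-learn's implementation is far more optimized.

import numpy as np

def gini(y):
    # Gini = 1 - sum_j p_j^2, with p_j the class proportions in y
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # exhaustive search over every feature x_i and every midpoint threshold
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    n = len(y)
    for i in range(X.shape[1]):
        values = np.unique(X[:, i])  # sorted unique values of feature i
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between neighbours
            left = X[:, i] <= t
            right = ~left
            # weighted child impurity; minimizing this maximizes information gain
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (i, t, score)
    return best

X_demo = np.array([[2.0], [3.0], [10.0], [11.0]])
y_demo = np.array([0, 0, 1, 1])
print(best_split(X_demo, y_demo))  # (0, 6.5, 0.0): a perfectly pure split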


Stopping Rules

1 The leaf nodes are pure
2 A maximum tree depth is reached
3 Splitting a node does not lead to an information gain

To decide how impurity should be measured (and hence reduced), the following criteria are commonly used:

* Gini index
* Entropy (the basis of information gain, as used in ID3)
* Misclassification error
* Chi-square
* Reduction in variance (for regression trees)

References:

https://medium.com/@rishabhjain_22692/decision-trees-it-begins-here-93ff54ef134

https://www.bogotobogo.com/python/scikit-learn/scikt_machine_learning_Decision_Tree_Learning_Informatioin_Gain_IG_Impurity_Entropy_Gini_Classification_Error.php
https://sebastianraschka.com/faq/docs/decisiontree-error-vs-entropy.html

Information gain
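
For a split of a parent node $D_p$ on feature $x_i$, the information gain is the impurity of the parent minus the weighted impurities of the child nodes:

$IG(D_p, x_i) = I(D_p) - \sum_j \frac{N_j}{N_p} I(D_j)$

where $I$ is an impurity measure (Gini index, entropy, ...), $N_p$ is the number of samples at the parent node, and $N_j$ is the number of samples in the $j$-th child node.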

Gini index:

The Gini index is a criterion that minimizes the probability of misclassification:

Gini $= 1 - \sum_j p_j^2$

where $p_j$ is the probability of class $j$.
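
For example, a 50/50 split between two classes gives Gini $= 1 - (0.5^2 + 0.5^2) = 0.5$ (maximally impure for two classes), while a pure node gives Gini $= 1 - 1^2 = 0$.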

More generally, the Gini index can be written as Gini $= \sum_j p_j(1 - p_j)$, which expands to the form above.

Entropy:

* A way to measure impurity
* Entropy measures the homogeneity of a sample: if the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between two classes the entropy is one

Entropy = $-\sum_jp_j\log_2p_j$

Example:

A pure node ($p_1 = 1$): entropy $= -1 \log_2 1 = 0$

A 50/50 split: entropy $= -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1$
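
The same numbers computed in Python, as a small helper sketch (these functions take class probabilities; scikit-learn computes the equivalents internally while fitting):

import numpy as np

def entropy(probs):
    # entropy = -sum_j p_j * log2(p_j); zero-probability terms contribute nothing
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini_index(probs):
    # Gini = 1 - sum_j p_j^2
    p = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(entropy([1.0]))          # zero (pure node; numpy may print it as -0.0)
print(entropy([0.5, 0.5]))     # 1.0 (maximally impure for two classes)
print(gini_index([0.5, 0.5]))  # 0.5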

In [1]:
# load the diabetes data: a regression task with 10 numeric features
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
In [2]:
# hold out a test set (train_test_split defaults to 75% train / 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train.shape, y_train.shape
Out[2]:
((331, 10), (331,))
In [3]:
import matplotlib.pyplot as plt
import numpy as np

# eyeball the raw target values
plt.plot(y)
plt.show()
In [4]:
# fit a regression tree with default settings (grows until every leaf is pure)
dtr = DecisionTreeRegressor()
dtr.fit(X_train, y_train)
Out[4]:
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')
In [5]:
pred = dtr.predict(X_test)

# compare predictions against the held-out targets
plt.plot(pred, label='prediction')
plt.plot(y_test, label='Actual')
plt.legend()
plt.show()
In [6]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = mse ** 0.5  # RMSE is the square root of MSE
plt.bar(['mae', 'mse', 'rmse'], [mae, mse, rmse])
plt.title('mae={},mse={},rmse={}'.format(mae, mse, rmse))
plt.show()
In [1]:
!pip install pydotplus
Requirement already satisfied: pydotplus in c:\users\gurunath.lv\appdata\local\continuum\anaconda3\lib\site-packages (2.0.2)
Requirement already satisfied: pyparsing>=2.0.1 in c:\users\gurunath.lv\appdata\local\continuum\anaconda3\lib\site-packages (from pydotplus) (2.2.0)
In [11]:
# treeinterpreter decomposes each prediction into a bias term plus per-feature contributions
from treeinterpreter import treeinterpreter as ti

prediction, bias, contributions = ti.predict(dtr, X_test)
In [14]:
import pydotplus
from sklearn.tree import export_graphviz
from IPython.display import Image, HTML, SVG
from io import StringIO

# export the fitted tree in Graphviz dot format, then build a renderable graph
dot_data = StringIO()
export_graphviz(dtr, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
#Image(graph.create_png())
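
The next cell calls conda_fix, a helper that is not defined in this notebook. It is a common workaround for pydotplus failing to find the Graphviz executables under Anaconda on Windows; a minimal sketch of what it typically looks like (the install-path layout is an assumption about an Anaconda setup):

import os
import sys

def conda_fix(graph):
    # assumption: Graphviz binaries ship under <anaconda>/Library/bin/graphviz on Windows
    path = os.path.join(sys.base_exec_prefix, 'Library', 'bin', 'graphviz')
    exes = ('dot', 'twopi', 'neato', 'circo', 'fdp')
    # point pydotplus at each Graphviz executable explicitly
    graph.set_graphviz_executables({p: os.path.join(path, p + '.exe') for p in exes})
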
In [20]:
conda_fix(graph)
Image(graph.create_jpg())
Out[20]:
[Image: rendered decision tree]
Advantages of Tree-based Models:

* Categorical features are easy to handle
* Interpreting the results is easy
* No scaling/normalization is required
* The rest are left as an assignment!