# Machine Learning - Model Evaluation

Bootcamp AI — Session 3

Author: Miguel Calle | Slides: Roberto Sanchez

In this article, we explain the basic definitions and give some tips on using different kinds of machine learning algorithms.

Several models to choose from in machine learning:

• Logistic Regression
• Naive Bayes Classifiers
• Support Vector Machines (SVMs)
• Decision Trees
• Random Forest
• Kernel Methods
• Genetic Algorithms
• Neural Networks

Now let’s explore the most popular algorithms used for data analysis in machine learning.

Decision Tree

The idea of this algorithm is to choose, at each step, the best split among all features and all possible split points. The models created with this algorithm have the structure of a tree, similar to a flowchart.
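The split search described above can be sketched as an exhaustive loop over every feature and every candidate threshold. The article does not say how the "best" split is measured; this sketch assumes Gini impurity (a common choice, and scikit-learn's default `criterion`), and the helper names `gini` and `best_split` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Try every feature and every threshold; return (feature, threshold, impurity)
    for the split with the lowest weighted Gini impurity (assumed criterion)."""
    best = (None, None, float("inf"))
    n = len(y)
    for f in range(len(X[0])):
        # Candidate thresholds: each distinct value observed for feature f
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical toy data: feature 0 separates the classes, feature 1 is constant
X = [[2.0, 1.0], [3.0, 1.0], [10.0, 1.0], [11.0, 1.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # splits feature 0 at 3.0 with impurity 0.0
```

The constant feature never yields a valid split, so the search correctly ignores it.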

Decision Tree

max_depth = 2

….

Maximum depth refers to the length of the longest path from the root to a leaf.
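To see how `max_depth` caps that path length, here is a minimal recursive tree builder, assuming Gini impurity as the split criterion. The dictionary-based node layout and the names `build_tree` and `tree_depth` are hypothetical and only meant for illustration.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, max_depth, depth=0):
    """Grow a tree recursively; stop when max_depth is reached or the node is pure."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left]) +
                     len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split (all rows identical)
        return {"leaf": majority}
    _, f, t, left, right = best
    return {
        "feature": f, "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1),
    }

def tree_depth(node):
    """Number of edges on the longest path from this node down to a leaf."""
    if "leaf" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Hypothetical data with alternating labels: a pure tree would need to be deep
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
print(tree_depth(build_tree(X, y, max_depth=2)))   # never more than 2
print(tree_depth(build_tree(X, y, max_depth=10)))  # grows deeper to fit the data
```

A small `max_depth` keeps the tree simple at the cost of leaving some leaves impure.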

Logistic Regression

We use this algorithm on data where the dependent variable (target) is categorical. A clear example of logistic regression is deciding whether an email is spam or not.
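As a rough illustration of the spam example, the sketch below trains a binary logistic regression with plain batch gradient descent on the log-loss. The single feature ("number of suspicious words"), the data, the learning rate and the epoch count are all hypothetical; real solvers such as `liblinear` use different optimization methods.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the log-loss (a simple stand-in for a real solver)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1 (spam)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical feature: number of suspicious words in the email; 1 = spam
X = [[0], [1], [0], [5], [6], [7]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [6]) > 0.5)  # classified as spam
```

The output is a probability, so a threshold (0.5 here, by assumption) turns it into a categorical label.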

Logistic Regression

solver = “liblinear”

multi_class = “ovr” (binary)

Logistic regression can also be used to solve multi-class classification problems. In the multi-class case with multi_class = “ovr”, the training algorithm uses the one-vs-rest (OvR) scheme: one binary classifier is trained for each class.
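The one-vs-rest scheme can be sketched as follows: for each class, relabel the data as "this class" (1) versus "all other classes" (0), train a binary model, and at prediction time pick the class whose model reports the highest probability. The minimal gradient-descent trainer, the three-cluster data and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(X, y, lr=0.1, epochs=2000):
    """Minimal binary logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [a - lr * g / len(X) for a, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def train_ovr(X, y):
    """One binary model per class: that class = 1, every other class = 0."""
    return {k: train_binary(X, [1 if label == k else 0 for label in y])
            for k in sorted(set(y))}

def predict_ovr(models, x):
    """Pick the class whose binary model gives the highest probability."""
    def score(k):
        w, b = models[k]
        return sigmoid(sum(a * c for a, c in zip(w, x)) + b)
    return max(models, key=score)

# Hypothetical three-class data: three clusters in 2D
X = [[0, 0], [1, 0], [0, 1], [5, 0], [6, 0], [5, 1], [0, 5], [0, 6], [1, 5]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
models = train_ovr(X, y)
print(predict_ovr(models, [5.5, 0.5]))  # expected to land in cluster "b"
```

With three classes, OvR trains three binary models, which is why the scheme scales linearly with the number of classes.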

SVM (Gaussian kernel)

A Support Vector Machine (SVM) is a supervised model that can be used for both classification and regression problems, and it can solve both linear and non-linear problems. The idea of SVM is that the algorithm creates a line or a hyperplane that separates the data into classes.
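Once a linear SVM has found its hyperplane w·x + b = 0, classifying a point only means checking which side of the hyperplane it falls on. In this sketch the weights `w` and bias `b` are hypothetical values chosen by hand, not the result of training.

```python
def svm_decide(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w·x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0
print(svm_decide(w, b, [3.0, 3.0]))  # +1: above the line
print(svm_decide(w, b, [0.0, 0.0]))  # -1: below the line
```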

SVM

kernel = “poly”

C = 1 (penalty parameter of the error term)

Thus, SVM tries to draw the decision boundary so that the separation between the two classes (the “street”) is as wide as possible. The kernel is chosen depending on the type of data and on whether the data is linearly separable: “linear” for linearly separable data, or a non-linear kernel such as “poly” or “rbf” (the Gaussian kernel) otherwise.
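Two ideas from the paragraph above can be written down directly. For a linear SVM, the width of the "street" between the planes w·x + b = -1 and w·x + b = +1 is 2/‖w‖, so maximizing the street means minimizing ‖w‖. The kernels are similarity functions between two points; the sketch uses their standard forms, and the `gamma`, `coef0` and `degree` values are hypothetical defaults chosen for illustration.

```python
import math

def margin_width(w):
    """Width of the street between w·x + b = -1 and w·x + b = +1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    # (gamma * <x, z> + coef0) ** degree
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian kernel: exp(-gamma * ||x - z||^2), equal to 1 when x == z
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

print(margin_width([1.0, 1.0]))          # about 1.414
print(poly_kernel([1, 2], [3, 4]))       # (11 + 1) ** 3 = 1728
print(rbf_kernel([0, 0], [1, 1]))        # exp(-1)
```

A smaller C tolerates more margin violations in exchange for a wider street, while a larger C penalizes errors more heavily.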

Neural Network

A neural network is a computational model designed to imitate the functionality of a biological neural network, with the aim of learning and solving problems.

Neural Network

hidden_layer = 2

activation = “identity”

A hidden layer is a layer between the input layer and the output layer. The activation function returns a neuron’s output from its input value, usually within a given range such as (0,1) for the logistic function or (-1,1) for tanh. The “identity” activation is an exception: it returns its input unchanged, so its output is unbounded.
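A forward pass through a small fully connected network makes the role of the hidden layers and the activation function concrete. The network below (two inputs, two hidden layers of two neurons, one output) and all its weights are hypothetical, and leaving the output layer linear is a simplifying assumption.

```python
import math

ACTIVATIONS = {
    "identity": lambda z: z,                            # unbounded: returns its input
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),   # range (0, 1)
    "tanh": math.tanh,                                  # range (-1, 1)
    "relu": lambda z: max(0.0, z),                      # range [0, inf)
}

def dense(inputs, weights, biases, activation):
    """One fully connected layer: one row of weights per neuron."""
    f = ACTIVATIONS[activation]
    return [f(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers, activation):
    """Apply the activation in every hidden layer; keep the output layer linear."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, activation)
    weights, biases = layers[-1]
    return dense(x, weights, biases, "identity")

# Hypothetical weights: two hidden layers, then one output neuron
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(forward([2.0, 1.0], layers, "identity"))  # [3.0]
```

With the “identity” activation, stacking layers still produces a linear function of the input; a non-linear activation such as “relu” or “tanh” is what lets a network model non-linear problems.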

Random Forest

A random forest is made of many decision trees.

Random Forest

n_estimators = 10 (number of trees)

max_depth = 2

n_estimators is the number of trees in the forest, and max_depth limits the depth of each tree, as for a single decision tree. max_features, on the other hand, determines the maximum number of features to consider when looking for a split.
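The sketch below shows the ingredients that make a random forest more than a copy of one tree: each tree is trained on a bootstrap sample of the rows (drawn with replacement, the usual bagging recipe), each split only considers a random subset of `max_features` features, and the trees' predictions are combined by majority vote. It does not grow the trees themselves; the helper names and the seed are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement, so each tree sees a different sample."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def candidate_features(n_features, max_features, rng):
    """At each split, only max_features randomly chosen features are examined."""
    return rng.sample(range(n_features), max_features)

def forest_vote(tree_predictions):
    """Combine the trees' predicted labels by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed for reproducibility
X = [[1, 0, 3], [2, 1, 0], [3, 0, 1], [4, 1, 2]]
y = [0, 0, 1, 1]
Xs, ys = bootstrap_sample(X, y, rng)
print(candidate_features(n_features=3, max_features=2, rng=rng))
print(forest_vote([1, 0, 1, 1, 0, 1, 0, 1, 1, 0]))  # votes from 10 trees
```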

HyperParameter

In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.
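The distinction can be seen in a tiny gradient-descent fit of the line y = w·x. Here `learning_rate` and `epochs` are hyperparameters fixed before training, while `w` is a parameter whose value comes from the data. The function name and the data are hypothetical.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=500):
    """learning_rate and epochs are hyperparameters (set before training);
    w is a parameter (derived from the data during training)."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]   # hypothetical data on the line y = 2x
print(fit_slope(xs, ys))               # close to 2.0
```

Rerunning with `learning_rate=0.2` makes the updates overshoot and diverge on this data, which is why hyperparameters have to be tuned rather than learned by the same training loop.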

There are several evaluation metrics: accuracy, confusion matrix, precision, recall, ROC, etc.

Why do we evaluate a model’s performance?

1. To find the best performing models
2. Part of parameter tuning
3. To report/publish results
- Can the model be used as is?
- Do we need to try to improve?

Performance Metrics

SUPERVISED

• Classification:
- Accuracy
- Precision & Recall
- ROC curves & AUC
• Regression:
- Mean squared error (MSE) + root MSE
- Percent error
- Mean absolute percent error
• Ranking (ordinal/discrete regression):
- Precision at N

UNSUPERVISED

• There is no clear measure; it depends on the problem.
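The regression and ranking metrics listed above can be sketched in a few lines each. The example values are hypothetical, and `precision_at_n` assumes the ranking is given as a list ordered from best to worst.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percent error (mean of the percent errors); undefined if a true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_at_n(ranked_items, relevant, n):
    """Fraction of the top-n ranked items that are relevant."""
    return sum(1 for item in ranked_items[:n] if item in relevant) / n

y_true, y_pred = [100, 200, 300], [110, 190, 300]   # hypothetical values
print(mse(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
print(precision_at_n(["a", "b", "c", "d"], {"a", "c"}, 2))  # 0.5
```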

Classification

When we use a classification algorithm, the goal is to predict the categorical class labels of new data based on past observations.

In every project we need to measure the effectiveness of our model: the more effective the model, the better its performance. The confusion matrix is a performance measurement for machine learning classification.

• Used in classification
• Shows actual vs. predicted results
• Enables visualizing performance and calculating performance metrics
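A confusion matrix simply counts each (actual, predicted) pair. This sketch uses the convention rows = actual class, columns = predicted class (the same convention as scikit-learn's `confusion_matrix`); the labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(a, p)] for p in labels] for a in labels]

y_true = [1, 0, 1, 1, 0, 0]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 1, 0]   # hypothetical predictions
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))  # [[2, 1], [1, 2]]
```

For binary labels [0, 1], the four cells are TN, FP (top row) and FN, TP (bottom row).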

Classification metrics

Precision (a.k.a. PPV): What percent of our positive predictions are correct?

Recall (a.k.a. sensitivity): What percent of the actual positives did we capture?

F1 score: A single number that combines the two values above (their harmonic mean). Good for ranking/sorting, and for imbalanced classes.

Accuracy: What percent of all our predictions (positive and negative) are correct?
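All four metrics follow from the cells of a binary confusion matrix. The counts below are hypothetical, and the sketch does not guard against zero denominators (for example, a model that never predicts the positive class).

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                          # positive predictions that are right
    recall = tp / (tp + fn)                             # actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # all predictions that are right
    return precision, recall, f1, accuracy

print(classification_metrics(tp=8, fp=2, fn=4, tn=86))
# precision 0.8, recall about 0.667, F1 about 0.727, accuracy 0.94
```

Accuracy is high here mostly because of the many true negatives, while recall shows that a third of the positives were missed.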

Classification: ROC Curve

Area under curve (the ROC curve)

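A Receiver Operating Characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies. The maximum area under the curve (AUC) is 1, and completely random predictions have an AUC of 0.5. The advantage of this metric is that it is continuous: it summarizes performance over all thresholds instead of at a single one.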

Constructing a ROC Curve
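One common way to construct the curve is to sort the examples by predicted score and lower the threshold step by step, recording (FPR, TPR) each time; the AUC is then the area under those points by the trapezoidal rule. This is a from-scratch sketch with hypothetical scores; scikit-learn provides the same computation as `sklearn.metrics.roc_curve` and `roc_auc_score`.

```python
def roc_points(y_true, scores):
    """Sweep the threshold from high to low and record (FPR, TPR) points."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs (tied scores share a threshold)
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]             # hypothetical labels
scores = [0.1, 0.4, 0.35, 0.8]    # hypothetical classifier scores
print(roc_points(y_true, scores))
print(auc(roc_points(y_true, scores)))  # 0.75
```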

Evaluating a Classifier: What Affects the Performance?

• Large numbers of features (high dimensionality)
• Features that appear very few times (sparse data)
• Few instances for a complex classification task
• Missing feature values for instances
• Errors in attribute values for instances
• Errors in the labels of training instances
• Uneven availability of instances in classes
• Overfitting
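The effect of uneven class availability is easy to demonstrate: on a hypothetical dataset with only 5% positives, a "model" that always predicts the majority class reaches high accuracy while finding none of the positives.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of the actual positives that were predicted as positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)

y_true = [1] * 5 + [0] * 95   # hypothetical: 5 positives, 95 negatives
y_pred = [0] * 100            # always predict the majority class
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.95 0.0
```

This is why precision, recall and F1 are preferred over accuracy when classes are imbalanced.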

Overfitting

A model overfits the training data when it is very accurate on that data but does not perform as well on new test data (see model 2).
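An extreme case of overfitting is a model that memorizes its training set. The hypothetical `MemorizingClassifier` below scores perfectly on the data it has seen and falls back to the majority class on anything new, so its test accuracy drops sharply. The data, assumed to follow the rule "label is 1 when x ≥ 3", is made up for illustration.

```python
from collections import Counter

class MemorizingClassifier:
    """Hypothetical model that stores every training example verbatim."""
    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Exact lookup for seen inputs; unseen inputs get the majority class
        return [self.table.get(tuple(x), self.default) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

X_train, y_train = [[1], [2], [3], [4], [5]], [0, 0, 1, 1, 1]
X_test, y_test = [[1.5], [2.5], [3.5], [4.5]], [0, 0, 1, 1]
model = MemorizingClassifier().fit(X_train, y_train)
print(accuracy(y_train, model.predict(X_train)))  # 1.0 on training data
print(accuracy(y_test, model.predict(X_test)))    # 0.5 on new data
```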

What if there is not a best model?

Approach: Ensembles

• An ensemble method uses several algorithms that do the same task, and combines their results
- “Ensemble learning”
• A combination function joins the results
- Majority vote: each algorithm gets a vote
- Weighted voting: each algorithm’s vote has a weight
- Other complex combination functions
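The two simplest combination functions can be sketched as follows. The predictions and weights are hypothetical; in practice the weights might reflect each model's validation performance.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Each model gets one vote; the most common label wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each model's vote counts with its weight; the label with the largest total wins."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

votes = ["spam", "ham", "ham"]          # hypothetical outputs of three models
print(majority_vote(votes))                    # "ham"
print(weighted_vote(votes, [0.7, 0.2, 0.2]))   # "spam": 0.7 outweighs 0.4
```

The same three predictions lead to different decisions, which shows how much the choice of combination function matters.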

