In this article we will look at how the Gradient Boosting algorithm works for classification, with an example and Python code.

- What is Gradient Boosting?
- How does Gradient Boosting work?
- Python Code Implementation.
- Pros & Cons.
- Conclusion.

- In my previous article, I explained the basics of Boosting and its types; please do check that article for a better understanding.

- Gradient boosting is a machine learning technique that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
- In Gradient Boosting, each predictor (decision tree) tries to improve on its predecessor by reducing its errors. The main idea in this technique is that, instead of fitting a predictor to the data at each iteration, it fits the new predictor to the residual errors made by the previous predictor. …
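The residual-fitting idea can be sketched in a few lines. This is a minimal illustration of the mechanism on a made-up regression dataset, not the full classification algorithm; the `learning_rate` and `n_rounds` values are arbitrary choices for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel()

learning_rate = 0.1
n_rounds = 50
prediction = np.zeros_like(y)  # start from a trivial constant model

for _ in range(n_rounds):
    residuals = y - prediction           # errors made by the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)               # fit the NEW tree to those residuals
    prediction += learning_rate * tree.predict(X)

print(np.mean((y - prediction) ** 2))    # training error shrinks round by round
```

Each new tree corrects what the ensemble so far got wrong, which is exactly the "fit a new predictor to the residual errors" step described above.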

This article mainly focuses on Boosting, one of the most commonly used and important algorithms in machine learning.

- What is Boosting?
- How does Boosting work?
- Types of Boosting.
- Conclusion.

- Boosting is a type of ensemble algorithm which converts weak learners into strong learners.
- Boosting is used for improving the model predictions of any given learning algorithm.
- Just as humans learn from their mistakes and try not to repeat them, **Boosting** algorithms try to build a strong learner from the mistakes of several weaker models.
- Boosting basically tries to reduce the bias error, which arises when models are not able to identify relevant trends in the data. …
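As a quick illustration of weak learners becoming a strong one, here is a small sketch using AdaBoost (one boosting algorithm) from scikit-learn on a synthetic dataset; the dataset and parameter values are arbitrary choices for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# A decision stump (depth-1 tree) is a classic weak learner
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)

# AdaBoost combines many such stumps, each one focusing on the
# mistakes of the ones that came before it
booster = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

print(stump.score(X, y), booster.score(X, y))  # the ensemble scores higher
```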

This article covers another important algorithm in Machine Learning which is KNN.

- What is KNN?
- KNN Example.
- Steps in KNN.
- Selecting K in KNN.
- Code Implementation in python.
- Pros & Cons of KNN.
- Conclusion.

- KNN is an acronym for K-Nearest Neighbors algorithm which is a non-parametric supervised learning technique.
- It is used for both classification and regression tasks.
- It is a non-parametric and instance-based learning algorithm.
- **Non-parametric:** KNN makes no explicit assumptions about the underlying data.
- **Instance-based:** The algorithm doesn’t explicitly learn a model. Instead, it memorizes the training instances, which are subsequently used for predictions. …
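A minimal sketch of KNN in scikit-learn makes the instance-based point concrete: "fitting" just stores the training instances. The iris dataset and k=5 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)        # no model is learned; instances are memorized
print(knn.score(X_test, y_test)) # predictions come from the 5 nearest neighbors
```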

**Sampling** is a technique that allows us to get information about the population based on the statistics from a subset of population, without investigating every individual from the population.

Before we start understanding sampling and its techniques, we need to know what a population and a sample are:

A population data set contains all members of a specified group (the entire list of possible data values). For example, a **population** can be all the people living in a particular country.

A sample data set contains a part, or a subset of a population. The size of the sample is always less than the size of the population from which it is taken. …
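The population/sample distinction can be shown in a few lines of Python. The population values below are made up purely for illustration; the point is that a statistic computed on the sample approximates the same statistic on the full population.

```python
import random

random.seed(0)
population = list(range(1, 10001))         # every member of the group
sample = random.sample(population, k=100)  # a subset, drawn without replacement

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(pop_mean, sample_mean)  # the sample mean approximates the population mean
```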

This is a continuation of my previous blog, where we saw the theoretical concept of how splitting is done in Decision Trees. In this article we will check the code implementation of Decision Trees. Before starting, I would suggest you check out my previous blog for a better understanding.

1. CART

- CART stands for Classification and Regression Tree.
- This algorithm uses the Gini Impurity measure for splitting.

2. ID3

- ID3 stands for Iterative Dichotomiser 3.
- This algorithm uses the Information Gain/Entropy measure for splitting.

3. C4.5

- This is an extension to ID3.
- This algorithm also uses the Information Gain/Entropy measure for splitting. …
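The two splitting measures named above are easy to compute by hand. This sketch evaluates both for a node's class counts (the counts themselves are illustrative):

```python
import math

def gini(counts):
    """Gini impurity of a node, given per-class sample counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node, given per-class sample counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(gini([5, 5]))     # maximally impure two-class node -> 0.5
print(entropy([5, 5]))  # -> 1.0 bit
print(gini([10, 0]))    # perfectly pure node -> 0.0
```

A split is chosen so that the children are purer (lower Gini) or more informative (higher information gain, i.e. a larger drop in entropy) than the parent.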

In this article we will give a brief introduction to Decision Trees and the different ways in which splitting happens in Decision Trees.

- Introduction to Decision Trees.
- Terminology related to Decision Trees.
- Types of Decision Trees.
- What is Node splitting, and the need for it?
- Types of splitting.
- Conclusion.

- A decision tree is a type of supervised learning algorithm: a tree-like model of decisions and their possible consequences. It is one way to display an algorithm that contains conditional control statements.
- Decision trees are used for both Classification and Regression problems.
- People use decision trees in a variety of situations, from something personal to more complex business, financial, or investment undertakings. …
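The "conditional control statements" view is literal: a fitted tree is just nested if/else checks. Here is a tiny hand-written example; the feature names and thresholds are made up for illustration.

```python
def will_play_tennis(outlook, humidity, wind):
    """A toy decision tree written as plain conditionals."""
    if outlook == "sunny":
        return humidity <= 70   # play only if it is not too humid
    elif outlook == "overcast":
        return True             # always play
    else:                       # rainy
        return not wind         # play only when there is no wind

print(will_play_tennis("sunny", humidity=60, wind=False))  # True
```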

The purpose of this article is to explain the Multiple Linear Regression algorithm. I would recommend checking my article on Simple Linear Regression before you start reading this, for a better understanding.

Multiple Linear Regression **(MLR)**, also known simply as multiple regression, is the most common form of linear regression analysis. It is a statistical technique that uses several independent variables to predict the outcome of a dependent variable. The independent variables can be continuous or categorical (dummy-coded as appropriate).

Multiple Linear Regression is used to estimate the relationship between two or more independent variables and one dependent variable.

Similar to how we have a best fit line in Simple linear regression, we have a best fit plane or hyper-plane in MLR. …
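The best-fit plane idea can be sketched on synthetic data. Here the true relationship y = 3 + 2·x1 − 1.5·x2 is made up so we can see the fitted coefficients recover it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))        # two independent variables (x1, x2)
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1]   # one dependent variable

mlr = LinearRegression().fit(X, y)    # fits the best-fit plane
print(mlr.intercept_, mlr.coef_)      # recovers ~3 and ~[2, -1.5]
```

With two predictors the fitted surface is a plane; with more predictors it becomes a hyper-plane, exactly as described above.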

The main purpose of this blog is to understand the important topic of statistics called Correlation Coefficient.

- What is Correlation?
- What is Correlation Coefficient?
- Types of Correlation Coefficient.
- Need for understanding Correlation.
- How to deal with it?
- Conclusion

Correlation is the statistical relationship, whether causal or not, between two continuous random variables. In simple words, correlation tells us how two variables move together. There are 3 types of correlation:

- Positive Correlation: In this type of correlation the two variables move together; if one variable increases, the other variable also increases, and vice-versa. …
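A quick sketch of a positive correlation: the two toy series below are made up so that one rises with the other, and Pearson's coefficient comes out close to +1.

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])
exam_score    = np.array([52, 57, 63, 70, 74, 81])

# Pearson correlation coefficient between the two variables
r = np.corrcoef(hours_studied, exam_score)[0, 1]
print(r)  # close to +1: the variables move together
```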

This blog mainly focuses on explaining how a simple linear regression works. You can find the code and the dataset here.

- SLR is a statistical technique used for finding the existence of a linear relationship between a dependent variable (a.k.a. response variable or outcome variable) and an independent variable (a.k.a. explanatory variable, predictor variable, or feature).
- The linear relationship between these two variables can be represented by a straight line, which is called the **regression line**.

Don’t worry, we will understand the concept with the help of a simple dataset.

We need to first import the data.
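Since the blog's own dataset isn't reproduced here, a synthetic one stands in for it in this sketch; the true slope (2.5) and intercept (4) are made up so the fit can be checked against them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=(50, 1))            # independent variable
y = 4 + 2.5 * x.ravel() + rng.normal(size=50)   # dependent variable + noise

slr = LinearRegression().fit(x, y)              # fits the regression line
print(slr.intercept_, slr.coef_[0])             # intercept and slope of the line
```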

This is my first attempt at writing a blog and I hope you like it.

I have kept everything in plain English without using any jargon. The main purpose is to help you understand the text classification algorithm in Machine Learning, **Naive Bayes**, with a simple example.

By reading this blog completely, you will learn what this algorithm is all about and the applications of Naive Bayes in solving real-world problems.

- Quick Introduction to Naive Bayes
- Mathematics of Probability required for this algorithm
- Simple example: i) Training Phase ii) Testing Phase
- Applications of Naive Bayes
- Common Mistakes to…
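The training/testing phases mentioned above can be sketched with scikit-learn's Multinomial Naive Bayes; the tiny spam/ham corpus below is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ["win money now", "free prize win", "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]

# Training phase: turn texts into word counts and learn per-class word probabilities
vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

# Testing phase: classify an unseen message
print(clf.predict(vec.transform(["free money prize"])))  # -> ['spam']
```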
