K Nearest Neighbor

We will look into the basics of K Nearest Neighbor

Prerequisites

We will be using JupyterLab, so make sure you have it installed.

What is KNN (K Nearest Neighbor)

KNN is a classification algorithm, meaning we are trying to determine whether an instance belongs to a specific group or not. For example: whether a person will make a purchase, whether a stock price will go up, or whether a student will pass a test. We are not trying to determine the magnitude of anything, such as predicting how much rain will fall tomorrow or what the price of something should be.

Imagine we have a sample of people looking to buy a car; some will buy, others will not. The data looks something like:

Buy car

How can we figure out if someone will buy a car? One way is to look at the people who are 'close' to that person in terms of the number of kids they have, and check whether those people bought cars. Take individual 3, who has 2 kids. The 2 nearest neighbors are the person with 1 kid and the person with 3 kids, and both of them bought cars. If we look at the 3 nearest neighbors instead, we also reach the 2 people with zero kids (they are tied at the same distance), who didn't buy cars.
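The neighbor lookup described above can be sketched in a few lines of Python. This is only an illustration: the values below are hypothetical, chosen to match the description in the text rather than copied from the original table, and `nearest_neighbors` is a made-up helper name.

```python
# Hypothetical data consistent with the description above:
# (number of kids, bought a car?)
people = [
    (0, False),  # individual 1
    (0, False),  # individual 2
    (2, True),   # individual 3 (the person we are asking about)
    (1, True),   # individual 4
    (3, True),   # individual 5
]

def nearest_neighbors(people, target_index, k):
    """Return the k people closest to the target by number of kids."""
    target_kids = people[target_index][0]
    # Everyone except the target is a candidate neighbor.
    others = [p for i, p in enumerate(people) if i != target_index]
    # With a single feature, distance is just the absolute difference.
    others.sort(key=lambda p: abs(p[0] - target_kids))
    return others[:k]

print(nearest_neighbors(people, 2, k=2))
print(nearest_neighbors(people, 2, k=3))
```

Note that the two people with zero kids are tied at the same distance. In this sketch the sort is stable, so ties are broken by list order, and with k = 3 only the first of them is returned. A real implementation would need to decide how to handle such ties.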

Another way to illustrate KNN is like this:

KNN

If we look at the 3 nearest neighbors we will predict purple for our white ball, but if we take k = 5, then the prediction would be green.
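To see how the choice of k can flip the prediction, here is a minimal sketch of majority voting in two dimensions. The coordinates are made up for illustration and are not read from the picture; `knn_predict` is a hypothetical helper, and Euclidean distance is an assumption (other distance measures are possible).

```python
from collections import Counter
import math

# Hypothetical labeled points (x, y, color); coordinates are invented
# so that the vote changes between k = 3 and k = 5.
balls = [
    (1.0, 0.0, "purple"),
    (0.0, 1.2, "purple"),
    (-1.5, 0.0, "green"),
    (0.0, -2.0, "green"),
    (2.2, 0.0, "green"),
    (4.0, 4.0, "purple"),
]

def knn_predict(points, query, k):
    """Predict a label for query by majority vote among its k nearest points."""
    # Sort all points by Euclidean distance to the query point.
    by_distance = sorted(points, key=lambda p: math.dist(p[:2], query))
    # Count the labels of the k closest points and pick the most common one.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

white_ball = (0.0, 0.0)
print(knn_predict(balls, white_ball, k=3))  # purple
print(knn_predict(balls, white_ball, k=5))  # green
```

With k = 3 the two closest purple balls outvote one green ball; with k = 5 two more green balls join the vote and the majority changes. Using an odd k avoids tied votes when there are only two classes.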

References

KNN lazy programmer

MNIST dataset