Linear Regression in Python to Find the Relationship Between Two Columns - Code Explanation

import numpy as np

Importing the NumPy library and giving it the short name np

import pandas as pd

Importing the pandas library and giving it the short name pd


from sklearn.linear_model import LinearRegression

Importing the LinearRegression model from the scikit-learn library.


import matplotlib.pyplot as plt

Importing the Matplotlib plotting module for making graphs and giving it the short name plt


filepath = r'C:\Users\kkumaran\Downloads\Python - Regression Practice Workbook.xlsx'

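This is the location of the Excel file on my computer. I am storing that path as text in a variable called "filepath"; the file itself is not loaded yet. The r before the opening quote makes it a raw string, so the backslashes in the Windows path are kept as they are.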


data = pd.read_excel(filepath,sheet_name='Linear Regression Practice 1')

This code reads the Excel file using the pandas library and loads the sheet 'Linear Regression Practice 1' into a DataFrame called "data". Now your Excel data is inside Python.


print(data)

This shows the dataset so you can verify it.






x = data[['Square Feet']]

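I am creating a variable called x and putting the 'Square Feet' column into it. The double square brackets keep x as a two-dimensional DataFrame, which is the shape scikit-learn expects for its input.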


y = data['Price']

I am creating a variable called y and putting the 'Price' column into it. The single square brackets make y a one-dimensional Series.
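To see the difference between the double and single brackets, here is a small sketch. The DataFrame below is a made-up stand-in for the Excel sheet (the numbers are hypothetical, not the workbook's values); only the column names match the code above.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the values are made up.
data = pd.DataFrame({
    "Square Feet": [1000, 1500, 2000, 2500],
    "Price": [200, 290, 410, 500],
})

x = data[["Square Feet"]]  # double brackets -> DataFrame with rows and columns
y = data["Price"]          # single brackets -> Series with rows only

x_shape = x.shape
y_shape = y.shape

print(type(x).__name__, x_shape)
print(type(y).__name__, y_shape)
```

The first shape has two numbers (rows, columns) and the second has only one (rows).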


model = LinearRegression()

I am creating an object called 'model' from the LinearRegression class. At this point the model is empty and has not seen any data.


model.fit(x,y)


I am fitting the model to the x and y variables. It trains on the data and finds the slope M and intercept C of the straight line that best fits it.
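With a single input column, the best-fit slope M and intercept C have a simple least-squares formula, so they can be worked out by hand. The sketch below uses plain Python and made-up numbers (hypothetical, not the workbook's values); on the same data it should give the same line that fitting finds.

```python
# Hypothetical data for illustration only; these are not the workbook's values.
square_feet = [1000, 1500, 2000, 2500]
price = [200, 290, 410, 500]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n

# Slope M = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
numerator = sum((xi - mean_x) * (yi - mean_y)
                for xi, yi in zip(square_feet, price))
denominator = sum((xi - mean_x) ** 2 for xi in square_feet)
slope = numerator / denominator

# Intercept C = mean_y - M * mean_x
intercept = mean_y - slope * mean_x

print("Slope:", slope)
print("Intercept:", intercept)
```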


slope = model.coef_[0]

I am putting the slope M value in a variable called 'slope'. model.coef_ holds one value per input column, so [0] picks the slope for 'Square Feet'.


intercept = model.intercept_

I am putting the intercept C value in a variable called 'intercept'


print("Slope:", slope)

This shows the slope M value.


print("Intercept:",intercept)

This shows the intercept C value.


y_pred = model.predict(x)

This code uses the formula Y = MX + C, with the fitted M and C, to predict a Y value for every X value.
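A quick way to check the Y = MX + C claim is to apply the formula by hand and compare it with what predict returns. This is a minimal sketch on made-up numbers (hypothetical, not the workbook's values), using a NumPy array in place of the DataFrame.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data for illustration only; these are not the workbook's values.
x = np.array([[1000], [1500], [2000], [2500]])  # 2-D: one row per house
y = np.array([200, 290, 410, 500])

model = LinearRegression()
model.fit(x, y)

slope = model.coef_[0]
intercept = model.intercept_

# Apply Y = M * X + C by hand and compare with model.predict
manual_pred = slope * x[:, 0] + intercept
y_pred = model.predict(x)

matches = np.allclose(manual_pred, y_pred)
print("Manual formula matches predict:", matches)
```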





plt.figure(figsize=(12,8))

It creates a figure 12 inches wide and 8 inches tall


plt.scatter(x,y, color='blue', label='Actual Data')

It draws the original x and original y data as blue dots


plt.plot(x,y_pred, color='red', label='Regression Line')

It draws a red line through the original x values and the predicted y values. This is the regression line.


for i in range(len(x)):
    plt.text(x.iloc[i,0],y.iloc[i]+10,f"({x.iloc[i,0]},{y.iloc[i]})",color='blue')


This code writes the original x and original y values as blue text 10 units above each blue dot.



for i in range(len(x)):
    plt.text(x.iloc[i,0], y_pred[i] - 10, f"({x.iloc[i,0]},{round(y_pred[i],1)})", color='red')


This code writes the original x and predicted y values (rounded to one decimal place) as red text 10 units below the regression line at each x value.



plt.xlabel('Square feet')

It labels the X axis 'Square feet'


plt.ylabel('Price')

It labels the Y axis 'Price'


plt.legend()

It adds a legend to the graph to show which is the original data and which is the regression line


plt.show()

It displays the final graph.






r2 = model.score(x,y)

This code calculates R squared, which measures how much of the variation in the Y column is explained by the X column


print("r_squared:",r2)

This code shows the strength of the X and Y relationship. A value near 0 means the line explains almost none of the variation in Y. A value near 1 means it explains almost all of it.
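R squared can also be worked out by hand as 1 minus (squared prediction errors divided by squared distances of Y from its average), which is the same quantity model.score returns. The sketch below uses made-up numbers (hypothetical, not the workbook's values) and np.polyfit as a stand-in for the fitted model so that it runs on its own.

```python
import numpy as np

# Hypothetical data for illustration only; these are not the workbook's values.
x = np.array([1000, 1500, 2000, 2500], dtype=float)
y = np.array([200, 290, 410, 500], dtype=float)

# Fit a straight line (degree 1); polyfit returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# Squared errors left over after using the line
ss_res = np.sum((y - y_pred) ** 2)
# Squared distances of y from its average (total variation)
ss_tot = np.sum((y - y.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
print("r_squared:", r2)
```

If the line passed through every point, ss_res would be 0 and r_squared would be exactly 1.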
