Linear Regression in Python to Find the Relationship Between Two Columns - Code Explanation
import numpy as np
Importing the NumPy library and giving it the short name np.
import pandas as pd
Importing the pandas library and giving it the short name pd.
from sklearn.linear_model import LinearRegression
Importing the Linear Regression model from the scikit-learn library.
import matplotlib.pyplot as plt
Importing the Matplotlib library for making graphs and giving it the short name plt.
filepath = r'C:\Users\kkumaran\Downloads\Python - Regression Practice Workbook.xlsx'
This is the location of the Excel file on my computer. I am storing the file path (not the file itself) in a variable called "filepath".
data = pd.read_excel(filepath,sheet_name='Linear Regression Practice 1')
This code reads the Excel file using the pandas library and loads the sheet "Linear Regression Practice 1" into a DataFrame called "data". Now your Excel data is inside Python.
print(data)
This shows the dataset so you can verify it.
x = data[['Square Feet']]
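I am creating a variable called x and putting the Square Feet column data into it. The double square brackets keep x as a two-dimensional DataFrame, which is the shape scikit-learn expects for the input.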
y = data['Price']
I am creating a variable called y and putting the Price column data into it.
model = LinearRegression()
I am creating an object called 'model' that holds a Linear Regression model.
model.fit(x,y)
I am fitting the model to the x and y variables. It trains on the data and finds the slope M and intercept C of the best-fit line.
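To show what the fit step works out, here is a minimal sketch that computes the slope and intercept by hand with the ordinary least squares formulas. The numbers are hypothetical sample values I made up for illustration, not the data from the workbook, and the names square_feet and price are my own.

```python
import numpy as np

# Hypothetical sample data (NOT from the Excel workbook)
square_feet = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price = np.array([200, 290, 410, 500, 600], dtype=float)

x_mean = square_feet.mean()
y_mean = price.mean()

# Least squares slope: sum of (x - mean x) * (y - mean y), divided by sum of (x - mean x) squared
slope = np.sum((square_feet - x_mean) * (price - y_mean)) / np.sum((square_feet - x_mean) ** 2)

# The best-fit line always passes through the point (mean x, mean y)
intercept = y_mean - slope * x_mean

print("Slope:", slope)
print("Intercept:", intercept)
```

For a single input column, these should be the same slope and intercept that LinearRegression finds, up to small rounding differences.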
slope = model.coef_[0]
I am putting the slope M value in a variable called 'slope'.
intercept = model.intercept_
I am putting the intercept C value in a variable called 'intercept'.
print("Slope:", slope)
This will show the slope M value.
print("Intercept:",intercept)
This will show the intercept C value.
y_pred = model.predict(x)
This code uses the formula Y = MX + C, with the slope M and intercept C found above, to predict a Y value for every X value.
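To check that the prediction step really is Y = MX + C, here is a small sketch that compares model.predict with the formula worked out by hand. The DataFrame below is a hypothetical stand-in for the Excel sheet, with values I made up; only the column names match the ones used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the Excel sheet (made-up values)
data = pd.DataFrame({
    'Square Feet': [1000, 1500, 2000, 2500, 3000],
    'Price': [200, 290, 410, 500, 600],
})

x = data[['Square Feet']]
y = data['Price']

model = LinearRegression()
model.fit(x, y)

# Predictions from the model
y_pred = model.predict(x)

# The same predictions worked out by hand: Y = M * X + C
manual = model.coef_[0] * x['Square Feet'].to_numpy() + model.intercept_

print("Same values:", np.allclose(y_pred, manual))
```

If both give the same values, predict is doing nothing more than applying the fitted line to each X value.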
plt.figure(figsize=(12,8))
It creates a figure 12 inches wide and 8 inches high.
plt.scatter(x,y, color='blue', label='Actual Data')
It plots the original x and original y data as blue dots.
plt.plot(x,y_pred, color='red', label='Regression Line')
It draws a red line through the original x and predicted y values. This is the regression line.
for i in range(len(x)):
    plt.text(x.iloc[i,0],y.iloc[i]+10,f"({x.iloc[i,0]},{y.iloc[i]})",color='blue')
This code writes the original x and original y values just above each blue dot.
for i in range(len(x)):
    plt.text(x.iloc[i,0], y_pred[i] - 10, f"({x.iloc[i,0]},{round(y_pred[i],1)})", color='red')
This code writes the original x and predicted y values just below the matching point on the red line.
plt.xlabel('Square feet')
It adds the label 'Square feet' to the X axis.
plt.ylabel('Price')
It adds the label 'Price' to the Y axis.
plt.legend()
It adds a legend to the graph to show which is the original data and which is the regression line.
plt.show()
It displays the final graph.
r2 = model.score(x,y)
This code calculates the R-squared value, which is the share of the variation in the Y column that is explained by the X column.
print("r_squared:",r2)
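This code shows the strength of the X and Y relationship. A value close to 0 means the line explains almost none of the variation in Y (a weak relationship). A value of 1 means the line fits the data perfectly.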
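To make the R-squared number less of a black box, here is a minimal sketch that works it out by hand as 1 minus (unexplained variation divided by total variation). The actual and predicted values are hypothetical numbers I made up for illustration, and the variable names are my own.

```python
import numpy as np

# Hypothetical actual prices and the prices predicted by a fitted line (made-up values)
y_actual = np.array([200, 290, 410, 500, 600], dtype=float)
y_predicted = np.array([198, 299, 400, 501, 602], dtype=float)

# Unexplained variation: squared gaps between actual and predicted values
ss_res = np.sum((y_actual - y_predicted) ** 2)

# Total variation: squared gaps between actual values and their average
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print("r_squared (by hand):", r2_manual)
```

For a linear regression model, model.score(x,y) reports this same quantity, so a value near 1 means the gaps between the dots and the line are small compared with the overall spread of the prices.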