Linear Regression in Python to Find the Relationship Between Two Columns - Formula Explanation
Have you ever noticed in your life that one thing affects another?
If someone studies well, they score high marks.
If someone eats well, they gain weight.
If someone sleeps well, they stay younger.
Doing one thing affects another thing.
But how much?
How much does one thing affect another?
50%? 90%? Or only 20%?
To put a number on that, we use the Linear Regression method.
The Linear Regression method uses this formula: Y = MX + C
What is Y:
Y is also called the 'dependent' value because it depends on X. If X changes, Y changes.
So we always assign the dependent column to the variable Y in Python.
Example:
We have 2 columns: "Study Hours" and "Exam Score".
Now, does the exam score increase when study hours increase?
Yes!
If someone studies for more hours, their score will usually increase.
So the 'Exam Score' column depends on the 'Study Hours' column, right?
We call the 'Exam Score' column the Y column because it depends on another column.
What is X:
X is also called the 'independent' value because it does not depend on Y. Here, 'Study Hours' does not depend on 'Exam Score'.
So we always assign the independent column to the variable X in Python.
Example:
'Study Hours' is the input. Based on the study hours, the 'Exam Score' changes.
'Study Hours' is the input column and 'Exam Score' is the output column.
So here 'Study Hours' does not depend on any other column. It is the independent column, also called X.
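Here is a small sketch of how the two columns could be assigned in Python. It assumes the data sits in a pandas DataFrame; the numbers are made up for illustration and are not from a real dataset.

```python
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "Study Hours": [1, 2, 3, 4, 5],
    "Exam Score": [10, 15, 20, 25, 30],
})

# X is the independent (input) column.
# Double brackets keep it as a 2-D table, the shape scikit-learn expects for inputs.
X = df[["Study Hours"]]

# y is the dependent (output) column, kept as a 1-D Series.
y = df["Exam Score"]

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```

Many Python examples write the dependent variable as a lowercase y. It plays the same role as Y in the formula.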
What is M:
M is also called the slope. A slope is a surface that goes up or down, like stairs or a water slide.
It is also called the coefficient, meaning the number that X is multiplied by.
M tells us how much Y changes when X increases by 1.
The formula for M is (change in Y) / (change in X).
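To see the slope formula in action, here is a tiny worked example using two made-up data points (they are assumptions for illustration). With real data the points rarely sit on one perfect line, so Linear Regression finds the slope of the line that fits all the points best.

```python
# Two made-up data points (study hours, exam score), for illustration only.
hours_1, score_1 = 2, 15
hours_2, score_2 = 4, 25

change_in_y = score_2 - score_1  # 10 marks
change_in_x = hours_2 - hours_1  # 2 hours

# Slope = (change in Y) / (change in X)
m = change_in_y / change_in_x
print(m)  # 5.0
```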
After fitting a model with the scikit-learn library, we find the M value in Python using the code: model.coef_
Example:
If M = 5 for our dataset, it means that for every 1 hour increase in the 'Study Hours' column, the 'Exam Score' column increases by 5 marks.
That is a ratio of 1:5.
So M is the rate of change of Y with respect to X.
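A minimal sketch of reading M from scikit-learn could look like this. The data is made up for illustration and was chosen so that the slope comes out as 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up sample data, for illustration only.
# X must be 2-D: one row per student, one column per input.
X = np.array([[1], [2], [3], [4], [5]])  # study hours
y = np.array([10, 15, 20, 25, 30])       # exam scores

model = LinearRegression()
model.fit(X, y)

# coef_ is an array with one slope per input column.
# We have only one input column, so we take the first entry.
m = float(model.coef_[0])
print(round(m, 2))  # 5.0
```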
What is C:
C is also called the Intercept.
Intercept means the value of Y when X = 0.
We find the C value in Python using the scikit-learn library with the code: model.intercept_
Example:
We have 2 Columns. "Study Hours" and "Exam Score"
If you don’t study at all (Study Hours = 0), what will be your Exam Score?
That starting value is C.
If C = 5 for our dataset, it means that even if you study 0 hours, the model predicts you will still get 5 marks.
This could be from class attendance, internal marks, or some random correct answers.
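The sketch below checks this idea with scikit-learn: the intercept should match what the model predicts for 0 study hours. The data is made up for illustration and was chosen so that C comes out as 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up sample data, for illustration only.
X = np.array([[1], [2], [3], [4], [5]])  # study hours
y = np.array([10, 15, 20, 25, 30])       # exam scores

model = LinearRegression()
model.fit(X, y)

# intercept_ is the value of Y when X = 0.
c = float(model.intercept_)

# Ask the model directly for the score at 0 study hours.
score_at_zero = float(model.predict(np.array([[0]]))[0])

print(round(c, 2))              # 5.0
print(round(score_at_zero, 2))  # 5.0
```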
To recap, the Linear Regression method uses this formula: Y = MX + C
So, if you know M and C, you know the connection between the X and Y columns.
You can also find the Y value just by plugging in an X value.
This means you can predict a student's future score from their study hours.
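As a quick illustration, the formula can be written as a small Python function. The name predict_score is hypothetical, and the default values M = 5 and C = 5 are just the example numbers used in this post, not values from a real dataset.

```python
def predict_score(study_hours, m=5.0, c=5.0):
    """Return Y = M*X + C for the given study hours (hypothetical helper)."""
    return m * study_hours + c

# Predicted exam score for a student who studies 6 hours.
print(predict_score(6))  # 35.0

# With 0 study hours, the prediction is just the intercept C.
print(predict_score(0))  # 5.0
```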