Linear Regression: The Normal Way vs. with Seaborn

Subarna Lamsal
3 min read · May 23, 2019

In Data Science, Linear Regression (LR) is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. There are two types of linear regression: Simple LR and Multiple LR.

Simple LR: has a single explanatory variable.

Multiple LR: has more than one explanatory variable.
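In equation form, Simple LR fits y = b0 + b1·x, while Multiple LR fits y = b0 + b1·x1 + b2·x2 + … + bn·xn, where b0 is the intercept and b1…bn are the coefficients the model learns from the data.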

Here, we will work on Simple LR. First of all, we will implement linear regression using sklearn (the normal way), then with seaborn.

Let’s look at the dataset, which contains the Position, Level, and Salary of different designations in a company.

We are going to implement Simple Linear Regression (SLR). First of all, let’s import the required libraries.

import pandas as pd               # to read the csv file
import matplotlib.pyplot as plt   # to plot the predicted line
from sklearn.linear_model import LinearRegression   # to implement the linear model

We only import LinearRegression from sklearn.linear_model.
Now, let’s read the csv file.

pos_sal = pd.read_csv('salary.csv')
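Before modelling, it’s worth a quick look at what we loaded. A small sanity-check sketch (the column names are the ones mentioned above):

print(pos_sal.head())   # first five rows: Position, Level, Salary
print(pos_sal.shape)    # (number of rows, number of columns)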

Now, let’s work on simple linear regression. Since the dataset is small, we don’t need to split it into training and test sets.

linear_reg = LinearRegression()

We have instantiated LinearRegression. Now, let’s identify the independent and dependent variables.

Level: Independent Variable

Salary: Dependent Variable

Now, let’s work on it.

X = pos_sal.iloc[:, 1].values.reshape(-1, 1)
Y = pos_sal.iloc[:, 2].values.reshape(-1, 1)

‘iloc’ is used for integer-location based indexing by position. Since our data has a single feature, scikit-learn expects a 2D array rather than a 1D array, so we call reshape(-1, 1) to turn each column into a column vector; the -1 tells NumPy to infer the number of rows automatically.
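To see what reshape(-1, 1) does, here is a minimal sketch with made-up numbers:

import numpy as np

a = np.array([5, 10, 15, 20])   # 1D array, shape (4,)
b = a.reshape(-1, 1)            # 2D column vector, shape (4, 1)
print(a.shape, b.shape)         # prints: (4,) (4, 1)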
Now, it’s time to fit the model to X and Y.

linear_reg.fit(X, Y)

This fits the model and, in a notebook, echoes the estimator with its parameters.

Output: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
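If you want to inspect the fitted model, the learned slope and intercept live on the estimator, and score gives the R² of the fit:

print(linear_reg.coef_)        # slope (b1) of the fitted line
print(linear_reg.intercept_)   # intercept (b0)
print(linear_reg.score(X, Y))  # R² score on the training data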

Now we have to predict the values of “Salary”, as it is the dependent variable.

Y_pred = linear_reg.predict(X)
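You can also predict the salary for a level that isn’t in the data, say a hypothetical level of 6.5:

print(linear_reg.predict([[6.5]]))   # note the 2D input: one sample, one feature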

We have completed our linear regression. Now, let’s plot the original data and the predicted line in the same figure.

plt.scatter(X, Y, label='Actual')
plt.plot(X, Y_pred, color='red', label='Predicted')
plt.legend()
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title('Linear Regression')
plt.show()
[Figure: Linear Regression plot of the data and fitted line]

Also, you can set the y-axis limits using plt.ylim([lower_value, upper_value]).

The straight line is the graph of the linear model fitted to the given variables.

However, the data visualisation package Seaborn can plot a linear regression in just one line of code.

First of all, let’s import the seaborn package.

import seaborn as ssn

Seaborn has a function called lmplot which directly plots the linear model of the given attributes.

Let’s see the format of the code.

ssn.lmplot(x="independent_variable", y="dependent_variable", data=dataframe_name)

Now, let’s see the code.

ssn.lmplot(x="Level",y="Salary",data=pos_sal)

Pretty simple… just a single line of code.

It also draws a shaded band around the line, showing the confidence interval of the linear model.
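If you want to control that band, lmplot’s ci parameter sets (or removes) the confidence interval, and order fits a polynomial instead of a straight line:

ssn.lmplot(x="Level", y="Salary", data=pos_sal, ci=None)   # hide the confidence band
ssn.lmplot(x="Level", y="Salary", data=pos_sal, order=2)   # fit a quadratic instead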

That’s it. It’s up to you whether to do it the normal way or with the seaborn package.

The seaborn package can be installed with the following command:

pip install seaborn

If you are using Anaconda (e.g., with Jupyter), you can install the released version using

conda install seaborn

Alternatively, you can use pip to install the development version directly from GitHub.

pip install git+https://github.com/mwaskom/seaborn.git

Happy coding!!!
