Linear Regression on Time Series with SciKit Learn and Pandas
This post demonstrates simple linear regression from time series data using scikit learn and pandas.
Imports
Import required libraries like so.
import numpy as np
import pandas as pd
import datetime
from sklearn import linear_model
Create time series data
There are many ways to do this. Refer to the Time series section in the pandas documentation for more details. Here, we take a date range for the year of 2020 and create a datetime index based on each day.
start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2020, 12, 31)
index = pd.date_range(start, end)
index, len(index)
Output:
(DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10',
...
'2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
'2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
'2020-12-30', '2020-12-31'],
dtype='datetime64[ns]', length=366, freq='D'), 366)
Create a pandas dataframe
Next, we can put this in a pandas dataframe. We can add an artificial “value” column that is a multiple of 5 for some generic target data.
multiple = 5
l = list(range(0, len(index)*multiple, multiple))
df = pd.DataFrame(l, index = index)
df.index.name = "date"
df.columns = ["value"]
df
Output:
value
date
2020-01-01 0
2020-01-02 5
2020-01-03 10
2020-01-04 15
2020-01-05 20
... ...
2020-12-27 1805
2020-12-28 1810
2020-12-29 1815
2020-12-30 1820
2020-12-31 1825
366 rows × 1 columns
Simple feature engineering from time series
We want something sensible to predict from. One simple option is to convert the date index into an integer from the minimum start date like so:
df['days_from_start'] = (df.index - df.index[0]).days; df
Output:
value days_from_start
date
2020-01-01 0 0
2020-01-02 5 1
2020-01-03 10 2
2020-01-04 15 3
2020-01-05 20 4
... ... ...
2020-12-27 1805 361
2020-12-28 1810 362
2020-12-29 1815 363
2020-12-30 1820 364
2020-12-31 1825 365
366 rows × 2 columns
Simple Regression
We can now pull out the columns for simple linear regression.
x = df['days_from_start'].values.reshape(-1, 1)
y = df['value'].values
Note that our input variables x have to be reshaped for input to the model.
Use scikit’s LinearRegression to fit the model
Run
model = linear_model.LinearRegression().fit(x, y)
linear_model.LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
model.predict([[1], [7], [50]])
Output:
array([ 5., 35., 250.])
More
Archive
chinese
tang-dynasty-poetry
李白
python
王维
rl
pytorch
numpy
emacs
杜牧
spinningup
networking
deep-learning
贺知章
白居易
王昌龄
杜甫
李商隐
tips
reinforcement-learning
macports
jekyll
骆宾王
贾岛
孟浩然
xcode
time-series
terminal
regression
rails
productivity
pandas
math
macosx
lesson-plan
helicopters
flying
fastai
conceptual-learning
command-line
bro
黄巢
韦应物
陈子昂
王翰
王之涣
柳宗元
杜秋娘
李绅
张继
孟郊
刘禹锡
元稹
youtube
visdom
system
sungho
stylelint
stripe
softmax
siri
sgd
scipy
scikit-learn
scikit
safari
research
qtran
qoe
qmix
pyhton
poetry
pedagogy
papers
paper-review
optimization
openssl
openmpi
nyc
node
neural-net
multiprocessing
mpi
morl
ml
mdp
marl
mandarin
macos
machine-learning
latex
language-learning
khan-academy
jupyter-notebooks
ios-programming
intuition
homebrew
hacking
google-cloud
github
flashcards
faker
docker
dme
deepmind
dec-pomdp
data-wrangling
craftsman
congestion-control
coding
books
book-review
atari
anki
analogy
3brown1blue
2fa