# How important is scaling for SGDRegressor in SciKit Learn?

I’ve been playing around with `SGDRegressor` from the scikit-learn library and was having some trouble with nonsensical outputs. Even on a simple manufactured dataset, one that `LinearRegression` could fit with a perfect line, `SGDRegressor` was spitting out wildly wrong values.

Here is the sample dataset I used, where the target is simply 5 times the input:

```python
import numpy as np

num_samples = 100
multiple = 5
y = np.array([i*multiple for i in range(num_samples)])
x = np.array([i for i in range(num_samples)])
x[:5], y[:5]
# Output:
# (array([0, 1, 2, 3, 4]), array([ 0, 5, 10, 15, 20]))
```

It wasn’t until I started scaling the data that I was able to get the results I expected.

From the scikit-learn website:

> Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results.

I created a Jupyter Notebook below as a simple demonstration.
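To make the fix concrete, here is a minimal sketch, assuming `StandardScaler` (any of the scalings mentioned in the quote above should also work): standardize `x` before fitting, and run any new inputs through the *same* fitted scaler. The hyperparameters shown are scikit-learn defaults, not anything special.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

num_samples = 100
multiple = 5
x = np.arange(num_samples, dtype=np.float64).reshape(-1, 1)  # shape (100, 1)
y = multiple * x.ravel()                                     # y = 5 * x

# Standardize to mean 0, variance 1 -- the step SGD is sensitive to.
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
sgd.fit(x_scaled, y)

# New inputs must go through the same fitted scaler, per the docs quote.
x_new = scaler.transform(np.array([[50.0]]))
print(sgd.predict(x_new))          # should land near 250 once training converges
print(sgd.score(x_scaled, y))      # R^2 should be near 1.0
```

Fitting the same `SGDRegressor` on the raw, unscaled `x` is what produced the garbage values for me; the only change here is the `StandardScaler` step.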
