# How important is scaling for SGDRegressor in scikit-learn?

I’ve been playing around with `SGDRegressor` from the scikit-learn library and was having some trouble with nonsensical outputs.

Even with a simple manufactured dataset, to which a `LinearRegression` model could fit a perfect line, `SGDRegressor` was spitting out nonsensical values.

Here is the sample dataset I used, where the target value is simply 5 times the input value:

```
import numpy as np

num_samples = 100
multiple = 5
x = np.arange(num_samples)  # inputs 0, 1, ..., 99
y = x * multiple            # targets are exactly 5 times the input
x[:5], y[:5]
# Output:
# (array([0, 1, 2, 3, 4]), array([ 0,  5, 10, 15, 20]))
```

It wasn’t until I started scaling the data that I was able to get the results I expected.

From the scikit-learn website:

> Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results.
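
To make the effect concrete, here is a minimal sketch comparing `SGDRegressor` on the raw inputs against the same model behind a `StandardScaler`. It is not the original notebook: the helper `max_abs_error`, the use of `make_pipeline`, and `random_state=0` are my own assumptions for illustration, and the exact numbers may vary with the scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as above: y = 5 * x, with X reshaped to (n_samples, 1).
x = np.arange(100)
X = x.reshape(-1, 1).astype(float)
y = 5.0 * x


def max_abs_error(model):
    """Fit on (X, y) and return the largest absolute prediction error."""
    try:
        model.fit(X, y)
    except ValueError:
        # scikit-learn can raise if the weights overflow during training.
        return float("inf")
    return float(np.max(np.abs(model.predict(X) - y)))


# Raw inputs: with features as large as 99, the gradient steps overshoot.
unscaled_error = max_abs_error(SGDRegressor(random_state=0))

# Standardized inputs: the pipeline applies the same scaling in fit and predict.
scaled_error = max_abs_error(
    make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
)

print(f"max error without scaling: {unscaled_error:.3g}")
print(f"max error with scaling:    {scaled_error:.3g}")
```

Putting the scaler in a pipeline is one way to follow the last sentence of the quote: whatever scaling is learned from the training data is reapplied automatically to anything passed to `predict`.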

I created a Jupyter Notebook below as a simple demonstration.
