Why can’t weight preferences be part of the state in multi-objective reinforcement learning?
A friend and I were looking at the following diagram of a multi-objective Q-value network the other day.
Image Credit: Runzhe Yang
And my friend asked: “Why can’t the weight preferences be considered part of the state?” The network takes both the state and the weights as input, so why not just treat the weights as part of the state, which we can formulate however we want?
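To make the setup concrete, here is a minimal sketch of what such a weight-conditioned Q-network might look like. This is not Runzhe Yang’s exact architecture; the layer sizes, dimensions, and names are all placeholders I made up for illustration. The key feature is that the network outputs a *vector* of Q-values per action, conditioned on both the state and the preference weights:

```python
import torch
import torch.nn as nn

class MultiObjectiveQNet(nn.Module):
    """Sketch of a weight-conditioned Q-network (all sizes hypothetical)."""

    def __init__(self, state_dim=4, n_objectives=2, n_actions=3, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.n_objectives = n_objectives
        # The state s and the preference weights w are concatenated
        # and fed through a shared MLP.
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            # One Q-value per (action, objective) pair.
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, state, weights):
        x = torch.cat([state, weights], dim=-1)
        q = self.net(x)
        # Reshape to (batch, n_actions, n_objectives): a Q-value
        # vector for every action.
        return q.view(-1, self.n_actions, self.n_objectives)


# Action selection scalarizes the Q-vectors with the user's weights:
#   a* = argmax_a  w · Q(s, a, w)
state = torch.randn(1, 4)
weights = torch.tensor([[0.7, 0.3]])  # user preference over 2 objectives
q_vectors = MultiObjectiveQNet()(state, weights)          # (1, 3, 2)
scalarized = (q_vectors * weights.unsqueeze(1)).sum(-1)   # (1, 3)
best_action = scalarized.argmax(dim=-1)
```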
I had to go back to the formal definition of a Markov Decision Process.
An MDP is a 4-tuple (S, A, P, R), where P(s′ | s, a) is the probability of transitioning to the next state s′ given the current state s and action a.
Weight preferences, however, appear nowhere in this tuple. The transition dynamics P are fixed by the environment and know nothing about the weights. Weight preferences typically come from a user, and in multi-objective reinforcement learning the problem setting assumes we do not know a user’s weight preferences beforehand.
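The distinction is easy to see in code: the transition kernel and the (vector-valued) reward are functions of the state and action alone, while the weights only enter afterward, when the reward vector gets scalarized for a particular user. A toy sketch, with every size and distribution made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy MDP with 3 states, 2 actions, and a 2-dimensional reward vector.
# P[s, a] is a distribution over next states; it depends only on (s, a).
P = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (3, 2, 3)
R = rng.standard_normal((3, 2, 2))           # R[s, a] is a reward *vector*

def step(s, a):
    """Environment dynamics: note that no weight vector appears here."""
    s_next = rng.choice(3, p=P[s, a])
    return s_next, R[s, a]                   # vector-valued reward

# The user's preference only enters afterward, to scalarize the reward.
w = np.array([0.7, 0.3])
s_next, r_vec = step(s=0, a=1)
r_scalar = w @ r_vec
```

Augmenting the state with w would just add a component that P never changes; the environment itself is indifferent to it.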