[Paper Review] Reinforcement learning for bandwidth estimation and congestion control in real-time communications

Mar 15, 2020

[ rl networking congestion-control ]

I’ve been reading a bunch of papers on reinforcement learning applied to video streaming recently, and a shorter paper out of Microsoft called “Reinforcement learning for bandwidth estimation and congestion control in real-time communications” caught my eye for its clarity.

Here are some of my notes.

Abstract

Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research. Achieving high quality of experience (QoE) for end users requires continual updates due to changing network architectures and technologies. In this paper, we apply reinforcement learning for the first time to the problem of real-time communications (RTC), where we seek to optimize user-perceived quality. We present initial proof-of-concept results, where we learn an agent to control sending rate in an RTC system, evaluating using both network simulation and real Internet video calls. We discuss the challenges we observed, particularly in designing realistic reward functions that reflect QoE, and in bridging the gap between the training environment and real-world networks.

My Summary

They focus on applying reinforcement learning (RL) to real-time, two-way video communication. In other words, some sort of two-way video call like FaceTime or Skype.

RL has previously been applied to video streaming, but not to two-way real-time communications. The earlier applications fall into two categories:

video on demand (think of a regular YouTube video)
real-time video streaming (think of a live broadcast over Twitch, Facebook Live, etc.)

What makes real-time communications so difficult?

If you want two-way interactivity, you can’t rely on a pre-fetched buffer, which is typically used in the scenarios above.
Upload speed comes into play for all parties, and it is typically more limited than download speed.
The model needs to run in real time, so each decision has a tighter time window.
There are no pre-encoded quality levels for the video streams, so the action space is perhaps continuous rather than discrete.

R3Net: An Initial Approach

They train a neural network to estimate bandwidth in real time. It runs on the receiver side and communicates its estimate to the sender, which can then use it to control the sending rate of the stream (presumably increasing data flow and quality as bandwidth allows).

They formulate the state input as

receiver rate (kb/s)
average packet interval (ms)
packet loss rate (%)
average RTT (ms)

and their action as a bandwidth estimate of 0 to 8 Mb/s. Their reward is then designed as

\[0.6 \ln(4R+1) - D - 10L\]

where R is the receiver rate (Mb/s), D is the average RTT, and L is the packet loss rate. This supposedly serves as a proxy for QoE, because it rewards receiving more data and penalizes delay and dropped packets.
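To get a feel for how the three terms trade off, here is a minimal Python sketch of the reward. The `NetworkState` class and the `reward` function are my own hypothetical names, not something from the paper. The unit conversions are also assumptions on my part: I take D in seconds and L as a fraction between 0 and 1, since the notes above don't pin those down.

```python
import math
from dataclasses import dataclass


@dataclass
class NetworkState:
    """One observation, in the units listed for the state input above."""
    receive_rate_kbps: float
    avg_packet_interval_ms: float
    packet_loss_pct: float
    avg_rtt_ms: float


def reward(state: NetworkState) -> float:
    """Compute 0.6 * ln(4R + 1) - D - 10L for one observation.

    Assumptions (not confirmed by the source): D is the RTT in seconds
    and L is the loss rate as a fraction in [0, 1].
    """
    rate_mbps = state.receive_rate_kbps / 1000.0      # R in Mb/s
    delay_s = state.avg_rtt_ms / 1000.0               # D, assumed seconds
    loss_fraction = state.packet_loss_pct / 100.0     # L, assumed fraction
    return 0.6 * math.log(4.0 * rate_mbps + 1.0) - delay_s - 10.0 * loss_fraction


# Two made-up observations: a healthy link and a congested one.
good = NetworkState(receive_rate_kbps=2000.0, avg_packet_interval_ms=5.0,
                    packet_loss_pct=0.0, avg_rtt_ms=50.0)
bad = NetworkState(receive_rate_kbps=500.0, avg_packet_interval_ms=20.0,
                   packet_loss_pct=5.0, avg_rtt_ms=300.0)

print(reward(good), reward(bad))
```

Under these assumed units the healthy link scores higher than the congested one. The log on the rate term means extra throughput has diminishing returns, while the loss term is linear with a large coefficient, so even a few percent of packet loss can wipe out the gain from a faster receive rate.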

They use an Unscented Kalman Filter (UKF) as a fixed-rule baseline and show that their neural network improves bandwidth usage by about 5% in a simulated environment. However, the neural network falters a bit when moved to a real environment, where they measure actual video quality (with Netflix’s VMAF), particularly over a slower 3G connection.

Takeaway

Their neural network setup wasn’t particularly interesting to me, as it is just a module that can be improved, but the paper provides a nice architectural overview of how RL might be applied in a real-time communications environment, as well as a short introduction to what else is being done in this field of research.

I was also left wondering how they designed the coefficients in their reward function and why they didn’t try to tie the reward directly to the VMAF quality output in the real scenario.

Fang, Joyce, et al. “Reinforcement learning for bandwidth estimation and congestion control in real-time communications.” arXiv preprint arXiv:1912.02222 (2019).
