RL S1: Intro to Reinforcement Learning
Supplemental Lecture 1 of CS 7642 - Reinforcement Learning @ Georgia Tech.
What is Reinforcement Learning? (Supplemental)
Motivation + Context
Artificial Intelligence (AI) refers to the ability of machines to simulate human behavior. AI has achieved outstanding performance on tasks that seemed out of reach only a few years ago. One of its most appealing aspects is the ability to discover patterns that are not immediately apparent or well-known to us, regardless of the application field.
There are many subfields of AI:
- Planning AI: concerned with the realization of strategies or action sequences (e.g., GPS navigation, resource allocation, tactic selection).
- Knowledge-Based AI: reasons over a knowledge base (database) to solve complex problems (e.g., decision support systems for medical, teaching, and Q&A applications).
- Machine Learning (ML): the ability of machines to simulate human behavior by learning from data. In this context, learning refers to the ability of an algorithm to improve its performance on some task via data, as opposed to being explicitly programmed to perform better.
  - Supervised Learning: function approximation. Concerned with learning an optimal mapping from inputs $X$ to output(s) $y$. Examples include regression and classification. $f: X \rightarrow y$
  - Unsupervised Learning: data description. The algorithm learns structural aspects of the data $X$ without any dedicated output. Examples include clustering and dimensionality reduction. $f(X)$
  - Reinforcement Learning: reward maximization. Concerned with learning an optimal mapping from environment state $s$ to agent action $a$, represented as a policy $\pi$. $\pi: s \rightarrow a$
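To make the state-to-action mapping concrete, here is a minimal sketch in Python. The corridor environment, its reward of 1.0 at the goal, and the names `step`, `policy`, and `rollout` are hypothetical illustrations rather than anything from the course materials, and the policy is written by hand instead of being learned.

```python
# Hypothetical 1-D corridor: states 0..4, with the goal at state 4.
ACTIONS = {"left": -1, "right": +1}
GOAL = 4

def step(state, action):
    """Move within the corridor; reward is 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done

# A deterministic tabular policy: one action for each non-terminal state.
policy = {s: "right" for s in range(GOAL)}

def rollout(policy, start=0, max_steps=20):
    """Follow the policy from `start` and record (state, action, reward) tuples."""
    state, total_reward, trajectory = start, 0.0, []
    for _ in range(max_steps):
        action = policy[state]  # the policy maps a state to an action
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:
            break
    return trajectory, total_reward
```

Calling `rollout(policy)` walks the agent from state 0 to the goal. An RL algorithm would have to discover a table like `policy` from reward alone instead of being handed it.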
Deep Learning (DL) is a subfield of ML (and therefore of AI) defined by its methodology (as opposed to its problem scope). DL utilizes multi-layered nonlinear function approximators, most commonly realized as neural networks with two or more hidden layers.
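As a rough illustration of a multi-layered nonlinear function approximator, the sketch below runs a forward pass through a network with two hidden layers using only the Python standard library. The helper names (`dense`, `init_layer`, `mlp_forward`), the `tanh` activation, and the uniform weight initialization are assumptions chosen for brevity. No training is shown.

```python
import math
import random

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: activation(W x + b), one weight row per output."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (an arbitrary choice for this sketch)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def mlp_forward(x, layers):
    """Apply tanh hidden layers followed by a linear output layer."""
    hidden, (w_out, b_out) = layers[:-1], layers[-1]
    for weights, biases in hidden:
        x = dense(x, weights, biases, activation=math.tanh)
    return dense(x, w_out, b_out)

rng = random.Random(0)
# 3 inputs -> 4 hidden units -> 4 hidden units -> 1 output
layers = [init_layer(3, 4, rng), init_layer(4, 4, rng), init_layer(4, 1, rng)]
output = mlp_forward([0.5, -0.2, 0.1], layers)
```

The nonlinearity between layers is what matters here: without it, the stacked layers would collapse into a single linear map.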
Definition
More formally, Reinforcement Learning (RL) refers to the ability of machines to simulate human behavior by learning from data that is simultaneously sequential, evaluative, and sampled.
- Sequential: actions not only determine performance in the immediate future, but also have long-term consequences. The agent must balance the tradeoff between immediate and long-term goals.
- Evaluative: the reward indicates how good an action was, not which action would have been best. The algorithm must therefore balance the gathering and the utilization of information (exploration vs. exploitation, respectively).
- Sampled Feedback: since it isn't possible to gather all possible information (e.g., every state-action pair) for a particular problem, the algorithm must learn to generalize + find patterns as opposed to performing an exhaustive search.
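The three properties above can be sketched with two small Python functions. Everything here is an illustrative assumption rather than part of the course materials: the discount factor `gamma` is one common way to weigh immediate against long-term reward, and the epsilon-greedy multi-armed bandit (with hypothetical Gaussian reward noise) is one simple way to trade off exploration and exploitation while learning only from sampled rewards.

```python
import random

def discounted_return(rewards, gamma=0.9):
    """Sequential: sum of rewards where later rewards are weighted by gamma**t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Evaluative + sampled: estimate each arm's value from noisy reward samples."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: gather information
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # The agent only sees a noisy reward for the arm it chose,
        # never the rewards of the arms it did not choose.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

For example, `discounted_return([0.0, 0.0, 1.0], gamma=0.5)` weighs a reward two steps away by `0.5 ** 2`, and `epsilon_greedy_bandit([0.2, 0.5, 0.8])` returns per-arm value estimates together with how often each arm was tried.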
RL sits at the intersection of many fields, taking inspiration + concepts from many different areas!
Success Stories
Go is a game that was long considered computationally intractable for machines, mostly due to its massive state space. In 2015, David Silver and colleagues (Google DeepMind) released the initial version of AlphaGo, which was able to compete and win against legendary Go players! Subsequent versions of AlphaGo (AlphaGo Master, AlphaGo Zero) achieved even better performance.
Over the last decade, reinforcement learning has been increasingly applied to fields including robotic control, video game agents, and niche practical applications such as datacenter cooling.
(all images obtained from Georgia Tech RL course materials)




