How Reinforcement Learning Helped Us Parent

Jun 9, 2019

Very recently, my wife and I began studying Reinforcement Learning. Reinforcement Learning is an area of Machine Learning where a machine learns to map situations to actions to maximize reward. In other words, with enough trial and error, a machine can learn to respond to its environment well enough to win a game of chess or navigate an autonomous vehicle.

Surprisingly, buried in these abstract concepts, we discovered a new vocabulary that helped us articulate and understand something that had sometimes felt incomprehensible: how our teenager makes decisions!

My son has what’s considered a neuroatypical brain (specifically, ADHD). He’s a brilliant kid who makes choices that don’t always feel rational to us. The reward system in his brain doesn’t necessarily operate the way we expect.

Exploration vs. Exploitation

For a machine to maximize reward, it has to decide whether to try a brand-new action or rely on ones that have yielded some reward before. This is often called the exploration vs. exploitation trade-off, and Sutton and Barto describe it elegantly in Reinforcement Learning: An Introduction.

“The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.”
- Sutton & Barto, Reinforcement Learning: An Introduction

The dilemma here is that an agent (or learner) can’t determine which actions are most valuable until it tries them repeatedly. Only then can it progressively choose better actions.
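To make "tries them repeatedly" a little more concrete, here is a minimal sketch of the sample-average estimate Sutton and Barto use for action values: each time an action is tried, its estimated value moves toward the reward just observed by a step of 1/n. The function name `update_estimate` and the reward numbers are hypothetical, chosen only for illustration.

```python
def update_estimate(q: float, n: int, reward: float) -> float:
    """Incremental sample average: Q_new = Q + (R - Q) / n.

    q: current estimate of the action's value
    n: number of times the action has now been tried (including this try)
    reward: reward observed on this latest try
    """
    return q + (reward - q) / n


# Hypothetical rewards from trying the same action four times.
rewards = [1.0, 0.0, 2.0, 1.0]

q = 0.0
for n, r in enumerate(rewards, start=1):
    q = update_estimate(q, n, r)
    print(f"after {n} tries: estimate = {q:.2f}")
```

After the last try the estimate equals the plain average of every reward seen so far (1.0 here). A single try tells the agent very little; the estimate only settles as the action is repeated.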

Trial and error is obviously a common pattern for humans, but it’s easy to forget that we don’t always learn from our choices the first, second, or even third time. For better or worse, my son (like most of us) was destined to repeat certain patterns and behaviors before ultimately discovering the ones that would bring him success.

Greedy Actions

Even when humans create habits that have a positive outcome, we often frivolously abandon those habits. Somehow, even when my son completed his assignments a few days in a row and saw his grades improve, he still chose to skip some assignments later in the week. Why? Probably because his brain convinced him that — at that very moment — watching The Office on Netflix was more valuable than finishing his homework on time.

In Reinforcement Learning, this behavior is known as taking a Greedy Action. The crux of RL is that an agent must learn to estimate the value of each action in any given state while keeping the long-term cumulative reward in view. Selecting an action greedily means that the agent exploits its current (and possibly limited) knowledge of the environment and picks the action that looks most valuable right now. If that estimate is skewed toward short-term payoff, the agent misses out: an action with a lower immediate value might have uncovered a greater cumulative reward. Seems obvious, right? For a machine (and for my son), it isn’t always.
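A small worked example may help separate "best right now" from "best in the long run." The sketch below scores two hypothetical sequences of rewards by their discounted return, G = r0 + γ·r1 + γ²·r2 + ..., a standard way RL weighs later rewards against immediate ones. The option names, the reward numbers, and the discount factor γ = 0.9 are invented for illustration; they are not a model of any real behavior.

```python
def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Sum of rewards, each scaled by gamma**t for the step t at which it arrives."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical reward sequences over four evenings.
# "watch_tv": a big reward now, then the cost of missed homework shows up later.
# "homework": a small cost now, then better grades pay off later.
options = {
    "watch_tv": [5.0, -2.0, -2.0, -2.0],
    "homework": [-1.0, 2.0, 2.0, 2.0],
}

# A short-sighted choice looks only at the first (immediate) reward.
myopic = max(options, key=lambda name: options[name][0])

# A long-sighted choice compares the discounted return of the whole sequence.
farsighted = max(options, key=lambda name: discounted_return(options[name]))

for name, rewards in options.items():
    print(f"{name}: immediate = {rewards[0]:+.1f}, return = {discounted_return(rewards):+.3f}")
print("short-sighted pick:", myopic)
print("long-sighted pick:", farsighted)
```

With these made-up numbers, "watch_tv" wins on immediate reward (+5.0 vs. -1.0) but "homework" wins on discounted return (about +3.878 vs. +0.122). An agent whose value estimates only reflect the first step would greedily pick the option that is worse overall.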

Where is epsilon when we need it?

In the end, everything an agent has learned about how to act is captured in a policy: a mapping from the states of an environment to the actions to take in them. For example, the epsilon-greedy policy introduces a parameter (epsilon): with probability epsilon the agent picks a random action instead of the greedy one, so increasing epsilon makes it behave greedily less often. In other words, we can bias the agent so that it explores more often. A good balance between exploration and exploitation gives the agent a better understanding of its environment and, ultimately, leads to better decisions.
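Here is a minimal epsilon-greedy sketch on a hypothetical three-armed bandit, the toy problem Sutton and Barto use to introduce the idea. With probability epsilon the agent pulls a random arm (explore); otherwise it pulls the arm with the highest current estimate (exploit). The arm payoffs, the Gaussian noise, the step count, the seed, and the function names are all assumptions made for illustration.

```python
import random


def epsilon_greedy(estimates: list[float], epsilon: float, rng: random.Random) -> int:
    """Return an arm index: random with probability epsilon, else the best-looking arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: any arm, uniformly at random
    best = max(estimates)
    # exploit: break ties randomly among arms that share the top estimate
    return rng.choice([i for i, q in enumerate(estimates) if q == best])


def run_bandit(true_means: list[float], epsilon: float, steps: int = 1000, seed: int = 0):
    """Play a noisy multi-armed bandit; return (estimates, pull counts, total reward)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff around the arm's true mean
        counts[arm] += 1
        # sample-average update of the chosen arm's estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


if __name__ == "__main__":
    # Hypothetical arms: the third is best, but the agent does not know that.
    means = [0.2, 0.5, 1.0]
    for eps in (0.0, 0.1):
        est, counts, total = run_bandit(means, eps)
        print(f"epsilon={eps}: pulls per arm = {counts}, total reward = {total:.1f}")
```

The exact counts depend on the random seed, but with epsilon = 0 the agent can lock onto whichever arm happened to pay off first, while a small epsilon keeps it occasionally sampling the others so their estimates can improve.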

The question is, where can I adjust the epsilon value for my son? The answer is that we can’t. I won’t go into my philosophy on parenting, but let’s just say that we’ll wait for as many iterations as it takes for him to arrive at his optimal policy (preferably before he’s 25), and we’ll support him the entire way.

Hidden inside a theoretical conversation about Reinforcement Learning, we discovered better words and a clearer understanding of how our son thinks.

References

Sutton RS, Barto AG. 2012. Reinforcement Learning: An Introduction. MIT Press. https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

Dodson W, ADDitude’s ADHD Medical Review Panel. 2013 Apr 19. Secrets of your ADHD brain. ADDitude. https://www.additudemag.com/secrets-of-the-adhd-brain/

Written by Carlos Rodriguez (he/him)