Reinforcement Learning algorithms train an agent from the reward signal it obtains. A naïve agent therefore relies on exploration to collect its first rewards. Random actions can prove efficient in some settings, as illustrated in Maze DQN. But in environments where rewards are sparse, this exploration can become overly long and hazardous. This is where curiosity comes to the rescue!
How can we make the most of curiosity to explore reward-sparse environments?
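As a hedged illustration, here is a minimal sketch of one common curiosity signal: a prediction-error bonus in the spirit of the Intrinsic Curiosity Module. The network sizes, the `scale` factor, and the `curiosity_bonus` helper are illustrative assumptions, not the article's exact method.

```python
# A minimal sketch of a prediction-error curiosity bonus (ICM-style).
# Assumption: states are flat float vectors, actions are one-hot float vectors.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state from (state, action); its error is the bonus."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def curiosity_bonus(model, state, action, next_state, scale=0.1):
    """Intrinsic reward: scaled squared prediction error of the forward model."""
    with torch.no_grad():
        pred = model(state, action)
    return scale * (pred - next_state).pow(2).mean().item()

# The agent then maximizes: total_reward = extrinsic_reward + curiosity_bonus(...)
```

The intuition: states the forward model predicts poorly are novel, so they pay a larger bonus, pulling the agent toward the unexplored parts of the environment even when extrinsic rewards are absent.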
Deep Q Learning gets out of the Maze
Reinforcement learning techniques show great potential for game-based AI but fail to scale to real-world applications. There, state and action spaces become continuous, which rules out any tabular learning.
Therefore, reinforcement learning agents need their decisions piloted by a learned function, falling back on deep learning.
How do reinforcement learning and deep learning team up to give birth to an efficient agent?
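To make the pairing concrete, here is a minimal sketch of the core DQN idea: a neural network stands in for the Q-table, mapping a continuous state to one Q-value per action. The layer sizes, `epsilon`, and the `select_action` helper are hypothetical choices, not a reference implementation.

```python
# A minimal sketch of replacing a Q-table with a neural network (DQN-style).
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a continuous state vector to one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_net, state, n_actions, epsilon=0.1):
    """Epsilon-greedy: explore with probability epsilon, else pick argmax Q."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()
```

Because the network generalizes across nearby states, it can cover continuous spaces where a lookup table would need infinitely many entries.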
Model-based Reinforcement Learning masters Board Games
DeepMind’s AlphaGo amazed the whole world by beating Go champion Lee Sedol 4-1 in 2016. This complex algorithm learned to play Go by crunching thousands of Go grandmaster games. AlphaZero, DeepMind’s latest breakthrough in the field, is far more impressive: it easily outperformed AlphaGo by learning solely on its own.
How was that made possible and what mathematical mechanisms lie underneath?
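One of those mechanisms is the PUCT rule that AlphaZero-style Monte Carlo Tree Search uses to pick which move to explore next, trading a learned prior off against visit statistics. The sketch below is an assumed minimal form; the `Node` fields and the `c_puct` constant are illustrative, not DeepMind's exact code.

```python
# A minimal sketch of PUCT selection in AlphaZero-style MCTS.
# Assumption: `children` maps each legal action to a Node with these fields.
import math
from dataclasses import dataclass

@dataclass
class Node:
    prior: float                 # P(s, a) from the policy network
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # cumulative simulation value

    def q_value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(children, c_puct=1.5):
    """Pick the action maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N) / (1 + n)."""
    total_visits = sum(child.visit_count for child in children.values())
    def puct(item):
        _, child = item
        exploration = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q_value() + exploration
    return max(children.items(), key=puct)[0]
```

Rarely visited moves with a high prior get a large exploration term, while well-tried moves are judged mostly on their average value, which is how self-play search keeps improving on the policy network's first guess.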