29.3 Reinforcement Learning

Multi-armed bandit (https://en.wikipedia.org/wiki/Multi-armed_bandit)
Q-Learning
Deep Q-Network (DQN)
A3C (Asynchronous Advantage Actor-Critic)
Genetic Algorithm
SARSA (State-Action-Reward-State-Action)
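
As a rough illustration of how two of the methods listed above differ, here is a minimal sketch of the tabular update rules for Q-Learning and SARSA. The function names, the list-of-lists Q-table, and the default values for `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for this example, not part of the original list.

```python
from typing import List

QTable = List[List[float]]  # q[state][action] -> estimated return (illustrative layout)


def q_learning_update(q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Off-policy update: bootstrap from the best action value in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 alpha: float = 0.5, gamma: float = 0.9) -> None:
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

The only difference is the bootstrap term: Q-Learning uses the maximum value over next actions, while SARSA uses the value of the next action the agent actually selects, which is why SARSA needs `a_next` as an extra argument.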