DeepMind's Newest AI Programs Itself to Make All the Right Decisions

When Deep Blue defeated world chess champion Garry Kasparov in 1997, it may have seemed that artificial intelligence had finally arrived. A computer had just taken down one of the top chess players of all time. But it wasn't to be.

Though Deep Blue was meticulously programmed top-to-bottom to play chess, the approach was too labor-intensive, too dependent on clear rules and bounded possibilities to succeed at more complex games, let alone in the real world. The next revolution would take a decade and a half, when vastly more computing power and data revived machine learning, an old idea in artificial intelligence just waiting for the world to catch up.

Today, machine learning dominates, mostly by way of a family of algorithms called deep learning, while symbolic AI, the dominant approach in Deep Blue's day, has faded into the background.

Key to deep learning's success is the fact that the algorithms basically write themselves. Given some high-level programming and a dataset, they learn from experience. No engineer anticipates every possibility in code. The algorithms just figure it out.

Now, Alphabet's DeepMind is taking this automation further by developing deep learning algorithms that can handle programming tasks which have been, to date, the sole domain of the world's top computer scientists (and take them years to write).

In a paper recently published on the pre-print server arXiv, a database for research papers that haven't been peer reviewed yet, the DeepMind team described a new deep reinforcement learning algorithm that was able to discover its own value function, a critical programming rule in deep reinforcement learning, from scratch.

Surprisingly, the algorithm was also effective beyond the simple environments it trained in, going on to play Atari games (a different, more complicated task) at a level that was, at times, competitive with human-designed algorithms, and achieving superhuman levels of play in 14 games.

DeepMind says the approach could accelerate the development of reinforcement learning algorithms and even lead to a shift in focus, where instead of spending years writing the algorithms themselves, researchers work to perfect the environments in which they train.

First, a little background.

The three main deep learning approaches are supervised, unsupervised, and reinforcement learning.

The first two consume huge amounts of data (like images or articles), look for patterns in the data, and use those patterns to inform actions (like identifying an image of a cat). To us, this is a pretty alien way to learn about the world. Not only would it be mind-numbingly dull to review millions of cat images, it'd take us years or more to do what these programs do in hours or days. And of course, we can learn what a cat looks like from just a few examples. So why bother?

While supervised and unsupervised deep learning emphasize the machine in machine learning, reinforcement learning is a bit more biological. It actually is the way we learn. Confronted with several possible actions, we predict which will be most rewarding based on experience, weighing the pleasure of eating a chocolate chip cookie against avoiding a cavity and a trip to the dentist.

In deep reinforcement learning, algorithms go through a similar process as they take action. In the Atari game Breakout, for instance, a player guides a paddle to bounce a ball at a ceiling of bricks, trying to break as many as possible. When playing Breakout, should an algorithm move the paddle left or right? To decide, it runs a projection (this is the value function) of which direction will maximize the total points, or rewards, it can earn.

Move by move, game by game, an algorithm combines experience and value function to learn which actions bring greater rewards and improves its play, until eventually, it becomes an uncanny Breakout player.
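
To make that concrete, here is a minimal sketch of the idea, not DeepMind's code: a value function as a table of estimated future rewards that both steers the choice between moving left or right and gets nudged after every move. Real deep reinforcement learning replaces the table with a neural network, and every name and number below is purely illustrative.

```python
import random
from collections import defaultdict

# Toy value function: Q[state][action] estimates the total future reward
# of taking `action` in `state`. The hyperparameters are illustrative.
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration
ACTIONS = ["left", "right"]

Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    # Mostly pick the action the value function predicts is best,
    # but explore occasionally.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state):
    # Nudge the estimate toward the observed reward plus the predicted
    # value of the best next action (the classic Q-learning update).
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

Loop that long enough against a game and the estimates converge toward which move actually pays off, which is the learning process described above.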

So, a key to deep reinforcement learning is developing a good value function. And that's difficult. According to the DeepMind team, it takes years of manual research to write the rules guiding algorithmic actions, which is why automating the process is so alluring. Their new Learned Policy Gradient (LPG) algorithm makes solid progress in that direction.

LPG trained in a number of toy environments. Most of these were gridworlds: literally, two-dimensional grids with objects in some squares. The AI moves square to square and earns points or punishments as it encounters objects. The grids vary in size, and the distribution of objects is either set or random. The training environments offer opportunities to learn fundamental lessons for reinforcement learning algorithms.
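
For a rough sense of what such an environment looks like, here is a hypothetical miniature gridworld, not one of the paper's actual training environments: an agent moves around a small grid and collects rewards or penalties from scattered objects.

```python
import random

class ToyGridworld:
    """A hypothetical miniature gridworld: the agent moves on a small grid
    and receives a reward or penalty when it lands on an object."""

    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=5, n_objects=3):
        self.size = size
        self.agent = (0, 0)
        # Scatter a few objects, each worth +1 or -1 points, at random.
        cells = [(r, c) for r in range(size) for c in range(size) if (r, c) != self.agent]
        self.objects = {cell: random.choice([-1, 1]) for cell in random.sample(cells, n_objects)}

    def step(self, move):
        # Move the agent, clamped to the grid, and collect any object there.
        dr, dc = self.MOVES[move]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        return self.objects.pop(self.agent, 0)
```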

Only in LPG's case, it had no value function to guide that learning.

Instead, LPG has what DeepMind calls a meta-learner. You might think of this as an algorithm within an algorithm that, by interacting with its environment, discovers both what to predict, thereby forming its version of a value function, and how to learn from it, applying its newly discovered value function to each decision it makes in the future.
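
The sketch below, again illustrative rather than DeepMind's method, only captures that nested structure: an inner loop in which an agent learns using whatever rule the meta-level currently supplies, and an outer loop that adjusts the meta-level according to how well those agents end up doing. Here the "meta-learner" is just a single tuned number, whereas LPG learns the whole update rule with a neural network.

```python
import random

def run_inner_agent(reward_probs, alpha, episodes=200):
    """Inner loop: a tiny bandit agent learns action values using the
    update rule set by the meta-parameter, then reports its total reward."""
    values = [0.0] * len(reward_probs)
    total = 0.0
    for _ in range(episodes):
        if random.random() < 0.1:                      # explore occasionally
            action = random.randrange(len(values))
        else:                                          # otherwise act greedily
            action = max(range(len(values)), key=lambda a: values[a])
        reward = 1.0 if random.random() < reward_probs[action] else 0.0
        values[action] += alpha * (reward - values[action])
        total += reward
    return total

def meta_train(iterations=50):
    """Outer loop: propose a tweak to the meta-parameter and keep it when
    the agents it produces earn more reward across fresh environments."""
    alpha, best_score = 0.5, float("-inf")
    for _ in range(iterations):
        candidate = min(max(alpha + random.uniform(-0.1, 0.1), 0.01), 1.0)
        envs = [[random.random() for _ in range(3)] for _ in range(5)]
        score = sum(run_inner_agent(env, candidate) for env in envs)
        if score > best_score:
            alpha, best_score = candidate, score
    return alpha
```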

LPG builds on prior work in the area.

Recently, researchers at the Dalle Molle Institute for Artificial Intelligence Research (IDSIA) showed that their MetaGenRL algorithm used meta-learning to learn an algorithm that generalizes beyond its training environments. DeepMind says LPG takes this a step further by discovering its own value function from scratch and generalizing to more complex environments.

The latter is particularly impressive because Atari games are so different from the simple worlds LPG trained in; that is, it had never seen anything like an Atari game.

LPG is still behind advanced human-designed algorithms, the researchers said. But it outperformed a human-designed benchmark in its training environments and even in some Atari games, which suggests it isn't strictly worse, just that it specializes in some environments.

This is where there's room for improvement and more research.

The more environments LPG saw during training, the better it was able to generalize. Intriguingly, the researchers speculate that with enough well-designed training environments, the approach might yield a general-purpose reinforcement learning algorithm.

At the least, though, they say further automation of algorithm discovery (that is, algorithms learning to learn) will accelerate the field. In the near term, it can help researchers more quickly develop hand-designed algorithms. Further out, as self-discovered algorithms like LPG improve, engineers may shift from manually developing the algorithms themselves to building the environments where they learn.

Deep learning long ago left Deep Blue in the dust at games. Perhaps algorithms learning to learn will be a winning strategy in the real world too.

Update (6/27/20): Clarified description of preceding meta-learning research to include prior generalization of meta-learning in RL algorithms (MetaGenRL).

Image credit: Mike Szczepanski / Unsplash
