Machine learning
Flesh, Blood, and Ego

Illustration: © Amélie Tourangeau

How a historic Go game is probably the best lesson ever taught to humans by a machine.

Véronique Chagnon

The fifth game of the Go tournament in March 2016 between world champion Lee Sedol and AlphaGo, a program designed by the London-based artificial intelligence company DeepMind, is perhaps the best lesson ever taught to humans by machines. But not for the reasons you might think.
It all starts with a question: how do we teach artificial intelligence to play Go, given that the possible combinations in this millennia-old game are almost endless?
On the surface, Go is very simple: the player who controls the most territory wins the game. Yet Go has a mystical element to it: each game can push the limits of what we know about it. It is therefore impossible to program an algorithm with every winning combination in order to create a formidable artificial Go player. To beat the best players in the world, the software must learn to learn on its own.

Thousands of errors

Machine learning, the first theories of which were developed in the early 1950s, is a research field aimed at creating tools that teach machines to learn. At the heart of the highly complex methods behind this field lies a rather elementary one: trial and error. Put simply, the machine observes which decisions bring it closer to its goal and which decisions set it back. Over thousands of trials and errors, the algorithm “learns” to make the choices that best serve its objective.
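To make the idea concrete, here is a minimal sketch of trial-and-error learning in Python. It is an illustration only, not AlphaGo's actual method: the function name, the three made-up "decisions", and their hidden odds of success are all assumptions chosen for the example. The learner never sees the odds; it only observes whether each attempt succeeded.

```python
import random

def learn_by_trial_and_error(win_chances, trials=5000, explore=0.1, seed=0):
    """Estimate how good each decision is purely from observed outcomes.

    win_chances is hypothetical: the hidden odds that each decision succeeds.
    """
    rng = random.Random(seed)
    wins = [0] * len(win_chances)    # successes observed per decision
    tries = [0] * len(win_chances)   # attempts made per decision

    def estimate(i):
        # Observed success rate; a decision never tried counts as 0.0
        return wins[i] / tries[i] if tries[i] else 0.0

    for _ in range(trials):
        if rng.random() < explore:
            # Occasionally try a decision at random (the "trial")
            choice = rng.randrange(len(win_chances))
        else:
            # Otherwise repeat whatever has worked best so far
            choice = max(range(len(win_chances)), key=estimate)
        tries[choice] += 1
        if rng.random() < win_chances[choice]:
            wins[choice] += 1        # this decision brought the goal closer

    return [estimate(i) for i in range(len(win_chances))]

# Three imaginary decisions with made-up odds of success
estimates = learn_by_trial_and_error([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=estimates.__getitem__)
print("Preferred decision:", best)
```

With odds this far apart, the learner should settle on the most promising decision after enough attempts, even though nobody ever told it which one that was.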
With their clear rules and objective (to win the game), games are an ideal testing ground for machine learning research: the performance and progress of the software are easy to measure.
To beat South Korean Lee Sedol, AlphaGo first analysed tens of thousands of recorded games between strong amateur players. Its programmers also fed the program the rules of the game. This initial input allowed the algorithm to learn the basics of Go and to determine the most frequently used combinations along with their success rates.
AlphaGo then entered its learning phase by testing things out. It played thousands of games against itself, each time slightly changing its strategy, sometimes successfully, sometimes not. After a series of defeats, AlphaGo was able to emerge from the minor leagues to face the best player in the world.

Playing for playing’s sake

The story is well known: Lee Sedol lost all but one of the games. Today, AlphaGo looks almost like an antique next to DeepMind’s new star, AlphaZero. The software, unveiled in 2017, mastered Go, chess, and shogi given only the rules, without studying a single human game. That being said, the fourth and fifth games of the match between AlphaGo and world champion Lee Sedol still have much to teach us about the symbolic power of machine learning.
On the morning of the fourth day of the five-day tournament, as the cameras started recording, the die had already been cast: Lee Sedol had lost the match the day before with his defeat in the third game. He finally seemed more relaxed. Resigned to the idea of being beaten by a machine, he could now play for playing’s sake. Against all odds, he won the game after unsettling AlphaGo with one of the creative moves from his secret repertoire. According to the program’s own estimates, only one human in 10,000 would have attempted this now-famous 78th move, which proved decisive. South Korea celebrated in front of the running cameras. Whatever happened next, mankind’s honor had been saved.
For the fifth and final game, feverish hope spread that Sedol would get the better of the machine a second time. The game began and the tension rose. The unthinkable seemed to be happening; we were witnessing live what looked like the failure of the machine. All analysts agreed: AlphaGo was committing multiple errors and making bizarre moves. Here and there, there was even laughter in the audience.
But then, under the gaze of dumbfounded spectators, a game of Go unfolded unlike any a human had ever played. An expert later said that we may have to add a tenth dan to the Go ranking system of nine dans. After 280 moves, the computer program won, knowing neither shame, rejection, nor ridicule. Lee Sedol, like the audience, was shaken by the capabilities of the software, which seemed to have learned from Sedol’s extraordinary 78th move in the previous game.
No champion made of flesh, blood, and self-awareness would have attempted the incongruous moves that AlphaGo made in this fifth game. Such a player would have been taken for mad in front of hundreds of thousands of spectators. But AlphaGo knows only the principle of trial and error.
