An illustrated look at DeepMind's superhuman-level StarCraft II AI "AlphaStar"



On January 24, UK local time, DeepMind hosted a live stream from London, presenting the latest progress of its AI research and development to game-AI researchers and enthusiasts around the world.

Taking part in the live stream were DeepMind researchers Oriol Vinyals and David Silver, joint leads of the project's research and development. The latter was also the main developer of the AlphaGo project and should already be familiar to most readers.

▲ Oriol Vinyals (left) and David Silver.

DeepMind named its StarCraft II AI "AlphaStar", following the same naming pattern as its Go AI "AlphaGo" and its protein-structure-prediction AI "AlphaFold".

According to DeepMind, AlphaStar plays Protoss. On December 10, 2018 it defeated Team Liquid's professional StarCraft II player TLO with a 5:0 record, and after further training it beat MaNa, a professional player from the same team, 5:0 again on December 19. Several of those games were replayed and commentated during the live stream.

AlphaStar displayed the macro strategies of professional players as well as micro-management beyond the professional level, and could even fight battles in several places on the map at once (a situation in which human players suffer greatly).

  • AlphaStar laid out its buildings in the style of professional players and quickly sent a worker out to scout the map and the opponent's base.
  • AlphaStar built a large number of workers, quickly gaining a resource advantage (going beyond the 16- or 18-worker cap that human professionals usually observe).
  • AlphaStar's Stalkers besieged MaNa's Immortals from three directions.
  • AlphaStar micro-managed two Stalkers at critically low health ("black blood") so that both escaped.

During the live stream, DeepMind also had AlphaStar play a live game against MaNa. This match used a newly retrained version of AlphaStar that has to control the camera view itself (unlike the earlier version, which could directly read everything visible on the map). This time MaNa won.

"Hindu Hegemony"from

AI from

background

After AlphaGo conquered Go, a complete-information game, and surpassed the level of top human players, researchers immediately mounted a more aggressive attack on incomplete-information games. A typical incomplete-information game is Texas Hold'em, where players must make decisions without seeing their opponents' hole cards; CMU's Texas Hold'em AI paper went on to win the NIPS 2017 Best Paper Award.

On the other hand, deep reinforcement learning researchers also hoped to use its strength to explore more complex games. Texas Hold'em is clearly not difficult enough (incidentally, its developers said that AI used no deep learning at all). Consider Go: although the number of possible positions in Go is astronomical, on each turn a player only has to choose one point on the board to place a stone. By contrast, the action space of today's competitive video games is far more complex: a game can involve more than 2 players acting simultaneously, each action can have a different duration, movement and positioning are spatially continuous, and there are many more variables, such as attacks.

Since the game fanatics of years past are now researchers in computer science, e-sports AI research quickly split into two major camps: "StarCraft / StarCraft II" and "DOTA 2". Both games have a broad player base, plenty of players who enjoy playing against game AIs, and many high-level professionals from whom the AIs can learn.

Although both are RTS (real-time strategy) games, and both require finding a balance between gathering resources and fighting, StarCraft and DOTA 2 differ in many ways. A StarCraft player controls units of many different categories, each with its own movement and attack characteristics, whereas a DOTA 2 player controls the same single hero from start to finish; StarCraft has one player per side, while DOTA 2 has five players per side. The resulting differences in strategy and play have sent StarCraft AI and DOTA 2 AI research down different development paths.

Judging from recent competitions, the strongest StarCraft AI and DOTA 2 AI research has come from Samsung and OpenAI:

  • At the 2018 AIIDE StarCraft AI Challenge, 27 teams worldwide entered AIs in the competition; the champion bot, "SAIDA", came from Samsung. The bot's main trait is a robust play strategy: consider defense first, then seize an opportunity to take out the opponent in the mid-game. This strategy was taught to it by South Korean professional StarCraft players. Even so, the bot is still unable to beat professional human players.

  • "Star Maritime Hegemony" AI from

    Widely used fixed strategies and manual rules, Samsung from

    bot from

    Apply some machine training techniques to help control the device, explore the map, and the developer team is also trying to implement machine learning techniques. Participate in the same competition from

    Facebook from

    Zerg from

    botfrom

    "from

    CherryPifrom

    A large number of machine training techniques were applied, but only the second place was obtained.

  • In August 2018, OpenAI held an offline event to test its own DOTA 2 AI system, "OpenAI Five". Under fairly restrictive match rules it defeated a team of former European and American professional players, but later, at the DOTA 2 international tournament TI8, it lost to teams of Chinese (former) professional players. OpenAI has since kept improving the system and claims that later versions significantly surpass the earlier ones.

  • "from

    OpenAI Fivefrom

    Is a well-developed deep reinforcement system from

    5 from

    Separate neural networks are controlled separately from

    5 from

    A hero. Researchers use many techniques to initiate AI training from

    DOTA 2 from

    Various behaviors, as well as hyperparametric design, help the online training team work together, but there is no direct communication between the AI ​​during the race.

A technical introduction to AlphaStar

DeepMind had given advance notice of its StarCraft II AI research plans. Coming from an artificial intelligence company famous for deep reinforcement learning, the StarCraft II AI "AlphaStar" we now see from DeepMind is, naturally, a system based on deep reinforcement learning.

▲ The December 2018 matches, with Oriol Vinyals and David Silver watching from the observation room (can you tell who is sitting in the middle?)

AlphaStar's model design

AlphaStar is a reinforcement-learning AI that treats the game as a long-sequence modeling task, and its model is designed accordingly as a long-sequence model. The model receives data from the game's raw interface as a list of units and their attributes, and after the neural network's computation it outputs the instructions to execute in the game. The core of this neural network is a transformer torso combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralized value baseline. This architecture embodies DeepMind's latest thinking on modeling complex sequences, and they believe the same kind of model could also deliver excellent performance on other machine learning tasks that require modeling very long sequences and have large action spaces, such as machine translation, language modeling, and visual representation.
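To make that description concrete, here is a minimal sketch in PyTorch of that kind of pipeline: a transformer torso that attends over the list of units, a recurrent LSTM core, and separate policy and value heads. Every layer size and name here is an illustrative assumption, not DeepMind's actual implementation (the real model also selects action arguments auto-regressively with a pointer network).

```python
# A minimal sketch of the described pipeline; all shapes and sizes are
# invented for illustration and are NOT DeepMind's actual architecture.
import torch
import torch.nn as nn

class PolicySketch(nn.Module):
    def __init__(self, unit_dim=64, hidden=128, n_actions=100):
        super().__init__()
        # Transformer torso: attends over the variable-length unit list.
        layer = nn.TransformerEncoderLayer(d_model=unit_dim, nhead=4,
                                           batch_first=True)
        self.torso = nn.TransformerEncoder(layer, num_layers=2)
        # Deep LSTM core: carries memory from one game step to the next.
        self.core = nn.LSTM(unit_dim, hidden, num_layers=2, batch_first=True)
        # Policy head: picks an action type (argument selection omitted).
        self.action_type = nn.Linear(hidden, n_actions)
        # Value head: the centralized baseline for the actor-critic update.
        self.value = nn.Linear(hidden, 1)

    def forward(self, units, state=None):
        # units: (batch, n_units, unit_dim) -- attributes of observed units
        x = self.torso(units)                  # contextualize the unit list
        pooled = x.mean(dim=1, keepdim=True)   # summarize into one step input
        out, state = self.core(pooled, state)  # advance the recurrent core
        h = out[:, -1]
        return self.action_type(h), self.value(h), state

# Usage: one decision step over a batch of 2 games, 20 units observed each.
model = PolicySketch()
logits, value, state = model(torch.randn(2, 20, 64))
```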


AlphaStar's training strategy

AlphaStar's initial training strategy is the same as AlphaGo's early stage: DeepMind's researchers first supervised-trained the model on replays of human games, following the idea of imitation learning, which lets the model quickly pick up the basic strategies and micro-operations used by high-level players on the StarCraft ladder. At this point, AlphaStar could already defeat StarCraft II's built-in "Elite"-level AI with a 95% win rate. (By contrast, OpenAI's DOTA 2 AI was trained with reinforcement learning entirely from scratch, and its early phase spent a long time on meaningless play.)
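As a sketch of what that supervised phase could look like (our illustration; the replay data layout is an assumption), each update simply pushes the policy toward the action the human took at that point in the replay, reusing the PolicySketch model from above:

```python
# A minimal sketch of the imitation phase: cross-entropy against the human's
# recorded action at each replay step. The data layout is assumed.
import torch
import torch.nn.functional as F

def imitation_step(model, optimizer, units, human_action, state=None):
    """One supervised update on a batch of replay steps."""
    logits, _value, state = model(units, state)
    loss = F.cross_entropy(logits, human_action)  # match the human's choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Detach recurrent state so the next step starts a fresh autograd graph.
    state = tuple(s.detach() for s in state)
    return loss.item(), state
```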

The next step, naturally, is to keep raising its level through reinforcement learning and self-play, and here the training strategy differs from AlphaGo's. As previously introduced, the games in AlphaGo's self-improvement phase are generated by the best-performing version among all prior updates: at any moment there is a single "best version", and the process keeps searching for a new version that beats its predecessor. For StarCraft, however, DeepMind's researchers believe that different excellent strategies complement one another, and that no single strategy can dominate all others. So this time they updated and kept many different versions of the network at once (known collectively as the AlphaStar League).

As shown above, after AlphaStar's initial training on human data, it goes through several successive rounds of AlphaStar League self-play. Each round branches off from several of the stronger versions of previous rounds: the pre-branch versions keep their parameters fixed and continue to take part as opponents in later rounds of self-play, while the different branches can be assigned different combat strategies and learning objectives. This keeps improving the networks' level and raising the difficulty of the matches while preserving enough diversity. The parameters of each network are then updated according to its results in each round of self-play. The approach borrows from the idea of population-based training; it delivers continuous, steady improvement while ensuring that a new version does not "forget" how to beat a much earlier version. A toy version of the loop is sketched below.
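In the following toy league loop, the game runner, the agents, and their update rule are all placeholders we made up for illustration:

```python
# A toy AlphaStar-League-style loop: snapshots of each learner are frozen
# into an opponent pool, so new versions must keep beating old strategies.
import copy
import random

def play_game(agent, opponent):
    """Placeholder game runner; returns +1 for a win, -1 for a loss."""
    return random.choice([1, -1])

def train_league(make_agent, n_agents=4, rounds=10, games_per_round=100):
    learners = [make_agent() for _ in range(n_agents)]
    frozen_pool = []  # past versions keep their parameters fixed
    for _ in range(rounds):
        # Branch: freeze a snapshot of each learner; it stays in the league.
        frozen_pool.extend(copy.deepcopy(a) for a in learners)
        for agent in learners:
            for _ in range(games_per_round):
                rivals = frozen_pool + [a for a in learners if a is not agent]
                result = play_game(agent, random.choice(rivals))
                agent.update(result)  # RL update on the learner only
    return learners, frozen_pool
```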

The population self-play process produces many strategies. Some only slightly refine earlier strategies, while others discover brand-new build orders, unit compositions, and micro-operations, and strategies that counter earlier ones keep appearing. For example, in the AlphaStar League's early self-play, fast "rush" strategies had a high win rate, but as learning continued, other strategies began to win more, such as using extra workers to expand the base quickly and secure an economic advantage in resources, or sending a few units to harass the opponent's base and gain an edge in development speed. This evolution of strategies is very similar to how human players have explored the game over the years. As shown in the figure below, as total training time grows, the average number of units the AIs use also increases.

After many rounds of self-play, the researchers take the Nash distribution over the AlphaStar League, sampling a version of the AI from it as the final post-training AI. This gives the best combined solution over all of the strategies that have been discovered. A worked toy example follows.
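That selection step can be illustrated with a small zero-sum computation: given the league's pairwise payoff matrix, solve the maximin linear program for the Nash mixed strategy, then sample the final agent from it. The 3x3 payoff matrix below is invented demo data, not AlphaStar's:

```python
# A worked toy example of picking the final agent from a Nash mixture over
# the league's payoff matrix (demo numbers, not AlphaStar's real data).
import numpy as np
from scipy.optimize import linprog

def nash_mixture(payoff):
    """Maximin mixed strategy for the row player of a zero-sum game."""
    n_rows, n_cols = payoff.shape
    # Variables: n_rows strategy weights plus the game value v; maximize v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                                  # linprog minimizes, so -v
    # For every opponent column j: v - sum_i x_i * payoff[i, j] <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    A_eq = np.zeros((1, n_rows + 1))
    A_eq[0, :n_rows] = 1.0                        # weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, None)] * n_rows + [(None, None)])
    return np.clip(res.x[:n_rows], 0.0, None)

# payoff[i, j]: agent i's win rate against agent j, minus 0.5.
payoff = np.array([[ 0.0,  0.2, -0.1],
                   [-0.2,  0.0,  0.3],
                   [ 0.1, -0.3,  0.0]])
weights = nash_mixture(payoff)
final_agent = np.random.choice(len(weights), p=weights / weights.sum())
```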

According to DeepMind, the versions of AlphaStar that defeated TLO (a Zerg player who played Protoss here, below his best level) and MaNa came from the 9th and 14th days of self-play respectively (as shown below); the players and the match commentators also clearly noticed the changes in AlphaStar between the two series.

Efforts to ensure strategic diversity

In its technical blog, DeepMind mentions that to give the AlphaStar League the greatest possible diversity, they deliberately assign different learning objectives to different AIs (this matches human intuition as well: the diversity produced by simple random perturbation is very limited). Some AIs are given the specific goal of defeating one particular other AI, while others are given different internal motivations, such as building a certain unit, or defeating every AI that uses a particular type of strategy. These objectives are adjusted over the course of training. DeepMind's visualization of the many distinct strategies that ultimately formed is shown below.
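As a sketch of what such per-agent objectives could look like in code, here is a simple reward-shaping function; the objective kinds, field names, and weights are purely illustrative assumptions:

```python
# Illustrative reward shaping for diversified league objectives. All names
# and weights here are invented; they only show the idea of per-agent goals.
def shaped_reward(base_reward, episode_stats, objective):
    if objective["kind"] == "build_unit":
        # Bonus proportional to how many of the target unit type were built.
        built = episode_stats["units_built"].get(objective["unit"], 0)
        return base_reward + 0.01 * built
    if objective["kind"] == "beat_agent":
        # Only games against the designated rival count for this agent.
        if episode_stats["opponent"] == objective["target"]:
            return base_reward
        return 0.0
    return base_reward  # default: plain win/loss reward
```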

In the AlphaStar League's self-play, each AI's network weights are updated by a reinforcement learning algorithm so as to optimize its own learning objective. The weight-update rule comes from a new, sample-efficient off-policy actor-critic algorithm that incorporates experience replay, self-imitation learning, and policy distillation.
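For flavor, here is a simplified single-step form of an importance-weighted off-policy actor-critic loss. It is a generic textbook sketch, not DeepMind's exact algorithm, which further adds the replay, self-imitation, and distillation pieces named above:

```python
# A generic off-policy actor-critic loss with a truncated importance ratio.
# Single step, no multi-step corrections; for illustration only.
import torch

def off_policy_ac_loss(logits, value, action, reward, behavior_logp):
    logp = torch.log_softmax(logits, dim=-1)
    logp = logp.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    rho = torch.exp(logp - behavior_logp).clamp(max=1.0)  # truncated IS ratio
    advantage = reward - value.squeeze(-1)                # vs. the baseline
    policy_loss = -(rho.detach() * advantage.detach() * logp).mean()
    value_loss = advantage.pow(2).mean()
    return policy_loss + 0.5 * value_loss
```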

AlphaStar's compute requirements

To keep this large number of different AlphaStar versions battling and updating, DeepMind built a scalable distributed learning environment on the latest Google TPUv3. The environment supports a whole population of AlphaStar instances running concurrently, along with many thousands of instances of StarCraft II itself running in parallel. The AlphaStar League's self-play process took 14 days, with each AlphaStar AI using 16 TPUs; that is roughly equivalent to each AI playing 200 years' worth of games. The trained model can then run on a single consumer-grade GPU.
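As a rough sanity check on those figures (our arithmetic, not DeepMind's), 200 years of play gathered in 14 days implies each agent collected experience thousands of times faster than real time, which fits the thousands of parallel game instances:

```python
# Back-of-the-envelope check: "200 years of games in 14 days" implies a
# parallelism/speedup factor in the thousands.
years_of_play, wall_clock_days = 200, 14
speedup = years_of_play * 365 / wall_clock_days
print(f"~{speedup:,.0f}x real time")  # ~5,214x
```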

AlphaStar's in-game performance

Because AlphaStar was first trained by imitating human data, and because the neural network incurs some computation delay, its rate of operations is actually lower than a human's. MaNa's APM (actions per minute) averaged 390, while AlphaStar averaged only about 280. AlphaStar's average computation delay, from observation to action, is 350 milliseconds. By contrast, traditional StarCraft AIs built on fixed strategies and hand-written rules sustain thousands of APM.

DeepMind also released a synchronized visualization of AlphaStar's in-game perspective and internal state from a game against MaNa, shown below: the raw observations fed into the neural network (the blue dots at the lower left), the activations inside the neural network (the small panels at the bottom center), the map areas the AI is considering clicking on or building in (at the lower right of the map view, which can also be read as the AI's area of attention), the AI's output actions (in the lower right corner), and its predicted win rate. The figure also shows MaNa's in-game view in sync; AlphaStar cannot see its opponent's view.

As mentioned at the beginning of this article, in the two 5:0 series against TLO and MaNa, AlphaStar could directly read all of the visible content on the map without having to control a camera view. Human players, by contrast, obviously have to switch the view manually to different parts of the map to see some of that information. From this angle, AlphaStar could be suspected of an unfair advantage. DeepMind analyzed this as well: by their statistics, AlphaStar switches its area of attention about 30 times per minute, a figure comparable to human professionals.

Of course, the best way to settle this is experimental evidence. So DeepMind retrained a version of AlphaStar that must control the camera view itself, and that is the version MaNa defeated in the live match (though that version had only trained for 7 days, not the original 14). This version of AlphaStar can only read the information inside the current view, and its output instructions are limited in the same way. The chart DeepMind provided shows that this does cost some performance (though the camera-restricted version was still catching up). DeepMind considers the performance drop very slight, which in turn suggests that AlphaStar's strength mainly comes from having learned effective game strategies and powerful micro-operations.

DeepMind's outlook

Although this model was built for a StarCraft II AI, DeepMind regards the game as one complex, representative task among many, and the techniques used to solve it can be applied to other complex problems. For example, the network architecture designed for long-sequence modeling could be used to model other long sequences with incomplete information, such as weather prediction, climate modeling, and language understanding. They will also continue to develop the AlphaStar project and make use of the technical gains from it to improve performance.

On the other hand, DeepMind also sees this training approach as a new path toward safe and robust AI. One of the big problems with current AI systems is that it is hard to predict all the different ways a system can fail; the professional players who beat StarCraft II AIs often rely on finding and attacking the AI's weaknesses and mistakes. The population-based training strategy proposed with AlphaStar is a more reliable training process with a markedly lower probability of such errors. DeepMind's researchers believe the method has plenty of research potential and may one day become an important component of AI safety. Ultimately, DeepMind hopes to use this kind of work to build truly intelligent systems that help humanity solve the most important and fundamental scientific problems.

For more complete and detailed technical information, DeepMind is writing a paper and plans to submit the manuscript to a peer-reviewed journal. We look forward to the formal publication.

(This article is reprinted with permission from Leiphone; first image source: video screenshot)

