楼主 (original poster): 周礼键

[幽默搞笑 / Humor] AI learned to use tools after nearly 500 million games of hide and seek

AI learned to use tools after nearly 500 million games of hide and seek

OpenAI’s agents evolved to exhibit complex behaviors, suggesting a promising approach for developing more sophisticated artificial intelligence.

In the early days of life on Earth, biological organisms were exceedingly simple. They were microscopic unicellular creatures with little to no ability to coordinate. Yet billions of years of evolution through competition and natural selection led to the complex life forms we have today—as well as complex human intelligence.

Researchers at OpenAI, the San Francisco–based for-profit AI research lab, are now testing a hypothesis: if you could mimic that kind of competition in a virtual world, would it also give rise to much more sophisticated artificial intelligence?

The experiment builds on two existing ideas in the field: multi-agent learning, the idea of placing multiple algorithms in competition or coordination to provoke emergent behaviors, and reinforcement learning, the specific machine-learning technique that learns to achieve a goal through trial and error. (DeepMind popularized the latter with its breakthrough program AlphaGo, which beat the best human player in the ancient Chinese board game Go.)
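The trial-and-error loop at the heart of reinforcement learning can be illustrated with a toy example. This is a generic tabular Q-learning sketch for intuition only, not OpenAI's actual large-scale training setup: an agent on a five-cell line discovers, purely from reward feedback, that stepping right reaches the goal.

```python
import random

# Toy reinforcement-learning illustration (not OpenAI's method):
# an agent on a 5-cell line learns by trial and error to walk right,
# because the only reward sits at the rightmost cell.

N_STATES = 5          # cells 0..4; reward at cell 4
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # explore occasionally, otherwise exploit current value estimates
        if random.random() < 0.2:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q[(s, x)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: nudge estimate toward reward + discounted future value
        best_next = max(q[(s2, x)] for x in ACTIONS)
        q[(s, a)] += 0.5 * (r + 0.9 * best_next - q[(s, a)])
        s = s2

# After training, the greedy policy steps right (+1) from every cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told where the reward is or which action is "correct"; the behavior emerges solely from the reward signal, which is the property the hide-and-seek experiment scales up to far richer environments.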

In a new paper released today, OpenAI has now revealed its initial results. Through playing a simple game of hide and seek hundreds of millions of times, two opposing teams of AI agents developed complex hiding and seeking strategies that involved tool use and collaboration. The research also offers insight into OpenAI’s dominant research strategy: to dramatically scale existing AI techniques to see what properties emerge.



After roughly 25 million rounds of hide-and-seek, the hiders learned to build forts to win the game.
OPENAI


Six phases of strategy
To create the game, the researchers designed a virtual environment that consisted of an enclosed space with various objects like blocks, ramps, and mobile and immobile barricades. The agents themselves were controlled by reinforcement-learning algorithms. For each game, the agents were split into two teams: hiders were rewarded or penalized for avoiding or failing to avoid the seekers, respectively; seekers were also rewarded or penalized for finding or failing to find the hiders. As in a classic game of hide and seek, hiders were also given a few seconds’ head start. The researchers gave the agents no other instructions.
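The reward scheme described above can be sketched as a zero-sum, team-based signal. The function below is a minimal illustration under assumed conventions (per-timestep rewards, a preparation phase for the head start); the names and parameters are hypothetical, not OpenAI's code.

```python
# Hypothetical sketch of the team-based, zero-sum reward described above:
# hiders are rewarded while no hider is visible and penalized otherwise;
# seekers receive the opposite. During the head start, no one is scored.

def step_rewards(any_hider_seen: bool, prep_phase: bool,
                 n_hiders: int, n_seekers: int):
    """Return (hider_rewards, seeker_rewards) for one timestep."""
    if prep_phase:  # hiders' head start: rewards are withheld
        return [0.0] * n_hiders, [0.0] * n_seekers
    hider_r = -1.0 if any_hider_seen else 1.0
    return [hider_r] * n_hiders, [-hider_r] * n_seekers

hiders, seekers = step_rewards(any_hider_seen=False, prep_phase=False,
                               n_hiders=2, n_seekers=2)
print(hiders, seekers)
```

Because the reward is shared across each team and exactly opposed between teams, any strategy one side discovers automatically becomes a new problem for the other side, which is what drives the escalating phases described next.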

Over the course of hundreds of millions of rounds of game play, the agents developed several strategies and counter-strategies. During the first phase, the hiders and seekers developed very crude avoiding and chasing strategies by moving around the space without manipulating any of the objects.



In phase three, the seekers learned to climb over the fort's walls using a ramp.
OPENAI


But around the 25-million-game mark, play became more sophisticated. The hiders learned to move and lock the boxes and barricades in the environment to build forts around themselves so the seekers would never see them. Not only that, but they also developed coordination strategies, such as passing each other objects, in order to speed up the fort-building process.

After another 75 million rounds, however, the seekers discovered a counter-strategy, moving the process into phase three. They learned to move a ramp next to the hiders’ fort and use it to climb over the walls. By phase four, 10 million rounds later, the hiders had learned to lock the ramps in place before building their fort. (Neither team could unlock an object that was locked by an opposing team.)

At first, the researchers at OpenAI believed that this was the last phase of game play. But at around the 380-million-game mark, two more strategies emerged. The seekers once again developed a strategy to break into the hiders’ fort by using a locked ramp to climb onto an unlocked box, then “surf” their way on top of the box to the fort and over its walls. In the final phase, the hiders once again learned to lock all the ramps and boxes in place before building their fort.



Finally, after 380 million rounds of training, the seekers learned to "box surf" to climb over the fort's walls.
OPENAI


Emergent intelligence
The researchers believe that these initial results demonstrate a promising new method for evolving more complex AI. “We didn’t tell the hiders or the seekers to run near a box or interact with it,” says Bowen Baker, one of the authors of the paper. “But through multiagent competition, they created new tasks for each other such that the other team had to adapt.”

This study is also emblematic of OpenAI’s approach to AI research. Though the lab has invested in developing novel techniques, it has primarily made a name for itself by dramatically scaling existing ones. GPT-2, the lab’s infamous language model, for example, heavily borrowed its algorithmic design from earlier language models, including Google’s BERT; OpenAI’s primary innovation was a feat of engineering and expansive computational resources.

In a way, this study reaffirms the value of testing the limits of existing technologies at scale. The team also plans to continue with this strategy. The researchers say that the first round of experiments didn't come close to reaching the limits of the computational resources they could throw at the problem.

“We want people to imagine what would happen if you induced this kind of competition in a much more complex environment,” Baker says. “The behaviors they learn might actually be able to eventually solve some problems that we maybe don’t know how to solve already.”

Correction: The original headline misstated the number of games the agents played.

Reply #1
周礼键 (employment verified) · posted 2019-9-18 19:37:51

周礼键: A statement regarding the “Software Engineering” major and “undergraduate (full-time, unified admission)” degree listed on my résumé:

I was originally admitted to a junior college at another school. At the time, 南昌理工学院 was preparing to upgrade to undergraduate status and was recruiting nationwide. To fulfill my dream of a bachelor’s degree, I used my junior-college admission letter to enroll instead in its four-year, full-time undergraduate Software Engineering program, processed as a unified admission; I did not transfer my household registration. The school was later upgraded to undergraduate status, but the Software Engineering program, being first-rate, comprehensive, deep, and the most demanding, was not itself upgraded, so the school issued me the corresponding junior-college diploma in Software Technology, even though I never took any Software Technology courses and studied only the undergraduate Software Engineering curriculum full time. Later, to honor its undergraduate promise, the school arranged for the bachelor’s degree to be earned through one additional year of full-time self-study examination at 南昌理工学院 (optional, with the option to change majors). I did not change majors and continued studying Software Engineering full time. I was therefore admitted to, studied in, and graduated from the undergraduate Software Engineering program full time under unified admission.

Full résumé (formatted, downloadable Word version): https://www.kdocs.cn/l/sciBRNumC
Degree verification: https://www.chsi.com.cn
Diploma verification: http://www.chinadegrees.cn

Reply #2
周礼键 (employment verified) · posted 2020-7-17 22:00:57

Zhou Lijian: The world’s top two digital core suites help Chinese companies with digital reinvention, transformation, and innovation, making a major contribution to the Chinese economy.

周礼键: SAP RF, SAP R/1, SAP R/2, SAP R/3, SAP ECC, SAP Business Suite, SAP S/4HANA.

周礼键: Oracle EBS.

周礼键: Hello everyone! Please bookmark the following accounts:

1. https://www.facebook.com/lijianzhougolden
2. https://twitter.com/zhoulijian
3. https://vk.com/zhoulijian
4. https://goldenzhoulijianokgoodweve.blogspot.com/
5. https://goldenzhoulijian.tumblr.com/
6. https://www.tumblr.com/blog/scentedcatcreation

Reply #3
周礼键 (employment verified) · posted 2020-11-18 14:37:43

(Verbatim repeat of Reply #2.)

Reply #4
周礼键 (employment verified) · posted 2022-12-24 10:33:03

周礼键: On the Chinese model: the core of the internet experience is search. Chinese internet search is no longer as precise, comprehensive, and lean as it was ten years ago, or even in the early days of the Chinese internet; people can no longer find the job listings or the genuinely cheapest goods they should. A search for given keywords should return only results whose titles contain all of those keywords (highlighted in red), with no keyword dropped. When keywords are separated by spaces, the spaces should act as wildcards matching zero or more characters between the keywords, again with every keyword shown in the title. The same rules should apply when search is narrowed by filter conditions. Restricting or shutting down use of the internet is even less advisable.
