English title:
Analysis of first prototype universal intelligence tests: evaluating and comparing AI algorithms and humans
---
Authors:
Javier Insa-Cabrera and Jose Hernandez-Orallo
---
Latest submission year:
2011
---
Category information:
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
Today, available methods that assess AI systems focus on using empirical techniques to measure the performance of algorithms on specific tasks (e.g., playing chess, solving mazes, or landing a helicopter). However, these methods are not appropriate if we want to evaluate the general intelligence of AI and, even less, if we want to compare it with human intelligence. The ANYNT project has designed a new evaluation method that tries to assess AI systems using well-known computational notions and problems which are as general as possible. This new method serves to assess general intelligence (which allows us to learn how to solve any new kind of problem we face) and not only to evaluate performance on a set of specific tasks. The method not only measures the intelligence of algorithms, but can also assess any intelligent system (human beings, animals, AI, aliens?, ...), letting us place their results on the same scale and, therefore, compare them. This new approach will allow us (in the future) to evaluate and compare any kind of intelligent system, known or yet to be built or found, be it artificial or biological. This master's thesis aims at ensuring that this new method provides consistent results when evaluating AI algorithms; this is done through the design and implementation of prototypes of universal intelligence tests and their application to different intelligent systems (AI algorithms and human beings). From the study we analyze whether the results obtained by two different intelligent systems are properly located on the same scale, and we propose changes and refinements to these prototypes in order to be able, in the future, to achieve a truly universal intelligence test.
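The "same scale" idea in the abstract can be made concrete with a Legg-Hutter-style aggregate, in which an agent's average reward in each test environment is weighted by a term that decreases with the environment's (approximate) complexity. The Python sketch below is illustrative only: the function name, the complexity levels, and the reward figures are hypothetical and are not taken from the thesis prototypes.

```python
def universal_score(rewards_by_env):
    """Aggregate per-environment average rewards into one score.

    Each environment is weighted by 2**(-complexity), so simpler
    environments contribute more, in the spirit of Legg and Hutter's
    universal intelligence measure. `rewards_by_env` is a list of
    (complexity, average_reward) pairs with rewards in [0, 1].
    """
    return sum(2.0 ** (-c) * r for c, r in rewards_by_env)

# Hypothetical results of two systems on the same battery of
# environments, indexed by an approximate complexity level.
q_learner = [(2, 0.9), (4, 0.6), (8, 0.2)]
human     = [(2, 0.8), (4, 0.7), (8, 0.5)]

print(round(universal_score(q_learner), 4))  # 0.2633
print(round(universal_score(human), 4))      # 0.2457
```

Because both scores are produced by the same weighting over the same environments, they land on a single scale and can be compared directly, which is exactly the property the thesis sets out to verify empirically.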
---
PDF link:
https://arxiv.org/pdf/1109.5072









