Dictionary - Non-Recommended v0.3.0

Table of contents

  1. Alignment
  2. Artificial General Intelligence
  3. Corrigibility
  4. Discontinuous Scaling (Unclear / Disputed)
  5. General Human Values (Unclear / Disputed)
  6. Human Level AI
  7. Overhang
  8. Safe, Artificial Intelligence
  9. Scaling
  10. Transformative Artificial Intelligence
  11. Value learning




Alignment

Def. 1 - Acting in a manner that is not only [Safe], but also consistent with [General Human Values], and continuing to do so even after arbitrary [scaling] or change in circumstance. Compare [Safe].

Proponents mostly agree that it is unclear whether this is possible, and that the concept needs to be refined and better understood. It is also unclear what class of proof would be sufficient.

Def. 2 - Acting in a manner considered [Safe] according to a specific set of goals, and provably continuing to do so even after arbitrary [scaling] or changes in circumstance. The domain of proof needed is also unclear.

Unclear examples:

  • Any provably limited system.

Non examples:

  • Systems proven [Safe] in a given environment that can change, or empirically shown to be [Safe].

Artificial General Intelligence



An artificial intelligence system that has capabilities similar to that of humans, including the ability to learn arbitrary new tasks and perform them.


Examples:

  • Hypothesized future systems that think and act similarly to humans and are agentic, with articulable goals. They are capable of performing any individual task or job a human would do, including creative or abstract thought.


Unclear examples:

  • Comprehensive AI Services capable of performing every human job or task.
  • Systems able to learn any one human task or job, including high-level or creative ones, but that are not able to adapt or learn.


Non examples:

  • Any currently extant system.



Corrigibility

The property of an AI system where the objective constrains it from taking actions that modify its own objective function in ways that undermine [Alignment].

Discontinuous Scaling (Unclear / Disputed)



This refers to the hypothesis that at some point, measured in external clock time, the capabilities of a system greatly increase in a way inconsistent with the trend up to that point. This does not require a true mathematical discontinuity, but does require a significant increase in growth rate, rather than continuation of exponential growth. Cf. [Scaling], [Foom].
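As a rough illustration of the distinction between continued exponential growth and an increase in growth rate, the sketch below compares two capability series. The function names, the threshold `factor`, and all numbers are hypothetical and chosen only for illustration; this is not a proposed measurement method.

```python
import math

def log_growth_rates(capabilities):
    """Per-step growth rate of the log of a positive series."""
    return [math.log(b / a) for a, b in zip(capabilities, capabilities[1:])]

def has_growth_rate_jump(capabilities, factor=2.0):
    """True if any step's log-growth rate exceeds `factor` times the
    mean rate of all earlier steps (an arbitrary illustrative test)."""
    rates = log_growth_rates(capabilities)
    for i in range(1, len(rates)):
        baseline = sum(rates[:i]) / i
        if baseline > 0 and rates[i] > factor * baseline:
            return True
    return False

# Steady exponential: doubles every step, so the growth rate is constant.
steady = [2.0 ** t for t in range(10)]

# Same trend for five steps, then growth of 8x per step.
jump = [2.0 ** t for t in range(6)] + [2.0 ** 5 * 8.0 ** k for k in range(1, 5)]

print(has_growth_rate_jump(steady))  # False
print(has_growth_rate_jump(jump))    # True
```

Neither series has a mathematical discontinuity; only the second shows the change in growth rate that the definition requires.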

General Human Values (Unclear / Disputed)

Ultimate goals and constraints that some or all humans either espouse or actually use operationally to guide their actions. In the limit, this is related to “Coherent Extrapolated Volition”, where, given very large amounts of power and effectively unlimited time to reflect on and consider the impacts of actions and values, the resulting human values are coherent.

Human Level AI

See [Artificial General Intelligence]. (Used interchangeably in Baum, Goertzel, & Goertzel, 2011.)



Overhang

Overhang is a driver of [Discontinuous Scaling]. It refers to a situation where, when a goal is achieved, the resources needed for achieving the goal are found to be far smaller than the state-of-the-art available at the time, creating a significant acceleration.

For example, if an algorithmic breakthrough solves a problem in a way that requires far less hardware than needed for the basic task, performance at that task would jump significantly.
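The arithmetic behind this example can be sketched as follows. The helper `overhang_ratio` and the hardware figures are hypothetical, in arbitrary units, and serve only to show how a drop in required resources leaves a large surplus of available resources.

```python
def overhang_ratio(available_resources, required_resources):
    """How many times over the available resources cover the requirement."""
    return available_resources / required_resources

# Hypothetical units of hardware.
available = 1_000.0          # state of the art when the goal is achieved
required_before = 1_000.0    # what the first working method needs
required_after = 10.0        # what a more efficient method needs

print(overhang_ratio(available, required_before))  # 1.0
print(overhang_ratio(available, required_after))   # 100.0
```

Under these assumed numbers, the efficient method leaves a hundredfold surplus of hardware that can immediately be applied to the task.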

Ambiguous Example: In computer Go, once the basic problem of playing at a human level was solved by AlphaGo using large amounts of [compute], it was found that a smaller system could do the same task, and AlphaGo Zero, then AlphaZero, exhibited significant improvements with the same amount of hardware or less.

See, for example, Strategic implications of openness in AI development (Bostrom, 2017)

Safe, Artificial Intelligence



A human-designed system that does at least approximately what the designer and/or user intended it to do, without significant unwanted side effects.


Examples:

  • AlphaGo

Unclear examples:

  • Systems that have been experimentally verified to operate within planned constraints.
  • Systems which obey orders that should not be allowed, such as military systems being operated by rogue factions without authorization.
  • Systems that refuse orders that they are given, which cannot be overridden, because the system decides the outcome would be unacceptable.


Non examples:

  • Humans
  • Anything in Victoria Krakovna’s list of specification failures, where systems did not do what the designer intended.



Scaling

The ability of a system to operate better due to increased resources of a specific type or types, even without a fundamental new insight or change in the system.


Examples:

  • A machine learning system improving at a task when given additional training data.
  • A neural network that improves due to increasing its size, spending more time on [Compute, architecture search], [Compute, hyperparameter search], or [Compute, training].

Ambiguous Cases:

  • A system that is improved by changing from one structure or training type to a different type of structure, such as the transition from AlphaGo to AlphaZero.
  • A system that self-modifies in ways not originally planned.


Non examples:

  • A system increasing in capabilities when given more [Compute, operational], such as the improvement from giving AlphaGo more time to consider each move, or when restrictions are relaxed, such as removing APM limits from OpenAI Five.
  • Self-modification via automated parameter search given a specific set of hardware resources, such as “Learning to learn by gradient descent by gradient descent”.
  • Improvement at a task by a new or modified [system] due to fundamental breakthroughs or new structure types.
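To make the definition of scaling concrete, the sketch below assumes a purely hypothetical power-law relationship between training data and loss. The function `loss`, its `scale` and `exponent` parameters, and the numbers are illustrative assumptions, not measurements of any real system.

```python
def loss(training_examples, scale=10.0, exponent=0.5):
    """Hypothetical power-law loss curve: loss falls as data grows,
    with the system itself (scale, exponent) left unchanged."""
    return scale * training_examples ** -exponent

# Scaling: the same system, given more of one resource (training data).
for n in (100, 10_000, 1_000_000):
    print(n, loss(n))

# Not scaling in the sense defined here: the improvement comes from
# changing the system (a different exponent), not from more resources.
print(loss(100, exponent=0.75))
```

In this toy model, moving along the curve by increasing `training_examples` corresponds to scaling, while changing `exponent` stands in for a new insight or change in the system.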

Transformative Artificial Intelligence



AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution. (As suggested by Karnofsky (2016).)

Per Gruetzmacher (2019), “AI that significantly transforms society by replacing humans for a large portion (i.e., 50% or greater) of economically useful work”.

Note that a TAI system is not necessarily an AGI system, as long as its impact mediated by narrow capabilities is sufficiently transformative (as was e.g. the case for inventions during the original agricultural and industrial revolutions).


Examples:

  • AI systems capable of fulfilling all the necessary functions of human scientists, unaided by humans, in developing another technology (or set of technologies) that ultimately becomes widely credited with being the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution.

Note that just because AI systems could accomplish such a thing unaided by humans doesn’t mean they would; it’s possible that human scientists would provide an important complement to such systems, and could make even faster progress working in tandem than such systems could achieve unaided.

Karnofsky (2016) emphasizes the hypothetical possibility of AI systems conducting substantial unaided research to draw a clear distinction from the types of AI systems that exist today. He believes that AI systems capable of such broad contributions to the relevant research would likely dramatically accelerate it.

  • AI systems capable of performing tasks that in 2016 accounted for the majority of full-time jobs worldwide, and/or over 50% of total world wages, unaided and for costs in the same range as what it would cost to employ humans.

Aside from the fact that this would likely be sufficient for a major economic transformation relative to today, Karnofsky (2016) also thinks that an AI with such broad abilities would likely be able to far surpass human abilities in a subset of domains, making it likely to meet one or more of the other criteria laid out here.

  • Surveillance, autonomous weapons, or other AI-centric technology that becomes sufficiently advanced to be the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution.

(This contrasts with the first point because it refers to transformative technology that is itself AI-centric, whereas the first point refers to AI used to speed research on some other transformative technology.)

Ambiguous Examples:

  • Clearly narrow AI, such as [Self Driving Car]s and factory automation, reducing the workforce by more than 50%.
  • A transition to Universal Basic Income that leads more than 50% of people currently working to stop doing so.


Non examples:

  • Any AI system that exists as of 2019.

Value learning


The process of an AI system inductively learning goals that the system will pursue.

(The term does not denote success, just the class of process used for the goal.)

See: Nate Soares, “The Value Learning Problem”, in: Ethics for Artificial Intelligence Workshop at the 25th International Joint Conference on Artificial Intelligence.