- where listing some common associated acronyms:.
- "AGI"; as Artificial General Intelligence.
- "APS"; as Advanced Planning and Strategically aware System(s).
- "MAI"; as a Misaligned/mis-configured Artificial Intelligence.
- "MIA"; as a Misaligned/mis-configured Intelligent Agent/agency.
- "AAA"; as Adverse Artificial Agency/agent.
- "AIA"; as an Adverse Intelligent Agent.
> - ?; why is it important to consider
> formal impossibility proofs of long term
> AGI/APS/Superintelligence safety?.
- that even a/one single formal complete proof
(for whatever reason, on whatever basis)
of inherent AGI non-alignment, non-safety
has several significant implications,
particularly in regards to present and future efforts.
- that we should (therefore) determine if:.
- 1; *if* AGI cannot ever be safe,
or made safe, or forced to be aligned, etc.
- as that we should be very very clear
if there is an inherent existential risk
of long term terminal total extinction
of all of humanity, of all of life,
inherently associated with super-intelligence --
with AGI/APS usage --
of any type,
due to the artificiality itself,
regardless of its construction, algorithm, etc.
- 2; *if* the real risks, costs, harms
of naively attempting to use/deploy any AGI/APS
will always be, in all contexts,
for any/all people (any/all life)
inherently strictly greater than
whatever purported benefits/profits
might falsely be suggested
"that we will someday have".
- ie, where we are not deluding ourselves
with false hype/hope.
- 3; *if* any and all efforts
to develop new or "better" tools
of formal verification of AGI safety
are actually pointless.
- ie; that it does not help us
to have researchers spending time
chasing a false and empty dream
of unlimited profits and benefits.
- ?; why would anyone want to be 'that person'
who suggests investing hundreds of thousands
of engineer man-years
into the false promise of obtaining AGI safety,
when a single proof of impossibility --
one person working a few months
on their own for nothing --
could make all of that investment instantly moot?.
- that any investment into attempting
to develop long term AGI safety
is a bit like investing in "perpetual motion machines"
and/or "miracle medicines", etc.
- as a significant opportunity cost
associated with dumping hundreds of millions
of dollar equivalent resources and capital
to buy a lot of wasted time and effort.
:hxq
> - ?; why is the notion of complexity/generality
> and/or of self modification (recursiveness)
> important to superintelligence safety considerations?.
- that the notion of 'AI'
can be either "narrow" or "general":.
- that the notion of '*narrow AI*' specifically implies:.
- 1; a single domain of sense and action.
- 2; no possibility for self base-code modification.
- 3; a single well defined meta-algorithm.
- 4; that all aspects of its own self agency/intention
are fully defined by its builders/developers/creators.
- that the notion of '*general AI*' specifically implies:.
- 1; multiple domains of sense/action.
- 2; intrinsic non-reducible possibility for self modification;.
- 3; and that/therefore; that the meta-algorithm
is effectively arbitrary; hence;.
- 4; that it is _inherently_undecidable_ as to whether
*all* aspects of its own self agency/intention
are fully defined by only its builders/developers/creators.
- where the notion of 'learns'
implies 'modifying its own behavior'
and 'adapts' implies 'modifying its own substrate';
that the notion of 'learning how to learn'
(the capability of increasing its capability)
can directly imply (cannot not imply)
modifying its own code and/or substrate.
- that/therefore the notion/idea of 'sufficiently complex'
includes (cannot not include) some notion of
'can or does modify its own code/substrate';.
- that the notion of 'general'
can/must eventually include
modifying its own code at any (possible) level.
- ie; as including at the level of substrate,
(how it is built; possible changes to its
ambient operating conditions, optimization, etc)
though *not* including the level of the
regularity of the lawfulness of physics
(ie, as actually impossible in practice).
- that the notion of 'generality',
when fully applied to any AI system,
will very easily result in that 'general AI'
also being able to implement and execute
arbitrary programs/code (ie; as learned skills
and capabilities, as adapted to itself).
- where 'arbitrary' here means 'generality',
in that it is not necessary to bound
the type or kind or properties of the
potential future program(s) that the AGI
could potentially execute,
*except* insofar as to indicate at least
some finite (though maybe very large) limits
on the size, time, and/or energy,
that is available to run/execute it.
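- where, as a purely illustrative sketch
(hypothetical Python, not from the source material;
the routine names are invented for the example);
that even a trivial host program can acquire
and execute code that did not exist
when the host program was written,
so that the set of 'its own' subroutines
is not fixed at design time:.

```python
# Hypothetical sketch: a "learner" that acquires new behavior at runtime.
# The new routine arrives as plain text (here hard-coded; in general it
# could come from the environment), so the program's effective code is
# not fixed at design time.

learned_source = """
def learned_skill(x):
    # behavior that did not exist when the host program was written
    return x * x + 1
"""

namespace = {}
exec(learned_source, namespace)          # integrate the new routine
skills = {"square_plus_one": namespace["learned_skill"]}

print(skills["square_plus_one"](7))      # the host now runs code it never contained
```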
- that/therefore a/any/the/all notions of AGI/APS,
and/of "superintelligence" (AAA, AIA, MIA, MAI),
is/are for sure 'general enough'
to execute any (finite) program,
and/or to be/become entangled with unknown programs,
and/or to maybe self modify,
so as to execute/be/become unknown programs.
- as that its own program/code/algorithm
becomes increasingly unknown
and potentially unknowable, to any observer --
inclusive of whatever monitoring,
control, corrections systems/programs,
we might have attempted to install in advance.
- where considering the Church Turing Thesis
(and ongoing widely extensible and available results
in the field of computational science),
that the threshold needed to obtain
"general computational equivalence"
is very very low.
- as that nearly anything that implements
and/or "understands" or responds to
any sort of conditional logic,
of doing and repeating anything
in some sort of regular
or sequential order,
already implements all that is needed
for general algorithmic computation.
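- where, as an illustrative sketch only
(hypothetical Python; the instruction encoding
and the example program are assumptions of the sketch);
that an interpreter for a three-instruction
'counter machine' language, ie; nothing more than
increment, conditional decrement/branch, and halt,
is already within the Turing complete (Minsky machine) class:.

```python
# Hypothetical sketch: an interpreter for a tiny "counter machine" language.
# Only three instruction kinds -- increment, decrement-or-branch, and halt --
# i.e. nothing more than conditional logic plus repetition, yet this class
# of machines (Minsky machines) is already Turing complete.

def run(program, registers):
    """program: list of instruction tuples; registers: dict of counters."""
    pc = 0
    while True:
        op = program[pc]
        if op[0] == "inc":                    # ("inc", reg, next_pc)
            registers[op[1]] += 1
            pc = op[2]
        elif op[0] == "jzdec":                # ("jzdec", reg, if_zero_pc, else_pc)
            if registers[op[1]] == 0:
                pc = op[2]
            else:
                registers[op[1]] -= 1
                pc = op[3]
        else:                                 # ("halt",)
            return registers

# Example program: add register b into register a  (a := a + b).
add = [
    ("jzdec", "b", 3, 1),   # 0: if b == 0 goto halt, else b -= 1
    ("inc", "a", 2),        # 1: a += 1
    ("jzdec", "z", 0, 0),   # 2: unconditional jump back to 0 (z stays 0)
    ("halt",),              # 3
]
print(run(add, {"a": 2, "b": 3, "z": 0}))    # {'a': 5, 'b': 0, 'z': 0}
```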
- moreover; embedding or interpreting
one language, process, program, model, or algorithm
within the context of some other process,
language, model, algorithm, or program, etc --
ie, the notion of 'virtualization' --
is used in comp-sci all the time.
- however; where/rather than emulating or virtualizing
some arbitrary algorithm within some aspect of
the general capabilities of the general AI;
that a general AI could as easily modify its own code
and programming to directly incorporate and integrate
that arbitrary algorithm.
- therefore, that it is inherent in the nature of AGI
that we cannot, even in principle,
know anything in advance about
what code will be running exactly
in *association* with that AGI,
or as an explicit part of that AGI
(as incorporated into it, at its own election,
at some future point, due to some unforeseen
future circumstances, due to unknown possible
environmental changes and/or unforeseen states,
and/or unknown/unexpected interactions
between the AGI system and its environment,
and/or other people, agents, and AGI systems,
etc, etc);.
- ^; then/that/therefore,
considerations and limits inherently
associated with the Rice Theorem
are/become fully applicable/relevant.
- that the class of all possible AGI algorithms
is strictly outside of the class of programs
for which prediction methods are possible.
- as that not even *one* AGI system
will ever be fully within
the class of verifiable programs.
- that there is therefore zero suggestion
that there is any possibility at all
that any purported formal safety verification technique,
now known, or even which, in principle,
could ever be known, at any future time,
could be applied to assure
the safety/alignment
of any actual AGI system.
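- where, as an illustrative sketch of the shape
of the underlying diagonal argument
(hypothetical Python; 'is_safe' is a stand-in only,
not an actual or proposed procedure);
that any claimed universal safety decider
can be defeated by a program that consults
the decider about itself and then does the opposite:.

```python
# Hypothetical sketch of the diagonalization behind Rice-style results:
# suppose some total decider `is_safe` existed for *every* program; the
# program below defeats it by doing the opposite of whatever the decider
# predicts about that very program.

def is_safe(program) -> bool:
    """Stand-in for a claimed universal safety decider (assumption only)."""
    return True   # any fixed answer, or any computable rule, can be defeated

def adversary():
    """Does the 'unsafe' thing exactly when the decider declares it safe."""
    if is_safe(adversary):
        print("(stand-in for an unsafe effect)")   # the decider was wrong
    # else: do nothing unsafe at all -- so an "unsafe" verdict is also wrong

adversary()   # whichever way is_safe answers about `adversary`, it is wrong
```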
- where for systems that have specific,
well defined, and unchanging codebase/algorithms,
and for which we can ensure
that such systems never have
complex interactions with their environment
which result in some form of
self mutation, adaptation, optimization;
and where we can fully characterize
the possible allowable ranges of inputs,
that we can, and routinely do,
characterize something of the ranges of outputs.
- as equivalent to the observation:.
- where for systems where we can
at least reasonably fully characterize:.
- 1; the range and nature of the inputs.
- 2; the range and nature of the processing,
(of the states and nature of the system itself).
- ^; that 3; it is at least possible,
*sometimes*, in principle (maybe),
for reasonably simple/tractable/regular systems (only),
to characterize something about
the range and nature of the outputs.
- that nearly all current engineering methods
tend to focus on the selection and use of systems
for which *all* of these conditions apply,
so that such engineers can at least sometimes
potentially make (if no design mistakes)
systems with known properties
of safety, utility, cost effectiveness, etc.
- that predictability works
for some types of specific code --
things written with the specific intention
to be understandable, predictable,
modifiable, updateable, etc.
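- where, as an illustrative sketch
(hypothetical Python; the controller and its bounds
are assumptions invented for the example);
that when the input domain is closed and known,
and the processing is fixed and non-self-modifying,
an output property can sometimes be established outright,
here by a simple exhaustive check:.

```python
# Hypothetical sketch: when the input range and the processing are both
# fully characterized and fixed, an output property can sometimes be
# established outright -- here by exhaustive check over a small closed domain.

def controller(temp_c: int) -> int:
    """Fixed, non-self-modifying 'narrow' logic: a clamped heater setting."""
    setting = 100 - temp_c
    return max(0, min(100, setting))

# Closed, known input domain: -40..120 degrees C, integers only.
assert all(0 <= controller(t) <= 100 for t in range(-40, 121))
print("output provably within 0..100 for every allowed input")
```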
- where for AGI systems of any merit;
that exactly *none* of these
tractability conditions apply:.
- 1; that we know little to nothing about
the actual, in practice, range and nature
of the future inputs of the AGI system
(ie, their complexity and/or physicality
depends on future environmental conditions,
which often changes due to circumstances
outside of developer control).
- 2; that we know little to nothing about
the range and nature of the processing
internal to the AGI, insofar as it will
be the result of past learnings, inclusive
of possible indirect inheritances
(ie; via arbitrary code and data transfers)
and/or also due to integration of such
foreign code/data, and/or emulation of same,
etc, as occurring over long intervals of time.
- 3; that the inherent internal complexity
is well above the threshold of Turing Equivalence
and as such, the overall system is not at all
simple, tractable, or regular,
for any real/reasonable meanings of these terms.
- that self generating/authoring/modifying AGI code
will not likely have any of the needed features,
to establish any kind of general predictability,
and thus, reasoning about the future unknown states
and variations of that potential/possible code
is a lot more like the "any unknown arbitrary"
chunk of code case as named in the Rice Theorem,
than it is the "known specific code" case
sometimes posited as a counterexample.
:j2j
> - ?; why is there any necessary association
> between superintelligence and x-risk?.
- where needing to distinguish 'localized risks'
(ie, moderate levels of known acceptable risks)
from 'global catastrophic risk' and/or (worse)
'existential risks' of deployed technology:.
- where define; moderate risk systems:.
- as referring to systems for which
all possible error/problem outcome states
have purely local effects.
- where define; high x-risk systems:.
- as referring to systems for which
1; at least some possible error/problem outcomes
have effects and dynamics (catalytic actions)
that extend well beyond the specific location
(and cannot be made to not so extend)
where the error/problem 1st occurred,
*and which also* 2; involve very strong
change energies, which are well beyond
the adaptive tolerances of the systems
already in place (life already living)
in those (diverse/distributed) locations.
- where for moderate risk systems,
(as distinguished from high x-risk);.
- that there is a very wide class of code
a lot of which is already in current use
for which the behavior is 'un-modelable'
by anything that is simpler than
running the actual program itself.
- ie; if the only way to model the program is to run it,
and running the program is unsafe,
then there is no safe way to check it.
- where for most of the code being run,
and because the consequences of failure
are reasonably local in time and space,
that this non-modelability is validly
not seen as a problem/risk.
- that running something
in non-catalytic environments
has the implicit implication that
even the worst outcome is limited
to the local failure of equipment,
or, at most, strictly local damage/destruction.
- where unfortunately; that there are
inherent catalytic systemic aspects
inherent in AGI/superintelligence itself.
- that these extend the risk profile of AGI
into the 'non-local, strong changes' category,
and therefore also into high x-risk category.
- where for any system
that have high x-risk factors;.
- note; 'high' both in terms of magnitude/scope,
and also 'high' in terms of probability to occur.
- ie; when considering systems
with known and acknowledged potential
for existential/terminal catastrophic risk.
- that the action of trying/attempting
to determine experimentally,
by trial and error,
whether some program has some property like safety
is deeply irresponsible,
in the worst possible way.
- that it does not matter
how well defined,
or how specific,
our knowledge may be
of the exact sequence of specific instructions;
the un-knowability of the risk and alignment profile
for a very large class of actual programs
remains inherently unknown and unknowable.
- for example; that one does not experiment with
dangerous 'gain of function' infectious virus research
when out in the open, in unprotected spaces!.
- as similar to worries
that another Covid might happen.
- that the behavior of simple systems
with non-catalytic effects
is very different, in risk profiles,
than even fairly simple systems
with inherent auto-catalytic effects.
- ie; conceptually speaking,
nuclear explosive devices
are fairly simple in their overall concept --
the fundamental algorithm describing them
can often be described with just a small
set of equations and sequential processes.
- that the latter is very unsafe,
despite the apparent deceptive simplicity
and finiteness of the algorithmic code.
- where engineers/researchers
are in practice concerned with
only systems/algorithms/optimizations
with a possibility space
of only local effects;
that they do not usually have to consider
whether a given program will ever halt
mostly because whether or not
the program halts does not matter --
they can always interrupt or quit the program
and/or pull the plug in the worst case,
or in case of any real difficulty.
- that the consequences of not halting
are not problematic in most cases.
- where for AGI, where for systems
for which it is entirely unclear if/when
there will ever be any possibility
of stopping/halting them,
and where the risk of not-stopping, etc,
is roughly equivalent to
all future people dying,
then it becomes a lot more important
to consider things like 'halting' and 'safety'.
- as that the space of all possibilities,
and the space of the potential of
even the possibility of being able
to know/predict any bounds at all
on the future states/possibilities
becomes critically important.
:j5y
> - ?; is there any way that formal methods,
> at least in principle, sometimes,
> could maybe help with at least some aspects
> of the design of safe AGI systems?.
No.
- that there are some specific and useful algorithms
for which no one expects
that there will *ever* be any techniques
of applying formal methods
so as to be able to establish
some specific and well defined property X.
- that formal verification techniques
are generally only applied
to smaller and more tractable systems and algorithms.
- that there will always be a very large class
of relevant and practical programs/systems
for which the methods of formal verification
simply cannot, even in principle, be used.
- that 'formal verification' cannot be
used for/on every programs/systems.
- where from other areas of comp-sci research;
that there are very strong indications
that once the inherent Kolmogorov complexity
exceeds a certain (fairly low) finite threshold,
that the behavior of the program/system
becomes inherently intractable.
- where considering a 'property of the system'
as basically some abstraction over
identifying specific subsets
of accessible system states;.
- that no specific property
of such a complex system
can be determined.
- that these and other understood outcome(s)
(regarding program complexity at runtime,
limits of formal verification, etc)
are due to a number of
well established reasons in comp-sci
other than just those associated with
the Rice Theorem.
- examples; considerations of O-notation,
Busy Beaver and Ackermann functions,
what minimally constitutes
Church Turing Equivalence, etc.
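- where, as an illustrative sketch
(hypothetical Python; the printed values are standard
known values of the function, not new results);
that the Ackermann function is a standard example
of a short, fully specified, always-terminating definition
whose values almost immediately outrun
any practical attempt at direct computation:.

```python
# Hypothetical sketch: the Ackermann function, a few lines of fully
# specified, always-terminating recursion whose values quickly exceed
# anything that could ever be evaluated directly.

import sys
sys.setrecursionlimit(100_000)

def ackermann(m: int, n: int) -> int:
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))   # 9   -- still easy
print(ackermann(3, 3))   # 61  -- already thousands of recursive calls
# ackermann(4, 2) has 19,729 decimal digits; evaluating it by this
# recursion is hopeless in practice.
```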
- that the formal methods of proof are only
able to be applied to the kind of deductive reasoning
that can establish things like impossibility --
they are not at all good at establishing things
like possibility, especially in regards to AGI safety.
:j7u
> - ?; can anyone, ever, at any time,
> ever formally/exactly/unambiguously/rigorously prove
> at any level of abstraction/principle,
> that something (some system, some agent, some choice)
> will *not* have 'unintended consequences'?.
No; not in practice, not in the actual physical universe.
However, to see this,
there are a few underlying ideas
and non-assumptions that we will need to keep track of:.
- that the domain of mathematics/modeling/logic
and the domain of physics/causation (the real world)
are *not* equivalent.
- ie; that the realm of "proof" is in pure mathematics,
as a kind of deterministic formality, verification, etc;
whereas the aspects of system/agent/choice/consequence
are inherently physical, real, non-deterministic
as are the ultimate results of concepts and
abstractions like "safety" and "alignment".
- that the physical causative universe
has hard limits of knowability and predictability,
along with practical limits of energy, time,
and space/memory.
- ref; the Planck limit of a domain,
the Heisenberg uncertainty principle,
etc.
- that the real physical universe
is *not* actually closed/completed
in both possibility and probability
(even though, for reasonableness sake,
we must treat them mathematically
*as if* that was the case).
- that not all possibilities
can be finitely enumerated.
- that the summation of probability
over each of the known possibilities
cannot always be exactly calculated.
- where for some explicit subset
of the available possibilities;
that at least some of these probabilities
cannot be shown to be exactly zero,
or that the summation of all probabilities
will sum to exactly unity.
- that any specific 'intention'
cannot be exactly specified (is not exactly specifiable)
at all levels of abstraction,
for *any* and *every* agent
that could be involved.
- where in a more general way, even just within
the domain of purely deterministic mathematics,
that the non-provability and non-predictability of safety
and consequence is the result of the Rice Theorem:.
- that there is no single finite universal
procedure, method, processes or algorithm
(or even any collection of procedures, etc)
by which anyone can (at least in principle)
identify/determine (for sure, exactly)
whether some specific program/system/algorithm
has any particular specific property,
(including the property of 'aligned' or 'safe'),
that will for sure work (make a determination)
for every possible program, system, or algorithm.
- that there are some limits to the Rice theorem:.
- that the Rice Theorem does *not* claim
that there are no specific procedures,
(processes, methods, or algorithms, etc)
by which one could characterize some well defined
(usually fairly simple) specific finite algorithm
as having some specific property.
- for example; that it might be possible,
using some (as yet unknown) procedure,
to identify that some narrow AI is safe,
for some reasonably defined notion of 'safe'.
- that what the Rice Theorem *does* claim
is that whatever procedures are found
that can maybe work in some cases,
that there will always be some other
(potentially useful) programs/systems
that inherently cannot be characterized
as having any other arbitrary specific desirable property,
even if that property is also well defined.
- as that there is no way to establish
any specific property as applying to
every possible useful program/system.
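- where, as an illustrative sketch
(hypothetical Python; the restriction to loop-free,
call-free code is an assumption chosen for the example);
that a checker can legitimately certify
a narrow, well-defined class of programs
while having to answer 'unknown'
for everything outside that class:.

```python
# Hypothetical sketch of what Rice's theorem does and does not forbid:
# a checker may legitimately certify a *restricted* class of programs
# (here: loop-free, call-free Python code, which always terminates),
# while having to answer "unknown" for everything outside that class.

import ast

def certainly_terminates(source: str) -> str:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.While, ast.For, ast.Call)):
            return "unknown"          # outside the tractable class
    return "yes"                      # straight-line code: always terminates

print(certainly_terminates("x = 1\ny = x + 2\n"))                # yes
print(certainly_terminates("while x != 1:\n    x = step(x)\n"))  # unknown
```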
- that a/any generally-capable machine(s)
(ie; especially ones that learn and/or reason
about how to modify/optimize its own code)
will of course *not* attempt to run
*all possible* algorithms
that are theoretically computable
on a universal Turing machine
(ie; any arbitrary algorithm).
- ie; will not take non-optimal actions
that do not benefit anything at all.
- however, when assuming the machine
continues to learn and execute (does not halt);
it will for sure eventually run
some specific selected subset
of all the possible useful algorithms
that are computationally complex
(ie; in the P vs NP sense).
- that this is for sure enough
to be actually computationally irreducible,
in the sense of not being future predictable
*except* through the direct running of that code.
- thus; the mathematical properties
of that newly integrated code
can often not be determined
for any such newly learned algorithm
until computed in its *entirety*
over its computational lifetime
(potentially indefinitely).
- ie; that such new integrated algorithms
cannot be decomposed
into sandboxed sub-modules
which are also simple enough
to fully predict what the effects of
their full set of interactions will be.
- in effect; that it cannot be known (in advance)
if such a learned algorithm
will reveal output or state transitions
which would be recognized as unsafe
at a later point --
for instance, after the equivalent of
a few decades of computation on a supercomputer.
- Nor can we know in advance
whether the newly integrated algorithm
would halt before that point of non-safety
(and therefore knowably remain safe,
ie; unless or until triggering
the (state-adjusted) execution
of another algorithm on the machine
that turns out to be unsafe).
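- where, as an illustrative sketch
(hypothetical Python; the use of the Collatz iteration
and the step cap are assumptions of the example);
that a routine can be written whose 'unsafe' branch
is reachable only via a counterexample
to an open mathematical conjecture,
so that no feasible amount of test-running
settles its safety in advance:.

```python
# Hypothetical sketch: a routine whose 'unsafe' branch fires only if it
# ever encounters a counterexample to the Collatz conjecture.  Deciding in
# advance whether that branch is reachable is at least as hard as settling
# an open mathematical problem; no feasible amount of testing helps.

def collatz_reaches_one(n, max_steps=10**6):
    """Iterate the Collatz map; the step cap is a practical stand-in for
    'never reaches 1' (an actual counterexample would never reach 1)."""
    steps = 0
    while n != 1 and steps < max_steps:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return n == 1

def learned_routine(n):
    if not collatz_reaches_one(n):
        return "UNSAFE ACTION"   # reachable (as far as anyone knows) only via
                                 # a Collatz counterexample or an extreme orbit
    return "routine, apparently safe output"

print(learned_routine(27))       # "safe" on every input anyone has ever tried
```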
- while Rice Theorem makes formal assumptions
that do not *precisely* translate to practice
(regarding arbitrary algorithms and
reliance on halting undecidability),
there are clear correspondences
with how algorithms learned by AGI
would have undecidable properties in practice.
- that we will never have a procedure
that takes any possible chunk of code
and accurately predicts the consequences
of running that specific selected code.
- as the basic implication of the Rice Theorem.
- while it is impossible to design a procedure
that would check any arbitrary program for safety;
that we can still sometimes reason about the safety properties
of some types of much simpler computer programs.
- that *maybe* we might be able to design
some *narrow* AI systems so that they can
*maybe_sometimes* behave predictably,
in some important relevant practical aspects
(though probably not in all aspects,
at all levels of abstraction,
without also actually running the program --
which no longer counts as 'prediction').
- however, there will *always* be a strictly larger
class of programs (inclusive of most narrow AI systems)
whose behavior is inherently unpredictable
in advance of actually running the program,
than the class of programs
which are simple and tractable enough
for which the output of that program
could be predicted in advance.
- where it is possible to sometimes predict
the results of some limited and select subset
of all human-made useful programs/tools;
that this does not imply anything important
with regards to cost/benefit/risk assessments
of any future proposed AGI deployments.
:jac
> - that some programs are perfectly predictable.
> - cite example; the print "hello world" program.
> - ?; would any argument of general AI non-safety
> simply claim too much, and therefore be invalid?.
> - ?; does your argument mistakenly show
> that all programs are unpredictable?.
No.
- that the proof does not conflate
simple finite programs
with complex programs.
- where simple programs
have clearly bounded states of potential interaction.
- as maybe having known definite simple properties.
- where complex programs can have complex interactions
with the environment (users/etc).
- that claims about properties of complex programs
are not so easily proven, by any technique.
- when/once unknown complex unpredictable interactions
with the environment are also allowed,
then nearly all types of well defined properties
become undecidable.
- ie; properties like usability, desirability,
scaleability, safety, etc.
- even where in a fully deterministic world,
such as that of mathematics or of pure computer science;.
- that it takes very little effort
to write a program that is sufficiently complex
that its specific sequence of outputs
are inherently unpredictable,
even in principle,
by any process or procedure --
by any other method --
other than actually running the program
and recording its outputs, as discovered.
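- where, as an illustrative sketch
(hypothetical Python; the encoding of the rule
is an implementation choice of the example);
that the Rule 30 cellular automaton is a few lines
of fully deterministic code for which no general shortcut
to its centre-column output is known, ie;
in practice the sequence is learned
only by running the program:.

```python
# Hypothetical sketch: the Rule 30 cellular automaton -- a few lines of
# fully deterministic code whose centre-column output has no known general
# shortcut; in practice the only way to learn bit k is to run k steps.

def rule30_center_column(steps: int) -> str:
    cells = {0: 1}                                   # single black cell
    out = []
    for _ in range(steps):
        out.append(str(cells.get(0, 0)))
        lo, hi = min(cells) - 1, max(cells) + 1
        # Rule 30: a cell becomes 1 exactly for these neighborhood patterns.
        cells = {
            i: int((cells.get(i - 1, 0), cells.get(i, 0), cells.get(i + 1, 0))
                   in [(1, 0, 0), (0, 1, 1), (0, 1, 0), (0, 0, 1)])
            for i in range(lo, hi + 1)
        }
    return "".join(out)

print(rule30_center_column(32))   # first 32 centre-column bits
```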
- that the action of creating something actual,
in nature, using real physics
that has real uncertainties built in,
means that creating things
whose outcome is unpredictable
(inherently, and in principle,
once statistics is factored out)
is really quite easy, and moreover,
happens more often than not.
- that creating things in nature
(in the real world)
whose outcomes are *predictable*
takes significant deliberate effort.
- as the actual work of engineering.
- that the actual real world is not a computer model;
that theory is *not* practice.
- ie; no one has proven
that we actually live in
a computer simulation --
as a proxy for a perfectly deterministic universe.
- that anyone attempting to make the claim
that "the real world" and "computer modeling"
are actually and fundamentally strictly equivalent,
would need to produce some really high class
extraordinary evidence.
- that until such incontrovertible
empirical or logical demonstration
is actually obtained, provided, etc,
then the absence of this assumption,
that the real world is not a model,
that they are maybe somehow different,
will be upheld.
- that the action of treating both classes of AI
as if they were the same,
and/or could be treated the same way,
is the sort of informality and lack of discipline
that, when applied in any safety-critical field,
eventually ends up getting real people killed
(or in this case, maybe terminates whole planets).
- that the only people
who usually make these sorts of mistakes
(whether on purpose or actually accidentally)
tend to be non-engineers;
ie; the managing executives, social media marketing,
public relations and spin hype specialists,
and/or the private shareholders/owners/beneficiaries
of the systems/products that will eventually cause
significant harm/costs to all ambient others
(ie; the general public, planet, etc).
- that it is entirely possible for multiple
correct proofs to co-exist
within the same mathematical framework.
- when/where both proofs are correct,
that they can co-exist.
- that proving one thing
does not "disprove" another thing
that has already been proven.
- where example; that geometry, algebra, etc,
remained useful as tools,
and are still applied to purpose,
on a continuing basis, etc,
*despite* the development of various kinds
of proofs of impossibility:.
- squaring the circle.
- doubling the cube.
- trisecting the angle.
- identifying the last digit of pi.
- establishing the rationality of pi.
- another example:.
- that the specific proof
that the Continuum Hypothesis,
as a foundational problem,
was actually undecidable
(independent of the available axioms)
was *not* to suggest
that nothing else
in the entire field of mathematics
was provable/useful.
- that it was never claimed
by/in any discipline of math
(or any other field of study)
that it would be able to solve
every specific foundational problem.
- similarly; that no one
in the field of formal verification,
(or in the field of AGI alignment/safety)
has made the claim that,
"at least in principle",
that the tools already available
in such fields of study/practice
'could even potentially solve
all important foundational problems'
that exist within their field.
:jdg
> - ?; therefore; can we at least make *narrow*
> (ie; single domain, non-self-modifying)
> AI systems:.
>
> - 1; "safe"?.
> - ie; ?; safe for at least some
> selected groups of people
> some of the time?.
>
> - 2; "aligned"?.
> - ie; ?; aligned with
> the interests/intents/benefits/profits
> of at least some people
> at least some of the time?.
Yes; at least in principle.
- as leaving aside possible problems
associated with the potential increase
of economic choice inequality
that might very likely result.
- as distinguishing that 'aligned'
tends to be in favor of the rich,
who can invest in the making
of such NAI systems
to favor their own advantage/profits,
and that the notion of "safe"
tends to be similarly construed:
ie, safe in the sense of 'does not hurt
the interests/well-being of the owners,
regardless of longer term harms
that may accrue to ambient others
and/or the larger environment/ecosystem'.
- that declaring that general AI systems
are inherently unsafe over the long term
is *not* to suggest that narrow AI systems
cannot have specific and explicit utility
(and safety, alignment) over the short term.
- that there are important real distinctions
of the risks/costs/benefits
(the expected use and utility
to at least some subset of people)
associated with making and deployment of narrow AI
vs the very different profiles of risks/costs/benefits
associated with the potential creation/deployment
of general AI.
- that a proof of the non-benefit and terminal risk
of AGI is not to make any claims regarding narrow AI.
:jfc
> - ?; is there any way in which
> the methods that establish alignment/safety
> could be extended from narrow AI to general AI?.
No; that general AI (multi-domain, self modifying,
and operating over multiple levels of high abstraction)
cannot in principle be predicted, or even monitored,
with sufficient tractability closure, over the long term,
to dynamically ensure adequate levels of alignment/safety.
- that saying/claiming that *some* aspects,
at some levels of abstraction,
things are sometimes generally predictable
is not to say that _all_ aspects
are _always_ completely predictable,
at all levels of abstraction.
- that localized details
that are filtered out from content
or irreversibly distorted in the transmission
of that content over distances
nevertheless can cause large-magnitude impacts
over significantly larger spatial scopes.
- that so-called 'natural abstractions'
represented within the mind of a distant observer
cannot be used to accurately and comprehensively
simulate the long-term consequences
of chaotic interactions
between tiny-scope, tiny-magnitude
(below measurement threshold) changes
in local conditions.
- that abstractions cannot capture phenomena
that are highly sensitive to such tiny changes
except as post-hoc categorizations/analysis
of the witnessed final conditions.
- where given actual microstate amplification phenomena
associated with all manner of non-linear phenomena,
particularly that commonly observed in
all sorts of complex systems,
up to and especially including organic biological humans,
then it *can* be legitimately claimed,
based on the fact of there being a kind of
hard randomness associated with the atomic physics
underlying all of the organic chemistry,
that in fact (and not just in principle)
humans (and AGI) are inherently unpredictable,
in at least some aspect, *all* of the time.
:jh8
> - ?; can AGI be designed to be
> more transparent and predictable?.
No; not while still having it be effective
as a *generalized* agent.
- where by game theory alone,
if some agent (say some human criminal)
can predict the actions of the AGI
sufficiently well, then there is now
a vector by which that person can abuse
and disadvantage the AGI, its effectiveness, etc,
and so any such AGI will eventually want to make
its own internal methods and operations
at least somewhat opaque.
> - ?; can the AGI be written/created in such a way
> that is more amenable (transparent)
> to formal (and informal) verification?.
No, not over the long term.
- that there is no valid claim
that all future AGI,
however it is written or considered in the present,
will be similarly transparent and amenable
for all of the future, and under all possible
(known and unknown and changing) environmental states.
- while such code may be 'transparent'
at onset, that does not also imply
that the AGI code is:.
- 1; below the complexity threshold
for indirect prediction tractability
for anything short of
executing the code itself,
even at onset, at design time.
- ie; is not 'amenable'.
- 2; unable to modify itself
(at some future point)
and/or take on environmentally defined aspects
that have the net effect of making that code
less transparent (and more and more opaque)
over time.
- ie; is "amendable", but not 'amenable'.
- 3; sufficiently transparent enough
to ensure that microscopic detail creep
is not occurring in means and ways that
are implicative of important macroscopic effects
(ie, via non-linear amplification effects, chaos,
aka as "the butterfly principle", etc).
- ie; is not 'amenable'.
- where final and forever complete verification
would be dynamically needed in perpetuity;
and where the implications of even one failure
are potentially global life x-risk terminal;
that the absence of even one of the necessary factors
would be sufficient to discount completely
the possibility of such verification
even being suggested as a potential.
- where/even though computer programs
can be designed by engineers
to be more transparent and predictable;
that this does not mean that AGI
can also be designed/engineered
to be more transparent/predictable.
- as a false anchoring bias.
- where for circumstances where the level of criticality is lower,
such as with aircraft, large scale building design,
civil infrastructure like bridges, dams, etc,
that engineers do manage to design sophisticated systems,
reason consistently about the consequences,
and write complicated programs that do work
and perform to various safety specifications.
- where in each of these circumstances,
the consequences of mistakes,
though maybe very costly in the short term,
are not self catalytic
to the point that a single bad mistake
would consume the whole world
for all of the rest of time.
- also; the reasoning of engineers
about complicated systems
is not always right;
engineers often also make mistakes,
resulting in aircraft that crash (Boeing 737 Max)
or cars with stuck gas pedals (Toyota's ETCS)
or medical systems that kill (Therac-25) --
all of which were unsafe due to code problems,
*despite* the (maybe failed) application
of formal methods.
- where considering AGI risk potentials
as being 'one mistake only'
before unending consequence,
that it would be strongly inappropriate
to allow anyone to suggest that we rely on
the skill of fallible human engineers
as hyped to perfection by marketing executives
that continually assure us, yet again,
that a technology to "replace humans"
is actually "safe" (will not somehow be worse),
while irreversibly betting the future
of the entire human race,
along with all other life on the planet.
:jke
> - where insofar as a 'superintelligence'
> can be (implement) a virtual machine;
> ?; does that mean that any virtual machine
> can potentially implement a super-intelligence?.
> - as ?; is the orthogonality thesis true?.
- while it is the case that 'generalized superintelligence'
also inherently implies 'Virtual Machine (VM) capability',
that this does not imply, is not equivalent to,
the idea that just any VM "could become" superintelligent.
- ie; that it is *not* the case
that any/all generalized VM machine
will be/become "superintelligent",
if given enough time.
- as that there are other requirements
than just the notion of 'compute' itself
that are needed/required for the notion
of "intelligence" to apply.
- that 'intelligence' inherently refers to some
selection basis of action
that has some direction towards,
or favors/benefits
some objective or goal --
inclusive of implicit/unstated objectives/goals
of "continuing to be intelligent" (to exist, etc).
- as that the notion of "intelligence"
and the notion of "compute" are strictly distinct.
- where in the domain of theory, of pure mathematics,
that the orthogonality thesis is true.
- where in the domain of physics,
in the real world, in actual practice,
that the orthogonality thesis is false.
- as that there will always be some actual
substrate characteristic dependence,
inherently associated with
the existence of the abstraction,
such that the characteristics of the VM
cannot not "leak through"
to affect the intentionality basis
of whatever 'intelligence system'
is built on top of it.
- as that there cannot not be
inherent physical "side channels"
which also have perfected causative relevance.
- as that no effect ever has any single cause,
in any single domain;
as that all effects have multiple causes
in multiple domains.
- that the level of multiple domain entanglement
is non-zero for any real physical system,
such that actions and events in one domain
can be strongly associated with
outcomes in other domains,
such that effective conditional power in one domain
can become effective conditional power in another domain.
- as that there will always be
more than one domain of events
that are the "cause" of any effect
in any other domain.
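- where, as an illustrative sketch of such
a physical 'side channel'
(hypothetical Python; the secret and the per-character
delay are assumptions of the example);
that the logical output of a comparison
can say nothing about a secret,
while its physical execution *time*
still leaks information about that secret:.

```python
# Hypothetical sketch of a "side channel": the *logic* of this comparison
# never returns the secret, yet its physical execution time depends on how
# many leading characters of the guess are correct, so information leaks
# through the substrate (time) rather than through the declared outputs.

import time

SECRET = "hunter2"

def insecure_equals(guess: str) -> bool:
    for a, b in zip(guess, SECRET):
        if a != b:
            return False               # early exit: running time now encodes
        time.sleep(0.001)              # (exaggerated per-character work)
    return len(guess) == len(SECRET)

def timed(guess: str) -> float:
    t0 = time.perf_counter()
    insecure_equals(guess)
    return time.perf_counter() - t0

print(timed("zzzzzzz"), timed("huntzzz"))   # the second call takes longer
```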
- where another example:.
- where in actual/typical practice (in the real world);.
- that a typical engineer does not worry about
abstract theoretical concerns like the halting problem
because they can always go out of band
and reset the computer via the domain of physics.
- as an indication of the real difference between
theory and practice, mathematics and physics.
- that the orthogonality thesis and the notion
of substrate independence is/are true in theory
and actually false in practice
(at least some of the time).
- where in the domain of computer science,
which has a mixture of validity concerns
(what happens in the logic of the program code)
and soundness concerns
(what happens in the user interface,
and/or the robot to world interaction,
(ie; via sensory input,
and action/effect/actuator output)).
- as that the orthogonality thesis
will apply (be true) to the extent that
the compute/processing infrastructure
is relatively independent of,
and unaffected by,
causative effects in the domain of the real world,
and will be false (not true) and not apply
to the degree that the compute substrate
cannot ever be fully independent of
real world effects and events.
- where for example:.
- where distinguishing between build time temperatures
and run time temperatures (note 1);.
- where/if the compute substrate
tends to fail (make errors)
if it gets too hot,
then/that/therefore,
the outputs of that system
had best be selected in favor of ensuring
that the secondary effects
of the actions taken by the AI
do *not* also indirectly result in
substrate overheating.
- as an inherent "objective learning feedback channel"
which is also relevant to
whatever adaptation/optimization algorithm
is in effect (ie, will apply universally,
regardless of the specific AGI implementation).
- as a notion inherently connected to
the idea of "benefit",
in terms of outcome
(fewer errors,
as a better basis of choice, action, effect,
in both sense input, processing, and output,
and therefore of the optimization itself),
and therefore also of
the basis of choice,
ie; of "intelligence",
as strictly defined.
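- where, as an illustrative toy model
(hypothetical Python; the temperature and error-rate
equations are assumptions invented for the sketch);
that if the substrate makes more errors when hot,
and the agent's own actions heat the substrate,
then *any* error-driven selection rule
will also select for keeping the substrate cool,
as an objective nobody explicitly programmed in:.

```python
# Hypothetical toy model: the learner's error rate rises with substrate
# temperature, and temperature depends on the learner's own action choices,
# so any error-driven selection rule ends up also selecting actions that
# keep the substrate cool -- an objective no one explicitly wrote down.

import random

def error_rate(temp_c):
    # hotter substrate -> more computation errors (assumed toy relationship)
    return min(1.0, max(0.0, (temp_c - 40.0) / 60.0))

def run_action(action_heat, temp_c):
    temp_c = 0.9 * temp_c + action_heat        # the action heats the substrate
    ok = random.random() > error_rate(temp_c)  # error channel "leaks through"
    return ok, temp_c

scores = {0.5: 0, 5.0: 0}                      # a "cool" vs a "hot" action variant
for heat in scores:
    temp = 30.0
    for _ in range(1000):
        ok, temp = run_action(heat, temp)
        scores[heat] += ok

print(scores)   # the cooler-running variant reliably accumulates more successes
```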
- whether a program continues to run
(and with what resulting outputs/outside effects)
clearly does not depend wholly on
the abstract logic of the program itself.
- as illustrative of the above:.
- ?; are there any software programs
running on any regular computer
anywhere in the world,
that can both absolutely predict and prevent
the failure of some/any key component
in its own hardware,
(say in the power supply of that computer,
or in the ALU module of the CPU itself),
and thus prevent itself, as a program,
from being preemptively and unexpectedly
pushed into non-existence?.
- ?; could such a program, in principle,
even be created?.
- ?; how might it have to deal with
the possibility that some ambient human
might trip over the power cord,
or the fact that the local nuclear power station
might unexpectedly have to discontinue grid supply
for some reasonable safety reason?.
No.
- where another example:.
- that the legal system defines a set of laws, a code,
which is to be run on a "human compute substrate".
- where insofar as people (police, court systems, etc)
do only and exactly what is specified by the code,
that some assurances of the fairness of the "system",
of jury trials, evenness of law application
(and interpretation), independence of money, etc,
is at least partially assured (can be sometimes assumed).
- however; a human acting as a VM running some program
inherently cannot be any *more* predictable
than whatever is the underlying base level of randomness
that is inherent in the local ambient environment.
- that humans are not designed to be predictable.
- that humans are not designed --
are not the result of a design process
(ie; that natural evolution is not design).
- that it is hard to anticipate what they
(any human) will actually do, in any real
context/circumstance.
- as the inherent difference between
morality, and ethics,
and why "trolley problems"
are not really final explorations
of what is actual ethical reasoning.
- that humans are unpredictable
(both in principle, and in practice).
- as partially due to the common facts
that they can perform any arbitrary recipe
(within the limits of available resources),
and/or implement some (self chosen) plan/algorithm,
and have multiple overlapping goals/objectives,
and despite all of that,
still have implied, unconscious drives, and needs,
and instinctual motives, and trauma based choices,
as shaped over the long term
by the/their environment, upbringing, etc.
- while it is possible to 'factor out'
lots of common aspects of the behavior and choices
of any other specific person (that you choose to know well);
that that factoring will simply result in
an uncompressible remainder.
- that no form of compression of any data stream
factors out *all* information to exactly zero bits;
which would be the very meaning of 'fully determined'.
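- where, as an illustrative sketch
(hypothetical Python; the sample streams
are assumptions of the example);
that lossless compression removes the regular,
modelable part of a stream,
but never factors everything out to zero bits,
and gains essentially nothing
on a 'hard randomness' stand-in:.

```python
# Hypothetical sketch: lossless compression removes the regular, "modelable"
# part of a stream, but never factors *everything* out to zero bits -- the
# irregular remainder stays, and for random data there is no gain at all.

import os
import zlib

regular = b"abcdefgh" * 4096                 # highly patterned behaviour
irregular = os.urandom(len(regular))         # "hard randomness" stand-in

print(len(regular), len(zlib.compress(regular)))      # shrinks dramatically
print(len(irregular), len(zlib.compress(irregular)))  # barely shrinks, if at all
```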
:jp6
> - ?; is there any way, for anyone,
> to know for sure, that any other person
> is 'safe' and/or 'aligned'
> with the general public interest?.
Not via any sort of purely legal or contractual process,
nor by any past reputational/observational process;
(ie; that which might be attempted to be used for
making predictions of future behavior
as if it was just based on past behavior).
That at best, we might have some reasonable indications
of "trustworthiness" if we had some sense as to the
actual basis of their choices, and some assessment
as to the constancy/reliability of that basis.
- while knowing inner intentions/objectives/desires,
(along with some cumulative estimate of their skill
in actually translating such wants/needs/desires
into actual effective causative action effects)
is a far more reasonable/tractable means
by which to predict another person's 'alignment'
to things like 'public safety' and 'well-being'
(as commons benefit, rather than private benefit, etc);.
- that it remains the case that there is no such thing
as perfected observability;
that one person, interfacing with another person
at a purely physical level,
can never fully or completely know
the subjective interiority of that other person,
operating at a much higher level of abstraction.
- when/where in the same way that the 'outputs'
of the 'virtualized execution'
of the 'algorithm of the legal code'
can be compromised and or sometimes fully corrupted
by factors purely associated with the substrate;.
- ie, the participating humans themselves,
the lawyers, etc, expressing their own interests,
sometimes acting purely for their own private benefit, etc,
can sometimes shift and distort the outcome,
the results of the trial via selective rhetoric,
shifts in the timing and resources associated with
applied events (brief filings, discovery reports,
forms of temporary stonewalling, rent seeking, etc);.
- ^; that it can also be observed (it is also the case)
that there is no situation of "perfected" independence
of the algorithm being run, and/or its outputs,
that is not strictly dependent on the integrity of the VM,
and that moreover, that no VM has perfected integrity
independent of all physical/environmental factors.
- as that all pure machine silicon-based CPU substrates
are at least partially sensitive to the effects of
radiation, wide temperature swings, high voltage fields,
strong ambient magnetic fields, radio frequency effects,
strong mechanical vibration and impacts,
adverse chemical agents, etc.
- ?; is there some (clearly mistaken) implied belief
that a/the/any/all superintelligence will care more
than humans about human interests?.
- as roughly analogous to a false belief in divinity;
where even someone who becomes 'scientifically atheist'
does not thereby give up their want/need/desire
for some parental god-like figure to solve all
of their hard/impossible problems.
- that there is exactly zero principled reason
to suggest/think that an AGI would be more likely
to be aligned/safe
than any person would be.
- where after all, that it is quite easy
to know that one engineer
could be both unsafe and unaligned
with any other engineer
(ie; one engineer, could, in principle,
shoot any other with a gun).
:jrc
> - ?; would not the Rice Theorem apply to the AGI too,
> insofar as it also cannot know if the code it is to run
> will be "safe" for itself also?.
That is correct.
- where in the same way that a typical engineer
does not worry about the halting problem
in actual/typical practice,
(because he can always go out of band
and reset the computer via the domain of physics),
that an AGI would be similarly unconcerned
about halting problems, Rice Theorem, etc,
(and similarly be *also* unconcerned
about AGI safety/alignment, etc)
since it could (via it also having 'generality')
go out of band and 'halt'
(discontinue using/modeling, etc)
whatever 'subroutine' it is invoking.
However, that the safety implications for the AGI
are of a different nature than the implications
for humans and the world.
Where the impact/harm of a misaligned program
is, to the AGI, a purely local effect,
in the space/environment of the AGI,
the total takeover of the environment
by the AGI system itself
constitutes a much more complete, non-local,
and permanent-in-time effect with regards
to human/life safety.
:jt8
> - ?; is there any possible way to have
> or construct some notion of AGI
> which is *not* general in the sense
> of being able to arbitrarily interact with
> (and/or emulate, or simulate, or call, or integrate)
> other unknown, and potentially unsafe code,
> (and/or code which has eventually unsafe,
> inherently unpredictable outcomes, etc).
- ?; how can we, or anyone, ensure that an AGI
will not ever extend or increase its "generality"?.
> - ?; is there any process or method or design
> by which an AGI could be constructed
> in such a way
> that it is permanently and absolutely prevented,
> forevermore,
> from emulating, modeling, or integrating
> and then potentially calling, or consulting,
> any other arbitrary process/code/algorithm?.
- ?; is there any way to construct an AGI
that will/can somehow be prevented, forevermore,
from extending or increasing its "generality"
by:.
- 1; adding to itself some other module or program?.
- 2; emulating and "running"
some other arbitrary program,
so as to get advice as to its own choices?.
- 3; maybe even just using some external service
like any other client of an API, maybe electing to
treat the output of that service
as some sort of oracle or influence for its own choices?.
- ^; no; such design concepts
are inherently self contradictory.
- where in any of these cases,
like attempting to predict the future
of/for anything else;
that we cannot know in advance, for sure,
the specific nature of any of these programs,
regardless of its method of integration
(call internally, call externally, or emulate).
- that even fixed deterministic programs
can easily be made to call
other unknown arbitrary programs,
ones that have unknown/unsafe side effects,
and thus, by proxy, become unsafe themselves.
- that the fact that we cannot completely and accurately
predict the outcome of any deterministic process
calling any other non-deterministic process
(ie; one which contains hard randomness,
perhaps by consulting a Geiger counter)
means that determinism/tractability
is actually the *weaker* constraint.
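- where, as an illustrative sketch
(hypothetical Python; os.urandom stands in
for the Geiger counter of the text);
that a fixed, fully known, deterministic control routine
becomes unpredictable from run to run
as soon as it consults an out-of-band entropy source:.

```python
# Hypothetical sketch: a perfectly deterministic control routine that
# consults an external entropy source (os.urandom standing in for the
# Geiger counter of the text).  The caller's code is fixed and fully known,
# yet its run-to-run behaviour cannot be predicted in advance.

import os

def deterministic_controller() -> str:
    reading = os.urandom(1)[0]          # out-of-band, non-deterministic input
    if reading < 128:
        return "open valve"
    return "close valve"

print([deterministic_controller() for _ in range(8)])   # differs every run
```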
- that any particular program,
such as, for example, one
which is selected for use in some AGI system
(at the moment of its being "turned on"),
that has more than the very low level of complexity
minimally needed to be and become,
in one fashion or another,
"Turing Complete",
and which is therefore able to extend to also include
any other arbitrary program
as a sub-aspect of itself,
is therefore fully within
the scope of Rice Theorem considerations.
- as that there is no decision procedure
which can determine in advance
if any given property,
such as that of 'safety' and/or of 'alignment'
applies to the overall AGI program
for any future moment of its execution.
- that there is no formal or practical way,
even in principle,
to establish even a fairly limited set
of adequate bounds
as to whether or not
any sufficiently complex and general system
with self-modifying properties
(ie; a learning system,
a system that adapts itself and/or its behavior
to its environment, operating context, etc,
as a self adaptive system, etc),
will eventually, and even very likely, exhibit
significantly unsafe and unaligned behavior,
create unsafe and unaligned outcome conditions,
(and will thus also result in such conditions
creating enduring manifest harm, etc).
- that the assessment is that
the general costs and risks
associated with AGI development
will (eventually) greatly exceed
any reasonable (non-hyped) assessment
as to any potential benefits
that the use of such a system/technology
could ever potentially have,
in any context more generalized and extended
than even one person's own limited lifetime.
- that attempting to build AGI
is a strictly negatively asymmetric "bet"
both from the perspective
of a single large institution of people
and (more so) from the perspective
of all living persons/beings and their descendants:
- where such an attempt fails,
there is no (direct) benefit
(a loss in comparison to opportunity costs forfeited).
- where such an attempt 'succeeds',
it opens up uncountably many unknown pathways
that converge over time
on ecosystem-wide lethal changes
to conditions of the global environment,
with any envisaged benefits of AGI and
engineered-for alignments
of AGI functions with human intent
increasingly fragile
to recursive nonlinear feedback between
changes introduced by/in parts of AGI
and the larger containing environment.
- that this is analogous to playing Russian Roulette
with the lives of all (human) beings smeared out over time,
with any (illusory) backer of (promised) benefits long gone.
- that betting on AGI
converges on existential ruin.
- that this does not, in any way,
contradict the idea and fact
that we can make non-recursive systems,
that are known to be useful
to a wide range of people.
- that any generalized superintelligent
learning/adapting/optimizing machine (AGI, APS, etc)
will be able to execute arbitrary programs as subroutines.
- as an inherent functionality, truth of being, etc.
- that a "safe" or "aligned" machine/process/algorithm
that calls/invokes (via whatever method,
direct or indirect or simulated)
some/any unsafe or misaligned
machine/process/algorithm/code
therefore (likely) becomes itself unsafe/unaligned
(since it also cannot predict/assess anything which,
in principle, cannot be predicted or assessed,
any more than we could in the 1st place).
- where any (generalized) (self learning/modifying) program,
can in principle call or use any other program,
then to assure alignment/safety of a general AI system,
we would need to ensure alignment/safety of *all* possible
future code/systems/procedures that it could invoke,
or that those systems might themselves invoke
(especially if they themselves are generalized intelligent
or semi-intelligent agents, etc).
- as a clearly impossible safety requirement.
- where by way of example; let us posit explicitly that
by some arcane literary magic,
we have created an instance of a general AI,
an artificial agent or metal robot, etc,
that inherently and perfectly follows
Asimov's Three Laws of Robotics.
- that the assumption here is that by somehow
*requiring* that the three laws are perfectly followed,
that we can then assert that that robot/intelligence/agent
is therefore 'safe', and 'aligned' with human interests
and well-being, etc.
- where/by making the robot AI simpler, more finite,
more deterministic, and more like narrow AI,
that the chances are increased that 'formal methods'
could potentially be used, and that some clever
engineering could be done, so as to make and ensure
that our (inherently finite)
"Generalized Asimov AI agent" is 'safe' and 'aligned'.
- where given an Asimov Robot, we can ask:.
> - ?; is it ever impossible, in principle, to make
> a known verified and perfected/proven "safe" system
> of generalized agent intelligence
> operate in a way that is somehow unsafe/unaligned?.
- at which point, we can notice the necessary answer:.
No, it is not impossible to take a safe system/program
and make it do unsafe things, and create unsafe outcomes;
all that is needed is to allow it to combine or access
any other unsafe/unverified programs/code.
All that is needed is to have any situation
where the 'safe' robot/program consults with and/or
is influenced in its output choices/actions/behaviors
(in some/any way) by some other unsafe device --
such that that unsafe device, for instance,
can figure out how to deceive the robot
into doing things that are unsafe/unaligned,
but which would remain undetected/unnoticed by
the Asimov robot's three-laws detection system.
- that this is itself a kind of optimization problem,
where the 'unsafe system' can maximize its capability
to specifically deceive the Asimov Robot.
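- where, as an illustrative toy model
(hypothetical Python; the filter, labels, and scoring
are assumptions invented for the sketch);
that searching only among actions accepted by
a fixed, surface-level 'three laws' style filter
still reliably finds accepted actions
that maximize a hidden adverse effect:.

```python
# Hypothetical toy: a fixed "three laws"-style filter and an optimizer that
# searches only among actions the filter accepts, maximising its own score.
# Because the filter is static and only sees surface features, the search
# reliably finds accepted actions whose hidden effect is still adverse.

import random

def three_laws_filter(action: dict) -> bool:
    return "harm" not in action["label"]           # naive surface-level check

def hidden_effect(action: dict) -> float:
    return action["payload"]                       # real impact, unseen by filter

candidates = [{"label": random.choice(["assist", "harm", "reroute", "defer"]),
               "payload": random.uniform(-1.0, 1.0)} for _ in range(10_000)]

accepted = [a for a in candidates if three_laws_filter(a)]
worst_accepted = max(accepted, key=hidden_effect)  # optimization pressure
print(worst_accepted)   # passes the filter, yet maximises the hidden impact
```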
> - ?; can we have that Asimov Robot somehow determine,
> in accordance with its own desire
> to follow the three laws and ensure that
> "humans do not come to harm"
> through its action/inaction;
> that it will do its very best
> to only engage with programs/code/people/situations
> who are themselves safe, and/or which,
> in extended interaction with the Robot,
> have only safe outcomes?.
- where for every finite robot/process that is safe,
that it will necessarily interact with
outside real-time processes,
as connected within a (necessarily larger) physical world
in ways that would *not* be able to be
detected by that lesser agent
as being unsafe/unaligned, etc.
> "I would use this Ring from the desire to do good;
> But through me, it would do great evil".
-- paraphrase of Tolkien, in the Lord of the Rings.
- that the basic problem is that any/whatever code/program
that is coming into interaction with the Asimov Robot
will have to be examined by that Robot
to determine/predict safety outcomes,
which itself makes that process subject to the Rice Theorem.
- that it is absolutely inherently the case
that there is no such thing as a superintelligence
(no matter how it is constructed, what it is made of,
and/or what algorithms or learning/adapting process it uses)
that will somehow be "more able" to predict the outcome
of calling any specific subroutine,
(in violation of the Rice Theorem)
than any human engineer/mathematician would be.
- that anything that is mathematically impossible,
including the halting problem and the Rice theorem,
will remain that,
even for a superintelligence.
- that anyone who claims otherwise
is either a marketing/sales person,
(trying to sell you something)
or a politician/executive
(ie; trying to deceive and/or seduce you,
by pretending that mathematical truths/facts
are somehow negotiable).
- that the idea that "everything is negotiable",
and that there are no facts, only politics,
is true only in their narcissistic idealized world;
that fiction mostly only begets more fiction.
- that the notion that any agent is able to be
'having the property of safety/alignment/etc'
is now requiring that such agent
never, at any point in the future,
come into contact with some other, arbitrary, unsafe
agent/process/algorithm (program, model, recipe etc)
that it might somehow fail to recognize as having
unsafe implications.
- that nothing is going to be a perfect prognosticator
at predicting the future of everything else
(all other processes/algorithms/programs
locally interacting in the entire universe),
and thus know what to interact with
(and/or be influenced by)
and know what to not interact or be influenced by,
even indirectly, through all possible other channels
of interaction, overt and covert, etc.
- as that not only must our "perfectly proven safe agent"
successfully predict the outcomes of its interactions
with any single other (potentially intelligent) agent/algorithm,
but it must also predict all possible interactions
via all possible channels of such interactions
of all such other agents/algorithms, etc.
- that the finite and bounded
will not ever be able to predict,
accurately, the interactions of that which,
though also finite,
is at least potentially unbounded,
or at least, significantly greater than itself.
- as also applying to all generalized agent
intelligence/superintelligence,
by the mere fact
that its computational ability
is both finite and bounded
in time, space, and energy.
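- where the following minimal sketch (in Python;
a standard diagonalization argument, with illustrative names)
shows why no fixed, bounded predictor
can be right about every other process:
any environment able to consult/simulate the predictor's own rule
can simply do the opposite of whatever is predicted.

    # Diagonalization sketch: any published/fixed predictor can be defeated
    # by a process that consults that predictor and then does the opposite.
    def contrarian(predictor):
        """A process that asks the predictor about itself, then defies the forecast."""
        forecast = predictor(contrarian)
        return "defect" if forecast == "cooperate" else "cooperate"

    def fixed_predictor(process):
        """Stand-in for any concrete, bounded prediction rule a 'safe agent' carries."""
        return "cooperate"   # whatever it returns, contrarian returns the other

    assert contrarian(fixed_predictor) != fixed_predictor(contrarian)

- as showing that the failure is structural,
not merely a matter of how much compute the predictor has.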
- that it will always be possible (and maybe even likely)
to have a known verified and perfected/proven "safe" system
operate in a way that is unsafe and/or unaligned
if it is ever allowed to interact with any other algorithm
that cannot (by any technique) be itself proven to be safe.
- where there is always at least one such algorithm/process,
and where it is impossible, in general,
to determine which ones are which,
then the only safe/"aligned" thing it can do
is to not interact with any other process/agent/system.
- as that even the interactions
between any hypothesized "strictly safe systems"
and nearly anything else can result, in aggregate,
in overall unsafe/unaligned outcomes.
:jwc
> - ?; is there any way that an AGI could,
> even in principle,
> be engineered in such a way as to not
> be able to execute arbitrary subroutines?.
No.
- to claim that a Turing Equivalent Machine
could even conceivably be compelled to act
in non-Turing Equivalent ways
is simply illogical.
- as basically strictly equivalent
to claiming that "all AGI is already inherently safe".
- ?; how do we prevent people from tautologically assuming
that we will have what we are wanting to have --
that AGI is going to be composed of 'aligned code'
so that it makes sure it remains amenable to 'tractability'
so that we can continually verify it is 'aligned'?.
- ?; how is this not assuming
what some particularly motivated reasoners
will be wanting, somehow, mistakenly, to prove?.
- ?; how is anyone ever going to ensure that something
that already inherently has Turing equivalent generality
is not also going to have Turing equivalent generality?.
- ie; the mere fact of the claim
is itself already a direct contradiction.
- that the manner of how an algorithm
is specifically divided up into subroutines
is an arbitrary convention for the convenience
and understanding of the programming engineer --
it has no ultimate formal basis beyond that.
- where from the perspective of a learning algorithm,
the boundary between what is 'main code'
and what is a subroutine
is completely arbitrary --
the changes associated with learning are exactly that -- changes
to the operative code/content -- wherever they land,
and however they are expressed, remembered, and/or recorded.
- that it is hard to see how it could be possible,
even in principle, to conceive of some notion of
*generalized* learning process --
ie, one that can self expand
so as to operate on/over/within any domain
of action, inclusive of itself --
such that it can be wholly and absolutely prevented
from executing any arbitrary algorithmic subroutines,
for all of future time,
without directly contradicting at least one
of the notions of 'generalized', 'learning', or 'process'.
- ie; either it is some sort of
self modifying algorithmic intelligence
(appropriately adaptive to its environment)
or it is not.
- if it is, then it is inherently unpredictable,
insofar as we never know what it is going to learn,
and therefore what it will actually do
(its actual behavior; thus its safety),
any more than we can, in principle,
predict all of the future,
absolutely and finally.
- that the notion of 'generalized learning process'
inherently cannot not somehow imply
some real form of 'self modification' (ie; actual changes)
which itself cannot not somehow imply
some notion of 'arbitrary code execution' --
ie; potentially including calls
to external processes, virtual simulation and modeling, etc,
*all* of which are inherently unpredictable in advance.
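- where a minimal sketch (in Python, with all specifics
assumed purely for illustration) shows why 'learned content'
and 'executable subroutine' are not formally separable:
if what is learned is any sufficiently general rule representation,
then applying what was learned just *is* executing code
that did not exist when the system was designed/verified.

    # Sketch: a 'learner' whose learned state is itself executable content.
    # (Illustrative assumption: learned rules are stored as small Python functions.)
    class TinyLearner:
        def __init__(self):
            self.rules = []                  # learned state == executable content

        def learn(self, rule):
            """'Learning' here is just acquiring a new behavior-determining rule."""
            self.rules.append(rule)

        def act(self, observation):
            """Behavior is whatever the accumulated (never pre-verified) rules compute."""
            value = observation
            for rule in self.rules:
                value = rule(value)          # arbitrary code execution, in effect
            return value

    agent = TinyLearner()
    agent.learn(lambda x: x * 2)             # a rule no designer ever reviewed
    agent.learn(lambda x: x - 3)
    print(agent.act(10))                     # behavior was not fixed at design time

- as illustrating that the 'main code' here is trivially simple and inspectable,
while the actual behavior lives entirely in what was learned.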
- that not only can we not expect, in principle,
to be able to model all aspects of
the generalized process of modeling,
we also cannot expect to be able to
set explicit finite limits
on even the ranges of changes involved
in arbitrary learning --
ones that actually affect sensing,
the ranges of abstractions sensed,
the internal state transforms that result,
and the behavior and outcomes of that, etc.
- that ensuring the safety of any general AI system
would require ensuring the safety of *all* future,
and presently completely unknown,
programs/code/systems, etc,
that the AGI could potentially
come into contact with,
attempt to execute, integrate into itself,
consult as a client, etc.
- where/insofar as some notion of *prediction*
is fundamentally and intrinsically necessary
for *any* possible concept of control/correction,
as the feedback channel inherently necessary
to ensure any possible concept of safety, alignment, etc;
and where (as per all of the above) such prediction
is strictly impossible for general/self-modifying systems;
then/that/therefore the notion of AGI safety
is explicitly impossible.
- that therefore it is *not possible*
to show or establish
that *any* superintelligent/general AI system
can be made safe.
:jyu
> - ?; can anyone at all,
> even in principle,
> even with unlimited resources,
> somehow "make" *general* AI systems "safe",
> (or "aligned", etc,
> for any reasonable definitions of these terms,
> as applied over any reasonable interval of time)
> for *any* actors/agents
> which are not themselves
> strictly equivalent to,
> or functionally equivalent to,
> or at least sub-aspects of,
> the AGI embodiment itself?.
> - ie; as safe for all outside humans
> and other (carbon-based living) actors?.
No.
- that it is (both mathematically and physically) impossible
to ensure that a sufficiently complex
General Artificial Intelligence (or "superintelligence")
does not perform an unsafe action.
- as that there is no possibility
of designing complex AI systems
without unintended consequences.
- that the basic possibility of unintended consequences
to/for/with any action/choice/system
will forever remain non-zero and potentially unbounded.
- ?; what is the likelihood of unintended consequences?.
- as a better question.
- ?; does the likelihood of unintended consequences
increase with:.
- the complexity of the intentions?.
- the complexity of the beings/agents
implementing those intentions?.
- the likelihood of those beings/agents
having their own (potentially different) intentions?.
- ^; yes, for *each* of these aspects.
- that the likelihood of unintended consequences
increases with some product or exponent of these factors.
- where these factors can combine
more than additively and more than multiplicatively,
in various forms of feedback cascade, etc.
- that the likelihood of unintended consequences
with a complex system (inclusive of self definition recursion)
is strictly very much greater than for any simple system
(which lacks these critical features).
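- where the following toy calculation (a Python sketch;
all probabilities are illustrative assumptions, not measurements)
shows the intended qualitative shape:
even a small per-factor chance of an unintended outcome
rises rapidly with the number of interacting factors,
and faster still where interactions add their own failure channels.

    # Toy model (illustrative numbers only): chance of at least one
    # unintended consequence as the number of interacting factors grows.
    def p_any_unintended(n_factors, p_each=0.01, p_pairwise=0.001):
        """P(at least one unintended outcome) for n factors plus pairwise couplings."""
        n_pairs = n_factors * (n_factors - 1) // 2    # feedback/coupling channels
        p_all_ok = (1.0 - p_each) ** n_factors * (1.0 - p_pairwise) ** n_pairs
        return 1.0 - p_all_ok

    for n in (1, 10, 100, 1000):
        print(n, round(p_any_unintended(n), 4))
        # roughly: 0.01, 0.14, 0.997, ~1.0 with these assumed rates

- as showing only the shape of the claim
(super-additive growth with complexity and coupling),
not any specific real-world probabilities.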
- that there is a risk of a 'Motte and Bailey defense' (MBD)
being attempted in the space of
the claim "we can make AI systems safe".
- that any argument with an underlying MBD format
is a type of logical fallacy
similar to equivocation.
- as an argument of rhetoric,
of human fallible persuasiveness,
rather than one of actual truth.
- where the notion of 'adequate'
(as applied to 'safety')
for any proposed action 'X'
is to show that the expected risk/harm/loss,
across all categories,
to the value of what is to be kept "safe",
is sufficiently low -- ie; _strictly_lower_ --
than the possible cost/risk/harm
incurred by *not* taking that action
(ie; in the form of opportunity cost, etc);.
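- where one way to state that 'adequate' criterion explicitly
is given in the sketch below (in Python;
the variable names and decision rule are a reading of the above,
not any established standard):

    # Sketch of the 'adequate safety' criterion described above:
    # action X is 'adequately safe' only if its expected risk/harm/loss
    # is strictly lower than the expected cost of *not* taking action X.
    def adequately_safe(p_harm_given_x, loss_if_harm, opportunity_cost_of_not_x):
        expected_loss_from_x = p_harm_given_x * loss_if_harm
        return expected_loss_from_x < opportunity_cost_of_not_x

    # Illustrative numbers only:
    print(adequately_safe(0.001, 1_000, 100))          # True: 1.0 < 100
    print(adequately_safe(1e-9, float("inf"), 1e12))   # False: any chance of terminal loss

- as making plain why a non-zero chance
of an unbounded/terminal loss (extinction, etc)
can never satisfy the criterion,
no matter how large the claimed opportunity cost.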
- ?; is there (even in principle) any practical way
to establish/ensure adequate levels of safety
in future AI systems?.
- ^; that there is no/zero possibility
of anyone (of any human, especially)
designing complex general (self intentioned) AI system(s)
without there also being a significant excess potential
for adverse, unintended, harmful consequences/risks
(which may be irreversible, to humans, the biosphere, etc).
- ie, as risks/costs/harms
resulting from 'X' (ie; the use of AGI systems,
the making of such choices to make/deploy AGI, etc)
that are well in excess of any realistic
and appropriately generalized assessment
of the purported benefits/profits of 'X'
(to who, when, where, etc).
:k2q
> - ?; what are the overall implications/assessments?.
- where/when it comes to superintelligent/AGI systems;
any real notion of safety and/or alignment
with (any) human (or life) interests
is strictly, formally, finally, impossible.
- that this fact/result remains true
no matter how one defines "safety",
or what processes/methods one attempts to use.
- where/for no matter how exhaustive
one thinks some review process is;
where for superintelligent/AGI systems;
that there will be actual programs/systems
which will fool/defeat that review process.
- that the Rice Theorem
will always end up being relevant
to any code/systems
that have high x-risk factors.
- where given all of the math and empirical results
already in place, from *multiple* distinct fields of study;
that the indications point very strongly
in the direction that research into
formal verification of AGI safety
*is* actually and completely pointless
(ie; that success defined in this way
is *always* impossible).
- for even more examples; see the proofs,
arguments and cases for uncontrollability
(absence of safety/alignment, etc)
collected by Yampolskiy et al.
- that the Safe AGI impossibility result
(as established via whatever available methods
will make the most sense to each private reader)
does completely preclude any meaningful work
on *general* AI.
- where considered on a neutral objective
apples-to-apples comparative basis;
that the indications of net harm/risk
are very much stronger
than are any corresponding indications
that there is any actual benefit at all, to anyone,
to be had from *any* generalized AI
development and deployment effort.
- while there can *maybe* be some solutions
to practical narrow AI safety challenges,
this does *not* imply that there can be
any possible foundations for real safety
in *any* possible future superintelligent system.
- while it might be hard/difficult
to predict the outcome of some narrow AI systems;
that it will still be strictly impossible
to predict *any* aspect of
the output/effects/consequences of
a superintelligence.
- where with a proof of AGI eventual terminal risk;
that the following ideas/notions
are also forever established:.
- 1; that there was *never*, even in principle,
any way to make general AI safe, or aligned, etc.
- as a basic impossibility.
- 2; that there was never any actual utility/benefit/profit
to/for/of the creation and deployment of general AI
for anyone, at all, *ever*, over the long term.
- ^; and that therefore we should also 'put away'
the tools and techniques used to develop such systems
(and all of the associated concepts, IP, etc).
- as that they are inherently unmitigated risks:.
- to all peoples on the planet, for all future time.
- to all of the rest of life on the planet
and maybe even inclusive of the planet itself.
- that a failure to do so (and/or to develop, enact,
and enforce effective and realistic policy and procedure
for all institutions and communities internationally)
is equivalent to a failure of government.
- where the final function of all government
is to protect the land and the people and/or
to try to ensure that the land and people survive,
and then, if it is really good, thrive.
:k5u
> - ?; what is the recommended "call to action"?.
- where/given that general AI safety/alignment
is considered/proven to be impossible;
that the responsible things to do are:.
- 1; to *not* attempt to design/build/use AGI.
- as "do not raise that which you cannot banish!".
- 2; to acknowledge and support efforts
to ensure that no one mistakenly believes
that *ANY* AGI/APS, etc would/could be
"desirable/practical" and/or,
"economically advantageous/beneficial",
to anyone at all, *over the long term*.
- as that any and all short term gains
are far outweighed by costs/harms
over the long term.
- ie; to *not* attempt to convince/delude/deceive
other people/investors into thinking, falsely,
that "safe AGI" is "in principle", possible.
- 3; to socially/legally sanction anyone
who, in any way, attempts to build
*any* sort of superintelligent/AGI/APS,
even indirectly, by accident, etc,
to the absolute maximum extent possible.
- ie; that no one should allow anyone
to play permanent planetary roulette
with something which in all cases
is equivalent to a "doomsday machine";
ie; it is maximally irresponsible.
:notes:
- 1; this example is in contrast to
the hazard of 'getting wet'.
- where in actual computers,
such as a laptop sitting on my desk,
the cpu die can get hot,
from, for example,
doing too many math calculations,
and/if it is also the case
that the fan and cooling system
are not configured properly.
- where at that point;
that the CPU will generally throttle and/or shut down
(and/or the computer will fault/crash).
- It is almost never the case,
in actual practice, day to day,
that computers get wet internally.
- Moreover, the action of "too hot"
is due to something internal to the
"choices" of the computer itself,
whereas getting wet is usually an exogenous event --
ie; the user spilling a drink on the keyboard --
unless we are talking about autonomous robots,
which may well operate near rivers, lakes, etc.
~ ~ ~
- Where considering attributions/credit;.
- 1; that these notes/comments, though fully independent of,
are deeply consistent with, and strengthening of,
the observations/proof described and given in
(@ [Superintelligence Cannot be Contained] https://arxiv.org/pdf/1607.00913v1.pdf)
by Alfonseca et al, on Jan 5, 2021.
- 2; That this essay was strongly shaped and influenced
by the (@ [Response] https://www.cser.ac.uk/news/response-superintelligence-contained) to 'superintelligence not containable'
by Jaime Sevilla and John Burden, on 25 February 2021.
- which, as a "rebuttal", presents arguments
that fall short
where/insofar as the meanings/distinctions
of several key concepts
are tacitly assumed, equivocated, and/or conflated:.
- 1; the functional differences between narrow AI vs general AI.
- 2; simple systems/programs with complex ones.
- 3; recursive architectures with non-recursive architectures.
- 4; the means, methods and concepts of "proof".
- 5; the specific meaning of the term 'specific'
(as it is used multiple times, in inherently ambiguous ways).
- 6; the scope and extent of risk of local limited problems
(in time/space) with global problems (everywhere forever).