The 'threat model' described herein
is inherently and unavoidably involved
in *all* AGI safety/alignment work,
without exception.
That *any* and *every* AGI 'safety/alignment' process
is a special case of a 'causative feedback process'.
- that there are no/zero exceptions to this truth.
- as anything and everything that is true
of causation used as a feedback process,
of an 'algorithm' used as a 'modeling process',
and/or of the 'signaling' inherent in any 'feedback',
will for sure also be true of any AGI safety/alignment
technique, method, proposal, methodology, algorithm, etc.
That *any* and *every* 'causative _feedback_ process'
depends on at least *all* of the following
(a minimal toy sketch follows the list below):
- 1; the completeness of the sense input.
- as the input data that is being processed by,
or used by, the proposed AGI alignment technique/etc,
*and* also as the input data that is taken into
the AGI system, so that the _model_ of the key aspects
of the AGI system has reasonably correct input data.
- 2; the completeness of process modelability.
- ie; that the proposed AGI alignment technique/etc
has to actually be just that, some sort of algorithm,
itself conditional on only causation and logic,
*and* that the AGI system itself has to be
at least partially modelable, in whatever key
aspects are necessary to establish 'alignment'.
- 3; the completeness of predictability.
- ie; as the assumption that the model
of the AGI system,
when also given the model of the input data,
will allow for sufficient prediction
of whatever the key aspects of the future AGI outputs
will be, before the real (non-modeled) AGI
would actually act, in a way that is assessable
and comparable to desired outcome states.
- 4; the completeness of some type of comparison
to a reference (where that reference implicitly defines
whatever is meant by "aligned" or "safe").
- 5; the completeness of some type of signaling
(ie; what the "feedback" actually is, how the model
itself works internally, and how the model alerts
the actuator, so that the model 'controls' and
constrains the AGI into acting/operating in an aligned way).
- 6; the completeness of some type of
conditionalization (ie; what counts as "effectiveness")
of at least one of
the following AGI aspects:
- output actions.
- internal process.
- sense input.
- ie; where/if the AGI conditionalization
does not have sufficient "power" to actually constrain
the AGI from doing bad things,
then even a very good alignment algorithm/technique/process
will not result in an aligned/safe AGI.
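To make that structure concrete, the following is
a purely illustrative toy loop (hypothetical names
and numbers; no claim that any real AGI or alignment
scheme reduces to this), in which each of the six
aspects appears as exactly one labeled step,
and in which the loop 'works' only because
every one of those steps is assumed to be complete.

def toy_feedback_loop(steps=20):
    # purely illustrative; every name and number here is hypothetical
    state = 10.0                        # the (toy) system being "aligned"
    reference = 0.0                     # 4; the reference that defines "aligned"/"safe"
    model_gain = 0.5                    # 2; the (assumed-correct) model of the process
    for _ in range(steps):
        observation = state             # 1; sense input (assumed complete and noise-free)
        prediction = observation        # 3; predicted behavior (trivially: "no spontaneous change")
        error = prediction - reference  # 4; comparison of the prediction to the reference
        signal = -model_gain * error    # 5; the feedback signal derived from the model
        state = state + signal          # 6; conditionalization: the signal actually constrains the system
    return state

print(toy_feedback_loop())              # converges toward 0.0 only because every step above is complete

If any one of those six labeled steps is missing,
or is made sufficiently incomplete (noisy sensing,
a wrong model, a mis-specified reference,
a lost or distorted signal, an actuator without
enough power), the same loop can no longer be
relied on to reach the reference; that is the sense
in which all six aspects are jointly required.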
That aspects 1 thru 6 (inclusive) are true of *any*
and *every* 'causative feedback process'.
- that there are exactly none/zero exceptions.
- that *all* causative feedback processes
will have/require all six of these aspects
(with no exceptions).
Therefore, a/the/any/all AGI alignment/safety
enforcement protocols, techniques, methodologies, etc,
will also (for sure) be required to have and implement
all six aspects (with no exceptions),
*and*, given that requirement,
if *any* of these six aspects, for whatever reason,
cannot be implemented, then, therefore,
AGI alignment/safety cannot be implemented.
Unfortunately, _every_single_one_
of the listed dependencies (all six of them)
is individually impossible.
(Ie; none of them can be 'complete enough' to reach
anywhere near the minimum threshold
required for AGI alignment/safety).
Insofar as *all six* are necessary together
to achieve AGI alignment, there are effectively six,
at least semi-independent, complete and whole arguments
against there being *any* notion of AGI alignment
based on nothing outside of
the notions of causation, conditionalization, and/or logic
(ie; all different manifestations of the same idea).
It is a separate exercise to show,
for each one of these six aspects
(necessary to any feedback system),
exactly why they fail, and cannot be implemented,
due to the logic of causation and the requirements
of any rational notion of AGI safety/alignment.
However, a lot of this has been done already elsewhere,
and need not be repeated here.
All that is necessary herein is to show that the 'threat model'
is actually directly and intrinsically connected
to the very notion of what it could ever even possibly mean
to be an "AGI Alignment Protocol", process, etc.
As such, it becomes increasingly obvious that there
is an inherent and unavoidable contradiction between
any notion of 'AGI persistently existing' and 'AGI is safe'.
These arguments apply to, and are relevant to,
the overall AGI alignment concept, in total,
and cannot be escaped by any form of wishful thinking.
~ ~ ~
As a side note, in addition to all the above,
(as if that were not already sufficient),
the control and/or "error correction" system
must itself not make mistakes/errors (ie, be/operate
within reasonable limits, be itself predictable, etc).
What tracks the errors/mistakes in the system
that is itself modeling the system to which
error correction is to be applied?
Insofar as the same logic applies
to the error correction system, in proportion to
its own complexity, just as it does
to the complexity of the AGI system itself,
there is a kind of regression cascade problem
(sketched numerically below).
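As a purely numerical illustration (hypothetical
numbers only, and only under the assumption that
each error-correction layer must itself be some
fixed fraction 'r' as complex as whatever it checks),
the regress behaves like a geometric series,
and if r is at or above 1 (the checker is at least
as complex as the checked), it never converges at all.

def total_correction_complexity(agi_complexity, r, max_layers=50):
    # hypothetical model: each error-correction layer needs its own checker,
    # and each checker is a fraction r of the complexity of what it checks
    total = 0.0
    layer = agi_complexity * r
    for _ in range(max_layers):
        total += layer
        layer *= r
    return total

print(total_correction_complexity(1_000_000, r=0.5))  # sums to roughly the AGI's own complexity
print(total_correction_complexity(1_000_000, r=1.0))  # keeps growing with every added layer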
For example, any system that is complex enough to
be Turing complete (which is not very much complexity,
overall, as compared to that of the AGI system to be
monitored/checked, etc) is also complex enough to
implement some sort of virtual machine dynamic.
Any VM is a strict indicator of impossibility
with regard to predictability (ie; everything
depends on the data, which immediately becomes code,
and any amount of uncertainty in the data,
even a single bit change, results in drastically
different outcomes; ie, near perfect unpredictability
in the face of any single point error at all).
Basically, all sorts of Rice's Theorem limits apply.
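As a purely illustrative toy (hypothetical opcodes
and programs; no claim that any real AGI runs
anything like this), a minimal interpreter shows
both points at once: the program is just data,
a single flipped bit turns a halting program
into a non-halting one, and, per Rice's Theorem
and the halting problem, no general procedure can
decide such semantic properties of arbitrary
programs in advance.

def run(program, step_budget=1000):
    # a tiny 4-opcode virtual machine; the low two bits of each byte select the opcode:
    # 0 = INC (add 1), 1 = DOUBLE (times 2), 2 = JMP0 (restart at the beginning), 3 = HALT
    acc, pc = 0, 0
    for step in range(step_budget):
        if pc >= len(program):
            return acc, step, "halted"        # fell off the end of the program
        op = program[pc] & 0b11
        if op == 0:
            acc, pc = acc + 1, pc + 1
        elif op == 1:
            acc, pc = acc * 2, pc + 1
        elif op == 2:
            pc = 0                            # jump back to the start
        else:
            return acc, step, "halted"        # explicit HALT
    return acc, step_budget, "step budget exhausted (apparently non-halting)"

original = bytes([0, 1, 0, 1, 3])             # INC, DOUBLE, INC, DOUBLE, HALT
flipped  = bytes([0, 1, 0, 1, 3 ^ 0b01])      # the same bytes with exactly one bit flipped: HALT becomes JMP0

print(run(original))   # (6, 4, 'halted'): a small, fully predictable result
print(run(flipped))    # an enormous accumulator and no halt within the step budget

The two programs differ in exactly one bit,
yet the question "does it halt?" (or "does it stay
within safe bounds?") already has a different answer;
Rice's Theorem generalizes this: no algorithm can
decide any non-trivial semantic property
for all possible programs.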