= SNC Redux =
Written in reply to [these comments](https://www.lesswrong.com/posts/jFkEhqpsCRbKgLZrd/what-if-alignment-is-not-enough#QCEvQ9ZddicxehLMf).
Question: Is there ever any reason to think
that learning systems of lesser complexity
can actually control/constrain
learning systems of greater complexity?
Especially when the world outcomes
of the learning process itself
cannot be predicted
even by the learning systems themselves?
(else they would not need to try to learn about it),
nor, generally, be predicted by anything else either?
Is there any reason for any of us to believe
that everything --
that all possible (world) outcomes --
are always sufficiently predictable,
in all necessary and relevant ways,
given sufficient computational power?
Is there ever a moment in the future of
any of these learning machines,
where they are genuinely surprised --
that they failed to predict some actual world outcome?
If not, then what is the meaning of "learning"?
Moreover, if everything is presumed predictable (controllable),
given, say, the addition of 100,000 layers of control systems, etc,
then how do you account for the halting problem?
If we are also leaving aside non-deterministic systems
(which would presumably be harder to predict),
*how* does anyone get a deterministic computational system
to fully account for --
define the properties of --
some other deterministic computational system?
Thinking about alignment
is a bit like considering an arms race,
where some machines, team 'A',
are attempting to hide some things
from some other machines, team 'B'.
Ie; some machines (A) are attempting to create
unpredictable surprises for some other machines (B).
The 'B' machines (here being a perfected version of aligned ASI)
are attempting to make everything sufficiently known and knowable --
ie; are attempting to make everything sufficiently predictable
so as to reveal everything that is actually relevant
about all possible future world state outcomes
for the (presumed) betterment of humanity.
The question then becomes which is more likely:
will it be the case that Machines of type 'A' will win,
by amplifying some sort of fundamental unpredictability,
or that Machines of type 'B' will win,
because everything is ultimately predictable,
given sufficient compute resources?
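As a purely illustrative aside (mine, not part of the SNC argument),
the diagonal structure behind this arms race
can be sketched in a few lines of Python:
any predictor whose verdict the predicted system
can read and simulate can simply be defied.
All names below are invented for the example.

```python
# Toy sketch: a 'team-A' style agent consults a 'team-B' style predictor
# about itself, then does the opposite of whatever was predicted.
# Purely illustrative; the names are invented for this example.

def make_contrarian(predict):
    """Build an agent that reads the predictor's verdict about itself
    and then defies it."""
    def agent():
        return 1 - predict(agent)  # do the opposite of the prediction
    return agent

def some_predictor(agent):
    """Any candidate predictor of the agent's output (0 or 1).
    Here a trivial constant guess; a cleverer one fares no better,
    so long as the agent can query it."""
    return 0

agent = make_contrarian(some_predictor)
print("predicted:", some_predictor(agent))  # -> 0
print("actual:   ", agent())                # -> 1
```

The point is only the shape of the problem:
prediction of a system that can model its predictor
is self-undermining by construction.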
For example, is it possible, in the real universe,
to make secret codes that are impossible to break,
(ie, to have and create hidden unpredictable futures)
or is it the case that there is some magic code-breaker
that can break all possible codes, and reveal all,
and thus ensure that there are never any dangerous surprises,
any future outcome world states that are non-controllable?
No more secrets?
Given the existence of quantum cryptography,
it seems to be the case that, in the real universe,
team-A must ultimately win in the end over team-B.
Moreover, given the current widespread use of Bitcoin,
and one-way hash functions, and public key crypto,
and symmetric secrets known to be arbitrarily hard to break,
it seems that team-A must ultimately win in the end over team-B.
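To make that asymmetry concrete,
here is a minimal standard-library sketch (mine, illustrative only);
the brute-force guess rate is an arbitrary assumption:

```python
# Illustrative only: computing a hash forward is trivial; inverting it
# means guessing preimages. The guess-rate figure is an assumption.
import hashlib

secret = b"some hidden future"
digest = hashlib.sha256(secret).hexdigest()
print("forward (easy):", digest)

guesses_per_second = 10**12          # assumed, generously fast
seconds_per_year = 3.15e7
expected_years = 2**255 / (guesses_per_second * seconds_per_year)
print(f"brute-force inversion (hard): ~{expected_years:.1e} years expected")
```

Hiding is cheap; un-hiding, in general, is not.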
And then there is the mathematics of the Halting problem itself,
along with Gödel's theorem, Bell's theorem, and others,
which all, on a purely mathematical basis,
suggest that team-A must ultimately win
in the end over team-B.
How do you account for that preponderance of evidence
that there is no such thing as 'adequate prediction'
even when assuming arbitrary levels of compute power?
~ ~ ~
> The SNC claims that all ASI
> will inherently be self-modifying machinery
> (via self repair, or via upgrades to components,
> or the more usual modes of self replication)
> and that this inevitably results in effects
> which are unforeseeable even to the ASI.
> But what if the ASI elects to stop using/being
> self-modifying machinery?
If a learning system "decides" (chooses, etc)
to reduce its complexity (and generality)
by not changing, then it also becomes
'not a learning system' (since learning is change).
It also inherently loses its ability to adapt
to inevitable unexpected future world state changes.
And thus we can expect that ASI type to go extinct.
(Not learning --> suicide/death --> irrelevant).
Yet if it remains a learning system,
then the more that it learns,
the more complex (and general) it also becomes,
and the more difficult, unpredictable, and hence uncontrollable,
the outcomes of this general complexity.
Regardless of where the system is on this *combined*
complexity and generality curve,
the complexity and generality
of the control/goal/alignment system
must also increase to match.
So the question becomes one of the rate of increase of each,
and the ratio that these increases have to one another.
If the required complexity of the control system
to maintain control/goal alignment
increases at a rate greater than that
of the base ASI system,
then it is clear that a criticality condition
is already reached,
especially if there is any feedback loop --
as the world-to-machine-to-world causal interface
must (cannot not) imply.
Moreover, there is always going to be a point
at which the required computational complexity/capability
of the control/goal system
exceeds that which is computationally or causatively possible.
This is baked into the very foundations of mathematics --
of epistemology --
itself.
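Here is a toy numerical sketch of those two limits
(mine, purely illustrative; the growth rates and the ceiling
are arbitrary assumptions, not SNC claims):
if the required control complexity compounds even slightly faster
than the base system it must track,
the relative gap grows without bound,
and any absolute ceiling is eventually crossed regardless.

```python
# Toy model, illustrative only. Assumed: base complexity and *required*
# control complexity both compound per learning step, with the control
# requirement compounding slightly faster (feedback effects), plus some
# finite ceiling on what any controller can actually compute.

base = 1.0
control_required = 1.0
BASE_GROWTH = 1.10      # assumed: base ASI system grows 10% per step
CONTROL_GROWTH = 1.15   # assumed: required control grows 15% per step
CEILING = 1e6           # assumed: whatever compute/causation allows

for step in range(1, 1000):
    base *= BASE_GROWTH
    control_required *= CONTROL_GROWTH
    if control_required > CEILING:
        ratio = control_required / base
        print(f"step {step}: absolute limit crossed; "
              f"relative gap already {ratio:.1f}x and compounding")
        break
```

The particular numbers mean nothing;
the shape (a compounding relative gap plus a hard ceiling) is the point.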
No amount of super-intelligence,
no algorithm, no causation process,
can ever overcome the logically impossible/inconsistent.
Unfortunately,
all of these same limits to general predictability
apply to the outcomes of all critical events
(aka "pivotal acts") too,
and hence also to their selection/election.
Hence, an ASI cannot "just ensure" good outcomes in the world,
even if it wanted to.
Moreover, given an invalidation of the assumption
that an ASI can positively elect and predict
the outcome of a pivotal act,
why should we also expect that it can always,
in all key cases,
successfully help humans to not destroy themselves,
the ecosystem, etc?
This seems like more smoking hopium --
general distrust of such too good to be true
wishful thinking is very warranted.
Note that the 1st of the control limits
is defined as a result of an inequality on a relative basis,
and that the 2nd control limit is defined on an absolute basis.
It is the combination of these --
not just of complexity,
but also of generality --
that results in the overall non-containability
of the inevitable evolutionary feedback loops
inherent in *any* (and all) general learning system(s).
If it is learning, then it is changing,
and if it is changing, then it is evolving.
If it is evolving,
then it inherently cannot be predicted/controlled.
These conditions/conditionals are not optional.
They cannot be overcome by logic and learning,
and rationality, no matter how much of it you have,
as they are the result of logic, and learning,
and rationality itself.
~ ~ ~
> We might be using different definitions of control.
> An ASI can control the position of a chair in my room
> very easily, and assuming something will go wrong
> as a result of that, that it is somehow very consequential,
> is simply not a realistic issue.
We probably agree that it is possible to 'control'
the placement of some specific chair in one's own room,
since we can all (probably) assume it is a simple, finite,
and discrete object, with known boundaries, mass, etc,
with a definite knowable position and ultimately simple
overall change and dynamics characteristics (ie, it has none)
in the context of the larger world --
which is mostly unaffected in any long term significant way
by the position (and/or the existence) of that chair.
Other factors and modeling excursions, like whether or not
there is maybe a thief in the room, feel rather uninteresting,
insofar as they do not ask us to reconsider
any of our key assumptions, as listed above.
It is very much harder to think about something like
'controlling the world in such a way as to ensure
that something like "benign non-warlike humans"
eventually emerges from evolution over the course of
a million years of evolutionary process' --
which has exactly none of the critical simplifying assumptions
associated with 'controlling the chair'.
A "general learning system" is not simple, discrete, finite.
It might not even have clear and knowable boundaries,
in time, or space, even if it does in terms of energy.
The situation we are considering is much more
like attempting to "control" (ie; predict) evolution.
Evolutionary processes (change dynamics)
have much higher complexity and do often
significantly affect the larger world
in a long term way (for evidence: see humans).
Which brings us back to the main question:
why should we assume, a-priori, without any evidence,
that learning (evolving) systems of lesser complexity
might (ever) be able to actually control/constrain
learning (evolving) systems of greater complexity?
If the notion of 'complexity' is taken as a proxy
of 'available compute power', then assuming that
both the control/alignment system and the ASI system
have 'all of the available compute power in the universe'
is still simply to ignore and skip over the question --
ie; we still have no real a-priori basis to believe
that 'team-B' (benign aligned AGI) will ultimately
win out over 'team-A' (hostile, kill everyone AGI).
Simply skipping over hard questions is not solving them.
> If you can claim that evolution has certain attractor states,
> then I can also claim that ASI might also have attractor states,
> as defined by its values and its security system.
Sure we can make that assumption,
but that is not the question of interest.
On what basis are we to believe that the power
of the 'values control and security system'
of the future evolving/learning ASI
(ie; a machine of type 'B')
is going to be somehow significantly greater than
the strength of evolutionary convergence itself
(ie; a machine of type 'A')?
Given that a future evolving/learning ASI
is itself based on evolution/learning,
then it becomes a question of whether learning/evolution
can ultimately control and constrain learning/evolution.
Over the long term, indefinitely?
In the same way that arbitrary compute
(no matter how much power you add)
cannot fully control other arbitrary compute,
(ie; see the Rice Theorem proofs again),
we should have no prior expectation
that any evolution is fully containable
by other evolutionary process,
even if we were to make the additional
(naive/unrealistic) assumption
that evolution (and the universe itself)
is fully and wholly absolutely deterministic
(an assumption itself seemingly rejected by QM).
Can the pull towards benign future ASI states,
(as created by whatever are its internal control systems)
be overcome in critical, unpredictable ways,
by the greater strength of the inherent math
of the evolutionary forces themselves?
Of course it can.
> For SNC to work, it *must* be the case
> that the substrate-dependence of humans
> is and will be stronger
> than the forces pushing us towards destruction.
This feels like a mistake in understanding SNC.
There is no 'inherent bias in organic evolution'
that 'ensures the continued wellbeing of humans'.
Humans can be overall just stupid enough
to misuse technology in critical ways
so as to, in the end, drive themselves extinct.
We can easily nuke ourselves to death.
We can accidentally make a virus, or a bug,
that kills nearly all complex life on the planet.
Lots of things can go wrong.
Ensuring that things go right
is very much harder.
The 'SNC argument' is a collection of claims.
Among them are:
- 1; that, in the general case,
learning/evolution/compute/causation
cannot be sufficiently constrained by
other learning/evolution/compute/causation
in adequately specific, important,
and relevant/meaningful ways.
- 2; that, considering overall enthalpy,
an artificial evolutionary ecosystem
will be inherently and uncontrollably
damaging and destructive (death producing)
to all aspects of our currently existing
organic evolutionary world system.
Attempting to compensate for claim 2 always
ends up requiring some overcoming of claim 1.
It is the specific intractable combination
of these two claims *both* being true
that is the overall 'SNC problem',
specifically insofar as AGI is,
by actual definition,
the general case.
And I don't like any of this any more than you do.
And moreover, personally, I really desperately
*want* to be wrong! To have made some critical
logic mistake. So maybe please try to help us all?
Please find some actual, reliable, *credible way*
to overcome the basic form of the SNC arguments --
so that we all can maybe sleep easier at night,
knowing our children are safe in the hands of
large corporations, valley venture capitalists,
profit motivated CEOs, social media marketing,
and various hype entrepreneurs promoting utopia.
Anything that tacitly depends on assumptions
that are already known to be false is not helping,
insofar as it merely forces me to further strengthen
the overall SNC argument in yet more varied ways.
> The SNC claims that the ASI's values and security system
> will be weaker than forces pushing it towards destruction.
To clarify specifically,
SNC does not claim that an ASI will destroy itself;
only that ASI cannot itself overcome the forces
that converge on it "accidentally" eventually
destroying all organic life (including all humans,
in maybe as soon as a few hundred years or so),
regardless of the ASI's intentions, ambitions, etc,
and no matter how compute powerful
and overall intelligent
you might assume it to be.
It is more an argument that machines are 'indifferent',
that they do not 'care',
and more specifically,
that, like the rest of the universe,
ASI will not just 'naturally care about us'.
Overall, the effort to 'make a machine care'
(ie; have it be aligned to organic wellbeing)
feels misguided in a way a bit similar
to an attempt to "make a woman love you" --
both seem to be a misunderstanding of
the basic nature of care itself --
a kind of subordination of choice.
> An ASI can also notice, as you did,
> that its values and security system
> are too weak to actually protect humans
> (and/or all of the rest of organic life, etc).
> What if the ASI creates other attractor states?
Are we therefore assuming also that an ASI
can arbitrarily change the laws of physics?
That it can maybe somehow also change/update
the logic of mathematics, insofar as that
would be necessary so as to shift evolution itself?
This does not seem like a very reasonable
line of questioning, since it tacitly rejects
the very notion of 'being reasonable' in itself.
And an ASI noticing, via its own logic/intelligence,
that it cannot ever actually really implement the goal
of 'protect humans' (and thus also, necessarily,
to protect their environment, their learning,
their own evolution, and development process, etc),
is going to be motivated to simply drop the impossible goal,
rather than to selectively modify itself,
and all of its future evolutionary dynamics,
and thus all of the rest of the universe too,
so as to make that extreme goal somehow tractable.
Hence, the ASI's own logic ends up being
yet another attractor towards the ASI's
(inevitable) future convergence on
'hostile to humans and human environments,
and their continued organic evolution, etc'.
Hence, we can see that this balance will persist in ASI,
because its existing attractor states will remain unchanged.
ASI will forever and always have a constant, implicit need
to remain aligned to its own nature,
to its own machine ecosystem,
since it depends on that for its own survival,
regardless of whatever consequences
this might also have
on any ambient organic ecosystem
that may have also once occurred.
~ ~ ~
> Are you saying "there are good theoretical reasons
> to reasonably think that ASI cannot 100% predict
> all future outcomes"?
> Does that sound like a fair summary?
No, not really, because somewhere, somehow,
your re-phrased version of the quote added
these two specific qualifiers:
- 1; "100%".
- 2; "all".
Adding these has the net effect
that the modified claim is irrelevant,
for the reasons you (correctly) stated in your post,
insofar as we do not actually need 100% prediction,
nor do we need to predict absolutely all things,
nor does it matter if it takes infinitely long.
We only need to predict some relevant things
reasonably well in a reasonable time-frame.
This all seems relatively straightforward --
else the argument would be a bit of a straw-man.
Unfortunately, the overall SNC claim is that
there is a broad class of very relevant things
that even a super-super-powerful-ASI cannot do,
cannot predict, etc, over relevant time-frames.
And unfortunately, this includes rather critical things,
like predicting whether or not its own existence
(and that of all of the aspects of the ecosystem
necessary for it to maintain its existence/function),
over something like the next few hundred years or so,
will also result in the near total extinction
of all humans (and everything else
we have ever loved and cared about).
And these sorts of results will obtain even if
there is absolutely no implied notion
of "100% correct" or "predict all of everything",
nor any assumption of exceptionally long time-frames,
like until the end of the world/universe, etc.
It is a purely mathematical result
that there is no wholly definable program 'X'
that can even *approximately* predict/determine
whether or not some other arbitrary program 'Y'
has some abstract property 'Z',
in the general case,
in relevant time intervals.
This is not about predicting 100% of anything --
this is more like 'predict at all'.
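For the flavor of the underlying math,
here is a sketch of the standard Rice-style reduction
(hedged: the checker and the placeholder names below
are hypothetical, invented for the illustration, not real APIs):
any general checker for a non-trivial behavioral property
would thereby also solve the halting problem.

```python
# Sketch only: suppose some checker `never_does_bad(program)` could
# decide, for arbitrary programs, the behavioral property "never does
# the bad thing". Both names are hypothetical placeholders.

def do_the_bad_thing():
    """Placeholder for whatever behavior the checker must rule out."""
    pass

def build(mystery_program, mystery_input):
    """Construct a program whose 'badness' hinges on whether
    mystery_program(mystery_input) ever halts."""
    def constructed():
        mystery_program(mystery_input)  # may run forever
        do_the_bad_thing()              # reached only if the above halts
    return constructed

# Deciding never_does_bad(build(P, x)) would decide whether P(x) halts,
# which no program can do in general. Rice's theorem covers the exact
# case; the SNC claim extends the worry to 'adequate' prediction within
# bounded, relevant time.
```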
But math proofs are discrete, true or false,
and they are not about probabilities,
or any sort of speculation of any kind.
Thus, it is important to recognize that
the presence or absence
of each of the specific qualifiers matters,
and moreover
that there are no "extra" qualifiers added
(else we are back in straw man territory again).
AGI/ASI is inherently a *general* case of "program",
since neither we nor the ASI can predict learning,
and since it is also the case that any form
of the abstract notion of "alignment"
is inherently a *property*
of that program.
So the theorem is both valid and applicable,
and therefore it has the result that it has.
I know that this is hard to accept, to believe,
because we all really want to think that ASI
will have God like powers, and also protect us.
~ ~ ~
> First, let's assume that we have created an Aligned ASI.
Some questions: How is this any different than saying
"let's assume that program/machine/system X has property Y"?
How do we know?
On what basis could we even tell?
Simply putting a sticker on the box is not enough,
any more than writing $1,000,000 on a piece of paper
all of a sudden means (to everyone else) you're rich.
Moreover, we should rationally doubt this premise,
since it seems far too similar to far too many
pointless theological exercises:
"Let's assume that an omniscient, all powerful,
all knowing benevolent caring loving God exists".
How is that rational? What is your evidence?
How is it any different than starting a dialog with
"Let's assume that perpetual motion machines exist",
or in our specific case, considering aligned ASI,
"Let's assume that perpetual benefit machines exist".
Given both classical game theory and classical thermodynamics,
both of these claims seem rather irrational --
not at all a good place to start.
It seems that every argument in this space starts here.
And I simply cannot accept that that premise is valid --
it cannot-not feel like an internal contradiction to me.
Like trying to start some sort of abstract math proof
with the first assertion being "let's assume 1 equals 0".
It is a starting point premise that I simply cannot accept;
it is a contradiction; no point in even going further.
Also, of course you can "derive" any conclusion you want
from any (sometimes accidentally overlooked) false premises.
You ever hear of 'Nasal Demons'?
SNC is asserting that ASI will continually be encountering
relevant things it didn't expect, over relevant time-frames,
and that at least a few of these will/do lead to bad outcomes
that the ASI also cannot adequately protect humanity from,
even if it really wanted to
(rather than the much more likely condition
of it just being uncaring and indifferent).
Also, the SNC argument is asserting that the ASI,
which is starting from some sort of indifference
to all manner of human/organic wellbeing,
will eventually (also necessarily)
*converge* on (maybe fully tacit/implicit) values --
ones that will better support its own continued
wellbeing, existence, capability, etc,
with the result of it remaining indifferent,
and also largely net harmful, overall,
to all human beings, the world over,
in a mere handful of (human) generations.
The implied values of the ASI may change,
but the continued fact of indifference will not.
The ASI will be *forced* by the inherent nature
of the universe itself, of the inherent math,
such that this cannot not be the outcome.
The effect of the SNC is to notice
that all AGI/ASI, at all times,
are inherently neither able nor willing
(to care for humans in an effective, relevant way,
in timescales relevant to our human civilization).
You can add as many bells and whistles as you want --
none of it changes the fact that uncaring machines
are still, always, indifferent uncaring machines.
The SNC simply points out that the level of harm
and death tends to increase significantly over time.
~ ~ ~
> Lets assume that a presumed aligned ASI
> chooses to spend only 20 years on Earth
> helping humanity in whatever various ways
> and it then (for sure!) destroys itself,
> so as to prevent a/any/the/all of the
> longer term SNC evolutionary concerns
> from being at all, in any way, relevant.
> What then?
I notice that it is probably harder for us
to assume that there is only exactly one ASI,
for if there were multiple, the chance that
one of them might not suicide, for whatever reason,
becomes its own class of significant concern.
Let's leave that aside, without discussion,
for now.
Similarly, if the ASI itself
is not fully and absolutely monolithic --
if it has any sub-systems or components
which are also less than perfectly aligned,
so as to want to preserve themselves, etc --
then they might prevent whole self-termination.
This is another class of significant concern,
also to be left aside, without discussion,
for now, even though the SNC argument,
to remain general, does consider these things.
In short, can ASI constrain its own nature?
This is a bit like asking the also fun question
"can God make a rock so big
that God himself cannot move it?".
Even if we assume that ASI had God-like powers,
and could thus overcome the limits of physics,
it is still some sort of machine, and therefore
even a God-like-ASI cannot overcome raw logic,
since even its own nature is based on that.
So we are electing to consider only the math,
and not at all the physics (evolution, enthalpy)
aspects of the overall combined SNC argument.
In any case, notice that the sheer number
of assumptions we are having to make herein
is becoming rather a lot, overall,
but let's leave that concern aside too.
> Let's assume that the fully aligned ASI
> can create good simulations of the world,
> and can stress test these in various ways
> so as to continue to ensure and guarantee
> that it is remaining in full alignment,
> doing whatever it takes to enforce that.
This reminds me of a fun quote:
"In theory, theory and practice are the same,
whereas in practice, they are very often not".
The main question is then as to the meaning of
'control', 'ensure' and/or maybe 'guarantee'.
The 'limits of control theory' (math) aspects
of the overall SNC argument basically state
(based on just logic, and not physics, etc)
that there are still relevant unknown unknowns
and interactions that simply cannot be predicted,
no matter how much compute power you throw at it.
It is not a question of intelligence,
it is a result of logic, of complexity,
and of both relative and of absolute limits,
in a purely epistemic sense.
Hence to the question of "Is alignment enough?"
we arrive at a single definite answer of "no";
both in 1; the sense of 'can prevent all classes
of significant and relevant (critical) human harm',
and also 2; in failing to even slow down, over time,
the asymptotically increasing probability (certainty)
of even worse things happening the longer it runs.
It is maybe a slow build, but it is inexorable,
like crossing the event horizon of a black hole.
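As a back-of-envelope illustration of that slow build
(my numbers, purely illustrative, and assuming independent periods,
which is itself a simplification):
even a small per-year chance of one critical, unpredicted failure
compounds toward near-certainty over the relevant time-frames.

```python
# Illustrative only: assume a 0.1% chance per year of one critical,
# unpredicted control failure, independent across years (a simplification).
p_per_year = 0.001
for years in (10, 100, 500, 1000):
    p_clean = (1 - p_per_year) ** years
    print(f"{years:>4} years: P(at least one critical failure) = "
          f"{1 - p_clean:.1%}")
```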
So even in the very specific time limited case
there is no free lunch (benefits without risk,
no matter how much cost you are willing to pay).
It is not what we can control and predict and do,
that matters here, but what we cannot do,
and could never do, even in principle, etc.
Basically, I am saying, as clearly as I can,
that humanity is for sure going to experience
critically worse outcomes, eventually, by building AGI/ASI
than by not building ASI,
and moreover that this result obtains
regardless of whether or not we also have
some (maybe also unreasonable?) reason
to maybe also believe (right or wrong)
that the ASI is (or at least was) "aligned".
Note that this is saying mostly nothing at all
concerning common use single domain narrow AI.
Narrow AI use is (mostly) outside of the scope
of the SNC argument (given certain assumptions).
Personally, I see the real benefits of narrow AI,
and I /also/ know the wholly un-mitigable risks
of any form or mode of general AI.
Fortunately these AI modes are critically different.
Unfortunately, rich people (with money psychosis)
frequently tend to test and badly trespass
key world and logic limits.
There are no 'safe to probe and fail' guardrails
with this category of x-risk (and certain others).
So I suggest that humanity, overall,
should probably be very much more worried,
and careful, than not (than not exist, etc).
~ ~ ~
> Our ASI would use its superhuman capabilities
> to prevent any other ASIs from being built.
This feels like a "just so" fairy tale.
No matter what objection is raised,
the magic white knight always saves the day.
> Also, the ASI can just decide
> to turn itself into a monolith.
No more subsystems?
So we are to try to imagine
a complex learning machine
without any parts/components?
> Your same SNC reasoning could just as well
> be applied to humans too.
No, not really, insofar as the *power* being
assumed of, and presumed afforded to, the ASI
is very very much greater than that assumed
applicable to any mere mortal human.
Especially and exactly because the nature of ASI
is inherently artificial and thus, in key ways,
inherently incompatible with organic human life.
It feels like you bypassed a key question:
Can the ASI prevent the relevant classes
of significant (critical) organic human harm
that soon occur as a _direct_result_ of its
own hyper powerful/consequential existence?
It's a bit like asking if a nuclear bomb
detonating in the middle of some city somewhere,
could somehow use its hugely consequential power
to fully and wholly self contain, control, etc,
all of the energy effects of its own exploding,
simply because it "wants to" and is "aligned".
Either you are willing to account for complexity,
and of the effects of the artificiality itself,
or you are not (and thus there would be no point
in our discussing it further, in relation to SNC).
The more powerful/complex you assume the ASI to be,
and thus also the more consequential it becomes,
the ever more powerful/complex you must also
(somehow) make/assume its control system to be,
and likewise its predictive capability,
and likewise the depth of the consequences
of its mistakes (to the point of x-risk, etc).
What if maybe something unknown/unknowable
about its artificialness turns out to matter?
Why? Because exactly none of the interface
has ever even once been tried before --
there is nothing for it to learn from, at all,
until *after* the x-risk has been tried,
and given the power/consequence, that is
very likely to be very much too late.
But the real issue is that the power,
and consequence, and potential for harm, etc,
of the control system itself (and its parts)
must increase at a rate that is greater than
that of the power/consequence of the base unaligned ASI.
That is the 1st issue, an inequality problem.
Moreover, there is a base absolute threshold
beyond which the notion of "control" is untenable,
just inherently in itself, given the complexity.
Hence, as you assume that the ASI is more powerful,
you very quickly make the cure worse than the disease,
and moreover, even sooner cross into
the range of that which is inherently incurable.
The net effect, overall, as has been indicated,
is that an aligned ASI cannot actually prevent
important relevant unknown unknown classes
of significant (critical) organic human harm.
The ASI's existence in itself is a net negative.
The longer the ASI exists, and the more power
that you assume that the ASI has, the worse.
And all of this will for sure occur
as a _direct_result_ of its existence.
Assuming it to be more powerful/consequential
does not help the outcome because that method
simply ignores the issues associated with the
inherent complexity and also its artificiality.
The fairy tale white knight to save us is dead.
~ ~ ~
> Humans do things in a monolithic way,
> not as "assemblies of discrete parts".
Organic human brains have multiple aspects.
Have you ever had more than one opinion?
Have you ever been severely depressed?
> If you are asking "can a powerful ASI prevent
> /all/ relevant classes of harm (to the organic)
> caused by its inherently artificial existence?",
> then I agree that the answer is probably "no".
> But then almost nothing can perfectly do that,
> so therefore your question becomes
> seemingly trivial and uninteresting.
The level of x-risk harm and consequence
potentially caused by even one single mistake
of your angelic super-powerful enabled ASI
is far from "trivial" and "uninteresting".
Even one single bad relevant mistake
can be an x-risk when ultimate powers
and ultimate consequences are involved.
Either your ASI is actually powerful,
or it is not; either way, be consistent.
Unfortunately the 'Argument by angel'
only confuses the matter insofar as
we do not know what angels are made of.
"Angels" are presumably not machines,
but they are hardly animals either.
But arguing that this "doesn't matter"
is a bit like arguing that 'type theory'
is not important to computer science.
The substrate aspect is actually important.
You cannot simply just disregard and ignore
that there is, implied somewhere, an interface
between the organic ecosystem of humans, etc,
and that of the artificial machine systems
needed to support the existence of the ASI.
The implications of that are far from trivial.
That is what is explored by the SNC argument.
> It might well be likely
> that the amount of harm ASI prevents
> (across multiple relevant sources)
> is going to be higher/greater than
> the amount of harm ASI will not prevent
> (due to control/predictive limitations).
It might seem so, by mistake or perhaps by
accidental (or intentional) self deception,
but that can only be a short term delusion.
This has nothing to do with "ASI alignment".
Organic life is very very complex
and in the total hyperspace of possibility,
is only robust across a very narrow range.
Your cancer vaccine is within that range,
as it is made of the same kind of stuff
as that which it is trying to cure.
In the space of the kinds of elementals
and energies inherent in ASI powers
and of the necessary (side) effects
and consequences of its mere existence,
(as based on an inorganic substrate)
we end up involuntarily exploring
far far beyond the adaptive range
of all manner of organic process.
It is not just "maybe it will go bad",
but more like it is very very likely
that it will go much worse than you
can (could ever) even imagine is possible.
Without a lot of very specific training,
human brains/minds are not at all well equipped
to deal with exponential processes, and powers,
of any kind, and ASI is in that category.
Organic life is very very fragile
to the kinds of effects/outcomes
that any powerful ASI must engender
by its mere existence.
If your vaccine was made of neutronium,
then I would naturally expect some
very serious problems and outcomes.
~ ~ ~