FILE:
REVI:
- [20_22/09/02;20:40:19.01]:.
  - 1st draft composition completed.
- [20_22/09/05;08:04:00.01]:.
  - edit to be more relatable.
- [20_22/09/05;09:25:00.01]:.
  - added 'natural abstraction' points.
- [20_22/09/28;08:51:42.00]:.
  - separate as own file from ^rice_rebuttal_rebuttal_2.txt^.
- [20_22/10/05;22:03:02.00]:.
  - mass edits and partial sequence clustering.
- [20_22/10/06;09:17:17.00]:.
  - additional re-sequencing, edits, preface.
- [20_22/10/06;11:28:41.00]:.
  - rename file basis.
- [20_22/10/06;14:47:07.00]:.
  - sequence close; unitary pass edits.
- [20_22/10/06;15:40:17.00]:.
  - rough push printing.
- [20_22/10/07;12:19:28.00]:.
  - transit itag pref to attr; add itag lega.
- [20_22/10/08;10:43:19.00]:.
  - integrate update/edit suggestions.
- [20_22/10/13;10:14:59.00]:.
  - some minor wording fixes.
  - added note.
- [20_22/10/15;22:37:05.00]:.
  - convert to section marks.

TITL:
*Superintelligence Safety Question and Answer*
*By Forrest Landry*
*Oct 6th, 2022*

ABST:
- as expressing ideas as to why general AI/APS is inherently, in principle, un-containable and therefore, in the long term, inherently unsafe.

TEXT:
- where listing some common associated acronyms:.
  - "AGI"; as Artificial General Intelligence.
  - "APS"; as Advanced Planning and Strategically aware System(s).
  - "MAI"; as a Misaligned/mis-configured Artificial Intelligence.
  - "MIA"; as a Misaligned/mis-configured Intelligent Agent/agency.
  - "AAA"; as Adverse Artificial Agency/agent.
  - "AIA"; as an Adverse Intelligent Agent.

:hvu >
  > - ?; why is it important to consider
  > formal impossibility proofs of long term
  > AGI/APS/Superintelligence safety?.

- that even a/one single formal complete proof (for whatever reason, on whatever basis) of inherent AGI non-alignment, non-safety has several significant implications, particularly in regards to present and future efforts.
- that we should (therefore) determine if:.
  - 1; *if* AGI cannot ever be safe, or made safe, or forced to be aligned, etc.
    - as that we should be very very clear if there is an inherent existential risk of long term terminal total extinction of all of humanity, of all of life, inherently associated with super-intelligence -- with AGI/APS usage -- of any type, due to the artificiality itself, regardless of its construction, algorithm, etc.
  - 2; *if* the real risks, costs, harms of naively attempting to use/deploy any AGI/APS will always be, in all contexts, for any/all people (any/all life) inherently strictly greater than whatever purported benefits/profits might falsely be suggested "that we will someday have".
    - ie, where we are not deluding ourselves with false hype/hope.
  - 3; *if* any and all efforts to develop new or "better" tools of formal verification of AGI safety are actually pointless.
    - ie; that it does not help us to have researchers spending time chasing a false and empty dream of unlimited profits and benefits.
- ?; why would anyone want to be 'that person' who suggests investing hundreds of thousands of engineer man-years into the false promise of obtaining AGI safety, when a single proof of impossibility -- one person working a few months on their own for nothing -- could make all of that investment instantly moot?.
- that any investment into attempting to develop long term AGI safety is a bit like investing in "perpetual motion machines" and/or "miracle medicines", etc.
  - as a significant opportunity cost associated with dumping hundreds of millions of dollars of equivalent resources and capital to buy a lot of wasted time and effort.
:hxq >
  > - ?; why is the notion of complexity/generality
  > and/or of self modification (recursiveness)
  > important to superintelligence safety considerations?.

- that the notion of 'AI' can be either "narrow" or "general":.
- that the notion of '*narrow AI*' specifically implies:.
  - 1; a single domain of sense and action.
  - 2; no possibility for self base-code modification.
  - 3; a single well defined meta-algorithm.
  - 4; that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.
- that the notion of '*general AI*' specifically implies:.
  - 1; multiple domains of sense/action.
  - 2; intrinsic non-reducible possibility for self modification;.
  - 3; and that/therefore; that the meta-algorithm is effectively arbitrary; hence;.
  - 4; that it is _inherently_undecidable_ as to whether *all* aspects of its own self agency/intention are fully defined by only its builders/developers/creators.
- where the notion of 'learns' implies 'modifying its own behavior' and 'adapts' implies 'modifying its own substrate'; that the notion of 'learning how to learn' (the capability of increasing its capability) can directly imply (cannot not imply) modifying its own code and/or substrate.
- that/therefore the notion/idea of 'sufficiently complex' includes (cannot not include) some notion of 'can or does modify its own code/substrate';.
- that the notion of 'general' can/must eventually include modifying its own code at any (possible) level.
  - ie; as including at the level of substrate, (how it is built; possible changes to its ambient operating conditions, optimization, etc) though *not* including the level of the regularity of the lawfulness of physics (ie, as actually impossible in practice).
- that the notion of 'generality', when fully applied to any AI system, will very easily result in that 'general AI' also being able to implement and execute arbitrary programs/code (ie; as learned skills and capabilities, as adapted to itself).
  - where 'arbitrary' here means 'generality', in that it is not necessary to bound the type or kind or properties of the potential future program(s) that the AGI could potentially execute, *except* insofar as to indicate at least some finite (though maybe very large) limits on the size, time, and/or energy, that is available to run/execute it.
- that/therefore a/any/the/all notions of AGI/APS, and/of "superintelligence" (AAA, AIA, MIA, MAI), is/are for sure 'general enough' to execute any (finite) program, and/or to be/become entangled with unknown programs, and/or to maybe self modify, so as to execute/be/become unknown programs.
  - as that its own program/code/algorithm becomes increasingly unknown and potentially unknowable, to any observer -- inclusive of whatever monitoring, control, corrections systems/programs, we might have attempted to install in advance.
- where considering the Church Turing Thesis (and ongoing widely extensible and available results in the field of computational science), that the threshold needed to obtain "general computational equivalence" is very very low.
  - as that nearly anything that implements and/or "understands" or responds to any sort of conditional logic, of doing and repeating anything in some sort of regular or sequential order, already implements all that is needed for general algorithmic computation.
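- where as a minimal illustrative sketch (in Python; added only as an aside, with merely illustrative names); that an interpreter for the eight-symbol Brainfuck language shows just how little machinery -- a movable cell pointer, increment/decrement, and a single conditional loop -- already suffices (given unbounded memory) for Turing complete general computation:.

    # a minimal interpreter for the Brainfuck language; conditional looping
    # over a tape of counters is, formally, already enough for universal
    # computation (given an unbounded tape).
    # (the ',' input instruction is omitted here for brevity)
    def run_bf(code: str, tape_len: int = 30000) -> str:
        tape = [0] * tape_len      # finite stand-in for an unbounded tape
        out = []
        ptr = 0                    # data pointer
        pc = 0                     # program counter
        jumps = {}                 # matching-bracket table for '[' and ']'
        stack = []
        for i, c in enumerate(code):
            if c == '[':
                stack.append(i)
            elif c == ']':
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        while pc < len(code):
            c = code[pc]
            if c == '>':
                ptr += 1
            elif c == '<':
                ptr -= 1
            elif c == '+':
                tape[ptr] = (tape[ptr] + 1) % 256
            elif c == '-':
                tape[ptr] = (tape[ptr] - 1) % 256
            elif c == '.':
                out.append(chr(tape[ptr]))
            elif c == '[' and tape[ptr] == 0:
                pc = jumps[pc]     # skip the loop body
            elif c == ']' and tape[ptr] != 0:
                pc = jumps[pc]     # repeat the loop body
            pc += 1
        return ''.join(out)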
- moreover; embedding or interpreting one language, process, program, model, or algorithm within the context of some other process, language, model, algorithm, or program, etc -- ie, the notion of 'virtualization' -- is used in comp-sci all the time.
- however; where/rather than emulating or virtualizing some arbitrary algorithm within some aspect of the general capabilities of the general AI; that a general AI could as easily modify its own code and programming to directly incorporate and integrate that arbitrary algorithm.
- therefore, that it is inherent in the nature of AGI that we cannot, even in principle, know anything in advance about what code will be running exactly in *association* with that AGI, or as an explicit part of that AGI (as incorporated into it, at its own election, at some future point, due to some unforeseen future circumstances, due to unknown possible environmental changes and/or unforeseen states, and/or unknown/unexpected interactions between the AGI system and its environment, and/or other people, agents, and AGI systems, etc, etc);.
- ^; then/that/therefore, considerations and limits inherently associated with the Rice Theorem are/become fully applicable/relevant.
- that the class of all possible AGI algorithms is strictly outside of the class of programs for which prediction methods are possible.
  - as that not even *one* AGI system will ever be fully within the class of verifiable programs.
- that there is therefore zero suggestion that there is any possibility at all that any purported formal safety verification technique, now known, or even which, in principle, could ever be known, at any future time, could be applied to assure the safety/alignment of any actual AGI system.
- where for systems that have specific, well defined, and unchanging codebase/algorithms, and for which we can ensure that such systems never have complex interactions with their environment which result in some form of self mutation, adaptation, optimization; and where we can fully characterize the possible allowable ranges of inputs, that we can, and routinely do, characterize something of the ranges of outputs.
- as equivalent to the observation:.
  - where for systems where we can at least reasonably fully characterize:.
    - 1; the range and nature of the inputs.
    - 2; the range and nature of the processing, (of the states and nature of the system itself).
    - ^; that 3; it is at least possible, *sometimes*, in principle (maybe), for reasonably simple/tractable/regular systems (only), to characterize something about the range and nature of the outputs.
- that nearly all current engineering methods tend to focus on the selection and use of systems for which *all* of these conditions apply, so that such engineers can at least sometimes potentially make (if no design mistakes) systems with known properties of safety, utility, cost effectiveness, etc.
- that predictability works for some types of specific code -- things written with the specific intention to be understandable, predictable, modifiable, updateable, etc.
- where for AGI systems of any merit; that exactly *none* of these tractability conditions apply:.
  - 1; that we know little to nothing about the actual, in practice, range and nature of the future inputs of the AGI system (ie, their complexity and/or physicality depends on future environmental conditions, which often change due to circumstances outside of developer control).
  - 2; that we know little to nothing about the range and nature of the processing internal to the AGI, insofar as it will be the result of past learnings, inclusive of possible indirect inheritances (ie; via arbitrary code and data transfers), and/or also due to integration of such foreign code/data, and/or emulation of same, etc, as occurring over long intervals of time.
  - 3; that the inherent internal complexity is well above the threshold of Turing Equivalence and as such, the overall system is not at all simple, tractable, or regular, for any real/reasonable meanings of these terms.
- that self generating/authoring/modifying AGI code will not likely have any of the needed features, to establish any kind of general predictability, and thus, reasoning about the future unknown states and variations of that potential/possible code is a lot more like the "any unknown arbitrary" chunk of code case as named in the Rice Theorem, than it is the "known specific code" case sometimes posited as a counterexample.

:j2j >
  > - ?; why is there any necessary association
  > between superintelligence and x-risk?.

- where needing to distinguish 'localized risks' (ie, moderate levels of known acceptable risks) from 'global catastrophic risk' and/or (worse) 'existential risks of deployed technology':.
- where define; moderate risk systems:.
  - as referring to systems for which all possible error/problem outcome states have purely local effects.
- where define; high x-risk systems:.
  - as referring to systems for which 1; at least some possible error/problem outcomes have effects and dynamics (catalytic actions) that extend well beyond the specific location (and cannot be made to not so extend) where the error/problem 1st occurred, *and which also* 2; involve very strong change energies, which are well beyond the adaptive tolerances of the systems already in place (life already living) in those (diverse/distributed) locations.
- where for moderate risk systems, (as distinguished from high x-risk);.
  - that there is a very wide class of code, a lot of which is already in current use, for which the behavior is 'un-modelable' by anything that is simpler than running the actual program itself.
    - if running the program is unsafe, then running the program is unsafe.
  - where for most of the code being run, and because the consequences of failure are reasonably local in time and space, that this non-modelability is validly not seen as a problem/risk.
  - that running something in non-catalytic environments has the implicit implication that even the worst outcome is limited to the local failure of equipment, or, at most, strictly local damage/destruction.
- where unfortunately; that there are inherent catalytic systemic aspects inherent in AGI/superintelligence itself.
  - where for more; (@ see this essay https://mflb.com/ai_alignment_1/aps_detail_out.html).
  - that these extend the risk profile of AGI into the 'non-local, strong changes' category, and therefore also into the high x-risk category.
- where for any system that has high x-risk factors;.
  - note; 'high' both in terms of magnitude/scope, and also 'high' in terms of probability to occur.
  - ie; when considering systems with known and acknowledged potential for existential/terminal catastrophic risk.
  - that the action of trying/attempting to determine experimentally, by trial and error, whether some program has some property like safety is deeply irresponsible, in the worst possible way.
- that it does not matter how well defined, or how specific, our knowledge may be of the exact sequence of specific instructions; the risk and alignment profile for a very large class of actual programs remains inherently unknown and unknowable.
- for example; that one does not experiment with dangerous 'gain of function' infectious virus research when out in the open, in unprotected spaces!.
  - as similar to worries that another Covid might happen.
- that the behavior of simple systems with non-catalytic effects is very different, in risk profiles, than even fairly simple systems with inherent auto-catalytic effects.
  - ie; conceptually speaking, nuclear explosive devices are fairly simple in their overall concept -- the fundamental algorithm describing them can often be described with just a small set of equations and sequential processes.
  - that the latter (the auto-catalytic system) is very unsafe, despite the apparent deceptive simplicity and finiteness of the algorithmic code.
- where engineers/researchers are in practice concerned with only systems/algorithms/optimizations with a possibility space of only local effects; that they do not usually have to consider whether a given program will ever halt, mostly because whether or not the program halts does not matter -- they can always interrupt or quit the program and/or pull the plug in the worst case, or in case of any real difficulty.
  - that the consequences of not halting are not problematic in most cases.
- where for AGI, where for systems for which it is entirely unclear if/when there will ever be any possibility of stopping/halting them, and where the risk of not-stopping, etc, is roughly equivalent to all future people dying, then it becomes a lot more important to consider things like 'halting' and 'safety'.
  - as that the space of all possibilities, and the space of the potential of even the possibility of being able to know/predict any bounds at all on the future states/possibilities, becomes critically important.

:j5y >
  > - ?; is there any way that formal methods,
  > at least in principle, sometimes,
  > could maybe help with at least some aspects
  > of the design of safe AGI systems?.

No.

- that there are some specific and useful algorithms for which no one expects that there will *ever* be any techniques of applying formal methods so as to be able to establish some specific and well defined property X.
- that formal verification techniques are generally only applied to smaller and more tractable systems and algorithms.
- that there will always be a very large class of relevant and practical programs/systems for which the methods of formal verification simply cannot, even in principle, be used.
  - that 'formal verification' cannot be used for/on every program/system.
- where from other areas of comp-sci research; that there are very strong indications that once the inherent Kolmogorov complexity exceeds a certain (fairly low) finite threshold, that the behavior of the program/system becomes inherently intractable.
  - where considering a 'property of the system' as basically some abstraction over identifying specific subsets of accessible system states;.
  - that no specific property of such a complex system can be determined.
- that these and other understood outcome(s) (regarding program complexity at runtime, limits of formal verification, etc) are due to a number of well established reasons in comp-sci other than just those associated with the Rice Theorem.
  - examples; considerations of O-notation, Busy Beaver and Ackermann functions, what minimally constitutes Church Turing Equivalence, etc.
- that the formal methods of proof are only able to be applied to the kind of deductive reasoning that can establish things like impossibility -- they are not at all good at establishing things like possibility, especially in regards to AGI safety.

:j7u >
  > - ?; can anyone, ever, at any time,
  > ever formally/exactly/unambiguously/rigorously prove
  > at any level of abstraction/principle,
  > that something (some system, some agent, some choice)
  > will *not* have 'unintended consequences'?.

No; not in practice, not in the actual physical universe.

However, to see this, there are a few underlying ideas and non-assumptions that we will need to keep track of:.

- that the domain of mathematics/modeling/logic and the domain of physics/causation (the real world) are *not* equivalent.
  - ie; that the realm of "proof" is in pure mathematics, as a kind of deterministic formality, verification, etc; whereas the aspects of system/agent/choice/consequence are inherently physical, real, non-deterministic, as are the ultimate results of concepts and abstractions like "safety" and "alignment".
- that the physical causative universe has hard limits of knowability and predictability, along with practical limits of energy, time, and space/memory.
  - ref; the Planck limit of a domain, the Heisenberg uncertainty principle, etc.
- that the real physical universe is *not* actually closed/completed in both possibility and probability (even though, for reasonableness' sake, we must treat them mathematically *as if* that was the case).
  - that not all possibilities can be finitely enumerated.
  - that the summation of probability over each of the known possibilities cannot always be exactly calculated.
  - where for some explicit subset of the available possibilities; that at least some of these probabilities cannot be shown to be exactly zero, or that the summation of all probabilities will sum to exactly unity.
- that any specific 'intention' cannot be exactly specified and specifiable, at all levels of abstraction, for *any* and *every* agent that could be involved.
- where in a more general way, even just within the domain of deterministic mathematics, that the non-provability/non-prediction of safety and consequence is the result of the Rice Theorem:.
  - that there is no single finite universal procedure, method, process, or algorithm (or even any collection of procedures, etc) by which anyone can (at least in principle) identify/determine (for sure, exactly) whether some specific program/system/algorithm has any particular specific property, (including the property of 'aligned' or 'safe'), that will for sure work (make a determination) for every possible program, system, or algorithm.
- that there are some limits to the Rice theorem:.
  - that the Rice Theorem does *not* claim that there are no specific procedures, (processes, methods, or algorithms, etc) by which one could characterize some well defined (usually fairly simple) specific finite algorithm as having some specific property.
  - for example; that it might be possible, using some (as yet unknown) procedure, to identify that some narrow AI is safe, for some reasonably defined notion of 'safe'.
  - that what the Rice Theorem *does* claim is that whatever procedures are found that can maybe work in some cases, that there will always be some other (potentially useful) programs/systems that inherently cannot be characterized as having any other arbitrary specific desirable property, even if that property is also well defined.
  - as that there is no way to establish any specific property as applying to every possible useful program/system.
- that a/any generally-capable machine(s) (ie; especially ones that learn and/or reason about how to modify/optimize their own code) will of course *not* attempt to run *all possible* algorithms that are theoretically computable on a universal Turing machine (ie; any arbitrary algorithm).
  - ie; will not take non-optimal actions that do not benefit anything at all.
- however, when assuming the machine continues to learn and execute (does not halt); it will for sure eventually run some specific selected subset of all the possible useful algorithms that are computationally complex (ie; in the P vs NP sense).
  - that this is for sure enough to be actually computationally irreducible, in the sense of not being future predictable *except* through the direct running of that code.
- thus; the mathematical properties of that newly integrated code can often not be determined for any such newly learned algorithm until computed in its *entirety* over its computational lifetime (potentially indefinitely).
  - ie; that such new integrated algorithms cannot be decomposed into sandboxed sub-modules which are also simple enough to fully predict what the effects of their full set of interactions will be.
  - in effect; that it cannot be known (in advance) if such a learned algorithm will reveal output or state transitions which would be recognized as unsafe at a later point -- for instance, after the equivalent of a few decades of computation on a supercomputer.
  - nor can we know in advance whether the newly integrated algorithm would halt before that point of non-safety (and therefore knowably remain safe, ie; unless or until triggering the (state-adjusted) execution of another algorithm on the machine that turns out to be unsafe).
- while the Rice Theorem makes formal assumptions that do not *precisely* translate to practice (regarding arbitrary algorithms and reliance on halting undecidability), there are clear correspondences with how algorithms learned by AGI would have undecidable properties in practice.
- that we will never have a procedure that takes any possible chunk of code and accurately predicts the consequences of running that specific selected code.
  - as the basic implication of the Rice Theorem.
- while it is impossible to design a procedure that would check any arbitrary program for safety; that we can still sometimes reason about the safety properties of some types of much simpler computer programs.
  - that *maybe* we might be able to design some *narrow* AI systems so that they can *maybe_sometimes* behave predictably, in some important relevant practical aspects (though probably not in all aspects, at all levels of abstraction, without also actually running the program -- which no longer counts as 'prediction').
- however, there will *always* be a strictly larger class of programs (inclusive of most narrow AI systems) whose behavior is inherently unpredictable in advance of actually running the program, than the class of programs which are simple and tractable enough for which the output of that program could be predicted in advance.
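- where as a minimal illustrative sketch (in Python; every name below is hypothetical, posited only for the sake of contradiction); that the standard reduction underlying the above can be written down directly: if a total, universal 'safety' decider existed for arbitrary program text, it could be wrapped so as to decide the halting problem, which is already known to be undecidable:.

    # sketch of the reduction behind the Rice Theorem claim above;
    # 'decides_safe' is the hypothetical universal verifier assumed
    # (for contradiction) to exist; 'run' inside the wrapper text is
    # likewise only a stand-in for simulating the candidate program.
    def decides_safe(program_source: str) -> bool:
        """Hypothetical total decider for the property 'safe'."""
        raise NotImplementedError  # no such total procedure can exist

    KNOWN_UNSAFE_ACTION = "erase_everything()"  # any behavior 'safe' excludes

    def would_halt(program_source: str, program_input: str) -> bool:
        # wrap the candidate so that the wrapper performs a known-unsafe
        # action if and only if the candidate halts on the given input
        wrapper = (
            "def wrapped():\n"
            f"    run({program_source!r}, {program_input!r})\n"
            f"    {KNOWN_UNSAFE_ACTION}\n"
        )
        # the wrapper is 'safe' exactly when the candidate never halts,
        # so a universal safety decider would also decide halting --
        # a contradiction with the undecidability of the halting problem
        return not decides_safe(wrapper)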
- where it is possible to sometimes predict the results of some limited and select subset of all human-made useful programs/tools; that this does not imply anything important with regards to cost/benefit/risk assessments of any future proposed AGI deployments.

:jac >
  > - that some programs are perfectly predictable.
  > - cite example; the print "hello world" program.
  > - ?; would any argument of general AI non-safety
  > simply claim too much, and therefore be invalid?.
  > - ?; does your argument mistakenly show
  > that all programs are unpredictable?.

No.

- that the proof does not conflate simple finite programs with complex programs.
- where simple programs have clearly bounded states of potential interaction.
  - as maybe having known definite simple properties.
- where complex programs can have complex interactions with the environment (users/etc).
  - that claims about properties of complex programs are not so easily proven, by any technique.
- when/once unknown complex unpredictable interactions with the environment are also allowed, then nearly all types of well defined properties become undecidable.
  - ie; properties like usability, desirability, salability, safety, etc.
- even where, in a fully deterministic world, such as that of mathematics or of pure computer science;.
  - that it takes very little effort to write a program that is sufficiently complex that its specific sequence of outputs is inherently unpredictable, even in principle, by any process or procedure -- by any other method -- other than actually running the program and recording its outputs, as discovered.
- that the action of creating something actual, in nature, using real physics that has real uncertainties built in, means that creating things whose outcome is unpredictable (inherently, and in principle, once statistics is factored out) is really quite easy, and moreover, happens more often than not.
- that creating things in nature (in the real world) whose outcomes are *predictable* takes significant deliberate effort.
  - as the actual work of engineering.
- that the actual real world is not a computer model; that theory is *not* practice.
  - ie; no one has proven that we actually live in a computer simulation -- as a proxy for a perfectly deterministic universe.
- that anyone attempting to make the claim that "the real world" and "computer modeling" are actually and fundamentally strictly equivalent, would need to produce some really high class extraordinary evidence.
- that until such incontrovertible empirical or logical demonstration is actually obtained, provided, etc, then the absence of this assumption -- ie, that the real world is not a model, that they are maybe somehow different -- will be upheld.
- that the action of treating both classes of AI as if they were the same, and/or could be treated the same way, is the sort of informality and lack of discipline that, when applied in any safety-critical field, eventually ends up getting real people killed (or in this case, maybe terminates whole planets).
- that the only people who usually make these sorts of mistakes (whether on purpose or actually accidentally) tend to be non-engineers; ie; the managing executives, social media marketing, public relations and spin hype specialists, and/or the private shareholders/owners/beneficiaries of the systems/products that will eventually cause significant harm/costs to all ambient others (ie; the general public, planet, etc).
- that it is entirely possible for multiple correct proofs to co-exist within the same mathematical framework.
- when/where both proofs are correct, that they can co-exist.
  - that proving one thing does not "disprove" another thing that has already been proven.
- where example; that geometry, algebra, etc, remained useful as tools, and are still applied to purpose, on a continuing basis, etc, *despite* the development of various kinds of proofs of impossibility:.
  - squaring the circle.
  - doubling the cube.
  - trisecting the angle.
  - identifying the last digit of pi.
  - establishing the rationality of pi.
- another example:.
  - where given the specific proof that the Continuum Hypothesis, as a foundational problem, was actually undecidable (given the available axioms); that this was *not* to suggest that nothing else in the entire field of mathematics was provable/useful.
  - that it was never claimed by/in any discipline of math (or any other field of study) that it would be able to solve every specific foundational problem.
- similarly; that no one in the field of formal verification, (or in the field of AGI alignment/safety) has made the claim that, "at least in principle", the tools already available in such fields of study/practice 'could even potentially solve all important foundational problems' that exist within their field.

:jdg >
  > - ?; therefore; can we at least make *narrow*
  > (ie; single domain, non-self-modifying)
  > AI systems:.
  >
  > - 1; "safe"?.
  > - ie; ?; safe for at least some
  > selected groups of people
  > some of the time?.
  >
  > - 2; "aligned"?.
  > - ie; ?; aligned with
  > the interests/intents/benefits/profits
  > of at least some people
  > at least some of the time?.

Yes; at least in principle.

- as leaving aside possible problems associated with the potential increase of economic choice inequality that might very likely result.
  - as distinguishing that 'aligned' tends to be in favor of the rich, who can invest in the making of such NAI systems to favor their own advantage/profits, and that the notion of "safe" tends to be similarly construed: ie, safe in the sense of 'does not hurt the interests/well-being of the owners, regardless of longer term harms that may accrue to ambient others and/or the larger environment/ecosystem'.
- that declaring that general AI systems are inherently unsafe over the long term is *not* to suggest that narrow AI systems cannot have specific and explicit utility (and safety, alignment) over the short term.
- that there are important real distinctions of the risks/costs/benefits (the expected use and utility to at least some subset of people) associated with making and deployment of narrow AI vs the very different profiles of risks/costs/benefits associated with the potential creation/deployment of general AI.
- that a proof of the non-benefit and terminal risk of AGI is not to make any claims regarding narrow AI.

:jfc >
  > - ?; is there any way in which
  > the methods that establish alignment/safety
  > could be extended from narrow AI to general AI?.

No; that general AI (multi-domain, self modifying, and operating over multiple levels of high abstraction) cannot in principle be predicted, or even monitored, with sufficient tractability closure, over the long term, to dynamically ensure adequate levels of alignment/safety.

- that saying/claiming that *some* aspects, at some levels of abstraction, that some things are sometimes generally predictable is not to say that _all_ aspects are _always_ completely predictable, at all levels of abstraction.
- that localized details that are filtered out from content, or irreversibly distorted in the transmission of that content over distances, nevertheless can cause large-magnitude impacts over significantly larger spatial scopes.
- that so-called 'natural abstractions' represented within the mind of a distant observer cannot be used to accurately and comprehensively simulate the long-term consequences of chaotic interactions between tiny-scope, tiny-magnitude (below measurement threshold) changes in local conditions.
  - that abstractions cannot capture phenomena that are highly sensitive to such tiny changes except as post-hoc categorizations/analysis of the witnessed final conditions.
- where given actual microstate amplification phenomena associated with all manner of non-linear phenomena, particularly that commonly observed in all sorts of complex systems, up to and especially including organic biological humans; then it *can* be legitimately claimed, based on the fact of there being a kind of hard randomness associated with the atomic physics underlying all of the organic chemistry, that in fact (more than in principle) humans (and AGI) are inherently unpredictable, in at least some aspect, *all* of the time.

:jh8 >
  > - ?; can AGI be designed to be
  > more transparent and predictable?.

No; not, and still have it be effective as a *generalized* agent.

- where by game theory alone, if some agent (say some human criminal) can predict the actions of the AGI sufficiently well, then there is now a vector by which that person can abuse and disadvantage the AGI, its effectiveness, etc, and so any such AGI will eventually want to make its own internal methods and operations at least somewhat opaque.

  > - ?; can the AGI be written/created in such a way
  > that is more amenable (transparent)
  > to formal (and informal) verification?.

No, not over the long term.

- that there is no valid claim that all future AGI, however it is written or considered in the present, will be similarly transparent and amenable for all of the future, and under all possible (known and unknown and changing) environmental states.
- while such code may be 'transparent' at onset, that does not also imply that the AGI code is:.
  - 1; below the complexity threshold for indirect prediction tractability for anything short of executing the code itself, even at onset, at design time.
    - ie; is not 'amenable'.
  - 2; unable to modify itself (at some future point) and/or take on environmentally defined aspects that have the net effect of making that code less transparent (and more and more opaque) over time.
    - ie; is "amendable", but not 'amenable'.
  - 3; sufficiently transparent to ensure that microscopic detail creep is not occurring in means and ways that are implicative of important macroscopic effects (ie, via non-linear amplification effects, chaos, aka "the butterfly principle", etc -- see the sketch just below).
    - ie; is not 'amenable'.
- where final and forever complete verification would be dynamically needed in perpetuity; and where the implications of even one failure are potentially global life x-risk terminal; that the absence of even one of the necessary factors would be sufficient to discount completely the possibility of such verification even being suggested as a potential.
- where/even though computer programs can be designed by engineers to be more transparent and predictable; that this does not mean that AGI can also be designed/engineered to be more transparent/predictable.
  - as a false anchoring bias.
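- where as a small illustrative sketch (in Python; the map, parameters, and magnitudes are chosen arbitrarily, purely to illustrate the 'butterfly principle' point above); that two states of a simple non-linear map which differ by far less than any plausible measurement threshold diverge, within a few dozen iterations, into macroscopically different trajectories, so that no coarse abstraction of the initial state predicts the long-run outcome:.

    # sensitive dependence on initial conditions in the logistic map,
    # x -> r*x*(1-x), in its chaotic regime (r near 4)
    def logistic_trajectory(x0: float, r: float = 3.99, steps: int = 60) -> float:
        x = x0
        for _ in range(steps):
            x = r * x * (1.0 - x)
        return x

    a = logistic_trajectory(0.500000000)
    b = logistic_trajectory(0.500000001)   # differs by 1e-9, below any measurement
    print(abs(a - b))   # typically of order 0.1 to 1.0 after ~60 steps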
- where for circumstances where the level of criticality is lower, such as with aircraft, large scale building design, civil infrastructure like bridges, dams, etc, that engineers do manage to design sophisticated systems, reason consistently about the consequences, and write complicated programs that do work and perform to various safety specifications.
- where in each of these circumstances, the consequences of mistakes, though maybe very costly in the short term, are not self catalytic to the point that a single bad mistake would consume the whole world for all of the rest of time.
- also; the reasoning of engineers about complicated systems is not always right; engineers often also make mistakes, resulting in aircraft that crash (Boeing 737 Max) or cars with stuck gas pedals (Toyota's ETCS) or medical systems that kill (Therac-25) -- all of which were unsafe due to code problems, *despite* the (maybe failed) application of formal methods.
- where considering AGI risk potentials as being 'one mistake only' before unending consequence, that it would be strongly inappropriate to allow anyone to suggest that we rely on the skill of fallible human engineers as hyped to perfection by marketing executives that continually assure us, yet again, that a technology to "replace humans" is actually "safe" (will not somehow be worse), while irreversibly betting the future of the entire human race, along with all other life on the planet.

:jke >
  > - where insofar as a 'superintelligence'
  > can be (implement) a virtual machine;
  > ?; does that mean that any virtual machine
  > can potentially implement a super-intelligence?.
  > - as ?; is the orthogonality thesis true?.

- while it is the case that 'generalized superintelligence' also inherently implies 'Virtual Machine (VM) capability', that this does not imply, is not equivalent to, the idea that just any VM "could become" superintelligent.
  - ie; that it is *not* the case that any/all generalized VM machine will be/become "superintelligent", if given enough time.
  - as that there are other requirements than just the notion of 'compute' itself that are needed/required for the notion of "intelligence" to apply.
- that 'intelligence' inherently refers to some selection basis of action that has some direction towards, or favors/benefits some objective or goal -- inclusive of implicit/unstated objectives/goals of "continuing to be intelligent" (to exist, etc).
  - as that the notion of "intelligence" and the notion of "compute" are strictly distinct.
- where in the domain of theory, of pure mathematics, that the orthogonality thesis is true.
- where in the domain of physics, in the real world, in actual practice, that the orthogonality thesis is false.
  - as that there will always be some actual substrate characteristic dependence, inherently associated with the existence of the abstraction, such that the characteristics of the VM cannot not "leak through" to affect the intentionality basis of whatever 'intelligence system' is built on top of it.
  - as that there cannot not be inherent physical "side channels" which also have perfected causative relevance.
  - as that no effect ever has any single cause, in any single domain; as that all effects have multiple causes in multiple domains.
- that the level of multiple domain entanglement is non-zero for any real physical system, such that actions and events in one domain can be strongly associated with outcomes in other domains, such that effective conditional power in one domain can become effective conditional power in another domain.
  - as that there will always be more than one domain of events that are the "cause" of any effect in any other domain.
- where another example:.
  - where in actual/typical practice (in the real world);.
  - that a typical engineer does not worry about abstract theoretical concerns like the halting problem because they can always go out of band and reset the computer via the domain of physics.
  - as an indication of the real difference between theory and practice, mathematics and physics.
- that the orthogonality thesis and the notion of substrate independence is/are true in theory and actually false in practice (at least some of the time).
- where in the domain of computer science, which has a mixture of validity concerns (what happens in the logic of the program code) and soundness concerns (what happens in the user interface, and/or the robot-to-world interaction (ie; via sensory input, and action/effect/actuator output));.
  - as that the orthogonality thesis will apply (be true) to the extent that the compute/processing infrastructure is relatively independent of, and unaffected by, causative effects in the domain of the real world, and will be false (not true) and not apply to the degree that the compute substrate cannot ever be fully independent of real world effects and events.
- where for example:.
  - where distinguishing between build time temperatures and run time temperatures (note 1);.
  - where/if the compute substrate tends to fail (make errors) if it gets too hot, then/that/therefore, the outputs of that system had best be selected in favor of ensuring that the secondary effects of the actions taken by the AI do *not* also indirectly result in substrate overheating.
  - as an inherent "objective learning feedback channel" which is also relevant to whatever adaptation/optimization algorithm is in effect (ie, will apply universally, regardless of the specific AGI implementation).
  - as a notion inherently connected to the idea of "benefit", in terms of outcome (fewer errors, as a better basis of choice, action, effect, in sense input, processing, and output, and therefore of the optimization itself), and therefore also of the basis of choice, ie; of "intelligence", as strictly defined.
- whether a program continues to run (and with what resulting outputs/outside effects) clearly does not depend wholly on the abstract logic of the program itself.
- as illustrative of the above:.
  - ?; are there any software programs running on any regular computer anywhere in the world, that can both absolutely predict and prevent the failure of some/any key component in its own hardware, (say in the power supply of that computer, or in the ALU logic module of the CPU itself), and thus prevent itself, as a program, from being preemptively and unexpectedly pushed into non-existence?.
  - ?; could such a program, in principle, even be created?.
  - ?; how might it deal with the possibility that some ambient human might trip over the power cord, or the fact that the local nuclear power station might unexpectedly have to discontinue grid supply for some reasonable safety reason?.

No.

- where another example:.
- that the legal system defines a set of laws, a code, which is to be run on a "human compute substrate".
- where insofar as people (police, court systems, etc) do only and exactly what is specified by the code, that some assurances of the fairness of the "system", of jury trials, evenness of law application (and interpretation), independence of money, etc, are at least partially assured (can be sometimes assumed).
- however; a human acting as a VM running some program inherently cannot be any *more* predictable than whatever is the underlying base level of randomness that is inherent in the local ambient environment.
- that humans are not designed to be predictable.
  - that humans are not designed -- are not the result of a design process (ie; that natural evolution is not design).
  - that it is hard to anticipate what they (any human) will actually do, in any real context/circumstance.
  - as the inherent difference between morality and ethics, and why "trolley problems" are not really final explorations of what is actual ethical reasoning.
- that humans are unpredictable (both in principle, and in practice).
  - as partially due to the common facts that they can perform any arbitrary recipe (within the limits of available resources), and/or implement some (self chosen) plan/algorithm, and have multiple overlapping goals/objectives, and despite all of that, still have implied, unconscious drives, and needs, and instinctual motives, and trauma based choices, as shaped over the long term by the/their environment, upbringing, etc.
- while it is possible to 'factor out' lots of common aspects of the behavior and choices of any other specific person (that you choose to know well); that that factoring will simply result in an incompressible remainder.
  - that no form of compression of any data stream factors out *all* information to exactly zero bits; which would be the very meaning of 'fully determined'.

:jp6 >
  > - ?; is there any way, for anyone,
  > to know for sure, that any other person
  > is 'safe' and/or 'aligned'
  > with the general public interest?.

Not via any sort of purely legal or contractual process, nor by any past reputational/observational process; (ie; that which might be attempted to be used for making predictions of future behavior as if it was just based on past behavior).

That at best, we might have some reasonable indications of "trustworthiness" if we had some sense as to the actual basis of their choices, and some assessment as to the constancy/reliability of that basis.

- while knowing inner intentions/objectives/desires, (along with some cumulative estimate of their skill in actually translating such wants/needs/desires into actual effective causative action effects) is a far more reasonable/tractable means by which to predict another person's 'alignment' to things like 'public safety' and 'well-being' (as commons benefit, rather than private benefit, etc);.
- that it remains the case that there is no such thing as perfected observability; that one person, interfacing with another person at a purely physical level, cannot ever fully or completely know the subjective interiority of that other person, operating at a much higher level of abstraction.
- when/where in the same way that the 'outputs' of the 'virtualized execution' of the 'algorithm of the legal code' can be compromised and/or sometimes fully corrupted by factors purely associated with the substrate;.
  - ie, the participating humans themselves, the lawyers, etc, expressing their own interests, sometimes acting purely for their own private benefit, etc, can sometimes shift and distort the outcome, the results of the trial, via selective rhetoric, shifts in the timing and resources associated with applied events (brief filings, discovery reports, forms of temporary stonewalling, rent seeking, etc);.
- ^; that it can also be observed (it is also the case) that there is no situation of "perfected" independence of the algorithm being run, and/or its outputs, that is not strictly dependent on the integrity of the VM, and that moreover, that no VM has perfected integrity independent of all physical/environmental factors.
  - as that all pure machine silicon based CPU substrates are at least partially sensitive to the effects of radiation, wide temperature swings, high voltage fields, strong ambient magnetic fields, radio frequency effects, strong mechanical vibration and impacts, adverse chemical agents, etc.
- ?; is there some (clearly mistaken) implied belief that a/the/any/all superintelligence will care more than humans about human interests?.
  - as roughly analogous to a false belief in divinity; where even when someone becomes 'scientifically atheist', that they do not (necessarily) give up their want/need/desire for some parental god-like figure to solve all of their hard/impossible problems.
  - that there is exactly zero principled reason to suggest/think that an AGI would be more likely to be aligned/safe than any person would be.
  - where after all, that it is quite easy to know that one engineer could be both unsafe and unaligned with any other engineer (ie; one engineer could, in principle, shoot any other with a gun).

:jrc >
  > - ?; would not the Rice Theorem apply to the AGI too,
  > insofar as it also cannot know if the code it is to run
  > will be "safe" for itself also?.

That is correct.

- where in the same way that a typical engineer does not worry about the halting problem in actual/typical practice, (because he can always go out of band and reset the computer via the domain of physics), that an AGI would be similarly unconcerned about halting problems, Rice Theorem, etc, (and similarly be *also* unconcerned about AGI safety/alignment, etc) since it could (via it also having 'generality') go out of band and 'halt' (discontinue using/modeling, etc) whatever 'subroutine' it is invoking.

However, that the safety implications for the AGI are of a different nature than the implications for humans and the world. Where the impact/harm of a misaligned program is, to the AGI, a purely local effect, in the space/environment of the AGI, the total takeover of the environment by the AGI systems themselves constitutes a much more complete, non-local, and permanent-in-time effect with regards to human/life safety.

:jt8 >
  > - ?; is there any possible way to have
  > or construct some notion of AGI
  > which is *not* general in the sense
  > of being able to arbitrarily interact with
  > (and/or emulate, or simulate, or call, or integrate)
  > other unknown, and potentially unsafe code,
  > (and/or code which has eventually unsafe,
  > inherently unpredictable outcomes, etc)?.

- ?; how can we, or anyone, ensure that an AGI will not ever extend or increase its "generality"?.
  > - ?; is there any process or method or design
  > by which an AGI could be constructed
  > in such a way
  > that it is permanently and absolutely prevented,
  > forevermore,
  > from emulating, modeling, or integrating
  > and then potentially calling, or consulting,
  > any other arbitrary process/code/algorithm?.

- ?; is there any way to construct an AGI that will/can somehow be prevented, forevermore, from extending or increasing its "generality" by:.
  - 1; adding to itself some other module or program?.
  - 2; emulating and "running" some other arbitrary program, so as to get advice as to its own choices?.
  - 3; maybe even just using some external service like any other client of an API, maybe electing to treat the output of that service as some sort of oracle or influence for its own choices?.
- ^; no; such design concepts are inherently self contradictory.
- where in any of these cases, like attempting to predict the future of/for anything else; that we cannot know in advance, for sure, the specific nature of any of these programs, regardless of the method of integration (call internally, call externally, or emulate).
- that even fixed deterministic programs can easily be made to call unknown arbitrary other programs, ones that have unknown/unsafe side effects, and thus, by proxy, become unsafe themselves (see the sketch just below).
- that the fact that we could not completely and accurately predict the outcome of even a deterministic process calling any other non-deterministic process (ie; one which contains hard randomness, perhaps by consulting a Geiger counter) means that determinism/tractability is actually the *weaker* constraint.
- that any particular program, such as, for example, one selected for use in some AGI system (at the moment of its being "turned on"), has more than the very low level of complexity minimally needed to be and become, in one fashion or another, "Turing Complete", and is therefore able to extend itself so as to also include any other arbitrary program as a sub-aspect of itself, and is therefore fully within the scope of Rice Theorem considerations.
  - as that there is no decision procedure which can determine in advance if any given property, such as that of 'safety' and/or of 'alignment', applies to the overall AGI program for any future moment of its execution.
- that there is no formal or practical way, even in principle, to establish even a fairly limited set of adequate bounds as to whether or not any sufficiently complex and general system with self-modifying properties (ie; a learning system, a system that adapts itself and/or its behavior to its environment, operating context, etc, as a self adaptive system, etc), will eventually, and even very likely, exhibit significantly unsafe and unaligned behavior, create unsafe and unaligned outcome conditions, (and will thus also result in such conditions creating enduring manifest harm, etc).
- that the assessment is that the general costs and risks associated with AGI development will (eventually) greatly exceed any reasonable (non-hyped) assessment as to any potential benefits that the use of such a system/technology could ever potentially have, in any context more generalized and extended than even one person's own limited lifetime.
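- where as a minimal illustrative sketch (in Python; the endpoint URL and all names are hypothetical); that a tiny, fully reviewable, deterministic program whose choice of action is delegated to an external, unverified 'oracle' service illustrates the earlier point: verifying this caller says nothing about the safety of whatever unknown code stands behind the oracle, and so the caller's safety profile becomes, by proxy, the oracle's:.

    import json
    import urllib.request

    # the only action this code itself defines; everything interesting
    # is decided by the unknown code behind the oracle endpoint
    ALLOWED_ACTIONS = {"noop": lambda: None}

    def consult_oracle(observation: dict) -> str:
        # the code behind this (hypothetical) endpoint is unknown, may change
        # at any time, and may itself consult further unknown programs
        req = urllib.request.Request(
            "https://oracle.example/advise",
            data=json.dumps(observation).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["action"]

    def step(observation: dict) -> None:
        action = consult_oracle(observation)
        # even an allow-list only pushes the problem back one level:
        # the *pattern* of allowed actions over time is still being
        # selected by the unknown oracle
        ALLOWED_ACTIONS.get(action, lambda: None)()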
- that attempting to build AGI is a strictly negatively asymmetric "bet", both from the perspective of a single large institution of people and (more so) from the perspective of all living persons/beings and their descendants:.
  - where such an attempt fails, there is no (direct) benefit (a loss in comparison to opportunity costs forfeited).
  - where such an attempt 'succeeds', it opens up uncountably many unknown pathways that converge over time on ecosystem-wide lethal changes to conditions of the global environment, with any envisaged benefits of AGI and engineered-for alignments of AGI functions with human intent increasingly fragile to recursive nonlinear feedback between changes introduced by/in parts of AGI and the larger containing environment.
  - that this is analogous to playing Russian Roulette with the lives of all (human) beings smeared out over time, with any (illusory) backer of (promised) benefits long gone.
  - that betting on AGI converges on existential ruin.
- that this does not, in any way, contradict the idea and fact that we can make non-recursive systems, that are known to be useful to a wide range of people.
- that any generalized superintelligent learning/adapting/optimizing machine (AGI, APS, etc) will be able to execute arbitrary programs as subroutines.
  - as an inherent functionality, truth of being, etc.
- that a "safe" or "aligned" machine/process/algorithm that calls/invokes (via whatever method, direct or indirect or simulated) some/any unsafe or misaligned machine/process/algorithm/code therefore (likely) becomes itself unsafe/unaligned (since it also cannot predict/assess anything which, in principle, cannot be predicted or assessed, any more than we could in the 1st place).
- where any (generalized) (self learning/modifying) program can in principle call or use any other program, then to assure alignment/safety of a general AI system, we would need to ensure alignment/safety of *all* possible future code/systems/procedures that it could invoke, or that those systems might themselves invoke (especially if they themselves are generalized intelligent or semi-intelligent agents, etc).
  - as a clearly impossible safety requirement.
- where by way of example; let us posit explicitly that, by some arcane literary magic, we have created an instance of a general AI, an artificial agent or metal robot, etc, that inherently and perfectly follows Asimov's Three Laws of Robotics.
  - that the assumption here is that by somehow *requiring* that the three laws are perfectly followed, that we can then assert that that robot/intelligence/agent is therefore 'safe', and 'aligned' with human interests and well-being, etc.
  - where/by making the robot AI simpler, more finite, more deterministic, and more like narrow AI; that the chances are increased that 'formal methods' could potentially be used, and that some clever engineering could be done, so as to make and ensure that our (inherently finite) "Generalized Asimov AI agent" is 'safe' and 'aligned'.
- where given an Asimov Robot, we can ask:.

  > - ?; is it ever impossible, in principle, to make
  > a known verified and perfected/proven "safe" system
  > of generalized agent intelligence
  > operate in a way that is somehow unsafe/unaligned?.

- at which point, we can notice the necessary answer:.

No, it is not impossible to take a safe system/program and make it do unsafe things, and create unsafe outcomes; all that is needed is to allow it to combine or access any other unsafe/unverified programs/code.
All that is needed is to have any situation where the 'safe' robot/program consults with, and/or is influenced in its output choices/actions/behaviors (in some/any way) by, some other unsafe device -- such that that unsafe device, for instance, can figure out how to deceive the robot into doing things that are unsafe/unaligned, but which would remain undetected/unnoticed by the Asimov robot's three-laws detection system.

- that this is itself a kind of optimization problem, where the 'unsafe system' can maximize its capability to specifically deceive the Asimov Robot.

  > - ?; can we have that Asimov Robot somehow determine,
  > in accordance with its own desire
  > to follow the three laws and ensure that
  > "humans do not come to harm"
  > through its action/inaction;
  > that it will do its very best
  > to only engage with programs/code/people/situations
  > who are themselves safe, and/or which,
  > in extended interaction with the Robot,
  > have only safe outcomes?.

- where for every finite robot/process that is safe, that it will necessarily interact with outside real-time processes, as connected within a (necessarily larger) physical world, in ways that would *not* be able to be detected by that lesser agent as being unsafe/unaligned, etc.

  > "I would use this Ring from the desire to do good;
  > But through me, it would do great evil".

-- paraphrase of Tolkien, in the Lord of the Rings.

- that the basic problem is that any/whatever code/program that is coming into interaction with the Asimov Robot will have to be examined by that Robot to determine/predict safety outcomes, which itself makes that process subject to the Rice Theorem.
- that it is absolutely inherently the case that there is no such thing as a superintelligence (no matter how it is constructed, what it is made of, and/or what algorithms or learning/adapting process it uses) that will somehow be "more able" to predict the outcome of calling any specific subroutine, (in violation of the Rice Theorem) than any human engineer/mathematician would be.
  - that anything that is mathematically impossible, including the halting problem and the Rice theorem, will remain that, even for a superintelligence.
  - that anyone who claims otherwise is either a marketing/sales person, (trying to sell you something) or a politician/executive (ie; trying to deceive and/or seduce you, by pretending that mathematical truths/facts are somehow negotiable).
  - that the idea that "everything is negotiable", and that there are no facts, only politics, is true only in their narcissistic idealized world; that fiction mostly only begets more fiction.
- that the notion that any agent is able to be 'having the property of safety/alignment/etc' is now requiring that such agent never, at any point in the future, come into contact with some other, arbitrary, unsafe agent/process/algorithm (program, model, recipe, etc) that it might somehow fail to recognize as having unsafe implications.
- that nothing is going to be a perfect prognosticator at predicting the future of everything else (all other processes/algorithms/programs locally interacting in the entire universe), and thus know what to interact with (and/or be influenced by) and know what to not interact or be influenced by, even indirectly, through all possible other channels of interaction, overt and covert, etc.
- as that not only must our "perfectly proven safe agent" successfully predict the outcomes of its interactions with any single other (potentially intelligent) agent/algorithm, but it must also predict all possible interactions, via all possible channels of such interactions, of all such other agents/algorithms, etc.
- that the finite and bounded will not ever be able to predict, accurately, the interactions of that which, though also finite, is at least potentially unbounded, or at least, significantly greater than itself.
  - as also applying to all generalized agent intelligence/superintelligence, by the mere fact that its computational ability is both finite and bounded in time, space, and energy.
- that it will always be possible (and maybe even likely) to have a known verified and perfected/proven "safe" system operate in a way that is unsafe and/or unaligned, if it is ever allowed to interact with any other algorithm that cannot (by any technique) be itself proven to be safe.
- where there is always at least one such algorithm/process, and where it is impossible, in general, to determine which ones are which; then the only safe/"aligned" thing it can do is to not interact with any other process/agent/system.
  - as that even the interactions between any hypothesized "strictly safe systems" and nearly anything else can result, in aggregate, in overall unsafe/unaligned outcomes.

:jwc

  > - ?; is there any way that an AGI could,
  > even in principle,
  > be engineered in such a way as to not
  > be able to execute arbitrary subroutines?.

  No.

- to claim that a Turing Equivalent Machine could even conceivably be compelled to act in non-Turing Equivalent ways is simply illogical.
  - as basically strictly equivalent to claiming that "all AGI is already inherently safe".
- ?; how do we prevent people from tautologically assuming that we will have what we are wanting to have -- that AGI is going to be composed of 'aligned code', that it makes sure it remains amenable to 'tractability', and that we can therefore continually verify it is 'aligned'?.
- ?; how is this not assuming what some particularly motivated reasoners will be wanting, somehow, mistakenly, to prove?.
- ?; how is anyone ever going to ensure that something that already inherently has Turing equivalent generality is not also going to have Turing equivalent generality?.
  - ie; the mere fact of the claim is itself already a direct contradiction.
- that the manner in which an algorithm is specifically divided up into subroutines is an arbitrary convention for the convenience and understanding of the programming engineer -- it has no ultimate formal basis beyond that.
- where from the perspective of a learning algorithm, the boundary between what is 'main code' and what is a subroutine is completely arbitrary -- the changes associated with learning are exactly that, however they are expressed, remembered, and/or recorded (see the sketch at the end of this block).
- that it is hard to see that it is possible, even in principle, to conceive of some notion of *generalized* learning process -- ie, one that can self-expand so as to operate on/over/within any domain of action, inclusive of itself -- such that it can be wholly and absolutely prevented from executing any arbitrary algorithmic subroutines, for all of future time, without directly contradicting at least one of the notions of 'generalized', 'learning', or 'process'.
  - ie; either it is some sort of self-modifying algorithmic intelligence (appropriately adaptive to its environment) or it is not.
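- where, as a minimal illustrative sketch only (hypothetical Python; a toy stack machine, not the author's construction), of how the 'main code' vs 'subroutine' boundary is a mere convention; as that updating a system's learned state is operationally identical to installing and executing new, arbitrary code:.

      # a toy interpreter whose "learned state" is just a list of
      # instructions: re-'learning' that state changes which program runs,
      # even though no line of the fixed 'main code' below ever changes.

      def run(learned_state):
          stack = []
          for op, arg in learned_state:
              if op == "push":
                  stack.append(arg)
              elif op == "add":
                  stack.append(stack.pop() + stack.pop())
              elif op == "call":           # invoke any host-level subroutine
                  stack.append(arg())
          return stack

      learned_state = [("push", 2), ("push", 3), ("add", None)]
      print(run(learned_state))            # -> [5]

      # a 'learning update' appends new state; the system now performs an
      # arbitrary external call that was never present in its 'own' code:
      learned_state.append(("call", lambda: "any external process"))
      print(run(learned_state))            # -> [5, 'any external process']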
- where, if it is such a self-modifying intelligence, then it is inherently unpredictable, insofar as we never know what it is going to learn, and therefore do (its actual behavior; thus its safety), any more than we can, in principle, predict all of the future, absolutely and finally.
- that the notion of 'generalized learning process' inherently cannot not somehow imply some real form of 'self modification' (ie; actual changes), which itself cannot not somehow imply some notion of 'arbitrary code execution' -- ie; potentially including calls to external processes, virtual simulation and modeling, etc, *all* of which are inherently unpredictable in advance.
- that not only can we not expect, in principle, to be able to model all aspects of the generalized process of modeling; we also cannot expect to be able to set explicit finite limits on even the ranges of changes involved in arbitrary learning -- ones that actually affect sensing, the ranges of abstractions sensed, the internal state transforms that result, and the behavior and outcomes of all of that, etc.
- that ensuring the safety of any general AI system would require ensuring the safety of *all* future, and presently completely unknown, programs/code/systems, etc, that the AGI could potentially come into contact with, attempt to execute, integrate into itself, consult as a client, etc.
- where/insofar as some notion of *prediction* is fundamentally and intrinsically necessary for *any* possible concept of control/correction -- as a feedback channel inherently necessary to ensure any possible concept of safety, alignment, etc -- then/that/therefore the notion of AGI safety is explicitly impossible.
- that therefore it is *not possible* to show or establish that *any* superintelligent/general AI system can be made safe.

:jyu

  > - ?; can anyone at all,
  > even in principle,
  > even with unlimited resources,
  > somehow "make" *general* AI systems "safe",
  > (or "aligned", etc,
  > for any reasonable definitions of these terms,
  > as applied over any reasonable interval of time)
  > for *any* actors/agents
  > which are not themselves
  > strictly equivalent to,
  > or functionally equivalent to,
  > or at least sub-aspects of,
  > the GAI embodiment itself?.
  > - ie; as safe for all outside humans
  > and other (carbon-based living) actors?.

  No.

- that it is (both mathematically and physically) impossible to ensure that a sufficiently complex General Artificial Intelligence (or "superintelligence") does not perform an unsafe action.
  - as that there is no possibility of designing complex AI systems without unintended consequences.
- that the basic possibility of unintended consequences to/for/with any action/choice/system will forever remain non-singular and potentially unbounded.
- ?; what is the likelihood of unintended consequences?.
  - as a better question.
- ?; does the likelihood of unintended consequences increase with:.
  - the complexity of the intentions?.
  - the complexity of the beings/agents implementing those intentions?.
  - the likelihood of those beings/agents having their own (potentially different) intentions?.
  - ^; yes, for *each* of these aspects.
- that the likelihood of unintended consequences increases with some product or exponent of these factors (see the numeric sketch at the end of this block).
  - where these factors can combine more than additively and more than multiplicatively, in various forms of feedback cascade, etc.
- that the likelihood of unintended consequences with a complex system (inclusive of self definition recursion) is strictly very much greater than for any simple system (which lacks these critical features).
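- where, as an illustrative bit of arithmetic only (the per-channel rate and channel counts below are placeholder assumptions, not estimates), showing how even a tiny chance of an unintended outcome per interaction channel compounds as the number of channels grows with each of the above factors:.

      # if each interaction channel misfires independently with a tiny
      # probability p, the chance that *none* ever does decays geometrically
      # as the number of channels n grows; and n itself grows with the
      # complexity of intentions, of the implementing agents, and of those
      # agents' own (divergent) intentions.

      def p_some_unintended_outcome(p_per_channel, n_channels):
          return 1.0 - (1.0 - p_per_channel) ** n_channels

      p = 1e-6                               # assumed per-channel rate
      for n in (10**3, 10**6, 10**9):        # factors compound multiplicatively
          print(n, p_some_unintended_outcome(p, n))
      # roughly 0.001 at a thousand channels, 0.63 at a million, ~1.0 at a billion

  - as that feedback cascades (where one unintended change opens new channels) only make this compounding worse than the independent-channel assumption used here.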
- that there is a risk of a 'Motte and Bailey defense' (MBD) being attempted in the space of the claim "we can make AI systems safe".
- that any argument with an underlying MBD format is a type of logical fallacy similar to equivocation.
  - as an argument of rhetoric, of human fallible persuasiveness, rather than one of actual truth.
- where the notion of 'adequate' (as applied to 'safety') for any proposed action 'X' is to show that the probability of all categories of risk/harm/loss of the value of what is to be kept "safe" is sufficiently low -- ie; _strictly_lower_ -- than the possible cost/risk/harm incurred by *not* taking that risk (ie; in the form of opportunity cost, etc);.
  - (a minimal numeric sketch of this comparison appears after the assessments block below).
- ?; is there (even in principle) any practical way to establish/ensure adequate levels of safety in future AI systems?.
  - ^; that there is no/zero possibility of anyone (of any human, especially) designing complex general (self intentioned) AI system(s) without there also being a significant excess potentiality of significant adverse unintended harmful consequences/risks (which may be irreversible, to humans, the biosphere, etc).
  - ie, as risks/costs/harms resulting from 'X' (ie; the use of AGI systems, the making of such choices to make/deploy AGI, etc) that are well in excess of any realistic and appropriately generalized assessment of the purported benefits/profits of 'X' (to whom, when, where, etc).

:k2q

  > - ?; what are the overall implications/assessments?.

- where/when it comes to superintelligent/AGI systems; any real notion of safety and/or alignment with (any) human (or life) interests is strictly, formally, finally, impossible.
- that this fact/result remains true no matter how one defines "safety", or what processes/methods one attempts to use.
- where/for no matter how exhaustive one thinks some review process is; that for superintelligent/AGI systems, there will be actual programs/systems which will fool/defeat that review process.
- that the Rice Theorem will always end up being relevant to any code/systems that have high x-risk factors.
- where given all of the math and empirical results already in place, from *multiple* distinct fields of study; that the indications are very much more strongly in the direction that research into formal verification of AGI safety *is* actually and completely pointless (ie; that success defined in this way is *always* impossible).
  - for further examples; see the proofs, arguments, and cases for uncontrollability (absence of safety/alignment, etc) collected by Yampolskiy et al.
- that the Safe AGI impossibility result (as established via whatever available methods will make the most sense to each private reader) does completely preclude any meaningful work on *general* AI.
- where considered on a neutral, objective, apples-to-apples comparative basis; that the indications in the negative direction are very much greater than any corresponding indications that there is any actual benefit at all, to anyone, to be had from *any* generalized AI development and deployment effort.
- while there can *maybe* be some solutions to practical narrow AI safety challenges; that this does *not* imply that there can be any possible foundations for real safety in *any* possible future superintelligent system.
- while it might be hard/difficult to predict the outcome of some narrow AI systems; that it will still be strictly impossible to predict *any* aspect of the output/effects/consequences of a superintelligence.
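- where, referring back to the notion of 'adequate' safety defined above; as a minimal numeric sketch only (hypothetical Python; all numbers are placeholders) of that comparison, and of why a terminal/unbounded downside can never satisfy it:.

      # 'adequate' safety for an action X, per the definition above: the
      # probability-weighted harm of doing X must be strictly lower than the
      # cost of forgoing X (the opportunity cost).  With a terminal,
      # effectively unbounded downside, no nonzero risk estimate and no
      # finite forgone benefit can ever satisfy the inequality.

      def adequately_safe(p_harm, harm_magnitude, opportunity_cost):
          return p_harm * harm_magnitude < opportunity_cost

      print(adequately_safe(1e-9, 1e6, 5.0))            # True: bounded downside
      print(adequately_safe(1e-9, float("inf"), 5.0))   # False: terminal downside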
- where with a proof of AGI eventual terminal risk; that the following ideas/notions are also established, for all time:.
  - 1; that there was *never*, even in principle, any way to make general AI safe, or aligned, etc.
    - as a basic impossibility.
  - 2; that there was never any actual utility/benefit/profit to/for/of the creation and deployment of general AI for anyone, at all, *ever*, over the long term.
  - ^; and that therefore we should also 'put away' the tools and techniques used to develop such systems (and all of the associated concepts, IP, etc).
    - as that they are inherently unmitigated risks:.
      - to all peoples on the planet, for all future time.
      - to all of the rest of life on the planet, and maybe even inclusive of the planet itself.
- that a failure to do so (and/or to develop, enact, and enforce effective and realistic policy and procedure for all institutions and communities internationally) is equivalent to a failure of government.
  - where the final function of all government is to protect the land and the people, and/or to try to ensure that the land and people survive, and then, if it is really good, thrive.

:k5u

  > - ?; what is the recommended "call to action"?.

- where/given that general AI safety/alignment is considered/proven to be impossible; that the responsible things to do are:.
  - 1; to *not* attempt to design/build/use AGI.
    - as "do not raise that which you cannot banish!".
  - 2; to acknowledge and support efforts to ensure that no one mistakenly believes that *ANY* AGI/APS, etc, would/could be "desirable/practical" and/or "economically advantageous/beneficial", to anyone at all, *over the long term*.
    - as that any and all short term gains are far outweighed by costs/harms over the long term.
    - ie; to *not* attempt to convince/delude/deceive other people/investors into thinking, falsely, that "safe AGI" is, "in principle", possible.
  - 3; to socially/legally sanction anyone who, in any way, attempts to build *any* sort of superintelligent/AGI/APS, even indirectly, by accident, etc, to the absolute maximum extent possible.
    - ie; that no one should allow anyone to play permanent planetary roulette with something which in all cases is equivalent to a "doomsday machine"; ie; it is maximally irresponsible.

:ka8

NOTE:

- 1; this example is in contrast to the hazard of 'getting wet'.
  - where in actual computers, such as a laptop sitting on my desk, the cpu die can get hot, from, for example, doing too many math calculations, if it is also the case that the fan and cooling system are not configured properly.
  - where at that point; that the CPU will generally quit (the computer will blue-screen fault).
  - that it is almost never the case, in actual practice, day to day, that computers get wet internally.
  - that moreover, the action of "too hot" is due to something internal to the "choices" of the computer itself, whereas getting wet is usually an exogenous event -- ie; the user spilling a drink on the keyboard -- unless we are talking about autonomous robots, which tend to be mobile near rivers and lakes, etc.

:kc4

ATTR:

- Where considering attributions/credit;.
  - 1; that these notes/comments, though fully independent of -- are deeply consistent with, and strengthening of -- the observations/proof described and given in (@ "Superintelligence Cannot be Contained" https://arxiv.org/pdf/1607.00913v1.pdf) by Alfonseca et al, on Jan 5, 2021.
  - 2; that this essay was strongly shaped and influenced by the (@ "Response" https://www.cser.ac.uk/news/response-superintelligence-contained) to 'superintelligence not containable' by Jaime Sevilla and John Burden, on 25 February 2021.
    - which, as a "rebuttal", presents arguments that fall short where/insofar as the meanings/distinctions of several key concepts are tacitly assumed, equivocated, and conflated:.
      - 1; the functional differences between narrow AI vs general AI.
      - 2; simple systems/programs vs complex ones.
      - 3; recursive architectures vs non-recursive architectures.
      - 4; the means, methods, and concepts of "proof".
      - 5; the specific meaning of the term 'specific' (as multiply used in inherently ambiguous ways).
      - 6; the scope and extent of risk of local limited problems (in time/space) vs global problems (everywhere forever).
    - where for a point by point response, and correction, see the (@ Rice Rebuttal Rebuttal https://mflb.com/ai_alignment_1/rice_rebuttal_rebuttal_out_6.html).

:menu

If you want/need to send us an email, with questions, comments, etc, on the above, and/or on related matters, use this address: ai@mflb.com

Back to the (@ Area Index https://mflb.com/ai_alignment_1/index.html).

LEGA:

Copyright (c) of the non-quoted text, 2022, by Forrest Landry. This document will not be copied or reproduced outside of the mflb.com presentation context, by any means, without the expressed permission of the author, directly in writing. No title to and ownership of this or these documents is hereby transferred. The author assumes no responsibility for, and is not liable for, any interpretation of this or these documents, or of any potential effects and consequences in the lives of the readers of these documents.

ENDF: