evolution and the computer
contents
- part 0: introduction (cognitive science and my ambivalent feelings towards linguistics)
- part 1: machines
- part 2: the universal machine
- part 3: the halting problem: one step ahead
- part 4: elements that break free
- part 5: evolution
- part ?: what could go wrong
part 0: introduction (cognitive science and my ambivalent feelings towards linguistics)
let's talk machines.
cognitive scientists love these things. since cognitive science studies the brain as if it were a black box (all we can perceive from the outside are our inputs and outputs, not our mechanisms), the field needed some abstract way to represent that black box: the machine.
and the power in using "machines" is that they have no spec. they are the abstract base class of all things, unaware of their implementation details. the brain as a machine doesn't have to be a neural network or an automaton. it could be anything.
and because it could be anything, studying machines at an abstract level technically means one studies the actual model of the brain. at least in a very dumbed-down way.
in my college linguistics classes, we studied one and only one model for language in the brain (chomsky's). this felt incredibly wrong and arbitrary, both in how we wrote everything and in its relation to the brain. the argument was "all systems we pick are arbitrary and provably equivalent, so let's pick one model".
it does make sense that it is pleasing to create and use such a system. creating structures is fun :3 (i'm talking about you, c++) and a model like chomsky's is useful at helping understand some aspects of the brain.
however, using one implementation of a machine to represent the brain feels like it loses touch with the "black box" that the brain is. we don't know if this is the right model. let's not assume it is.
the only linguistics class that did not strictly use chomskian grammar was computational linguistics. the class was actually mainly formal language theory. i also took another class, theory of computing, which was the computer science equivalent of that class (funnily enough, the linguistics class was programming based, while the cs one was theory based).
part 1: machines
if you already know about automata, you can skip to part 2
theory of computing and computational linguistics talked about the same thing: automata theory and language theory. in simple terms, an automaton is an abstract machine. it takes in input, processes it, and outputs something. it's like a mathematical function.
the only thing that makes this different than a mathematical function is a simple implementation detail: it must go symbol by symbol. while this constraint isn't necessary, it allows us to actually see how the machine works. otherwise, if the machine did not go symbol by symbol, it would just be as black a box as a function.
for the majority of automata i studied, the machines outputted "yes" or "no". however, they could easily be made to output any finite number of values, or, by observing "side effects"/the state of the machines (the tape on a turing machine/the stack[s] on a pushdown automaton), an infinite number of values.
these binary classifier machines are of special interest to linguistics, since that classification could be "acceptable"/"unacceptable" or "grammatical"/"ungrammatical". you can see the influence linguistics has on these machines in some terminology. we say "a machine generates a language" if the machine accepts (says "yes" to) every string in the "language" and no other string. thus, if your brain were one of these machines, the language would be your language: the set of all possible sentences, or the set of possible words, or syllables, or phoneme strings.
from a cognitive perspective, the language is the thing we want to study: it characterizes the input/output of the machine. the languages are very important as a concept, and usually languages are classified alongside the types of automata that can generate them, as you will see below.
there are a lot of terms, so here is a brief overview:
- a symbol is a member of an (arbitrary) alphabet
- an input string is a string/list of symbols
- a machine takes in an input string, processes it symbol by symbol, and either accepts or rejects it (see the sketch after this list)
- the set of strings that a machine accepts is called the language
- a machine generates a language, or the language is recognized by the machine
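to make these definitions concrete, here is a minimal python sketch of a machine as "something that eats one symbol at a time and then says yes or no". the step and is_accepting functions are placeholders i'm making up, standing in for whatever mechanism the machine uses internally:

def accepts(start_state, step, is_accepting, input_string):
    """run a machine symbol by symbol and report accept/reject."""
    state = start_state
    for symbol in input_string:
        state = step(state, symbol)  # process exactly one symbol
    return is_accepting(state)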
1.1 overview of language types
chomsky describes 4 classes of languages. however, 3 of these form a clean hierarchy when implemented as machines, as you will see in the section on state machines below. the remaining one (context-sensitive languages) makes more sense when implemented using grammars, which i mention but don't go into depth on.
the 3 language classes are:
- regular languages, which are recognized by finite-state automata (FSA) (and regular expressions)
- context-free languages, which are recognized by pushdown automata (PDA) (and context-free grammars/CFGs)
- recursively enumerable languages (or turing-recognizable), which are recognized by 2-stack pushdown automata (2SPDA) (and unrestricted grammars/turing machines/lambda calculus)
these languages are in part defined by the machines that generate them, but it's a good idea to understand what languages fall under what categories.
1.1.1 regular languages
if you are familiar with regular expressions, regular languages are what they match. however, a note about real regular languages is that they must be parsed left to right without look-behind or look-ahead and without any extra state (so no modern regex extensions, only (a|b)*c or anything implementable using these operators). so a language like aⁿbⁿ (ie. "", "ab", "aabb", "aaabbb", ...) could not be regular, since by the time i get through all of the as, i can't go back and check how many i went through. i cannot write a regular expression to match only these strings.
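to see what's missing, here is a sketch in python of a checker for aⁿbⁿ. note the counter: it can grow without bound, and that unbounded extra state is exactly what a finite-state machine doesn't have:

def is_anbn(s: str) -> bool:
    """accept exactly the strings aⁿbⁿ (n >= 0)."""
    count = 0  # unbounded counter: the "extra state" a regular language can't use
    i = 0
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # accept only if the whole string was consumed and the counts cancel
    return i == len(s) and count == 0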
a test we can do to check whether a language is regular is to look at how arbitrarily long strings behave.
regular expressions have the * operator, which means "repeat some substring any number of times". obviously, this can generate arbitrarily long strings.
and this somewhat categorizes regular languages: long strings must have a substring repeated within them. this is called the pumping lemma.
(if every string in a language can be pumped, the language is not necessarily regular. it only works the other way)
1.1.2 context free languages
context-free languages are similar to (aka are) syntax/parse trees. below, we will explain more, as it makes more sense in terms of PDAs. they can recognize aⁿbⁿ, but they cannot recognize aⁿbⁿcⁿ. a common example language is the language of well-formed parentheses: "(()(()()))" but not "(()(()".
context-free languages also have a pumping lemma. however, the repetition can happen in two locations. you can see an example in english with repeated relative clauses:
the girl that the kid that the dog liked hated swam
this has 2 sets of "dependent" repetitions:
the girl (that the kid (that the dog liked) hated) swam
NP (C NP (C NP VP) VP) VP
i cannot repeat "that the dog liked" on its own, so the language fails the regular-language pumping lemma, but i can repeat it within the phrase "that the dog (that the dog (that the dog ...) liked) liked". this is the pumping lemma for context-free languages.
1.1.3 recursively enumerable languages
recursively enumerable languages have far fewer restrictions. they can recognize aⁿbⁿcⁿ, and even aⁿbⁿcⁿdⁿ, and so forth.
1.1.4 relationships between languages
the languages are ordered in a hierarchy. as you could probably guess, recursively enumerable languages, with how few restrictions they have, contain all regular and all context-free languages. in addition, context-free languages contain all regular languages.
1.2 state machines
though it's not necessary to understand the rest, i will briefly go over state machines as an example for how these machines are implemented.
1.2.1 finite-state machines
at its core, a finite-state machine is a simple machine which has a state from a finite set and a rule for which state to go to next based on the input. the machine accepts if it ends on an accept state, and rejects if it ends on a reject state (or, alternatively, if no rule can be found at any given step).
this is well represented by a graph. each state is a node, and each rule is an edge. thus, to run a finite state machine, you would start at the start node and follow the edges based on the input.
all regular languages can be written using these basic state machines. for example, let's take the language of all binary strings with an even number of "0"s. the machine could look like this:
1 1
┏━┓ ┏━┓
┃ ↓ ┃ ↓
┏━━━━━━━━┓ 0 ┏━━━━━━━━┓
┃ ┃ ---> ┃ ┃
---> ┃ even ┃ ┃ odd ┃
┃(accept)┃ <--- ┃ ┃
┗━━━━━━━━┛ 0 ┗━━━━━━━━┛
the machine would start on the even state (as indicated by the left arrow).
then, if it reads a 0, it will go to the odd state, and if it reads a 1, it will stay at the even state.
at the odd state, if it reads a 0, it will move to the even state, and if it reads a 1, it will stay on the odd state.
it will only accept if, after reading the entire input, it ends on the "even state"
i hope this seems relatively straightforward. you can also see how this translates into regex pretty easily: 1*(01*01*)*.
note that the *s directly translate to loops in the graph.
note that the machine has a finite amount of "state". there is no concept of "what state the machine was on previously" or "what symbol came before", since that would be extra state the machine is not capable of storing. this is why regular languages are restricted as they are: they can only store a finite amount of data at a given point in time.
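here is that even-zeros machine as a small python rule table, just to show how little machinery is involved (the dict encoding is my own choice, not a standard one):

import re

# states are "even" and "odd"; rules map (state, symbol) -> next state
RULES = {
    ("even", "0"): "odd",  ("even", "1"): "even",
    ("odd", "0"): "even",  ("odd", "1"): "odd",
}

def run_fsm(input_string: str) -> bool:
    state = "even"  # the start state
    for symbol in input_string:
        state = RULES[(state, symbol)]  # follow the edge for this symbol
    return state == "even"  # accept only if we end on the accept state

# the machine and the regex agree
for s in ["", "0", "00", "011", "1010"]:
    assert run_fsm(s) == bool(re.fullmatch(r"1*(01*01*)*", s))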
1.2.2 pushdown automata
a pushdown automaton relaxes the finite-state constraint. it gives the state machine an infinite stack of arbitrary finite symbols. each rule can depend on the value on the top of this stack and pop it/push a new value onto the stack. in addition, a pda can usually push a stack symbol without reading from the input; this is represented by the symbol ε.
here is a diagram showing the language aⁿbⁿ. note that the stack here "counts" the number of as.
 a, ε -> A              b, A -> ε
      ┏━┓                   ┏━┓
      ┃ ↓                   ┃ ↓
     ┏━━━━━━━━┓            ┏━━━━━━━━┓
     ┃        ┃ b, A -> ε  ┃        ┃
---> ┃   as   ┃ ---------> ┃   bs   ┃
     ┃(accept)┃            ┃(accept)┃
     ┗━━━━━━━━┛            ┗━━━━━━━━┛
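the same machine, sketched in python with an explicit stack (the encoding is mine, and this toy skips the nondeterminism a general pda definition allows):

def run_pda(input_string: str) -> bool:
    """the aⁿbⁿ machine above: push an A per "a", pop an A per "b"."""
    state, stack = "as", []
    for symbol in input_string:
        if state == "as" and symbol == "a":
            stack.append("A")   # a, ε -> A : count the a
        elif symbol == "b" and stack:
            stack.pop()         # b, A -> ε : cancel one a against one b
            state = "bs"
        else:
            return False        # no rule applies: reject
    return not stack            # accept only if every a was matched

assert run_pda("") and run_pda("ab") and run_pda("aaabbb")
assert not run_pda("aab") and not run_pda("ba") and not run_pda("abab")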
1.2.3 2-stack pushdown automata
a two-stack pushdown automaton is the same as a pushdown automaton, except it has two stacks.
1.3 other implementations
there are two other implementations of machines which are of note.
1.3.1 phrase structure rules
i mentioned "context-free grammars" above. these are an example of phrase structure grammars, where there is no "machine" per se, but a description of a language in terms of "constituents". it is very similar to syntactic trees and is used a lot in parsers (Backus-Naur form).
they work by starting from a root node (say, a sentence) and substituting it into constituent nodes (say, a subject and a verb phrase) recursively until we get the input. the grammar accepts the input if this is possible, and rejects it otherwise.
an example CFG for the language aⁿbⁿ would be:
S -> "a" S "b" | ""the steps this machine takes is called a derivation. for grammars, there is no one correct way to do a derivation, and derivations can be classified into different types for practical reasons (eg. LR, LL, LR(N)). an example derivation for the sentence "the dog who eats rice is here" would be:
Stack        Rule          Input
S                          the dog who eats the rice is here
NP VP        S->NP VP      the dog who eats the rice is here
D N RC VP    NP->D N RC    the dog who eats the rice is here
N RC VP      D->"the"      dog who eats the rice is here
RC VP        N->"dog"      who eats the rice is here
C VP VP      RC->C VP      who eats the rice is here
VP VP        C->"who"      eats the rice is here
V NP VP      VP->V NP      eats the rice is here
NP VP        V->"eats"     the rice is here
D N VP       NP->D N       the rice is here
N VP         D->"the"      rice is here
VP           N->"rice"     is here
V ADV        VP->V ADV     is here
ADV          V->"is"       here
             ADV->"here"
phrase structure grammars can easily represent context-free languages, context-sensitive languages, and recursively enumerable languages with context-free, context-sensitive, and unrestricted grammars, respectively. each of these has different restrictions on the rules.
context-free grammars can only have one nonterminal (ie. stack) symbol on the left side of each rule.
you can see this above: there is no rule like V ADV -> ADV V
context-sensitive grammars have rules of the form A B C -> A D C.
unrestricted grammars have no restrictions on rules.
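as a sketch, here is the aⁿbⁿ grammar from above turned into a tiny recursive-descent recognizer in python; each call tries the rule S -> "a" S "b" first and falls back to S -> "":

def parse_S(s: str, i: int) -> int:
    """try to derive S starting at position i; return how far we got."""
    # rule 1: S -> "a" S "b"
    if i < len(s) and s[i] == "a":
        j = parse_S(s, i + 1)
        if j < len(s) and s[j] == "b":
            return j + 1
    # rule 2: S -> "" (derive the empty string, consuming nothing)
    return i

def accepts(s: str) -> bool:
    # the grammar generates s iff some derivation consumes the whole string
    return parse_S(s, 0) == len(s)

assert accepts("") and accepts("ab") and accepts("aaabbb")
assert not accepts("aab") and not accepts("ba")

(this greedy fallback happens to work for this particular grammar; a general CFG parser needs real backtracking or a chart.)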
1.3.2 turing machines
usually, when you hear about machines, you hear about turing machines. this is probably because the class of machines equivalent to recursively enumerable languages is called "turing-complete".
a turing machine is similar to a PDA, except for two things. first, it has a "doubly linked list" (tape) instead of a stack, and in every step of the computation it can write to the current node and move left or right. second, the input string starts on this tape, so the machine doesn't need to "read" the input; it just needs to read the tape.
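a minimal turing machine simulator in python (the rule encoding and the "_" blank symbol are my own choices; a real definition fixes these details formally). note it caps the number of steps, for reasons part 3 will make painfully clear:

from collections import defaultdict

def run_tm(rules, input_string, state="start", max_steps=10_000):
    """rules: (state, symbol read) -> (next state, symbol to write, L/R)."""
    tape = defaultdict(lambda: "_", enumerate(input_string))  # blanks elsewhere
    head = 0
    for _ in range(max_steps):
        if state == "accept":
            return True
        key = (state, tape[head])
        if key not in rules:
            return False  # no rule applies: reject
        state, tape[head], move = rules[key]
        head += 1 if move == "R" else -1
    return False  # gave up; we can't decide halting in general

# example: the even-zeros machine from 1.2.1, now as a turing machine
rules = {
    ("start", "0"): ("odd", "0", "R"),
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("accept", "_", "R"),
    ("odd", "0"): ("start", "0", "R"),
    ("odd", "1"): ("odd", "1", "R"),
}
assert run_tm(rules, "1010") and not run_tm(rules, "0")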
1.3.3 lambda calculus
lambda calculus is another turing-complete model. if we compare a turing machine to an imperative programming language, lambda calculus might be compared to a functional one. that's because functional programming languages are heavily inspired by it lol (i think).
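as a taste, here are church numerals, the classic lambda-calculus encoding of numbers, written with python lambdas (everything is a function; the number n is "apply f n times"):

zero = lambda f: lambda x: x                            # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x))         # one more application
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    return n(lambda k: k + 1)(0)  # decode by counting applications

two = succ(succ(zero))
assert to_int(add(two)(two)) == 4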
part 2: the universal machine
"recursively enumerable" is the name for the language recognized by "turing-equivalent" machines, which includes 2-stack pushdown automata, turing machines, lambda calculus, unrestricted grammars, etc. a definition for recursively enumerable is shockingly simple:
a language is recursively enumerable if there is an algorithm that recognizes that language
in other words, if a language can be computed by an algorithm, it is recursively enumerable.
in other words, all turing equivalent machines can perform any algorithm.
that's right. that little automaton with 2 stacks? it can perform any algorithm. the 1-stack PDA can't even tell you if a string is in the form aⁿbⁿcⁿ, but if you add one extra stack, it can compute neural networks, solve any optimization problem, play chess, anything you can write on paper.
it may be hard to convince yourself of this, since these machines seem so simple. however, note that so is a computer.
a computer is just circuitry following kirchhoff's circuit laws.
the computer stores some symbols in memory, and every clock cycle, it deterministically moves from one state to the next in the same way as a state machine.
but why this computational power spike? what makes a 2-stack pda so powerful?
both finite-state machines and pdas have tight restrictions, yet a 2-stack pda can do anything.
i rationalize it like this: a finite-state automaton is weak because it can only store a finite amount of state. it can just store the current node as state, and nothing more.
a pushdown automaton lets the machine store infinite state. it can push all the symbols at once to the stack. however, it only has access to one symbol at a time. if it wants to access a deeper symbol, it comes at the cost of forgetting the top of the stack.
however, adding an additional stack allows access to infinite working memory. if a machine needs to check the second symbol on the stack, it can temporarily store the top symbol on the second stack for later use. it can write an infinite number of symbols, and, with care, it can access every symbol in any order.
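here is the trick in python: two stacks give you a movable head. everything left of the head lives on one stack, everything at and right of it on the other, and moving the head just shuffles one symbol across (a minimal sketch, my own encoding):

class TwoStackTape:
    """simulate a turing machine tape using two stacks."""
    def __init__(self, input_string):
        self.left = []                              # symbols left of the head
        self.right = list(reversed(input_string))   # head reads right[-1]

    def read(self):
        return self.right[-1] if self.right else "_"

    def write(self, symbol):
        if self.right:
            self.right[-1] = symbol
        else:
            self.right.append(symbol)

    def move_right(self):  # the symbol under the head crosses to the left stack
        self.left.append(self.right.pop() if self.right else "_")

    def move_left(self):   # and back again
        self.right.append(self.left.pop() if self.left else "_")

tape = TwoStackTape("ab")
assert tape.read() == "a"
tape.move_right()
assert tape.read() == "b"
tape.move_left()
assert tape.read() == "a"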
it is this "infinite working memory" which makes these machines so powerful. and any type of infinite working memory will suffice. a thought experiment is that a finite-state automaton equipped with a single (unbounded) integer as state is as powerful as any turing-equivalent model. if you think about it, that is your computer. your computer's ram is just one enormous integer.
we can try to make a machine more powerful than a turing-equivalent machine. but this supermachine would not be able to do a single thing that a 2-stack pushdown automaton couldn't. it might do it faster, but it cannot do anything new.
simulating a 2 stack pda, a turing machine, lambda calculus, etc. is an algorithm. we just have to pick a way to represent the automaton as an input string.
and so, we can create a 2-stack pda that takes a 2-stack pda as input and runs it.
and if we do this...
we never need to create another 2-stack pushdown automata.
in this abstract model of machines, we have created a universal machine that simulates the system it was placed in
we took full advantage of this when making computers. instead of recreating circuits from scratch for each individual algorithm we needed to accomplish, we created a circuit that can emulate other circuits and passed the circuits we wanted to emulate as input. and that is what you are reading this on right now. a computer is that machine. you can simulate finite state machines (regular expressions), context free machines (syntax parsers), and other turing machines (executables) on it. it is universal.
a universal machine can simulate any turing-equivalent machine. and since the universal machine is built from a turing-equivalent model...
the universal machine can run itself
i can encode this machine the same way i encode any other machine. and then i can run it on itself. i can run windows on a vm running on windows. i can write a c compiler in c.
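self-reference isn't exotic. here is a python program that prints its own source code (a quine), the same flavor of self-application the next section weaponizes:

s = 's = %r\nprint(s %% s)'
print(s % s)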
part 3: the halting problem: one step ahead
if a machine can run itself, a machine can change its behavior based on its behavior.
this sounds paradoxical and that is because it is. this reality introduces a bunch of paradoxes. the most well known is the halting problem.
the halting problem (rather, the halt-on-empty problem) can be stated as a simple question:
can i create a machine that decides if another machine will halt (eventually terminate) when given no input?
(note that decide means it must terminate in finite time)
as a human reading code, you can usually determine if a program will halt.
for example, this program will halt:
#include <stdio.h>

int main(){
    printf("woof\n");
    return 0;
}

but this program will not:
#include <stdbool.h>
#include <stdio.h>

int main(){
    while(true){
        printf("meow :3\n");
    }
    return 0;
}

and there are some programs for which it seems very hard to prove whether they halt.
for example:
from fractions import Fraction

n = 2
while True:
    # exact rational arithmetic, so checking integrality actually works
    if sum(Fraction(1, i) for i in range(1, n + 1)) % 1 == 0:
        break
    n += 1

this will only halt if there exists a number n > 1 such that the harmonic sum 1 + 1/2 + 1/3 + ... + 1/n is an integer. now you would need to be joseph louis françois bertrand or pafnuty chebyshev to solve this one, and the answer is that there is no such number. in fact, many (if not most/all) famous math problems could be solved if we could simply determine if a machine halts.
so let's see what happens.
assume we could construct halts, the machine that determines if another machine will halt.
now let me create a new malicious machine, malicious. all malicious does is run halts on itself. other than the fine details (how does this machine embed itself?), this is relatively valid and not much more strange than a universal machine.
however, what if we define malicious to halt if halts says it shouldn't, and to loop indefinitely otherwise?
def halts(function):
    ...  # assume this decides, in finite time, whether function() halts

def malicious():
    if halts(malicious):
        while True:     # halts said we halt? loop forever.
            continue
    else:
        return          # halts said we loop? halt immediately.

because of this malicious turing machine, no halts can exist and work for all machines.
this problem can be shown to be a much deeper problem. a turing machine cannot determine anything about the behavior of another machine. it cannot determine if a machine accepts, rejects, or halts on any input string. it cannot determine anything about the language the machine generates. it can't determine if the turing machine's language is regular, or if it contains any strings at all. the same "malicious" machine can be constructed for each case (this generalization is known as rice's theorem).
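for example, here is the same trick aimed at a hypothetical accepts_empty analyzer (a machine that decides whether another machine accepts the empty string) instead of halts:

def accepts_empty(machine):
    ...  # hypothetical: decides if machine("") returns True

def malicious(input_string):
    # ask the analyzer about ourselves, then do the opposite
    return not accepts_empty(malicious)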
because a turing machine can run a turing machine,
because a turing machine can reference itself,
it cannot ask questions about any other turing machine,
because the other turing machine already knows the answer.
the other turing machine can always stay a step ahead.
however, obviously, this is an extreme case. this pushes the limits of what a machine can do. of course your compiler can do static analysis to determine whether 99.9% of computer programs will halt. you can test a program against the inputs you expect, under the assumption that the program behaves nicely. most machines aren't bad actors when it comes to their analysis.
part 4: elements that break free
what makes turing machines so powerful is that they can understand the system they are in. they can be aware of what machines are and can understand the rules for how to process them.
this is what makes them different from other types of automata. finite-state automata can't simulate other finite-state automata, and a pushdown automaton can't simulate an arbitrary pushdown automaton.
turing machines are unique in that one turing machine has the entire power of all machines. one turing machine can know how turing machines work. one turing machine skips the need for the system to begin with.
it is like turing machines are breaking the bounds of the system they are in. they exist within the system of machines, but they have evolved to a point where they can replace the entire system. and, in our age, they have. in a way. we don't implement regular expressions using finite-state automata, but using a turing machine simulating a finite-state automaton.
since turing machines have overtaken the system they are placed in, we have zoomed in: we don't care about types of machines, we care about subsets of turing machines. or implementations on top of turing machines. we care now not about what is possible, but how to do what is possible faster and easier.
we can abstract this "breaking free" to other systems too. the special properties of turing machines we have seen so far are:
- recursion: an element of the system emulating another member
- universality: there exists an element of the system which replaces the need for the system
- paradox: an element of the system cannot know about the other elements
let's look at some other systems.
part 5: evolution
evolution is a system where each element has features, and those features determine fitness, and that fitness determines survival, which thereby adjusts proportions of features.
what would these 3 special properties look like?
5.1: recursion in evolution
5.2: universality in evolution
adaptation is an evolved trait. it fundamentally plays the role of skipping evolution. instead of an animal dying and that death causing their features to change according to the laws of evolution, it skips the dying step and goes straight to changing features. adaptation is when an element of evolution understands evolution in order to bypass evolution.
the human brain is also an evolved trait that not only can emulate fitness and evolution, but can also (in theory) emulate other entities' behaviors and fitnesses. of course, we are far from perfect and far from ever being perfect. we can only approximate their behaviors using science, and we cannot modify our own features.
however, one can conceptualize a universal evolved entity:
this entity would be able to, at any given instant, calculate its own fitness and change any of its features in order to maximize fitness. it could fly at any given moment, it could swim at any given moment, etc. this entity would no longer need evolution to refine its features, because it itself refines its features.
i propose that this is a more optimal entity than the strongest or fastest entity: it is not about features, but the ability to change features. this may be the reason that humans have done so well, since we are one of the most capable species at adaptation and understanding other entities.
5.3: paradox in evolution
the paradoxes come in when we consider that this perfect entity cannot determine what another one would do. it cannot adjust its fitness in light of a universal-entity threat, since whatever it thinks, the other entity is thinking too.