A year ago today my grandpa died.
I said it. My body goes electric as my throat constricts and I will back the growing pressure of tears welling behind my eyes.
I needed you; need you still.
It’s been the hardest year and the one person in the world that I want to share it with is gone. How do I know if he’s proud? How do I know I’m doing right? No one is there to ask “What do you think, grandpa?”.
Too much.
My cheeks are wet and I realize that I’ve written pure nonsense.
Sorry.
No, I’m not sorry. I’m finally ready to show the world my words. The words for him; the words I said in front of family and strangers when he wasn’t around to hear them.
Vince was so many things to so many different people: a father, a husband, a grandfather, an army captain, and the holder of various official and unofficial titles where he lived and served around his community. To me, he was more than just a grandfather; he was my friend, my buddy, and my role model.
Today, I want to share with you some of the things that I’m still learning from my grandfather, my friend, my role model:
Humility.
He was often the smartest guy in the room, but he never made anyone feel like they were stupid. Instead, with his logician’s mind and his trapdoor memory for random trivia, he was the type of guy who you could ask anything and he probably would know it; if he didn’t, he didn’t have trouble saying “I don’t know”. I think this is what made “Vince and Gloria” such a great team. He knew when the “glo” in “vinglo” was better at something than he was, and he was smart enough to let her handle it.
Responsibility.
Speaking of the “glo” in “vinglo”: one of my most vivid childhood memories, and life lessons, came from grandpa when I was about 7 or 8. I was “being a kid” during an Uno game at one of those famous “vinglo” holiday parties. I don’t remember exactly how it was said, but it was directed at Gloria, and it included the words “old”, “good looking”, and a reference to a 1930s slang word for a woman.
He took me aside after the guests had left and told me, in that calm but serious voice (which let you know that he was disappointed in you): “you really hurt grandma’s feelings, and you need to go apologize to her”. With the weight of my grandfather’s disappointment on my shoulders and a knot in my stomach, I apologized. And in that apology, I learned a sense of responsibility for the consequences of my actions, especially when my actions weren’t meant to be hurtful. This lesson is something that I’ve thought about throughout my life. Something I reflect on. Something I carry with me close to my heart. As I hear stories from friends, family, and neighbors, I realize how important taking responsibility was to him. He felt a duty to do what was right and he felt that others should as well.
Being a man.
Grandpa was big and strong. A hug from him always made me feel like that little boy that would sit on his lap, so safe and happy, and listen to him sing, in that deep voice:
“There was an old lady who swallowed a fly”
After one of those hugs, you’d be left with the smell of his aftershave on you. I suppose that was the type of man he was; he left a lasting imprint on those he touched.
He wasn’t a perfect man. I don’t know if it was on account of growing up in a different time or spending 20 years in the military, but he was pretty macho. To his detriment, at times. During his hospital stay before his surgery, he wouldn’t complain about his pain or discomfort until it was obvious to one of us and we would have to get a nurse or doctor.
His stoicism was out of a sense of duty to protect grandma. Unfortunately, it was having the opposite effect; it was making things much harder for her. Luckily, I came prepared for this exact thing, having learned a great lesson from a great man, oh, around the age of 7 or 8. To say that I was nervous to talk to my grandfather, my role model, about something he was doing wrong would be an understatement. I sat with him alone by his bedside, and in the way that he talked to me so many years ago, I explained to him that he had to do better. He looked at me, and nodded, and said “I understand”.
You see, he taught me that being a man is about being strong, about having pride, and about protecting those that you love. But it’s also about knowing when you are wrong and taking responsibility. He taught me that what really makes a man is the desire and commitment to always be a better one than you are. No matter how macho or prideful he was, he was still the type of man who, in his hospital bed, would hold his 31-year-old grandson’s hand while they watched a 49ers game or rooted for the Giants in the World Series. He was the type of man that would sing to his grandchildren,
“I don’t know why she swallowed that fly”.
He was the best man that I have ever known. I will be learning these lessons he taught me for the rest of my life, in hopes that I can follow in his footsteps and be as good of a man as he was.
Goodbye grandpa.
This is the crucial step: abstracting the concrete semantics away so that we can do control-flow analysis (CFA). Fortunately, in small-step abstract interpretation, the concrete and abstract semantics are very similar.
In this article, I introduce the concept of and need for control-flow analysis and show how to convert the concrete semantics from the definition of my Concrete CESK machine into abstract semantics, in preparation for creating an Abstract CESK machine.
At the heart of the analysis of higher-order languages (like Dalvik) is a recursive relationship between control flow and value flow: any attempt to tackle one necessarily means you must tackle the other. The class of algorithms that solves this is control-flow analyses, which specialize in solving the value-flow problem.
Small-step abstract interpretation is one way to write a CFA.
The problem is simply stated: the precise target of a function call may not be obvious. In object-oriented languages like Java, Ruby, and Python, this problem presents itself in the form of dynamically dispatched methods.
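The code listing here was lost in extraction. A minimal Python sketch of the situation (the class and method names are my own, purely illustrative):

```python
class Dog:
    def speak(self):
        return "woof"

class Cat:
    def speak(self):
        return "meow"

def pick(flag):
    # Control flow here determines which value flows to `obj` below.
    return Dog() if flag else Cat()

obj = pick(True)
# The target of this .method()-style call depends entirely on the
# value that flowed into `obj`: Dog.speak or Cat.speak.
result = obj.speak()
```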
In the above code, the target of the method call .method() depends on the value that flows to the variable object. In other words, the data flow affecting the value of object creates a dependency with the control flow of the method call. Furthermore, if object were a parameter to a method, the data flow of the parameters of the method would be determined by the control flow of the function.
A second example: the function being invoked may itself flow into the call site.
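The original Java-like pseudocode listing is gone; a Python sketch of the same idea, with illustrative names:

```python
def shout(x):
    return x.upper()

def whisper(x):
    return x.lower()

def apply_func(func, x):
    # The target of this call is unknown statically; it depends on
    # which function value flowed into the parameter `func`.
    return func(x)

a = apply_func(shout, "Hi")    # control flow gives func = shout
b = apply_func(whisper, "Hi")  # control flow gives func = whisper
```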
In the above example, the target of func depends on where the method flows. Further, it is not clear which method func may refer to. So in control-flow analysis, where the expression could be invoked must be considered along with what types of arguments it might receive.
The consideration of which values might be produced by the expressions func and x is the value-flow problem. This is generally undecidable. Instead we must think of all possible values that a procedure could evaluate to, without knowing exactly which values those might be.
This means that there is nondeterminism in the evaluation of procedures, which consequently leads to the inability of concrete semantics to correctly model the flow of a program without having an infinite amount of memory at its disposal.
Once we add nondeterminism into the mix, we need to redefine the CESK machine from Concrete semantics to Abstract semantics.
There are two main changes that occur in the transition from concrete to abstract semantics.
In the Concrete CESK machine that I built, it would be trivial to construct a recursive function that allocated a new address on every call. This would eventually fill up the available memory of the machine running it, unless, of course, you have a computer with an infinite amount of memory. Since it is safe to say that an infinite-memory machine doesn’t exist, we must abstract away the idea of an address.
It is both mathematically and computationally possible to define an abstract machine with a single address. This is due to the nature of abstract interpretation: since an abstract address points to a set of possible values, through nondeterminism you can cover all possible (and impossible) execution paths with a single address. I will take a more practical approach, however.
The solution to how to represent a finite number of addresses comes from a simple observation: A statement is a finite thing and in any Dalvik program, there are a finite number of statements. So an abstract address could be the statement itself.
With that solution, the following definition of the conversion from an infinite store and address space to a finite store and address space follows:
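The definition itself was lost in extraction. Following the standard abstracting-abstract-machines recipe (so this is a reconstruction, not necessarily the author’s exact notation), it would look like:

```latex
\begin{align*}
\sigma &\in Store = Addr \rightharpoonup Val
  && \text{concrete: infinitely many addresses, one value each} \\
\hat{\sigma} &\in \widehat{Store} = \widehat{Addr} \to \mathcal{P}(\widehat{Val})
  && \text{abstract: } \widehat{Addr} \text{ finite, each address holds a set of values}
\end{align*}
```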
In a way, one can think of the transition from the infinite state-space to the finite state-space as a simple “throw hats on everything” approach. That thought would be wrong, however.
The problem with that approach is that since addresses contain frame pointers and frame pointers contain addresses, your analysis could run into an infinite frame pointer whose set of values contains itself.
k-CFA shows us how to tame recursion by removing it from the state-space through a process called “snipping the knots”, thereby removing these cyclical dependencies from the dependency graph. I’ll be following the outline in Abstract Interpreters for Free.
The first part of making a snip is handled already by the Concrete CESK machine: add a store. The store is still defined in terms of an infinite number of addresses at this point:
We need to take this store and then thread it through the transition relation, so that each dependency-graph edge from one state to the next carries the store along with it:
Thereby making the store an integral component of each state. This is known as a store-passing style transform. Fortunately, I already did this in the Concrete CESK machine.
We need two components to calculate an “optimal” smallstep abstract transition relation:
Galois connections are seen mostly in order theory; a Galois connection is a particular correspondence between partially ordered sets. It would be well beyond the scope of this article to delve too deeply into them. However, it is important to explain the role the correspondence plays in static program analysis.
In my article about static analysis, I touch a little on lattices and partial ordering where I introduce the concept of a supremum and infimum in terms of a lattice and their role in showing partial ordering of variable types.
I would like to take this a bit further and define a simple Galois connection resulting from the mapping functions $\alpha$ and $\gamma$ from the abstract transition relation above.
Take the following code:
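The listing is gone; judging from the next sentence it divided by a and b, so here is a stand-in Python example (names assumed), along with a toy abstraction function of the kind the text goes on to describe:

```python
def compute(a, b):
    # If either a or b is 0, we have caused a problem:
    # a ZeroDivisionError at runtime.
    return (10 / a) + (10 / b)

def alpha(n):
    # A toy abstraction function over the integers: to flag the
    # problem, the analysis only needs "zero" vs "nonzero".
    return "zero" if n == 0 else "nonzero"
```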
We can clearly see that if a or b is 0, we have caused a problem. We can define a single abstraction function $\alpha$ and a single concretization function $\gamma$ as follows:
You can clearly see that the functions map from one semantic domain to the other. The goal is to ensure we have those $\alpha$ and $\gamma$ mappings for the entire transition function:
This happens next.
After snipping and store-passing style transforms, what is left are sets of addresses that I will refer to as infinite leaf nodes. One of the two goals in the transition from concrete semantics to abstract semantics is to make addresses finite.
It is up to the analysis designer to choose a finite set for every leaf node, which will become the leaves of the abstract state-space. It is equally important that an extraction function be created that maps a concrete element to an abstract element. With a fixed extraction function, we can use inference rules to automate the construction of the abstract state-space with structural Galois connections.
Given this surjective extraction map, the structure yields the Galois connection:
Furthermore, the extraction function fixes the polyvariance and the context-sensitivity of the analysis.
With the use of inference rules, we can build structural Galois connections and recursively apply them all the way up through the top level of sets of states. The inference rules are beyond the scope of this article, but can be found in Principles of Program Analysis.
After this recursive, automatic synthesis of the abstract state-space, we have now built up Galois connections all the way to the top. Combining these connections, the extraction function on addresses, and the snipping described above, we can finally define the top-level state-space and the Galois connection.
Galois connection:
Top-level state-space:
The differences between concrete and abstract semantics look small on the surface. For the most part, it really is as simple as “throw a hat on everything”. However, there are fundamental differences between the two domains that require care.
In my next article, I will be applying these mathematical transformations to the implementation of an Abstract CESK machine.
At the heart of all Android applications is Dalvik byte code. It’s what everything gets compiled to and then run on the Dalvik VM. In order to do static program analysis for Android, you need to somehow interpret the byte code. That’s where the CESK machine shines.
The CESK machine, developed by Matthias Felleisen, provides a simple and powerful architecture used to model the semantics of functional, object-oriented and imperative languages, and features like mutation, recursion, exceptions, continuations, garbage collection and multithreading. It is a state machine that takes its name from the four components of each state: Control, Environment, Store, and Kontinuation.
In this article, I implement a concrete CESK machine to interpret a dynamically typed object-oriented language abstracted from Dalvik byte code. Every byte code and its semantics have been transformed into this language.
Being a state machine, CESK has a notion of jumping or stepping from one state to another. In terms of sets, we can think of a program as a set of these machine states ($\sigma$) that exist within the set of all machine states ($\Sigma$), with a partial transition function (step) from state to state.
Defining the state-space as $$\sigma\in\Sigma = Stmt^{*}\times FP\times Store\times Kont$$ allows us to encode a state as a simple struct:
(struct state {stmts fp stor kont})
If you are familiar with pushdown automata, then a CESK machine has many striking similarities. You can think of each state as the CES portion, and the Kont component would be what is pushed/popped from the stack between state transitions. (For a project in my computational theory course, some classmates and I implemented a Non-Deterministic Pushdown Automaton in Python that outputs to a command line and DOT format; feel free to play around with it: PyDA)
The Control component of a CESK machine is a control string. In the lambda calculus, the control string would be an expression. For Dalvik, we should think of this component as a sequence of statements. This sequence of statements gives an indication of which part of the program this state is in.
The Environment component of a CESK machine is a data structure that maps variables to addresses. In the Dalvik CESK machine, we use simple frame pointers as the addresses.
Registers are offsets from frame pointers and map to local variables, so we will need to compute the location (frame offset) by pairing the frame pointer with the name of a Dalvik register.
Objects and their fields are structurally equivalent to frame addresses.
So the set of all addresses include both Object and Frame addresses:
The Store component of a CESK machine is a data structure that maps addresses to values. In the Dalvik case, we map frame pointers to values.
The Kontinuation component of a CESK machine is essentially a program stack. Within Dalvik, you find exception handlers and procedure calls, and, as with all continuation-based machines, the halt continuation to signify program termination.
Each continuation is placed on a stack, where the topmost matching continuation is found and executed: for exceptions this is the matching handler and for assignment it is the next assignment continuation.
Halt is handled as a termination continuation without context encoded into the component.
In the case of exceptions, Dalvik defines a type of exception (class name), the branch label where execution should go and the next continuation. $$ handle(className, label, \kappa) $$
Any other type of invocation affecting the program stack is an assignment continuation, where the return context for a procedure call is encoded with the register waiting for the result (name), the statement after the call, the frame pointer from before the call and the next continuation. $$ assign(name, \vec{s}, fp, \kappa)$$
Since the CESK machine is a state machine, we have a single partial transition function (from state to state) called step that is run until it is told to terminate (when we encounter the halt continuation). We only need to have an initial state and then iterate until we hit halt. So, we need four things:
inject: creates the initial state, with an empty environment and store
step: the transition function for each type of state transition
lookup: a way to look up frame pointers in the store
run: runs the CESK state machine
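The original listing implemented these four functions in Racket; it has been lost, so as a hedged illustration, here is the same skeleton in Python for a toy statement language (only constant assignment and return are modeled, which is far less than the real machine):

```python
# A minimal CESK-style interpreter sketch. States are
# (stmts, fp, store, kont) tuples, mirroring the struct above.
import itertools

fresh_fp = itertools.count()  # a source of fresh frame pointers

def inject(stmts):
    """Create the initial state: the program, a fresh frame pointer,
    an empty store, and the halt continuation."""
    return (tuple(stmts), next(fresh_fp), {}, ("halt",))

def lookup(store, fp, name):
    """Registers are addressed by (frame pointer, name) pairs."""
    return store[(fp, name)]

def step(state):
    """The partial transition function, one case per statement form."""
    stmts, fp, store, kont = state
    stmt = stmts[0]
    if stmt[0] == "assign":            # $name := constant
        _, name, const = stmt
        store = dict(store)            # functional store update
        store[(fp, name)] = const
        return (stmts[1:], fp, store, kont)
    elif stmt[0] == "return":          # return $name
        _, name = stmt
        return ("halt", lookup(store, fp, name), store, kont)
    raise ValueError("unknown statement: %r" % (stmt,))

def run(stmts):
    """Iterate the transition function until we hit halt."""
    state = inject(stmts)
    while state[0] != "halt":
        state = step(state)
    return state[1]                    # the returned value
```

For example, run([("assign", "x", 42), ("return", "x")]) steps twice and halts with 42.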
For the purposes of this article, I will be using the core grammar defined by Matt Might’s Java CESK article. He defined this by looking at all of Dalvik’s byte code and ensuring that its semantics are represented in a straightforward way. He divided the language into two classes of terms: statements and expressions.
program ::= classdef ...
classdef ::= class classname extends classname { fielddef ... methoddef ... }
fielddef ::= var fieldname ;
methoddef ::= def methodname($name, ..., $name) { body }
body ::= stmt ...
stmt ::= label label:
      | skip ;
      | goto label ;
      | if aexp goto label ;
      | $name := aexp | cexp ;
      | return aexp ;
      | aexp.fieldname := aexp ;
      | pushhandler classname label ;
      | pophandler ;
      | throw aexp ;
      | moveexception $name ;
cexp ::= new classname
      | invoke aexp.methodname(aexp,...,aexp)
      | invoke super.methodname(aexp,...,aexp)
aexp ::= this | true | false | null | void | $name | int
      | atomicop(aexp, ..., aexp)
      | instanceof(aexp, classname)
      | aexp.fieldname
Remembering from earlier in the article, there are three types of continuations: assignment, handler, and halt. Each of the components of a Dalvik program will use those generalized definitions.
We can define an applyKont function to aid in the overall machine design, which will be utilized when we encounter returns and exceptions.
Then using applyKont with the assign and handle continuations is defined by:
With this definition, we can translate this to code with apply/κ:
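The apply/κ listing itself is missing; here is a Python sketch of the same dispatch, with continuations modeled as tagged tuples (an assumption of this sketch, not the original Racket representation):

```python
def apply_kont(kont, val, store):
    """Apply a continuation to a return value."""
    tag = kont[0]
    if tag == "assign":
        # assign(name, stmts, fp, next_kont): bind the result in the
        # caller's frame and resume at the statement after the call.
        _, name, stmts, fp, next_kont = kont
        store = dict(store)
        store[(fp, name)] = val
        return (stmts, fp, store, next_kont)
    elif tag == "handle":
        # A handler with no exception in flight is simply skipped.
        _, classname, label, next_kont = kont
        return apply_kont(next_kont, val, store)
    elif tag == "halt":
        return ("halt", val, store, kont)
    raise ValueError("unknown continuation: %r" % (kont,))
```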
Atomic expressions are expressions whose evaluation must terminate and can never cause an exception or side effect.
Atomic statements assign an atomic value to a variable; this involves evaluating the statement/expression, calculating the frame address and updating the store.
To evaluate an atomic expression, we use the atomic expression evaluator:
Below are the key types of atomic expressions and how they are evaluated.
Atomic values, such as integers, booleans, void, and null, can be immediately returned.
Register lookups simply involve knowing what the frame pointer offset is in order to look up the atomic value. Since we have encoded this with the name of the expression along with the frame pointer, we have the frame address encoded into the store with (fp, name). There are two special registers, "$this" and "$ex", but they use the same semantics as other register lookups.
Accessing an object field is similar to register lookups: you get the field offset from the object pointer with (op, field) from the store.
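The atomic-eval listing is gone; here is a Python sketch of the cases described above, with expressions modeled as tagged tuples (my representation, not the original Racket one):

```python
def atomic_eval(aexp, fp, store):
    """Evaluate an atomic expression to a value."""
    tag = aexp[0]
    if tag in ("int", "bool", "null", "void"):
        # Immediate atomic values.
        return aexp[1] if len(aexp) > 1 else None
    elif tag == "register":
        # Register lookup: the frame address is (fp, name); the
        # special registers "$this" and "$ex" work the same way.
        return store[(fp, aexp[1])]
    elif tag == "field":
        # Field access: evaluate down to an object pointer, then
        # look up the field offset (op, fieldname) in the store.
        op = atomic_eval(aexp[1], fp, store)
        return store[(op, aexp[2])]
    raise ValueError("not an atomic expression: %r" % (aexp,))
```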
Like I said earlier, we need to evaluate the atomic expression and assign it as a variable-value pair in the store. We can define this operation as:
The result is the store updated with the new atomic-assignment variable and value mapping:
I have a couple of articles regarding variable substitution and implementation if you are interested in getting a larger view of what is going on here.
This creates a new state, where the store is now updated with a mapping of the variable var to the value val.
There is another type of assignment similar to the atomic assignment statement: a new object. This is when a variable is being assigned a brand new object, e.g. Object o = new Object(). Consequently, the definition of object creation and assignment is similar to atomic assignments:
The result is the store updated with the new object-assignment variable and value mapping (corresponding to a never-before-used object pointer op’):
This creates a new state, where the store is now updated with a mapping of the variable varname to the value (object classname (gensym)). The (gensym) is used to generate a guaranteed-to-be-globally-unique value.
There are three types of statements that cause no change in state: nop, label, and line. We can define nop as: $$step(\mathbf{nop} : \vec{s}, fp, \sigma, \kappa) = (\vec{s}, fp, \sigma, \kappa)$$ This says that when we see a nop, get the next statement in the list of statements and run it.
The only difference between nop and label is that label has an identifier (the label) with it: $$step(\mathbf{label}\;\mathit{l} : \vec{s}, fp, \sigma, \kappa) = (\vec{s}, fp, \sigma, \kappa)$$
line is defined the same as label and is a side effect of the s-expression generation, not part of the actual grammar.
We can add a few new items to our transition function’s match statement:
goto is much like nop, except that we must do a lookup using the label to find the next statement sequence. $$next(\mathbf{goto}\;\mathit{label} : \vec{s}, fp, \sigma, \kappa) = (S(label), fp, \sigma, \kappa)$$
Labels are identifiers for statements and are used in jumping from one statement to another. We will need to store these labels for lookup later. So, let’s define a label map and then a mechanism to look up labels. We define this mapping function as:
In code, what we are trying to do is find the label and execute the next statement, so we will need a label store, a way to update the store, and a way to look up the next statement by its label.
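The listing is missing; a small Python sketch of a label store with an update and a lookup (function names are mine, not the original Racket ones):

```python
label_store = {}

def update_labels(stmts):
    """Record, for each label statement, the statement sequence that
    starts at that label, so goto can jump to it later."""
    for i, stmt in enumerate(stmts):
        if stmt[0] == "label":
            label_store[stmt[1]] = stmts[i:]

def lookup_label(label):
    """Find the statement sequence a goto should continue with."""
    return label_store[label]
```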
But, this means we also need to update the store when we see a label, so an update to the earlier match construct is in order:
The if-goto statement is similar to a jump, the only difference being that the conditional statement must be evaluated before you can determine which branch to execute. We will use the atomic-eval that was constructed earlier to determine the truthiness of the expression, then either issue a goto or just move to the next statement.
Dalvik supports inheritance; due to the possible traversal of super classes, method invocation is necessarily the most complicated thing to model. Methods involve using all four components of the CESK machine: Control, Environment, Store, Kontinuation.
It is useful to abstract away a simplified situation: assume that the method has already been looked up. Thus, we can define an applyMethod helper function to aid in applying the method to its arguments: $$applyMethod : Method \times Name \times Value \times AExp^{*} \times FP \times Store \times Kont \rightharpoonup \Sigma$$
Further assume a method is defined as m = def methodName($v_1,...,$v_n) {body}
applyMethod needs to do the following:
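The list of steps was lost along with the listing, but from the signature above, applyMethod must bind the receiver and the formals under a fresh frame pointer, encode the return context as an assign continuation, and step into the body. A hedged Python sketch:

```python
import itertools

fresh_fp = itertools.count(1)  # a source of fresh frame pointers

def apply_method(m, name, val, arg_vals, stmts_after, fp, store, kont):
    """Sketch of applyMethod: m is (formals, body), val is the
    receiver object, name is the register awaiting the result."""
    formals, body = m
    fp_new = next(fresh_fp)               # fresh frame for the callee
    store = dict(store)
    store[(fp_new, "$this")] = val        # bind the receiver
    for formal, arg in zip(formals, arg_vals):
        store[(fp_new, formal)] = arg     # bind each formal argument
    # Encode the return context as an assign continuation.
    kont_new = ("assign", name, stmts_after, fp, kont)
    return (body, fp_new, store, kont_new)
```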
With apply/method now doing much of the heavy lifting, invoking a method is reduced to a simple method lookup. lookup is a partial function that traverses the inheritance chain until it finds the matching method. First, let’s define invoke; we’ll need the methodName for our lookup function.
In order to run the applyMethod function, we need a few variables defined: val and m. We can get val by a store lookup:
Getting m is where we finally need to define lookup since it is what finds the correct method. We need two values to process our lookup: className and methodName. We already have methodName from our invoke function, and we can get our className by extracting it from val:
Now that we have both className and methodName we can define our partial function lookupMethod that will traverse the class hierarchy to find the correct method to invoke:
In code, this sequence of functions becomes:
We can’t implement lookup/method quite yet, however, since I still haven’t defined and implemented classes. But, for now, let’s move on.
A return is an application of the current continuation to the return value. We already have the apply/κ function for application, so we just need to define what to pass in the val parameter. In the case of a return value, we need to evaluate the value to ensure we get the atomic instance:
In code, this is simply:
To handle exceptions we have several cases to implement: when a continuation is an exception handler, pushing and popping exception handlers, throwing exceptions, and capturing exceptions.
This is the simplest continuation to handle: we simply skip over the current continuation. We have already implemented it in the apply/κ function, so there is no need to do anything more. This would happen if the current continuation was an exception handler, but no exception was thrown.
We will have two ways to put and get exception handlers: by pushing and popping the program stack with a pushhandler and a pophandler:
In code:
In order to throw an exception, we must search the stack for a matching exception handler. Implementing a helper function handle to do this for us will help. First, let’s define handle as:
Here is how we will traverse the stack, putting the last thrown exception into the register “$ex”, as is protocol:
If className is a subclass of className’:
If not:
A throw skips over non-handler continuations:
handle in code looks like this:
You might have noticed a call to isinstanceof. I will define this function later when defining classes.
With the handle helper function, we can now define the throw statement:
Then we can add this to the transition function:
Since we store the last thrown exception into the $ex register, it can be examined to determine which exception was caught. We then use this to go to the label that handles this execution branch.
We can define this capturing with a moveException statement:
The store is updated by simply moving the exception e into the register $ex:
Putting this in our transition function, this is:
With the calls to the unimplemented lookup/method and isinstanceof functions, it’s time to implement them! But first we need to define and implement classes.
Classes are defined by their name, a potential super class, 0 or more fields (instance variables), and 0 or more methods.
What we need is to be able to represent classes in a sane way that will allow us to keep track of super classes, fields, and methods. A classDef in code, then, is: (struct class {super fields methods}), where super is the name of the super class, fields is a list of field names, and methods is a mapping defined by:
Since methods also have multiple properties, we will also define a method as: (struct method {formals body}).
We will also need a way to store and lookup classes. Since class names are unique, we can define a simple table of classes and a lookupClass method defined as:
Now in code, classes are:
Now that we have defined classes, method lookup is a recursive function that traverses the class hierarchy until it reaches a matching method, defined by:
In code:
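The lookup/method listing is gone; here is a Python sketch of the recursion, using a toy class table in place of the Racket structs (all names illustrative):

```python
# Classes as (super, fields, methods) tuples in a global table,
# mirroring the (struct class {super fields methods}) definition.
class_table = {
    "Object": ("void", [], {"toString": "Object.toString"}),
    "Animal": ("Object", ["name"], {"speak": "Animal.speak"}),
    "Dog":    ("Animal", [], {}),
}

def lookup_method(classname, methodname):
    """Walk up the class hierarchy until a matching method is found.
    A class whose super is "void" is at the top of the hierarchy."""
    if classname == "void":
        raise KeyError(methodname)
    super_cls, _fields, methods = class_table[classname]
    if methodname in methods:
        return methods[methodname]
    return lookup_method(super_cls, methodname)
```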
Since super is just a string, all we need to do is check for string equality. By definition, a class without a super class is void.
Finding out if a class is an instance of another class is simple: find out if the class name is anywhere in its direct class hierarchy, returning true if it is and false otherwise:
In code:
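The isinstanceof listing is likewise missing; a Python sketch of the hierarchy walk under the same toy class-table representation (names are mine):

```python
class_table = {
    "Object": ("void", [], {}),
    "Animal": ("Object", [], {}),
    "Dog":    ("Animal", [], {}),
}

def is_instance_of(classname, target):
    """True if target appears anywhere in classname's hierarchy."""
    if classname == "void":        # ran off the top of the hierarchy
        return False
    if classname == target:
        return True
    super_cls, _fields, _methods = class_table[classname]
    return is_instance_of(super_cls, target)
```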
Since super is just a string, we need to only check string equality. By definition, super is void if there is no higher class.
In class today, I sat through my second lecture on the power behind regular expressions with derivatives. The professor live-coded a regular expression engine in Python during class that has all of the regular language functionality, and more than the engines you will find in Perl, Ruby, Python, Java, Boost, etc.
Why would language implementers not use the more powerful way? Because long ago it was thought that Brzozowski’s derivative method was too costly, so everyone used Thompson’s method, which is fast for most operations but suffers from possible exponential blowup with some operations.
With derivatives we get those operations back (Intersection, Difference, Complement) and significantly decrease the complexity of implementing regular expressions.
I will implement all operations for regular languages in Python, JavaScript and Ruby for fun and posterity.
Keep in mind, none of these are optimized and there are some types of regular expressions that will cause exponential time to compute, but these are easily fixed with some simple bailout rules. I have not implemented any of those fixes here.
The algorithm is a simple two-step process: take the derivative of the language with respect to each character of the input string, then check whether the resulting language is nullable.
In formal theory, a language is a set of strings. The derivative of a language with respect to a character is the set of strings that remain after that leading character is consumed; for example, the derivative of {“bar”} with respect to b is {“ar”}. A language is nullable if it accepts the empty string; deriving with respect to a character that is not in the language yields the empty language, which accepts nothing. So for each kind of regular expression, we define two things: its derivative and its nullability.
Here we will define our base RegEx object. A character matches if the derivative of the language with respect to that character exists. This is inherently a recursive definition, and you can tell in matches that so long as we haven’t traversed the entire string to match, we will continue deriving and checking matches.
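The three listings here (Python, JavaScript, and Ruby) were stripped from this copy. As a stand-in, here is a self-contained Python reconstruction of the whole core described in the following sections; the class names follow the article, while the method names (nullable, derive, matches) are my assumption:

```python
class RegEx:
    def matches(self, s):
        # Recursively derive with respect to each character, then
        # check whether the final language accepts the empty string.
        if not s:
            return self.nullable()
        return self.derive(s[0]).matches(s[1:])

class Empty(RegEx):          # the language that accepts nothing
    def nullable(self): return False
    def derive(self, c): return Empty()

class Blank(RegEx):          # the empty-string language
    def nullable(self): return True
    def derive(self, c): return Empty()

class Primitive(RegEx):      # a single-character language, e.g. {'c'}
    def __init__(self, ch): self.ch = ch
    def nullable(self): return False
    def derive(self, c):
        return Blank() if c == self.ch else Empty()

class Choice(RegEx):         # union: this | that
    def __init__(self, this, that): self.this, self.that = this, that
    def nullable(self):
        return self.this.nullable() or self.that.nullable()
    def derive(self, c):
        return Choice(self.this.derive(c), self.that.derive(c))

class Repetition(RegEx):     # Kleene star: re*
    def __init__(self, re): self.re = re
    def nullable(self): return True   # zero repetitions always match
    def derive(self, c):
        return Sequence(self.re.derive(c), self)

class Sequence(RegEx):       # concatenation: first second
    def __init__(self, first, second):
        self.first, self.second = first, second
    def nullable(self):
        return self.first.nullable() and self.second.nullable()
    def derive(self, c):
        d = Sequence(self.first.derive(c), self.second)
        if self.first.nullable():
            return Choice(d, self.second.derive(c))
        return d
```

For example, Sequence(Primitive('c'), Sequence(Primitive('a'), Primitive('t'))) is the language {'cat'}.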
This, along with the empty-string language, is what we find at the depths of our recursive match calls. If at any point in the matching process a match fails, the bottom of the recursion will contain Empty.
If the string is accepted (meaning it matched all the way), then it accepts, which is represented by the empty-string language of Blank.
The derivative of Blank is Empty with respect to any character.
A primitive is a single-character language, like {‘c’}. You can think of a string as being one or more Primitive languages, e.g. 'cat' = 'c' 'a' 't' all concatenated together. So, the derivative of a primitive with respect to a character c is Blank if c is the same as the primitive’s character and Empty if they are not equal.
Choice is where you have two languages and matching either one is fine, e.g. ‘foo’ or ‘bar’ would match ‘foo’ and ‘bar’. This is also known as the Union operation in set theory.
The derivative of the union of two languages is the union of their respective derivatives. The same goes for the derivative of the regular expressions of those languages.
Repetition is zero or more repetitions of the language, e.g. ‘foo’ would match ‘foofoofoo’ and ‘’ since it is matching 3 repetitions and 0 respectively.
The derivative of a repetition re* with respect to a character is the derivative of re concatenated with re*; and a repetition is always nullable, since zero repetitions match the empty string.
Like I explained earlier, strings longer than 1 character can be thought of as concatenations of Primitive languages, e.g. ‘foo’ is the same as ‘f’ ‘o’ ‘o’. The derivative of a sequence of languages is the Choice of the derivative of the first language sequenced with the second language, and, when the first language is nullable, the derivative of the second.
This is one of those operations you don’t get with classic regular expression engines in Perl, Python, Ruby, etc. This is because the way they are written makes the NFA-to-DFA conversion blow up exponentially in most cases. With derivatives, we get them cheaply.
The derivative of the intersection of languages is the intersection of their derivatives. The same holds with respect to the re.
This is another operation prohibitively expensive in classic implementations that we get cheaply with derivatives. The difference of two languages A and B is the set of strings accepted by A minus the strings accepted by B.
The derivative of the difference of two languages is the difference of their derivatives.
This is another example of an operation we get cheaply with derivatives, which we don’t get at all in classic implementations. The complement of a language matches all strings that the language does not accept.
The derivative of the complement of a language is the complement of the derivative of the language. Same for the re.
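Pulling the rules above together, here is a minimal sketch in Python. The constructor names follow the text; the implementation details are my own, and Intersection, Difference, and Complement are omitted for brevity (they follow the same shape, applying the set operation to the derivatives):

```python
from dataclasses import dataclass

class Lang: pass

@dataclass
class Empty(Lang): pass        # the empty language: matches nothing

@dataclass
class Blank(Lang): pass        # the empty-string language

@dataclass
class Primitive(Lang):         # a single-character language, like {'c'}
    c: str

@dataclass
class Choice(Lang):            # union: matching either language is fine
    left: Lang
    right: Lang

@dataclass
class Sequence(Lang):          # concatenation of two languages
    first: Lang
    second: Lang

@dataclass
class Repetition(Lang):        # zero or more repetitions
    lang: Lang

def nullable(l: Lang) -> bool:
    """Does l accept the empty string?"""
    if isinstance(l, (Blank, Repetition)):
        return True
    if isinstance(l, (Empty, Primitive)):
        return False
    if isinstance(l, Choice):
        return nullable(l.left) or nullable(l.right)
    return nullable(l.first) and nullable(l.second)   # Sequence

def derive(l: Lang, c: str) -> Lang:
    """The derivative of l with respect to the character c."""
    if isinstance(l, (Empty, Blank)):
        return Empty()
    if isinstance(l, Primitive):
        return Blank() if l.c == c else Empty()
    if isinstance(l, Choice):
        return Choice(derive(l.left, c), derive(l.right, c))
    if isinstance(l, Repetition):
        return Sequence(derive(l.lang, c), l)
    # Sequence: derivative of the first concatenated with the second,
    # plus (when the first is nullable) the derivative of the second.
    head = Sequence(derive(l.first, c), l.second)
    return Choice(head, derive(l.second, c)) if nullable(l.first) else head

def matches(l: Lang, s: str) -> bool:
    """Derive with respect to each character; accept if the result is nullable."""
    for c in s:
        l = derive(l, c)
    return nullable(l)
```

For example, `matches(Sequence(Primitive('c'), Sequence(Primitive('a'), Primitive('t'))), "cat")` accepts, while feeding it `"ca"` bottoms out in a non-nullable language and rejects.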
As I stated in my previous post, my undergraduate thesis is on static program analysis of Android applications to prove malicious behavior. In this article, I try to explain what static program analysis is and isn’t, and the rationale behind using it for this research. Oh, and there is code to look at, too!
Static Analysis attempts to predict the future behavior of a program. My research professor has a great introduction to static analysis article that shies away from a lot of the technical aspects of it.
There are two main types of static program analysis: sound and unsound. The difference between them is that unsound analysis tries to determine behavior from a partial model of the program, while sound analysis models all behaviors of the program. Unsound analysis can say something about every line of code, but it assumes that certain portions of the program are either unreachable or unimportant for the analysis; sound analysis does not make these assumptions and tries to prove and model all behavior.
In other words, unsound analysis uses an incomplete model while sound analysis uses a complete one. Furthermore, sound analysis must begin with a soundness proof of all layers of abstraction. I’ll get into more details about this in later articles.
This research is sound static program analysis.
We have a deep and unforgiving failure at the core of the assumption that we can prove behavior of a program: the halting problem (undecidability). This ultimately states that it is impossible to prove, for all program-input pairs, that a program terminates. So, if we can’t know whether a program terminates, how can we determine all paths a program can take? Wouldn’t that cause an infinitely deep tree of paths?
Fortunately, this is only undecidable over an infinite domain; we can get around it by presenting a finite number of approximations (abstractions), which guarantees that the abstract interpretation of the program will terminate.
Let’s take the following code
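A two-line loop of the kind described might look like this (my stand-in, not necessarily the original snippet), together with the abstraction that makes it finite:

```python
def spin():
    while True:   # never terminates: one state looping back on itself
        pass

# Abstract interpretation collapses the infinite trace into a single
# abstract state whose transition maps it back to itself, so the
# analysis of the loop terminates even though the loop does not:
transition = {"in_loop": "in_loop"}
state = "in_loop"
for _ in range(3):
    state = transition[state]
assert state == "in_loop"
```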
This is a loop that will never terminate. Graphed like a tree, it would look a lot like an infinitely long linked list. Graphed like a DFA, it would be a single state looped back on itself.
However, we can abstract this out into a state that matches a type of infinite loop.
In static analysis, the more information you have about something, the more likely you’ll find a contradiction. The less information you have about something, the more generalized it is. This relationship comes from Order Theory.
Rewriting this statement in terms of code, let’s give a bunch of information in a control sequence. Since Ruby hasn’t had much love on my blog so far, let’s give it some.
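The original Ruby snippet is not preserved here, so the following is a stand-in (rendered in Python; `malware.steal_passwords` and `x` come from the text, everything else is invented for illustration):

```python
class Malware:
    def steal_passwords(self):
        raise RuntimeError("should be unreachable")

malware = Malware()

x = 7            # plenty of information about x: it is exactly 7
if x > 10:       # contradiction: 7 > 10 can never hold...
    malware.steal_passwords()   # ...so this call can never be reached
```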
Here we have given enough information to form a contradiction: the condition on x can never hold, so the call to malware.steal_passwords can never be reached.
We can describe relations of information and state by using lattices, which comes from Order Theory. An article, again by my research professor, gives a good overview of the aspects of Order Theory that pertain to Computer Science.
The basic idea of a lattice in static analysis is that you have a top and a bottom. At the top, you know so little about a thing as to make it generic; Object in Ruby or Python is close to the top. Whereas x from the code above is closer to the bottom of the lattice,
having so much information as to cause a contradiction. From this basic idea, we can define a lattice as a set of elements with a supremum ⊤ and an infimum ⊥ (a max and a min), where the order of elements is determined partially (asymmetrically).
If you think of a lattice as an upper and lower bound for your program (⊤ and ⊥), then all elements x in your program must be partially ordered such that ⊥ ⊑ x and x ⊑ ⊤.
As an example:
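A stand-in for the kind of listing described (in Python rather than the post's Ruby; MythicalCreature, ralph, and steal_souls come from the text, the other names are mine):

```python
class Creature:                    # very generic: close to the top (⊤)
    pass

class MythicalCreature(Creature):  # nothing in this branch defines steal_souls
    pass

class Demon(Creature):             # only Demons can steal souls
    def steal_souls(self):
        return "souls"

creature = Creature()              # little information: nearer ⊤
ralph = MythicalCreature()         # a concrete instance: nearer ⊥

# An analyser can prove no MythicalCreature ever steals souls:
assert not hasattr(ralph, "steal_souls")
```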
We can say that these classes are partially ordered by inheritance, and that creature is closer to ⊤ and ralph is closer to ⊥.
This example shows that it is possible to have a provable definition of the order of inheritance, which now allows us to make mathematical assertions and proofs about the inheritance of that program. We know that all MythicalCreature
objects can never
steal_souls
so we can ignore that possibility.
This is one use of a lattice in static program analysis and, in the larger context, Order Theory. However, there are many other ways of using Order Theory to prove things about a program, including monotonicity, continuity, and fixed points, to name a few.
Our first goal is to have a sound mathematical definition of each element of a program so that we can assert behavior about that program. Since it is completely possible to execute a program in theory that could never happen in practice, it is important to know for certain that it really could never happen.
In math, we can prove that √2 is irrational by contradiction, but we have to make assumptions, like that mutually-prime, natural-number numerator/denominator pairs are rational, and we must define a strict set of rules for rationality, mathematical operations, and what numbers really are. (If you haven’t taken the chance to take a mathematical analysis class, I encourage you to.)
Fortunately for mathematicians, we can and have proven these things, so we can say with certainty that √2 is irrational. And fortunately for us, since we have Order Theory, we can prove things about the finite set of elements that make up a program so that we can make assertions about it, like: this program will run perfectly 999 times and then start blowing up memory usage. Or, as I mentioned in my post on Dalvik, we know that the garbage collector will never factor into the concerns of the application, because it runs in a separate environment.
Security is really a subset of assertions, where the assertions are things like: this program cannot access a remote server; there is no way for a user to leave the bounds of the application from within the application; there is a reachable path of execution that will crash the device.
One major issue with static program analysis is two-fold: it needs access to the source, and that source should not be obfuscated.
To be fair, the research that we are doing is on Dalvik bytecode, so we could disassemble any application. It is also for the military, so it is well within the bounds of operation to know that we will have access to any code that could potentially be added to a device.
The reason why code obfuscation is an issue is that there are limitations in how you can abstractly interpret and ultimately analyse a program. Take my professor’s example of proving that the result of a multiplication is negative without computing it, and turn it into code:
We would convert this to the abstract multiplication of a negative sign and a positive sign, which would reduce to a negative.
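This sign abstraction can be sketched in Python (the concrete operands are stand-ins, since the original numbers are not shown here):

```python
# Sign abstraction: track only whether a value is negative, zero, or positive.
NEG, ZERO, POS = "-", "0", "+"

def alpha(n: int) -> str:
    """Abstract a concrete number to its sign."""
    return NEG if n < 0 else POS if n > 0 else ZERO

def mul_sign(a: str, b: str) -> str:
    """Abstract multiplication over signs."""
    if ZERO in (a, b):
        return ZERO
    return POS if a == b else NEG

# We can prove the product is negative without ever computing it:
assert mul_sign(alpha(-7), alpha(3)) == NEG
```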
What if we made one of the numbers into a variable?
That is easy enough: we could have easily just substituted it in or done an environment lookup. (For details on that, see substitution and implementing substitution.)
What if we make it even harder, by making each number a lambda that creates a number?
Ok, well, that is easy enough: we could just beta-reduce. That would be perfect except for the small fact that this quickly becomes undecidable as the obfuscation gets worse. (For implementation details of reduction, see my post on reduction.)
So, code obfuscation has the ability to make it impractical to do static program analysis.
The goals of Static Program Analysis lend themselves well to the field of security. This is particularly true for controlled environments like government. While it is not a be-all-end-all answer for security, it should allow us to make strong assertions about the behavior of Android applications.
Malware on Android is a problem. With over 500 million activated devices, it is a large user base to attack, and the Android Market is a great delivery mechanism. Even though Google has started to scan applications uploaded to the marketplace, there has still been an uptick in malware. This raises the central question I am addressing in my undergraduate thesis: can an Android application be proven to be non-malicious by small-step analysis?
This article starts off a series of articles that I will be writing over the course of the next few months investigating this question through my research with the U-Combinator group at the University of Utah. The group is determining, through static analysis, how much can be proven about the nature of any Android application, in an effort to provide a relatively provable, secure Android environment. To get started, this article gives an overview of what makes up an Android application.
Android applications are really just Dalvik bytecode that run on the
Dalvik virtual machine. They start out as collections of .java files and get
translated into a single Dalvik Executable file (dex
). Every application,
for security reasons, has its own independent address space and memory.
The goals of the Dalvik VM are to run on a device with relatively little RAM and a slow CPU. Since each application will run in its own address space and memory, special care was taken to shave off as much memory taken up by each compiled Dalvik program.
The VM is a register-based VM, like Parrot, the VM used for Rakudo Perl 6. A register design was chosen over the stack-based design of Java because of the need to significantly reduce the CPU overhead associated with pushing and popping duplicate operands. This decision, however, became less important after a JIT was introduced in Android. Most of the gains from the register design come from reducing instructions and interpreter-killing opcode dispatches.
Instructions can address 16, 256, or 64k registers, depending on the instruction format. Unlike Parrot’s infinite register machine, Dalvik has a finite number of registers, though it is an incredibly large number: 2^16 = 65,536 to be exact.
The instruction set is small compared to a behemoth like x86. The opcodes fit within 32 bits and, from a higher-level abstraction, can be represented by a little more than 40 instructions.
Each dex file starts with a File Header, which provides metadata about the file, such as its size and checksums. The last two parts of a dex file are the class definitions and the program data. In the middle are a handful of tables.
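The header layout can be sketched with Python's struct module (the field list is from my memory of the published dex format, not from this post; treat it as a sketch):

```python
import struct

# The dex header: magic, adler32 checksum, SHA-1 signature, then twenty
# u4 fields (file_size, header_size, endian_tag, and size/offset pairs
# for the link, map, string, type, proto, field, method, class, and
# data sections).
DEX_HEADER = struct.Struct("<8sI20s20I")

assert DEX_HEADER.size == 112   # the header occupies 0x70 bytes
```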
In order to shave off some memory overhead: strings, types, method prototypes, fields, and methods are all put into separate tables and referenced in the executable code by indexing into these tables. For example, the string table holds every string in the file, including string constants, class names, method names, and variable names to name a few.
Each of these sections can and do reference other sections. The data section will directly use each of the lookup tables to execute its logic, while the class definitions might reference and index into the methods table which references an index to the prototype table which then references the type table which references the string table which can then go back and reference the data section.
While that seems convoluted and complicated, the result is that dex files end up with a shared pool of data among all of the java classes, thus reducing the memory footprint when they are translated into dex. This sharing continues with builtin libraries, which upon starting an Android device, get loaded into memory immediately.
This immediately calls into question the idea that every application has its own memory and address space. It is safer to say that private memory stays private, like application heap and dex structures or application-specific dex files. As for shared memory, data that can be cleanly ejected by the kernel at any time or copy-on-write heap data is accessible from Zygote.
Zygote is a Dalvik process that starts on boot and ensures that classes are loaded before they are needed. It helps to give process-sandboxed applications access to shared libraries and read-only copy-on-write heap data. Zygote is forked at will and provides access to the Zygote parent, ensuring that data that can be shared is shared, and data that can’t be shared is kept private.
Not surprisingly, with the constraints given by Dalvik for sharing data, garbage collection must help to keep private data away from other memory. The GC uses a Mark and Sweep approach. In many cases, reachability bits don’t need to be separated from the main heap. However, with Android, these mark bits are kept separate from other heap memory.
Garbage collection is also not a global process; there are separate GCs along with separate processes and heaps. To further complicate things, the GCs must be able to respect the shared pool.
Some work is done up front during an application’s install to ensure that the dex file structure is sound. This includes checking for valid indices and offsets into lookup tables, and checking types and references. This, however, is not a security measure; it only ensures that the code is valid. Perfectly “honest” dex can behave maliciously and, as I’ll write about in future articles, can circumvent the built-in platform security.
Where possible, lookup-table operations are removed by static linking; for example, a method-name string lookup can be replaced with an index into a vtable.
Most of the design decisions of the Dalvik VM have little to do with the abstractions made by Static Analysis. However, it is important to understand the inner workings of a system before you can make the assumptions required to infer maliciousness.
Knowing, for example, that Garbage collection does not need to factor into security concerns when analysing an application is helpful when asserting the behavior of an application.
Ultimately, the analysis of an Android application is not limited in the ways that the application is. But, knowing the constraints of an application aids in creating a realistic model to analyse.
Two of the most common forms of reduction are call-by-name (CBN) and call-by-value (CBV), with the latter being the most ubiquitous. In this post, I implement both in Racket.
### Nondeterministic Full ß-reduction
This is a nondeterministic, full ß-reduction definition. Notice that there is no value-term definition; that is because a value cannot be a ß-redex. The way to read the following rules is: the premise on top of the line is the condition, and the bottom is what happens if it holds.
### Application
In application, there are two scenarios that need to be handled: the left-hand and/or right-hand terms are ß-redexes.
If the body of a λ-abstraction is a ß-redex, then reduce it and substitute.
If we have an abstraction applied to a term, substitute the term into the body of the abstraction for the abstraction’s bound variable.
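For reference, the rules being described are the standard full ß-reduction rules (my transcription into LaTeX, following TAPL's presentation):

```latex
\frac{t_1 \longrightarrow t_1'}{t_1\; t_2 \longrightarrow t_1'\; t_2}
\qquad
\frac{t_2 \longrightarrow t_2'}{t_1\; t_2 \longrightarrow t_1\; t_2'}
\qquad
\frac{t_1 \longrightarrow t_1'}{\lambda x.\,t_1 \longrightarrow \lambda x.\,t_1'}
\qquad
(\lambda x.\,t_1)\; t_2 \longrightarrow [x \mapsto t_2]\,t_1
```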
### Substitution
You can see the implementation and theory posts on substitution to get more background. Here is the implementation in Racket:
(define (subst term var value)
  (match term
    ; [x ↦ s]y → y = x ? s : y
    [(? symbol?) (if (eq? term var) value term)]
    ; [x ↦ s]λx.b → λx.b
    ; [x ↦ s]λy.b → λy.[x ↦ s]b
    [`(λ (,v) ,body) (if (eq? v var) term `(λ (,v) ,(subst body var value)))]
    ; [x ↦ s](t1 t2) → [x ↦ s]t1 [x ↦ s]t2
    [`(,f ,a) `(,(subst f var value) ,(subst a var value))]))
ß-reduction proper is not full ß-reduction; we only have one rule to implement, abstraction:
; (t1 t2)
(define (ßreduce term)
  (match term
    [`((λ (,v1) ,b1) ,(and rhs `(λ (,v2) ,b2))) (subst b1 v1 rhs)]))
(define (fullßreduce term)
  (match term
    ; value
    [(? symbol?) (error "Cannot reduce value" term)]
    ; identity
    [`(λ (,v) ,v) term]
    ; abstraction
    [`((λ (,v1) ,b1) (λ (,v2) ,b2)) (ßreduce term)]
    ; application
    [`((λ (,v1) ,b1) ,e) (fullßreduce e)]
    ; application-abstraction
    [`(,f ,e) (fullßreduce f)]))
### Beta-Reduce Tests
;> (ßreduce `((λ(x) (x x))(λ(z) u)))
;'((λ (z) u) (λ (z) u))
;> (ßreduce `((λ(x)(λ(y) (y x)))(λ(z)u)))
;'(λ (y) (y (λ (z) u)))
;> (fullßreduce `((λ(y) (y a))((λ(x)x)(λ(z)((λ(u)u)z)))))
;'(λ (z) ((λ (u) u) z))
;> (fullßreduce `((λ(x)(x x))(λ(x)(x x))))
;'((λ (x) (x x)) (λ (x) (x x)))
This post has been removed and replaced by a composite Reduction article.
Now that I have the mathematical definition of substitution, I can implement it. For ease of implementation, I will not consider call-by-name substitution, which requires alpha-conversion. I will implement call-by-value, which is what most familiar programming languages use, such as Python, Ruby, PHP, C, Perl, and Java, to name a few.
# Replace all free occurrences of x in x with s
[x ↦ s]x = s
# Replace all free occurrences of x in y with s
[x ↦ s]y = y if x ≠ y
# Replace all free occurrences of x in λy.t1 with s
[x ↦ s](λy.t1) = λy.[x ↦ s]t1 if y ≠ x and y ∉ FV(s)
# Replace all free occurrences of x in t1 t2 with s
[x ↦ s](t1 t2) = ([x ↦ s]t1)([x ↦ s]t2)
I basically have four things to consider when substituting in call by value:
* If the term is in the form [x ↦ s]x
, return the value s
* If the term is in the form [x ↦ s]y
, return the term y
, since x ≠ y
* If the term is in the form [x ↦ s](λy.t1)
, return the substituted abstraction λy.[x ↦ s]t1
* If the term is in the form [x ↦ s](t1 t2)
, return the application where each term is substituted ([x ↦ s]t1)([x ↦ s]t2)
There are three parts to every substitution: term, variable, and value.
So I will need them as parameters to my function; I’ll call them term, var, and value.
In mathematical format, you can think of it like this: [var ↦ value]term.
Using a functional language, like Racket, allows us a more powerful, terse, and
elegant solution to the substitution function. I employ the use of Racket’s
match
which gives extremely powerful pattern
matching.
(define (subst term var value)
  (match term
    ; [x ↦ s]y → y = x ? s : y
    [(? symbol?) (if (eq? term var) value term)]
    ; [x ↦ s]λx.b → λx.b
    ; [x ↦ s]λy.b → λy.[x ↦ s]b
    [`(λ (,v) ,body) (if (eq? v var) term `(λ (,v) ,(subst body var value)))]
    ; [x ↦ s](t1 t2)
    [`(,f ,a) `(,(subst f var value) ,(subst a var value))]))
> (subst `(λ (x) 'y) 'y '1)
'(λ (x) '1)
> (subst `(λ (x) 'y) 'y `(λ (x) 'z))
'(λ (x) '(λ (x) 'z))
> (subst `((λ (x) y) (λ (y) z)) 'z '2)
'((λ (x) y) (λ (y) 2))
> (subst `((λ (x) y) (λ (y) z)) 'y '2)
'((λ (x) 2) (λ (y) z))
The ability to properly substitute is vital to reduction. In this post, I will show proper and improper definitions of substitution.
From my earlier post on
currying
you would have seen a substitution syntax like this, (λx.t1)t2 ↦ [x ↦ t2]t1
where [x ↦ t2]t1
means the term obtained by replacing all free occurrences of
x
in t1
by t2
. See my
last post
for the definition of free variables.
The definition of the λ-calculus is simple. We have a term broken into three parts: variable, abstraction, and application. To define substitution, we must define it for each part of a term. For variables, it must be defined both when there is a free variable and when there is not.
### Naive Recursive Definition
# Replace all free occurrences of x in x with s
[x ↦ s]x = s
# Replace all free occurrences of x in y with s
[x ↦ s]y = y if x ≠ y
# Replace all free occurrences of x in λy.t1 with s
[x ↦ s](λy.t1) = λy.[x ↦ s]t1
# Replace all free occurrences of x in t1 t2 with s
[x ↦ s](t1 t2) = ([x ↦ s]t1)([x ↦ s]t2)
The problem with the definition from above is that we are relying on the name
of the variable or term to define substitution. This works fine if we are
careful with our choice of bound variable names, however take the example [x ↦
y](λx.x)
. If we replace all free x
terms within the abstraction λx.x
with
y
we would end up with the following: λx.y
, clearly this is no longer the
identity function.
What should be taken away from this? The names of bound variables should not matter, so let’s ensure that they don’t.
# Replace all free occurrences of x in x with s
[x ↦ s]x = s
# Replace all free occurrences of x in y with s
[x ↦ s]y = y if x ≠ y
# Replace all free occurrences of x in λy.t1 with s
[x ↦ s](λy.t1) = λy.t1 if y = x
[x ↦ s](λy.t1) = λy.[x ↦ s]t1 if y ≠ x
# Replace all free occurrences of x in t1 t2 with s
[x ↦ s](t1 t2) = ([x ↦ s]t1)([x ↦ s]t2)
So, in this case, [x ↦ y](λx.x) = λy.y
which is indeed the identity function.
However, this is still wrong. Take this example: [x ↦ z](λz.x)
. Using the
definition above, we would be substituting z
for all free variables in
(λz.x)
which would leave us with λz.z
, the identity function. But λz.x
is
not the identity function. So names are still an issue.
When a free variable in a term s becomes bound because s was naively substituted into a term t, it is called variable capture. We want to avoid this. We do this by
using a substitution operation that avoids mixing bound variable names of t
and free variable names of s
. This is called captureavoiding substitution
and is often what is implicitly meant by the term substitution. It is easily
achieved by one more condition on the abstraction case:
# Replace all free occurrences of x in x with s
[x ↦ s]x = s
# Replace all free occurrences of x in y with s
[x ↦ s]y = y if x ≠ y
# Replace all free occurrences of x in λy.t1 with s
[x ↦ s](λy.t1) = λy.t1 if y = x
[x ↦ s](λy.t1) = λy.[x ↦ s]t1 if y ≠ x and y ∉ FV(s)
# Replace all free occurrences of x in t1 t2 with s
[x ↦ s](t1 t2) = ([x ↦ s]t1)([x ↦ s]t2)
This, however, is not a complete definition, since it has now changed substitution into a partial operation. For example, [x ↦ y z](λy.x y) should be defined (equal, after renaming, to something like λw.(y z) w), but because y appears free in (y z), it never hits one of our definitions.
In order to fix the issue of bound and free variable names, we must decide to work with terms “up to renaming of bound variables” or what Church called alphaconversion. This is the operation of consistently renaming a bound variable in a term. In other words, “Terms that differ only in the names of bound variables are interchangeable in all contexts”.
This actually simplifies our definition, since we can change names as soon as we get to a place where we are trying to apply the substitution to arguments where it is undefined. This means we can completely drop the first clause of the abstraction section.
# Replace all free occurrences of x in x with s
[x ↦ s]x = s
# Replace all free occurrences of x in y with s
[x ↦ s]y = y if x ≠ y
# Replace all free occurrences of x in λy.t1 with s
[x ↦ s](λy.t1) = λy.[x ↦ s]t1 if y ≠ x and y ∉ FV(s)
# Replace all free occurrences of x in t1 t2 with s
[x ↦ s](t1 t2) = ([x ↦ s]t1)([x ↦ s]t2)
The ability to reduce is one of the key components of the λ-calculus. You will find it directly implemented in compiler optimizations for both functional and imperative languages, and as inspiration in Google’s MapReduce.
However, there are some subtleties that must be addressed to go from our λ and λNB calculi to being able to implement reduction. In this post, I mathematically prove that reduction on λ-calculus terms can be done.
As I described in my initial post about the λ-calculus, it has a very simple definition:
t ::= terms:
x variable
λx.t abstraction
t t application
Free variables are defined as:
FV(x) = {x}
FV(λx.t1) = FV(t1) \ {x}
FV(t1 t2) = FV(t1) ∪ FV(t2)
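These equations translate directly into code; here is a small sketch in Python over terms encoded as nested tuples (the encoding is my own):

```python
# Term encoding (my own convention):
#   'x'            → a variable
#   ('λ', 'x', t)  → the abstraction λx.t
#   (t1, t2)       → the application t1 t2
def FV(t):
    if isinstance(t, str):          # FV(x) = {x}
        return {t}
    if t[0] == 'λ':                 # FV(λx.t1) = FV(t1) \ {x}
        return FV(t[2]) - {t[1]}
    return FV(t[0]) | FV(t[1])      # FV(t1 t2) = FV(t1) ∪ FV(t2)

assert FV(('λ', 'x', ('x', 'y'))) == {'y'}
```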
Size is defined as:
size(true) = 1
size(false) = 1
size(0) = 1
size(succ t1) = size(t1) + 1
size(pred t1) = size(t1) + 1
size(iszero t1) = size(t1) + 1
size(if t1 then t2 else t3) = size(t1) + size(t2) + size(t3) + 1
Unlike many statements in math, intuition and reality are in sync on this one. Of course the statement ∀t. |FV(t)| ≤ size(t) is true (where, for λ-terms, size(x) = 1, size(λx.t1) = size(t1) + 1, and size(t1 t2) = size(t1) + size(t2) + 1). However, it is important to formally prove this, since it is key to reduction.
By induction, proving the following three cases proves the statement true for all t:
t = x
|FV(t)| = |{x}| = 1 = size(t)
t = λx.t1
Inductively:
|FV(t1)| ≤ size(t1)
So:
|FV(t)| = |FV(t1) \ {x}| ≤ |FV(t1)| ≤ size(t1) < size(t)
t = t1 t2
Inductively:
|FV(t1)| ≤ size(t1) and |FV(t2)| ≤ size(t2)
So:
|FV(t)| = |FV(t1) ∪ FV(t2)| ≤ |FV(t1)| + |FV(t2)| ≤ size(t1) + size(t2) < size(t)
Since I just did an implementation of the Z-combinator and factorial in Python, I figured it would be fun to implement it in Perl, too. It’s a lot uglier in Perl than Python, but Python is cheating by having a built-in lambda operator.
Z = λf. (λx. f (λy. x x y))(λx. f (λy. x x y))
g = λfct. λn. if realeq n c0 then c1 else (times n (fct (prd n))) ;
factorial = Z g
#!/usr/bin/env perl
use warnings;
use strict;
my $Z = sub {
    my ($f) = @_;
    return (sub {
        my ($x) = @_;
        return $f->(sub {
            my ($y) = @_;
            return $x->($x)->($y);
        });
    })->(sub {
        my ($x) = @_;
        return $f->(sub {
            my ($y) = @_;
            return $x->($x)->($y);
        });
    });
};
my $g = sub {
    my ($fct) = @_;
    return sub {
        my ($n) = @_;
        return !$n ? 1 : $n * $fct->($n - 1);
    };
};
my $factorial = $Z->($g);
print map { $factorial->($_)."\n" } @ARGV;
So what is the output?
./thelambda.pl 3 5 100
6
120
9.33262154439441e+157
Now that I have the basis for an enriched λ-calculus, I can add a very important aspect of programming: recursion.
There are several ways to arrive at recursion, some include:
* Directly with the call-by-name Y-combinator: Y = λf. (λx. f (x x))(λx. f (x x))
* Derive the Y-combinator by functionals and the U-combinator
* Use the call-by-value Z-combinator: Z = λf. (λx. f (λy. x x y))(λx. f (λy. x x y))
In the case of a call-by-value language such as Python, it is useless to use the fixed-point Y-combinator since it diverges. For this post, I will be using the Z-combinator. Pierce does not go into the details of this intricate structure, instead opting for understanding through example.
Here is my Python implementation of the Z-combinator:
Z = λf. (λx. f (λy. x x y))(λx. f (λy. x x y))
# Z = λf. (λx. f (λy. x x y))(λx. f (λy. x x y))
Z = lambda f: (lambda x: f(lambda y: (x)(x)(y)))(lambda x: f(lambda y: (x)(x)(y)))
Let’s take Pierce’s example of factorial from earlier in his book:
factorial = λn. if n=0 then 1 else n * factorial(n-1)
The use of a fixed-point combinator is essentially to unroll the recursive definition of factorial to where it occurs. Rewritten unrolled, the above definition reads:
if n=0 then 1
else n * (if n-1=0 then 1
     else (n-1) * (if n-2=0 then 1
          else (n-2) * ...))
You could imagine this in Church’s Pure λcalculus:
if realeq n c0 then c1
else times n (if realeq (prd n) c0 then c1
     else times (prd n)
          (if realeq (prd (prd n)) c0 then c1
           else times (prd (prd n)) ...))
The same unrolling effect is achieved when using the Z-combinator by first defining the recursive function g = λf.〈body containing f〉 and then h = Z g.
Now that I have the abstract idea of how to generate a recursive function using the fixed-point combinator (Z), I can implement the λ-calculus definition in Python for our enriched (λNB) and pure versions.
g = λfct. λn. if realeq n c0 then c1 else (times n (fct (prd n))) ;
factorial = Z g
# g = λfct. λn. if realeq n c0 then c1 else (times n (fct (prd n))) ;
g = lambda fct: lambda n: (c1) if realeq(n)(c0) else (times(n)(fct((prd)(n))))
gNB = lambda fct: lambda n: 1 if n == 0 else (n * (fct(n - 1)))
# factorial = Z g
factorial = Z(g)
factorialNB = Z(gNB)
>>> factorialNB(3)
6
>>> factorialNB(5)
120
>>> factorialNB(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
So far, we have been able to build in multi-argument handling, booleans, numerals, arithmetic, equality and lists, all while staying in the pure λ-calculus. However, it is convenient to introduce primitives like numbers and booleans when working on more complicated examples, in order to remove some extra cognitive steps. As an example, which takes fewer steps to recognize: 2 or the return value of scc(scc(0))?
In this post, I describe the means of converting some of the strictly pure λ-calculus primitives to the more common numeric and boolean representations that Pierce calls λNB, his name for the enriched λ-calculus. I will also detail how to convert the other way, from λNB→λ.
As a reminder, the Church boolean tru
and fls
are defined:
tru = λt. λf. t
fls = λt. λf. f
To convert from λ→λNB we simply apply the λexpression to true
and false
:
realbool = λb. b true false
# realbool = λb. b true false
realbool = lambda b: (b)(True)(False)
>>> realbool(tru)
True
>>> realbool(fls)
False
In the other direction, λNB→λ, we use an if
expression:
churchbool = λb. if b then tru else fls
# churchbool = λb. if b then tru else fls
churchbool = lambda b: (tru) if b else (fls)
>>> realbool(churchbool(False))
False
>>> realbool(churchbool(True))
True
Just like we were able to build higher-level equality functions using Church booleans, we can do higher-level conversions as well. As a reminder, the definition of equality:
equal = λm. λn. and (iszro (m prd n))(iszro (n prd m))
realeq = λm. λn. (equal m n) true false
# realeq = λm. λn. (equal m n) true false
realeq = lambda m: lambda n: (equal(m)(n))(True)(False)
>>> realeq(c1)(c1)
True
>>> realeq(c1)(c2)
False
churcheq = λm. λn. if equal m n then tru else fls
# churcheq = λm. λn. if equal m n then tru else fls
churcheq = lambda m: lambda n: (tru) if m == n else (fls)
>>> realbool(churcheq(3)(3))
True
>>> realbool(churcheq(3)(2))
False
I have already been converting Church numerals to numbers in my previous posts
using the (lambda n: n+1)(0)
function, here we will define it as a callable
function. Church numerals are defined as:
scc = λn. λs. λz. s (n s z)
c0 = λs. λz. z
c1 = scc c0
c2 = scc c1
c3 = scc c2
realnat = λm. m (λx. succ x) 0
# realnat = λm. m (λx. succ x) 0
realnat = lambda m: (m)(lambda n: n + 1)(0)
>>> realnat(c2)
2
>>> realnat(times(c2)(c3))
6
This is more complicated and possible conversions require methods that I have not covered yet. I will revisit this when I post about Recursion.
In my previous post about Church encoding, I built up some booleans and their respective operators, numerals, and several arithmetic operators. This post will focus on building two important constructs for any programming language: equality testing and lists. I will also continue my implementation in Python.
Determining if a Church numeral is zero is done by finding a pair of arguments
that will return whether the numeral is zero or not (a True/False expression).
We can use the zz
and ss
terms from the
subtraction
operation, by applying our numeral to the
pair ss
and zz
. The trick is, if ss
is applied at all to zz
we know that
the numeral is not zero and return fls
, otherwise we return tru
. This makes
perfect sense, since the numeral will be applied the number of times equal to
its value.
In other words,
if the Church numeral is 0
, then ss
will be applied 0
times to zz
and
will return tru
. Once ss
is applied to zz
(if the numeral is not 0
), it
will return fls
 not 0
.
iszro = λm. m (λx. fls) tru
# iszro = λm. m (λx. fls) tru
iszro = lambda m: (m)(lambda x: fls)(tru)
>>> iszro(c1)(True)(False)
False
>>> iszro(c0)(True)(False)
True
>>> iszro(sub(c3)(c3))(True)(False)
True
>>> iszro(sub(c3)(c2))(True)(False)
False
>>> iszro(times(c0)(c1))(True)(False)
True
There are probably many ways to define numeric equality; however, the trick I will use is that m - n = 0 when m = n. So, testing for equality is as simple as applying sub and then applying iszro to the result.
equal = λm. λn. iszro (sub m n)
# equal = λm. λn. iszro (sub m n)
equal = lambda m: lambda n: iszro(sub(m)(n))
>>> equal(c3)(c2)(True)(False)
False
>>> equal(c3)(c1)(True)(False)
False
>>> equal(c3)(c3)(True)(False)
True
There are two big problems with this definition, however. First, each sub operation is O(n); second, the result of sub is truncated at zero, so iszro will evaluate to tru whenever the true difference would be negative.
Pierce has a different definition of equal which has fewer evaluations than mine.
equal = λm. λn. and (iszro (m prd n))(iszro (n prd m))
# equal = λm. λn. and (iszro (m prd n))(iszro (n prd m))
equal = lambda m: lambda n: And(iszro((m)(prd)(n)))(iszro((n)(prd)(m)))
>>> equal(c2)(c3)(True)(False)
False
>>> equal(c0)(c0)(True)(False)
True
His definition is both more efficient and works in cases that mine does not. Since my definition does left-to-right subtraction, comparisons that would produce negative Church numerals (which I haven't defined) evaluate to tru: sub(2)(3) would be -1, but it truncates to 0.
Since we can define equal, it shouldn't be too hard to define greater-than and less-than, and it turns out that it isn't.
For strictly greater-than, we exploit the fact that m prd n (applying prd m times to n) returns 0 when m >= n. So m is strictly greater than n exactly when the following holds: m >= n && !(n >= m).
For strictly less-than, a simple trick of switching the order of the arguments accomplishes the same thing: n >= m && !(m >= n), which means that m must be strictly less than n.
gt = λm. λn. and (iszro (m prd n))(not iszro(n prd m))
lt = λm. λn. and (iszro (n prd m))(not iszro(m prd n))
# gt = λm. λn. and (iszro (m prd n))(not iszro(n prd m))
gt = lambda m: lambda n: And(iszro((m)(prd)(n)))(Not(iszro((n)(prd)(m))))
# lt = λm. λn. and (iszro (n prd m))(not iszro(m prd n))
lt = lambda m: lambda n: And(iszro((n)(prd)(m)))(Not(iszro((m)(prd)(n))))
>>> gt(c0)(c3)(True)(False)
False
>>> gt(c0)(c0)(True)(False)
False
>>> gt(c3)(c2)(True)(False)
True
>>> lt(c0)(c3)(True)(False)
True
>>> lt(c0)(c0)(True)(False)
False
>>> lt(c3)(c2)(True)(False)
False
Greater-than-or-equal and less-than-or-equal can simply be calculated by combining gt or lt with equal, which is trivial, so I'll leave that up to the reader.
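For completeness, one way that sketch could look, simply or-ing the strict comparison with equal (the names ge/le are my own):

```python
# Definitions from earlier in the post, restated so this runs standalone.
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
And = lambda b: lambda c: (b)(c)(fls)
Or = lambda b: lambda c: (b)(tru)(c)
Not = lambda b: (b)(fls)(tru)
pair = lambda f: lambda s: lambda b: (b)(f)(s)
fst = lambda p: (p)(tru)
snd = lambda p: (p)(fls)
c0 = lambda s: lambda z: z
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c1, c2, c3 = scc(c0), scc(scc(c0)), scc(scc(scc(c0)))
plus = lambda m: lambda n: lambda s: lambda z: (m)(s)((n)(s)(z))
zz = pair(c0)(c0)
ss = lambda p: pair(snd(p))(plus(c1)(snd(p)))
prd = lambda m: fst((m)(ss)(zz))
iszro = lambda m: (m)(lambda x: fls)(tru)
equal = lambda m: lambda n: And(iszro((m)(prd)(n)))(iszro((n)(prd)(m)))
gt = lambda m: lambda n: And(iszro((m)(prd)(n)))(Not(iszro((n)(prd)(m))))
lt = lambda m: lambda n: And(iszro((n)(prd)(m)))(Not(iszro((m)(prd)(n))))

# ge/le: "or" together the strict comparison and equal
ge = lambda m: lambda n: Or(gt(m)(n))(equal(m)(n))
le = lambda m: lambda n: Or(lt(m)(n))(equal(m)(n))

print(ge(c3)(c2)(True)(False))  # True
print(ge(c2)(c2)(True)(False))  # True
print(ge(c1)(c2)(True)(False))  # False
print(le(c1)(c2)(True)(False))  # True
```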
A list can be represented by a reduce or fold function in the λ-calculus. So the list [x y z] becomes a two-argument (c n) function that returns c x (c y (c z n)). There are several steps required to build lists, detailed below.
nil
nil can be represented by the same expression as 0 and fls; using the arguments c n we can define:
nil = λc. λn. n
# nil = λc. λn. n
nil = lambda c: lambda n: n
cons
cons is a function that will take an argument h and a list t, and returns a folded representation of t with h prepended.
cons = λh. λt. λc. λn . c h (t c n)
# cons = λh. λt. λc. λn . c h (t c n)
cons = lambda h: lambda t: lambda c: lambda n: ((c)(h))((t)(c)(n))
isnil
The isnil function will mimic the iszro function, since the definition of nil is the same as 0. However, we are running it on a list, so there is a little more to it.
isnil = λl. l (λh. λt. fls) tru
# isnil = λl. l (λh. λt. fls) tru
isnil = lambda l: (l)(lambda h: lambda t: fls)(tru)
>>> isnil(cons(c1)(nil))(True)(False)
False
>>> isnil(nil)(True)(False)
True
head
head is similar to isnil, except that we return the element at the beginning of the list instead of a Church boolean; if the list is empty, we return fls.
head = λl. l (λh. λt. h) fls
# head = λl. l (λh. λt. h) fls
head = lambda l: (l)(lambda h: lambda t: h)(fls)
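A quick check that head, and the list-as-fold idea in general, behave as claimed (restating the definitions from this and previous posts so it runs standalone):

```python
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
c0 = lambda s: lambda z: z
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c1, c2 = scc(c0), scc(scc(c0))
plus = lambda m: lambda n: lambda s: lambda z: (m)(s)((n)(s)(z))
nil = lambda c: lambda n: n
cons = lambda h: lambda t: lambda c: lambda n: ((c)(h))((t)(c)(n))
head = lambda l: (l)(lambda h: lambda t: h)(fls)

l = cons(c1)(cons(c2)(nil))   # the list [c1 c2]

# head gives back the first Church numeral
print(head(l)(lambda n: n + 1)(0))  # 1

# and since a list *is* a fold, summing it is just applying it to plus/c0
print((l)(plus)(c0)(lambda n: n + 1)(0))  # 3
```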
tail
tail is much more difficult and employs a similar trick to the one the pred function did. I was unable to figure out tail without help from the book, so here is Pierce's solution:
tail = λl.
fst (l (λx. λp. pair (snd p)(cons x (snd p)))
(pair nil nil))
# tail = λl.
# fst (l (λx. λp. pair (snd p)(cons x (snd p)))
# (pair nil nil))
tail = lambda l: fst((l)(lambda x: lambda p: pair(snd(p))(cons(x)(snd(p))))(pair(nil)(nil)))
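A self-contained check of tail, restating the needed definitions (the parenthesization in the Python version is easy to get wrong, so it's worth verifying):

```python
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
pair = lambda f: lambda s: lambda b: (b)(f)(s)
fst = lambda p: (p)(tru)
snd = lambda p: (p)(fls)
c0 = lambda s: lambda z: z
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c1, c2, c3 = scc(c0), scc(scc(c0)), scc(scc(scc(c0)))
nil = lambda c: lambda n: n
cons = lambda h: lambda t: lambda c: lambda n: ((c)(h))((t)(c)(n))
head = lambda l: (l)(lambda h: lambda t: h)(fls)
tail = lambda l: fst((l)(lambda x: lambda p: pair(snd(p))(cons(x)(snd(p))))(pair(nil)(nil)))

l = cons(c1)(cons(c2)(cons(c3)(nil)))  # the list [c1 c2 c3]
print(head(tail(l))(lambda n: n + 1)(0))        # 2
print(head(tail(tail(l)))(lambda n: n + 1)(0))  # 3
```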
Since the λ-calculus does not have numbers or operators, only functions, it seems limited and useless, but with simple constructs one can create numbers, then build up to operators, then data structures like lists. In fact, the λ-calculus is Turing complete! The λ-calculus also lacks the built-in ability to handle multiple arguments; instead it employs Currying.
In this post I implement a simple multi-argument function transformed into a curried form in Python, Javascript and Perl.
We must understand that the λ-calculus has no built-in ability to handle multiple arguments for a function, or abstraction. Instead, we rely on a transformation of multiple arguments to higher-order functions called currying, named after Haskell Curry. The principle is straightforward: “Suppose that s is a term involving two free variables x and y and that we want to write a function f that, for each pair (v,w) of arguments, yields the result of substituting v for x and w for y in s.” (Pierce)
So, as an example of what you would do in a rich programming language for a multiple-argument function f = λ(x,y).s, where the body s is 2x + 3y:
Python:
def f(x, y):
    return 2*x + 3*y
Javascript:
var f = function(x, y) {
    return 2*x + 3*y;
};
Perl:
sub f {
    my ($x, $y) = @_;
    return 2*$x + 3*$y;
}
In the λcalculus, we employ currying where the expression f = λ(x,y).s
is
transformed into f = λx.λy.s
. This simply means that f
is a function that
returns a function when given a value v
for x
, that returned function then
returns the result when given a value w
for y
. In reducible expression form:
f v w
reduces to ((λy.[x → v]s)w)
once the value for x
is passed, then is
reduced to [y → w][x → v]s
.
In the following examples, I continue with s = 2x + 3y, passing v = 2 and w = 3.
Python (builtin lambda):
(lambda x: (lambda y: 2*x + 3*y)(3))(2)
>>>13
Javascript (anonymous functions):
(function(x) {
    return (function(y) {
        return 2*x + 3*y;
    });
})(2)(3);
>>>13
Perl (anonymous subroutines):
sub {
    my $x = shift;
    return sub {
        return 2*$x + 3*shift;
    };
}->(2)->(3);
>>>13
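The same transformation can be packaged up as a reusable helper. This is my own sketch, not something from Pierce, using Python for both directions of the transformation:

```python
# curry: turn a two-argument function into its curried form;
# uncurry: undo the transformation.
def curry(f):
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    return lambda x, y: g(x)(y)

f = lambda x, y: 2 * x + 3 * y
g = curry(f)
print(g(2)(3))           # 13
print(uncurry(g)(2, 3))  # 13
```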
In my previous post about Currying, I mentioned that the λ-calculus has no primitive numbers or operations, just functions and more functions. In this post, I explore how, through simple constructs, Church was able to implement numbers, booleans, arithmetic operations and conditionals, with examples in Python.
Please see my previous post on Currying, as it is critical to understanding the material here.
There is no such thing as a “true” or “false” in the λ-calculus. However, we can represent those boolean values by defining the λ-terms tru and fls.
# tru = λt.λf.t
tru = lambda t: lambda f: t
# fls = λt.λf.f
fls = lambda t: lambda f: f
Those terms can be used to test if a value is true or false with a term test b v w, which reduces to v if b is tru and w if b is fls.
# test = λl.λm.λn. l m n
test = lambda l: lambda m: lambda n: (l)(m)(n)
To see how test reduces when called as test tru v w, we must first expand it, and then we'll use call-by-value reduction:
test tru v w
→ (λl.λm.λn. l m n) tru v w
→ (λm.λn. tru m n)v w
→ (λn. tru v n)w
= tru v w
→ (λt.λf.t)v w
→ (λf.v)w
= v
Making some test runs with our Python implementation:
>>> test(tru)(True)(False)
True
>>> test(fls)(True)(False)
False
Since we now have the ability to represent boolean values, we can implement boolean algebra!
To implement a logical AND: and = λb.λc. b c fls. We return c if b is tru, or fls if b is fls. So if b is tru and c is fls, we return c (meaning fls); otherwise, if b and c are both tru, we return tru.
# and = λb.λc. b c fls
And = lambda b: lambda c: (b)(c)(fls)
>>> And(tru)(tru)(True)(False)
True
>>> And(tru)(fls)(True)(False)
False
>>> And(fls)(tru)(True)(False)
False
To implement OR: or = λb.λc. b tru c, meaning if b is fls return c, and if b is tru return tru. This, of course, would not implement XOR, since in the event that b is tru, we automatically return tru.
# or = λb.λc. b tru c
Or = lambda b: lambda c: (b)(tru)(c)
>>> Or(fls)(fls)(True)(False)
False
>>> Or(fls)(tru)(True)(False)
True
>>> Or(tru)(tru)(True)(False)
True
>>> Or(tru)(fls)(True)(False)
True
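Since XOR came up above: a sketch of how it could be encoded, though the post doesn't define it. The term xor = λb.λc. b (not c) c is my own addition:

```python
# tru/fls/not restated from above so this runs standalone
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
Not = lambda b: (b)(fls)(tru)

# xor = λb.λc. b (not c) c -- when b is tru, true only if c is fls
Xor = lambda b: lambda c: (b)(Not(c))(c)

print(Xor(tru)(fls)(True)(False))  # True
print(Xor(tru)(tru)(True)(False))  # False
print(Xor(fls)(tru)(True)(False))  # True
print(Xor(fls)(fls)(True)(False))  # False
```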
To implement NOT: not = λb. b fls tru, meaning return the opposite of b. We can apply it to AND and OR operations along with boolean values.
# not = λb. b fls tru
Not = lambda b: (b)(fls)(tru)
>>> Not(tru)(True)(False)
False
>>> Not(fls)(True)(False)
True
>>> Not(And(tru)(tru))(True)(False)
False
>>> Not(And(tru)(fls))(True)(False)
True
>>> Not(Or(fls)(tru))(True)(False)
False
>>> Not(Or(fls)(fls))(True)(False)
True
Now that we have booleans, we can encode pairs of terms into one term, getting the first and second projections (fst and snd) when we apply the correct boolean value:
pair = λf.λs.λb. b f s
fst = λp. p tru
snd = λp. p fls
This means if we apply boolean value b to the function pair v w, it applies b to v and w. The application yields v if b is tru and w otherwise. Reducing the redex fst (pair v w) →* v goes as follows:
fst(pair v w)
= fst((λf.λs.λb. b f s)v w)
→ fst((λs.λb. b v s)w)
→ fst((λb. b v w))
= (λp. p tru)(λb. b v w)
→ (λb. b v w)tru
→ tru v w
= v
# pair = λf.λs.λb. b f s
pair = lambda f: lambda s: lambda b: (b)(f)(s)
# fst = λp. p tru
fst = lambda p: (p)(tru)
# snd = λp. p fls
snd = lambda p: (p)(fls)
>>> fst(pair(tru)(fls))(True)(False)
True
>>> fst(pair(tru)(tru))(True)(False)
True
>>> fst(pair(fls)(tru))(True)(False)
False
>>> fst(pair(fls)(fls))(True)(False)
False
>>> snd(pair(fls)(fls))(True)(False)
False
>>> snd(pair(fls)(tru))(True)(False)
True
>>> snd(pair(tru)(fls))(True)(False)
False
>>> snd(pair(tru)(tru))(True)(False)
True
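Pairs compose nicely with the projections. As a small example of my own (not from the post), a swap that flips a pair's components:

```python
# pair/fst/snd and the booleans restated so this runs standalone
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
pair = lambda f: lambda s: lambda b: (b)(f)(s)
fst = lambda p: (p)(tru)
snd = lambda p: (p)(fls)

# swap = λp. pair (snd p)(fst p)
swap = lambda p: pair(snd(p))(fst(p))

p = pair(tru)(fls)
print(fst(swap(p))(True)(False))  # False
print(snd(swap(p))(True)(False))  # True
```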
So far, we have been able to implement the basis of Boolean Algebra with only functions! Church uses a slightly more intricate representation of numbers by use of composite functions. Basically, to represent a natural number n, you simply encapsulate an argument n times with a successor function scc. You can think of scc as n + 1. To represent 0, Church used the same definition as fls (our False representation); this should be familiar to most programmers, as 0 == False in many languages. So a breakdown of the 0..Nth Church numerals is as such:
c0 = λs. λz. z
c1 = λs. λz. s z
c2 = λs. λz. s (s z)
c3 = λs. λz. s (s (s z))
... and so on
The scc combinator is defined as: scc = λn. λs. λz. s (n s z)
It takes a numeral n and returns another Church numeral. It does this by yielding a function that takes s and z as arguments and applies s repeatedly to z, one more time than n does.
Note: For the example output below, the two arguments to each Church numeral are a lambda function that is equivalent to scc and the numeral 0, since those are the parameters required for a Church numeral. It merely maps each function to the natural number value it represents: c0 → 0, c1 → 1.
# scc = λn. λs. λz. s (n s z)
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c0 = lambda s: lambda z: z
c1 = scc(c0)
c2 = scc(c1)
c3 = scc(c2)
>>> (c1)(lambda n: n+1)(0)
1
>>> (c2)(lambda n: n+1)(0)
2
>>> (c3)(lambda n: n+1)(0)
3
Addition is essentially the scc combinator applied m times to a Church numeral n, where n + m = v. Equivalently, scc is merely plus applied once.
plus = λm. λn. λs. λz. m s (n s z)
# plus = λm. λn. λs. λz. m s (n s z)
plus = lambda m: lambda n: lambda s: lambda z: (m)(s)((n)(s)(z))
>>> plus(c1)(c2)(lambda n: n+1)(0)
3
>>> plus(c3)(c2)(lambda n: n+1)(0)
5
Multiplication is the repeated application of plus
, since 2+2+2 = 2*3
.
times = λm. λn. m (plus n) c0
# times = λm. λn. m (plus n) c0
times = lambda m: lambda n: (m)(plus(n))(c0)
>>> times(c2)(c3)(lambda n: n+1)(0)
6
>>> times(c3)(c3)(lambda n: n+1)(0)
9
Exponentiation uses repeated multiplication to get the intended value, since 2**3 = 2*2*2.
exp = λm. λn. n (times m) c1
This translates to (times m) applied n times to c1, i.e. m**n.
# exp = λm. λn. n (times m) c1
exp = lambda m: lambda n: (n)(times(m))(c1)
>>> exp(c3)(c3)(lambda n: n+1)(0)
27
>>> exp(c3)(c0)(lambda n: n+1)(0)
1
>>> exp(c0)(c1)(lambda n: n+1)(0)
0
>>> exp(c2)(c1)(lambda n: n+1)(0)
2
>>> exp(c3)(c2)(lambda n: n+1)(0)
9
Subtraction requires quite a bit more work, involving the predecessor combinator prd. First we must define a starting pair zz and a function ss that takes a pair (ci, cj) and yields (cj, cj+1). Applying ss m times to the pair (c0, c0) yields (c0, c0) when m is 0, and otherwise (cm-1, cm) when m is positive. The predecessor is always found in the first component of the pair.
zz = pair c0 c0
ss = λp. pair (snd p) (plus c1 (snd p))
prd = λm. fst (m ss zz)
# zz = pair c0 c0
zz = pair(c0)(c0)
# ss = λp. pair (snd p) (plus c1 (snd p))
ss = lambda p: pair(snd(p))(plus(c1)(snd(p)))
# prd = λm. fst (m ss zz)
prd = lambda m: fst((m)(ss)(zz))
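A quick standalone check of prd, using the same (lambda n: n+1)(0) conversion as the other examples:

```python
# Restating the definitions above so the examples run standalone.
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
pair = lambda f: lambda s: lambda b: (b)(f)(s)
fst = lambda p: (p)(tru)
snd = lambda p: (p)(fls)
c0 = lambda s: lambda z: z
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c1, c3 = scc(c0), scc(scc(scc(c0)))
plus = lambda m: lambda n: lambda s: lambda z: (m)(s)((n)(s)(z))
zz = pair(c0)(c0)
ss = lambda p: pair(snd(p))(plus(c1)(snd(p)))
prd = lambda m: fst((m)(ss)(zz))

print(prd(c3)(lambda n: n + 1)(0))  # 2
print(prd(c0)(lambda n: n + 1)(0))  # 0 -- prd bottoms out at zero
```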
Now that we have the prd combinator, we can define subtraction. Like addition, where we find a sum by iteratively adding 1, we can subtract by iteratively subtracting 1, n times from m, where m - n = v.
sub = λm. λn. n prd m
# sub = λm. λn. n prd m
sub = lambda m: lambda n: (n)(prd)(m)
>>> sub(c3)(c1)(lambda n: n+1)(0)
2
>>> sub(c3)(c2)(lambda n: n+1)(0)
1
>>> sub(c0)(c0)(lambda n: n+1)(0)
0
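One caveat worth making explicit (my own observation): because prd bottoms out at c0, this subtraction is truncated. Subtracting a larger numeral from a smaller one yields c0, not a negative number. A standalone check, restating the definitions above:

```python
tru = lambda t: lambda f: t
fls = lambda t: lambda f: f
pair = lambda f: lambda s: lambda b: (b)(f)(s)
fst = lambda p: (p)(tru)
snd = lambda p: (p)(fls)
c0 = lambda s: lambda z: z
scc = lambda n: lambda s: lambda z: (s)((n)(s)(z))
c1, c3 = scc(c0), scc(scc(scc(c0)))
plus = lambda m: lambda n: lambda s: lambda z: (m)(s)((n)(s)(z))
zz = pair(c0)(c0)
ss = lambda p: pair(snd(p))(plus(c1)(snd(p)))
prd = lambda m: fst((m)(ss)(zz))
sub = lambda m: lambda n: (n)(prd)(m)

print(sub(c1)(c3)(lambda n: n + 1)(0))  # 0, not -2
```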
The λ-calculus, in its pure form, does not have constants or primitive operators like those used in arithmetic operations (this includes numbers). The way you compute one term with another is by applying a function to its argument(s), and an argument is always just another function.
In this post, I focus on some of the λ-calculus syntax, with coding examples in Javascript and Python.
λ's syntax is very simple, consisting of only three kinds of terms: variables, abstractions, and applications.
* a variable x is a term
* an abstraction λx.t is a term, called a lambda abstraction
* an application t s is a term, where t and s are terms
Abstract syntax refers to the robust and provable representation of syntax known as an Abstract Syntax Tree (AST), the same structure used by compilers and interpreters. These are similar to the context-free grammar trees from two of my previous posts. For example:
λx. (λy. ((x y) x))
would create the following AST:
would create the following AST:
   λx
   |
   λy
   |
  apply
 /     \
apply    x
 /  \
x    y
This can also be written as λx. λy. x y x, since application associates to the left and the bodies of abstractions extend as far to the right as possible.
Scope in the λ-calculus is fairly straightforward. You have three main parts to scope: bound variables, binders and free variables.
* a variable is bound when it is within the body t of an abstraction of the form λx.t
* λx is a binder with scope t
* a variable x is free if its position is not bound by an enclosing abstraction
Occurrences of x in x y and λy. x y are free, whereas in λx.x and λz.λx.λy. x (y z) they are bound. Separate occurrences of x can be a mix of bound and free, such as in (λx.x) x, where the first x is bound to λx and the second is free.
Here is an example of λx. λy. x y
in Javascript:
function(x) {
    return function(y) {
        return x(y);
    };
}
Or in Python:
lambda x: lambda y: x(y)
Running this program setting x
to the print()
function and y
to the string
"hello"
:
>>> (lambda x: lambda y: x(y))(print)("hello")
hello
The AST created is simple:
 λx
 |
 λy
 |
apply
 / \
x   y
Changing this expression slightly to λx. (λy. x) y
changes the syntax tree and
the code.
  λx
  |
 apply
 /   \
λy    y
|
x
Producing the following code in Javascript:
function(x) {
    var funY = function(y) {
        return x;
    };
    return funY(y);
}
And Python:
(lambda x: (lambda y: x))
Let’s run that Python code a few times and see what prints out:
>>> (lambda x: (lambda y: x)(4))(5)
5
>>> (lambda x: (lambda y: x)(2))(8)
8
>>> (lambda x: (lambda y: x)(123))(2)
2
So, for every input x,y
, we always return x
. That has the same behavior as
the identity function: λx.x
, where the only thing it does is return its
argument. So, how does λx. (λy. x) y
have the same behavior as λx.x
? It is
because it is a reducible expression (redex) that reduces to the identity
function.
There are many ways to reduce λ-expressions, and different languages use different strategies. The most general-purpose strategy is full beta-reduction, where you can reduce any redex at any time. To reduce λx. (λy. x) y we do the following reduction in one step:
λx. (λy. x) y
→ λx. x
There are other reduction strategies employed by various languages (some use more than one), such as:
* Call-by-name/need: Haskell
* Call-by-reference: Perl, PHP, C++
* Call-by-value: C, Scheme, OCaml
I recently cracked open Benjamin C. Pierce's Types and Programming Languages in anticipation of the upcoming semester's research. I will often times skim through introductions, but decided to carefully analyze what I was reading this time, and I'm glad I did! For quite some time I've wondered whether verbose type systems, like Java's, had any purpose beyond contributing to carpal tunnel or serving as a visual cue to the programmer. Coming into the programming world through Perl and Ruby, I learned how to infer what a variable does from the context of the statement, and I fail to see the benefit of having to explicitly state it. Compilers can do type inference, too, so what is the point of all the extra declarations?
I have heard numerous explanations for why we must type ArrayList array = new ArrayList(); the most often cited is that it makes Java a safe language. That, intuitively, is false. If that were the case, C++ would be a safe language, but any language that allows you to write past the bounds of an array into some unknown block is categorically unsafe.
Well, it turns out that I was almost right: verbose type declarations are often times there to help (or annoy) the programmer, not the compiler. Explicit type declaration allows the language to be statically checked for sanity; however, static type checking does not guarantee a safe language, nor does it guarantee runtime safety, e.g. array bounds checking is still done dynamically.
There has been much work done in the area of static analysis for dynamically typed languages. In fact, much of the research Matt Might does is in this area of higher-order languages.
Scheme, Perl, Lisp, etc. are all safe languages. In other words, in them it is "impossible to shoot yourself in the foot while programming" (Pierce).
According to Pierce, a safe language is one that "protects its own abstractions"; his examples include accessing arrays by built-in methods, lexically scoped variables only being accessible in that scope, and stacks that behave as stacks.
Variable scope is an interesting attribute of a safe language, and many languages handle it differently. Perl, for example, uses the my keyword to limit the scope of variables and enforces this when you explicitly declare use strict. Take the following useless bit of code:
sub fun {
    my ($arrayref) = @_;
    my %hash;
    my $value = 0;
    for my $element (@$arrayref) {
        my $thing = $element;
        $hash{$element} = $thing;
    }
    # this should print 0..arrayref length
    print map { "$_\n" if $arrayref->[$_] == $hash{$_} } @$arrayref;
    print $thing;   # out-of-scope, error
    print $element; # out-of-scope, error
}
However, this is completely different from scope in Javascript, since it has function-level scoping. var is the equivalent of Perl's my keyword, since all variables declared without a var are globally scoped.
function fun(array) {
    var hash = {},
        value = 0;
    for (var i = 0, len = array.length; i < len; i++) {
        var thing = array[i];
        hash[thing] = thing;
    }
    // this will never loop, since 'i' is scoped to the function 'fun' and not
    // the for loop
    for (; i < len; i++) {
        if (array[i] === hash[array[i]]) {
            console.log(array[i]);
        }
    }
    console.log(thing); // prints last element of array
    console.log(i, len); // prints the values of these
}
I would argue that while Javascript is a bit strange here with only function-level scoping, it passes the scoping test in Pierce's safe-language definition, as well as passing the array-bounds checking test and having a stack that acts like a stack. In fact, it is the safety of Javascript that allows engines like Mozilla's SpiderMonkey and Google's V8 (which is what node.js runs on) to infer type information.
I don't know. It seems that many of the most recent languages are dynamically typed. I will revisit this question at a later date, when I've made my way through more than just the first few chapters of this book. At the moment, it seems like an antiquated and restrictive way of analyzing programs.
In my previous post, I described some ambiguous CFGs for the simple math expression a+a*a. In this post, I'll be talking about a more complicated, but not more difficult, ambiguous context-free grammar: the if-then-else context-free language.
Take this grammar describing if q then if q then p else p:
# if q then if q then p else p
S -> if E then S | if E then S else S | P
P -> p
E -> q
Why is this an ambiguous grammar? Well, it has to do with the else statement: else p can be the end of the inner if q then p or the end of the outer if q then if q then p. Let's write out the parse trees to make some visual sense.
# if q then if q then p else p
        S                                S
        |                                |
if E then S else S                 if E then S
   |      |      |                    |      |
   q  if E then S  P                  q  if E then S else S
         |      |  |                        |      |      |
         q      P  p                        q      P      P
                |                                  |      |
                p                                  p      p
## Cut into smaller modules
If you think of the grammar as being cut up into smaller named modules, it makes it easier to abstract away the more complex strings generated. For example, we could call if E then S U (for unmatched else) and if E then S else S M (for matched else).
S -> U | M
U -> if E then S
M -> if E then S else S
Then we begin to disambiguate the grammar by removing some of the inner relationships between S, U, and M. Start by making M only self-referential and terminating.
M -> if E then M else M | P
Then we make U the unbalanced else by tail-recursing and referring back to S in order to eventually terminate.
U -> if E then S          # base case
   | if E then M else U   # recurse
Now we have an unambiguous grammar, ensuring that all matched else strings follow one branch and unmatched else strings follow another.
S -> U | M
U -> if E then S          # base case
   | if E then M else U   # recurse
M -> if E then M else M | P
P -> p
E -> q
# if q then if q then p else p
S
|
U
|
if E then S
   |      |
   q      M
          |
   if E then M else M
      |      |      |
      q      P      P
             |      |
             p      p
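As a sanity check of the disambiguation, here is a tiny recursive-descent parser of my own (not from the post) that resolves the dangling else the same way the unambiguous grammar does, by attaching each else to the nearest unmatched if:

```python
def parse(tokens):
    # S -> if E then S [else S] | p, greedily attaching else to the
    # nearest open if -- the convention the unambiguous grammar encodes
    def S(i):
        if tokens[i] == 'if':
            assert tokens[i + 1] == 'q' and tokens[i + 2] == 'then'
            then_branch, i = S(i + 3)
            if i < len(tokens) and tokens[i] == 'else':
                else_branch, i = S(i + 1)
                return ('if', 'q', then_branch, else_branch), i
            return ('if', 'q', then_branch), i
        assert tokens[i] == 'p'
        return 'p', i + 1

    tree, i = S(0)
    assert i == len(tokens)
    return tree

print(parse('if q then if q then p else p'.split()))
# the else binds to the inner if: ('if', 'q', ('if', 'q', 'p', 'p'))
```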