Introduction
This is the crucial step: abstracting the concrete semantics away so that we can do control flow analysis (CFA). Fortunately, in smallstep abstract interpretation, the concrete and abstract semantics are very similar.
In this article, I introduce the concept and need for control flow analysis and show how to convert concrete semantics from the definition of my Concrete CESK machine into abstract semantics in preparation for creating an Abstract CESK machine.
Smallstep Abstract Interpretation
At the heart of analysis of higherorder languages (like Dalvik), is a recursive relationship between controlflow and valueflow: any attempt to tackle one necessarily means you must tackle the other. The class of algorithms that moves to solve this are Controlflow analyses  which specialize in solving the valueflow problem.
Smallstep abstract interpretation is one way to write a CFA.
Higherorder Controlflow problem
The problem is simply stated as: the precise target of a function call may not be obvious. In languages Objectoriented like Java, Ruby, and Python, this problem is presented in the form of dynamically dispatched methods.
1


In the above code, the target of the method call .method()
depends on the value
that flows to the statement object
. In other words, the dataflow
affecting the value of object
creates a dependency with the controlflow
of the method call.
Furthermore, if object
were a parameter to a method, the
dataflow of the parameters of the method would be determined by the
controlflow of the function.
With a javalike pseudocode example:
1 2 3 4 5 6 

In the above example, the target of func
depends on where the method flows.
Further, it is not clear which method func
may refer. So in controlflow
analysis where the expression could be invoked must be considered along with
what types of argument is might receive.
Higherorder Valueflow problem
The consideration of which values might be produced by the expressions func
and x
is the valueflow problem. This is generally undecidable. Instead we
must think of all possible values that a procedure could evaluate to  without
knowing exactly which values those might be.
This means that there is nondeterminism in the evaluation of procedures and consequently leads to the inability of Concrete semantics to correctly model the flow of a program without having infinite amounts of memory at its disposal.
Once we add nondeterminism into the mix, we need to redefine the CESK machine from Concrete semantics to Abstract semantics.
Abstract vs Concrete
There are two main changes that occur in the transition from concrete to abstract semantics.
 Infinite statespace must become finite state space.
 The addresses in the finite statespace become pointers to a set of all possible values for that address.
Finite StateSpace
In the Concrete CESK machine that I built, it would be trivial to construct a recursive function that allocated a new address on every call. This would eventually fill up the available memory of the machine running it  unless, of course, you have a computer with an infinite amount of memory. Since it is safe to say that an infinitememory machine doesn’t exist, we must abstract away the idea of an address.
It is both mathematically and computationally possible to define an abstract machine with a single address. This is due to the nature of abstract interpretation. Since an abstract address is a pointer to a set of possible values, then through nondeterminism, you can give all possible (and impossible) execution paths with a single address. I will take a more practical approach.
The solution to how to represent a finite number of addresses comes from a simple observation: A statement is a finite thing and in any Dalvik program, there are a finite number of statements. So an abstract address could be the statement itself.
With that solution, the following definition of the conversion from an infinite store and address space to a finite store and address space follows:
In a way, one can think of the transition from the infinite statespace to the finitestate space as a simple “Throw hats on everything” approach. That thought would be wrong, however.
The problem with that approach is that since addresses contain frame pointers and frame pointers contain addresses, you analysis could run into an infinite frame pointer where its set of values contains itself.
kCFA shows us how to tame recursion by removing it from the statespace through a process called “snipping the knots”  thereby removing these cyclical dependencies from the dependency graph. I’ll be following the outline in Abstract interpreters for free.
Making Snips
The first part of of making a snip is handled already by the Concrete CESK machine: add a store. The store is still defined in terms of an infinite number of addresses at this point:
We need to take this store and then thread it through the transition relation so that a dependency graph edge going from set would result in:
Thereby making the store an integral component of each state. This is known as a storepassing style transform. Fortunately, I already did this in the Concrete CESK machine.
Trickling up Abstraction
We need two components to calculate an “optimal” smallstep abstract transition relation:
 An abstract statespace
 A Galois connection between the abstract and concrete statespaces
Galois Connections
Galois connections are seen mostly in order theory and is a particular correspondence between partially ordered sets. It would be way beyond the scope of this to delve too deeply into it. However, it is important to explain the role the correspondence plays in static program analysis.
In my article about static analysis, I touch a little on lattices and partial ordering where I introduce the concept of a supremum and infimum in terms of a lattice and their role in showing partial ordering of variable types.
I would like to take this a bit further and define a simple Galois connection resulting from the mapping functions and from the abstract transition relation above.
Take the following code:
1 2 3 4 5 6 7 

We can clearly see that if a or b is 0, that we have caused a problem. We can define a single abstract function and a single aconcretization function as follows:
You can clearly see that the functions map from one semantic domain to the other. The goal is to ensure we have those and mappings for the entire transition function:
This happens next.
Abstracting Leaves of the dependency graph
After snipping and storepassing style transforms, what is left are sets of addresses that I will refer to as infinite leaf nodes. One of the two goals in the transition from concrete semantics to abstract semantics is to make addresses finite.
It is up to the analysis designer to choose a finite set for every leaf node which will become the leaves of the abstract statespace. It is equally important that an extraction function be created that maps a concrete element to an abstract element: . With a fixed extraction function, we can use inference rules to automate the construction of the abstract statespace with structural Galois connections.
Given this surjective map, , the structure yields the Galois connection:
Furthermore, the extraction function fixes polyvariance and the contextsensitivity of the analysis.
Percolating automatic abstraction up
With the use of inference rules, we can build structural Galois connections and recursively apply them all the way up through the toplevel of sets of states. The inference rules are beyond the scope of this article, but can be found in Principles of Program Analysis.
After this recursive automatic synthesis of the abstract statespace, we have now built up Galois connections all the way up. Combining these connections, the extraction function on addresses and the snipping of , we can finally define the toplevel statespace and the Galois connection.
Galois connection:
Toplevel Statespace:
Conclusion
The differences between concrete and abstract semantics look small on the surface. For the most part, it really is as simple as “throw a hat on everything”. However, there are fundamental differences between the two domains that require care.
In my next article, I will be applying these mathematical transformations to the implementation of an Abstract CESK machine.