do not commit

Ambiguous Context-Free Grammars

| Comments


In this post, I am going to describe what makes a Context-Free Grammar (CFG) ambiguous. I’m doing this because I have a test tomorrow on Pushdown Automata (NPDA), CFG, Pumping Lemma, and (Non)?Deterministic Turing Machines - what better way to study than this?


A grammar is considered ambiguous simple if it has more than one way that it can construct a parse tree for the same input. That’s it! A simple example is in a CFG for this a math expression parser:

# a+a*a
S -> S + S | S * S | (S) | a
# or expanded
S -> S + S
S -> S * S
S -> (S)
S -> a

So, why is this ambiguous? For most of us we immediately see that we should multiply first and then add, per the order-of-operations rule. However, when building parse trees off of that CFG, we can see that it can be made into two different trees:

    S                S
    |                |
  S + S            S * S
 /  |  \          /  |  \
a   + S * S    S + S *   a
      | | |    | | |
      a * a    a + a

The key to remember is that we know that the multiplication should come first, so we should fix this.


The part that makes this grammar split is that we are referencing addition, multiplication, and terminal a with the same Non-terminal S. There is no algorithm that can tell if a CFG is ambiguous, however there are some techniques for disambiguating: * Add non-terminals * Divide into: Expressions, Factors, Terms

# a+a*a
S -> E
E -> E + T | T # E is an expression
T -> T * F | F # T is a term that disambiguates * and +
F -> (E) | a   # F is a factor, isn't affected by * or +
# compacted
S -> S + T | T
T -> T * F | F
F -> (S) | a
# or expanded
S -> E
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> a

h This will ensure that we end up with a single parse tree for that string.

  S + T
 /  |  \
T   + T * F
|     | | |
F     a * a

Inherent Ambiguity

Inherently ambiguous Context-Free Languages are languages that have no unambiguous CFG. This is a tough one to determine because not only is there no algorithm to help, but you must prove that it is ambiguous for all grammars. Wikipedia has a good example of an inherently ambiguous language. Though, it is sort of cheating, since the language defined by the union of the two CFLs is not a CFL. I prefer the CFL: