Writing a calculator in C# using Irony

My friend Mark wrote a post last week about implementing a calculating compiler using SableCC. I have worked with SableCC on a project where we needed to parse C# code and I have to admit I don't recall SableCC as a friendly framework, and reading Mark's post proves my memory right.

First of Mark's grammar is very specific to SableCC, a lot af tricks is made to make SableCC accept the grammar and furthermore the grammar is complicated further by handling operator precedence. If we disregard performance considerations and focus on an easily understandable and maintainable compiler structure our grammar for our parser generator ideally would look very much like the Extended Backus Naur Format (EBNF) of the language.

number = digit+ | digit+ '.' digit+
lparen = '('
rparen = ')'
binop = '+' | '-' | '/' | '*' | '%'
funcname = letter+
expression := lparen expression rparen | expression operator expression | funcname lparen expression rparen | number

Irony is like SableCC a scanner/parser generator but it is implemented as an internal DSL in C# for specifying grammars on a format very close to EBNF. The code is specified in the constructor body of a class inherting from Irony's Grammar class. The EBNF described above (slightly modified for readability) looks like this:

public class SimpleCalcGrammar : Irony.Parsing.Grammar{
    public SimpleCalcGrammar(){
        var number = TerminalFactory.CreateCSharpNumber("number");
        var identifier = TerminalFactory.CreateCSharpIdentifier("identifier");
        var expression = new NonTerminal("expression");
        var binexpr = new NonTerminal("binexpr");
        var parexpr = new NonTerminal("parexpr");
        var fncall = new NonTerminal("fncall");
        var binop = new NonTerminal("binop");
            
        expression.Rule =  parexpr | binexpr | number | fncall;
        parexpr.Rule = "(" + expression + ")";
        binexpr.Rule = expression + binop + expression;
        binop.Rule = Symbol("+") | "-" | "/" | "*" | "%";
        fncall.Rule = identifier + "(" + expression + ")";
        this.Root = expression;
        //...
    }
}

The first few lines are initialization of our Terminals and Productions, this is pretty trivial however a few things is worth noting. Irony is a very rich framework that provides a lot of prebaked functionality, one of these is the TerminalFactory for creating typically used terminals, in our case we use C#'s format for numbers and identifiers, that means we out of the bag get support in our language for suffix on numbers for specifying types etc.

The interresting part is after initialization, here we set up rules for the productions. Irony relies heavily on operator overloading so using our productions and the operators from EBNF we can describe the language pretty straight forward. Lastly we specify what productions is the root of our program, that is the production that contains the entry point for the whole language.

Next up we need to handle operator precedence, fortunately Irony knows that this is a common challenge when developing languages and hence it supports registering our operators and their precedence using the following code:

RegisterOperators(1, "+", "-");
RegisterOperators(2, "*", "/", "%");

Having specified this we have a working parser that can be used like this:

SimpleCalcGrammar g = new SimpleCalcGrammar();
Parser p = new Parser(g);
ParseTree t = p.Parse("25-37+2*(1.22+cos(5))*sin(5)*2+5%2*3*sqrt(5+2)");

This gives us a parse tree that is pretty rough in the edges and hard to work with, what we want instead is a pretty Abstract Syntax Tree that contains exactly the information we need. On our NonTerminal instances we can specify which nodes they should be transformed into, fortunately Irony contains some of the basic nodes that are needed in many languages for instance the binary expression node, so we can specify the built-in BinExprNode on our binexpr NonTerminal like this:

var binexpr = new NonTerminal("binexpr", typeof(BinExprNode));

For us to parse the language we need one aditional node in our AST, that is the FunctionCall node (the observant reader might have noticed i have choosen to implement an openended grammar that allows for extending with new built-in functions without modifying the grammar). Irony has a AST node for function calls built-in but for the sake of the example i will show a custom implementation here:

public class FunctionCallNode : AstNode{
    public string Name;
    public AstNode Argument;
    public override void Init (Irony.Parsing.ParsingContext context, Irony.Parsing.ParseTreeNode treeNode)
    {
        base.Init (context, treeNode);
        Name = treeNode.ChildNodes[0].FindTokenAndGetText(); 
        Argument = AddChild("Arg", treeNode.ChildNodes[1]);
        AsString = Name; 
    }
}
//Adding the node to our function call node
var fncall = new NonTerminal("fncall", typeof(FunctionCallNode));

So implementing our own AST nodes is pretty simple, and Irony has a rich infrastructure for getting text of tokens and adding childnodes as you can see in the sample. Because our parse tree contains a lot of "garbage" that we don't want transferred to our AST, so we need to inform Irony to import these, this is done by marking nonterminals as transient like this:

MarkTransient(parexpr, expression);

And at last we need to set a flag so that the parser generates the Ast using the following line of code:

LanguageFlags = LanguageFlags.CreateAst;

The next step is to actually work with the AST for sementic analysis, codegeneration or as in our case execution of the calculation. Mark did this using a visitor and Irony's AST supports visitors aswell, so we can make the "PrintVisitor" mark used for debugging like this:

public class PrintVisitor : IAstVisitor{
    int indentation = 0;
    public void BeginVisit (AstNode node)
    {
        for (int i = 0;i<indentation ;i++ ) {
            Console.Write("\t");
        }
        Console.WriteLine(node.ToString());
        indentation++;
    }
    
    public void EndVisit (AstNode node)
    {
        indentation--;
    }    
}

Using a visitor looks like this:

SimpleCalcGrammar g = new SimpleCalcGrammar();
Parser p = new Parser(g);
ParseTree t = p.Parse("25-37+2*(1.22+cos(5))*sin(5)*2+5%2*3*sqrt(5+2)");
var astnode = (AstNode)t.Root.AstNode
astnode.AcceptVisitor(new PrintVisitor());

And the result of this operation is:

+(operator)
    Arg: +(operator)
        Arg: -(operator)
            Arg:25
            Arg:37
        Arg: *(operator)
            Arg: *(operator)
                Arg: *(operator)
                    Arg:2
                    Arg: +(operator)
                        Arg:1.22
                        Arg: cos
                            Arg:5
                Arg: sin
                    Arg:5
            Arg:2
    Arg: *(operator)
        Arg: *(operator)
            Arg: %(operator)
                Arg:5
                Arg:2
            Arg:3
        Arg: sqrt
            Arg: +(operator)
                Arg:5
                Arg:2

However, there is no need for us to create a printvisitor, Irony comes with an UI that allows us to work with our grammar and can parse samples and show us AST, parsertrace with parserstatec and much more, this is very useful to see what the grammar actually does under the covers (shift/reduce). But basically we can go ahead and implement pretty much the same visitor as Mark have used for SableCC, but let's do something different. Irony has support for a basic LanguageRuntime, and actually for a basic interpreter aswell. So if we tell our AST-nodes how they should evaluate themselves we get all the infrastructure given to us by Irony!

So lets override the Evaluate method on our FunctionCallNode so it can be used in Irony's interpreter infrastructure:

public class FunctionCallNode : AstNode{
    //....
    public override void Evaluate (Irony.Interpreter.EvaluationContext context, Irony.Ast.AstMode mode)
    {
        Argument.Evaluate(context, AstMode.Read); //Evaluate the argument the result is saved in context.Data
        double input = Convert.ToDouble(context.Data[0]);
        double result;
        if(Name == "sqrt"){
            result = Math.Sqrt(input);
        }else if(Name == "cos"){
            result = Math.Cos(input);    
        }else if(Name == "sin"){
            result = Math.Sin(input);    
        }else{
            throw new NotSupportedException("Method " + Name + " not supported");    
        }
        context.Data.Replace(1, result); //Replace the argument value on the stack with the result of the function call
    }
}

Yes, in real life our methods would be in a method table and not hardcoded here, this is just to demonstrate the functionality of the interpreter in irony. To use the Interpreter we write code like this:

var interpreter = new Irony.Interpreter.ScriptInterpreter(new LanguageData(new SimpleCalcGrammar()));
interpreter.Evaluate("25-37+2*(1.22+cos(5))*sin(5)*2+5%2*3*sqrt(5+2)");
Console.WriteLine(interpreter.EvaluationContext.LastResult);

Voila, done with the code that corresponds to Mark's sample and the result of running the above is: -9.83033874894108 and the built-in Irony interpreter handle almost everything for us leaving me with only 57 lines of pure C# code:

public class FunctionCallNode : AstNode{
    public string Name;
    public AstNode Argument;
    public override void Init (Irony.Parsing.ParsingContext context, Irony.Parsing.ParseTreeNode treeNode)
    {
        base.Init (context, treeNode);
        Name = treeNode.ChildNodes[0].FindTokenAndGetText();
        Argument = AddChild("Arg", treeNode.ChildNodes[1]);
        AsString = Name;
    }
    
    public override void Evaluate (Irony.Interpreter.EvaluationContext context, Irony.Ast.AstMode mode)
    {
        Argument.Evaluate(context, AstMode.Read);
        double input = Convert.ToDouble(context.Data[0]);
        double result;
        if(Name == "sqrt"){
            result = Math.Sqrt(input);
        }else if(Name == "cos"){
            result = Math.Cos(input);    
        }else if(Name == "sin"){
            result = Math.Sin(input);    
        }else{
            throw new NotSupportedException("Method " + Name + " not supported");    
        }
        context.Data.Replace(1, result);
    }
}
public class SimpleCalcGrammar : Irony.Parsing.Grammar{
    public SimpleCalcGrammar(){
        
        var number = TerminalFactory.CreateCSharpNumber("number");
        var identifier = TerminalFactory.CreateCSharpIdentifier("identifier");
        var expression = new NonTerminal("expression");
        var binexpr = new NonTerminal("binexpr", typeof(BinExprNode));
        var parexpr = new NonTerminal("parexpr");
        var fncall = new NonTerminal("fncall", typeof(FunctionCallNode));
        var binop = new NonTerminal("binop", "operator");
        
        expression.Rule =  parexpr | binexpr | number | fncall;
        parexpr.Rule = "(" + expression + ")";
        binexpr.Rule = expression + binop + expression;
        binop.Rule = Symbol("+") | "-" | "/" | "*" | "%";
        fncall.Rule = identifier + "(" + expression + ")";
        
        RegisterPunctuation("(",")");
        RegisterOperators(1, "+", "-");
        RegisterOperators(2, "*", "/", "%");
        
        
        MarkTransient(parexpr, expression);
        
        this.Root = expression;
        this.LanguageFlags = LanguageFlags.CreateAst;
    }
}
Posted on 07 Oct 2009 by Jakob T. Andersen
blog comments powered by Disqus