Lab 2 Bison Quick Start

Part 1: How to write a makefile?

Part 2: How to use Bison?

Part 3: What's my task in lab 2?

 


Part 1. How to write a makefile?

CC = g++

CFLAGS = -g -I/usr/local/include/sgi-stl -Wall

compile:lex.c yac.c tok.h

    $(CC) $(CFLAGS) lex.c yac.c -o compile -lfl

lex.c: calc.lex

    flex calc.lex

    -cmp -s lex.yy.c lex.c || mv lex.yy.c lex.c

yac.c: calc.y

    bison -d -v calc.y

    -cmp -s calc.tab.c yac.c || mv calc.tab.c yac.c   

    -cmp -s calc.tab.h tok.h || mv calc.tab.h tok.h

clean:

    rm -rf *.o lex.yy.c calc.output calc.tab.h


Part 2. How to use Bison?

The parser has a stack and input.

shift: Move the 1st input token to top of stack.

Reduce: Choose a grammar rule X-> A B C; pop C B A ; push X

 

Bison: a general purpose parser generator that converts a grammar description for an LALR(1) context free grammar into a C program to parse that grammar. It's compatible with YACC.

 

Each token (returned from scanner) should have two attributes:

1. type, it can be int, double, string or a union. We will use union type in our project. Because some tokens are integer, some are string.

2. semantic value, all rest information about token -- value of an Integer, name of Identifier

 

Structure of a Bison input file

%{

       C declarations

%}

       Bison declarations           //name of terminal, non-terminal, symbols, precedence, data types

%%

       Grammar rules                //productions, actions

%%

       Additional C code           // any C code you want to use

 

Useful Bison Declaration

%union: Declare the collection of data types that semantic values may have

%type: used to declaring nonterminal symbols

%token: declaring token types (terminal)

%left, %right, %nonassoc: associativity

%start: the 1st nonterminal specified in the grammar specification section.

 

Note: About union, you may want to declare

%union{

  int int_val;

  string * str_val;

}

 

In the actions part of flex source file, you need to do the following:

Identifer's regular expression   {    yylval.str_val = malloc(strlen(yytext)+1); 

                                                    strcpy(yylval.string,yytext); 

                                                    return ID;

                                               }

 

Number's regular expression   {   yylval.int_val = atoi(yytext), return NUM; }

 

 

Some important data structure and function

yyparse(): Call this function to start parsing.  It reads tokens, executes actions, ultimately returns when it encounters end-of-input or an unrecoverable syntax error.

yyerror(): User-supplied function to be called by yyparse on error.

yylval: External variable in which yylex should place the semantic value associated with a token.

YYSTYPE: Macro for the data type of semantic values; int by default. We should make it to union in our project.

 

Some Bison Documentation and Example:

Bison Manual (must read)

A complex real-world example of a parser for Java (for reference only): 


Part 3. My task in the lab?

1. Continue writing your scanner, if you haven't finished that part.

 

2. Start with the following example (a calculator) to learn Bison:

 

Example flex, bison source file and Makefile . (I have tested them thoroughly)

a) Download the above 3 files to a temporary directory

b) Type "make" in the command line

c) You will get an executable file, called compile. (There is a warning message, which you can ignore)

d) Type "./compile" in the command line. Type a calculation formula like "2+4*7", press enter on the keyboard. Type CTRL+D together, telling the terminal that the user input ends. You will get "=30" result.

e) Type "./compile", then type "2+4+", press enter, type CTRL+D, you will see error message reported by the parser.

3. Read the sample files carefully. Understand each line in the files.

4. Start to think and write your MiniJava parser, following the example. The action for the grammar rule should.be as simple as possible. If the user input match the grammer, your parser output nothing. If the user input dismatch the grammer, your parser reports an error.