This week we will hand out the first phase of the class project, which deals with lexical analysis using the flex tool. We will also complete an exercise that will help you get acquainted with flex.
Outline for today's lab:
In this exercise, we will write a flex specification for a lexical analyzer for a simple calculator language. For now, this language will contain integer numbers, the operators plus, minus, multiply, and divide, and parentheses for grouping. Additionally, the symbol "=" is in the language to terminate an expression. These symbols and their corresponding token names are shown in the table below.
|Symbol in Language|Token Name|
|---|---|
|integer number (e.g., "0", "12", "1719")|NUMBER XXXX [where XXXX is the number itself]|
The calculator language itself is very simple. There is only one type of phrase in the language: "Expression=", where "Expression" is defined in the same way as for the class project, except that the calculator language has no variables, only numbers. For example, all of the following are valid in the calculator language.
Note, however, that lexical analysis only scans for valid tokens in the calculator language, not valid expressions. The parsing phase (phase 2, the next phase of the class project) is where sequences of tokens will be checked to ensure that they adhere to the specified language grammar. Thus, for this exercise, which deals only with lexical analysis, even such phrases as ***101***())(- can still be scanned successfully, because every symbol in them is a valid token.
Task 1: Create a flex specification to recognize tokens in the calculator language. Print an error message and exit if any unrecognized character is encountered in the input. Use flex (together with a C compiler) to turn your specification into an executable lexical analyzer that reads text from standard input and prints the identified tokens to the screen, one token per line.
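As a starting point, here is a minimal sketch of what such a specification might look like. Only the NUMBER token name comes from the table above; the other token names (PLUS, MINUS, MULT, DIV, L_PAREN, R_PAREN, EQUAL) and the file name calc.lex are assumptions made for illustration, so substitute the names given in the handout. The sketch also assumes that whitespace between tokens is simply skipped.

```lex
%{
/* Sketch of a calculator scanner.
 * Token names other than NUMBER are hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>
%}

%option noyywrap

DIGIT   [0-9]

%%
{DIGIT}+    { printf("NUMBER %s\n", yytext); }
"+"         { printf("PLUS\n"); }
"-"         { printf("MINUS\n"); }
"*"         { printf("MULT\n"); }
"/"         { printf("DIV\n"); }
"("         { printf("L_PAREN\n"); }
")"         { printf("R_PAREN\n"); }
"="         { printf("EQUAL\n"); }
[ \t\r\n]+  { /* assumption: whitespace separates tokens and is ignored */ }
.           { fprintf(stderr, "Error: unrecognized character '%s'\n", yytext);
              exit(1); }
%%

int main(void)
{
    yylex();    /* scans standard input until end of file */
    return 0;
}
```

Assuming the specification is saved as calc.lex, running "flex calc.lex" produces lex.yy.c, which can then be compiled with, for example, "gcc -o calc lex.yy.c". Because the sketch sets "%option noyywrap", it does not need to be linked against the flex library.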
Task 2: Enhance your flex specification so that input text can be optionally read from an input file, if one is specified on the command line when invoking the lexical analyzer.
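One possible way to approach this is through the global variable yyin, which flex-generated scanners read from and which defaults to standard input when it is left unset. The sketch below is an illustration under assumptions, not the required solution: the rules section is deliberately abbreviated, and the SYMBOL token name is a placeholder for the real token names.

```lex
%{
/* Sketch: read from a file named on the command line, else from stdin.
 * The SYMBOL token name is a hypothetical placeholder. */
#include <stdio.h>
#include <stdlib.h>
%}

%option noyywrap

%%
[0-9]+      { printf("NUMBER %s\n", yytext); }
[-+*/()=]   { printf("SYMBOL %s\n", yytext); }
[ \t\r\n]+  { /* skip whitespace */ }
.           { fprintf(stderr, "Error: unrecognized character '%s'\n", yytext);
              exit(1); }
%%

int main(int argc, char **argv)
{
    if (argc > 1) {
        /* A file name was given: point the scanner at that file. */
        yyin = fopen(argv[1], "r");
        if (yyin == NULL) {
            perror(argv[1]);
            return 1;
        }
    }
    /* If yyin was never set, yylex() falls back to standard input. */
    yylex();
    return 0;
}
```

With this main function, invoking the scanner with no arguments behaves as in Task 1, while passing a file name as the first argument scans that file instead.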
Task 3: Enhance your flex specification so that in addition to printing out each encountered token, the lexical analyzer will also count the following.
+, -, *, /
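Counting can be done with ordinary C variables that are declared in the definitions section and incremented inside rule actions. The following sketch assumes, purely for illustration, that numbers and operators are two of the categories to be counted; the counter names and the OPERATOR and SYMBOL token names are hypothetical, so adapt them to the categories listed above.

```lex
%{
/* Sketch: count tokens while printing them.
 * Counter names and the OPERATOR/SYMBOL token names are hypothetical. */
#include <stdio.h>
#include <stdlib.h>

static int num_numbers = 0;     /* how many NUMBER tokens were seen */
static int num_operators = 0;   /* how many of +, -, *, / were seen */
%}

%option noyywrap

%%
[0-9]+      { num_numbers++;   printf("NUMBER %s\n", yytext); }
[-+*/]      { num_operators++; printf("OPERATOR %s\n", yytext); }
[()=]       { printf("SYMBOL %s\n", yytext); }
[ \t\r\n]+  { /* skip whitespace */ }
.           { fprintf(stderr, "Error: unrecognized character '%s'\n", yytext);
              exit(1); }
%%

int main(void)
{
    yylex();
    /* Report the totals once the whole input has been scanned. */
    printf("numbers: %d\n", num_numbers);
    printf("operators: %d\n", num_operators);
    return 0;
}
```

The totals are printed after yylex() returns, that is, once end of input has been reached, so they appear after the list of tokens.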
Task 4 (optional): For a challenge, you may want to try extending the calculator language to allow for decimal numbers in addition to regular integers. Thus, the following numbers should be recognized by your lexical analyzer.
For an even greater challenge, extend the calculator language to allow for scientific notation in the numbers. After the number, there can be an optional "e-phrase" consisting of either "e" or "E", followed by an optional "+" or "-", followed by one or more digits. For example, the following numbers in scientific notation would be recognized by your lexical analyzer.
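The number patterns for both challenges can be built up from named definitions. The sketch below is one possible formulation under stated assumptions: it treats forms such as "3." and ".5" as valid decimals, which the handout may or may not intend, and the definition names DECIMAL and EXP as well as the SYMBOL token name are hypothetical.

```lex
%{
/* Sketch: integers, decimals, and an optional "e-phrase".
 * Assumption: "3." and ".5" count as decimals. Names are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
%}

%option noyywrap

DIGIT    [0-9]
DECIMAL  ({DIGIT}+"."{DIGIT}*|"."{DIGIT}+)
EXP      [eE][+-]?{DIGIT}+

%%
({DIGIT}+|{DECIMAL}){EXP}?  { printf("NUMBER %s\n", yytext); }
[-+*/()=]                   { printf("SYMBOL %s\n", yytext); }
[ \t\r\n]+                  { /* skip whitespace */ }
.                           { fprintf(stderr,
                                      "Error: unrecognized character '%s'\n",
                                      yytext);
                              exit(1); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Because flex always prefers the longest match, an input such as "1e-3" is scanned as a single number rather than as "1" followed by other characters. A leading sign is not part of the number pattern, so "2-3" still yields a number, a minus operator, and another number.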