Assessed Exercise 1, Task 2 |
Introduction. In this task you will write a lexer for the simple programming language given here. You are free to write the lexer in any form you want, whether 'by hand' or with lexer generator, as long as it adheres to the requirements specified below. Writing the lexer by hand will give you a better understanding of lexical analysis, while using a tools achieves the goal with many fewer lines of code (my example solution, written using the JFlex lexer generator is around 60 lines of code). Another implementation technique (probably easier) is using Java's regular expression library. Task. Your lexer takes arbitrary strings (over the given alphabet) as input, and returns either a suitable token list when the input is lexically valid, or a suitable error exception (when the input is lexically invalid). Note that you are not asked to check if the input is syntactically correct w.r.t. to the context free grammar here. You should only check for lexical correctness. The lexer has to have the following signature. class LexicalException extends Exception { public String msg; public LexicalException ( String _msg ) { msg = _msg; } } class Task2Exception extends Exception { public String msg; public Task2Exception ( String _msg ) { msg = _msg; } } interface Lexer { public List<Token> lex ( String input ) throws LexicalException, Task2Exception; }
Here You will also have to implement a method
class Task2 { public static Lexer create () { ... } } For your convenience, here are the remaining definitions in one file. If you use a lexer generator to handle this task, I strongly recommend submitting the generator's source files too. I am unable to comprehend and comment on auto-generated lexers without source files: the auto-generated code is too complicated/weird for humans to comprehend directly. Examples. Here is a lexically valid program in the language. def f(x,y,z) = { if x == y then { z } else { 0 } } It gives rise to the following token stream. T_Def T_Identifier ( "f" ) T_LeftBracket T_Identifier ( "x" ) T_Comma T_Identifier ( "y" ) T_Comma T_Identifier ( "z" ) T_RightBracket T_EqualDefines T_LeftCurlyBracket T_If T_Identifier ( "x" ) T_Equal T_Identifier ( "y" ) T_Then T_LeftCurlyBracket T_Identifier ( "z" ) T_RightCurlyBracket T_Else T_LeftCurlyBracket T_Integer ( 0 ) T_RightCurlyBracket T_RightCurlyBracket Note that the following is also a valid (at the lexical level) input. ;;{{{}}{{{ {{}}}} }}}}}}}}10 10 if if then then then else The necessary Java files for all three tasks can be found for convenient download here: Warning. Your code
needs to compile against the signatures given to you
here. Do not change them. You are welcome to add new
classes. For your convenience you can find a
file |