Language Syntax

Lexical description. The lexical units of the language are integers, special notation, identifiers, keywords, and white space. Any input string that contains only those components is lexically valid.

Keywords. The keywords are def, if, then, else, skip, while, do, repeat, until, break and continue. Keywords are case-sensitive and contain no upper-case letters.
Integers. Integers are non-empty strings of digits 0-9.
Identifiers. Identifiers are non-empty strings consisting of letters (lower or upper case), digits, and the underscore character. The first character in an identifier must be a lower-case letter.
White space. White space consists of any sequence of the characters: blank (ascii 32), \n (newline, ascii 10), \f (form feed, ascii 12), \r (carriage return, ascii 13), \t (tab, ascii 9). White space always separates tokens: whatever non-white-space text is to the left of a white space character must belong to a different token than whatever is to its right. The converse does not hold: two distinct tokens are not always separated by white space. For example, the string (()) consists of 4 tokens; the string 65x consists of two tokens, T_Integer(65) followed by T_Identifier("x"); and the string 65if; should be lexed into T_Integer(65) T_If T_Semicolon.
Special notation. The special syntactic symbols (e.g., parentheses, assignment operator, etc.) are as follows.
```
; ( ) = == < > <= >= , { } := + * - /
```
Like white space, special notation always separates tokens.
Disambiguation. The rules above are ambiguous. To disambiguate, use the following two policies.
- Apply a 'longest match' policy: if the beginning of a string can be lexed in several ways, choose the tokenisation whose initial token consumes the most characters from the beginning of the string.
- If there is more than one longest match, give preference to keywords.
For example, the string deff should be lexed into a single identifier, not the token def followed by the identifier f. Similarly, === must be lexed as == followed by =, not as = followed by ==, nor as three occurrences of =.
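
The two disambiguation policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the token kinds "KEYWORD", "ID", "INT", "SYM" and the function names candidates and tokenize are made up for this example, and tokens are represented as plain (kind, lexeme) pairs rather than the T_... constructors used above. At each position the sketch collects every rule that matches, keeps the longest match, and breaks ties in favour of keywords.

```python
import re

KEYWORDS = ("def", "if", "then", "else", "skip", "while", "do",
            "repeat", "until", "break", "continue")
SYMBOLS = (";", "(", ")", "=", "==", "<", ">", "<=", ">=",
           ",", "{", "}", ":=", "+", "*", "-", "/")

# Rules given by regular expressions: (kind, compiled pattern).
PATTERNS = (
    ("INT", re.compile(r"[0-9]+")),
    ("ID", re.compile(r"[a-z][A-Za-z0-9_]*")),
    ("WS", re.compile(r"[ \n\f\r\t]+")),
)


def candidates(text, pos):
    """Yield (length, priority, kind, lexeme) for every rule matching at pos."""
    for kw in KEYWORDS:
        if text.startswith(kw, pos):
            yield (len(kw), 1, "KEYWORD", kw)  # priority 1 wins ties
    for sym in SYMBOLS:
        if text.startswith(sym, pos):
            yield (len(sym), 0, "SYM", sym)
    for kind, pattern in PATTERNS:
        m = pattern.match(text, pos)
        if m:
            yield (m.end() - pos, 0, kind, m.group())


def tokenize(text):
    """Split text into (kind, lexeme) pairs; white space is dropped."""
    tokens, pos = [], 0
    while pos < len(text):
        found = list(candidates(text, pos))
        if not found:
            raise ValueError(f"lexically invalid input at offset {pos}")
        # Longest match first; among equally long matches, keywords win.
        length, _, kind, lexeme = max(found, key=lambda c: (c[0], c[1]))
        if kind != "WS":
            tokens.append((kind, lexeme))
        pos += length
    return tokens
```

With these definitions, tokenize("deff") yields the single pair ("ID", "deff"), tokenize("===") yields ("SYM", "==") followed by ("SYM", "="), and tokenize("65if;") yields ("INT", "65"), ("KEYWORD", "if"), ("SYM", ";"). A string such as X, which starts with an upper-case letter, matches no rule and is rejected.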

Syntax description. The language syntax is given by the following context-free grammar with initial non-terminal PROG, where ε stands for the empty production.

 PROG → DEC | DEC PROG 
 DEC → def ID (VARDEC) = BLOCK
 VARDEC →  ε | VARDECNE 
 VARDECNE → ID | VARDECNE, ID 
 ID → ... (identifiers)
 INT → ... (Integers)
 BLOCK → { ENE }
 ENE → E | E; ENE
 E →  INT 
   | ID 
   | if E COMP E then BLOCK else BLOCK
   | (E BINOP E)
   | skip
   | BLOCK
   | while E COMP E do BLOCK 
   | repeat BLOCK until E COMP E 
   | ID := E
   | ID (ARGS)
   | break
   | continue
 ARGS → ε | ARGSNE
 ARGSNE → E | ARGSNE, E
 COMP → == | < | > | <= | >=
 BINOP → + | - | * | /
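
One way to read this grammar is as a recursive-descent parser with one token of lookahead. The Python sketch below is a hedged illustration only: it assumes tokens arrive as (kind, lexeme) pairs with the made-up kinds "KEYWORD", "ID", "INT" and "SYM", and the class name Parser, its method names, and the tuple-shaped syntax trees are all inventions of this example. The left-recursive rules VARDECNE and ARGSNE are handled with a loop over comma-separated items, which accepts the same strings.

```python
COMP = {"==", "<", ">", "<=", ">="}
BINOP = {"+", "-", "*", "/"}


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = list(tokens), 0

    def at(self, kind, lexeme=None):
        k, lx = self.toks[self.i] if self.i < len(self.toks) else ("EOF", "")
        return k == kind and lexeme in (None, lx)

    def take(self, kind, lexeme=None):
        if not self.at(kind, lexeme):
            raise SyntaxError(f"expected {lexeme or kind} at token {self.i}")
        self.i += 1
        return self.toks[self.i - 1][1]

    def one_of(self, lexemes):
        if self.at("SYM") and self.toks[self.i][1] in lexemes:
            return self.take("SYM")
        raise SyntaxError(f"expected one of {sorted(lexemes)} at token {self.i}")

    def sep_list(self, item, closer):
        # VARDEC / ARGS: zero or more comma-separated items before closer.
        items = []
        if not self.at("SYM", closer):
            items.append(item())
            while self.at("SYM", ","):
                self.take("SYM", ",")
                items.append(item())
        return items

    def prog(self):  # PROG → DEC | DEC PROG
        decs = [self.dec()]
        while not self.at("EOF"):
            decs.append(self.dec())
        return decs

    def dec(self):  # DEC → def ID (VARDEC) = BLOCK
        self.take("KEYWORD", "def")
        name = self.take("ID")
        self.take("SYM", "(")
        params = self.sep_list(lambda: self.take("ID"), ")")
        self.take("SYM", ")")
        self.take("SYM", "=")
        return ("def", name, params, self.block())

    def block(self):  # BLOCK → { ENE }
        self.take("SYM", "{")
        body = [self.exp()]
        while self.at("SYM", ";"):
            self.take("SYM", ";")
            body.append(self.exp())
        self.take("SYM", "}")
        return ("block", body)

    def cond(self):  # E COMP E
        left = self.exp()
        return (self.one_of(COMP), left, self.exp())

    def exp(self):
        if self.at("INT"):
            return ("int", int(self.take("INT")))
        if self.at("ID"):  # ID, ID := E, or ID (ARGS)
            name = self.take("ID")
            if self.at("SYM", ":="):
                self.take("SYM", ":=")
                return ("assign", name, self.exp())
            if self.at("SYM", "("):
                self.take("SYM", "(")
                args = self.sep_list(self.exp, ")")
                self.take("SYM", ")")
                return ("call", name, args)
            return ("var", name)
        if self.at("SYM", "{"):
            return self.block()
        if self.at("SYM", "("):  # (E BINOP E)
            self.take("SYM", "(")
            left = self.exp()
            op = self.one_of(BINOP)
            right = self.exp()
            self.take("SYM", ")")
            return (op, left, right)
        for kw in ("skip", "break", "continue"):
            if self.at("KEYWORD", kw):
                return (self.take("KEYWORD"),)
        if self.at("KEYWORD", "if"):
            self.take("KEYWORD")
            c = self.cond()
            self.take("KEYWORD", "then")
            then = self.block()
            self.take("KEYWORD", "else")
            return ("if", c, then, self.block())
        if self.at("KEYWORD", "while"):
            self.take("KEYWORD")
            c = self.cond()
            self.take("KEYWORD", "do")
            return ("while", c, self.block())
        if self.at("KEYWORD", "repeat"):
            self.take("KEYWORD")
            body = self.block()
            self.take("KEYWORD", "until")
            return ("repeat", body, self.cond())
        raise SyntaxError(f"unexpected token at position {self.i}")
```

Calling Parser(tokens).prog() on the token pairs for def f(x) = { (x + 1) } returns a one-element list holding a ("def", "f", ["x"], ...) tuple, and raises SyntaxError if, for instance, the closing brace is missing. Every alternative of E is selected by its first token, except that an ID needs one extra token of lookahead to tell a variable from an assignment or a call.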