Lexical description. The
lexical units of the language are integers, special notation,
identifiers, keywords, and white space. Any input string that
contains only those components is lexically valid.
- Keywords. The keywords are def, if, then, else, skip, while, do, repeat,
until, break and continue. Note that keywords are case sensitive and do
not contain upper-case letters.
- Integers. Integers are non-empty strings of digits 0-9.
- Identifiers. Identifiers are non-empty strings
consisting of letters (lower or upper case), digits, and the
underscore character. The first character in an identifier must be a
lower-case letter.
- White space. White space consists of any sequence of
the characters: blank (ascii 32), \n (newline, ascii 10), \f (form
feed, ascii 12), \r (carriage return, ascii 13), \t (tab, ascii
9). White space always separates tokens: whatever non-white-space text is to the
left of white space must belong to a different token than whatever is to
its right. The converse does not hold: two distinct tokens are not always
separated by white space. For example, the string (()) consists of 4
tokens; likewise, the string 65x consists of two tokens,
T_Integer(65) followed by T_Identifier("x"), and the string 65if;
should be lexed into T_Integer(65) T_If T_Semicolon.
- Special notation. The special syntactic symbols (e.g.,
parentheses, assignment operator, etc.) are as follows.
; ( ) = == < > <= >= , { } := + * - /
Like white space, special notation always separates tokens.
- Disambiguation. The rules above are ambiguous. To
disambiguate, use the following two policies.
- Operate a 'longest match' policy: if the beginning of a string can be
lexed in several ways, choose the tokenisation whose initial token
consumes the most characters from the beginning of the string.
- If there is more than one longest match, give preference
to keywords.
For example, the string deff should be lexed into a
single identifier, not the token def followed by
the identifier f. Similarly, === must be lexed as
== followed by =, not the other way round, and not as
three occurrences of =.
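The lexical rules and the two disambiguation policies can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name lex, the (kind, value) tuple representation, and the generic kinds T_Keyword and T_Symbol are assumptions made here for brevity, standing in for per-token names such as T_If and T_Semicolon. Longest match follows from greedy character classes together with listing each two-character symbol before its one-character prefix; keyword preference is a set lookup on every matched word.

```python
import re

KEYWORDS = {"def", "if", "then", "else", "skip", "while", "do",
            "repeat", "until", "break", "continue"}

# Two-character symbols come first: regex alternation picks the first
# alternative that matches, so this ordering gives the longest match.
SYMBOLS = ["==", "<=", ">=", ":=", ";", "(", ")", "=", "<", ">",
           ",", "{", "}", "+", "*", "-", "/"]

TOKEN_RE = re.compile(
    r"(?P<ws>[ \n\f\r\t]+)"              # white space (ascii 32, 10, 12, 13, 9)
    r"|(?P<int>[0-9]+)"                  # non-empty string of digits
    r"|(?P<word>[a-z][A-Za-z0-9_]*)"     # keyword or identifier
    r"|(?P<sym>" + "|".join(re.escape(s) for s in SYMBOLS) + ")"
)


def lex(text):
    """Return a list of (kind, value) tokens; raise ValueError on bad input.

    The kinds T_Keyword and T_Symbol are simplifications assumed for this
    sketch; a real lexer would likely use one token type per keyword/symbol.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at offset {pos}: {text[pos]!r}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue  # white space separates tokens but produces none
        if kind == "int":
            tokens.append(("T_Integer", int(lexeme)))
        elif kind == "word":
            # The greedy match already took the longest word; a keyword
            # wins only when the whole word equals it (so "deff" is an
            # identifier, while "def" is a keyword).
            if lexeme in KEYWORDS:
                tokens.append(("T_Keyword", lexeme))
            else:
                tokens.append(("T_Identifier", lexeme))
        else:
            tokens.append(("T_Symbol", lexeme))
    return tokens
```

With this sketch, lex("65if;") yields an integer, the keyword if and a semicolon, lex("deff") yields a single identifier, and lex("===") yields == followed by =, matching the examples above. A character outside the listed classes, such as a lone colon or a leading upper-case letter, raises ValueError.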
Syntax description. The syntax of the language
is given by the following context-free grammar with
initial non-terminal PROG, where ε stands for
the empty string.
PROG → DEC | DEC PROG
DEC → def ID (VARDEC) = BLOCK
VARDEC → ε | VARDECNE
VARDECNE → ID | VARDECNE, ID
ID → ... (identifiers)
INT → ... (Integers)
BLOCK → { ENE }
ENE → E | E; ENE
E → INT
| ID
| if E COMP E then BLOCK else BLOCK
| (E BINOP E)
| skip
| BLOCK
| while E COMP E do BLOCK
| repeat BLOCK until E COMP E
| ID := E
| ID (ARGS)
| break
| continue
ARGS → ε | ARGSNE
ARGSNE → E | ARGSNE, E
COMP → == | < | > | <= | >=
BINOP → + | - | * | /
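A grammar of this shape lends itself to a hand-written recursive-descent parser. The Python sketch below is one possible approach, assuming the input has already been lexed into a list of plain strings, one string per token; the class name Parser, its method names, and the tuple-based tree are all hypothetical choices made for illustration. Two points are worth noting: the left-recursive rules VARDECNE and ARGSNE are handled with a loop, since a naive recursive procedure would never terminate on them, and an expression that starts with an identifier needs one token of lookahead to choose among ID, ID := E and ID (ARGS).

```python
KEYWORDS = {"def", "if", "then", "else", "skip", "while", "do",
            "repeat", "until", "break", "continue"}
COMPS = ("==", "<", ">", "<=", ">=")
BINOPS = ("+", "-", "*", "/")


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = list(tokens), 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, *expected):
        tok = self.peek()
        if tok is None or (expected and tok not in expected):
            raise SyntaxError(f"unexpected token {tok!r} at index {self.i}")
        self.i += 1
        return tok

    def ident(self):
        tok = self.take()
        if not ("a" <= tok[0] <= "z") or tok in KEYWORDS:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return tok

    def seq(self, item, sep, close, allow_empty):
        # item (sep item)* close: a loop in place of VARDECNE, ARGSNE and ENE.
        items = []
        if not (allow_empty and self.peek() == close):
            items.append(item())
            while self.peek() == sep:
                self.take()
                items.append(item())
        self.take(close)
        return items

    def prog(self):  # PROG → DEC | DEC PROG
        decs = [self.dec()]
        while self.peek() is not None:
            decs.append(self.dec())
        return decs

    def dec(self):  # DEC → def ID (VARDEC) = BLOCK
        self.take("def")
        name = self.ident()
        self.take("(")
        params = self.seq(self.ident, ",", ")", allow_empty=True)
        self.take("=")
        return ("def", name, params, self.block())

    def block(self):  # BLOCK → { ENE }
        self.take("{")
        return ("block", self.seq(self.exp, ";", "}", allow_empty=False))

    def cond(self):  # E COMP E
        left = self.exp()
        return (self.take(*COMPS), left, self.exp())

    def exp(self):
        tok = self.take()
        if tok.isdigit():
            return ("int", int(tok))
        if tok in ("skip", "break", "continue"):
            return (tok,)
        if tok == "{":
            self.i -= 1  # un-read the brace so block() can consume it
            return self.block()
        if tok == "(":  # (E BINOP E)
            left = self.exp()
            node = (self.take(*BINOPS), left, self.exp())
            self.take(")")
            return node
        if tok == "if":
            test = self.cond()
            self.take("then")
            then_block = self.block()
            self.take("else")
            return ("if", test, then_block, self.block())
        if tok == "while":
            test = self.cond()
            self.take("do")
            return ("while", test, self.block())
        if tok == "repeat":
            body = self.block()
            self.take("until")
            return ("repeat", body, self.cond())
        if not ("a" <= tok[0] <= "z") or tok in KEYWORDS:
            raise SyntaxError(f"expected expression, got {tok!r}")
        if self.peek() == ":=":  # ID := E
            self.take()
            return ("assign", tok, self.exp())
        if self.peek() == "(":  # ID (ARGS)
            self.take()
            return ("call", tok, self.seq(self.exp, ",", ")", allow_empty=True))
        return ("var", tok)
```

For instance, Parser("def f ( x ) = { x }".split()).prog() returns [("def", "f", ["x"], ("block", [("var", "x")]))], while an empty block such as "def f ( ) = { }" raises SyntaxError, because ENE must contain at least one expression.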