Lexical Conventions

From GNUpdf

A PDF document is a sequence of 8-bit bytes.

  • Character set:
    • Regular characters
    • Delimiter characters
    • White-space characters

White-space characters

Character         ASCII code
Null              0
Tab               9
Line feed         10
Form feed         12
Carriage return   13
Space             32

End of line

One of:

  • CR + LF
  • CR
  • LF

Note: some parts of a PDF file require a specific form of EOL. For example, the stream keyword must be followed by CR + LF or by LF alone, never by CR alone.
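
As a minimal sketch of how a reader might recognize all three end-of-line forms, the Python below splits a byte string at every CR + LF, CR, or LF. The name split_lines is hypothetical and is not part of GNUpdf. It deliberately ignores the places where only a specific EOL form is allowed.

```python
import re

# CR LF is listed first so the pair is consumed as ONE end-of-line marker
# instead of being seen as a CR followed by a separate LF.
EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list:
    """Split raw bytes at every CR LF, CR, or LF marker (hypothetical helper)."""
    return EOL.split(data)
```

Two consecutive markers produce an empty line between them, so split_lines(b"x\r\n\r\ny") yields three items, the middle one empty.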


Delimiter characters

These characters delimit syntactic entities. A delimiter is not part of the entity it delimits.

Delimiter   Meaning
(           start of a string
)           end of a string
<           start of a hexadecimal string
>           end of a hexadecimal string
[           start of an array
]           end of an array
{           start of ???
}           end of ???
/           start of a name
%           start of a comment
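
The character classes can be expressed as a small lookup. The sketch below is illustrative only: the names WHITE_SPACE, DELIMITERS, and classify are assumptions, not GNUpdf identifiers. Any byte that is neither white space nor a delimiter is reported as regular.

```python
# Byte values taken from the white-space table: 0, 9, 10, 12, 13, 32.
WHITE_SPACE = frozenset(b"\x00\t\n\x0c\r ")

# The ten delimiter characters listed in the table above.
DELIMITERS = frozenset(b"()<>[]{}/%")


def classify(byte: int) -> str:
    """Return the lexical class of a single byte value (hypothetical helper)."""
    if not 0 <= byte <= 255:
        raise ValueError("a PDF character is a single 8-bit byte")
    if byte in WHITE_SPACE:
        return "white-space"
    if byte in DELIMITERS:
        return "delimiter"
    return "regular"
```

For instance, classify(ord("(")) reports a delimiter and classify(32) reports white space.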

Regular characters

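\{\text{regular characters}\} \equiv \Sigma \setminus \left( \{\text{white-space characters}\} \cup \{\text{delimiter characters}\} \right)

Here Σ is the set of all 8-bit byte values.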

Note: a sequence of consecutive regular characters comprises a single token.
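
To illustrate the note, the following sketch collects each maximal run of regular characters as one token. It is a simplification under stated assumptions: the name regular_tokens is hypothetical, and the code ignores context-sensitive cases such as the contents of strings and comments, where white-space and delimiter bytes do not end a token.

```python
# Bytes that end a run of regular characters: white space plus delimiters.
BREAKS = frozenset(b"\x00\t\n\x0c\r " + b"()<>[]{}/%")


def regular_tokens(data: bytes) -> list:
    """Return every maximal run of regular characters in data (hypothetical helper)."""
    tokens = []
    start = None  # index where the current run began, or None outside a run
    for i, byte in enumerate(data):
        if byte in BREAKS:
            if start is not None:
                tokens.append(data[start:i])
                start = None
        elif start is None:
            start = i
    if start is not None:  # flush a run that reaches the end of the input
        tokens.append(data[start:])
    return tokens
```

Applied to b"/Type /Page 42", the sketch yields the three tokens Type, Page, and 42; the slashes and spaces only separate them.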