Unit PasDoc_Tokenizer

Description

Simple Pascal tokenizer.

The TTokenizer object creates TToken objects (tokens) for the Pascal programming language from a character input stream.

The PasDoc_Scanner unit does the same (it uses this unit's tokenizer internally), except that it also evaluates compiler directives, which are comments that start with a dollar sign.

Uses

Overview

Classes, Interfaces, Objects and Records

Name               Description
Class TToken       Stores the exact type and additional information on one token.
Class TTokenizer   Converts an input TStream to a sequence of TToken objects.

Functions and Procedures

function StandardDirectiveByName(const Name: string): TStandardDirective;
function KeyWordByName(const Name: string): TKeyword;

Types

TTokenType = (...);
TKeyword = (...);
TStandardDirective = (...);
TStandardDirectives = set of TStandardDirective;
TSymbolType = (...);

Constants

TOKEN_TYPE_NAMES: array[TTokenType] of string = ( 'whitespace', 'comment ((**)-style)', 'comment ({}-style)', 'comment (///-style)', 'comment (//-style)', 'identifier', 'number', 'string', 'symbol', 'directive', 'reserved word', 'AT&T assembler register name');
TokenCommentTypes: set of TTokenType = [ TOK_COMMENT_PAS, TOK_COMMENT_EXT, TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE ];
SymbolNames: array[TSymbolType] of string = ( '+', '-', '*', '/', '=', '<', '<=', '>', '>=', '[', ']', ',', '(', ')', ':', ';', '^', '.', '@', '$', ':=', '..', '**', '\' );
KeyWordArray: array[Low(TKeyword)..High(TKeyword)] of string = ('x', 'AND', 'ARRAY', 'AS', 'ASM', 'BEGIN', 'CASE', 'CLASS', 'CONST', 'CONSTRUCTOR', 'DESTRUCTOR', 'DISPINTERFACE', 'DIV', 'DO', 'DOWNTO', 'ELSE', 'END', 'EXCEPT', 'EXPORTS', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INHERITED', 'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY', 'MOD', 'NIL', 'NOT', 'OBJECT', 'OF', 'ON', 'OR', 'PACKED', 'PROCEDURE', 'PROGRAM', 'PROPERTY', 'RAISE', 'RECORD', 'REPEAT', 'RESOURCESTRING', 'SET', 'SHL', 'SHR', 'STRING', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE', 'UNIT', 'UNTIL', 'USES', 'VAR', 'WHILE', 'WITH', 'XOR');
StandardDirectiveArray: array[Low(TStandardDirective)..High(TStandardDirective)] of PChar = ('x', 'ABSOLUTE', 'ABSTRACT', 'APIENTRY', 'ASSEMBLER', 'AUTOMATED', 'CDECL', 'CVAR', 'DEFAULT', 'DISPID', 'DYNAMIC', 'EXPERIMENTAL', 'EXPORT', 'EXTERNAL', 'FAR', 'FORWARD', 'GENERIC', 'HELPER', 'INDEX', 'INLINE', 'MESSAGE', 'NAME', 'NEAR', 'NODEFAULT', 'OPERATOR', 'OUT', 'OVERLOAD', 'OVERRIDE', 'PASCAL', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'READ', 'REFERENCE', 'REGISTER', 'REINTRODUCE', 'RESIDENT', 'SEALED', 'SPECIALIZE', 'STATIC', 'STDCALL', 'STORED', 'STRICT', 'VIRTUAL', 'WRITE', 'DEPRECATED', 'SAFECALL', 'PLATFORM', 'VARARGS', 'FINAL');

Description

Functions and Procedures

function StandardDirectiveByName(const Name: string): TStandardDirective;

Checks whether Name (case ignored) is a Pascal standard directive. Returns SD_INVALIDSTANDARDDIRECTIVE if it is not.

function KeyWordByName(const Name: string): TKeyword;

Checks whether Name (case ignored) is a Pascal keyword. Returns KEY_INVALIDKEYWORD if it is not.
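A minimal sketch of how the two lookup functions might be called. It assumes the PasDoc_Tokenizer unit is on the compiler's unit search path; the program name LookupDemo and the sample names are chosen only for illustration.

```pascal
program LookupDemo;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  PasDoc_Tokenizer;

begin
  // The lookup ignores case, so 'begin' and 'BEGIN' are treated alike.
  if KeyWordByName('begin') = KEY_BEGIN then
    WriteLn('"begin" is a keyword');

  // A name that is not a keyword yields the sentinel value.
  if KeyWordByName('Register') = KEY_INVALIDKEYWORD then
    WriteLn('"Register" is not a keyword');

  // The same name is, however, a standard directive.
  if StandardDirectiveByName('Register') = SD_REGISTER then
    WriteLn('"Register" is a standard directive');
end.
```

Comparing against the sentinel values KEY_INVALIDKEYWORD and SD_INVALIDSTANDARDDIRECTIVE is the way to detect "no match", since both functions always return a value of the enumeration type.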

Types

TTokenType = (...);

Enumeration type that provides all types of tokens; each value's name starts with TOK_.

TOK_DIRECTIVE is a compiler directive (like $ifdef, $define).

Note that the tokenizer cannot tell whether a standard directive (e.g. 'Register') is used as an identifier (e.g. you are declaring a procedure named 'Register') or as a real standard directive (e.g. the calling convention specifier 'register'). So there is no value like TOK_STANDARD_DIRECTIVE here; standard directives are always reported as TOK_IDENTIFIER. Check TToken.Info.StandardDirective to find out whether an identifier may be used as a real standard directive.

Values
  • TOK_WHITESPACE
  • TOK_COMMENT_PAS
  • TOK_COMMENT_EXT
  • TOK_COMMENT_HELPINSIGHT
  • TOK_COMMENT_CSTYLE
  • TOK_IDENTIFIER
  • TOK_NUMBER
  • TOK_STRING
  • TOK_SYMBOL
  • TOK_DIRECTIVE
  • TOK_KEYWORD
  • TOK_ATT_ASSEMBLER_REGISTER
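The note on standard directives can be sketched as a small helper. This is an illustration under assumptions, not PasDoc's own code: the unit name DirectiveCheck and the function MayBeDirective are hypothetical, and the field name MyType for the token's TTokenType is an assumption that this page does not confirm. Only TToken.Info.StandardDirective is taken from the text above.

```pascal
unit DirectiveCheck;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

interface

uses
  PasDoc_Tokenizer;

// Hypothetical helper: True when T is an identifier token whose text
// matches the standard directive D. The caller still has to decide from
// context whether the identifier is really used as a directive.
function MayBeDirective(const T: TToken; const D: TStandardDirective): Boolean;

implementation

function MayBeDirective(const T: TToken; const D: TStandardDirective): Boolean;
begin
  // MyType is an assumed field name holding the token's TTokenType.
  Result := (T.MyType = TOK_IDENTIFIER) and (T.Info.StandardDirective = D);
end;

end.
```

A True result only means the identifier could be a directive; a procedure named 'Register' would pass the same check for SD_REGISTER.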
TKeyword = (...);
 
Values
  • KEY_INVALIDKEYWORD
  • KEY_AND
  • KEY_ARRAY
  • KEY_AS
  • KEY_ASM
  • KEY_BEGIN
  • KEY_CASE
  • KEY_CLASS
  • KEY_CONST
  • KEY_CONSTRUCTOR
  • KEY_DESTRUCTOR
  • KEY_DISPINTERFACE
  • KEY_DIV
  • KEY_DO
  • KEY_DOWNTO
  • KEY_ELSE
  • KEY_END
  • KEY_EXCEPT
  • KEY_EXPORTS
  • KEY_FILE
  • KEY_FINALIZATION
  • KEY_FINALLY
  • KEY_FOR
  • KEY_FUNCTION
  • KEY_GOTO
  • KEY_IF
  • KEY_IMPLEMENTATION
  • KEY_IN
  • KEY_INHERITED
  • KEY_INITIALIZATION
  • KEY_INLINE
  • KEY_INTERFACE
  • KEY_IS
  • KEY_LABEL
  • KEY_LIBRARY
  • KEY_MOD
  • KEY_NIL
  • KEY_NOT
  • KEY_OBJECT
  • KEY_OF
  • KEY_ON
  • KEY_OR
  • KEY_PACKED
  • KEY_PROCEDURE
  • KEY_PROGRAM
  • KEY_PROPERTY
  • KEY_RAISE
  • KEY_RECORD
  • KEY_REPEAT
  • KEY_RESOURCESTRING
  • KEY_SET
  • KEY_SHL
  • KEY_SHR
  • KEY_STRING
  • KEY_THEN
  • KEY_THREADVAR
  • KEY_TO
  • KEY_TRY
  • KEY_TYPE
  • KEY_UNIT
  • KEY_UNTIL
  • KEY_USES
  • KEY_VAR
  • KEY_WHILE
  • KEY_WITH
  • KEY_XOR
TStandardDirective = (...);
 
Values
  • SD_INVALIDSTANDARDDIRECTIVE
  • SD_ABSOLUTE
  • SD_ABSTRACT
  • SD_APIENTRY
  • SD_ASSEMBLER
  • SD_AUTOMATED
  • SD_CDECL
  • SD_CVAR
  • SD_DEFAULT
  • SD_DISPID
  • SD_DYNAMIC
  • SD_EXPERIMENTAL
  • SD_EXPORT
  • SD_EXTERNAL
  • SD_FAR
  • SD_FORWARD
  • SD_GENERIC
  • SD_HELPER
  • SD_INDEX
  • SD_INLINE
  • SD_MESSAGE
  • SD_NAME
  • SD_NEAR
  • SD_NODEFAULT
  • SD_OPERATOR
  • SD_OUT
  • SD_OVERLOAD
  • SD_OVERRIDE
  • SD_PASCAL
  • SD_PRIVATE
  • SD_PROTECTED
  • SD_PUBLIC
  • SD_PUBLISHED
  • SD_READ
  • SD_REFERENCE
  • SD_REGISTER
  • SD_REINTRODUCE
  • SD_RESIDENT
  • SD_SEALED
  • SD_SPECIALIZE
  • SD_STATIC
  • SD_STDCALL
  • SD_STORED
  • SD_STRICT
  • SD_VIRTUAL
  • SD_WRITE
  • SD_DEPRECATED
  • SD_SAFECALL
  • SD_PLATFORM
  • SD_VARARGS
  • SD_FINAL
TStandardDirectives = set of TStandardDirective;
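A set type like this makes it easy to test a directive against a group. The sketch below assumes the PasDoc_Tokenizer unit is available; the constant CallingConventions and its contents are a hypothetical grouping made up for this example, not something PasDoc defines.

```pascal
program CallingConventionDemo;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  PasDoc_Tokenizer;

const
  // Hypothetical grouping chosen for this example; not defined by PasDoc.
  CallingConventions: TStandardDirectives =
    [SD_CDECL, SD_PASCAL, SD_REGISTER, SD_SAFECALL, SD_STDCALL];

var
  D: TStandardDirective;
begin
  D := StandardDirectiveByName('stdcall');
  // Set membership replaces a chain of equality comparisons.
  if D in CallingConventions then
    WriteLn('stdcall belongs to the example set');
end.
```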
 
TSymbolType = (...);

Enumeration type that provides all types of symbols; each value's name starts with SYM_.

Values
  • SYM_PLUS
  • SYM_MINUS
  • SYM_ASTERISK
  • SYM_SLASH
  • SYM_EQUAL
  • SYM_LESS_THAN
  • SYM_LESS_THAN_EQUAL
  • SYM_GREATER_THAN
  • SYM_GREATER_THAN_EQUAL
  • SYM_LEFT_BRACKET
  • SYM_RIGHT_BRACKET
  • SYM_COMMA
  • SYM_LEFT_PARENTHESIS
  • SYM_RIGHT_PARENTHESIS
  • SYM_COLON
  • SYM_SEMICOLON
  • SYM_DEREFERENCE
  • SYM_PERIOD
  • SYM_AT
  • SYM_DOLLAR
  • SYM_ASSIGN
  • SYM_RANGE
  • SYM_POWER
  • SYM_BACKSLASH: SYM_BACKSLASH may occur when writing char constant "^\", see ../../tests/ok_caret_character.pas

Constants

TOKEN_TYPE_NAMES: array[TTokenType] of string = ( 'whitespace', 'comment ((**)-style)', 'comment ({}-style)', 'comment (///-style)', 'comment (//-style)', 'identifier', 'number', 'string', 'symbol', 'directive', 'reserved word', 'AT&T assembler register name');

Names of the token types. All start with a lowercase letter. Each should describe the given TTokenType in a few short words.

TokenCommentTypes: set of TTokenType = [ TOK_COMMENT_PAS, TOK_COMMENT_EXT, TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE ];
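As a rough illustration of how TOKEN_TYPE_NAMES and TokenCommentTypes might be used together, the following sketch lists every token type and marks the comment types. It assumes the PasDoc_Tokenizer unit is on the unit search path; the program name is arbitrary.

```pascal
program TokenTypeNamesDemo;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  PasDoc_Tokenizer;

var
  T: TTokenType;
begin
  // Walk the whole enumeration; the array is indexed by TTokenType.
  for T := Low(TTokenType) to High(TTokenType) do
    if T in TokenCommentTypes then
      WriteLn(TOKEN_TYPE_NAMES[T], ' (comment)')
    else
      WriteLn(TOKEN_TYPE_NAMES[T]);
end.
```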
 
SymbolNames: array[TSymbolType] of string = ( '+', '-', '*', '/', '=', '<', '<=', '>', '>=', '[', ']', ',', '(', ')', ':', ';', '^', '.', '@', '$', ':=', '..', '**', '\' );

Symbols as strings. This provides a mapping from TSymbolType to string, but remember that some symbols have multiple possible representations in the tokenizer, e.g. "right bracket" is usually given as "]" but can also be written as ".)".
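A short sketch of the TSymbolType to string mapping, assuming the PasDoc_Tokenizer unit is available. It prints one line per symbol; note that the table holds a single spelling per symbol, so an alternative spelling found in source code is not recoverable from it.

```pascal
program SymbolNamesDemo;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  PasDoc_Tokenizer;

var
  S: TSymbolType;
begin
  // Print the ordinal of each symbol type next to its string form.
  for S := Low(TSymbolType) to High(TSymbolType) do
    WriteLn(Ord(S), ': ', SymbolNames[S]);

  // Direct lookup of a single symbol.
  WriteLn('Assignment operator: ', SymbolNames[SYM_ASSIGN]);
end.
```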

KeyWordArray: array[Low(TKeyword)..High(TKeyword)] of string = ('x', 'AND', 'ARRAY', 'AS', 'ASM', 'BEGIN', 'CASE', 'CLASS', 'CONST', 'CONSTRUCTOR', 'DESTRUCTOR', 'DISPINTERFACE', 'DIV', 'DO', 'DOWNTO', 'ELSE', 'END', 'EXCEPT', 'EXPORTS', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INHERITED', 'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY', 'MOD', 'NIL', 'NOT', 'OBJECT', 'OF', 'ON', 'OR', 'PACKED', 'PROCEDURE', 'PROGRAM', 'PROPERTY', 'RAISE', 'RECORD', 'REPEAT', 'RESOURCESTRING', 'SET', 'SHL', 'SHR', 'STRING', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE', 'UNIT', 'UNTIL', 'USES', 'VAR', 'WHILE', 'WITH', 'XOR');

All Object Pascal keywords. The first entry, 'x', is a placeholder for KEY_INVALIDKEYWORD.
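One plausible way to search this array by name is a linear scan over the upper-cased input. The function FindKeyword below is a hypothetical stand-in written for illustration; it is not claimed to be how KeyWordByName is implemented in PasDoc.

```pascal
program KeywordScan;

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  SysUtils, PasDoc_Tokenizer;

// Hypothetical helper: case-insensitive linear search in KeyWordArray.
function FindKeyword(const Name: string): TKeyword;
var
  K: TKeyword;
  UpperName: string;
begin
  UpperName := UpperCase(Name);
  // Start after Low(TKeyword): that slot belongs to KEY_INVALIDKEYWORD
  // and holds a placeholder string rather than a real keyword.
  for K := Succ(Low(TKeyword)) to High(TKeyword) do
    if KeyWordArray[K] = UpperName then
    begin
      Result := K;
      Exit;
    end;
  Result := KEY_INVALIDKEYWORD;
end;

begin
  if FindKeyword('while') = KEY_WHILE then
    WriteLn('found WHILE');
  if FindKeyword('x') = KEY_INVALIDKEYWORD then
    WriteLn('the placeholder entry is not treated as a keyword');
end.
```

Skipping the first slot matters: a scan that started at Low(TKeyword) would report the placeholder string as a match.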

StandardDirectiveArray: array[Low(TStandardDirective)..High(TStandardDirective)] of PChar = ('x', 'ABSOLUTE', 'ABSTRACT', 'APIENTRY', 'ASSEMBLER', 'AUTOMATED', 'CDECL', 'CVAR', 'DEFAULT', 'DISPID', 'DYNAMIC', 'EXPERIMENTAL', 'EXPORT', 'EXTERNAL', 'FAR', 'FORWARD', 'GENERIC', 'HELPER', 'INDEX', 'INLINE', 'MESSAGE', 'NAME', 'NEAR', 'NODEFAULT', 'OPERATOR', 'OUT', 'OVERLOAD', 'OVERRIDE', 'PASCAL', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'READ', 'REFERENCE', 'REGISTER', 'REINTRODUCE', 'RESIDENT', 'SEALED', 'SPECIALIZE', 'STATIC', 'STDCALL', 'STORED', 'STRICT', 'VIRTUAL', 'WRITE', 'DEPRECATED', 'SAFECALL', 'PLATFORM', 'VARARGS', 'FINAL');

All Object Pascal standard directives. The first entry, 'x', is a placeholder for SD_INVALIDSTANDARDDIRECTIVE.

Authors


Generated by PasDoc 0.16.0.