Unit PasDoc_Tokenizer

Description

Simple Pascal tokenizer.

The TTokenizer object creates TToken objects (tokens) for the Pascal programming language from a character input stream.

The PasDoc_Scanner unit does the same (it actually uses this unit's tokenizer), except that it also evaluates compiler directives, i.e. comments that start with a dollar sign.

Source: source/component/PasDoc_Tokenizer.pas (line 38).
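The directive/comment distinction described above can be sketched as a check on the raw comment text. This is a minimal, hypothetical helper (DemoIsDirectiveComment is not part of PasDoc). It assumes the comment text still includes its opening delimiter, and it does not show how PasDoc_Scanner actually evaluates directives.

```pascal
program DirectiveCommentSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: treats a comment as a compiler directive when its
// text starts with a brace-dollar or a paren-star-dollar opener.
function DemoIsDirectiveComment(const CommentText: string): Boolean;
begin
  Result := (Copy(CommentText, 1, 2) = '{$') or
            (Copy(CommentText, 1, 3) = '(*$');
end;

begin
  WriteLn(DemoIsDirectiveComment('{$ifdef FPC}'));       // TRUE
  WriteLn(DemoIsDirectiveComment('(*$mode objfpc*)'));   // TRUE
  WriteLn(DemoIsDirectiveComment('{ plain comment }'));  // FALSE
end.
```

Line comments are used inside the program on purpose: a brace comment whose text contains a brace-dollar sequence could be misread by the compiler.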

Uses

Overview

Classes, Interfaces, Objects and Records

  • Class TToken: Stores the exact type and additional information about one token.
  • Class TTokenizer: Converts an input TStream to a sequence of TToken objects.

Functions and Procedures

function StandardDirectiveByName(const Name: string): TStandardDirective;
function KeyWordByName(const Name: string): TKeyword;
function IsIdentifierStartChar(const C: Char): Boolean;
function IsIdentifierOtherChar(const C: Char): Boolean;

Types

TTokenType = (...);
TKeyword = (...);
TStandardDirective = (...);
TStandardDirectives = set of TStandardDirective;
TSymbolType = (...);
TTokenList = specialize TObjectList<TToken>;

Constants

TOKEN_TYPE_NAMES: array[TTokenType] of string = ( 'whitespace', 'comment ((**)-style)', 'comment ({}-style)', 'comment (///-style)', 'comment (//-style)', 'identifier', 'number', 'string', 'symbol', 'directive', 'reserved word', 'AT&T assembler register name', 'double-quoted string');
TokenCommentTypes: set of TTokenType = [ TOK_COMMENT_PAS, TOK_COMMENT_EXT, TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE ];
SymbolNames: array[TSymbolType] of string = ( '+', '-', '*', '/', '=', '<', '<=', '>', '>=', '[', ']', ',', '(', ')', ':', ';', '^', '.', '@', '$', ':=', '..', '**', '\', '"' );
KeyWordArray: array[Low(TKeyword)..High(TKeyword)] of string = ('x', 'AND', 'ARRAY', 'AS', 'ASM', 'BEGIN', 'CASE', 'CLASS', 'OBJCCLASS', 'CONST', 'CONSTRUCTOR', 'DESTRUCTOR', 'DISPINTERFACE', 'DIV', 'DO', 'DOWNTO', 'ELSE', 'END', 'EXCEPT', 'EXPORTS', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INHERITED', 'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY', 'MOD', 'NIL', 'NOT', 'OBJECT', 'OF', 'OR', 'PACKED', 'PROCEDURE', 'PROGRAM', 'PROPERTY', 'RAISE', 'RECORD', 'REPEAT', 'RESOURCESTRING', 'SET', 'SHL', 'SHR', 'STRING', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE', 'UNIT', 'UNTIL', 'USES', 'VAR', 'WHILE', 'WITH', 'XOR');
StandardDirectiveArray: array[Low(TStandardDirective)..High(TStandardDirective)] of PChar = ('x', 'ABSOLUTE', 'ABSTRACT', 'APIENTRY', 'ASSEMBLER', 'AUTOMATED', 'CDECL', 'CVAR', 'DEFAULT', 'DISPID', 'DYNAMIC', 'EXPERIMENTAL', 'EXPORT', 'EXTERNAL', 'FAR', 'FORWARD', 'GENERIC', 'HELPER', 'INDEX', 'INLINE', 'MESSAGE', 'NAME', 'NEAR', 'NODEFAULT', 'NORETURN', 'ON', 'OPERATOR', 'OUT', 'OVERLOAD', 'OVERRIDE', 'PASCAL', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'READ', 'REFERENCE', 'REGISTER', 'REINTRODUCE', 'RESIDENT', 'SEALED', 'SPECIALIZE', 'STATIC', 'STDCALL', 'STORED', 'STRICT', 'VIRTUAL', 'WRITE', 'DEPRECATED', 'SAFECALL', 'PLATFORM', 'VARARGS', 'FINAL', 'UNIMPLEMENTED');

Description

Functions and Procedures

function StandardDirectiveByName(const Name: string): TStandardDirective;

Checks whether Name (case ignored) is a Pascal standard directive. Returns SD_INVALIDSTANDARDDIRECTIVE if not.

Source: source/component/PasDoc_Tokenizer.pas (line 490).

function KeyWordByName(const Name: string): TKeyword;

Checks whether Name (case ignored) is a Pascal keyword. Returns KEY_INVALIDKEYWORD if not.

Source: source/component/PasDoc_Tokenizer.pas (line 494).
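A minimal sketch of a case-insensitive lookup like the one described for KeyWordByName; StandardDirectiveByName would work the same way over StandardDirectiveArray. TDemoKeyword, DemoKeywordArray and DemoKeywordByName are hypothetical, cut-down stand-ins. The linear search is only an assumption; the real unit may use a different lookup strategy.

```pascal
program KeywordLookupSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical, cut-down copy of TKeyword for illustration only.
  TDemoKeyword = (DKEY_INVALID, DKEY_AND, DKEY_BEGIN, DKEY_END, DKEY_UNIT);

const
  // Index 0 holds a placeholder, mirroring the 'x' entry in KeyWordArray.
  DemoKeywordArray: array[TDemoKeyword] of string =
    ('x', 'AND', 'BEGIN', 'END', 'UNIT');

// Linear, case-insensitive search that skips the placeholder entry;
// returns DKEY_INVALID when Name is not a keyword.
function DemoKeywordByName(const Name: string): TDemoKeyword;
var
  K: TDemoKeyword;
  UpperName: string;
begin
  UpperName := UpperCase(Name);
  for K := Succ(Low(TDemoKeyword)) to High(TDemoKeyword) do
    if DemoKeywordArray[K] = UpperName then
      Exit(K);
  Result := DKEY_INVALID;
end;

begin
  WriteLn(Ord(DemoKeywordByName('begin')));  // 2
  WriteLn(Ord(DemoKeywordByName('Unit')));   // 4
  WriteLn(Ord(DemoKeywordByName('foo')));    // 0
end.
```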

function IsIdentifierStartChar(const C: Char): Boolean;

Returns True if C is a valid first character of a Pascal identifier: an underscore, an ASCII letter, or a non-ASCII character (Unicode letter). According to https://docwiki.embarcadero.com/RADStudio/Athens/en/Identifiers, Delphi allows Unicode letters as identifier start characters.

Source: source/component/PasDoc_Tokenizer.pas (line 500).

function IsIdentifierOtherChar(const C: Char): Boolean;

Returns True if C is a valid continuation (non-initial) character of a Pascal identifier: an underscore, an ASCII letter, an ASCII digit, or a non-ASCII character (Unicode letter or digit).

Source: source/component/PasDoc_Tokenizer.pas (line 504).
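The two rules above can be sketched as follows, assuming a 1-byte Char (Free Pascal's default with {$H+}) and UTF-8 encoded source text. Every byte of a multi-byte UTF-8 sequence has an ordinal value of 128 or more, so the "non-ASCII" test accepts such bytes one at a time. The Demo* names are hypothetical; this is not PasDoc's implementation.

```pascal
program IdentifierCharsSketch;

{$mode objfpc}{$H+}

// Start rule: underscore, ASCII letter, or any non-ASCII byte.
function DemoIsIdentifierStartChar(const C: Char): Boolean;
begin
  Result := (C = '_') or (C in ['A'..'Z', 'a'..'z']) or (Ord(C) >= 128);
end;

// Continuation rule: the start rule plus ASCII digits.
function DemoIsIdentifierOtherChar(const C: Char): Boolean;
begin
  Result := DemoIsIdentifierStartChar(C) or (C in ['0'..'9']);
end;

// Checks a whole identifier using the two predicates.
function DemoIsValidIdentifier(const S: string): Boolean;
var
  I: Integer;
begin
  // Short-circuit evaluation keeps S[1] from being read when S is empty.
  Result := (Length(S) > 0) and DemoIsIdentifierStartChar(S[1]);
  if not Result then
    Exit;
  for I := 2 to Length(S) do
    if not DemoIsIdentifierOtherChar(S[I]) then
      Exit(False);
end;

begin
  WriteLn(DemoIsValidIdentifier('_Count2'));  // TRUE
  WriteLn(DemoIsValidIdentifier('2Count'));   // FALSE
  WriteLn(DemoIsValidIdentifier('my-var'));   // FALSE
end.
```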

Types

TTokenType = (...);

All types of tokens.

Note that the tokenizer cannot tell whether a standard directive (e.g. 'Register') is used as an identifier (e.g. when declaring a procedure named 'Register') or as a real standard directive (e.g. the calling convention 'register'). Therefore there is no TOK_STANDARD_DIRECTIVE token type; standard directives are always reported as TOK_IDENTIFIER. Check TToken.Info.StandardDirective to find out whether the identifier might be used as a real standard directive.

Values
  • TOK_WHITESPACE
  • TOK_COMMENT_PAS
  • TOK_COMMENT_EXT
  • TOK_COMMENT_HELPINSIGHT
  • TOK_COMMENT_CSTYLE
  • TOK_IDENTIFIER
  • TOK_NUMBER
  • TOK_STRING
  • TOK_SYMBOL
  • TOK_DIRECTIVE: Compiler directive, like $ifdef
  • TOK_KEYWORD
  • TOK_ATT_ASSEMBLER_REGISTER
  • TOK_DOUBLE_QUOTED_STRING: Possible in assembler blocks, see ../../tests/ok_asm.pas

Source: source/component/PasDoc_Tokenizer.pas (line 61).
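For example, in the following Free Pascal program the word Register is used both as a procedure name and as a calling convention. Per the note above, the tokenizer would report every occurrence as TOK_IDENTIFIER. The program is only an illustration of the input; it does not use PasDoc itself.

```pascal
program DirectiveAsIdentifier;

{$mode objfpc}{$H+}

// Here 'Register' is an ordinary identifier: the name of a procedure.
procedure Register;
begin
  WriteLn('Register called');
end;

// Here 'register' is a standard directive: the calling convention.
function Add(A, B: Integer): Integer; register;
begin
  Result := A + B;
end;

begin
  Register;
  WriteLn(Add(2, 3));
end.
```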

TKeyword = (...);

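Object Pascal reserved words recognized by the tokenizer. KEY_INVALIDKEYWORD means "not a keyword"; see KeyWordByName and KeyWordArray.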

Values
  • KEY_INVALIDKEYWORD
  • KEY_AND
  • KEY_ARRAY
  • KEY_AS
  • KEY_ASM
  • KEY_BEGIN
  • KEY_CASE
  • KEY_CLASS
  • KEY_OBJCCLASS
  • KEY_CONST
  • KEY_CONSTRUCTOR
  • KEY_DESTRUCTOR
  • KEY_DISPINTERFACE
  • KEY_DIV
  • KEY_DO
  • KEY_DOWNTO
  • KEY_ELSE
  • KEY_END
  • KEY_EXCEPT
  • KEY_EXPORTS
  • KEY_FILE
  • KEY_FINALIZATION
  • KEY_FINALLY
  • KEY_FOR
  • KEY_FUNCTION
  • KEY_GOTO
  • KEY_IF
  • KEY_IMPLEMENTATION
  • KEY_IN
  • KEY_INHERITED
  • KEY_INITIALIZATION
  • KEY_INLINE
  • KEY_INTERFACE
  • KEY_IS
  • KEY_LABEL
  • KEY_LIBRARY
  • KEY_MOD
  • KEY_NIL
  • KEY_NOT
  • KEY_OBJECT
  • KEY_OF
  • KEY_OR
  • KEY_PACKED
  • KEY_PROCEDURE
  • KEY_PROGRAM
  • KEY_PROPERTY
  • KEY_RAISE
  • KEY_RECORD
  • KEY_REPEAT
  • KEY_RESOURCESTRING
  • KEY_SET
  • KEY_SHL
  • KEY_SHR
  • KEY_STRING
  • KEY_THEN
  • KEY_THREADVAR
  • KEY_TO
  • KEY_TRY
  • KEY_TYPE
  • KEY_UNIT
  • KEY_UNTIL
  • KEY_USES
  • KEY_VAR
  • KEY_WHILE
  • KEY_WITH
  • KEY_XOR

Source: source/component/PasDoc_Tokenizer.pas (line 80).

TStandardDirective = (...);

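Object Pascal standard directives, i.e. words such as 'register' or 'override' that have a special meaning only in certain contexts. SD_INVALIDSTANDARDDIRECTIVE means "not a standard directive"; see StandardDirectiveByName and StandardDirectiveArray.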

Values
  • SD_INVALIDSTANDARDDIRECTIVE
  • SD_ABSOLUTE
  • SD_ABSTRACT
  • SD_APIENTRY
  • SD_ASSEMBLER
  • SD_AUTOMATED
  • SD_CDECL
  • SD_CVAR
  • SD_DEFAULT
  • SD_DISPID
  • SD_DYNAMIC
  • SD_EXPERIMENTAL
  • SD_EXPORT
  • SD_EXTERNAL
  • SD_FAR
  • SD_FORWARD
  • SD_GENERIC
  • SD_HELPER
  • SD_INDEX
  • SD_INLINE
  • SD_MESSAGE
  • SD_NAME
  • SD_NEAR
  • SD_NODEFAULT
  • SD_NORETURN
  • SD_ON
  • SD_OPERATOR
  • SD_OUT
  • SD_OVERLOAD
  • SD_OVERRIDE
  • SD_PASCAL
  • SD_PRIVATE
  • SD_PROTECTED
  • SD_PUBLIC
  • SD_PUBLISHED
  • SD_READ
  • SD_REFERENCE
  • SD_REGISTER
  • SD_REINTRODUCE
  • SD_RESIDENT
  • SD_SEALED
  • SD_SPECIALIZE
  • SD_STATIC
  • SD_STDCALL
  • SD_STORED
  • SD_STRICT
  • SD_VIRTUAL
  • SD_WRITE
  • SD_DEPRECATED
  • SD_SAFECALL
  • SD_PLATFORM
  • SD_VARARGS
  • SD_FINAL
  • SD_UNIMPLEMENTED

Source: source/component/PasDoc_Tokenizer.pas (line 149).

TStandardDirectives = set of TStandardDirective;

A set of TStandardDirective values.

Source: source/component/PasDoc_Tokenizer.pas (line 206).

TSymbolType = (...);

Enumeration of all symbol types; each value's name starts with SYM_.

Values
  • SYM_PLUS
  • SYM_MINUS
  • SYM_ASTERISK
  • SYM_SLASH
  • SYM_EQUAL
  • SYM_LESS_THAN
  • SYM_LESS_THAN_EQUAL
  • SYM_GREATER_THAN
  • SYM_GREATER_THAN_EQUAL
  • SYM_LEFT_BRACKET
  • SYM_RIGHT_BRACKET
  • SYM_COMMA
  • SYM_LEFT_PARENTHESIS
  • SYM_RIGHT_PARENTHESIS
  • SYM_COLON
  • SYM_SEMICOLON
  • SYM_DEREFERENCE
  • SYM_PERIOD
  • SYM_AT
  • SYM_DOLLAR
  • SYM_ASSIGN
  • SYM_RANGE
  • SYM_POWER
  • SYM_BACKSLASH: May occur when writing the char constant "^\", see ../../tests/ok_caret_character.pas
  • SYM_DOUBLE_QUOTE: May occur in assembler blocks, see ../../tests/ok_asm.pas

Source: source/component/PasDoc_Tokenizer.pas (line 226).

TTokenList = specialize TObjectList<TToken>;

A list of TToken objects.

Source: source/component/PasDoc_Tokenizer.pas (line 333).

Constants

TOKEN_TYPE_NAMES: array[TTokenType] of string = ( 'whitespace', 'comment ((**)-style)', 'comment ({}-style)', 'comment (///-style)', 'comment (//-style)', 'identifier', 'number', 'string', 'symbol', 'directive', 'reserved word', 'AT&T assembler register name', 'double-quoted string');

Names of the token types. All start with a lowercase letter. Each one briefly describes (in a few words) the corresponding TTokenType.

Source: source/component/PasDoc_Tokenizer.pas (line 212).

TokenCommentTypes: set of TTokenType = [ TOK_COMMENT_PAS, TOK_COMMENT_EXT, TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE ];

The token types that represent comments, in any of the four supported comment styles ((**), {}, /// and //).

Source: source/component/PasDoc_Tokenizer.pas (line 218).
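A sketch of how TOKEN_TYPE_NAMES and TokenCommentTypes can be used together: iterate over all token types and print the names of those in the comment set. To stay self-contained, it redeclares copies of the type and constants under hypothetical Demo* names (same order as documented above) instead of using PasDoc_Tokenizer.

```pascal
program CommentTypesSketch;

{$mode objfpc}{$H+}

type
  // Copy of the TTokenType values, in the documented order.
  TDemoTokenType = (TOK_WHITESPACE, TOK_COMMENT_PAS, TOK_COMMENT_EXT,
    TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE, TOK_IDENTIFIER, TOK_NUMBER,
    TOK_STRING, TOK_SYMBOL, TOK_DIRECTIVE, TOK_KEYWORD,
    TOK_ATT_ASSEMBLER_REGISTER, TOK_DOUBLE_QUOTED_STRING);

const
  // Copy of TOKEN_TYPE_NAMES.
  DemoTokenTypeNames: array[TDemoTokenType] of string = (
    'whitespace', 'comment ((**)-style)', 'comment ({}-style)',
    'comment (///-style)', 'comment (//-style)', 'identifier', 'number',
    'string', 'symbol', 'directive', 'reserved word',
    'AT&T assembler register name', 'double-quoted string');

  // Copy of TokenCommentTypes.
  DemoTokenCommentTypes: set of TDemoTokenType = [TOK_COMMENT_PAS,
    TOK_COMMENT_EXT, TOK_COMMENT_HELPINSIGHT, TOK_COMMENT_CSTYLE];

var
  T: TDemoTokenType;
begin
  // Print the name of every token type that is a comment.
  for T := Low(TDemoTokenType) to High(TDemoTokenType) do
    if T in DemoTokenCommentTypes then
      WriteLn(DemoTokenTypeNames[T]);
end.
```

Testing membership with the in operator keeps the "is this a comment?" check in one place instead of comparing against four token types individually.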

SymbolNames: array[TSymbolType] of string = ( '+', '-', '*', '/', '=', '<', '<=', '>', '>=', '[', ']', ',', '(', ')', ':', ';', '^', '.', '@', '$', ':=', '..', '**', '\', '"' );

Symbols as strings. This array is useful as a mapping TSymbolType -> string, but remember that the tokenizer accepts multiple representations for some symbols, e.g. "right bracket" is usually written as "]" but can also be written as ".)".

Source: source/component/PasDoc_Tokenizer.pas (line 244).
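Because SymbolNames holds only one spelling per symbol, code that maps source text back to a TSymbolType has to account for the alternative spellings first. A minimal sketch covering only the bracket digraphs "(." and ".)"; DemoCanonicalSymbol is a hypothetical helper, not part of PasDoc.

```pascal
program SymbolDigraphSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: maps the digraph spellings of the brackets to
// their usual form; any other symbol text is returned unchanged.
function DemoCanonicalSymbol(const Text: string): string;
begin
  if Text = '(.' then
    Result := '['
  else if Text = '.)' then
    Result := ']'
  else
    Result := Text;
end;

begin
  WriteLn(DemoCanonicalSymbol('(.'));  // [
  WriteLn(DemoCanonicalSymbol('.)'));  // ]
  WriteLn(DemoCanonicalSymbol(':='));  // :=
end.
```

After this normalization, the text can be compared against the entries of SymbolNames.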

KeyWordArray: array[Low(TKeyword)..High(TKeyword)] of string = ('x', 'AND', 'ARRAY', 'AS', 'ASM', 'BEGIN', 'CASE', 'CLASS', 'OBJCCLASS', 'CONST', 'CONSTRUCTOR', 'DESTRUCTOR', 'DISPINTERFACE', 'DIV', 'DO', 'DOWNTO', 'ELSE', 'END', 'EXCEPT', 'EXPORTS', 'FILE', 'FINALIZATION', 'FINALLY', 'FOR', 'FUNCTION', 'GOTO', 'IF', 'IMPLEMENTATION', 'IN', 'INHERITED', 'INITIALIZATION', 'INLINE', 'INTERFACE', 'IS', 'LABEL', 'LIBRARY', 'MOD', 'NIL', 'NOT', 'OBJECT', 'OF', 'OR', 'PACKED', 'PROCEDURE', 'PROGRAM', 'PROPERTY', 'RAISE', 'RECORD', 'REPEAT', 'RESOURCESTRING', 'SET', 'SHL', 'SHR', 'STRING', 'THEN', 'THREADVAR', 'TO', 'TRY', 'TYPE', 'UNIT', 'UNTIL', 'USES', 'VAR', 'WHILE', 'WITH', 'XOR');

All Object Pascal keywords. The first entry, 'x', is a placeholder corresponding to KEY_INVALIDKEYWORD.

Source: source/component/PasDoc_Tokenizer.pas (line 461).

StandardDirectiveArray: array[Low(TStandardDirective)..High(TStandardDirective)] of PChar = ('x', 'ABSOLUTE', 'ABSTRACT', 'APIENTRY', 'ASSEMBLER', 'AUTOMATED', 'CDECL', 'CVAR', 'DEFAULT', 'DISPID', 'DYNAMIC', 'EXPERIMENTAL', 'EXPORT', 'EXTERNAL', 'FAR', 'FORWARD', 'GENERIC', 'HELPER', 'INDEX', 'INLINE', 'MESSAGE', 'NAME', 'NEAR', 'NODEFAULT', 'NORETURN', 'ON', 'OPERATOR', 'OUT', 'OVERLOAD', 'OVERRIDE', 'PASCAL', 'PRIVATE', 'PROTECTED', 'PUBLIC', 'PUBLISHED', 'READ', 'REFERENCE', 'REGISTER', 'REINTRODUCE', 'RESIDENT', 'SEALED', 'SPECIALIZE', 'STATIC', 'STDCALL', 'STORED', 'STRICT', 'VIRTUAL', 'WRITE', 'DEPRECATED', 'SAFECALL', 'PLATFORM', 'VARARGS', 'FINAL', 'UNIMPLEMENTED');

Object Pascal standard directives. The first entry, 'x', is a placeholder corresponding to SD_INVALIDSTANDARDDIRECTIVE.

Source: source/component/PasDoc_Tokenizer.pas (line 475).

Authors


Generated by PasDoc 1.0.2.