Appendix: Internationalization Considerations

Unicode provides a mechanism for signaling direction within a string (see the Unicode Bidirectional Algorithm, UAX #9). However, when the overall base direction of a string cannot be determined from the beginning of the string, an external indicator is required, such as the HTML dir attribute. No such indicator currently exists for Datalog literals.

Until a more comprehensive solution is provided in a future version of this specification, programmers should consider this issue when representing strings whose base direction cannot be correctly inferred from their content. See string-meta for a discussion of best practices for identifying the language and base direction of strings used on the Web.
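To illustrate why content-based inference can fail, the following is a minimal sketch of a "first strong character" heuristic, written in Python with the standard unicodedata module. The function name first_strong_direction is hypothetical and not part of this specification, and the sketch deliberately ignores parts of UAX #9 such as the handling of isolates and embeddings.

```python
import unicodedata

# Bidi classes treated as "strong" by this simplified heuristic.
_STRONG_LTR = {"L"}
_STRONG_RTL = {"R", "AL"}


def first_strong_direction(text):
    """Return "ltr" or "rtl" based on the first strongly directional
    character, or None when the string has no strong character at all.

    Hypothetical helper for illustration only.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in _STRONG_LTR:
            return "ltr"
        if bidi_class in _STRONG_RTL:
            return "rtl"
    return None
```

A Hebrew sentence that begins with a Latin-script acronym such as "HTML" is reported as "ltr" by this heuristic even though its intended base direction is right-to-left, and a string made up only of digits yields no answer at all. These are the cases in which an external indicator would be needed.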

Identifier Characters

The characters allowed in labeling relations, attributes, facts, atoms, and variables are taken from a broad set of Unicode by category rather than by codepoint range. The following are the definitions of the lexical productions, showing their relation to defined categories:

SPACE_SEP
        ::= ? corresponds to the Unicode category 'Zs' ? ;
LC_ALPHA
        ::= ? corresponds to the Unicode category 'Ll' ? ;
UC_ALPHA
        ::= ? corresponds to the Unicode category 'Lu' ? ;
TC_ALPHA
        ::= ? corresponds to the Unicode category 'Lt' ? ;
ALPHA   ::= LC_ALPHA | UC_ALPHA | TC_ALPHA ;

The following productions are the key identifier-like values, built almost entirely from the character productions above, with the addition of DIGIT (defined under Numerical Values) and the underscore “_”.

predicate
        ::= LC_ALPHA  ( ALPHA | DIGIT | "_" )* ;
named-variable
        ::= UC_ALPHA  ( ALPHA | DIGIT | "_" )* ;
identifier-string
        ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" )* )? ;
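The productions above can be checked mechanically against Unicode general categories. The following Python sketch uses the standard unicodedata module; the function names (is_predicate, is_named_variable, is_identifier_string) are hypothetical helpers chosen for illustration and are not defined by this specification.

```python
import unicodedata

# General categories behind the ALPHA production (LC_ALPHA, UC_ALPHA, TC_ALPHA).
ALPHA_CATEGORIES = {"Ll", "Lu", "Lt"}


def _is_tail_char(ch):
    """True for ( ALPHA | DIGIT | "_" )."""
    category = unicodedata.category(ch)
    return ch == "_" or category in ALPHA_CATEGORIES or category == "Nd"


def _matches(text, first_categories):
    """True when text is one character from first_categories followed by tail characters."""
    return (
        bool(text)
        and unicodedata.category(text[0]) in first_categories
        and all(_is_tail_char(ch) for ch in text[1:])
    )


def is_predicate(text):
    """predicate ::= LC_ALPHA ( ALPHA | DIGIT | "_" )*"""
    return _matches(text, {"Ll"})


def is_named_variable(text):
    """named-variable ::= UC_ALPHA ( ALPHA | DIGIT | "_" )*"""
    return _matches(text, {"Lu"})


def is_identifier_string(text):
    """identifier-string ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" )* )?"""
    head, separator, tail = text.partition(":")
    if not is_predicate(head):
        return False
    if not separator:
        return True
    return _matches(tail, ALPHA_CATEGORIES)
```

Note that combining marks (category 'Mn') are not among the listed categories, so under this reading an accented letter is accepted only in its precomposed form; whether an implementation should normalize input first is not stated here and is left as an open assumption.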

Example

The following is a perfectly valid version of the Socratic syllogism from the example in § Program.

ανθρώπινο("Σωκράτης").

θνητός(Χ) :- ανθρώπινο(Χ).

?- θνητός("Σωκράτης").

Numerical Values

Numerical values use the following lexical production, which accepts the decimal digits 0 to 9 as written in any script. For example, “123”, “١٢٣” (Arabic-Indic), or “१२३” (Devanagari).

DIGIT   ::= ? corresponds to the Unicode category 'Nd' (decimal number) ? ;
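As a rough illustration of the DIGIT production, the following Python sketch classifies characters by the 'Nd' category and converts a digit string to an integer using the standard unicodedata module. The function names are hypothetical, and the sketch assumes that digits from different scripts may be mixed within one number, which this section does not state either way.

```python
import unicodedata


def is_digit(ch):
    """True when ch belongs to the Unicode category 'Nd' (decimal number)."""
    return unicodedata.category(ch) == "Nd"


def decimal_value(text):
    """Convert a non-empty string of 'Nd' characters to an int.

    Hypothetical helper; raises ValueError for anything else.
    """
    if not text or not all(is_digit(ch) for ch in text):
        raise ValueError(f"not a sequence of decimal digits: {text!r}")
    value = 0
    for ch in text:
        # unicodedata.decimal gives the 0-9 value of a decimal digit character.
        value = value * 10 + unicodedata.decimal(ch)
    return value
```

Characters that merely look numeric, such as superscript digits (category 'No') or Roman numeral characters (category 'Nl'), fall outside 'Nd' and are rejected by this check.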

The only exception to this broad inclusive approach is the definition of the HEXDIGIT production where the Anglo-centric approach is commonly understood and alternate forms may cause confusion.

HEXDIGIT
        ::= [0-9a-fA-F] ;

Unicode Operators

The following operators, or syntax symbols, are defined in this specification. Where a Unicode character is defined for a symbol, it is described by its codepoint and assigned name.

| Symbol | Primary | Alternate | Unicode | Codepoint | Name |
|---|---|---|---|---|---|
| Material implication | :- | <- | ← | U+2190 | LEFTWARDS ARROW |
| Conjunction | , | &, AND | ∧ | U+2227 | LOGICAL AND |
| True (boolean) | true | | ⊤ | U+22A4 | DOWN TACK |
| False (boolean) | false | | ⊥ | U+22A5 | UP TACK |
| Logical negation | ! | NOT | ￢ | U+FFE2 | FULLWIDTH NOT SIGN |
| Disjunction | ; | \|, OR | ∨ | U+2228 | LOGICAL OR |
| Tautology | N/A | | ⊤ | U+22A4 | DOWN TACK |
| Absurdity | | | ⊥ | U+22A5 | UP TACK |
| Equal | = | | | U+003D | EQUALS SIGN |
| Not equal | != | /= | ≠ | U+2260 | NOT EQUAL TO |
| Less than | < | | | U+003C | LESS-THAN SIGN |
| Less than, or equal | <= | | ≤ | U+2264 | LESS-THAN OR EQUAL TO |
| Greater than | > | | | U+003E | GREATER-THAN SIGN |
| Greater than, or equal | >= | | ≥ | U+2265 | GREATER-THAN OR EQUAL TO |
| String match | *= | MATCH | ≛ | U+225B | STAR EQUALS |
| Functional dependency | --> | | ⟶ | U+27F6 | LONG RIGHTWARDS ARROW |
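Codepoint values and UTF-8 byte sequences are easy to confuse: LEFTWARDS ARROW, for instance, is the codepoint U+2190, while E2 86 90 is its three-byte UTF-8 encoding. The following Python sketch, using the standard unicodedata module, reports both for any operator character; the function name describe is a hypothetical helper for illustration.

```python
import unicodedata


def describe(ch):
    """Return (codepoint, assigned name, UTF-8 bytes as hex) for one character."""
    codepoint = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = ch.encode("utf-8").hex().upper()
    return codepoint, name, utf8_hex


# Unicode operator characters named in the table above.
for operator in "\u2190\u2227\u22a4\u22a5\uffe2\u2228\u2260\u2264\u2265\u225b\u27f6":
    print(*describe(operator))
```

A lexer that matches on decoded characters should compare against the codepoint; the byte sequence only matters when scanning raw UTF-8 input.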

The following table describes the symbols introduced by the different Datalog languages. Note that only the symbols for material implication, conjunction, true, and false are required by the core language $\small\text{Datalog}$.

| Language | Introduces | Symbols |
|---|---|---|
| $\small\text{Datalog}^{\lnot}$ | negation of literals in rule bodies | !, NOT, ￢ |
| $\small\text{Datalog}^{\lor}$ | disjunction in rule heads | ;, \|, OR, ∨ |
| $\small\text{Datalog}^{\Leftarrow}$ | rules as constraints, i.e. no body | |
| $\small\text{Datalog}^{\Gamma}$ | typed attributes for relations | N/A |
| $\small\text{Datalog}^{\theta}$ | arithmetic literals in rule bodies | =, !=, ≠, <, <=, ≤, >, >=, ≥, *=, ≛ |
| $\small\text{Datalog}^{\rightarrow}$ | functional dependency processing instruction | -->, ⟶ |