Appendix: Internationalization Considerations
Unicode provides a mechanism for signaling direction within a string (see the Unicode Bidirectional Algorithm, UAX9). However, when the overall base direction of a string cannot be determined from the beginning of the string, an external indicator is required, such as the HTML dir attribute. Datalog literals currently have no counterpart to that attribute.
Until a future version of this specification provides a more comprehensive solution, programmers should consider this issue when representing strings whose base direction cannot be correctly inferred from their content. See string-meta for a discussion of best practices for identifying language and base direction for strings used on the Web.
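To illustrate why the beginning of a string is not always enough, the following Python sketch applies a simple first-strong heuristic: it returns the direction of the first character with a strong bidirectional class. The function name `first_strong_direction` is hypothetical, and the heuristic is only one simplified way to guess a base direction; it is not defined by this specification.

```python
import unicodedata


def first_strong_direction(text):
    """Guess a base direction from the first strongly directional character.

    Returns "ltr", "rtl", or None when no strong character is present.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (e.g. Hebrew, Arabic)
            return "rtl"
    return None


# A Hebrew word: the first strong character is right-to-left.
hebrew_word = "\u05e9\u05dc\u05d5\u05dd"

# A sentence meant to be right-to-left overall, but starting with a Latin
# acronym: the heuristic reports "ltr", which is the failure case the text
# describes.
mixed = "HTML \u05d4\u05d9\u05d0 \u05e9\u05e4\u05d4"
```

In the `mixed` case the heuristic disagrees with the intended base direction, which is the situation where an external indicator would be needed.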
Identifier Characters
The characters allowed in labeling relations, attributes, facts, atoms, and variables are taken from a broad set of Unicode by category rather than by codepoint range. The following are the definitions of the lexical productions, showing their relation to defined categories:
SPACE_SEP ::= ? corresponds to the Unicode category 'Zs' ? ;
LC_ALPHA ::= ? corresponds to the Unicode category 'Ll' ? ;
UC_ALPHA ::= ? corresponds to the Unicode category 'Lu' ? ;
TC_ALPHA ::= ? corresponds to the Unicode category 'Lt' ? ;
ALPHA ::= LC_ALPHA | UC_ALPHA | TC_ALPHA ;
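As an illustration of classifying by category rather than by codepoint range, the sketch below maps a character to one of the productions above using Python's `unicodedata.category`. The names `CATEGORY_TO_PRODUCTION`, `production_for`, and `is_alpha` are hypothetical helpers, not part of this specification.

```python
import unicodedata

# Unicode general category -> lexical production defined above.
CATEGORY_TO_PRODUCTION = {
    "Zs": "SPACE_SEP",
    "Ll": "LC_ALPHA",
    "Lu": "UC_ALPHA",
    "Lt": "TC_ALPHA",
}


def production_for(ch):
    """Return the production name for a single character, or None."""
    return CATEGORY_TO_PRODUCTION.get(unicodedata.category(ch))


def is_alpha(ch):
    """ALPHA ::= LC_ALPHA | UC_ALPHA | TC_ALPHA"""
    return unicodedata.category(ch) in ("Ll", "Lu", "Lt")
```

Under this reading, a character whose category is not listed, such as "_" (category Pc) or a letter in category Lo, matches none of these productions.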
The following productions are the key identifier-like values, built almost entirely from the character productions above with the addition of the single “_” underscore.
predicate ::= LC_ALPHA ( ALPHA | DIGIT | "_" )* ;
named-variable ::= UC_ALPHA ( ALPHA | DIGIT | "_" )* ;
identifier-string ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" )* )? ;
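The three productions above can be sketched as hand-written recognizers in Python. The function names are hypothetical. The sketch assumes DIGIT means Unicode category 'Nd', and it assumes the input is already in a precomposed normalization form (NFC), since a decomposed accent is a combining mark that is not in ALPHA.

```python
import unicodedata

ALPHA_CATEGORIES = ("Ll", "Lu", "Lt")


def _is_tail(text):
    """Match ( ALPHA | DIGIT | "_" )* ; the empty string is accepted."""
    return all(
        ch == "_" or unicodedata.category(ch) in ALPHA_CATEGORIES + ("Nd",)
        for ch in text
    )


def is_predicate(text):
    """predicate ::= LC_ALPHA ( ALPHA | DIGIT | "_" )*"""
    return bool(text) and unicodedata.category(text[0]) == "Ll" and _is_tail(text[1:])


def is_named_variable(text):
    """named-variable ::= UC_ALPHA ( ALPHA | DIGIT | "_" )*"""
    return bool(text) and unicodedata.category(text[0]) == "Lu" and _is_tail(text[1:])


def is_identifier_string(text):
    """identifier-string ::= predicate ( ":" ALPHA ( ALPHA | DIGIT | "_" )* )?"""
    head, colon, rest = text.partition(":")
    if not is_predicate(head):
        return False
    if not colon:
        return True
    # After the colon at least one ALPHA character is required.
    return (
        bool(rest)
        and unicodedata.category(rest[0]) in ALPHA_CATEGORIES
        and _is_tail(rest[1:])
    )
```

Because the first character decides the production, a name that starts with an uppercase letter is read as a variable and never as a predicate.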
Example
The following is a perfectly valid version of the Socratic syllogism from the example in § Program.
ανθρώπινο("Σωκράτης").
θνητός(Χ) :- ανθρώπινο(Χ).
?- θνητός("Σωκράτης").
Numerical Values
Numerical values use the following lexical production, which accepts the digits 0 to 9 as written in a number of scripts. For example, “123”, “١٢٣” (Arabic-Indic), or “१२३” (Devanagari).
DIGIT ::= ? corresponds to the Unicode category 'Nd' (decimal number) ? ;
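A minimal Python sketch of reading such digits is shown below. The helper names are hypothetical. The sketch accepts any mix of scripts within one number, which is an assumption: the text above does not say whether mixing is permitted.

```python
import unicodedata


def is_digit(ch):
    """DIGIT: any character in Unicode category 'Nd'."""
    return unicodedata.category(ch) == "Nd"


def decimal_value(text):
    """Convert a non-empty run of DIGIT characters to an integer."""
    if not text:
        raise ValueError("expected at least one digit")
    value = 0
    for ch in text:
        if not is_digit(ch):
            raise ValueError(f"not a decimal digit: {ch!r}")
        # unicodedata.decimal gives the 0-9 value of a decimal digit character.
        value = value * 10 + unicodedata.decimal(ch)
    return value


ascii_digits = "123"
arabic_indic = "\u0661\u0662\u0663"   # Arabic-Indic digits one, two, three
devanagari = "\u0967\u0968\u0969"     # Devanagari digits one, two, three
```

All three strings above denote the same value when passed to `decimal_value`.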
The only exception to this broadly inclusive approach is the definition of the HEXDIGIT production, where the Anglo-centric form is commonly understood and alternate forms may cause confusion.
HEXDIGIT ::= [0-9a-fA-F] ;
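For contrast with DIGIT, the following Python sketch checks a run of HEXDIGIT characters with an explicit ASCII character class. The name `is_hex_run` is hypothetical, and requiring at least one digit is an assumption about how the production would be used.

```python
import re

# An explicit class, so digits from other scripts are not matched.
HEX_RUN = re.compile(r"[0-9a-fA-F]+")


def is_hex_run(text):
    """True when text is one or more HEXDIGIT characters."""
    return HEX_RUN.fullmatch(text) is not None
```

A string of Arabic-Indic digits, which DIGIT accepts, is rejected here because only the ASCII forms are listed.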
Unicode Operators
The following operators, or syntax symbols, are defined in this specification. Where a Unicode symbol is defined for an operator, it is listed with its codepoint value and assigned name.
Symbol | Primary | Alternate | Unicode | Codepoint | Name |
---|---|---|---|---|---|
Material implication | :- | <- | ← | U+2190 | LEFTWARDS ARROW |
Conjunction | , | & , AND | ∧ | U+2227 | LOGICAL AND |
True (boolean) | true | | ⊤ | U+22A4 | DOWN TACK |
False (boolean) | false | | ⊥ | U+22A5 | UP TACK |
Logical negation | ! | NOT | ¬ | U+FFE2 | FULLWIDTH NOT SIGN |
Disjunction | ; | \| , OR | ∨ | U+2228 | LOGICAL OR |
Tautology | N/A | | ⊤ | U+22A4 | DOWN TACK |
Absurdity | | | ⊥ | U+22A5 | UP TACK |
Equal | = | | | U+003D | EQUALS SIGN |
Not equal | != | /= | ≠ | U+2260 | NOT EQUAL TO |
Less than | < | | | U+003C | LESS-THAN SIGN |
Less than, or equal | <= | | ≤ | U+2264 | LESS-THAN OR EQUAL TO |
Greater than | > | | | U+003E | GREATER-THAN SIGN |
Greater than, or equal | >= | | ≥ | U+2265 | GREATER-THAN OR EQUAL TO |
String match | *= | MATCH | ≛ | U+225B | STAR EQUALS |
Functional dependency | --> | | ⟶ | U+27F6 | LONG RIGHTWARDS ARROW |
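The codepoint and assigned name of each Unicode operator can be checked against the Unicode character database. The Python sketch below does so, and also prints the UTF-8 byte sequence, which is a different value from the codepoint (for example, LEFTWARDS ARROW is codepoint U+2190 but is encoded in UTF-8 as the bytes E2 86 90). The dictionary `UNICODE_OPERATORS` and the function `describe` are hypothetical names; the negation entry uses FULLWIDTH NOT SIGN because that is the name given in the table.

```python
import unicodedata

# Operator -> Unicode symbol, written as escapes to avoid ambiguity.
UNICODE_OPERATORS = {
    "material implication": "\u2190",
    "conjunction": "\u2227",
    "true / tautology": "\u22a4",
    "false / absurdity": "\u22a5",
    "logical negation": "\uffe2",
    "disjunction": "\u2228",
    "not equal": "\u2260",
    "less than, or equal": "\u2264",
    "greater than, or equal": "\u2265",
    "string match": "\u225b",
    "functional dependency": "\u27f6",
}


def describe(ch):
    """Return (codepoint, assigned name, UTF-8 bytes in hex) for a character."""
    codepoint = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex().upper()
    return codepoint, unicodedata.name(ch), utf8_hex


if __name__ == "__main__":
    for operator, symbol in UNICODE_OPERATORS.items():
        print(operator, symbol, *describe(symbol))
```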
The following table describes the symbols introduced by the different Datalog languages. Note that the core language $\small\text{Datalog}$ requires only the symbols for material implication, conjunction, true, and false.
Language | Introduces | Symbols |
---|---|---|
$\small\text{Datalog}^{\lnot}$ | negation of literals in rule bodies | ! , NOT , ¬ |
$\small\text{Datalog}^{\lor}$ | disjunction in rule heads | ; , \| , OR , ∨ |
$\small\text{Datalog}^{\Leftarrow}$ | rules as constraints, i.e. no body | ⊤ |
$\small\text{Datalog}^{\Gamma}$ | typed attributes for relations | N/A |
$\small\text{Datalog}^{\theta}$ | arithmetic literals in rule bodies | = , != , ≠ , < , <= , ≤ , > , >= , ≥ , *= , ≛ |
$\small\text{Datalog}^{\rightarrow}$ | functional dependency processing instruction | --> , ⟶ |
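As a rough illustration of how this table might be used, the Python sketch below reports which language extensions a list of tokens would require. It assumes the program has already been split into tokens by some other means. The extension labels and the names `EXTENSION_SYMBOLS` and `required_extensions` are hypothetical, and accepting U+00AC NOT SIGN alongside the fullwidth form is also an assumption.

```python
# Hypothetical labels for each extension; symbols copied from the table above.
EXTENSION_SYMBOLS = {
    "negation": {"!", "NOT", "\uffe2", "\u00ac"},  # U+00AC is an assumption
    "disjunction": {";", "|", "OR", "\u2228"},
    "constraints": {"\u22a4"},
    "arithmetic": {
        "=", "!=", "\u2260", "<", "<=", "\u2264",
        ">", ">=", "\u2265", "*=", "\u225b",
    },
    "functional_dependency": {"-->", "\u27f6"},
}


def required_extensions(tokens):
    """Return the set of extension labels whose symbols appear in tokens."""
    found = set()
    for token in tokens:
        for label, symbols in EXTENSION_SYMBOLS.items():
            if token in symbols:
                found.add(label)
    return found
```

Tokens that belong only to the core language, such as `:-` and `,`, yield an empty set.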