Topic:.ZBNF_syntaxDescription.
Written by Hartmut Schorrig, www.vishia.org. Latest edition 2009-04-11
Hint: I need somebody's help! This text was originally written in German and translated by myself. It may contain grammatical mistakes, because my English is not that good. The content should be correct. If anybody would like to correct the text, please send me a short note and later the corrected text: hartmut.schorrig@vishia.de. I will mention your name as corrector in the text. You may take the original source of the text from this Download:_../../../examples_XML/DocuGenerationViaXML/docuSrc/Zbnf.syntax_en.topic and correct it. The source is written in a style similar to the Wikipedia editing style, with some extra enhancements.
BNF (Backus-Naur Form, -->Wikipedia) was created together with the development of the computer language Algol at the beginning of the 1960s. It was an important milestone of software technology. For the first time, BNF allowed the syntax of programming languages to be described exactly.
BNF was developed further afterwards. For example, Niklaus Wirth of ETH Zurich created and used EBNF for his programming language PASCAL (-->Wikipedia EBNF). BNF-like syntax expressions are also well known, for example, from the syntax of command line arguments. The option brackets [...] are typical. This is state of the art.
BNF is not fully standardized. Several variants are in use. BNF-like explanations are mostly used for documentation, where italic text is used to mark keywords. Such documentation may be easy for humans to read, but it isn't suitable for computer-driven evaluation. For automatic processing, the semantic of the parts of syntactical constructs is important. Using the semantic of the expressions, their content can be recognized and processed.
ZBNF enhances BNF with semantic aspects and some additional syntax constructs.
Topic:.ZBNF_syntaxDescription..
A ZBNF-syntax-script may be given either in a text file or as a String in a Java program. A ZBNF-syntax-script may have a head, which also defines the encoding used in the file. Some control settings, each starting with a $, may follow, see Chapter: 10 All control variables. After them, some syntax-definitions follow. The first syntax-definition is used to parse the input text; the other syntax-definitions are sub-syntax-definitions. Explanations of semantic parts may be placed between the syntax-definitions, see Chapter: 7.5 Explanation of the semantic in the ZBNF-script as help. The syntax-script may contain comment lines.
The following wording is used:
syntax-script is the whole script, which may contain the title line with the encoding, some variables and some syntax-definitions.
syntax-definition means a definition of a syntax, including the identifier of the definition and the prescript.
syntax-prescript is a string sequence which defines a syntax in ZBNF. A syntax-prescript is the right-hand part of a syntax-definition after the ::=, or a part of it.
A syntax-definition is written as follows:
.syntaxident::=syntaxPrescript.
Here syntaxident is the identifier of the syntax-definition, used as the symbol for calling this syntax as a component in other syntax-prescripts.
syntaxPrescript is the definition of the syntax itself. The dot marks the end. In ZBNF itself, this syntax would be defined as follows:
Syntaxdefinition::=<$?syntaxident>::=<syntaxPrescript>\..
It means: A syntax-definition consists of the identifier, semantically named syntaxident, followed by ::= without any whitespace. Then an expression syntaxPrescript follows, which is not defined here. A dot must be written at the end. The dot is coded as \. The dot after the \. marks the end of the definition.
A ZBNF-syntax-script may import another script using the $import control variable. Then syntax-definitions from the imported script can be used. This is an important form of reusing syntax definitions.
A simple example of a ZBNF script is given in the following:
<?ZBNF-www.vishia.org version="1.0" encoding="iso-8859-1"?>
$setLinemode.
shopping-list::=shopping \\n
<![=]*?> \\n
{ <position> \\n }.
position::=<#?@amount> [<?@unit>peaces|x|] <$?text()>.
At the beginning of this file the encoding is defined. Then the use of the line mode is defined. The first ZBNF-syntax-definition, shopping-list, is the main definition. Parsing begins there. The syntax requires a line shopping, then a line with ==========, then some <position>, each in one line. The second syntax-definition is for <position>; it is a syntax component.
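To make clear which input the shopping-list script accepts, the following is a minimal hand-written Java sketch of the same checks. It does not use the ZBNF parser. The class name ShoppingListSketch, the nested Position type and the regular expression are assumptions made for this illustration only, and an identifier is assumed to consist of letters, digits and underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShoppingListSketch {

    /** One parsed position line: amount, optional unit, text. */
    static final class Position {
        final int amount;
        final String unit;   // empty string if no unit was written
        final String text;

        Position(int amount, String unit, String text) {
            this.amount = amount;
            this.unit = unit;
            this.text = text;
        }
    }

    // Hand-written counterpart of the position definition:
    // a number, then "peaces", "x" or nothing, then an identifier.
    private static final Pattern POSITION = Pattern.compile(
            "\\s*(?<amount>\\d+)\\s*(?<unit>peaces|x|)\\s*(?<text>[A-Za-z_][A-Za-z_0-9]*)\\s*");

    static List<Position> parse(String input) {
        String[] lines = input.split("\\r?\\n");
        // Required: the line "shopping", a line of '=' characters, at least one position.
        if (lines.length < 3
                || !lines[0].trim().equals("shopping")
                || !lines[1].trim().matches("=*")) {
            throw new IllegalArgumentException("input does not start like a shopping list");
        }
        List<Position> positions = new ArrayList<>();
        for (int i = 2; i < lines.length; i++) {
            Matcher m = POSITION.matcher(lines[i]);
            if (!m.matches()) {
                throw new IllegalArgumentException("line " + (i + 1) + " is not a position");
            }
            positions.add(new Position(
                    Integer.parseInt(m.group("amount")), m.group("unit"), m.group("text")));
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "shopping\n==========\n3 peaces apple\n2 x milk\n1 bread\n";
        for (Position p : parse(input)) {
            System.out.println(p.amount + " [" + p.unit + "] " + p.text);
        }
    }
}
```

The named groups amount, unit and text play the role of the semantic identifiers in the script; in ZBNF this labelling comes from the syntax script itself instead of hand-written code.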
Writing and printing rules in the following explanation: In the description below, some syntax-prescripts are shown in ZBNF itself (pure ASCII), but a more readable form is used in the printed text:
terminal symbol [ ] < > = : Terminal symbols are written in a monospaced font. The special syntax control characters []{}?. used as terminal symbols are also written directly in this form (without the escaping with \ that is needed in a ZBNF script).
monospacedItalic: At this position any identifier should be written in a syntax-prescript. In ZBNF it is defined as <$?semantic>.
italic: It means <component> in ZBNF or any partial syntax-prescript. At some positions ... is written as a wildcard for a partial syntax which isn't significant in the given context. A special semantic aspect <syntax?semantic> can't be shown in this form, but the semantic is declared in the explanatory text or it should be self-explanatory. Also, a special syntax aspect may not be shown in a formal way.
[ option ] { repetition }. : The syntax control symbols are written in the standard paragraph font.
Some patterns of ZBNF usage are shown as examples. There the monospaced font is used without special character fonts.
Topic:.ZBNF_syntaxDescription.semanticAspect.
The "Z" in "ZBNF" is a reversed "S" for semantic. The semantic aspect isn't sufficiently respected in the original BNF and its variants. If you write:
variabledefinition::= <identifier> <identifier> ; .
identifier ::= alphachar [ digit | identifier].
alphachar = A|B|C|...
then the syntax is defined exactly. But the meaning of the first identifier (is it a type?) and of the second one (a variable name?) is still unknown. A verbal explanation is needed in addition. The same situation occurs in the presentation of command line calls like:
XCOPY Quelle [Ziel] [/A | /M] [/D[:Datum]] [/P] [/S [/E]] [/V] [/W]
[/C] [/I] [/Q] [/F] [/L] [/G] [/H] [/R] [/T] [/U]
[/K] [/N] [/O] [/X] [/Y] [/-Y] [/Z]
[/EXCLUDE:Datei1[+Datei2][+Datei3]...]
Quelle Die zu kopierenden Dateien.
Ziel Position und/oder Name der neuen Dateien.
/A Kopiert nur Dateien mit gesetztem Archivattribut,
ändert das Attribut nicht.
/M Kopiert nur Dateien mit gesetztem Archivattribut,
setzt das Attribut nach dem Kopieren zurück.
(Sorry, it's German; I have installed Windows in German.)
This example is the beginning of the output obtained by typing help xcopy in Windows XP (Microsoft). The meaning of the options is explained verbally. But at least, with the help of this BNF-like presentation, it is possible to recognize that the options /A and /M exclude each other, etc.
For computer-aided information processing, verbal explanations aren't useful; complex programming is necessary to process the result of a parser.
In ZBNF, the syntax above can be written in the form:
variabledefinition::= <identifier?type> <identifier?name> ; .
So the first identifier is declared as type and the second as name in a formal way.
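The effect of such semantic labels can be imitated with named groups of java.util.regex. The following is only a sketch of the idea, not the ZBNF mechanism; the class name VariableDefinitionSketch and the identifier pattern (letters, digits, underscore) are assumptions for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableDefinitionSketch {

    // Both identifiers have the same syntax; only the group names
    // "type" and "name" tell the following processing what they mean.
    private static final Pattern VARIABLE_DEFINITION = Pattern.compile(
            "\\s*(?<type>[A-Za-z_][A-Za-z_0-9]*)\\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)\\s*;");

    /** Returns {type, name}, or null if the text is not a variable definition. */
    static String[] parse(String text) {
        Matcher m = VARIABLE_DEFINITION.matcher(text);
        if (!m.lookingAt()) {
            return null;
        }
        return new String[] { m.group("type"), m.group("name") };
    }

    public static void main(String[] args) {
        String[] result = parse("int counter;");
        System.out.println("type=" + result[0] + " name=" + result[1]);
    }
}
```

A consumer of the result asks for the type or the name by its label and no longer depends on the position of the identifier in the text.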
The association between the pure information data and its meaning is a basic idea of XML. In XML a <tagname> is the semantic description, whereas the content of the element or of an attribute is the pure information: <meaning>information<subtag>...subInfo</subtag></meaning> or <tag meaning="information">. Using this idea, computer-aided information processing can work even if the information comes from older versions of sources, from other providers with altered definitions and so on. The compatibility of information interchange is easier to control.
Binding a syntax to its semantic is a core idea of ZBNF. It enables the conversion of any syntactically interpretable text to XML without additional programming effort, see Topic:.ZBNF2Xml.. The subsequent information processing can use the well-known XML tool support. It is possible to write:
Text x ZBNF =: XML
The reverse conversion:
XML x XSLT =: Text
is the well-known XSLT technique. The x means a processing step or cross product.
For details see Semantic definitions.
Topic:.ZBNF_syntaxDescription..
Terminal characters are those characters which must appear in the input text in the given form. They are the keywords of text recognition. In BNF the terminal text often has to be written in quotation marks like "terminal", but not so in ZBNF. Terminal characters are written directly. But there is a conflict with the special characters which control the syntax flow: [ ] | { } < > ? . To mark such a character as a terminal character, it must be written with a preceding backslash \. For example, if a [ is necessary, it should be written as \[.
Special chars
The backslash \ can be used, as in string literals in Java and C/C++, as the escape character for control characters: \n \r \b \t \f with the meaning of newline (0x0a), carriage return (0x0d), backspace (0x08), tabulator (0x09), form feed (0x0c). Such terminal characters are often necessary for format separation. The backslash itself is written as \\. There are some special escape sequences too:
\s is a whitespace symbol within the line, but not a line feed.
\ (a backslash followed by a single space) means a single space as terminal symbol.
\e means end of text. If this terminal symbol is required, the input text must end at this position.
Any Unicode character can be coded with \uxxxx, where xxxx is the 4-digit hexadecimal code from the UTF-16 table. This representation of terminal symbols allows characters to be required that are outside the encoding of the script.
The encoding of the terminal characters used in a ZBNF script file can be defined, see Character encoding.
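The escape rules above can be summarized in a short Java sketch that turns an escaped terminal text into the literal characters expected in the input. It is an illustration only and not the code of the ZBNF parser; the class name ZbnfEscapeSketch and the decision to reject the two non-literal escapes (the in-line whitespace symbol and the end-of-text symbol) are assumptions of this sketch.

```java
final class ZbnfEscapeSketch {

    /** Converts an escaped terminal text into the literal characters it stands for. */
    static String decode(String prescript) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < prescript.length()) {
            char c = prescript.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i >= prescript.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = prescript.charAt(i++);
            switch (e) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // four hexadecimal digits of a UTF-16 code unit follow
                    if (i + 4 > prescript.length()) {
                        throw new IllegalArgumentException("four hex digits expected");
                    }
                    out.append((char) Integer.parseInt(prescript.substring(i, i + 4), 16));
                    i += 4;
                    break;
                case 's':
                case 'e':
                    // in-line whitespace and end-of-text are matching rules, not characters
                    throw new IllegalArgumentException("escape '" + e + "' has no literal value");
                default:
                    // backslash, space and the syntax control characters stand for themselves
                    out.append(e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\[option\\]"));
        System.out.println(decode("a\\ b"));
    }
}
```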
Topic:.ZBNF_syntaxDescription..
A whitespace is a part of the text which produces space without visible characters in a printed output. Spaces, tabulators, line feeds and sequences of them are whitespaces.
Topic:.ZBNF_syntaxDescription...
The syntax definition itself can be written in a free format with whitespaces. Normally, where a whitespace is written in the syntax script, a whitespace is also allowed in the input text.
Using the control setting $Whitespaces=Whitespaces. in the ZBNF script outside a syntax definition (it should be placed at the start of the syntax script), it can be defined which characters are whitespaces. By default they are \ \t\r\n. For example, \t can be excluded from the whitespaces because it has a special meaning in the text. Then it should be written:
$Whitespaces=\ \r\n.
Topic:.ZBNF_syntaxDescription...
With an additional <$NoWhiteSpaces> at the beginning of a syntax definition it can be defined that whitespaces in the syntax prescript don't allow whitespaces in the input text. A syntax-definition should be written in the form:
Syntaxdefinition::=syntaxident::=[<$semanticOftheDefinition>][<$NoWhiteSpaces>] syntaxprescript..
The <$NoWhiteSpaces> should be written after an optional <?semantic>, without whitespaces in between.
If spaces or whitespaces are necessary in the syntax, they should be written as terminal characters like \s or \ , \t and so on, or using regular expressions.
An example is the syntax prescript to parse a #define NAME xxx... in C or C++. The file zbnfjax/zbnf/Cheader.zbnf contains the following part:
defineDefinition::=<$NoWhiteSpaces> <$?@name> [ ( { <$?parameter/@name> ? , } ) ]
<![ \t]*?> [ <#-?intvalue>
| 0x<#x?hexvalue>
| <""?stringvalue>
|]
<![ \t]*?>
{ <*|\n|\\|\r\n?value>
? \\[\r]\n
}.
The problem is: The define may be wrapped at the end of a line using a \, but the text in the next line has the same meaning as without the \ and the line wrap. The syntax-prescript first parses the <$?name> as an identifier. The optionally following parameter names must be written without spaces. Then whitespace is admissible, written in the syntax using the regular expression <![ \t]*?>. Then either an integer value or a string in some variants is accepted; this is the mainstream use case of define. After that, all other characters are captured until the end of the line or a \. They are stored as <...?value>. A \ followed immediately by \n, optionally with a \r, means that the next line is a further <...?value> of this define. The user of the parser result may concatenate these <...?value> parts to get the whole expression.
Topic:.ZBNF_syntaxDescription...
The syntax prescript text allows an end-of-line comment, started with two ## outside of a < ... > and outside of an escape with \#. A single # is a normal terminal symbol. If two ## are necessary as terminal symbols, you should write \#\#.
Whitespaces between the syntax prescripts inside the ZBNF script are ignored. Comments are ignored too. These comments and whitespaces have no meaning, in contrast to the whitespaces inside a syntax prescript.
Topic:.ZBNF_syntaxDescription...
Generally, a comment is recognized like a whitespace. But it is possible to test comment constructs at some positions, although at other positions they are skipped. There are some rules:
By default, a text in /* ... */ and a text after // until the end of the line is a comment. But using the control variable $comment=<*|\.\.\.?startCommentString>\.\.\.<*\.?endCommentString>. it is possible to set other characters instead. The default setting would be written as
$CommentString=/*...*/.
Another variant would be
$comment=(?...?).
With the control variable $endlineComment=<*.?$endlineComment>. the start characters of an end-of-line comment are determined. The default setting would be written as
$endlineComment=//.
Another variant would be
$endlineComment=#.
With the control variable
$setLineMode.
it is set that \n and \r are not recognized as whitespace. This setting is appropriate if the input text is not free-format but line-oriented. Whitespaces within the line are still recognized, but may be suppressed using <$NoWhiteSpaces> in the syntax prescript.
In general, while parsing it is tested first whether the input text matches the current terminal symbols. Only after that are comments skipped. This means: if the start characters of a comment match the current terminal symbol, the parser recognizes them as parsed input, and the following text is parsed after them. For example, comments are parsed in Cheader.zbnf with the sequence [ /**<description>*/], where it is defined:
description::= <*{ * }|*/?!test_description>.
It means all characters until */. So a description starting with /** is processed as a parse result.
The explicit terminal symbols \ , \n or [\r]\n (as the combination 0x0d 0x0a or only 0x0a) and \s can be used too.
Topic:.ZBNF_syntaxDescription.syntaxControl.
Topic:.ZBNF_syntaxDescription.syntaxControl..
The alternative was the only control construct of the old BNF of the 1960s. BNF uses only alternatives and recursion. Nowadays the alternative is no longer necessary to define
DigitNotZero::=1|2|3|4|5|6|7|8|9.
Digit::=0 | <DigitNotZero>.
Digitsequence::= <Digit> | <Digit><Digitsequence>.
positiveNumber::= <DigitNotZero><Digitsequence>.
as shown in some educational examples. Processing digits is better done with a hard-coded algorithm. Therefore the special syntax constructs <#?Number> and <$?identifier> are available in ZBNF.
The alternative is useful for better things like:
Customer::=<Consumer>|<BusinessClient>.
where Consumer or BusinessClient may be complex syntactical constructs. Terminal symbols are also useful in alternatives:
title::= Mr\\. | Miss | Mrs\\. .
Any syntax prescript (the right side of a syntax definition after ::=) may be an alternative.
componentidentifier::= alternative1 | alternative2 | alternative3.
If alternatives are needed as part of a syntax prescript, they should be written in square brackets as an option, where at least one must match unless |] is written at the end or [| at the beginning:
...[ alternative1 | alternative2 | alternative3 ]...
or they may be placed in the forward or backward part of a repetition:
...{ alternative1 | alternative2 | alternative3 ? backalternative1 | backalternative2 } ...
Topic:.ZBNF_syntaxDescription.syntaxControl.options.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
The simple option is designated with square brackets, as in the old BNF: [ ]. The general meaning, also in ZBNF, is: This part is optional; it may match or not. This construct is known from all kinds of syntax descriptions.
Such an option is also possible inside terminal symbols. For example, in a report file the word telegram was written without the second e: telgram. It was a mistake; later versions write telegram. Now the parsing of older and newer reports should detect both variants. Therefore the terminal syntax is written as tel[e]gram.
Inside the square brackets of the option any syntax prescript is possible. This part of the syntax may get a special semantic designation. It should be written: [<?semantic> syntaxprescript ]. If the option matches during syntax processing, a parse result with the given semantic is produced.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
This is a combination of the option notation with some alternatives. In this notation at least one alternative must match. The square bracket of the option doesn't mean that it is optional as a whole; rather it is obligatory to use one of the alternatives.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
If |] (without spaces) is written at the end, it is an empty choice. It means that it is also okay if no alternative matches. If no alternative matches, no parse result is produced for the alternatives and for the whole option. If it is written
[<?SomeChoices> Alternative3a | Alternative3b |]
no parse result named SomeChoices is produced if no alternative matches.
Topic:.ZBNF_syntaxDescription.syntaxControl.options..
If [|... is written, the parser first tests whether the syntax following the option matches. Only if it doesn't match is the option tested. That may be advisable in a construct like
[|-<?negative>] <value>
The <value> may mean a number, possibly negative, defined as value::=<#-?number>|<$?ident>. If the input text contains a negative number like -123, it is matched as a number itself. The semantic negative is not necessary and not produced. But if it is an <$?ident>, the negative sign should be parsed independently. Another example is:
coplexlyNumberString::=[|<#?leftPart>\\.][|<#?middlePart>\\.]<#?rightPart>.
The parser should detect a middlePart and a rightPart if the number contains only one dot, and not a leftPart and a middlePart. If, for example, the input text contains 123.456, at first 123 would be parsed as the right part. But because the dot after it doesn't match the following syntax, the parser starts again at the middle part and matches.
With this construct the principle of right-aligned parsing can be used.
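The result of right-aligned parsing can be reproduced with a small Java sketch that assigns the dot-separated parts from the right. It shows only the expected assignment of the parts, not the back-tracking of the parser; the class name RightAlignedNumberSketch is an assumption for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RightAlignedNumberSketch {

    private static final String[] NAMES = { "leftPart", "middlePart", "rightPart" };

    /** Assigns one to three dot-separated numbers to the part names, starting from the right. */
    static Map<String, String> parse(String text) {
        String[] parts = text.split("\\.", -1);   // -1 keeps empty parts at the end
        if (parts.length > NAMES.length) {
            throw new IllegalArgumentException("at most three parts are allowed");
        }
        for (String part : parts) {
            if (!part.matches("\\d+")) {
                throw new IllegalArgumentException("not a number: '" + part + "'");
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        int offset = NAMES.length - parts.length;  // missing parts are missing on the left
        for (int i = 0; i < parts.length; i++) {
            result.put(NAMES[offset + i], parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("123.456"));
        System.out.println(parse("1.2.3"));
    }
}
```

With the input 123.456 only middlePart and rightPart are filled, which is the behaviour described for the two [| options.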
Topic:.ZBNF_syntaxDescription.syntaxControl.options.requiredNegative.
If it is written
[? syntax ]
then the enclosing syntax does not match if the syntax inside the option bracket matches. Typically it is useful, for example, for repetitions and their termination:
Example::={ [?;] <*;?text> ; } ;.
A text ending with a semicolon always matches. But two semicolons in direct succession should be the termination of this sequence. Without this possibility the second semicolon would be recognized as a text, and the following input would be parsed confusedly. With this notation the second semicolon is detected as not belonging to the repetition, the continuation detects the second semicolon after the repetition, and the following input can be matched.
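The behaviour of the negative test in the repetition can be traced with a hand-written Java loop. This is a sketch of the described behaviour only, without whitespace handling; the class name NegativeOptionSketch is an assumption for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class NegativeOptionSketch {

    /** Collects the texts of a sequence like "first;second;;". */
    static List<String> parse(String input) {
        List<String> texts = new ArrayList<>();
        int pos = 0;
        // negative test: the pass of the repetition is not entered if a ';' follows directly
        while (pos < input.length() && input.charAt(pos) != ';') {
            int end = input.indexOf(';', pos);       // text: all characters until ';'
            if (end < 0) {
                throw new IllegalArgumentException("';' expected after text");
            }
            texts.add(input.substring(pos, end));
            pos = end + 1;                           // the ';' closing the text
        }
        if (texts.isEmpty()) {
            throw new IllegalArgumentException("at least one text is required");
        }
        if (pos >= input.length() || input.charAt(pos) != ';') {
            throw new IllegalArgumentException("terminating ';' expected");
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(parse("first;second;;"));
    }
}
```

Without the test in the loop condition, the second semicolon would start another pass with an empty text and the terminating semicolon would be searched in the following input.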
Topic:.ZBNF_syntaxDescription.syntaxControl.options.requiredNegative.
If it is written
[! syntax ]
then the syntax is tested but not consumed. Such tests may be required if an input should be pre-tested but processed in subsequent syntax constructs. For example
Example::= [!;|:|+] <nextPart> | <somewhatElse>.
The problem may be that a next part starts with one of the shown characters. The characters have to be parsed as part of the <nextPart>. But the decision that it is a <nextPart> is made by this test, before the <nextPart> is entered.
Topic:.ZBNF_syntaxDescription.syntaxControl.options.expectedVariant.
If it is written
[> syntax ]
it means that the syntax inside the square brackets must match. Otherwise not only does the parsing of this branch fail, but the whole parsing process is aborted. This construct should be used only in a syntactical environment where previous checks determine that the following syntax has to match. For example:
[/**[><description>*/]]
The /** ... */ as a whole construct is optional. But if a /** is detected, the next part of the syntax branch, <description>, must match unconditionally.
Another example is:
[ condition: [><?which>A|B|C] ]
The keyword condition: is optional. But if it is detected, A or B or C has to match. In some cases the parsing process would be terminated with a syntax error anyway, but before the failure is detected, all other variants are tested. That costs calculation time. And if the tested part lies in a part of the text which is also detectable as a comment, it is an important feature.
Topic:.ZBNF_syntaxDescription.syntaxControl..
In the original BNF of the 1960s, repetition wasn't defined. For repetition constructs, recursion was used. In EBNF (http://www.en.wikipedia.org/wiki/EBNF) repetition was introduced, written with braces.
In ZBNF at least one match is obligatory. If zero passes should also be okay, it should be written as:
[{...}]
The second distinction from EBNF is: A backward branch may be defined if necessary. It starts with a ? and contains the back syntax until }. If the back syntax matches, the repetition is obligatory. This is a frequent situation in practice. For example, between the items of enumerations and other cycles of parts a special character like a comma is written. But the comma isn't written at the end of the sequence. If a comma is detected, a next cycle must start. An example was already shown above: the parameters of a define in C/C++ are comma-separated:
defineDefinition::= ... <$?@name> [ ( { <$?parameter/@name> ? , } ) ]
In this case the whole prescript in (...) is optional. But if a (...) is used, at least one parameter name must be present between the parentheses. If a comma is written, a next identifier has to follow.
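The control flow of a repetition with a backward branch corresponds to a loop whose pass is obligatory once and is repeated only if the separator was found. The following Java sketch parses a parenthesized, comma-separated parameter list in this way. The class name ParameterListSketch is an assumption, and unlike the define example with <$NoWhiteSpaces>, this sketch accepts whitespace between the parts.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterListSketch {

    /** Parses a text like "(a, b, c)" and returns the parameter names. */
    static List<String> parse(String text) {
        List<String> names = new ArrayList<>();
        int pos = expect(text, skipSpaces(text, 0), '(');
        while (true) {
            // forward branch: one identifier is obligatory in every pass
            pos = skipSpaces(text, pos);
            int start = pos;
            while (pos < text.length() && isIdentifierChar(text.charAt(pos), pos == start)) {
                pos++;
            }
            if (pos == start) {
                throw new IllegalArgumentException("identifier expected at position " + pos);
            }
            names.add(text.substring(start, pos));
            // backward branch: a comma makes the next pass obligatory
            pos = skipSpaces(text, pos);
            if (pos < text.length() && text.charAt(pos) == ',') {
                pos++;
            } else {
                break;
            }
        }
        expect(text, pos, ')');
        return names;
    }

    private static boolean isIdentifierChar(char c, boolean first) {
        return c == '_' || Character.isLetter(c) || (!first && Character.isDigit(c));
    }

    private static int skipSpaces(String text, int pos) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    private static int expect(String text, int pos, char expected) {
        if (pos >= text.length() || text.charAt(pos) != expected) {
            throw new IllegalArgumentException("'" + expected + "' expected at position " + pos);
        }
        return pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(parse("(a, b, c)"));
    }
}
```

An input like (a,) is rejected, because the comma was found and therefore a next identifier has to follow.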
Because the forward and the backward branch of a repetition are syntax prescripts too, alternatives can be used there just like all other syntax control constructs. It may be typical to write, for example:
{ <variantA> | <varianteB> ? [<?delimiter> + | , ] | : <?specialDelimiter> }
or to construct nested repetitions.
Topic:.ZBNF_syntaxDescription.ZbnfComponent.
Syntax components are, in general, complex syntax prescripts. They may be written as separate syntax definitions. At the least, clarity may be increased by doing so. Syntax components could also be defined in the old BNF. There they were called meta-morphemes, or non-terminals as opposed to terminal symbols. That is the view of the syntactical definition. But from the view of the semantic, it is better to call them components. A component is a part of the syntax which is used as <component> and is defined in its own syntax prescript. The parse result builds a node at this point, which contains the whole component with its own parse result branch. It builds a tree.
The notation with angle brackets was also used in the old BNF. Another frequently used notation is writing in italic script. But this cannot be used in technical (ASCII) text formats; it is only suitable for manually written explanations in printed texts. The third form in use is the notation as a simple identifier. But that is only possible if terminal characters are written in quotation marks. This is the choice of EBNF.
In ZBNF the simple expression requiring a syntax component is written as
.<SyntaxIdentifier>
The syntax of the component must then be defined anywhere in the syntax script in the form
.SyntaxIdentifier::=SyntaxPrescript.
In ZBNF not only the syntax of a component is relevant, but also its meaning for post-processing. This is the semantic. If the notation above is used, the semantic identifier is identical to the syntax identifier. But an alternative semantic can also be declared. For example:
{bill-postal-address: <Address?BillAddress> | Supply-postal-adress: <Address?SupplyAdress>}
In both cases, for the postal address of the bill and for the supply, the same syntax is used. But the meaning and the post-processing of the two addresses are different. Therefore the semantic is different. The semantic is used to identify the parser's result:
.<SyntaxIdentifier?SemanticIdentifier>
It is also possible to suppress the semantic for the component. In this case no extra component is produced as a result, but the syntax is written in a separate definition. See Chapter: 7 Semantic-specification variants. It should be written as
.<SyntaxIdentifier?>
The semantic may be more than a simple identifier. Especially when producing an XML expression, some special cases are possible too.
The technical implementation accepts the whole character sequence until the > as the semantic expression. But there are writing rules, see Chapter: 7.1 Semantic writing rules.
Additionally there are some special characters after the ?, see Transformation of result and Inner syntax parsing.
Topic:.ZBNF_syntaxDescription.ZbnfComponent..
The following notations are possible in ZBNF (examples):
<#?number> <$?identifier> <*|*/?string> <![=]*?RegularExpression>
Numbers and identifiers can be processed better in hard-coded software than in a syntax description. Therefore the standard expressions are written in the forms shown. They are parsed hard-coded. Their syntax is well known and defined; it shouldn't be a part of the user's syntax script.
The general notation is <syntax-symbol?semantic>, analogous to syntax components. Additionally a number of characters may be defined. It should be written without a space after the < as a positive number, for example:
<16*?cell>
This example means that any characters, but exactly 16 of them, are parsed and assigned to the semantic cell. It may be used to parse text in tables with fixed column width.
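What a repeated fixed-length element does to a table line can be shown with a few lines of Java. The sketch cuts a line into cells of equal width; the class name FixedWidthCellsSketch and the rule that the line length must be a multiple of the width are assumptions for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWidthCellsSketch {

    /** Cuts a line into cells of exactly the given width. */
    static List<String> cells(String line, int width) {
        if (width <= 0 || line.length() % width != 0) {
            throw new IllegalArgumentException("line length is not a multiple of " + width);
        }
        List<String> result = new ArrayList<>();
        for (int pos = 0; pos < line.length(); pos += width) {
            // any characters are accepted, only the number of characters counts
            result.add(line.substring(pos, pos + width));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "apple           3 peaces        ";
        for (String cell : cells(line, 16)) {
            System.out.println("[" + cell.trim() + "]");
        }
    }
}
```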
The following notations can be used as <syntax-symbol?...>
|
syntax-symbol |
Explanation |
|
|
An identifier is expected, written in the form known from the programming languages Java and C: it should consist of alphabetics
There is a possibility to exclude some identifiers from recognition here: The syntax script may contain a keyword definition $keywords::=class|interface|super|new|return|if|else. The example shows that the keywords of a programming language are excluded here. |
|
|
The identifier may contain, but not start with, any of the addChars. For example, addChars may be |
|
|
It is the syntax of a positive integer number, consisting of the digits |
|
|
A negative sign before the digits is admissible. It is an optionally negative number. |
|
|
It is a float number, possibly with exponent, in the Java and C standard notation. Internally a double value is stored. |
|
|
It is possible to parse a float number and store a multiplied representation with the |
|
|
A hexadecimal number is expected. The hexadecimal digits |
|
|
All characters until one of the given endchars are accepted. The endchars may be for example Example:
Note: This construct may be contrary to the key principles of the parsing process. It doesn't test for a match; instead a non-match is searched for. It is possible that a lot of text is accepted unchecked until the end character is found. For example,
if a space is used:
Note: The sequences |
|
|
All characters until one of the given endchars outside of a quotation are accepted. The escaping of a quotation mark with |
|
|
All characters until one of the given character sequences str are accepted. Before the terminating char sequences a character Example: |
|
|
All characters until the last occurrence of one of the given endchars are accepted. The search starts from the end of the whole text to be parsed. This special form is especially useful if an inner syntax of short length is parsed, or a short input, for example the answer of a command line call. For example, a file path is tested. The last testFilePath::=[<toLastChar:/\\?path>[/|\\]]<*?name>. |
|
|
Analogous to testFilePath::=[<toLastCharIncl:/\\?path>]<*?name>. |
|
|
A string in quotation marks is expected. Inside the quotation an escaping with The input doesn't match if the input doesn't start with a |
|
|
Analogous variant of |
|
|
This is a special form of /**Example Comment of a method * It does this and that * @param x value. */ For example the syntax is Example Comment of a method It does this and that @param x value. This case is simple; no additional characters are given. But programmers often do not write so exactly. For example, given: /** Comment to any method
* * List bullet
* this line is written right-shifted
this line is written left-shifted without the asterisk
*/
or tabulators are used. The parser stores the result properly too: Comment to any method * List bullet this line is written right-shifted this line is written left-shifted without the asterisk The characters space and |
|
|
This is a universal form to parse the current text with a given user method. The method should be found in a special Java-class assigned to the parser calling |
|
|
A regular expression is used to describe the syntax. The regular expression follows the definition of the java class |
Topic:.ZBNF_syntaxDescription.ZbnfComponent..
Regular expressions are a powerful method to describe the syntax of any text. But handling complex regular expressions with a readable assignment to results isn't suitable in every case. The additional use of regular expressions within ZBNF may be a good idea. The part of the input text matched by the given regular expression is stored as the result.
The notation of a regular expression in ZBNF is:
.<!Regex?Semantic>
Be aware that a backslash, often used in regular expressions, must be written twice: \\, because the syntax script also uses it as escape character. See the examples below. The examples are simple and explain the principles; the whole usability is explained elsewhere. It is advisable to use only simple constructs of regular expressions in ZBNF. Complex constructs are not necessary because ZBNF has its own implementations of equivalent features.
|
Regex |
Meaning |
|
|
A simple dot is a place holder for any character. |
|
|
The asterisk doesn't mean any character as in ZBNF or as a wildcard in file paths; it means any number of repetitions of the character to its left. In this case, the combination of dot and asterisk, any character is accepted any number of times. But there should be a limit. In ZBNF it is possible to write a maximal number outside of the regex, for example |
|
|
Like .*, but at least one character is necessary. |
|
|
One of the characters between [] is accepted. |
|
|
That is a useful combination: Any desired number of the characters between [] are accepted, also none. |
|
|
Analogous |
|
|
Character range: One of the characters from ..a.. to |
|
|
It is possible to combine more ranges of characters. This example means: all alphabetic characters. |
|
|
That means a word, consiting of upper and lower alphabetic characters. |
|
|
That means a word beginning with an upper case alphabetic, than lower case chars. |
|
|
That is the fix terminal string |
|
|
It means at ex. |
|
|
Now |
|
|
Any character which isn't a whitespace. Note: The backslash should be written twice in a ZBNF script. |
|
|
All characters until a whitespace. |
|
|
Any whitespace, including line feed (hex 0a) and carriage return (hex 0d). |
|
|
Any number of whitespace characters, including none. |
|
|
All whitespace characters, at least one. |
|
|
A word character: [a-zA-Z_0-9] |
|
|
A word consisting of at least one character. |
|
|
Any character outside a word. |
|
|
All characters outside a word until the start of the next word.
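The regular expressions described in the table above can be tried out in plain Java. The following is a minimal sketch that uses only java.util.regex and does not call the ZBNF parser; the class name RegexSketch and the method matchPrefix are hypothetical names chosen for this illustration. Note that a Java string literal needs the backslash doubled for the same reason as a ZBNF script: the backslash is also the escape character of the surrounding notation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSketch {

  /** Returns the part of input matched by regex at the start of input,
   *  or null if the regex doesn't match there. */
  static String matchPrefix(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.lookingAt() ? m.group() : null;
  }

  public static void main(String[] args) {
    // A word beginning with an upper-case character, then lower-case characters.
    System.out.println(matchPrefix("[A-Z][a-z]*", "Hello world"));
    // All characters until a whitespace; the literal "\\S*" is the regex \S*.
    System.out.println(matchPrefix("\\S*", "abc def"));
    // A word consisting of at least one word character.
    System.out.println(matchPrefix("\\w+", "name_1 = 5"));
  }
}
```

The method uses lookingAt() because a parser tests the text at its current position rather than searching for a match anywhere in the input.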
Topic:.ZBNF_syntaxDescription.semantic.
pStyle=std tableStyle=stdTable
In general, the semantic is written in syntax component expressions after a question mark:
.<syntax?semantic>
where the syntax may be a ZBNF-defined component:
.syntax::=....
or a standard syntax, presented in the chapter above, for example <#?semanticOfNumber>.
There are some special cases:
<syntax> means a ZBNF syntax component with the same semantic identifier as syntax. This is the typical case if syntax components are only singletons. Writing <syntax?syntax> produces the same result.
<syntax?> In this case no semantic is assigned. If Zbnf2Xml is used, no element of its own is created for that ZBNF syntax component. But the parse result of the component is assigned
as a child to the current result of the calling environment. It's an interesting special case. In some cases a complex sub-syntax
may be defined in a separate syntax prescript, but this sub-syntax doesn't have the meaning of a ZBNF component; it is only
a sub-syntax. For example:
structDefinition::=struct [<$?typetagident>] \\{ { <structContent?> } \\} <$?name>;.
classDefinition::=class [<$?typetagident>] \\{ { [ <classContent?> | <structContent?> ] } \\} <$?name>;.
structContent::=<?>
[ <unionDefinition>
| <structDefinition>
| <attribute>
| <defineDefinition>
| <structContentInsideCondition>
].
The inner content of structContent supplies ZBNF-components, but not the wrapper of this definition itself.
<?Semantic> Here no ZBNF component is required, but a semantic entry is created if this branch of the syntax is passed.
<syntax?!subSyntax> In this case no semantic is given, but the result of the ZBNF syntax component is evaluated with the given subSyntax. See Chapter: 8 Inner syntax of parsed text <syntax?!innerSyntax>.
<syntax?-semantic> The - means that the semantic, respectively the parse result of this component, isn't assigned to the current result but is
stored; see the next item. If the semantic is the same as the syntax identifier, <syntax?-?> can be written.
<syntax?+semantic> A stored result of a component is additionally assigned to this component. The semantic can be empty here, writing <syntax?+>, or the semantic may be the same as the syntax identifier, writing <syntax?+?>. The second question mark stands for the same semantic identifier. See Chapter: 9 Transformation of a semantic (parse result) to another ZBNF-component.
Topic:.ZBNF_syntaxDescription.semantic.semanticRules.
pStyle=std tableStyle=stdTable
A main area of application is the conversion of free but syntactically structured texts to XML. Therefore the writing rules for the semantic
are oriented primarily toward the conversion to XML. But the parser's result can also be evaluated in free Java programming or with a
reflection-based writer of ZBNF results to Java instances, described in javadoc:_org/vishia/zbnf/ZbnfJavaOutput or Topic:.ZbnfJava.ZbnfJavaOutput. The ZbnfJavaOutput requires fixed rules for writing the semantic, similar to the XML output. The rules for both use cases are coordinated, so the same
syntax script can be used for both post-processing variants. Free Java programming may accept all writing forms of the semantic,
but it should follow the same rules. A test output of any parse result in XML may be necessary or useful in some
cases.
For XML output, the semantic in ZBNF determines the names of tags and attributes and, where applicable, the children of the XML tree. The structure of the ZBNF syntax components determines the possible structure of the generated XML tree. But the presence of concrete data in the parsed input determines whether or not an XML element or a branch in the tree is created.
For example, the following syntax:
syntax::= {<?set> <head> { <data> } } -end-.
head::= idx = <#?@index>.
data::= value = <#?value>.
with the following data:
idx=1 value=5 value=6 idx=123 value=7 value=23 -end-
creates the following XML tree:
<syntax>
  <set>
    <head index="1" />
    <data><value>5</value></data>
    <data><value>6</value></data>
  </set>
  <set>
    <head index="123" />
    <data><value>7</value></data>
    <data><value>23</value></data>
  </set>
</syntax>
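The mapping from the flat input to the nested tree can be imitated by hand in a few lines of Java. This is only an illustrative stand-in under stated assumptions: it does not use the ZBNF parser or Zbnf2Xml, it recognizes the idx and value entries with a regular expression instead of the syntax script, and the names SetToXmlSketch and toXml are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SetToXmlSketch {

  /** Builds the XML text for input such as "idx=1 value=5 value=6 -end-". */
  static String toXml(String input) {
    StringBuilder xml = new StringBuilder("<syntax>\n");
    Matcher m = Pattern.compile("(idx|value)\\s*=\\s*(\\d+)").matcher(input);
    boolean inSet = false;
    while (m.find()) {
      if (m.group(1).equals("idx")) {
        // Each idx entry starts a new set, like one pass of the outer repetition.
        if (inSet) xml.append("  </set>\n");
        xml.append("  <set>\n");
        xml.append("    <head index=\"").append(m.group(2)).append("\" />\n");
        inSet = true;
      } else {
        // Each value entry becomes one data element with a value child.
        xml.append("    <data><value>").append(m.group(2)).append("</value></data>\n");
      }
    }
    if (inSet) xml.append("  </set>\n");
    return xml.append("</syntax>\n").toString();
  }

  public static void main(String[] args) {
    System.out.print(toXml("idx=1 value=5 value=6 idx=123 value=7 value=23 -end-"));
  }
}
```

The sketch shows that an element appears in the output only when matching data is present in the input, which is the point made in the paragraph before the example.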
Incidentally, this example demonstrates the basic idea of XML. The data source is shorter, but a clear structure
may not be recognizable. The XML tree contains a structure: a set with head and data. The semantic of the data is contained in the XML text.
The semantic of any syntax element is written inside the <...?...> after the question mark or after the special designations ?!, ?+ or ?-, up to the closing >. In ZBNF it is:
syntax_component_call::=\<<syntax>?[!|+|-][<semantic>]\>.
or, more readable, with italic characters in the printed variant:
syntax_component_call ::= < syntax >? [ ! | + | - ] [ semantic ] > .
The following writing rules for the semantic, regarding the XML requirements, are permitted:
|
semantic |
Rule |
|
|
When a simple identifier is written, a new element with this identifier as tag name is created in the XML output and added to the current
XML element. The associated ZBNF parse result is written as the textual value of the element if it is a simple parse result. If the ZBNF parse result is a component, its content is expanded as children of the created element. For Java output using javadoc:_org/vishia/zbnf/ZbnfJavaOutput, the found field or method is set or called, respectively, with the value of the ZBNF parse result. The type
of the parse result is taken into account. If the parse result is a component, the field or return value of called |
|
|
Writing a For Java-output it is the same like |
|
|
A colon |
|
|
For XML-output an element with the tagname val1=<#?result/@val1>; val1=<#?result/@val2>; Both attributes { val= <#?result/@value> }
That case is better to write in the following form. In that form the repetition creates an new element {<?result> val=<#?@value> }
For Java-output an adequate behaviour is supposed: A field named with the name |
|
|
The kind of writing with slash on end is a special form of {<?result/> val=<#?@value> }
produces only one element like <component?tag> ..other syntax...[ <componentPart?tag/> ] In the input text matching to two divided parts in syntax prescript should be written in the same XML-Element respectively in the same Java-object. The behaviour on JavaOutput is adequat. |
|
semantic
|
Especially for the form |
|
semantic
|
This is a special form. First, it is a signal to the ZBNF parser that it should also store the parsed input text of a component.
Second, it is a signal to store this input text as the text of the element. Use the form |
|
semantic
|
This is a special form for the Zbnf2Xml conversion. It means that formatting written in wiki style in the node's result text
is expanded. The Zbnf2Xml converter calls the method |
|
semantic
|
This is the universal method for postprocessing the parser's result to expand it into XML children. For this purpose, a class that implements the interface:_org/vishia/zbnf/Zbnf2Xml.PrepareXmlNode should be named in the header of the ZBNF script. This feature is not available yet; it is planned for version 1.1 of ZBNF. |
Topic:.ZBNF_syntaxDescription.semantic..
pStyle=std tableStyle=stdTable
The square brackets of an option designate a sub-syntax, possibly with alternatives. Writing
.[<?semantic>...]
without a space after the left square bracket, the option as a whole produces a parse result of its own with the given semantic. If the
option is not used, this semantic isn't produced: [<?semantic> really-option] or, if the empty branch is possible: [<?semantic> A | B |].
Repetitions behave in the same way:
.{<?semantic> ...}
Each entry of the repetition produces a parse result with the given semantic. The content of the repetition is stored as a child of this ZBNF component. For example
testRepetition::={ <head> : {<?dataBlock> <data?> | <info?> ? , } ; }.
head::= idx = <#?@index>.
info::= <""?@info>.
data::= <#f?@value>.
produces with the following data:
idx=5 : 7.34, 23, "text", 0.01; idx=0 : 34;
the following XML-tree (parse result):
<testRepetition>
  <head index="5" />
  <dataBlock value="7.34" />
  <dataBlock value="23.0" />
  <dataBlock info="text" />
  <dataBlock value="0.01" />
  <head index="0" />
  <dataBlock value="34.0" />
</testRepetition>
Topic:.ZBNF_syntaxDescription.semantic..
pStyle=std tableStyle=stdTable
A special case is writing
.[<?whichOption> a | b |]
If the alternatives don't have a semantic of their own, the parsed text without leading and trailing whitespace is stored as the result with the given semantic. It is an important feature. For example, some assignment operators are given in the form:
assignOperator::=<?> [<?@assignOperator> = | += | -= | *= | /= | &= | \|= | \<\<= | \>\>= ] .
Here the parse result of testing the operator variants is stored with the semantic assignOperator; the parse result is the operator itself, for example -=. The <?> means that no semantic is stored for the ZBNF component. This is reasonable because the semantic is produced in the option bracket
(next chapter).
Topic:.ZBNF_syntaxDescription.semantic..
pStyle=std tableStyle=stdTable
If a ZBNF-component is written like
.component::=<?semantic> ...
the following behavior applies:
If the ZBNF component is called by writing <component>, the semantic isn't component; the semantic defined at the component is used.
If the ZBNF component is called by writing <component?specialSemantic>, the specialSemantic given at the call is used.
If the ZBNF component is called by writing <component?>, no semantic is produced for the whole component.
A special case of this is the form
.component::=<?> ...
It means that the component has no semantic of its own by default. But the same rules as shown above are valid. For example, if
<component?specialSemantic> is written, this component has the given semantic. Only if a simple call is made, <component>, no semantic is produced. This may also be stated twice: the call can be written as <component?>. Then it is shown both at the calling position and at the definition that no semantic should be used.
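The rules for choosing the semantic of a component can be summarized as a small decision function. The following Java sketch is an illustration of the rules as stated in this chapter, not the parser's implementation; the class SemanticRuleSketch, the method effectiveSemantic and the convention of passing null for "nothing written" and an empty string for a bare question mark are assumptions made for this example.

```java
import java.util.Optional;

final class SemanticRuleSketch {

  /**
   * @param componentName   identifier of the component, used for a plain call
   * @param definedSemantic null if the definition has no <?...> prefix,
   *                        "" for <?>, otherwise the semantic after <?
   * @param callSemantic    null for <component>, "" for <component?>,
   *                        otherwise the semantic written at the call
   * @return the semantic that is produced, or empty if none is produced
   */
  static Optional<String> effectiveSemantic(
      String componentName, String definedSemantic, String callSemantic) {
    if (callSemantic != null) {
      // A semantic written at the call always wins; <component?> suppresses it.
      return callSemantic.isEmpty() ? Optional.empty() : Optional.of(callSemantic);
    }
    if (definedSemantic != null) {
      // A plain call uses the semantic of the definition; <?> means none.
      return definedSemantic.isEmpty() ? Optional.empty() : Optional.of(definedSemantic);
    }
    // Nothing written anywhere: the component identifier is the semantic.
    return Optional.of(componentName);
  }

  public static void main(String[] args) {
    System.out.println(effectiveSemantic("component", "semantic", null));
    System.out.println(effectiveSemantic("component", "", "specialSemantic"));
  }
}
```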
Topic:.ZBNF_syntaxDescription.semantic.semanticHelp.
pStyle=std tableStyle=stdTable
In the syntax script it is possible to explain a semantic in plain language. The explanation should be placed outside of syntax definitions, above or below. It should be written like:
?en:mySemantic::= "Explanation text. It may be more detailed. It serves as help.".
?de:mySemantic::= "Erklärungstext, etwas umfangreicher als Hilfestellung".
It can be specified in several languages. The explanation should be written in quotation marks. A dot at the end is necessary. The related semantic need not be the identifier of a ZBNF component; it may also be a semantic identifier inside a syntax definition. In that case it should be written as a semantic path. For example:
definition::= <$?type> <$?name> [ = <#f?value> ]
?en:definition/value::="This value means this or that.".
This information isn't used in the parsing process. The parser accepts but ignores it while reading the syntax script. It is only documentation in the script, to be read by humans. But for future extensions, especially for ZBNF-based editors such as an Eclipse plugin, this information may be used as context-sensitive help while typing a part of the input text that matches the syntax.
Topic:.ZBNF_syntaxDescription.innerSyntax.
pStyle=std tableStyle=stdTable
Especially if an input text is parsed with <*endchars?...>, <*|endstr?...> or <""?...>, the string result may additionally be evaluated with an inner syntax. The string is recognized first because it matches the outer syntax. After that, the inner syntax is tested against this provisional result. The result of that syntax test supplies the final parse result.
The notation form is: .<...?!syntax>
After the exclamation mark there isn't a semantic but the identifier of the inner syntax as a ZBNF component. If the inner syntax doesn't match, the outer syntax isn't recognized as matching either. Another branch of the outer syntax is tested then.
The special construct [>...] for a required positive test may be used to prescribe the matching of the inner syntax, see Topic:ZBNF_syntaxDescription.syntaxControl.options.requiredPositive.
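The two-stage test can be pictured with a short Java sketch. It is a simplified illustration and does not use the ZBNF parser: the outer syntax is reduced to "all text up to an end character", a regular expression stands in for the inner ZBNF component, and the names InnerSyntaxSketch and parseUntil as well as the trimming of whitespace are assumptions.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class InnerSyntaxSketch {

  /**
   * Stage 1: take the text up to endChar as the provisional result.
   * Stage 2: accept it only if the inner syntax matches the whole text.
   */
  static Optional<String> parseUntil(String input, char endChar, Pattern inner) {
    int end = input.indexOf(endChar);
    if (end < 0) {
      return Optional.empty();            // the outer syntax doesn't match
    }
    String provisional = input.substring(0, end).trim();
    // If the inner syntax fails, the outer syntax counts as not matching too.
    return inner.matcher(provisional).matches()
        ? Optional.of(provisional)
        : Optional.empty();
  }

  public static void main(String[] args) {
    Pattern digits = Pattern.compile("\\d+");
    System.out.println(parseUntil("1234; rest", ';', digits));
    System.out.println(parseUntil("12ab; rest", ';', digits));
  }
}
```

An empty result corresponds to the case in which the parser goes on to test another branch of the outer syntax.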
Topic:.ZBNF_syntaxDescription.transformationOfResult.
pStyle=std tableStyle=stdTable
In some cases information is placed outside of a syntactical construct, but this information should be assigned to inner syntactical content. The outer information may be written only once, but it applies several times. The style of writing a definition for several variables of the same type in C/C++ or Java is an example:
/**Description valid for all Variables. */ int a,b,c;
The variables a, b, and c are all of type int. The description should be assigned to all three variables too. This style of writing is shorter than:
/**Description valid for all Variables. */ int a;
/**Description valid for all Variables. */ int b;
/**Description valid for all Variables. */ int c;
The parse result of ZBNF should duplicate the common information, so that the short and the long variants do not differ in the result, because there is no difference in meaning. ZBNF knows a construct, written like:
attributeSyntax::= [/**<description?-?>*/] <type?-?> {<attributedef?+attribute> ?,};.
The example is part of Cheader.zbnf. The rule of writing is: If ?- is written instead of a simple ?, the content of that ZBNF component isn't added to the parse result but is stored temporarily in a buffer associated
with the syntax prescript. If more than one such construct is found, the information is accumulated in this temporary store.
A writing style like ?-? means that the semantic for storing is identical to the syntax identifier. It is a short way to write <ident?-ident>. In contrast, ?-> means that the content of the ZBNF component is stored but has no semantic element of its own.
The opposite is the construct with the characters ?+. If this ZBNF component is parsed successfully, the current temporarily stored parse result of the syntax prescript is written
inside the component at its beginning. The temporary result is not cleared at that point. If a second ?+ is found, the same content is stored there too. In the example the ?-?> is outside of the repetition braces, but the ?+...> may be repeated several times; each repetition's result gets the same common content from ?-...>.
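The effect of ?- and ?+ on the result can be imitated with ordinary Java collections. The sketch below is a model of the described behavior under stated assumptions, not the parser's code: a map plays the role of the temporary store, and the names StoredResultSketch and expand and the keys description, type and name are chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StoredResultSketch {

  /** Returns one result per variable, each with the common content in front. */
  static List<Map<String, String>> expand(
      String description, String type, List<String> names) {
    // Role of ?-: the content is stored, not added to the result directly.
    Map<String, String> stored = new LinkedHashMap<>();
    stored.put("description", description);
    stored.put("type", type);

    List<Map<String, String>> attributes = new ArrayList<>();
    for (String name : names) {
      // Role of ?+: the stored content is copied to the beginning of each
      // component; the store itself is not cleared.
      Map<String, String> attribute = new LinkedHashMap<>(stored);
      attribute.put("name", name);
      attributes.add(attribute);
    }
    return attributes;
  }

  public static void main(String[] args) {
    System.out.println(expand("Description valid for all Variables.",
        "int", Arrays.asList("a", "b", "c")));
  }
}
```

With this model the short form int a,b,c; and the long form with three separate definitions lead to the same list of results.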
Another, more general example may be the following:
syntax::=<HeadInfo?-?> <AnotherInfo> [ <Variante1?+?> | <Variante2?+?> | <Variante3?+?> ]
In the XML representation the HeadInfo should be part of the Variante..., but the <AnotherInfo> should be outside. The textual content is written in the given form, simple and easy, without consideration of an XML representation.
The XML representation converted by Zbnf2Xml is produced, for example, as:
<AnotherInfo>...</AnotherInfo>
<Variante2><HeadInfo>...</HeadInfo> ... </Variante2>
Note: Originally this variant of temporarily storing parse results was developed to improve calculation time while parsing. Instead of writing
syntax::= [ <Variante1> | <Variante2> | <Variante3> ]
Variante1::= <HeadInfo> <RestOfVariante1>.
Variante2::= <HeadInfo> <RestOfVariante2>.
Variante3::= <HeadInfo> <RestOfVariante3>.
the <HeadInfo> is parsed and recognized once, and after that the variants are tested. The long form repeats the matching of
<HeadInfo> if the first <Variante1> doesn't match, and so on. That is suboptimal for the calculation time of the parsing process. But such constructs to save calculation
time aren't good for describing structure in the syntax source. It is now planned (not ready yet, release 1.1) that the
result of a matching input isn't purged if a syntax branch doesn't match. If a second branch starts with the same syntax,
it is recognized because the syntax identifier is stored in the result items, and it isn't tested twice.
Topic:.ZBNF_syntaxDescription.controlVariables.
pStyle=std tableStyle=stdTable
This chapter shows all control variables. A control variable is placed outside of any syntax definition, typically at the beginning of the syntax script before the first syntax definition. If an imported syntax script also contains control variables, they are accepted too. If a control variable that should be written only once is written more than once, the last setting is valid. All control variables are read before the parsing process starts.
$import "path". The named syntax script is imported at this position. path is the absolute path, or the path relative to the current script file, of a file containing a ZBNF script to import. The imported script may contain syntax definitions that are used. This control variable can be written more than once to import more than one script.
$keywords= { keyword ? | } . Some identifiers are stored as keywords. If an identifier is parsed by writing <$..?...> and the parse result is equal to one of the keywords, the test is declared as non-matching. This prevents a false match,
for example if the input text contains public final int x and the syntax script contains
$keywords=private|public|protected|final|static.
declaration::=,,<type> <$?name>,,.
the input doesn't match. final is an identifier and would match <$?name>, but it is a keyword and therefore doesn't match.
$setLinemode. The line mode is activated for all syntax prescripts. It means that a line feed character \n (0x0a) isn't accepted as whitespace. The line mode should be used if the input text is line-oriented. The line feed itself should be tested as a terminal character by writing \n. A character \r (0x0d) is still accepted as whitespace.
If the line mode isn't activated, \n is a whitespace by default. But see $.
$endlineComment=startsequence. Another start sequence for end-of-line comments in an input text is set. The default start sequence is // as in C and Java. Note that the start sequence for end-of-line comments in the syntax script is ##, independent of this setting. The start sequence should not be longer than 5 characters. All characters between = and the terminating . are accepted as the sequence. Don't write additional whitespace! But escaping with \ can be used. For example, \. means a dot as a start-sequence character.
For example: $endlineComment=???. or $endlineComment=\.\.\... In the second example three dots are the start sequence of end-of-line comments.
$comment=startsequence...endsequence. Another start and end sequence for comments within a line or over several lines in an input text is set. The default start and
end sequences are /* and */ as in C and Java. The sequences should not be longer than 5 characters. All characters between = and ..., respectively between ... and the terminating ., are accepted as the sequence. Don't write additional whitespace! But escaping with \ can be used. For example, \. means a dot as a start-sequence character.
For example: $comment=[?...?]. or $comment=[\....\.].. In the second example comments are written [.between that.].
$inputEncodingKeyword="encoding-detect-string". The key string to detect an encoding in the first line of an input text is defined. The input text should contain the character sequence
encoding-detect-string="encoding"
where encoding is one of the known encoding identifiers such as ISO-8859-1 or UTF-8. See Chapter: 11.2 Definition of encoding of the input file to parse
$inputEncoding="encoding". The encoding of the input text is defined here. This definition may be used together with $inputEncodingKeyword=".... If no input encoding keyword is found in the input text, this given encoding is used.
$xmlns:namespacekey="namespace-url". An XML namespace is defined. The namespace key can be used in semantic identifiers when a conversion to XML is done with Zbnf2Xml. The namespace declaration is also written to the output XML text. This control variable can be written more than once to provide several namespace declarations.
$main=syntax-definition The syntax definition designated in this way is used as the main definition, valid for the top of the input text (the whole input text). If no such control variable is used, the first syntax definition is the main definition.
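The keyword rule described above for $keywords can be sketched in Java as follows. This is a minimal illustration, not the parser's implementation: the identifier pattern is an assumption (a letter or underscore followed by letters, digits or underscores), and the names KeywordSketch and matchesIdentifier are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class KeywordSketch {

  // The keywords of the example: private|public|protected|final|static
  private static final Set<String> KEYWORDS = new HashSet<>(
      Arrays.asList("private", "public", "protected", "final", "static"));

  // Assumed shape of an identifier for this sketch.
  private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

  /** True if token would be accepted as an identifier: it has the shape of
   *  an identifier and is not one of the keywords. */
  static boolean matchesIdentifier(String token) {
    return IDENT.matcher(token).matches() && !KEYWORDS.contains(token);
  }

  public static void main(String[] args) {
    System.out.println(matchesIdentifier("x"));
    System.out.println(matchesIdentifier("final"));
  }
}
```

In the example input public final int x, the token final has the shape of an identifier but is rejected by the keyword check, so it cannot be taken as the name.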
Topic:.ZBNF_syntaxDescription.encoding.
pStyle=std tableStyle=stdTable
.
Topic:.ZBNF_syntaxDescription.encoding.syntaxScript.
pStyle=std tableStyle=stdTable
A text given as the content of a file is written in a specific encoding. Several common encodings are in use, at least UTF-8 and some 8-bit extensions of ASCII such as ISO-8859-1. See http://en.wikipedia.org/wiki/ASCII. The UTF-16 format isn't used frequently, but it may also be considered.
In a ZBNF script file the encoding can be defined in the same way as in XML: The first line starts with head information that also contains
the encoding. This line contains only characters from 7-bit US-ASCII, which are identical in all encoding tables.
The use of UTF-16 can be detected too, because every second byte is 0.
The first line should be written in the form:
.<?ZBNF-www.vishia.org version="1.0" encoding="ISO-8859-1" ?>
If the ZBNF script is given as a String inside Java, this line isn't necessary because Java uses UTF-16 internally. It is necessary only for file input of the ZBNF script.
Topic:.ZBNF_syntaxDescription.encoding.input.
pStyle=std tableStyle=stdTable
The ZBNF parser, used as class:_org/vishia/zbnf/ZbnfParser, takes the input text as an internal String. Java uses UTF-16 encoding internally. From this point of view the encoding is a matter of the environment that calls the parser.
In the Zbnf2Xml application, called from the command line, a file is read. At this level it should be possible to define the encoding in a suitable way. The encoding of a file may not be open to choice because its content format is already given. But if the format is free, the encoding may be defined in the input file itself. Therefore there are several possibilities or variants:
The encoding of the input is defined directly in the ZBNF script, writing for example
$InputEncoding=ISO-8859-1.
Then the Zbnf2Xml converter reads the required encoding from the ZBNF script file.
The encoding of the input is given in the input text itself. Then a keyword can be defined in the ZBNF script, for example:
$inputEncodingKeyword="My-Encoding".
The rule for this is:
The first line of an input file may contain that keyword, read in 7-bit US-ASCII, followed by an = and the encoding string, either in "" or not. This given encoding is used. Only the first 250 characters of the first line are tested. For example:
First line of the file... with some head informations, My-Encoding="UTF-8", some others...
If the first line doesn't contain such a string, this is also accepted. Then one of the other possibilities is used.
The encoding may be given as a command line argument of Zbnf2Xml.
The same possibilities are available for a user's application of ZBNF parsing. The possibility to define the encoding or the encoding keyword in the ZBNF script is part of the definition of ZBNF.
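The keyword rule for the first line of an input file can be sketched in Java as follows. This is an illustration of the rule as described above, not the code of Zbnf2Xml: the names EncodingKeywordSketch and detect, the fallback parameter and the set of characters accepted in an encoding name are assumptions, and whitespace around the = is not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingKeywordSketch {

  /** Looks for keyword="encoding" or keyword=encoding in the first line,
   *  testing at most 250 characters; returns fallback if nothing usable
   *  is found. */
  static Charset detect(byte[] fileStart, String keyword, Charset fallback) {
    int len = Math.min(fileStart.length, 250);
    // Read as 7-bit US-ASCII; these characters are the same in all tables.
    String head = new String(fileStart, 0, len, StandardCharsets.US_ASCII);
    int eol = head.indexOf('\n');
    if (eol >= 0) head = head.substring(0, eol);

    int pos = head.indexOf(keyword + "=");
    if (pos < 0) return fallback;
    int start = pos + keyword.length() + 1;
    if (start < head.length() && head.charAt(start) == '"') start++;
    int end = start;
    while (end < head.length() && isNameChar(head.charAt(end))) end++;
    try {
      return Charset.forName(head.substring(start, end));
    } catch (IllegalArgumentException e) {
      return fallback;                   // unknown or malformed encoding name
    }
  }

  private static boolean isNameChar(char c) {
    return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.' || c == ':';
  }

  public static void main(String[] args) {
    byte[] text = "head informations, My-Encoding=\"UTF-8\", some others...\n"
        .getBytes(StandardCharsets.US_ASCII);
    System.out.println(detect(text, "My-Encoding", StandardCharsets.ISO_8859_1));
  }
}
```

The fallback parameter corresponds to the other possibilities, such as an encoding given directly in the ZBNF script or as a command line argument.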