1. Approach

Reading an XML file as text is a simple process. My Java classes in vishiaBase.jar contain everything necessary for it, especially org.vishia.util.StringPartScan with its extension org.vishia.util.StringPartFromFileLines, and the package for XML itself, see ../../docuSrcJava_vishiaBase/org/vishia/xmlReader/package-summary.html

Of course, escape sequences such as &lt; for < must be handled, and the principle of name spaces must be known.

The essential reason for writing an own XML reader was the selection of content from a given XML file. Some XML files, for example from Word & co. but also from Simulink (slx), contain too much stuff in their XML data. If only the essential data should be read, the data should be selected as early as possible.

The org.vishia.xmlReader.XmlJzReader selects the input data from a given XML file using a template which is very similar to the XML files to read. This template also contains the destinations for the data, which are addressed by reflection.

This reader was first written in 2017/18 for Simulink slx files. Meanwhile it has also been used in some professional projects and for other XML files.

2. Usage

You need a configuration file. It can be written manually or generated, see chapter How to create the configuration file for given example XML files. See also the chapter Example for config.xml file, how to get.

You need a destination class for the read data. It can also be written manually, or generated from a given configuration file. See chapter Example for the storage class for the data.

Note that both steps are usually done only once, initially (and possibly refined later in use).

To read an XML file, configure the XmlJzReader and then read into an instance of the destination class. Then you can evaluate all data for your own purpose. See chapter Example for parsing with config.xml and storage class.

Basically this is a DOM reader (DOM = "Document Object Model": the whole content of the XML file is read as one data tree). But because only certain elements are selected, only the relevant data are stored. The operations which store the data can also be programmed to work interpretively: they can check the data while parsing, write processed results, or ignore read data which are not relevant. In this working flow it acts as a SAX parser (SAX = "Simple API for XML").
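To illustrate the SAX-style selection idea with the standard JDK only (not with XmlJzReader itself), the following sketch streams an XML text and keeps only the attributes of one selected element type; everything else is skipped without being stored. The class and method names (SelectiveSaxSketch, select) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch only: stream an XML text and keep only the attributes of one selected element type. */
class SelectiveSaxSketch {

  /** Returns one attribute map per element whose qualified name equals selectedTag. */
  static List<Map<String, String>> select(String xml, String selectedTag) {
    List<Map<String, String>> result = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (!qName.equals(selectedTag)) {
          return;                                  // not selected: nothing is stored
        }
        Map<String, String> values = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
          values.put(attrs.getQName(i), attrs.getValue(i));
        }
        result.add(values);
      }
    };
    try {
      // Not namespace aware (the factory default): qName is the literal "prefix:local" text.
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("XML cannot be parsed", e);
    }
    return result;
  }
}
```

For example, select(xml, "bom:entry") on the content of bom_exmpl.xml yields one attribute map per bom:entry. XmlJzReader additionally resolves name spaces and calls the configured operations, which this sketch does not do.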

2.1. Example for config.xml file, how to get

Look at a simple XML file, src/test/files/xmlReader/bom_exmpl.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<BillOfMaterial xmlns:bom="www.vishia.org/XmlSeqWriter/ExampleBom">
  <bom:entry bom:part="R" bom:value="3k3" bom:footprint="1206" bom:ordernumber="123456-789" bom:count="23">some resistors</bom:entry>
  <bom:entry bom:part="R" bom:value="490 Ohm" bom:footprint="1206" bom:ordernumber="123433-789" bom:count="5">some other resisitors
verbose description</bom:entry>
  <bom:entry bom:part="C" bom:value="4n7" bom:footprint="1206" bom:ordernumber="123abc-789" bom:count="17"></bom:entry>
</BillOfMaterial>

The config file for this kind of XML data, written in XML, looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- written with org.vishia.xmlSimple.SimpleXmlOutputter -->
<xmlinput:root xmlns:xmlinput="www.vishia.org/XmlReader-xmlinput" >
  <xmlinput:cfg xmlinput:data="!new_root()"  xmlinput:class="root" >
    <BillOfMaterial xmlinput:data="!new_BillOfMaterial()"  xmlinput:class="BillOfMaterial"  xmlns:bom="www.vishia.org/XmlSeqWriter/ExampleBom" >
      <bom:entry 
        bom:count="!@bom_count"  
        bom:footprint="!@bom_footprint"  
        bom:ordernumber="!@bom_ordernumber"  
        bom:part="!@bom_part"  
        bom:value="!@bom_value"  
        xmlinput:list=""  
        xmlinput:data="!new_bom_entry(bom_count,bom_footprint,bom_ordernumber,bom_part,bom_value)"  
        xmlinput:class="bom_entry"
      > !set_text(text)
      </bom:entry>
    </BillOfMaterial>
  </xmlinput:cfg>
</xmlinput:root>

As you see, it is similar to the input file. Elements which occur more than once are written here only once, but with the hint to use a list: xmlinput:list="".

  • All elements and attributes with the name space xmlinput are hints on how the elements should be processed.

  • The values of the other attributes are hints on how to store the data. For example, bom:count="!@bom_count" determines that the value of this attribute is stored in a temporary variable bom_count, which is then used to create the element with

 xmlinput:data="!new_bom_entry(bom_count,bom_footprint,bom_ordernumber,bom_part,bom_value)"  xmlinput:class="bom_entry"

It means that the current storage class must provide an operation new_bom_entry, which creates the instance for an entry and stores it in the list. The XmlJzReader evaluates this information using Java reflection.

The text of the entry is stored by calling set_text(text), which must be an operation of the created data (the return value of new_bom_entry).
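How the strings in xmlinput:data and in the text content could be resolved is sketched below with plain java.lang.reflect: look up a public operation by name with String parameters and invoke it on the current data. This is an assumption-based illustration, not the code of XmlJzReader (which uses org.vishia.util.DataAccess); the classes BillOfMaterial and BomEntry are hypothetical and reduced to two attributes.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: resolve "!new_bom_entry(a,b)" and "!set_text(text)" by name via reflection. */
class ReflectionCallSketch {

  /** Hypothetical entry class; set_text(text) is the operation named in the config. */
  public static class BomEntry {
    public String count;
    public String part;
    public String text;

    public void set_text(String text) { this.text = text; }
  }

  /** Hypothetical parent class with the creation operation named in xmlinput:data. */
  public static class BillOfMaterial {
    public final List<BomEntry> entries = new ArrayList<>();

    public BomEntry new_bom_entry(String bom_count, String bom_part) {
      BomEntry entry = new BomEntry();
      entry.count = bom_count;
      entry.part = bom_part;
      entries.add(entry);                          // create and add, as xmlinput:list requests
      return entry;                                // becomes the data of the new XML node
    }
  }

  /** Looks up a public operation by name with only String parameters and invokes it on data. */
  static Object call(Object data, String operation, String... args) {
    Class<?>[] types = new Class<?>[args.length];
    Arrays.fill(types, String.class);
    try {
      Method method = data.getClass().getMethod(operation, types);
      return method.invoke(data, (Object[]) args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot call " + operation, e);
    }
  }
}
```

The returned object of the creation call is the data on which the text operation is then resolved, which mirrors the description above.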

This is the template to process any XML file with this structure.

You can adapt the obtained config.xml to your own needs, use other operations to store the data, and especially remove elements or attributes which are not needed. If the XmlJzReader finds data which are not contained in the config, it skips them.

The config file is similar to a schema of the content of an XML file, but it does not follow the XML Schema (XSD) approach. If you create the config file from one example file, and this file does not contain all the possibilities you may need, you should use further files too and merge the results.

2.2. Example for the storage class for the data

Now you only need the destination class definitions to store the data:

import java.util.List;

/**This file is generated by genJavaOut.jzTc script. */
public class ClassForBom {
    protected BillOfMaterial billOfMaterial;
    /**Access to parse result.*/
    public BillOfMaterial get_BillOfMaterial() { return billOfMaterial; }

  /**Class for Component BillOfMaterial. */
  public static class BillOfMaterial {
    protected List<Bom_entry> bom_entry;
    /**Access to parse result, get the elements of the container bom_entry*/
    public Iterable<Bom_entry> get_bom_entry() { return bom_entry; }
    /**Access to parse result, get the size of the container bom_entry.*/
    public int getSize_bom_entry() { return bom_entry ==null ? 0 : bom_entry.size(); }
  }

  /**Class for Component Bom_entry. */
  public static class Bom_entry {
    protected String bom_count;
    protected String bom_footprint;
    // .....
    /**Access to parse result.*/
    public String get_bom_count() { return bom_count; }
    /**Access to parse result.*/
    public String get_bom_footprint() { return bom_footprint; }
    // .....
  }
}

Additionally a derived class is generated which contains only write operations:

import java.util.LinkedList;

/**This file is generated by genJavaOut.jzTc script.
 * It is the derived class to write Zbnf result. */
public class ClassForBom_Zbnf extends ClassForBom{
  // .....
  /**Class for Component BillOfMaterial.*/
  public static class BillOfMaterial_Zbnf extends ClassForBom.BillOfMaterial {

    /**create and add routine for the list component <Bom_entry?bom_entry>. */
    public Bom_entry_Zbnf new_bom_entry() {
      Bom_entry_Zbnf val = new Bom_entry_Zbnf();
      if(super.bom_entry==null) { super.bom_entry = new LinkedList<Bom_entry>(); }
      super.bom_entry.add(val);
      return val;
    }

    /**Creates an instance for the Xml data storage with default attributes. &lt;Bom_entry?bom_entry&gt;  */
    public Bom_entry_Zbnf new_bom_entry(String bom_count, String bom_footprint, String bom_ordernumber, String bom_part, String bom_value ) {
      Bom_entry_Zbnf val = new Bom_entry_Zbnf();
      val.bom_count = bom_count;
      val.bom_footprint = bom_footprint;
      val.bom_ordernumber = bom_ordernumber;
      val.bom_part = bom_part;
      val.bom_value = bom_value;
      //
      if(super.bom_entry==null) { super.bom_entry = new LinkedList<Bom_entry>(); }
      super.bom_entry.add(val);
      return val; //Note: needs the derived Zbnf-Type.
    }
    // .....
  }
  // .....
}

Both classes can also be used if the data come from the ZBNF parser, see Zbnf_Parser.html. Because the ZBNF parser was the first tool using this concept, the writer classes have the _Zbnf suffix.

This class offers an example where a manual change may be sensible: the element bom_count is a number, so it may be better stored as int than as String. The conversion of the read text value (from XML, or also from ZBNF) can be done immediately in the creation operation new_bom_entry(…) shown above; the user then receives the count directly as an integer. The names can also be changed: the prefix bom_ comes from the name space in XML. The formal generation has to regard it, but the user data do not need it. Names can be changed both in the config.xml and in the Java operations and data, which makes the application easier to understand. Both worlds, gathering and storing data on one side and evaluating data on the other, come together here; this class is the interface or border between them and can be adapted accordingly.
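A minimal sketch of such a manual adaptation, assuming the argument list of the config.xml is kept so that new_bom_entry(bom_count, …) still matches: the count is converted once to int, and the field names drop the bom_ prefix. The class names AdaptedBom and Entry are hypothetical; in practice the same change would be applied to the generated classes.

```java
import java.util.LinkedList;
import java.util.List;

/** Sketch of a manually adapted storage class: count as int, names without "bom_". */
class AdaptedBom {

  /** Entry with a numeric count; final fields are set once at creation. */
  static class Entry {
    final int count;
    final String footprint;
    final String orderNumber;
    final String part;
    final String value;
    String text;

    Entry(int count, String footprint, String orderNumber, String part, String value) {
      this.count = count;
      this.footprint = footprint;
      this.orderNumber = orderNumber;
      this.part = part;
      this.value = value;
    }
  }

  final List<Entry> entries = new LinkedList<>();

  /** Keeps the parameter list of the config, so new_bom_entry(bom_count,...) still matches. */
  public Entry new_bom_entry(String bom_count, String bom_footprint, String bom_ordernumber,
                             String bom_part, String bom_value) {
    // Assumption: a missing attribute arrives as null and means "no count".
    int count = bom_count == null ? 0 : Integer.parseInt(bom_count.trim());
    Entry entry = new Entry(count, bom_footprint, bom_ordernumber, bom_part, bom_value);
    entries.add(entry);
    return entry;
  }
}
```

Integer.parseInt throws a NumberFormatException for a non-numeric count, so a real adaptation should decide whether such input is an error or is mapped to a default.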

2.3. Example for parsing with config.xml and storage class

The example is contained in

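//from source: src/test/java/org/vishia/xmlReader/test/Test_XmlJzReaderSimpleExmpl.java
import java.io.File;
import java.io.IOException;
import org.vishia.xmlReader.XmlCfg;
import org.vishia.xmlReader.XmlJzReader;
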
  static void readTheBom() {
    XmlJzReader xmlReader = new XmlJzReader();
    try {
      XmlCfg cfg = xmlReader.readCfg(new File("src/test/files/xmlReader/bom_cfg.xml"));
      ClassForBom data = new ClassForBom_Zbnf();
      xmlReader.readXml(new File("src/test/files/xmlReader/bom_exmpl.xml"), data, cfg);
      System.out.println(data.get_BillOfMaterial().toString()); //set breakpoint here to view data
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

With one instance of XmlJzReader it is possible to read more than one file, using the configuration read with the XmlJzReader.readCfg(…) operation or with XmlJzReader.readCfgFromJar(…). The latter is typical, because the config.xml file is often stored with fixed content inside a jar.

The class of the output instance data must match the config.xml file, see the chapter above. Here it is the class generated from the config.xml, as shown in the chapter above. The class can be extended with more capabilities, as long as it still matches the config.xml. This class is associated with the root element of the read XML file and, depending on the config.xml, also with further content. Usually, however, referenced sub instances are created for child nodes.

The XmlJzReader.readXml(file, dataOut) operation reads an XML file with the given config.xml and stores the result in dataOut. This is the operation usually used for XML files. Some more variants exist, reading from an opened java.io.InputStream, from a java.io.Reader, from a zip file and more, see chapter All operation variants to read XML data.

3. Detail explanation of the configuration file

The config.xml file plays the key role for the XmlJzReader, both for interpreting the XML file content and for storing the data.

The configuration file can be given as an XML file itself or, since about 2022 (first used for the LibreOFB tool, ../../../fbg/html/Videos_OFB_VishiaDiagrams.html), also as a text file. The text form is easier to edit manually and is preferred.

3.1. How to create the configuration file for given example XML files.

Generally, if the structure of a given XML file (its XML Schema) is known, the config file for org.vishia.xmlReader.XmlCfg can also be written manually, with a little diligence. This is especially feasible in the text format.

The org.vishia.xmlReader.XmlJzCfgAnalyzer is helpful if the XML Schema of a given XML file is not available, or is too complex for a simple XML file which does not use all capabilities of the schema anyway. Or, vice versa, the structure of the XML file is obvious in itself, albeit extensive; then looking for an XML Schema is a higher effort and seems unnecessary.

In other words: it is a way to get along without sophisticated or missing XML Schema descriptions.

The basic idea is: read typical given XML files, look at what is inside, and build a map of their structure. With background knowledge (which data are stored in the XML file and needed), details can then be supplemented manually in the found configuration file. For example, certain entries which are known from a description but do not yet occur in the example XML files can be added manually.
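The idea of building a map of the structure can be sketched with the JDK SAX parser: for every element path, collect the attribute names and note whether the element occurs more than once below the same parent (a candidate for LIST). Several files are merged into one map. This is a hypothetical illustration of the principle only; the real XmlJzCfgAnalyzer and its output format are not reproduced here.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: merge the element structure of several XML texts into one map. */
class StructureAnalyzerSketch {

  /** What was seen for one element path. */
  static final class NodeInfo {
    final Set<String> attribs = new TreeSet<>();
    boolean list;                                  // occurred more than once below one parent
  }

  /** Returns element path (e.g. "/BillOfMaterial/bom:entry") -> NodeInfo, merged over all texts. */
  static Map<String, NodeInfo> analyze(String... xmlTexts) {
    Map<String, NodeInfo> structure = new TreeMap<>();
    for (String xml : xmlTexts) {
      Deque<String> path = new ArrayDeque<>();
      Deque<Map<String, Integer>> childCounts = new ArrayDeque<>();
      DefaultHandler handler = new DefaultHandler() {
        @Override
        public void startElement(String uri, String local, String qName, Attributes attrs) {
          String p = (path.isEmpty() ? "" : path.peek()) + "/" + qName;
          NodeInfo info = structure.computeIfAbsent(p, k -> new NodeInfo());
          for (int i = 0; i < attrs.getLength(); i++) {
            info.attribs.add(attrs.getQName(i));
          }
          // Count this tag below the current parent; a second occurrence marks it as list.
          if (!childCounts.isEmpty() && childCounts.peek().merge(qName, 1, Integer::sum) > 1) {
            info.list = true;
          }
          path.push(p);
          childCounts.push(new HashMap<>());
        }

        @Override
        public void endElement(String uri, String local, String qName) {
          path.pop();
          childCounts.pop();
        }
      };
      try {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
      } catch (Exception e) {
        throw new IllegalStateException("XML cannot be parsed", e);
      }
    }
    return structure;
  }
}
```

Merging several example files into one map corresponds to the advice above to use further files when one example does not contain all possibilities.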

Analyse one or a few XML files and look at the output

You can also see the structure of an initially unknown XML file by viewing its textual XML content manually, but this is usually not very clear.

Call the XmlJzCfgAnalyzer either from the command line or from a debugging session in a Java IDE. A command line example:

RETDIR="$PWD"
callPATH="$(realpath "$0")"  ## with the originally used PWD to reach this shell script
cd "$(dirname "$0")/.."      ## currdir is the parent of where this script is located
echo called: "$callPATH"
java -cp ../../tools/vishiaBase.jar org.vishia.xmlReader.XmlJzCfgAnalyzer "--@$callPATH:args"
### args: ##
### -iXml:/home/...../src/ExmplPositionCtrlPID/odg/ExmplPositionCtrlPID.odg:styles.xml
### -iXml:/home/...../src/docuFBcl/odt/Handling-OFB_VishiaDiagrams.odt:styles.xml
### -dirDebug:../../build/Style-test
### -oCfg:../../build/Style-test/styles_XmlCfg.txt
read -p "...Press ENTER..." VAR
cd "$RETDIR"

This script can be found as LibreOffc/SOURCE.wrk/src/srcJava_vishiaLibreOffc/makeScripts/+createXmlConfigStyles.sh in https://vishia.org/LibreOffc/deploy/LibreOffcZmLConv-2025-12-19.zip or later versions.

  • The detailed arguments are part of the shell script itself (as comment lines below the java call), referenced with --@file:Label, which is a capability of all vishia tools called from the command line.

  • -iXml:…: two example files are read; each is a zip archive whose internal styles.xml should be analysed, here the styles.xml inside a LibreOffice file. To analyse an XML file directly, write its path without the ":internal/path/file.xml" suffix. That suffix addresses an XML file inside a zip file, which is appropriate for LibreOffice files.

  • -dirDebug:… directory for debug outputs; not necessary, only for the curious.

  • -oCfg:… output path for the XmlJzReader configuration file in text form. If the extension .xml is given, the older XML form of the configuration is written, which is no longer recommended.

Now you can look at the content, compare it textually with a previously created configuration file (to see what is new), and save the file at the appropriate position in your file tree, to be used by sources which use org.vishia.xmlReader.XmlJzReader.

3.2. Syntax of the textual presentation

The whole textual config file is described by the following syntax in ZBNF2 notation, see ZbnfParser.html#ZBNF2. A text with this syntax is created by org/vishia/xmlReader/XmlJzCfgAnalyzer.java, see the chapter above.

The exact ZBNF2 syntax is shown in cyan colour. Because it is complete and hence not easy to understand, an example configuration is shown below it in purple colour, which contains the mentioned parts of the syntax and illustrates the ZBNF2 syntax definition.

All links in the explanation lead directly to the appropriate element in the class org.vishia.xmlReader.XmlCfg.java and its sub classes. This also documents how and where these elements are stored for use while parsing with org.vishia.xmlReader.XmlJzReader.java.

XmlCfg::=XmlJzReader-Config 2024-05
[{?ns=xmlnsAssign:
NS: $$:-$ns="??ns_key" :}]
[{?subtreeNode=subtrees:
SUBTREE:$$:-$subtreeNode_key <=:: node?subtreeNode ::=> :}]
<=:: node?rootNode ::=>
=::.

XmlJzReader-Config 2024-05
NS: oooc="http://openoffice.org/2004/calc"
NS: loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
.....
SUBTREE:loext:theme <loext:theme>
  .....
</loext:theme>
SUBTREE:any:other <any:other>
.....
</any:other>
.....
<root>
  .....
</root>

  • The textual presentation starts with the literal text XmlJzReader-Config 2024-05.

  • xmlnsAssign: The name space declarations are written one per line. Note that the name space identifiers (prefixes) used in the parsed XML file need not be the same as the ones used here. As usual in XML, a name space identifier is valid only within its own file, here the configuration. The name space value is the important part. In the container xmlnsAssign, a TreeMap<String, String>, the key is the name space value and the value is the name space identifier.

  • SUBTREE: subtrees: A subtree node describes one XML node with its sub nodes, detached from its context. It is used (called) with ==>SUBTREE:$$:-$node.cfgSubtreeName in a node, see the syntax for node::= below. Such subtree nodes are necessary because some node structures are used at several positions in the XML tree, especially recursively. Such structures can only be described in this detached form using the SUBTREE: definition.

  • rootNode: The structure of the root node (the configuration to parse) follows.
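A sketch of how the name space value can act as the link between the read file and the configuration, assuming a map shaped like xmlnsAssign (key: name space value, value: identifier in the configuration). The class NamespaceMappingSketch and its method toCfgName are hypothetical; how XmlJzReader resolves prefixes internally is not shown here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: translate a prefixed name of the read file into the prefix used in the configuration. */
class NamespaceMappingSketch {

  /** Like xmlnsAssign: key = name space value (URI), value = identifier used in the configuration. */
  final Map<String, String> xmlnsAssign = new TreeMap<>();

  /**
   * Translates "filePrefix:local" of the read file into "cfgPrefix:local".
   * fileDecls holds the xmlns:prefix="URI" declarations of the read file (prefix -> URI).
   */
  String toCfgName(String qName, Map<String, String> fileDecls) {
    int colon = qName.indexOf(':');
    if (colon < 0) {
      return qName;                                // no prefix, nothing to translate
    }
    String uri = fileDecls.get(qName.substring(0, colon));
    String cfgPrefix = uri == null ? null : xmlnsAssign.get(uri);
    return cfgPrefix == null ? qName : cfgPrefix + qName.substring(colon);
  }
}
```

With the example above, a file which declares xmlns:b="www.vishia.org/XmlSeqWriter/ExampleBom" and writes b:entry is matched against the configured bom:entry, because only the name space value has to be equal.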

The following ZBNF2 definition defines one node in XML, which may have attributes, text and sub nodes. The syntax presentation is split into several parts up to the end of the chapter, where it is closed with =::.

node::= <$$node.tag> [{?attr=node.attribsForCheck: @$$attr.name=="??attr.storeInMap" :}]
... [: ==>SUBTREE:$$:-$node.cfgSubtreeName [?node.bList: LIST :] [: CLASS:$$node.dstClassName :]
  [: ADD:"??node.elementFinishPath" :]
:|:
<style:style> =>SUBTREE:style:style LIST CLASS:style_style
  ADD:"add_style_style(value)"
</style:style>

This is the alternative definition of a node stored as SUBTREE. Note that the alternative opened in the syntax with [: ==>SUBTREE ends with the :|: at the end of this syntax part, and the :] which finally closes both alternatives follows in the last syntax part as :] </$:-$node.tag>.

  • $$node.tag: The tag name of the XML node

  • $$node.attribsForCheck: @attrName=identifier is written if this element is only valid for an XML node which has this attribute with the given value.

    This feature allows XML nodes with the same tag name but different values of certain attributes to be processed differently. The data of such a node are processed (stored) as different data, as if the XML node had its own specific tag name. This situation occurs in some XML definitions. In particular, the destination for the data "??node.elementStoreInPath" and also CLASS:$$node.dstClassName may differ.

    • $$attr.name: The name of the attribute which should be tested.

    • $$attr.storeInMap: The attribute value required to use this node description for processing and storing the node.

  • SUBTREE: $$node.cfgSubtreeName: If this is given, the processing of the parsed XML node is described in the corresponding SUBTREE:… part.

  • LIST, CLASS and ADD have the same meaning as for any other node, see the following syntax parts.

    • The property $$node.bList concerns how the parent node stores its sub nodes, and hence cannot be defined within the SUBTREE.

    • The CLASS should be identical to the one in the SUBTREE definition.

    • The property ADD $$node.elementFinishPath needs to be defined in the parent node, because it determines how instances of the SUBTREE node are stored in the parent, whereas NEW is defined in the context of the SUBTREE.

... [?node.bList: LIST :] [: CLASS:$$node.dstClassName :]
<style:font-face> LIST CLASS:Style_font_face

  • LIST $$node.bList: If the identifier LIST is given, a container (usually a LinkedList in Java) stores each found XML node as an element. It means that more than one XML node of this kind is expected.

    If LIST is not given, each parsed occurrence of a node of this type overwrites a previously found one. It means that only one node of this type is expected. Then a container is not necessary; a simple reference to the node's data of the given class type is sufficient.

  • CLASS:$$node.dstClassName: The name written here is used only to generate the user's data class with org/vishia/xmlReader/GenXmlCfgJavaData.java. If CLASS is not given, the result to store from XML is a String.
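The difference between LIST and no LIST, expressed as hypothetical data class operations (the names new_font_face and new_default_font_face are illustrative, not generated names): with LIST every occurrence is added to a container, without LIST a later occurrence replaces the earlier reference.

```java
import java.util.LinkedList;
import java.util.List;

/** Sketch: storage with LIST (container) versus without LIST (single reference). */
class ListVsSingleSketch {

  static class FontFace {
    final String name;

    FontFace(String name) { this.name = name; }
  }

  /** With LIST: a container, every parsed occurrence is added. */
  final List<FontFace> fontFaces = new LinkedList<>();

  /** Without LIST: a simple reference, a later occurrence replaces an earlier one. */
  FontFace defaultFontFace;

  FontFace new_font_face(String name) {
    FontFace face = new FontFace(name);
    fontFaces.add(face);
    return face;
  }

  FontFace new_default_font_face(String name) {
    defaultFontFace = new FontFace(name);          // the previous instance is dropped
    return defaultFontFace;
  }
}
```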

  [: NEW:"??node.elementStoreInPath" :]
  [: ADD:"??node.elementFinishPath" :]
  [: TEXT:"??node.contentStorePath" :]
  NEW:"new_style_graphic_properties()"
  ADD:"add_style_graphic_properties(value)"

  • NEW:$$node.elementStoreInPath: This is the description of the reflection access to an operation in the data (user's) class which creates the new data instance for the data of the parsed XML node, and adds it to the data container (if LIST is written) or sets the reference in the parent instance to it.

  • ADD:$$node.elementFinishPath: This is the description of the reflection access to an operation in the data (user's) class which is called, if it exists, after the data of this node have been parsed. The name ADD suggests 'adding the data', but any post-processing can be done here, for example complex processing and output of the data in a SAX-like model of XML parsing. This depends on the content of the operation written by the user.

  • TEXT:$$node.contentStorePath: This is the description of the reflection access to an operation in the data (user's) class which stores a text of the XML node. It is possible to have one container for all sub nodes, in which each text is stored as a specific sub node. This has the advantage that the order of the texts among the other sub nodes is preserved. But, as the user decides, a single String can also be set if only one text is expected and the order does not matter.

  [: NAMESPACE:"$$node.nameSpaceDef" :]
  [{?attrib=node.attribs: @$$attr.name=[: "@$$attr.storeInMap" :|: "??attr.daccess" :] :}]
  [{?subnode=node.subnodes: <=::node:subnode::=> :}]
:]
<$$node.tag> =::.
  @fo:color="@fo_color"
  @draw:auto-grow-height="set_draw_auto_grow_height(value)"

  • NAMESPACE:$$node.nameSpaceDef: This is the description of the reflection access to an operation in the data (user's) class which stores a locally used name space definition of this XML node. It is used if an attribute xmlns:key="value" is detected in the XML node.

  • node.attribs: @attrName="…" is written for each attribute of the node which should be processed:

    • attr.name: The name of the attribute which should be processed.

    • attr.storeInMap: If @identifier is given, the attribute value is stored with this identifier in a temporary map before the instance for the XML node data is created. The identifier given here is the name of the argument in the NEW:operation(args,…) definition. It may be, but need not be, the actual parameter name in the user's operation. It is often identical to the attribute name in XML, but of course without the ':' and the often used '-', for example @fo_color for fo:color. When analysing XML with org/vishia/xmlReader/XmlJzCfgAnalyzer.java, attributes which exist on all nodes are marked this way.

    • attr.daccess: This is the description of the reflection access to an operation in the data (user's) class. If this is given, the attribute value is stored by calling this operation after the instance for the node has been created.

  • node.subnodes: All sub nodes are presented recursively in the same syntax. An indentation of two spaces per level keeps the overview.
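The processing order described for attr.storeInMap and attr.daccess can be sketched as three steps: collect the "@" attributes under their argument names, call the NEW operation with them, then apply the setter operations to the created instance. In this sketch plain Java functional interfaces stand in for the reflection access; NodeProcessingSketch, StyleProps and process are hypothetical names modelled on the style example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Sketch of the order in which the attributes of one XML node are processed. */
class NodeProcessingSketch {

  /** Hypothetical destination, shaped like the style example: one "@" attribute, one setter. */
  static class StyleProps {
    final String foColor;                          // @fo:color="@fo_color": known when created
    String autoGrowHeight;                         // set_draw_auto_grow_height(value): set later

    StyleProps(String fo_color) { this.foColor = fo_color; }

    void set_draw_auto_grow_height(String value) { this.autoGrowHeight = value; }
  }

  /**
   * 1. "@" attributes (storeInMap) are collected under their argument names,
   * 2. the NEW operation creates the instance from these arguments,
   * 3. attributes with a daccess operation are applied to the created instance.
   * Attributes found in neither map are ignored.
   */
  static <T> T process(Map<String, String> xmlAttribs, Map<String, String> storeInMap,
                       Function<Map<String, String>, T> newOperation,
                       Map<String, BiConsumer<T, String>> daccess) {
    Map<String, String> newArgs = new HashMap<>();
    for (Map.Entry<String, String> attrib : xmlAttribs.entrySet()) {
      String argName = storeInMap.get(attrib.getKey());
      if (argName != null) {
        newArgs.put(argName, attrib.getValue());
      }
    }
    T data = newOperation.apply(newArgs);
    for (Map.Entry<String, String> attrib : xmlAttribs.entrySet()) {
      BiConsumer<T, String> setter = daccess.get(attrib.getKey());
      if (setter != null) {
        setter.accept(data, attrib.getValue());
      }
    }
    return data;
  }
}
```

Because the "@" values are known before the instance exists, they can be stored in final fields, while daccess values are applied afterwards regardless of the attribute order in the read file.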

3.3. Syntax of the XML presentation

All identifiers in upper case are placeholders for user-defined names.

It always starts with the root node:

<?xml version="1.0" encoding="utf-8"?>
<xmlinput:root xmlns:xmlinput="www.vishia.org/XmlReader-xmlinput">

subtree

The root node can contain some subtree definitions:

  <xmlinput:subtree xmlinput:name="SUBTREE_NAME">

Such a subtree contains the configuration of any part of an XML file. It is inserted with:

  <ANY_TAG xmlinput:subtree="SUBTREE_NAME" xmlinput:data="!OPER_FOR_DATA()"/>

Subtrees can be used similarly to a subroutine call in a programming language, either to break a long nested structure into parts, or especially if the same XML tree structure is used at different positions.

The operation to create the data is associated with the subtree invocation. So different data can be created at different positions (which are of course similar, because they must match the subtree's internal structure): usually instances of derived classes, or really the same classes. But the calling environment may also be different, so different creation operations are necessary.

The xmlinput:cfg node as main

  <xmlinput:cfg>

is a child of <xmlinput:root>, beside the <xmlinput:subtree> nodes. Its sub nodes are the expected nodes (tag names) of the XML files to read.

From the example above:

<xmlinput:root xmlns:xmlinput="www.vishia.org/XmlReader-xmlinput" >
  <xmlinput:cfg xmlinput:data="!new_root()"  xmlinput:class="root" >
    <BillOfMaterial xmlinput:data="!new_BillOfMaterial()"  xmlinput:class="BillOfMaterial"  xmlns:bom="www.vishia.org/XmlSeqWriter/ExampleBom" >
      <bom:entry bom:count="!@bom_count"  bom:footprint="!@bom_footprint"  bom:ordernumber="!@bom_ordernumber"  bom:part="!@bom_part"  bom:value="!@bom_value"  xmlinput:list=""  xmlinput:data="!new_bom_entry(bom_count,bom_footprint,bom_ordernumber,bom_part,bom_value)"  xmlinput:class="bom_entry" >!set_text(text)</bom:entry>

matches an XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<BillOfMaterial xmlns:bom="www.vishia.org/XmlSeqWriter/ExampleBom">
  <bom:entry bom:part="R" bom:value="3k3" bom:footprint="1206" bom:ordernumber="123456-789" bom:count="23">some resistors</bom:entry>

Sub nodes and attributes

The root node below <xmlinput:cfg>, and any other node, can contain sub nodes with given tag names. Then the occurrence of a node with such a tag is accepted in the read XML file. The sub node definition in the XmlCfg should contain an appropriate xmlinput:data="!…", or omit it as a special case, see chapter Data for a node.

All attribute values which are defined in the XmlCfg are used. Attributes which are not defined are ignored. See chapter Store data of attributes.

Sub nodes, decision by specific attribute values

Sometimes nodes in an XML file have the same tag name but really differ in the meaning of their data. Then often specific key attributes control the usage of the data. This is also supported by the XmlJzReader:

First define which attributes are used for checking:

  <TAG ATTR1="!CHECK" />

Now this attribute is used for checking. After that you can define several sub nodes with specific values of the checked attributes:

  <TAG ATTR1="VALUE1" ..... > ..... </TAG>
  <TAG ATTR1="VALUE2" ..... > ..... </TAG>

Now you have two sub node variants in the configuration. Both refer to the same TAG, but they are independent. For a node of the read XML file, the sub node definition whose attribute value matches is used. If no attribute value matches, the node is ignored by the XmlJzReader.

As an example, here is the configuration to read a part of a Simulink (© MathWorks) slx file:

      <Line xmlinput:data="!addLine()">
        <P Name="!CHECK"/>
        <P Name="Name">!name</P>
        <P Name="ZOrder">!zorder</P>
        <P Name="Labels">!labels</P>
        <P Name="Src">!src</P>
        <P Name="Dst">!add_dst(text)</P>
        <Branch xmlinput:subtree="Branch" />
      </Line>

Here the properties of a line are not stored in attributes, as one might expect; instead, for whatever reason, sub nodes all named <P> are used. The attribute Name determines the meaning of the value. To evaluate this, the check of the Name attribute with "!CHECK" is used here.
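The selection by the checked attribute Name can be pictured as a lookup from attribute value to storing operation. The following sketch uses a map of Java lambdas instead of the reflection-based configuration; the names Line, processP and the field names are hypothetical, and the Labels variant and the Branch subtree are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Sketch: select how the text of a <P> node is stored by the value of its Name attribute. */
class CheckAttributeSketch {

  /** Hypothetical destination for <Line>; the operations mirror the config above. */
  static class Line {
    String name;
    String zorder;
    String src;
    final List<String> dst = new ArrayList<>();

    void add_dst(String text) { dst.add(text); }
  }

  /** Key: value of the checked attribute Name; value: how the text of this <P> is stored. */
  static final Map<String, BiConsumer<Line, String>> P_VARIANTS = new HashMap<>();
  static {
    P_VARIANTS.put("Name", (line, text) -> line.name = text);
    P_VARIANTS.put("ZOrder", (line, text) -> line.zorder = text);
    P_VARIANTS.put("Src", (line, text) -> line.src = text);
    P_VARIANTS.put("Dst", Line::add_dst);
  }

  /** Processes one <P Name="...">text</P>; returns false if no variant matches (node ignored). */
  static boolean processP(Line line, String nameAttr, String text) {
    BiConsumer<Line, String> variant = P_VARIANTS.get(nameAttr);
    if (variant == null) {
      return false;
    }
    variant.accept(line, text);
    return true;
  }
}
```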

Data for a node

As already displayed above:

    xmlinput:data="!EXPRESSION"

contains a reflection-evaluable expression (more exactly, it is evaluated by org.vishia.util.DataAccess, using the constructor DataPathElement("EXPRESSION", variables, null)).

This EXPRESSION is executed on the data of the current (parent) node via reflection. For the root node, the data instance is given from outside with the invocation

xmljzReader.readXml( ..., data, ...);

The result of the EXPRESSION (value of a field, return value of an operation) is used as data for this node.

Same data from parent also for a sub node:

If you write xmlinput:data="!this", then the same instance is used for the sub node as for the parent node. This is sensible if a sub node can exist only once and you want to prevent too much data nesting. But in this case you can also omit this entry, the effect is the same (a child node, but no separate instance for its data).

Access existing data via reference:

Similarly, you can use an already existing, referenced instance for a sub node by writing xmlinput:data="!REF" with a public field REF.

Calling an operation with arguments for data creation or access:

Usually it is appropriate to invoke an operation, because an operation can programmatically create, check and so on, for example create a container on the first occurrence of an element and add to it on further occurrences, or check some other information. The operation can have parameters: tag for the tag name of this XML node, and the content of specially named attributes of this XML node. All attributes are gathered first, before xmlinput:data="!CREATE_OPER(ARG1, ARG2)" is called, according to the following schema:

 <TAG ATTR="!@ARG1" ATTR2="!@ARG2" xmlinput:data="!CREATE_OPER(ARG1, ARG2)" .....

For the argument variable, the identifier after "!@" is relevant, not the attribute name. But usually both are exactly the same or similar.

The advantage of passing attribute values to the operation is: the operation can decide with these values what to do. A second one: if a constructor is called (which is often so), the attribute values can be stored as final values in the class, and using final fields is an advantage for software engineering. And last but not least: the attribute values can be checked in their interrelations, resulting values can be calculated and, if necessary, stored as private. An operation with arguments is more flexible.

Store data of attributes

The attribute values can either be gathered as argument values, as shown above, or be written with an expression or an operation into the data associated with the new node. These are of course the data returned by the xmlinput:data="!…" access.

 <TAG ATTR="!EXPRESSION" ATTR2="!OPERATION(name, tag, value)" xmlinput:data="!...." .....

The EXPRESSION designates the reference where the value of the read attribute is stored. EXPRESSION can refer to a field or to an operation of the destination class. If it is an operation, it may return a value of type java.lang.reflect.Field. This reference is evaluated via reflection, i.e. it designates a java.lang.reflect.Field. If an operation is used and it does not return a java.lang.reflect.Field, its result is not used as destination for the value. See next:

If the EXPRESSION is an OPERATION(name, tag, value) as shown for ATTR2, then the value of the read attribute can also be passed as argument of this operation. In this case the OPERATION(name, tag, value) does not need to return a java.lang.reflect.Field; its return type can be void.

  • The name argument is the name of the attribute given as written (with name space short form).

  • The tag can be used as argument. It is the tag name of the XML node where the read attribute is member of.

  • The 'value' is the read value of the attribute.

  • All arguments are optional.

  • The type of these arguments in the OPERATION(…) argument list should be String.
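A sketch of how an attribute operation such as store(name, tag, value) could be bound: the argument names written in the config select which of the three values is passed, and all parameters are String. The operation names store and storeValue and the class AttribOperationSketch are hypothetical; XmlJzReader's own parsing of such expressions is not reproduced.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch: bind "!store(name, tag, value)" style operations for one read attribute. */
class AttribOperationSketch {

  /** Hypothetical destination class with two attribute operations. */
  public static class Data {
    public final List<String> log = new ArrayList<>();

    public void store(String name, String tag, String value) { log.add(tag + "/" + name + "=" + value); }

    public void storeValue(String value) { log.add(value); }
  }

  /** Calls e.g. "store(name, tag, value)" or "storeValue(value)" on data for one read attribute. */
  static void invoke(Object data, String expr, String attrName, String tag, String attrValue) {
    int open = expr.indexOf('(');
    int close = expr.lastIndexOf(')');
    String operation = expr.substring(0, open).trim();
    String inner = expr.substring(open + 1, close).trim();
    String[] argNames = inner.isEmpty() ? new String[0] : inner.split("\\s*,\\s*");
    Object[] values = new Object[argNames.length];
    Class<?>[] types = new Class<?>[argNames.length];
    for (int i = 0; i < argNames.length; i++) {
      types[i] = String.class;                     // all arguments are Strings
      switch (argNames[i]) {
        case "name":  values[i] = attrName;  break;
        case "tag":   values[i] = tag;       break;
        case "value": values[i] = attrValue; break;
        default: throw new IllegalArgumentException("unknown argument " + argNames[i]);
      }
    }
    try {
      Method method = data.getClass().getMethod(operation, types);
      method.invoke(data, values);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot call " + expr, e);
    }
  }
}
```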

Store a text inside a XML node

Inside an XML node there can be sub nodes or free texts, and there can be more than one text between several sub nodes. This order may be semantically important, but the order of the texts relative to the sub nodes cannot be expressed formally in the config.xml.

Storing a text inside a node is described similarly to storing attribute values:

 <TAG ... xmlinput:data="!...." >!EXPRESSION<SUBNODE...></TAG>
 <TAG ... xmlinput:data="!...." >!OPERATION(tag, text)<SUBNODE...></TAG>

This means the EXPRESSION or OPERATION(…) is written in the config.xml as the text content of the node (instead of the text expected in the read XML).

This EXPRESSION or OPERATION(…) is applied to every text in the read XML, and it is invoked in the order of the read XML, interleaved with the processing of the sub nodes. Hence, especially if an OPERATION(…) is used, the operation can place each text in the context of the surrounding sub nodes.

The text argument contains only the text of the current section (between possibly existing sub nodes), not the whole text of the node.
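
To keep texts and sub nodes in their original order, a destination class can append both to one list. The following sketch assumes a hypothetical node <p> with <b> sub nodes; the names Paragraph, Bold, addText and new_b are invented for the illustration, and main only imitates the call order a reader would produce for <p>Hello <b>bold</b> world</p>.

```java
import java.util.ArrayList;
import java.util.List;

public class MixedContentSketch {

  /** Hypothetical data for a sub node <b>. */
  public static class Bold {
    public String text;
    @Override public String toString() { return "<b:" + text + ">"; }
  }

  /** Hypothetical destination for a node <p> with texts and <b> sub nodes mixed. */
  public static class Paragraph {
    /** Keeps texts (String) and sub nodes (Bold) in the order they were read. */
    public final List<Object> content = new ArrayList<>();

    /** Would be referenced in config.xml as the node text: >!addText(tag, text)< */
    public void addText(String tag, String text) {
      content.add(text);
    }

    /** Would be referenced by the <b> sub node's xmlinput:data; creates and registers it. */
    public Bold new_b() {
      Bold b = new Bold();
      content.add(b);
      return b;
    }
  }

  public static void main(String[] args) {
    // Call order a reader would produce for: <p>Hello <b>bold</b> world</p>
    Paragraph p = new Paragraph();
    p.addText("p", "Hello ");
    p.new_b().text = "bold";
    p.addText("p", " world");
    System.out.println(p.content);   // [Hello , <b:bold>,  world]
  }
}
```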

4. How to automatically create a storage class for the data

In the same way as the config.xml can be generated from given XML input files, the appropriate Java class can be generated from a given config.xml. This class stores all data and supports all operations listed in the config.xml file.

As with generating the config.xml from given XML data, this should be seen as generating the first version of the Java class, which is then maintained manually, possibly with re-generation and merging. Because the class stays close to the config.xml file, manual maintenance is less obvious, but possible.

The problem of generation and maintenance is this: there may be requirements to process the data immediately in the called (generated) operations, which are not expressed in the config.xml file itself.

Look at the bom_cfg.xml example from the chapter above. The following statements create an appropriate Java class:

//from source: src/test/java/org/vishia/xmlReader/test/Test_XmlJzReaderSimpleExmpl.java
  static void genJavaClasses() {
    String[] args = 
      { "-cfg:src/test/files/xmlReader/bom_cfg.xml"
      , "-dirJava:" + Arguments.replaceEnv("$(TMP)/test_vishiaBase/Test_XmlJzReader/Java")
      , "-pkg:org.vishia.xmlReader.test"
      , "-class:ClassForBom"
      };
      
    GenXmlCfgJavaData.smain(args);
  }

This is equivalent to invoking java.exe from the command line with these arguments. An invocation of this main without arguments explains them:

  • -cfg: specifies a valid config file, automatically created and/or manually changed.

  • -dirJava: is the output directory for the generation. Usually you should not overwrite your previously generated and possibly manually changed Java classes. Instead, generate to a temporary directory as shown here, and then compare and merge.

  • -pkg: and -class: are the package path and class name.

The generated code looks like this (only an excerpt; the full classes are part of the test environment in src/test/java in the cmpnJava_vishiaBase GitHub archive and downloads):

/**This file is generated by genJavaOut.jzTc script. */
public class ClassForBom {
    protected BillOfMaterial billOfMaterial;
    /**Access to parse result.*/
    public BillOfMaterial get_BillOfMaterial() { return billOfMaterial; }

  /**Class for Component BillOfMaterial. */
  public static class BillOfMaterial {
    protected List<Bom_entry> bom_entry;
    /**Access to parse result, get the elements of the container bom_entry*/
    public Iterable<Bom_entry> get_bom_entry() { return bom_entry; }
    /**Access to parse result, get the size of the container bom_entry.*/
    public int getSize_bom_entry() { return bom_entry ==null ? 0 : bom_entry.size(); }
  }

  /**Class for Component Bom_entry. */
  public static class Bom_entry {
    protected String bom_count;
    protected String bom_footprint;
  .....
    /**Access to parse result.*/
    public String get_bom_count() { return bom_count; }
    /**Access to parse result.*/
    public String get_bom_footprint() { return bom_footprint; }
  .....
  }

Additionally a derived class is generated which contains only write operations:

/**This file is generated by genJavaOut.jzTc script.
 * It is the derived class to write Zbnf result. */
public class ClassForBom_Zbnf extends ClassForBom{
  .....
  /**Class for Component BillOfMaterial.*/
  public static class BillOfMaterial_Zbnf extends ClassForBom.BillOfMaterial {

    /**create and add routine for the list component <Bom_entry?bom_entry>. */
    public Bom_entry_Zbnf new_bom_entry() {
      Bom_entry_Zbnf val = new Bom_entry_Zbnf();
      if(super.bom_entry==null) { super.bom_entry = new LinkedList<Bom_entry>(); }
      super.bom_entry.add(val);
      return val;
    }

    /**Creates an instance for the Xml data storage with default attibutes. &lt;Bom_entry?bom_entry&gt;  */
    public Bom_entry_Zbnf new_bom_entry(String bom_count, String bom_footprint, String bom_ordernumber, String bom_part, String bom_value ) {
      Bom_entry_Zbnf val = new Bom_entry_Zbnf();
      val.bom_count = bom_count;
      val.bom_footprint = bom_footprint;
      val.bom_ordernumber = bom_ordernumber;
      val.bom_part = bom_part;
      val.bom_value = bom_value;
      //
      if(super.bom_entry==null) { super.bom_entry = new LinkedList<Bom_entry>(); }
      super.bom_entry.add(val);
      return val; //Note: needs the derived Zbnf-Type.
    }
  .....

Both classes can also be used if the data come from the Zbnf parser, see Zbnf_Parser.html. Because the Zbnf parser was the first to use this concept, the writer classes have the _Zbnf suffix.

This class offers an example where a manual change may be sensible: the element bom_count is a number, so it is better stored as an int value than as a String. The conversion from the read text value (from XML, and also from ZBNF) can be done immediately in the create operation new_bom_entry shown above, so the user receives the count directly as an integer. The names can also be changed. The prefix bom_ comes from the XML namespace; the formal generation has to regard it, but the user data do not need it. The names can be changed both in the config.xml and in the Java operations and data, which makes them easier to understand for the application. Here the two worlds, gathering and storing data on the one side and evaluating data on the other, come together: this class is the interface or border between them and can be adapted accordingly.
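
The suggested manual change could look like the following sketch. It is a hand-written, reduced variant of the generated excerpt above, not generator output: only two attributes are kept, the bom_ prefix is dropped from the Java names, and the count is converted to int already in the create operation. The names Entry, BillOfMaterial_Zbnf and new_entry, and the choice to store a missing or non-numeric count as 0, are assumptions.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class BomManualSketch {

  /** Read side: data plus access operations (compare ClassForBom.Bom_entry). */
  public static class Entry {
    protected int count;          // was: String bom_count
    protected String footprint;   // was: String bom_footprint
    public int getCount() { return count; }
    public String getFootprint() { return footprint; }
  }

  /** Read side of the container (compare ClassForBom.BillOfMaterial). */
  public static class BillOfMaterial {
    protected List<Entry> entries;
    public Iterable<Entry> getEntries() { return entries == null ? Collections.emptyList() : entries; }
    public int getSizeEntries() { return entries == null ? 0 : entries.size(); }
  }

  /** Write side (compare ClassForBom_Zbnf.BillOfMaterial_Zbnf). */
  public static class BillOfMaterial_Zbnf extends BillOfMaterial {
    /** Creates and adds an entry; the text of bom_count is converted here, once. */
    public Entry new_entry(String bom_count, String bom_footprint) {
      Entry val = new Entry();
      val.count = parseCount(bom_count);
      val.footprint = bom_footprint;
      if (entries == null) { entries = new LinkedList<>(); }
      entries.add(val);
      return val;
    }
  }

  /** Assumption: a missing or non-numeric count is stored as 0. */
  public static int parseCount(String text) {
    if (text == null) { return 0; }
    try { return Integer.parseInt(text.trim()); }
    catch (NumberFormatException e) { return 0; }
  }

  public static void main(String[] args) {
    BillOfMaterial_Zbnf bom = new BillOfMaterial_Zbnf();
    bom.new_entry("4", "0805");
    bom.new_entry(" 12 ", "SOT23");
    int total = 0;
    for (Entry e : bom.getEntries()) { total += e.getCount(); }
    System.out.println(bom.getSizeEntries() + " entries, " + total + " parts");  // 2 entries, 16 parts
  }
}
```

The evaluating code then works with getCount() as an int and never sees the textual form from XML.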

5. All operation variants to read XML data

There are several operations, with and without an explicitly given configuration (XmlCfg), for reading from a File, from an opened resource and from a zip archive.

5.1. Reading with given XmlCfg

In any case you need an instance of org.vishia.xmlReader.XmlJzReader.

An instance of org.vishia.xmlReader.XmlCfg can be obtained by calling:

Both routines use XML files, one as a real file, the other as a file inside a jar. The result is an XmlCfg instance, which is also referenced inside the XmlJzReader (see 5.2). The returned reference can additionally be stored by your own code, to pass it explicitly to the reader later.

Now you can invoke the XmlJzReader on demand with any of the obtained configurations:

5.2. Reading with one time defined XmlCfg

The routines to define the configuration for the XmlJzReader instance are the same as those to get an XmlCfg, because the obtained configuration is also referenced in the XmlJzReader instance.

Then you can invoke:

These are the same operations as with a given XmlCfg, only without this argument, because the configuration stored internally by the last definition is used.
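
The relation between both variants can be pictured as a reader that remembers the last loaded configuration. The following sketch only models that idea with hypothetical classes and methods (SketchCfg, SketchReader, loadCfg, read); it does not show the real signatures of XmlJzReader or XmlCfg.

```java
import java.util.Objects;

public class CfgVariantsSketch {

  /** Hypothetical stand-in for a loaded configuration. */
  public static final class SketchCfg {
    final String source;
    SketchCfg(String source) { this.source = source; }
  }

  /** Hypothetical reader that keeps the last loaded configuration. */
  public static final class SketchReader {
    private SketchCfg lastCfg;

    /** Loads a configuration, remembers it, and also returns it to the caller. */
    public SketchCfg loadCfg(String source) {
      lastCfg = new SketchCfg(source);
      return lastCfg;
    }

    /** Variant as in 5.1: the configuration is given explicitly. */
    public String read(String input, SketchCfg cfg) {
      return "read " + input + " with " + cfg.source;
    }

    /** Variant as in 5.2: uses the configuration stored by the last loadCfg call. */
    public String read(String input) {
      return read(input, Objects.requireNonNull(lastCfg, "no configuration loaded"));
    }
  }

  public static void main(String[] args) {
    SketchReader reader = new SketchReader();
    SketchCfg bomCfg = reader.loadCfg("bom_cfg.xml");
    reader.loadCfg("other_cfg.xml");
    System.out.println(reader.read("a.xml"));          // read a.xml with other_cfg.xml
    System.out.println(reader.read("b.xml", bomCfg));  // read b.xml with bom_cfg.xml
  }
}
```

Keeping the returned reference is what allows switching between several configurations with one reader instance.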

6. How does the XmlJzReader XML parser work

6.1. Read the text

An XML file is first of all a text file. To get the data stored in it, the basic XML syntax is evaluated: <name>…</name> for an XML element (which may contain sub structures), <name attrib="value"… and so on, as known. This means the start and end tags must be detected, and the content between them must be stored. This is done by processing the text with the capabilities of the class org.vishia.util.StringPartScan.
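
The basic scanning step can be illustrated on a plain String. This sketch recognizes start tags with attributes, empty elements, end tags and the text between them, and reports them as events. It uses only String index operations instead of org.vishia.util.StringPartScan, ignores comments, CDATA, processing instructions and entity decoding, and assumes well-formed input without '>' inside attribute values; the event format is invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanSketch {

  /** Scans simple, well-formed XML into events like "start:a", "attr:k=v", "text:...", "end:a". */
  public static List<String> scan(String xml) {
    List<String> events = new ArrayList<>();
    int pos = 0;
    while (pos < xml.length()) {
      int lt = xml.indexOf('<', pos);
      if (lt < 0) { lt = xml.length(); }
      String text = xml.substring(pos, lt);
      if (!text.trim().isEmpty()) { events.add("text:" + text); }
      if (lt >= xml.length()) { break; }
      int gt = xml.indexOf('>', lt);
      String tag = xml.substring(lt + 1, gt);
      if (tag.startsWith("/")) {                     // end tag </name>
        events.add("end:" + tag.substring(1).trim());
      } else {
        boolean empty = tag.endsWith("/");           // empty element <name .../>
        if (empty) { tag = tag.substring(0, tag.length() - 1); }
        String[] nameAndRest = tag.trim().split("\\s+", 2);
        String name = nameAndRest[0];
        events.add("start:" + name);
        if (nameAndRest.length > 1) { scanAttributes(nameAndRest[1], events); }
        if (empty) { events.add("end:" + name); }
      }
      pos = gt + 1;
    }
    return events;
  }

  /** Parses name="value" pairs; values must be quoted with " or '. */
  private static void scanAttributes(String rest, List<String> events) {
    int i = 0;
    while (true) {
      int eq = rest.indexOf('=', i);
      if (eq < 0) { return; }
      String attrName = rest.substring(i, eq).trim();
      int q1 = eq + 1;
      while (Character.isWhitespace(rest.charAt(q1))) { q1++; }
      char quote = rest.charAt(q1);
      int q2 = rest.indexOf(quote, q1 + 1);
      events.add("attr:" + attrName + "=" + rest.substring(q1 + 1, q2));
      i = q2 + 1;
    }
  }

  public static void main(String[] args) {
    System.out.println(scan("<bom><entry count=\"3\" part='R1'/>text</bom>"));
    // [start:bom, start:entry, attr:count=3, attr:part=R1, end:entry, text:text, end:bom]
  }
}
```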

A specific problem of XML is the replacement of special characters. Instead of a < in the text content of an XML node, &lt; is written, because < means the start of an inner node and would be confused with it. Hence every &lt; has to be replaced by <. But how can &lt; itself be written as text? The answer: &amp;lt; is written, because &amp; is the replacement for &. There are some more systematic replacements for the control characters of the XML syntax (&gt; for >, &quot; for ", &apos; for '). Furthermore, special characters of the encoding may be replaced as well.

The encoding of the whole XML text file can vary. UTF-8 is the standard, but encodings such as ISO-8859-x or US-ASCII, with the advantage of exactly one byte per character, are also used for the file on disk. Any special character can always be written as a character reference with its Unicode code point, in hexadecimal as &#x4567; or in decimal as &#17767;. This needs a translation. It is not a problem for reading the file, because in a correct XML file such characters are never used in a wrong way. It is a problem for storing the data: instead of &lt;, a < must be stored, and so on.
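
A single left-to-right pass is enough to decode these references, and it gives the correct result for &amp;lt;, because a decoded & is never scanned again. The sketch below handles the five predefined entities and decimal or hexadecimal character references. It is an illustration, not the XmlJzReader code, and keeping unknown or malformed references unchanged is an assumption; a strict parser would report an error instead.

```java
public class EntityDecodeSketch {

  /** Decodes &lt; &gt; &amp; &quot; &apos; and &#NNN; / &#xHHHH; in one pass. */
  public static String decode(String s) {
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      int semi = (c == '&') ? s.indexOf(';', i) : -1;
      if (semi < 0) {                      // ordinary character, or '&' without ';'
        out.append(c);
        i++;
        continue;
      }
      String replacement = resolve(s.substring(i + 1, semi));
      if (replacement == null) {           // unknown reference: keep '&' as it is
        out.append(c);
        i++;
      } else {
        out.append(replacement);
        i = semi + 1;                      // continue after ';', the output is never re-scanned
      }
    }
    return out.toString();
  }

  private static String resolve(String ref) {
    switch (ref) {
      case "lt":   return "<";
      case "gt":   return ">";
      case "amp":  return "&";
      case "quot": return "\"";
      case "apos": return "'";
      default:     break;
    }
    if (ref.startsWith("#")) {
      try {
        int codePoint = (ref.startsWith("#x") || ref.startsWith("#X"))
            ? Integer.parseInt(ref.substring(2), 16)
            : Integer.parseInt(ref.substring(1));
        return new String(Character.toChars(codePoint));
      } catch (IllegalArgumentException e) {   // also covers NumberFormatException
        return null;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(decode("a &lt; b &amp;&amp; c"));   // a < b && c
    System.out.println(decode("&amp;lt;"));                // &lt;
    System.out.println(decode("&#x41;&#66;"));             // AB
  }
}
```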

Java internally uses UTF-16 for all characters, so all text is converted to UTF-16 anyway, also for data processing. For writing data, the usual rules apply (using FileWriter or FileOutputStream with a specific encoding, etc.). This is not related to the XML encoding.

For text that should not be parsed as markup, XML uses a CDATA section. <![CDATA[ … ]]> is detected, and its content is stored correctly. (TODO: maybe test further, because CDATA is rarely used.)

Last but not least, <!-- … --> is of course detected as a comment and skipped.
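
Both constructs can be recognized by their fixed start and end markers. This sketch, with the hypothetical method name textOf, is applied to the content of one text node: it removes comments and copies CDATA content verbatim (no entity decoding inside). It assumes every <!-- and <![CDATA[ has its closing --> or ]]>. In a complete reader the CDATA content would go directly into the stored text and would not be scanned for markup again.

```java
public class CommentCdataSketch {

  /** Removes <!-- ... --> and replaces <![CDATA[ ... ]]> by its raw content. */
  public static String textOf(String xml) {
    StringBuilder out = new StringBuilder();
    int pos = 0;
    while (pos < xml.length()) {
      if (xml.startsWith("<!--", pos)) {
        pos = xml.indexOf("-->", pos + 4) + 3;          // skip the whole comment
      } else if (xml.startsWith("<![CDATA[", pos)) {
        int end = xml.indexOf("]]>", pos + 9);
        out.append(xml, pos + 9, end);                  // raw text: '<' and '&' stay as they are
        pos = end + 3;
      } else {
        out.append(xml.charAt(pos++));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(textOf("a<!-- note -->b<![CDATA[ x < y && z ]]>c"));
    // ab x < y && z c
  }
}
```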

6.2. Select node and attribute types in config.xml

If a node is detected after <, the name of the node is read. Then it is checked whether this node name is specified in the given configuration file. If it is not, this node and all its sub nodes are ignored. The structure of the sub nodes is of course still followed, but no data are stored for any of them.

Data are stored only for nodes which are specified in the config.xml file. This has the advantage that sophisticated XML files can first be analyzed for their essential data and then be extended step by step to further interesting data. This approach assumes that the data of the XML file are evaluated with knowledge of their meaning, not only stored. Working with it is a process of getting to know the meaning of the XML data.

Sometimes, and even often, not all data are needed, and the unneeded data should not be stored.
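
The selection rule can be sketched with a configuration reduced to a tree of permitted node names and a stream of start/end events, as a scanner would deliver them. A depth counter lets the walk follow the structure of an ignored subtree without storing anything from it. The nested-map form of the configuration and the method name selected are assumptions for this illustration (Java 9 or later for Map.of and List.of).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SelectSketch {

  /** Returns the paths of all nodes permitted by the config tree; other subtrees are skipped. */
  public static List<String> selected(Map<String, Object> cfgRoot, List<String> events) {
    List<String> stored = new ArrayList<>();
    List<Map<String, Object>> cfgStack = new ArrayList<>();
    List<String> path = new ArrayList<>();
    cfgStack.add(cfgRoot);
    int skipDepth = 0;                               // > 0 while inside an ignored subtree
    for (String ev : events) {
      boolean start = ev.startsWith("start:");
      String name = ev.substring(ev.indexOf(':') + 1);
      if (skipDepth > 0) {                           // structure is followed, nothing is stored
        skipDepth += start ? 1 : -1;
      } else if (start) {
        Object sub = cfgStack.get(cfgStack.size() - 1).get(name);
        if (sub instanceof Map) {
          @SuppressWarnings("unchecked")
          Map<String, Object> subCfg = (Map<String, Object>) sub;
          cfgStack.add(subCfg);
          path.add(name);
          stored.add(String.join("/", path));
        } else {
          skipDepth = 1;                             // not in config: ignore node and all sub nodes
        }
      } else {
        cfgStack.remove(cfgStack.size() - 1);
        path.remove(path.size() - 1);
      }
    }
    return stored;
  }

  public static void main(String[] args) {
    Map<String, Object> cfg = Map.of("bom", Map.of("entry", Map.of()));
    List<String> events = List.of("start:bom", "start:meta", "start:entry", "end:entry", "end:meta",
        "start:entry", "end:entry", "end:bom");
    System.out.println(selected(cfg, events));   // [bom, bom/entry]
  }
}
```

Note that the entry inside the unknown meta node is ignored although entry is a known name elsewhere: the selection depends on the position in the tree, not only on the name.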

Traditionally, the DOM and SAX models are familiar for reading XML. With DOM the whole XML file is stored in internal data, whereas with SAX the application receives events and decides itself which data are stored and how. This SAX handling may be complex, hence DOM is often used first. But with DOM the amount of data and the selection of the relevant parts are the problem.

The XmlJzReader can be seen as a SAX implementation. But the algorithm to process the data does not have to be written manually; it results from the config file. The user only has to prepare a proper config file.
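
For comparison, with the standard SAX API of the JDK (javax.xml.parsers.SAXParser with an org.xml.sax.helpers.DefaultHandler) the selection logic is hand-written in the handler callbacks. This sketch collects the count attribute of every entry element; the element and attribute names are example assumptions. With the XmlJzReader, the equivalent decision is expressed in config.xml instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandlerSketch {

  /** Returns the "count" attribute of each <entry> element; all other content is ignored. */
  public static List<String> counts(String xml) throws Exception {
    List<String> result = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("entry".equals(qName)) {               // hand-written selection
          result.add(attrs.getValue("count"));
        }
      }
    };
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    return result;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<bom><entry count=\"3\"/><other/><entry count=\"5\"/></bom>";
    System.out.println(counts(xml));   // [3, 5]
  }
}
```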

6.3. How to store the data

The path where the data are stored is also given in the config.xml file. It is given as a textual path which is evaluated by the reflection capability of Java. This means it is not compiled but interpreted.

This has two disadvantages:

  • Calculation time: a reflection access is a search and needs time for each storing process. But, of course, the textual path is evaluated via reflection only on the first access to the given element. All following accesses use the stored access path, that is, the found reflection Field or Method, for an immediate call with the given reference.

  • No checking of the expression at compile time: errors in the expression are detected only at run time, as the error 'Reflection … expression not found'. But this is a problem only when the config.xml file is changed. Changes should be done carefully, including a test.
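
The caching described in the first point can be sketched as a map from (class, member name) to the resolved java.lang.reflect.Field, so that the search happens once and every later store reuses the result. A wrong name surfaces only at run time, here as a NoSuchFieldException on the first lookup. The class FieldCacheSketch, its key format and the lookup counter are hypothetical; the real reader stores its resolved access paths in its own way.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class FieldCacheSketch {

  /** Resolved fields, keyed by "className#fieldName". */
  private static final Map<String, Field> cache = new HashMap<>();
  /** Counts real reflection searches, to make the caching visible. */
  public static int lookups = 0;

  /** Stores value into the named field; the reflection search runs only on first use. */
  public static void store(Object dst, String fieldName, Object value) throws ReflectiveOperationException {
    String key = dst.getClass().getName() + "#" + fieldName;
    Field field = cache.get(key);
    if (field == null) {
      lookups++;
      field = dst.getClass().getDeclaredField(fieldName);   // throws NoSuchFieldException on a typo
      field.setAccessible(true);
      cache.put(key, field);
    }
    field.set(dst, value);
  }

  /** Example destination class. */
  public static class Entry { public String part; }

  public static void main(String[] args) throws ReflectiveOperationException {
    Entry a = new Entry();
    Entry b = new Entry();
    store(a, "part", "R1");
    store(b, "part", "C2");
    System.out.println(a.part + " " + b.part + ", lookups: " + lookups);   // R1 C2, lookups: 1
    try {
      store(a, "prat", "X");                                                // misspelled name
    } catch (NoSuchFieldException e) {
      System.out.println("detected only at run time: " + e.getMessage());
    }
  }
}
```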

6.4. Name spaces

Name spaces are very simple to handle. The short namespace prefix is replaced by the declared long designation (the namespace URI), and only the long designation is used to identify elements and attributes. The short prefix is only valid locally within a given XML file, even if it seems to be the same in all files.

It is a matter of simple replacement using a map.
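
A minimal sketch of that replacement: the xmlns:prefix declarations fill a map from short prefix to namespace URI, and each qualified name is then expressed with the URI. For simplicity it assumes all declarations are valid for the whole file (XML itself scopes them to the declaring element), ignores the default namespace, and writes the result in the common {uri}local notation; the URIs in main are made-up examples.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceSketch {

  private final Map<String, String> prefixToUri = new HashMap<>();

  /** Registers an attribute if it is a namespace declaration like xmlns:bom="http://...". */
  public void declare(String attrName, String attrValue) {
    if (attrName.startsWith("xmlns:")) {
      prefixToUri.put(attrName.substring(6), attrValue);
    }
  }

  /** Replaces the short prefix of "prefix:local" by the declared URI: "{uri}local". */
  public String resolve(String qName) {
    int colon = qName.indexOf(':');
    if (colon < 0) { return qName; }                 // no prefix (default namespace not handled)
    String uri = prefixToUri.get(qName.substring(0, colon));
    if (uri == null) { throw new IllegalArgumentException("undeclared prefix in " + qName); }
    return "{" + uri + "}" + qName.substring(colon + 1);
  }

  public static void main(String[] args) {
    NamespaceSketch fileA = new NamespaceSketch();
    fileA.declare("xmlns:b", "http://example.org/bom");
    NamespaceSketch fileB = new NamespaceSketch();
    fileB.declare("xmlns:bom", "http://example.org/bom");
    // Different short prefixes in two files, same long designation:
    System.out.println(fileA.resolve("b:entry").equals(fileB.resolve("bom:entry")));   // true
  }
}
```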

6.5. Encoding

The encoding is given in the head of an XML file. The head is first read as US-ASCII, only about 256 bytes, to search for an encoding declaration; then the file is read again completely with the given encoding.

This is a capability of org.vishia.util.StringPartFromFileLines.
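
The two-step reading can be sketched as follows: decode a short head as US-ASCII, look for the encoding declaration with a regular expression, and decode the whole content again with that charset. The 256-byte head follows the text above; the regular expression, the UTF-8 fallback and the byte-array input are assumptions of this sketch, which is not the StringPartFromFileLines code. Files in UTF-16 with a byte order mark are not covered by this simple approach.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSketch {

  private static final Pattern ENCODING =
      Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

  /** First step: find the declared encoding in the first 256 bytes; falls back to UTF-8. */
  public static Charset detect(byte[] data) {
    int headLen = Math.min(256, data.length);
    String head = new String(data, 0, headLen, StandardCharsets.US_ASCII);
    Matcher m = ENCODING.matcher(head);
    return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
  }

  /** Second step: decode the whole content with the detected charset. */
  public static String read(byte[] data) {
    return new String(data, detect(data));
  }

  public static void main(String[] args) {
    byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e4</a>"
        .getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(detect(latin1));                           // ISO-8859-1
    System.out.println(read(latin1).endsWith("<a>\u00e4</a>"));   // true
  }
}
```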

6.6. Reading xml content in zip archives

Often files are stored as zip archives which contain some XML files and more. This is true for many file types with an extension ending in 'x', for example '.docx' and '.slx'.

The unzip capabilities of Java ('java.util.zip.*') are combined with the XmlJzReader, that’s all. XmlJzReader.readZipXml(…) does this. An already opened zip can also be evaluated with XmlJzReader.readXml(InputStream, …).
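
The zip part needs only the JDK. This sketch builds a small archive in memory so that it runs on its own, then uses java.util.zip.ZipInputStream to position on one entry. At that point the stream delivers exactly the bytes of this entry, which is the kind of InputStream an InputStream-based read such as the readXml(InputStream, …) mentioned above can consume. The entry name word/document.xml (as found in .docx files) and the helper names are only examples; here the entry is simply read as text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipXmlSketch {

  /** Returns the content of the named entry as text, or null if the archive has no such entry. */
  public static String readEntry(InputStream zipData, String entryName) throws IOException {
    try (ZipInputStream zip = new ZipInputStream(zipData)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.getName().equals(entryName)) {
          // Here 'zip' delivers only the bytes of this entry,
          // so it could be handed to an InputStream-based XML reader instead.
          return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
        }
      }
    }
    return null;
  }

  /** Builds a small zip in memory, only for the demonstration. */
  public static byte[] exampleZip() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream out = new ZipOutputStream(bytes)) {
      out.putNextEntry(new ZipEntry("word/document.xml"));
      out.write("<doc>hello</doc>".getBytes(StandardCharsets.UTF_8));
      out.closeEntry();
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String xml = readEntry(new ByteArrayInputStream(exampleZip()), "word/document.xml");
    System.out.println(xml);   // <doc>hello</doc>
  }
}
```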

7. Algorithm to analyse the structure of XML files
