JUMBO-Converters

  1. Running the converters
  2. Overview of the parsing philosophy
  3. Actual implementation
    1. Parsing a module with a template
      1. Template Attributes
      2. Records
      3. Unit Testing Framework
      4. Transforming the raw XML
      5. Key Transforms
      6. Notes on Transforms
  4. JUMBO-Converters filesystem structure

JUMBO-Converters are modules that transform inputs into outputs, usually 1:1 such as Foo2CML and CML2Foo.

Primary site: https://bitbucket.org/wwmm/jumboconverters-compchem

Running the converters

This page is concerned with the philosophy and design of the converters. If you are just interested in running them, please go to the Tutorials and problems page.

Overview of the parsing philosophy

The approach adopted by the parsers is to break the monolithic text block of the logfile into a series of separate chunks, each of which encapsulates a coherent piece of data.

There may be many repeated chunks within a log file. For example, if a chunk is an SCF calculation, for a single-point energy calculation there would just be a single SCF chunk, whereas for a geometry optimisation calculation, there would be as many SCF chunks as there were SCF calculations.

Chunks are often nested, so using the geometry optimisation example, a single geometry optimisation step would itself be a chunk, and this in turn would contain (one or more) SCF chunks. There would then be as many geometry optimisation step chunks as there were geometry optimisation steps.
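The nesting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the JUMBO chunker: the two start patterns and the toy log are hypothetical, and real templates define their own patterns.

```python
import re

# Hypothetical start patterns for two kinds of chunk (assumptions for this sketch).
STEP_START = re.compile(r"^\s*Step\s+\d+")
SCF_START = re.compile(r"^\s*SCF calculation")

def split_chunks(lines, start_pattern):
    """Split lines into (preamble, chunks); each chunk begins at a matching line."""
    preamble, chunks = [], []
    for line in lines:
        if start_pattern.search(line):
            chunks.append([line])      # a matching line opens a new chunk
        elif chunks:
            chunks[-1].append(line)    # other lines belong to the current chunk
        else:
            preamble.append(line)      # text before the first chunk is kept too
    return preamble, chunks

log = [
    "Job header",
    "Step 1",
    "SCF calculation",
    "  energy -1.0",
    "SCF calculation",
    "  energy -1.1",
    "Step 2",
    "SCF calculation",
    "  energy -1.2",
]

# Outer chunks: one per geometry optimisation step.
preamble, steps = split_chunks(log, STEP_START)
# Inner chunks: the SCF chunks nested inside each step.
nested = [split_chunks(step, SCF_START)[1] for step in steps]
scf_counts = [len(scfs) for scfs in nested]
print(scf_counts)
```

Every line ends up either in the preamble or in exactly one chunk, so no text is discarded at this stage.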

For a more detailed explanation, please see the pages on Chunkers and Block, and also the older How converters work.

Parsing is currently a multi-stage process. The parser reads the log file and converts it into "raw" XML. This process splits the file into modules, each corresponding to a chunk in the file. Within each module the text is preserved, so no data from the log file is lost; only additional structure and annotation are added.

Each module is then parsed separately. The module may be parsed into a number of sub-modules or have data extracted into records.

The process of parsing a module into a record is the only process that removes text from the log file, but this is also the process of marking up data, so again nothing should be lost.

At the end of the parsing process we have a raw XML file that contains all of the information from the original log file, separated up into modules and with known quantities marked up with XML.

The terms in the raw XML are defined in the code-specific dictionary, which describes what each quantity is, its units, and so on.

This raw XML is then transformed into CML in a second step, where the quantities in the code-specific dictionary are mapped onto either CML or domain-specific dictionaries, and additional annotations or properties can be added (e.g. bond lengths could be calculated).
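As a rough illustration of the mapping part of this second step, the Python sketch below rewrites dictRef values using a lookup table. The mapping entry, the term n:scfenergy and the tiny raw document are all hypothetical; in JUMBO-Converters this work is done by transforms, not by code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a code-specific term to a shared dictionary term.
DICT_MAP = {"n:scfenergy": "compchem:totalEnergy"}

# Hypothetical raw XML; namespaces are omitted to keep the sketch short.
RAW = (
    "<module>"
    '<scalar dictRef="n:scfenergy">-1.5</scalar>'
    '<scalar dictRef="n:gnorm">0.1</scalar>'
    "</module>"
)

def map_dict_refs(root, mapping):
    """Replace each dictRef found in the mapping; return how many were changed."""
    mapped = 0
    for element in root.iter():
        ref = element.get("dictRef")
        if ref in mapping:
            element.set("dictRef", mapping[ref])
            mapped += 1
    return mapped

root = ET.fromstring(RAW)
mapped = map_dict_refs(root, DICT_MAP)
refs = [scalar.get("dictRef") for scalar in root]
print(mapped, refs)
```

Terms with no entry in the mapping are left untouched, so code-specific quantities survive the second step.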

Actual implementation

Jumbo converters are written in Java, although the template parsing technology is described entirely in XML, so that once a new parser module has been created, only XML files need to be edited in order to extend and develop the parser.

The reference parser for computational chemistry is the NWChem parser, so any examples will refer to it.

The Java class that controls the two-stage parsing for NWChem is NWChemLog2CompchemConverter.java.

The first stage (controlled by the NWChemLog2XMLConverter.java class) uses the topTemplate.xml file to include the various XML templates that parse the different chunks of the logfile.

The second stage (controlled by the NWChemLogXML2CompchemConverter.java class) uses the transforms in nwchem2compchem.xml to manipulate the raw XML into a convention-compliant form.

See Declarative parsing syntax for a complete list of the rules followed by the parsers and their relations to the template XML files.

Parsing a module with a template

The structure of a typical template is shown below with comments to explain the various sections.

<!-- The template is contained in an XML element, with the behaviour controlled by various attributes
of the form ATTRIBUTE="VALUE". See the 'Template Attributes' section below for more information -->
<template id="foo" pattern="…">

<!-- The templates contain their own unit-testing framework in the comments. See the 'Unit Testing Framework' section below
for more information -->
<comment class="example.input" id="foo">
EXAMPLE LOGFILE TEXT
</comment>

<!-- Templates can themselves include other templates using a templateList. Only templates,
or include directives to include other templates should be in a templateList -->
<templateList xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="basis.summary.xml"/>
</templateList>

<!-- The record is the mechanism to extract text into XML. See the 'Records' section below for further details -->
<record
id="iter"
repeat="*">\s*{I,compchem:iterationIndex}\s+{F,compchem:totalEnergy}\s+{E,n:gnorm}\s+{E,n:gmax}\s+{F,compchem:wallTime}</record>

<!-- The XML elements created with the records can be manipulated with transforms. See the 'Transforming the raw XML'
section below for more information -->
<transform process="addUnits"
xpath=".//cml:scalar[@dictRef='compchem:totalEnergy']"
value="nonsi:hartree"
/>

<!-- This is part of the unit testing framework, and contains the marked-up text that should be created from the
EXAMPLE LOGFILE TEXT above. See the 'Unit Testing Framework' section below for more information -->
<comment class="example.output" id="foo">
PARSED OUTPUT
</comment>

</template>

Template Attributes

The possible ATTRIBUTES on a template are:

<module xmlns="http://www.xml-cml.org/schema" xmlns:cmlx="http://www.xml-cml.org/schema/cmlx" cmlx:templateRef="foo"> 
PARSED TEXT
</module>

Records

Records are the machinery used to extract text from a file and mark it up into XML.

A record is an XML element, which can have a number of attributes (see below) and which may contain a string, which is a simple regular expression-type language for determining what will be extracted and how it will be marked up.

Unlike the templates, where each template is tried in turn against each line of the file, records are processed sequentially. Each record is processed in turn until it fails, at which point the next record is processed until all records in the module have been processed.

An empty record (such as <record repeat="2"/>) can be used to "gobble" lines (which are discarded).

If the record has content, then the text of the line is parsed into a CML list with a templateRef as specified by the id of the record.

A simple example to read the XYZ format geometry printed in an NWChem output is shown below. The text that is to be parsed is:

            XYZ format geometry
            -------------------
    11
 geometry
 fe                    0.00000000     0.00000000     0.00000000
 c                     0.00000000     0.00000000     1.80680057
 o                     0.77109980    -2.87778364     0.00000000

The records to parse this are:

  <!-- Read 2 lines. The record has no content, so the lines are discarded. -->
  <record repeat="2"/>

  <!-- Read a line with a single integer. The integer will be placed in a CML scalar with the dictRef "compchem:numAtoms".
         The scalar will itself be within a CML list with the templateRef of "atoms". -->
  <record id="atoms">\s*{I,compchem:numAtoms}\s*</record>

  <!-- Read a line with a single character string. The string will be placed in a CML scalar with the dictRef "n:geomtype".
         The scalar will itself be within a CML list with the templateRef of "geo". -->
  <record id="geo">\s*{A,n:geomtype}\s*</record>

  <!-- Keep reading lines while they contain a character string, followed by 3 floats. Make an array of all matching variables.
         The arrays will be held in a CML list with the templateRef of "mol". -->
  <record makeArray="true" repeat="*"
      id="mol">\s*{A,compchem:elementType}\s*{F,compchem:x3}\s*{F,compchem:y3}\s*{F,compchem:z3}\s*</record>

The result of this parsing is as follows:

<list cmlx:templateRef="atoms">
   <scalar dataType="xsd:integer" dictRef="compchem:numAtoms">11</scalar>
</list>
<list cmlx:templateRef="geo">
   <scalar dataType="xsd:string" dictRef="n:geomtype">geometry</scalar>
</list>
<list cmlx:lineCount="3" cmlx:templateRef="mol">
   <array dataType="xsd:string" dictRef="compchem:elementType" size="3">fe c o</array>
   <array dataType="xsd:double" dictRef="compchem:x3" size="3">0.0 0.0 0.7710998</array>
   <array dataType="xsd:double" dictRef="compchem:y3" size="3">0.0 0.0 -2.87778364</array>
   <array dataType="xsd:double" dictRef="compchem:z3" size="3">0.0 1.80680057 0.0</array>
</list>
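To make the record mechanism more concrete, here is a minimal Python sketch that mimics the records above. It is not the JUMBO implementation: it assumes that the field codes I, F and A mean integer, float and string, it represents each record as a hypothetical (id, pattern, repeat) tuple, and it returns plain Python lists instead of CML.

```python
import re

# Assumed meaning of the field codes: I = integer, F = float, A = string.
FIELD_TYPES = {
    "I": (r"[-+]?\d+", int),
    "F": (r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", float),
    "A": (r"\S+", str),
}
FIELD = re.compile(r"\{([IFA]),([^}]+)\}")

def compile_record(pattern):
    """Turn a record string into a regex plus a list of (dictRef, converter)."""
    fields = []

    def to_group(match):
        regex, convert = FIELD_TYPES[match.group(1)]
        fields.append((match.group(2), convert))
        return "(" + regex + ")"

    return re.compile(FIELD.sub(to_group, pattern) + "$"), fields

def apply_records(lines, records):
    """Apply records in order; each one consumes lines until it is done or fails."""
    results, pos = [], 0
    for rec_id, pattern, repeat in records:
        if pattern is None:            # empty record: gobble and discard lines
            pos += repeat
            continue
        regex, fields = compile_record(pattern)
        columns = [[] for _ in fields]
        count = 0
        while pos < len(lines) and (repeat == "*" or count < repeat):
            match = regex.match(lines[pos])
            if not match:
                break                  # record fails; move on to the next record
            for column, (_, convert), text in zip(columns, fields, match.groups()):
                column.append(convert(text))
            pos += 1
            count += 1
        results.append((rec_id, {ref: col for (ref, _), col in zip(fields, columns)}))
    return results

text = """\
            XYZ format geometry
            -------------------
    11
 geometry
 fe                    0.00000000     0.00000000     0.00000000
 c                     0.00000000     0.00000000     1.80680057
 o                     0.77109980    -2.87778364     0.00000000
"""

records = [
    (None, None, 2),
    ("atoms", r"\s*{I,compchem:numAtoms}\s*", 1),
    ("geo", r"\s*{A,n:geomtype}\s*", 1),
    ("mol", r"\s*{A,compchem:elementType}\s*{F,compchem:x3}\s*{F,compchem:y3}\s*{F,compchem:z3}\s*", "*"),
]

parsed = dict(apply_records(text.splitlines(), records))
print(parsed["mol"]["compchem:elementType"], parsed["mol"]["compchem:x3"])
```

The column-wise lists built for the "mol" record correspond to the arrays in the parsed output, one per dictRef.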

Unit Testing Framework

The templates contain their own internal testing framework, in the form of one or more pairs of comment blocks within them.

A comment block with the class attribute "example.input" should contain a small representative chunk of text that the parsers can be tested with. The id attribute is used to match the example input with the representative output that should be produced when the template acts on the sample text.

An input comment is shown below:

  <comment class="example.input" id="l601.fermi">
                          Isotropic Fermi Contact Couplings
        Atom                 a.u.       MegaHertz       Gauss      10(-4) cm-1
     1  C(13)              0.02539      28.54777      10.18656       9.52251
     2  C(13)              0.00582       6.54434       2.33518       2.18296
    13  Cl(35)             0.05688      24.94015       8.89927       8.31914
 --------------------------------------------------------
       Center         ----  Spin Dipole Couplings  ----
                      3XX-RR        3YY-RR        3ZZ-RR
 --------------------------------------------------------
     1   Atom        0.005300     -0.061839      0.056540
     2   Atom       -0.039723     -0.068059      0.107782
    13   Atom        0.621221     -2.038530      1.417309
 --------------------------------------------------------
                        XY            XZ            YZ
 --------------------------------------------------------
     1   Atom        0.000010      0.095387      0.000013
     2   Atom        0.005157      0.081893      0.006262
    13   Atom        0.000344      3.043747      0.000390
 --------------------------------------------------------

  </comment>

The matching example.output comment is below:

   <comment class="example.output" id="l601.fermi">
    <module cmlx:templateRef="l601.fermi" xmlns="http://www.xml-cml.org/schema" xmlns:cmlx="http://www.xml-cml.org/schema/cmlx">
      <list cmlx:lineCount="3" cmlx:templateRef="fermi.atom">
        <array dataType="xsd:integer" dictRef="cc:serial" size="3">1 2 13</array>
        <array dataType="xsd:string" dictRef="x:elementType" size="3">C C Cl</array>
        <array dataType="xsd:integer" dictRef="x:isotopeNumber" size="3">13 13 35</array>
        <array dataType="xsd:double" dictRef="cc:coupling" size="3">0.02539 0.00582 0.05688</array>
        <array dataType="xsd:double" dictRef="cc:coupling" size="3">28.54777 6.54434 24.94015</array>
        <array dataType="xsd:double" dictRef="cc:coupling" size="3">10.18656 2.33518 8.89927</array>
        <array dataType="xsd:double" dictRef="cc:coupling" size="3">9.52251 2.18296 8.31914</array>
      </list>
      <list cmlx:lineCount="3" cmlx:templateRef="fermi.spindipole">
        <array dataType="xsd:integer" dictRef="cc:serial" size="3">1 2 13</array>
        <array dataType="xsd:double" dictRef="g:spindipole.xx" size="3">0.0053 -0.039723 0.621221</array>
        <array dataType="xsd:double" dictRef="g:spindipole.yy" size="3">-0.061839 -0.068059 -2.03853</array>
        <array dataType="xsd:double" dictRef="g:spindipole.zz" size="3">0.05654 0.107782 1.417309</array>
      </list>
      <list cmlx:lineCount="3" cmlx:templateRef="fermi.spindipole">
        <array dataType="xsd:integer" dictRef="cc:serial" size="3">1 2 13</array>
        <array dataType="xsd:double" dictRef="g:spindipole.xy" size="3">1.0E-5 0.005157 3.44E-4</array>
        <array dataType="xsd:double" dictRef="g:spindipole.xz" size="3">0.095387 0.081893 3.043747</array>
        <array dataType="xsd:double" dictRef="g:spindipole.yz" size="3">1.3E-5 0.006262 3.9E-4</array>
      </list>
    </module>
  </comment>

It is possible for the templates to contain multiple examples, provided that each pair has matching id attributes. In this case, each matching pair will be tested in turn and all must pass for the unit test to be successful.
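The pairing rule can be illustrated with a short Python sketch that collects the comment blocks of a template and matches them by id. This is an assumption-laden illustration rather than the code in the test classes: the template is a toy with plain-text outputs, and the function name example_pairs is made up for this example.

```python
import xml.etree.ElementTree as ET

# Toy template with two example pairs; real outputs contain CML, not plain text.
TEMPLATE = """
<template id="foo">
  <comment class="example.input" id="foo">EXAMPLE LOGFILE TEXT</comment>
  <comment class="example.input" id="bar">SECOND EXAMPLE</comment>
  <comment class="example.output" id="foo">PARSED OUTPUT</comment>
  <comment class="example.output" id="bar">SECOND OUTPUT</comment>
</template>
"""

def example_pairs(template_xml):
    """Return {id: (input_text, output_text)}; fail if any example is unpaired."""
    root = ET.fromstring(template_xml)
    inputs, outputs = {}, {}
    by_class = {"example.input": inputs, "example.output": outputs}
    for comment in root.iter("comment"):
        target = by_class.get(comment.get("class"))
        if target is not None:
            target[comment.get("id")] = (comment.text or "").strip()
    unmatched = set(inputs) ^ set(outputs)
    if unmatched:
        raise ValueError("examples without a partner: %s" % sorted(unmatched))
    return {key: (inputs[key], outputs[key]) for key in inputs}

pairs = example_pairs(TEMPLATE)
for key, (example_input, expected_output) in sorted(pairs.items()):
    print(key, "->", expected_output)
```

A test runner would then parse each input with the template and compare the result with the expected output, requiring every pair to pass.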

For the NWChem logfile templates, the code that runs these tests lives in the file:

TemplateUnitTests.java

To test and develop an individual template (using the xyz template as an example), the following line needs to be added to the TemplateTest.java file.

   @Test public void testXyz()                                   {runTemplateTest("xyz");}

The individual test can be run from within Eclipse. From the command line, it only appears possible to run all of the TemplateTests (see note below), using the following command from the jumboconverters-compchem/jc-compchem-nwchem directory:

mvn -Dtest="log.TemplateUnitTests" test

If you are developing a template, the first time this is run, it will fail. However, it will print out the output of running the test, and something like the following:

==============template=================== 
Error: template expected:<3> but was:<4>
    XMLDIFF reference

------------test---------------------
<?xml version="1.0" encoding="UTF-8"?>
  <module cmlx:templateRef="xyz" xmlns="http://www.xml-cml.org/schema" xmlns:cmlx="http://www.xml-cml.org/schema/cmlx">
      <list cmlx:lineCount="3" cmlx:templateRef="fermi.atom">
        <array dataType="xsd:integer" dictRef="cc:serial" size="3">1 2 13</array>

The chunk of text after the ------------test--------------------- line, excluding the <?xml version="1.0" encoding="UTF-8"?> line, is the output of the test. This should be checked and, if correct, placed in the <comment class="example.output" id="xyz"> element in the template. Re-running the test should then lead to a successful result.
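If you prefer not to cut the output by hand, the extraction can be scripted. The Python sketch below is a hypothetical helper, not part of JUMBO-Converters: it assumes the console output has been saved as text and that the marker line consists of hyphens around the word test, as in the listing above.

```python
import re

# Marker line as printed above: hyphens, the word "test", more hyphens.
TEST_MARKER = re.compile(r"^-+test-+\s*$")

def extract_test_output(console_text):
    """Return the text after the test marker, without the XML declaration."""
    lines = console_text.splitlines()
    for index, line in enumerate(lines):
        if TEST_MARKER.match(line):
            body = [l for l in lines[index + 1:] if not l.lstrip().startswith("<?xml")]
            return "\n".join(body).strip()
    raise ValueError("no test marker line found")

# Shortened, made-up console output in the same shape as the listing above.
console = """\
==============template===================
Error: template expected:<3> but was:<4>
    XMLDIFF reference

------------test---------------------
<?xml version="1.0" encoding="UTF-8"?>
  <module cmlx:templateRef="xyz">
    <list cmlx:templateRef="atoms"/>
  </module>
"""

output = extract_test_output(console)
print(output)
```

The result still needs to be checked by eye before it is pasted into the example.output comment.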

Note: The discussion at stackoverflow and the maven documentation suggest that the following syntax should work:

mvn -Dtest="log.TemplateTest#testXyz" test

But this appears not to be the case. Are we using a JUnit version older than 4.7?

Transforming the raw XML

As has been mentioned, the parsing is a two-stage process, consisting of marking up the file with XML and then converting the raw XML to valid CML. In some cases, the raw XML may already be valid CML, but in most cases transforms will need to be applied.

The transforms can either be applied within the template, after the text has been parsed and marked up, or as an entirely separate step, once the whole file has been parsed.

The transformation process relies heavily on the powerful XPath language. A short tutorial on XPath can be found here.

The philosophy of the transforms is very similar to that of templates in XSLT: operations are applied to a "nodeset".

The transforms are carried out by elements like the following:

<transform process="addAttribute" xpath="./cml:module[@cmlx:templateRef='job']" name="id" value="job" />

In this case, the attribute id="job" will be added to every cml module that is a direct child of the current context node and has the templateRef "job".
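The effect of such a transform can be imitated in Python with the standard library. This is a sketch of the behaviour only, not of TransformElement.java: the function add_attribute and the small raw document are invented for the example, and the XPath is replaced by an explicit loop over direct children.

```python
import xml.etree.ElementTree as ET

CML = "http://www.xml-cml.org/schema"
CMLX = "http://www.xml-cml.org/schema/cmlx"

# Hypothetical raw XML: one matching child, one non-matching child,
# and one matching module nested a level deeper.
RAW = f"""
<module xmlns="{CML}" xmlns:cmlx="{CMLX}">
  <module cmlx:templateRef="job"/>
  <module cmlx:templateRef="other">
    <module cmlx:templateRef="job"/>
  </module>
</module>
"""

def add_attribute(context, template_ref, name, value):
    """Mimic ./cml:module[@cmlx:templateRef=...]: direct children only."""
    changed = 0
    for child in context.findall(f"{{{CML}}}module"):
        if child.get(f"{{{CMLX}}}templateRef") == template_ref:
            child.set(name, value)
            changed += 1
    return changed

root = ET.fromstring(RAW)
changed = add_attribute(root, "job", "id", "job")
print(changed, root[0].get("id"), root[1][0].get("id"))
```

Only the direct child is modified; the nested module with the same templateRef is outside the nodeset because the path starts with ./ rather than .//.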

Each transform has a process attribute, which defines the operation to be carried out. Almost all have an xpath attribute, an XPath expression indicating the elements the process will be applied to (the nodeset), and a variable number of further arguments, depending on the process being carried out.

A brief overview of the key transformations follows below. For those with a strong constitution, more comprehensive documentation can be found by examining the code in the file TransformElement.java

The text from about line 160, starting with the comment // process values, lists the processes that are available.

Various miscellaneous notes will be added in the section below, which will be merged into the documentation in due course.

Key Transforms

<transform process="addAttribute"
           xpath=".//cml:molecule"
           name="formalCharge"
           value="$number(.//cml:scalar[@dictRef='g:charge'])" />

<transform process="addChild"
           xpath="."
           elementName="cml:module"
           id="jobList1"
           position="0"
           dictRef="cc:jobList" />

<transform process="addDictRef"
           xpath="//cml:property[cml:module[@cmlx:templateRef='l601.popanal']]"
           value="cc:popanal "/>

<transform process="addId"
           value="mol9999"
           xpath=".//cml:molecule[starts-with(@id,'a')]" />

<transform process="addMap"
           xpath="."
           id="variableConstantMap"
           from=".//cml:scalar[@dictRef='g:variable' or @dictRef='g:const']"
           to=".//cml:scalar[@dictRef='g:value']" />

<transform process="addNamespace"
           xpath="."
           name="convention"
           value="http://www.xml-cml.org/convention/" />

<transform process="addSibling"
           xpath="./cml:module[@id='calculation']/cml:module[@cmlx:templateRef='l202.rotconst']"
           elementName="cml:module"
           id="l202.group"
           position="1" />

<transform process="addUnits"
           xpath=".//cml:scalar[@dictRef='compchem:total_energy']"
           value="nonsi:hartree" />

<transform process="copy"
           xpath="(//cml:list[@cmlx:templateRef='l914_excit2'])[1]"
           to="." />

<transform process="createAngle"
           xpath=".//cml:list/cml:list[cml:atom]"
           atomRefs="$string(cml:scalar[3]) $string(cml:scalar[1]) $string(cml:atom/@id)"
           value="$string(cml:scalar[4])" />

<transform process="createArray"
           xpath="."
           from="./cml:list[@cmlx:templateRef='length']/cml:scalar[@dictRef='g:symbol']" />

<transform process="createAtom"
           xpath=".//cml:scalar[@dictRef='cc:elementType']" />

<transform process="createDate"
           xpath=".//cml:list[@dictRef='g:archive1']/cml:scalar[9]"
           format="dd-MMM-yyyy"
           dictRef="cc:date" />

<transform process="createDouble"
           xpath=".//cml:list[@dictRef='g:archive.namevalue']/cml:scalar[@dictRef='x:HF']"
           dictRef="cc:hfenergy" />

<transform process="createFormula"
           xpath=".//cml:list[@dictRef='g:archive1']/cml:scalar[7]" />

<transform process="createLength"
           xpath=".//cml:list/cml:list[cml:atom]"
           atomRefs="$string(cml:scalar[1]) $string(cml:atom/@id)"
           value="$string(cml:scalar[2])" />

<transform process="createList"
           xpath=".//cml:module[@cmlx:templateRef='multipole']" />

<transform process="createMatrix33"
           xpath="."
           dictRef="g:axis"
           from=".//cml:scalar[contains(@dictRef,':x.') or contains(@dictRef,':y.') or contains(@dictRef,':z.')]" />

<template id="l202.orient" name="input or standard orientation" repeat="*"
    pattern="\s*(Input|Standard)\s*orientation:\s*$\s*\-+\s*"
    endPattern="\s*\d.*$\s*\-+\s*" endOffset="2">

  <comment class="example.input" id="l202.orient">
                          Input orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          6             0        0.000000    0.000000    0.000000
    2          1             0        0.000000    0.000000    1.093266
    3          1             0        1.030741    0.000000   -0.364422
    4          1             0       -0.515370   -0.892648   -0.364422
    5          1             0       -0.515371    0.892648   -0.364422
 ---------------------------------------------------------------------
  </comment>

  <record repeat="5"/>
  <record repeat="*" makeArray="true" id="atom">{I,cc:serial}{I,cc:elementType}{I,g:atomicType}{F,cc:x3}{F,cc:y3}{F,cc:z3}</record>
  <record/>

  <transform process="createMolecule" xpath="./cml:list[@cmlx:templateRef='atom']/cml:array" id="mol.l202.orient"/>
  <transform process="pullupSingleton" xpath="./cml:list"/>

  <comment class="example.output" id="l202.orient">
    <module cmlx:templateRef="l202.orient" xmlns="http://www.xml-cml.org/schema" xmlns:cmlx="http://www.xml-cml.org/schema/cmlx">
      <molecule id="mol.l202.orient" cmlx:templateRef="atom">
        <atomArray>
          <atom id="a1" elementType="C" x3="0.0" y3="0.0" z3="0.0">
            <scalar dataType="xsd:integer" dictRef="cc:serial">1</scalar>
            <scalar dataType="xsd:integer" dictRef="g:atomicType">0</scalar>
            <scalar dataType="xsd:integer" dictRef="cc:atomicNumber">6</scalar>
          </atom>
          <atom id="a2" elementType="H" x3="0.0" y3="0.0" z3="1.093266">
            <scalar dataType="xsd:integer" dictRef="cc:serial">2</scalar>
            <scalar dataType="xsd:integer" dictRef="g:atomicType">0</scalar>
            <scalar dataType="xsd:integer" dictRef="cc:atomicNumber">1</scalar>
          </atom>
          <atom id="a3" elementType="H" x3="1.030741" y3="0.0" z3="-0.364422">
            <scalar dataType="xsd:integer" dictRef="cc:serial">3</scalar>
            <scalar dataType="xsd:integer" dictRef="g:atomicType">0</scalar>
            <scalar dataType="xsd:integer" dictRef="cc:atomicNumber">1</scalar>
          </atom>
          <atom id="a4" elementType="H" x3="-0.51537" y3="-0.892648" z3="-0.364422">
            <scalar dataType="xsd:integer" dictRef="cc:serial">4</scalar>
            <scalar dataType="xsd:integer" dictRef="g:atomicType">0</scalar>
            <scalar dataType="xsd:integer" dictRef="cc:atomicNumber">1</scalar>
          </atom>
          <atom id="a5" elementType="H" x3="-0.515371" y3="0.892648" z3="-0.364422">
            <scalar dataType="xsd:integer" dictRef="cc:serial">5</scalar>
            <scalar dataType="xsd:integer" dictRef="g:atomicType">0</scalar>
            <scalar dataType="xsd:integer" dictRef="cc:atomicNumber">1</scalar>
          </atom>
        </atomArray>
        <formula formalCharge="0" concise="C 1 H 4">
          <atomArray elementType="C H" count="1.0 4.0"/>
        </formula>
        <bondArray>
          <bond atomRefs2="a1 a2" id="a1_a2" order="S"/>
          <bond atomRefs2="a1 a3" id="a1_a3" order="S"/>
          <bond atomRefs2="a1 a4" id="a1_a4" order="S"/>
          <bond atomRefs2="a1 a5" id="a1_a5" order="S"/>
        </bondArray>
        <property dictRef="cml:molmass">
          <scalar dataType="xsd:double" units="unit:dalton" xmlns:unit="http://www.xml-cml.org/unit/si/">16.04246</scalar>
        </property>
      </molecule>
    </module>
  </comment>
</template>

<transform process="createNameValue"
           xpath="./cml:list/cml:list"
           name=".//cml:scalar[@dictRef='x:name']"
           value=".//cml:scalar[@dictRef='x:value']" />

<transform process="createString"
           xpath="./cml:list/cml:scalar" />

<transform process="createTable"
           xpath=".//cml:list[@cmlx:templateRef='symmadapt']" />

<transform process="createTorsion"
           xpath=".//cml:list/cml:list[cml:atom]"
           atomRefs="$string(cml:scalar[5]) $string(cml:scalar[3]) $string(cml:scalar[1]) $string(cml:atom/@id)"
           value="$string(cml:scalar[6])" />

<transform process="createVector3"
           xpath="."
           dictRef="g:coupling.ten"
           from="./cml:list/cml:list/cml:scalar[contains(@dictRef,'.a.t') or contains(@dictRef,'.b.t') or contains(@dictRef,'.c.t')]" />

<transform process="createWrapper"
           xpath=".//cml:module/text()"
           elementName="UNPARSED" />

<metadata name="n:basis_type">
  <scalar dataType="xsd:string">ao basis</scalar>
</metadata>

<transform process="createWrapperMetadata"
           xpath=".//cml:scalar[@dictRef='cc:version' or
                    @dictRef='cc:date' or
                    @dictRef='cc:title']" />

<transform process="createWrapperParameter"
           xpath=".//cml:scalar[@dictRef='cc:hostname' or
                    @dictRef='cc:jobname' or
                    @dictRef='cc:method' or
                    @dictRef='cc:basis' ]" />

<transform process="createWrapperProperty"
           xpath=".//*[@dictRef='cc:electronicstate' or
                    @dictRef='cc:hfenergy' or
                    @dictRef='cc:dipole' or
                    @dictRef='cc:dipolederiv' or
                    @dictRef='cc:polarizability' or
                    @dictRef='cc:pointgroup' or
                    @dictRef='cc:rmsd' or
                    @dictRef='cc:rmsf']" />

<transform process="createZMatrix"
           xpath="."
           id="zinitial" />

<transform process="delete"
           xpath="(//cml:list[@cmlx:templateRef='l914_excit2'])[1]" />

<transform process="debugNodes"
           xpath=".//cml:module[not(cml:array)]" />

<transform process="groupSiblings"
           xpath=".//cml:module[@id='l202.group']" />

<transform process="joinArrays"
           xpath=".//cml:list[@cmlx:templateRef='atom']/cml:array" />

<transform process="move"
           to="."
           xpath=".//*[contains(@dictRef,':serial') or contains(@dictRef,':elementType') or contains(@dictRef,':isotop') or contains(@dictRef,':coupling')]" />

<transform process="moveRelative"
           xpath="//cml:module[@cmlx:templateRef='l4601.virtual']"
           to="parent::*/parent::*/parent::*" />

<transform process="pullup"
           xpath=".//cml:module[@cmlx:templateRef='l1.version']/cml:*" />

<transform process="pullupSingleton"
           xpath=".//cml:list" />

<transform process="reparse"
           xpath=".//cml:scalar[@id='scraped']"
           regexPath=".//record[@id='natoms']" />

<transform process="setValue"
           xpath=".//cml:list/cml:scalar[2] |
                  .//cml:list/cml:scalar[4] |
                  .//cml:list/cml:scalar[6]"
           map="//cml:map[@id='variableMap']"
           value="$string(.)" />

<transform process="split"
           xpath=".//cml:array[@dictRef='cc:mulliken']" />

Notes on Transforms

<transform process="debugNodes" xpath=".//cml:array[position() &gt; 1 and position() &lt; 4]"/>

foo[@dictRef[namespace-uri()='http://www.xml-cml.org/dict/gaussian' and .='charge']]

JUMBO-Converters filesystem structure

(I guess some of this is standard for Maven projects but my ignorance forces me to document everything. The bright side is that other newbies like me will feel happy!)

The main folder is

jumboconverters-compchem/

Under this is:

jumboconverters-compchem/
   jc-compchem-nwchem/

The two most important subfolders of this are

jumboconverters-compchem/
  jc-compchem-nwchem/
     src/
     target/

The second one is where the final compiled Java classes are located (any more stuff?) and we will not care about it for the moment. The src subfolder, as its name indicates, contains the source code associated with the compchem part of JUMBOconverters (i.e., the one most related to the Quixote project). Inside the src subfolder, we have the following chain of folders, at the bottom of which all Java source code is located:

jumboconverters-compchem/
  jc-compchem-nwchem/
    src/
      main/
        java/
          org/
            xmlcml/
              cml/
                converters/

Inside converters, we have two main subfolders:

jumbo-converters/
  jumbo-converters-compchem/
    src/
      main/
        java/
          org/
            xmlcml/
              cml/
                converters/
                  compchem/
                  marker/

The most specific compchem code is in compchem (as you might have guessed!) ordered by the name of the compchem package (gamessus, gaussian, nwchem, etc.), and marker contains more general source code to support the former.

If you are Java-savvy, you might want to check these folders and read the code, but one of the great things about the declarative approach that PMR has built into JUMBOconverters, and that we describe on this page, is that you don't need to! If you know regular expressions and some very basic XPath (both of which you could even infer from existing examples), that should be sufficient.

One important thing to remember though, even if you don't plan to read the Java source code, is that the above folder structure translates into the names of the classes that do all the magic stuff, so, if you want to call these classes from the command line, as in

mvn -e exec:java -Dexec.mainClass=org.xmlcml.cml.converters.compchem.nwchem.log.NWChemLog2XMLConverter -Dexec.args="./src/test/resources/compchem/nwchem/log/in/test1.out ./test.cml"

you need to have this structure in mind.
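The relationship between the folder tree and the class name is the standard Java one: the path below src/main/java, with separators turned into dots and the .java suffix dropped. A small Python sketch makes this explicit; the helper name class_name is made up, and the source path is inferred from the class name in the command above.

```python
from pathlib import PurePosixPath

def class_name(source_path, source_root="src/main/java"):
    """Convert a Java source path under source_root to a fully qualified class name."""
    relative = PurePosixPath(source_path).relative_to(source_root)
    return ".".join(relative.with_suffix("").parts)

# Path inferred from the mainClass used in the mvn command above.
name = class_name(
    "src/main/java/org/xmlcml/cml/converters/compchem/nwchem/log/NWChemLog2XMLConverter.java"
)
print(name)
```

The printed name is the value passed to -Dexec.mainClass.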

The declarative bits of the parsing infrastructure (i.e., what you, as a parser developer, will have to check, understand and probably adapt for your favourite compchem code) are inside a similar folder tree under src/main:

jumbo-converters/
  jumbo-converters-compchem/
    src/
      main/
        resources/
          org/
            xmlcml/
              cml/
                converters/
                   compchem/
                     amber/
                     gamessus/
                     gaussian/
                     nwchem/
                     ...

Inside each code folder, one can find subfolders for the different types of file, and inside each one of them a templates subfolder, e.g.,

jumbo-converters/
  jumbo-converters-compchem/
    src/
      main/
        resources/
          org/
            xmlcml/
              cml/
                converters/
                   compchem/
                     gaussian/
                       in/
                         templates/
                       log/
                         templates/
                       ...

In the rest of the sections and in some of the tutorials, we explain in detail how the different bits of declarative parsing are related and how everything works. For now, note that the top-level parsing template list file, templateList.xml, is found in the filetype folders (i.e., in or log), while each of the smaller templates included in this list is located in templates.

Now, branching out at the same level as main, still inside src, we have a test subfolder, which contains, on the one hand (under java), the Java source code for performing automatic tests, and, on the other hand (under resources), a number of example files produced by the compchem codes that Quixote wants to tackle. The scheme of the folder tree is as follows:

jumbo-converters/
  jumbo-converters-compchem/
    src/
      test/
        java/
          org/
            xmlcml/
              cml/
                converters/
                   compchem/
                     amber/
                     gamessus/
                     gaussian/
                     nwchem/
                     ...
        resources/
           compchem/
              amber/
              gamessus/
              gaussian/
                in/
                log/
                ...
              nwchem/
              ...

A general scheme summarizing all the details discussed above is the following:

jumbo-converters/                    *** Main JUMBOconverters folder
  jumbo-converters-compchem/         *** Compchem JUMBOconverters
    src/                             *** Source code and test files
      main/                          *** Source code for the parsing machinery
        java/                        *** Java source code
          org/
            xmlcml/
              cml/
                converters/
                  compchem/
                  marker/
        resources/                   *** Declarative parsing source code
          org/
            xmlcml/
              cml/
                converters/
                   compchem/
                     amber/
                     gamessus/
                     gaussian/
                       in/           *** Top level parsing directives
                         templates/  *** Subparsers templates
                       log/
                         templates/
                       ...
                     nwchem/
                     ...
      test/                          *** Source code for the automatic testing
        java/
          org/
            xmlcml/
              cml/
                converters/
                   compchem/
                     amber/
                     gamessus/
                     gaussian/
                     nwchem/
                     ...
        resources/                   *** Example test files
           compchem/
              amber/
              gamessus/
              gaussian/
                in/
                log/
                ...
              nwchem/
              ...
    target/                          *** Compiled classes