Tutorials and problems


  1. First of all
  2. Some tricks of the trade
  3. JUMBO-Converters basic compiling and setup
  4. Running JUMBO-Converters for NWChem and Gaussian
    1. Gaussian
    2. NWChem
    3. Status
  5. How to run tutorial 1
  6. Writing a small parser
  7. Machine-specific known problems
    1. MacOS
  8. 25/08/2011 meeting usecase
    1. Machine
    2. Preliminaries
    3. Installing jumbo-converters-compchem
    4. Installing lensfield


Warning: a number of old tutorials which have not been checked recently but which may contain important information can be found at old - Tutorials and problems.


First of all

You will need to have the following components installed to use the Quixote infrastructure:

Some tricks of the trade

Conventions:

Useful commands:

hg clone https://bitbucket.org/petermr/jumbo-converters myFolder
hg pull
hg update
mvn clean install

At the moment, the development environment is somewhat unstable, so if you run into problems with the tutorials here, it is often a good idea to "restart" your environment before trying anything else. Some possible "restarting steps" are:

hg clone https://bitbucket.org/petermr/jumbo-converters
mvn clean install

and deleting the hidden .m2 folder in your home directory, which forces Maven to download all dependencies again.

JUMBO-Converters basic compiling and setup

Clone the JUMBO-converters repository with the command (be patient; it may take several minutes):

hg clone https://bitbucket.org/petermr/jumbo-converters

A jumbo-converters directory (this will be the <project-root>) will be created in the directory where you issued the command to clone the repository. Enter into this directory with:

cd jumbo-converters

Run the command:

mvn clean install

This will download the java jar files required by the converters into your local maven repository (~/.m2/repository on unix), compile, test and install the code into your local maven repository. Everything will then be ready for using the JUMBO-Converters software.

Running JUMBO-Converters for NWChem and Gaussian

Gaussian

To use the Gaussian parser, cd into the <project-root>/jumbo-converters-compchem/jumbo-converters-compchem-gaussian directory:

cd jumbo-converters-compchem/jumbo-converters-compchem-gaussian

and run the converter with the command:

mvn -e exec:java -Dexec.mainClass="org.xmlcml.cml.converters.compchem.gaussian.log.GaussianLog2CompchemConverter" -Dexec.args="logfile.out"

where logfile.out is the path to the Gaussian logfile you wish to convert. This creates a CML file named logfile.cml in the directory from which the command was executed.

mvn exec:java takes care of setting up the Java environment (the classpath) so that all compiled files needed for the execution can be located. For this to work, the previous command must be executed from the jumbo-converters-compchem-gaussian folder.

The -Dexec.mainClass argument names the Java class to execute (in this case, the Gaussian converter), and the -Dexec.args argument passes its space-separated strings as arguments to that main class.
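If you call the converters from a script, a small helper can assemble the command line. The following is only a sketch: the function name build_exec_command is hypothetical and is not part of JUMBO-Converters. It simply joins the arguments with single spaces, which mirrors the space-separated convention described above, so it is not suitable for paths that contain spaces.

```python
import shlex

def build_exec_command(main_class, args):
    """Return the argv list for `mvn -e exec:java` with the given main class.

    All program arguments go into the single exec.args property,
    joined by single spaces (hypothetical helper, for illustration only).
    """
    command = ["mvn", "-e", "exec:java", f"-Dexec.mainClass={main_class}"]
    if args:
        command.append("-Dexec.args=" + " ".join(args))
    return command

cmd = build_exec_command(
    "org.xmlcml.cml.converters.compchem.gaussian.log.GaussianLog2CompchemConverter",
    ["logfile.out"],
)
# Print a shell line that can be pasted into a terminal opened in the
# jumbo-converters-compchem-gaussian folder.
print(shlex.join(cmd))
```

The returned list can also be handed to subprocess.run(cmd, cwd=...) with cwd set to the jumbo-converters-compchem-gaussian folder, so that Maven picks up the right classpath.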

NWChem

To use the NWChem converter, go into the jumbo-converters-compchem-nwchem folder:

cd jumbo-converters-compchem-nwchem

and run the converter (it will use a test file and create the output file ch3f_rot.xml):

mvn -e exec:java -Dexec.mainClass=org.xmlcml.cml.converters.compchem.nwchem.log.NWChemLog2CompchemConverter

This will take one of the NWChem output files in jumbo-converters-compchem-nwchem/src/test/resources/compchem/nwchem/log/in/ and create a CML file in the jumbo-converters-compchem-nwchem/test folder (for details of what happens, see the file src/main/java/org/xmlcml/cml/converters/compchem/nwchem/log/NWChemLog2CompchemConverter.java).

Status

Use this section to indicate if the tutorial works for you, if it doesn't, if you needed to do something differently, etc.:

How to run tutorial 1

"Tutorial 1" (which can be found in your local clone at <project-root>/jumbo-converters-core/tutorial/tutorial1.html) needs the following setup:

cd jumbo-converters-core
mkdir examples
cp tutorial/amber.inp examples/
cp tutorial/amber.template.xml examples/
cp tutorial/amber.ref.xml examples/
mvn -e exec:java -Dexec.mainClass=org.xmlcml.cml.converters.text.Examples

The result of the execution will be found at examples/amber.out.xml.

When modifying the template used, or when using a different input file, you should run the tutorial program with 3 arguments (in -Dexec.args) as follows:

mvn -e exec:java -Dexec.mainClass=org.xmlcml.cml.converters.text.Examples -Dexec.args="examples/amber.template.xml examples/amber.inp examples/amber.out.xml"

The first argument is the name of the template file, the second is the name of the input file, and the third is the name of the output file.

Writing a small parser

At present, the JUMBO-Converters for compchem work in a declarative manner, i.e., you don't need to know Java to write a parser; you just need to know the output of your favourite compchem code, a bit of regular expressions and some very basic XPath (both of which you could even infer from existing examples). Then you edit some XML files that tell the JUMBO-Converters what they have to do, and you are done.

Here, I will describe a small bit of Gaussian parsing and how to develop it. For more details, check JUMBO-Converters and the rest of the tutorials on this page.

Also, see Declarative parsing syntax for a complete list of the rules followed by the parsers and their relations to the template XML files.

The first thing to know is that Gaussian runs are separated into links, which seem to be some kind of internal modules, each of which is in charge of a more or less conceptually bounded task. The important issue for us is that, if the calculation has been run with the #P level of verbosity, as in

%Mem=900MB
#P RHF/6-31G(d,p) 5D 7F SCF=(Conver=10) TrackIO GFInput MaxDisk=16GB

HCO-L-Ala-NH2 RHF/6-31(d,p) single point calculation close to the polyproline II minimum

0 1
H1
C2         1 rC2H1
N3         2 rN3C2             1 aN3C2H1
O4         2 rO4C2             3 aO4C2N3                 1 pO4C2N3H1
C5         3 rC5N3             2 aC5N3C2                 1 dC5N3C2H1
...

then the points at which Gaussian enters or exits a link are clearly printed in the logfile, which is a very helpful feature for parsing.
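As a rough illustration of why these markers help, the sketch below scans a list of lines for "(Enter ...lNNN.exe)" and "Leave Link NNN at ..." markers and reports the link number and line index of each. This is plain Python written for this page, not how JUMBO-Converters does it internally; the function name link_events and both regular expressions are assumptions based on the marker format printed by a #P run.

```python
import re

# Marker formats assumed from the #P logfile output.
ENTER = re.compile(r"\(Enter .*?l(\d+)\.exe\)")
LEAVE = re.compile(r"Leave Link\s+(\d+) at ")

def link_events(lines):
    """Yield ("enter" or "leave", link_number, line_index) for each marker line."""
    for index, line in enumerate(lines):
        match = ENTER.search(line)
        if match:
            yield ("enter", int(match.group(1)), index)
            continue
        match = LEAVE.search(line)
        if match:
            yield ("leave", int(match.group(1)), index)

sample = [
    " (Enter /apps/apps64/g03/l302.exe)",
    " One-electron integrals computed using PRISM.",
    " Leave Link  302 at Thu Feb 17 17:30:34 2011, MaxMem=  117964800 cpu:       0.2",
]
events = list(link_events(sample))
for event in events:
    print(event)
```

For a real logfile, pass the lines of the file instead of sample, for instance open(path).read().splitlines().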

For example, in the middle of the logfile that we will use in this tutorial,

jumbo-converters/jumbo-converters-compchem/src/test/resources/echeniquep/inputQC/g03_threeSPs/RHF_6-31Gdp_sp_HCO-L-Ala-NH2_hashP.log

you can read:

 (Enter /apps/apps64/g03/l302.exe)
 NPDir=0 NMtPBC=     1 NCelOv=     1 NCel=       1 NClECP=     1 NCelD=      1
         NCelK=      1 NCelE2=     1 NClLst=     1 CellRange=     0.0.
 One-electron integrals computed using PRISM.
 NBasis=   152 RedAO= T  NBF=   152
 NBsUse=   152 1.00D-06 NBFU=   152
 Leave Link  302 at Thu Feb 17 17:30:34 2011, MaxMem=  117964800 cpu:       0.2

However, we will not try to parse this obscure information about the 1-electron integrals now. Instead, we will go for the molecular formula, which can be found in the middle of link 202, in a line that begins with a space (as all lines in the logfile do) followed by the word Stoichiometry:

 (Enter /apps/apps64/g03/l202.exe)
                          Input orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          1             0        0.000000    0.000000    0.000000
...
   16          1             0       -0.696737    1.335650    3.331321
 ---------------------------------------------------------------------
                    Distance matrix (angstroms):
                    1          2          3          4          5
     1  H    0.000000
     2  C    1.088294   0.000000
...
    16  H    0.000000
 Stoichiometry    C4H8N2O2
 Framework group  C1[X(C4H8N2O2)]
 Deg. of freedom    42
 Full point group                 C1
 Largest Abelian subgroup         C1      NOp   1
 Largest concise Abelian subgroup C1      NOp   1
                         Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          1             0        2.673167   -0.694356    1.176039
...
   16          1             0        0.631638    1.777215   -0.581971
 ---------------------------------------------------------------------
 Rotational constants (GHZ):      2.9412883      1.7541705      1.2354314
 Leave Link  202 at Thu Feb 17 17:30:29 2011, MaxMem=  117964800 cpu:       0.1

The minimal set of declarative XML files needed to parse this is as follows:

First, you need a file called topTemplate.xml, located at

jumbo-converters/jumbo-converters-compchem/src/main/resources/org/xmlcml/cml/converters/compchem/gaussian/log

It contains the following code:

<?xml version="1.0" encoding="UTF-8"?>
<template id='gaussian.log' output="VERBOSE">
  <templateList id='main' xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="l202.temp.xml"/>
  </templateList>
</template>

That is, it includes a list of subtemplates which, in this example, contains only one item, l202.temp.xml, whose task is to capture and parse link 202. This file is located at

jumbo-converters/jumbo-converters-compchem/src/main/resources/org/xmlcml/cml/converters/compchem/gaussian/log/templates

and its contents are:

 <template id="l202" repeatCount="*" pattern=".*Enter.*l202[^\d].*"
     endPattern=".*Leave Link +202[^\d].*">
     <comment>
       ...
     </comment>
   <templateList id='l202_list' xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="l202.molformula.temp.xml"/>
   </templateList>
 </template>

The first two lines tell the JUMBO-Converters, through regular expressions, where link 202 begins and ends, respectively, and everything between those two matches is available for parsing in this file. Then, we see another templateList containing, again, only one item for this example, the file l202.molformula.temp.xml, which is located in the same folder as l202.temp.xml and contains:

<template id="t_molformula" pattern="\sStoichiometry.*"
 endPattern="\sFramework.*">
        <record id="r_stoichiometry">\sStoichiometry\s+{A,gaussian:molformula}.*</record>
</template>

In this case, we have no further lists of subtemplates and we just take the molecular formula. We detect the piece of text where it lies using pattern and endPattern as before, and this leaves the lines between the two matches available for parsing. The record tag parses the first line using another regular expression, and stores the formula string in a variable identified by gaussian:molformula.
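To get a feeling for the regular expressions before editing the templates, you can try them out in a few lines of Python. The sketch below is not the JUMBO-Converters implementation. It assumes that pattern and endPattern must match a whole line, and that the line matching endPattern is left out of the captured block (which is consistent with the one-line t_molformula module in the converter output). The function capture_block is hypothetical, and the {A,gaussian:molformula} field is approximated by a named group that captures a run of non-space characters.

```python
import re

def capture_block(lines, pattern, end_pattern):
    """Return the lines from the first full match of `pattern` up to, but
    not including, the next line that fully matches `end_pattern`."""
    start = re.compile(pattern)
    end = re.compile(end_pattern)
    block = None
    for line in lines:
        if block is None:
            if start.fullmatch(line):
                block = [line]
        elif end.fullmatch(line):
            return block
        else:
            block.append(line)
    return block or []

log = [
    " (Enter /apps/apps64/g03/l202.exe)",
    "                          Input orientation:",
    " Stoichiometry    C4H8N2O2",
    " Framework group  C1[X(C4H8N2O2)]",
    " Deg. of freedom    42",
    " Leave Link  202 at Thu Feb 17 17:30:29 2011, MaxMem=  117964800 cpu:       0.1",
]

# Outer template: everything belonging to link 202.
link_202 = capture_block(log, r".*Enter.*l202[^\d].*", r".*Leave Link +202[^\d].*")
# Inner template: the chunk that starts at the Stoichiometry line.
formula_block = capture_block(link_202, r"\sStoichiometry.*", r"\sFramework.*")
# Record: stand-in for \sStoichiometry\s+{A,gaussian:molformula}.*
record = re.compile(r"\sStoichiometry\s+(?P<molformula>\S+).*")
formula = record.fullmatch(formula_block[0]).group("molformula")
print(formula)
```

If a pattern does not behave as expected here, it is unlikely to behave better inside the template, so this kind of quick check can save a few Maven runs. Keep in mind that the converters are written in Java, whose regular expression dialect differs from Python's in some details.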

If you now run a Maven command like the one in the previous tutorial:

mvn -e exec:java -Dexec.mainClass=org.xmlcml.cml.converters.compchem.gaussian.log.GaussianLog2XMLConverter -Dexec.args="./src/test/resources/echeniquep/inputQC/g03_threeSPs/RHF_6-31Gdp_sp_HCO-L-Ala-NH2_hashP.log ./src/test/resources/echeniquep/outputXML/g03_threeSPs/RHF_6-31Gdp_sp_HCO-L-Ala-NH2_hashP.log.xml"

from

jumbo-converters/jumbo-converters-compchem

this produces the file

jumbo-converters/jumbo-converters-compchem/src/test/resources/echeniquep/outputXML/g03_threeSPs/RHF_6-31Gdp_sp_HCO-L-Ala-NH2_hashP.log.xml

which now contains the expected XML tagging:

<module lineCount="91" templateRef="l202">
 (Enter /apps/apps64/g03/l202.exe)
                          Input orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          1             0        0.000000    0.000000    0.000000
...
   16          1             0       -0.696737    1.335650    3.331321
 ---------------------------------------------------------------------
                    Distance matrix (angstroms):
                    1          2          3          4          5
     1  H    0.000000
     2  C    1.088294   0.000000
...
    16  H    0.000000
<module lineCount="1" templateRef="t_molformula">
<list templateRef="r_stoichiometry">
<scalar dataType="xsd:string" dictRef="g:molformula">C4H8N2O2</scalar>
</list>
</module>
 Framework group  C1[X(C4H8N2O2)]
 Deg. of freedom    42
 Full point group                 C1
 Largest Abelian subgroup         C1      NOp   1
 Largest concise Abelian subgroup C1      NOp   1
                         Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          1             0        2.673167   -0.694356    1.176039
...
   16          1             0        0.631638    1.777215   -0.581971
 ---------------------------------------------------------------------
 Rotational constants (GHZ):      2.9412883      1.7541705      1.2354314
</module>
 Leave Link  202 at Thu Feb 17 17:30:29 2011, MaxMem=  117964800 cpu:       0.1
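Once the value is tagged, it can be read back with any XML library. The following sketch uses Python's standard xml.etree.ElementTree on a shortened copy of the output above. The helper names local_name and find_scalars are hypothetical, and the code compares local tag names only, on the assumption that the real output file may declare XML namespaces that are not visible in this excerpt.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a leading "{namespace}" from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def find_scalars(root, dict_ref_suffix):
    """Return the text of every scalar element whose dictRef ends with the suffix."""
    return [
        (element.text or "").strip()
        for element in root.iter()
        if local_name(element.tag) == "scalar"
        and element.get("dictRef", "").endswith(dict_ref_suffix)
    ]

# Shortened copy of the converter output shown above.
sample = """<module lineCount="91" templateRef="l202">
 (Enter /apps/apps64/g03/l202.exe)
<module lineCount="1" templateRef="t_molformula">
<list templateRef="r_stoichiometry">
<scalar dataType="xsd:string" dictRef="g:molformula">C4H8N2O2</scalar>
</list>
</module>
 Framework group  C1[X(C4H8N2O2)]
</module>"""

root = ET.fromstring(sample)
formulas = find_scalars(root, ":molformula")
print(formulas)
```

For the file produced by the converter, replace ET.fromstring(sample) with ET.parse(path).getroot(), where path points to the generated .log.xml file.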

Machine-specific known problems

MacOS

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home

  <properties>

    <project.build.sourceEncoding>
      UTF-8
    </project.build.sourceEncoding>

    <project.reporting.outputEncoding>
      UTF-8
    </project.reporting.outputEncoding>

  </properties>

25/08/2011 meeting usecase

Machine

Preliminaries

export JAVA_HOME=$(/usr/libexec/java_home)

Installing jumbo-converters-compchem

hg clone https://bitbucket.org/wwmm/jumbo-converters

rm -fr ~/.m2
cd jumbo-converters/jumbo-converters-compchem
mvn clean install
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] jumbo-converters-compchem-common .................. SUCCESS [1:31.458s]
[INFO] jumbo-converters-compchem-testutils ............... SUCCESS [0.853s]
[INFO] jumbo-converters-compchem-amber ................... SUCCESS [1.184s]
[INFO] jumbo-converters-compchem-cml ..................... SUCCESS [0.951s]
[INFO] jumbo-converters-compchem-dalton .................. SUCCESS [0.829s]
[INFO] jumbo-converters-compchem-gamessuk ................ SUCCESS [0.826s]
[INFO] jumbo-converters-compchem-gamessus ................ SUCCESS [1.251s]
[INFO] jumbo-converters-compchem-gaussian ................ SUCCESS [19.494s]
[INFO] jumbo-converters-compchem-jaguar .................. SUCCESS [1.138s]
[INFO] jumbo-converters-compchem-molcas .................. SUCCESS [1.057s]
[INFO] jumbo-converters-compchem-mopac ................... SUCCESS [0.976s]
[INFO] jumbo-converters-compchem-nwchem .................. SUCCESS [19.956s]
[INFO] jumbo-converters-compchem-qespresso ............... SUCCESS [1.155s]
[INFO] jumbo-converters-compchem-turbomole ............... SUCCESS [0.795s]
[INFO] jumbo-converters-compchem-all ..................... SUCCESS [0.026s]
[INFO] jumbo-converters-compchem-misc .................... SUCCESS [12.102s]
[INFO] jumbo-converters-compchem ......................... SUCCESS [0.012s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:35.630s
[INFO] Finished at: Tue Aug 23 12:58:45 CEST 2011
[INFO] Final Memory: 17M/81M
[INFO] ------------------------------------------------------------------------

This completes the installation of jumbo-converters-compchem on my machine.

Installing lensfield

hg clone http://bitbucket.org/petermr/lensfieldjumbo

cd lensfieldjumbo/
unzip lensfield-0.1.1.zip
dos2unix lensfield2-0.1.1/bin/lf
chmod u+x lensfield2-0.1.1/bin/lf
ln -s /Users/pablo/now/Quixote/lensfieldjumbo/lensfield2-0.1.1/bin/lf /Users/pablo/bin/lf
lf --update