


    1. Chempound
    2. Documentation
    3. Repositories
    4. Using Chempound
      1. Getting json with Python
      2. Requesting the RDF
    5. SPARQL queries
      1. Discovering the available search terms
    6. Remote Chempound SPARQL queries with Python
    7. Hacking Chempound
      1. Overview of the repositories
      2. Adding New Data and editing the Splash page
    8. Installing Chempound
      1. Installing Chempound into an existing Jetty server
      2. Security Considerations
    9. Debugging Chempound
      1. Debugging Chempound running under Jetty


Chempound is a server for archiving and searching the outputs of computational chemistry calculations. It can be used as a standalone tool for managing the files on a user's personal computer, or as a managed server for curating the data generated by a group or company.

The website for chempound can be found at: [WWW]. It also contains links to download the latest version of the software and descriptions of how to use it.

An example of a chempound server containing the results of several thousand calculations can be found here: [WWW]

The rest of this page is a temporary placeholder for information that will be moved to the chempound website, so please ignore for the time being!


The existing documentation for Chempound can be found here:


The repository for the chempound packages is hosted on bitbucket: [WWW]

Using Chempound

With a functioning chempound repository in place, we can now start to query the data held within it.

For simple searches, we can just [WWW]Browse through the files, or use the simple [WWW]Search functionality on the web interface to pull out entries of interest.

This is fine for small, arbitrary searches, but Chempound also makes it very easy to automate searches and extract subsets of the data in a variety of ways.

Chempound uses a [WWW]RESTful interface, which means that, by going to the url for a particular calculation, depending on how we make the request to the server, we can receive the requested data in a variety of formats.

The currently supported formats are:

As an example, take a computational chemistry calculation done with the [WWW]Gaussian code and hosted on the [WWW]Cambridge Chempound server. If we go to the url for the calculation with a browser, we get an html Splash Page with a human-readable summary of the calculation and the ability to view the structure in jmol:


We get this page because our browser has requested a text/html representation of the resource.

Getting json with Python

The following python script sets the http Accept header to ask for json, and then prints out the json returned.

import urllib2
import json

# url of the calculation we are interested in
url = ""

# Set up a request object and add the Accept header to ask for json
request = urllib2.Request(url)
request.add_header('Accept','application/json' )
response = urllib2.urlopen(request)

# Can pass the response object to json.load, as it has a read() method
# This just creates a python dictionary, which we can query
json_output = json.load(response)

# Use json dumps method to write out formatted json
print json.dumps(json_output, sort_keys=True, indent=4)

This outputs:

{
    "resources": [
        {
            "uri": ""
        },
        {
            "uri": ""
        },
        {
            "uri": ""
        },
        {
            "uri": ""
        },
        {
            "uri": ""
        }
    ],
    "title": "C 36 H 28 P 2",
    "uri": ""
}

Within chempound, the various files that make up the entry for the calculation are grouped together as an [WWW]ORE object. The resources key of the json object holds these, and includes the uri of the original log file, the cml file, the gif picture generated by jmol, etc.
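
As a minimal sketch of working with the parsed dictionary, the helper below collects the uris listed under the resources key. It is written for Python 3, whereas the scripts on this page use Python 2. The function name resource_uris and the example.org uris are illustrative assumptions and not part of Chempound; only the resources, title and uri keys come from the output above.

```python
def resource_uris(entry):
    """Return the uri of every resource aggregated by a Chempound entry."""
    # 'resources' is a list of dictionaries, each carrying a 'uri' key
    return [res["uri"] for res in entry.get("resources", []) if "uri" in res]


# Hypothetical entry shaped like the json output above (the uris are made up)
sample_entry = {
    "resources": [
        {"uri": "http://example.org/content/calc1/output.log"},
        {"uri": "http://example.org/content/calc1/output.cml"},
    ],
    "title": "C 36 H 28 P 2",
    "uri": "http://example.org/content/calc1/",
}

for uri in resource_uris(sample_entry):
    print(uri)
```

Each uri returned this way can itself be requested, so a script could download the log file or the cml file of an entry without going through the web interface.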

Requesting the RDF

Chempound is built on RDF and a primary component is a triple store containing RDF statements describing the structure of the data, and its associated metadata.

If we query the url and request the rdf serialised as xml, we can receive an object that contains the full data of the object, including the links to the files. The following python script does this and prints out the resulting rdf/xml:

import urllib2

# url of the calculation we are interested in
url = ""

# Set up a request object and add the Accept header to ask for rdf
request = urllib2.Request(url)
request.add_header('Accept','application/rdf+xml' )
response = urllib2.urlopen(request)

# Print out what we got back
print response.read()
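
For readers on Python 3, where urllib2 has been merged into urllib.request, the same content negotiation could be sketched as below. The helper name build_request and the example.org url are assumptions made for illustration; the sketch only builds the request, so it can be tried without a running Chempound server.

```python
import urllib.request


def build_request(url, mime_type):
    """Create a GET request asking the server for the given representation."""
    request = urllib.request.Request(url)
    request.add_header("Accept", mime_type)
    return request


# Hypothetical calculation url; substitute the url of a real entry
request = build_request("http://example.org/content/calc1/", "application/rdf+xml")
print(request.get_header("Accept"))

# Fetching the data would then be:
#   body = urllib.request.urlopen(request).read().decode("utf-8")
```

Passing "application/json" or "text/html" instead of "application/rdf+xml" asks for the other representations discussed in this section.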

SPARQL queries

SPARQL is a query language for extracting data represented as RDF, in much the same way that SQL is a language for querying data in relational databases. As the data in Chempound is stored as RDF, SPARQL is the language of choice for making complex queries against the stored data.

A good - and chemistry related - tutorial on SPARQL can be found [WWW]here.

The chempound webserver provides a page where SPARQL queries can be typed into a webpage and the results returned as html or RDF. The SPARQL page on the Cambridge server can be found [WWW]here.

The easiest way to get to grips with SPARQL is to dissect a simple query:

SELECT  ?molecule
WHERE {
      ?molecule <> "H 2 O 1" .
}

The crucial line is the one stating: ?molecule <> "H 2 O 1" .

This uses the RDF subject:predicate:object pattern. The subject is the variable molecule (variables in SPARQL are prefixed with a ?, although you can also use $), the predicate is a uri which references the CML schema, and the object is a string literal. The statement is then terminated by a full stop.

What this says is that we want to assign to the variable molecule all the entities where the cml formula property is "H 2 O 1".

The SELECT statement says that we want the query to return the molecule variable, which will contain the list of all objects that matched the statement.

If we run this against the Cambridge chempound server we get back something like the following:

Variable Bindings Result


This returns the uris of all the water molecules in the database.

We can now look at a more advanced query:

PREFIX cml: <>

SELECT  ?formula ?inchi ?molecule
{
      ?molecule cml:formula ?formula .
      ?molecule cml:inchi ?inchi .
      FILTER ( ?formula = "H 2 O 1" )
}

The first line is equivalent to declaring a namespace in xml, and associates a convenient label with a long uri, so that instead of writing <>, we can just write cml.

We are now selecting 3 variables from our dataset, and they will be returned in the order we have listed them. The WHERE statement has been omitted as it is implicit.

The next two lines by themselves would select all entities in the database (and return them in the molecule variable) that have the cml properties formula and inchi. However, the FILTER restricts the results to those where the value held in the formula variable is "H 2 O 1".
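
When many similar queries are needed, it can be convenient to assemble the query text in a script. The following is a minimal Python 3 sketch of that idea, not something provided by Chempound: the function build_select is hypothetical, and the example.org namespace is a placeholder that must be replaced with the real CML namespace uri.

```python
def build_select(variables, patterns, prefixes=None, filter_expr=None):
    """Assemble a SPARQL SELECT query string from its parts."""
    lines = []
    # One PREFIX declaration per label
    for label, uri in (prefixes or {}).items():
        lines.append("PREFIX %s: <%s>" % (label, uri))
    lines.append("SELECT " + " ".join("?" + name for name in variables))
    lines.append("WHERE {")
    # Each pattern is a (subject, predicate, object) triple
    for subject, predicate, obj in patterns:
        lines.append("  %s %s %s ." % (subject, predicate, obj))
    if filter_expr:
        lines.append("  FILTER ( %s )" % filter_expr)
    lines.append("}")
    return "\n".join(lines)


query = build_select(
    ["formula", "inchi", "molecule"],
    [("?molecule", "cml:formula", "?formula"),
     ("?molecule", "cml:inchi", "?inchi")],
    prefixes={"cml": "http://example.org/cml#"},  # placeholder namespace
    filter_expr='?formula = "H 2 O 1"',
)
print(query)
```

The sketch does no escaping or validation, so it is only suitable for query parts that are written by the script author rather than taken from untrusted input.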

Discovering the available search terms

The data that is extracted into RDF and therefore available for searching in Chempound is determined by the convention and dictionaries that apply to the files in question.

Please follow these links for more information on [WWW]conventions and [WWW]dictionaries.

For CIF files, the CIF [WWW]dictionary lists all the terms that are available.

For Computational Chemistry outputs, the CompChem [WWW]dictionary lists the indexed terms.

To determine how best to search for data, it is usually useful to go to the splash page for a representative structure in chempound and download the RDF file. This shows the form of the RDF and therefore how a query needs to be constructed.

For example, if we wish to search on the cell_measurement_temperature, looking at the RDF for a CIF file, we see it is structured as shown below:

    <iucr:cell_measurement_temperature rdf:parseType="Resource">
      <rdf:value rdf:datatype="">173.0</rdf:value>
      <cml:units rdf:resource=""/>
      <cml:errorValue rdf:datatype="">2.0</cml:errorValue>
    </iucr:cell_measurement_temperature>

If we just search for the cell_measurement_temperature, we get back the RDF resource rather than the number. We therefore need a second step to extract the value, which is done with the following query:

PREFIX rdf: <>
PREFIX cif: <>

SELECT  ?entry ?value
WHERE {
      ?entry cif:cell_measurement_temperature ?temp  .
      ?temp rdf:value ?value .
}

A similar example for a CompChem file is shown below. This searches on a term in the compchem dictionary, and then filters the value for only those structures with a charge of 0.

PREFIX rdf: <>
PREFIX compchem: <>

SELECT  ?molecule ?charge
WHERE {
      ?molecule compchem:charge ?chargeR  .
      ?chargeR rdf:value ?charge .
      FILTER ( ?charge = 0 )
}

Remote Chempound SPARQL queries with Python

The Chempound SPARQL page will return the results as html or rdf/xml. The rdf/xml can of course be saved and processed offline, but it is more useful to be able to query and download the results all from within a single script.

As the Chempound SPARQL endpoint exposes a RESTful API, we can query it directly. The following python script executes a SPARQL query against chempound and then saves the result as a csv (comma-separated values) file, so that the results can be imported into a spreadsheet program, e.g. for plotting a graph.

import urllib
import urllib2
import xml.etree.ElementTree as ET

# The SPARQL query we want to execute
query="""SELECT ?molecule ?inchi
WHERE {
?molecule <> "H 2 O 1" .
?molecule <> ?inchi .
}"""

# url of the chempound SPARQL endpoint we want to query
baseurl = ""

# The "comma-separated values" file where the results should
# go so they can be imported into a spreadsheet program
csvFile = "/Users/jmht/sparql_results.csv"

# The real work starts here!

# SPARQL namespace - shouldn't need to change this
# NB: results format based on:
namespace = "http://www.w3.org/2005/sparql-results#"

# Set up our query to the SPARQL endpoint
# Encode the parts of the query string into a form suitable for POST
# (passing data to Request makes urllib2 send a POST)
urlparam = { "query" : query }
querystr = urllib.urlencode(urlparam)
request = urllib2.Request(baseurl,querystr)

# Add the header to state what we want back
# (the standard media type for SPARQL xml results)
request.add_header('Accept','application/sparql-results+xml')

# Get the results
response = urllib2.urlopen(request)

# We now have the results in SPARQL xml so we need to turn them into
# a csv file - we use etree to do this:

# Parse results to create etree & get root element
etree = ET.parse(response)
root = etree.getroot()

# Sparql query always returns 2 elements: head and results
head,results = root[:]

# Get head and create dictionary for variables
resultsDict = {}
for var in head:
    resultsDict[var.get("name")] = []

# Loop through results adding the relevant bindings to the dictionary.
# Currently only support uri
for result in results:
    for binding in result:
        # One element for each binding of type: uri, literal or label
        # currently only deal with uri
        if ( len(binding) == 1 and binding[0].tag == "{%s}uri" % namespace ):
            resultsDict[binding.get("name")].append(binding[0].text)
        else:
            raise RuntimeError("Results only supported for uri!")

# output as csv file
rfile = open(csvFile,'w')

# column headers
headers = resultsDict.keys()
rfile.write(",".join(headers) + "\n")

# data - one row per result, one column per variable
nresults = len(resultsDict[headers[0]])
for i in range(nresults):
    row = []
    for header in headers:
        row.append(resultsDict[header][i])
    rfile.write(",".join(row) + "\n")

rfile.close()
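
The conversion from SPARQL xml results to csv can also be sketched in Python 3 using the csv module, which takes care of quoting values that contain commas. This is an illustrative variant rather than the script used with Chempound: the function names and the sample document are assumptions, and unlike the script above it accepts literal bindings as well as uris.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format
SPARQL_NS = "http://www.w3.org/2005/sparql-results#"


def sparql_xml_to_rows(xml_text):
    """Return (headers, rows) for a SPARQL xml result document."""
    root = ET.fromstring(xml_text)
    ns = {"s": SPARQL_NS}
    headers = [v.get("name") for v in root.findall("s:head/s:variable", ns)]
    rows = []
    for result in root.findall("s:results/s:result", ns):
        # Unbound variables are left as empty strings
        row = dict.fromkeys(headers, "")
        for binding in result.findall("s:binding", ns):
            # Each binding wraps one <uri>, <literal> or <bnode> element
            row[binding.get("name")] = binding[0].text or ""
        rows.append([row[h] for h in headers])
    return headers, rows


def rows_to_csv(headers, rows):
    """Serialise the header line and the rows as csv text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up result document for a single water molecule
sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="molecule"/><variable name="inchi"/></head>
  <results>
    <result>
      <binding name="molecule"><uri>http://example.org/content/water1/</uri></binding>
      <binding name="inchi"><literal>InChI=1S/H2O/h1H2</literal></binding>
    </result>
  </results>
</sparql>"""

headers, rows = sparql_xml_to_rows(sample)
print(rows_to_csv(headers, rows))
```

Taking the column order from the head element keeps the csv columns in the order of the SELECT clause, which a plain dictionary of results does not guarantee in Python 2.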


Hacking Chempound

This section is for those who may be interested in altering or extending Chempound. It isn't intended to be a programmer's manual, more a brief overview of chempound's current structure and a walk-through of how to add additional CML data to the repository, which is expected to be the main reason why most people would currently want to extend Chempound.

NB: additional information can be found in Jorge Estrada's [WWW]repository

Chempound is actually a very general tool for managing collections of objects (collected as [WWW]ORE aggregates) and their associated data and metadata, using [WWW]RDF for the data model. As such, almost all of the chemistry functionality is implemented using plugins, so the code that needs to be modified to change the chemistry behaviour is very localised.

Overview of the repositories

The repositories for the chempound packages are hosted on bitbucket: [WWW]

Currently, there are 8 repositories as detailed below:

A slightly more detailed view of the chemistry-specific repositories and their modules follows below.

Repository Modules Description and important classes
[WWW]chemistry [WWW]chemistry-common Classes to handle the generic processing of CML datatypes and the conversion to RDF
* - mime types
* - code to handle the conversion of generic, simple cml datatypes into RDF.
[WWW]chemistry-importer Base classes for the client-side conversion of files and the generation of images
[WWW]chemistry-jmol-plugin Classes to drive jmol to generate the images, and also the jmol code itself
[WWW]chemistry-search-structure Classes to handle the chemistry-specific search page - if you want to add more chemistry search boxes, then you'll need to edit things here.
[WWW]compchem [WWW]compchem-common General code related to the compchem RDF data structures. The utility functions used by the freemarker templates to access the compchem data live here.
[WWW]compchem-handler Code to handle the processing of chemical data on the server, such as display of the html pages and the freemarker templates.
[WWW]compchem-importer The classes to handle importing code-specific logfiles (NWChem, Gaussian etc) using the jumbo-classes. These classes are used by the client, not chempound itself. The test cases for checking the imports also live here.
[WWW]compchem-test-harness Code to test the various compchem-specific modules, as most do not contain any test code themselves.

Adding New Data and editing the Splash page

Chempound extracts data from [WWW]CML in accordance with the [WWW]compchem convention. Provided that the data is a CML scalar, and is in the job's environment, initialization or finalization modules, with a dictRef (ideally) in the [WWW]compchem dictionary, then the data will already be extracted into RDF.

If additional data needs to be extracted (such as is currently done for basis sets and dft functionals), then all that may be necessary is to edit the file [WWW] to add the additional data to the RDF.

The html pages in chempound are generated using the [WWW]freemarker template engine. The freemarker template that is used to generate the html page for each individual structure is the file: [WWW]comp.ftl (other template and css files are in the parent directory).

To make it easier to extract key RDF data for use in the freemarker templates, several helper classes are used. To add new terms, the following files need to be edited:

When the new terms have been added, the tests should be updated, or a new test added in the directory [WWW]

If the new terms are to be added to the chemistry search page, then the [WWW] file will need to be edited, and suitable tests added to the file [WWW]

Installing Chempound

For installing chempound for personal use on a local machine, the [WWW]getting-started notes should be sufficient.

The following instructions apply for installing Chempound on an existing server, for use by an institution or group.

Installing Chempound into an existing Jetty server

Chempound is a pure java program, so can be run in any java container. These instructions are specific to installing it into [WWW]jetty for use on unix systems.

The latest version of the war file for chempound can be downloaded [WWW]here.

There are any number of ways to configure jetty, so this just describes one way, with some pointers to the other possibilities.

If you do not already have jetty installed on your server, and it is not available within the package management software for your distribution, a jetty hightide distribution can be downloaded from [WWW]codehaus.

The following instructions assume that you have a jetty server, with a directory structure similar to the following (only the relevant files and directories are listed).

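jetty-hightide-8.1.4.v20120524/
  |
  +-start.jar
  |
  +-start.ini
  |
  +-contexts/
  | |
  | +-quixote.xml
  |
  +-etc/
  | |
  | +-jetty.xml
  |
  +-quixote/
  | |
  | +-quixote-repository-webapp-0.1-SNAPSHOT.war
  | |
  | +-workspace/
  | | |
  | | +-cache/
  | | |
  | | +-content/
  | | |
  | | +-tdb/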

The file start.jar is the java file used to start jetty with the command:

java -jar start.jar

By default, on startup, jetty will parse the file start.ini, which contains command-line options for the server, including the list of modules to include, and a list of XML configuration files that determine various options (these are listed one per line in the start.ini files and can be removed by commenting the line out with the # character). By default, the XML files reside in the etc directory. In this example, only one configuration file is used, the jetty.xml file in the etc directory.

This file contains the following:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "">

<!-- =============================================================== -->
<!-- Configure the Jetty Server                                      -->
<!--                                                                 -->
<!-- Documentation of this file format can be found at:              -->
<!--        -->
<!--                                                                 -->
<!-- Additional configuration files are available in $JETTY_HOME/etc -->
<!-- and can be mixed in.  For example:                              -->
<!--   java -jar start.jar etc/jetty-ssl.xml                         -->
<!--                                                                 -->
<!-- See start.ini file for the default configuraton files           -->
<!-- =============================================================== -->

<Configure id="Server" class="org.eclipse.jetty.server.Server">

    <!-- =========================================================== -->
    <!-- Server Thread Pool                                          -->
    <!-- =========================================================== -->
    <Set name="ThreadPool">
      <!-- Default queued blocking threadpool -->
      <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">200</Set>
        <Set name="detailedDump">false</Set>
      </New>
    </Set>

    <!-- =========================================================== -->
    <!-- Set connectors                                              -->
    <!-- =========================================================== -->

    <Call name="addConnector">
      <Arg>
          <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
            <Set name="host"><Property name="" /></Set>
            <Set name="port"><Property name="jetty.port" default="8181"/></Set>
            <Set name="maxIdleTime">300000</Set>
            <Set name="Acceptors">2</Set>
            <Set name="statsOn">false</Set>
            <Set name="confidentialPort">8443</Set>
            <Set name="lowResourcesConnections">20000</Set>
            <Set name="lowResourcesMaxIdleTime">5000</Set>
          </New>
      </Arg>
    </Call>

    <!-- =========================================================== -->
    <!-- Set handler Collection Structure                            -->
    <!-- =========================================================== -->
    <Set name="handler">
      <New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
        <Set name="handlers">
         <Array type="org.eclipse.jetty.server.Handler">
           <Item>
             <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
           </Item>
           <Item>
             <New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
           </Item>
         </Array>
        </Set>
      </New>
    </Set>

    <!-- =========================================================== -->
    <!-- extra options                                               -->
    <!-- =========================================================== -->
    <Set name="stopAtShutdown">true</Set>
    <Set name="sendServerVersion">true</Set>
    <Set name="sendDateHeader">true</Set>
    <Set name="gracefulShutdown">1000</Set>
    <Set name="dumpAfterStart">false</Set>
    <Set name="dumpBeforeStop">false</Set>

    <!-- =============================================================== -->
    <!-- Create the deployment manager                                   -->
    <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
    <!-- The deplyment manager handles the lifecycle of deploying web    -->
    <!-- applications. Apps are provided by instances of the             -->
    <!-- AppProvider interface.  Typically these are provided by         -->
    <!-- one or more of:                                                 -->
    <!--   jetty-webapps.xml       - monitors webapps for wars and dirs  -->
    <!--   jetty-contexts.xml      - monitors contexts for context xml   -->
    <!--   jetty-templates.xml     - monitors contexts and templates     -->
    <!-- =============================================================== -->
    <Call name="addBean">
      <Arg>
        <New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
          <Set name="contexts">
            <Ref id="Contexts" />
          </Set>
          <Call name="setContextAttribute">
            ...
          </Call>
        </New>
      </Arg>
    </Call>

    <!-- =============================================================== -->
    <!-- Add a ContextProvider to the deployment manager                 -->
    <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
    <!-- This scans the contexts directory for xml files descrbing an app -->
    <!-- =============================================================== -->
    <Ref id="DeploymentManager">
      <Call name="addAppProvider">
        <Arg>
          <New class="org.eclipse.jetty.deploy.providers.ContextProvider">
            <Set name="monitoredDirName"><Property name="jetty.home" default="." />/contexts</Set>
            <Set name="scanInterval">1</Set>
          </New>
        </Arg>
      </Call>
    </Ref>

</Configure>


There are two ways that jetty is usually configured to serve applications:

This example uses the second approach: the final block of XML in the jetty.xml above configures jetty to monitor the contexts directory. The contexts directory contains one file, quixote.xml, the contents of which are shown below (with comments to explain the relevant bits):

<?xml version="1.0"  encoding="ISO-8859-1"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "">

<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <!-- This is the URL where the application will be served from -->
  <Set name="contextPath">/chempound</Set>

  <!-- The absolute location of the war file for chempound on the server's filesystem -->
  <Set name="war">/home/jens/jetty-hightide-8.1.4.v20120524/quixote/quixote-repository-webapp-0.1-SNAPSHOT.war</Set>

  <!-- Set startup parameters -->
  <Get name="ServletContext">
    <Call name="setAttribute">
      ...
    </Call>
    <Call name="setAttribute">
      ...
    </Call>
  </Get>
</Configure>

NB: For general information and examples of context files, please see the [WWW]jetty wiki.

The first two Set commands should be self-explanatory.

The next block sets two important variables that are needed by chempound:

These two variables can also be set by setting them as environment variables before the server is started, or setting them on the command-line when the server is started as shown below:

java -Dchempound.uri="" -Dchempound.workspace="/home/jens/jetty-hightide-8.1.4.v20120524/quixote/workspace" -jar start.jar

Security Considerations

In order to make chempound available on a standard URL (such as [WWW]), the server needs to listen for TCP requests on port 80.

On unix systems, only processes started by root are permitted to bind to ports numbered less than 1024, which would entail a requirement to run the chempound jetty server as root. However, this is not considered a good security practice, and there is no other reason why the server needs to run as root.

On debian-based systems, a way around this is the authbind package, which allows users to bind non-root servers to a low-numbered port.

Another approach is to start the server under a non-root user, binding to a high-numbered port and to use a firewall to redirect requests from port 80 to the port the server is listening on. If the server was started on port 8080, then the iptables rule to accomplish this would be:

iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to 8080

NB: if using this method, it is important to remember that the chempound.uri variable will need to be set to point at the url as visible externally, and should not include the port number, as otherwise the CSS files will not be found.

Debugging Chempound

If Chempound is not working as expected, the logging facility can be used to increase the amount of information printed, which is useful for tracking down the causes of problems.

The logging subsystem consists of the interface [WWW]SLF4J 1.6.1 (Simple Logging Facade for Java) and the implementation [WWW]LOG4J 1.2.

The included configuration for DepositNWChem is:

log4j.rootLogger = WARN, A

log4j.appender.A = org.apache.log4j.ConsoleAppender
log4j.appender.A.layout = org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern = %-4r [%t] %-5p %c %x - %m%n
log4j.appender.A.target = System.err

However, you can change the logging behavior of the application by adding your own file to the classpath. For instance, the following configuration file will set the general log level to INFO and, for class, the level will be DEBUG.

log4j.rootLogger = INFO, A

log4j.appender.A = org.apache.log4j.ConsoleAppender
log4j.appender.A.layout = org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern = %-4r [%t] %-5p %c %x - %m%n
log4j.appender.A.target = System.err

For example, if you place the configuration file in your current working directory and you run DepositNWChem from there, you can add the current directory to the classpath as follows (it assumes you have the jar file of DepositNWChem with its dependencies in a target subdirectory):

$ java -cp .:target/quixote-utils-0.1-SNAPSHOT-jar-with-dependencies.jar net.quixote.utils.DepositNWChem http://localhost:8080/sword/collection/ n2.out

You can use logging anywhere in the code. You will need to grab a Logger object to pass the logging messages. Simply import the Logger and LoggerFactory classes, and call LoggerFactory.getLogger to obtain a Logger object. Then call any of the debug, info, warn or error methods to log your message at the appropriate log level.

The following code snippet shows how to get the root Logger as well as another child Logger (identified by the class name) and how to emit an INFO level message.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ... {
  ... method (...) {
        Logger rootL = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
        rootL.info("Using root logger.");

        Logger otherL = LoggerFactory.getLogger(CmlComp2RdfConverter.class);
        otherL.info("Using CmlComp2RdfConverter logger.");
  }
}


Debugging Chempound running under Jetty

A simple way to debug chempound when running under jetty is to add the following lines to the jetty start.ini file, which is used to prepend command-line arguments to jetty (the arguments can also be added to the command-line when starting jetty, or indeed to any java program that uses log4j):


The first line turns on debugging for log4j itself, which is useful as it causes log4j to print which configuration file it is using. The second line gives the path to a log4j configuration file, which should contain the directives described above.
