Chempound
Chempound is a server for archiving and searching the outputs of computational chemistry calculations. It can be used as a standalone tool for managing the files on a user's personal computer, or as a managed server for curating the data generated by a group or company.
The website for chempound can be found at:
http://www.chempound.net. This also contains links to download the latest version of the software and descriptions of how to use it.
An example of a chempound server containing the results of several thousand calculations can be found here:
http://quixote.ch.cam.ac.uk.
The rest of this page is a temporary placeholder for information that will be moved to the chempound website, so please ignore for the time being!
Documentation
The existing documentation for Chempound can be found here:
- this page: http://quixote.wikispot.org/Chempound
- the Chempound website: http://www.chempound.net
- Jorge's repository: https://bytebucket.org/jestrada/quixote-docs/wiki/main/quixote-main.html
- Sam's in-press JODI paper: http://wwmm.ch.cam.ac.uk/~sea36/chempound/
Repositories
The repository for the chempound packages is hosted on bitbucket:
https://bitbucket.org/chempound
Using Chempound
With a functioning Chempound repository in place, we can now start to query the data held within it.
For simple searches, we can just browse through the files, or use the simple search functionality on the web interface to pull out entries of interest.
This is fine for small, arbitrary searches, but Chempound also makes it very easy to automate searches and extract subsets of the data in a variety of ways.
Chempound uses a RESTful interface. This means that when we request the URL for a particular calculation, the server can return the data in a variety of formats, depending on how we make the request (specifically, on the HTTP Accept header we send).
The currently supported formats include HTML (text/html), JSON (application/json) and RDF/XML (application/rdf+xml).
Take, as an example, a computational chemistry calculation done with the Gaussian code and hosted on the Cambridge Chempound server. If we go to the URL for the calculation with a browser, we get an HTML splash page with a human-readable summary of the calculation and the ability to view the structure in Jmol:
http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8893/
We get this page because our browser has requested a text/html representation of the resource.
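Because the server picks the representation from the Accept header, the same URL can be fetched in any of these formats just by changing that one header. The following is a minimal sketch using Python 3's standard urllib.request; the FORMATS mapping and the build_request and fetch helpers are hypothetical names, and the MIME types are the ones used in the examples on this page.

```python
import urllib.request

# MIME types used in the examples on this page; this mapping is a
# hypothetical convenience, not part of Chempound itself
FORMATS = {
    "html": "text/html",
    "json": "application/json",
    "rdf": "application/rdf+xml",
}


def build_request(url, fmt):
    """Return a Request whose Accept header asks for the given format."""
    request = urllib.request.Request(url)
    request.add_header("Accept", FORMATS[fmt])
    return request


def fetch(url, fmt):
    """Fetch the resource in the given format and return the raw bytes."""
    with urllib.request.urlopen(build_request(url, fmt)) as response:
        return response.read()
```

For example, fetch(url, "rdf") should return the RDF/XML for the entry at url, assuming the server is reachable and honours the header.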
Getting json with Python
The following Python script sets the HTTP Accept header to request JSON, and then prints out the JSON returned.
import json
import urllib.request
# URL of the calculation we are interested in
url = "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/"
# Set up a request object and add the Accept header to ask for JSON
request = urllib.request.Request(url)
request.add_header('Accept', 'application/json')
response = urllib.request.urlopen(request)
# We can pass the response object to json.load, as it has a read() method
# This just creates a Python dictionary, which we can query
json_output = json.load(response)
# Use the json.dumps method to write out formatted JSON
print(json.dumps(json_output, sort_keys=True, indent=4))
This outputs:
{
    "resources": [
        {
            "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/to-8892.gjf"
        },
        {
            "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/to-8892.png"
        },
        {
            "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/to-8892_tn.png"
        },
        {
            "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/to-8892.cml"
        },
        {
            "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/to-8892.out"
        }
    ],
    "title": "C 36 H 28 P 2",
    "uri": "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/"
}
Within Chempound, the various files that make up the entry for a calculation are grouped together as an ORE object. The resources key of the JSON object holds these, and includes the URIs of the Gaussian input file (.gjf), the original log file (.out), the CML file, the PNG picture generated by Jmol (and its thumbnail), etc.
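Since json.load turns the response into an ordinary dictionary, picking a particular file out of the aggregation is just a matter of walking the resources list. A sketch, in which resource_uris is a hypothetical helper that optionally filters the URIs by file extension; it assumes the JSON has the structure shown above.

```python
import posixpath
from urllib.parse import urlparse


def resource_uris(entry, extension=None):
    """Return the URIs listed under the entry's "resources" key.

    If extension is given (e.g. ".cml"), only URIs whose path ends in
    that extension are returned.
    """
    uris = [resource["uri"] for resource in entry.get("resources", [])]
    if extension is None:
        return uris
    return [uri for uri in uris
            if posixpath.splitext(urlparse(uri).path)[1] == extension]
```

With json_output from the script above, resource_uris(json_output, ".cml") would give the single URI ending in to-8892.cml, while ".png" would give both the picture and its thumbnail.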
Requesting the RDF
Chempound is built on RDF and a primary component is a triple store containing RDF statements describing the structure of the data, and its associated metadata.
If we request the RDF serialised as XML from the same URL, we receive a document that contains the full data for the entry, including the links to the files. The following Python script does this and prints out the resulting RDF/XML:
import urllib.request
# URL of the calculation we are interested in
url = "http://quixote.ch.cam.ac.uk/content/compchem/spectra-dspace/to-8800_8899/to-8892/"
# Set up a request object and add the Accept header to ask for RDF/XML
request = urllib.request.Request(url)
request.add_header('Accept', 'application/rdf+xml')
response = urllib.request.urlopen(request)
# Print out what we got back (the body arrives as bytes; assumed to be UTF-8)
print(response.read().decode('utf-8'))
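The RDF/XML can also be processed with the standard library rather than just printed. As a rough sketch (linked_resources is a hypothetical helper), the function below collects every rdf:resource attribute in the document. On a Chempound entry this should include the links to the files, but also vocabulary URIs such as units, so the results may need further filtering; a dedicated RDF library would be the more robust choice for anything beyond a quick look.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def linked_resources(rdf_xml):
    """Return the distinct rdf:resource targets in an RDF/XML document.

    rdf_xml may be bytes (e.g. response.read()) or a str; the targets are
    returned in the order they first appear.
    """
    root = ET.fromstring(rdf_xml)
    seen = []
    for element in root.iter():
        target = element.get("{%s}resource" % RDF_NS)
        if target is not None and target not in seen:
            seen.append(target)
    return seen
```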
SPARQL queries
SPARQL is a query language for extracting data represented as RDF, in much the same way that SQL is a language for querying data in relational databases. As the data in Chempound is stored as RDF, SPARQL is the language of choice for making complex queries against the stored data.
A good, chemistry-related tutorial on SPARQL can be found here.
The Chempound web server provides a page where SPARQL queries can be typed in and the results returned as HTML or RDF. The SPARQL page on the Cambridge server can be found here.
The easiest way to get to grips with SPARQL is to dissect a simple query:
SELECT ?molecule
WHERE
{
?molecule <http://www.xmlcml.org/rdf-schema#formula> "H 2 O 1" .
}
The crucial line is the one stating: ?molecule <http://www.xmlcml.org/rdf-schema#formula> "H 2 O 1" .
This uses the RDF subject-predicate-object (triple) pattern. The subject is the variable molecule (variables in SPARQL are prefixed with a ?, although you can also use $), the predicate is a URI which references the CML schema, and the object is a string literal. The statement is terminated by a full stop.
What this says is that we want to bind the variable molecule to every entity whose CML formula property is "H 2 O 1".
The SELECT statement says that we want the query to return the molecule variable, giving one result for each entity that matched the statement.
If we run this against the Cambridge Chempound server, we get back something like the following:
Variable Bindings Result
| molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/258/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/261/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/262/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/263/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/264/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/265/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/266/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/267/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/268/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_251_300/269/#molecule |
| http://quixote.ch.cam.ac.uk/content/compchem/bangor/anna_1301_1350/1325/#molecule |
This returns the URIs of all the water molecules in the database.
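When this kind of query is run from a script, it is convenient to generate it for any formula. A minimal sketch (formula_query is a hypothetical helper) that escapes backslashes and double quotes so the formula is always a valid SPARQL string literal; the formula must still be written in the same space-separated form as "H 2 O 1" for it to match anything.

```python
FORMULA_PREDICATE = "http://www.xmlcml.org/rdf-schema#formula"


def formula_query(formula):
    """Return a SPARQL query selecting every molecule with the given CML formula."""
    # Escape backslashes first, then double quotes, so the formula is a
    # valid SPARQL string literal
    literal = formula.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?molecule\n"
        "WHERE\n"
        "{\n"
        '  ?molecule <%s> "%s" .\n'
        "}" % (FORMULA_PREDICATE, literal)
    )
```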
We can now look at a more advanced query:
PREFIX cml: <http://www.xmlcml.org/rdf-schema#>
SELECT ?formula ?inchi ?molecule
{
?molecule cml:formula ?formula .
?molecule cml:inchi ?inchi .
FILTER ( ?formula = "H 2 O 1" )
}
The first line is equivalent to declaring a namespace in XML: it associates a convenient label with a long URI, so that instead of writing <http://www.xmlcml.org/rdf-schema#formula> we can just write cml:formula.
We are now selecting three variables from our dataset, and they will be returned in the order we have listed them. The WHERE keyword has been omitted, as it is optional in SPARQL.
The next two lines by themselves would select all entities in the database (returning them in the molecule variable) that have both the CML properties formula and inchi. However, we are filtering the returned data to restrict it to those results where the value bound to the formula variable is "H 2 O 1".
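Queries of this shape (prefixes, selected variables, triple patterns and filters) are easy to assemble programmatically. The sketch below uses a hypothetical build_select helper; called as in the __main__ block, it reproduces the query above, up to indentation.

```python
def build_select(prefixes, variables, patterns, filters=()):
    """Assemble a SPARQL SELECT query.

    prefixes:  dict mapping a label (e.g. "cml") to a namespace URI
    variables: variable names to return, without the leading ?
    patterns:  triple patterns, without the terminating full stop
    filters:   FILTER expressions, without the surrounding FILTER ( )
    """
    lines = ["PREFIX %s: <%s>" % (label, uri) for label, uri in prefixes.items()]
    lines.append("SELECT " + " ".join("?" + name for name in variables))
    lines.append("{")
    lines.extend("  %s ." % pattern for pattern in patterns)
    lines.extend("  FILTER ( %s )" % expression for expression in filters)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Rebuild the water query discussed above
    print(build_select(
        {"cml": "http://www.xmlcml.org/rdf-schema#"},
        ["formula", "inchi", "molecule"],
        ["?molecule cml:formula ?formula", "?molecule cml:inchi ?inchi"],
        ['?formula = "H 2 O 1"'],
    ))
```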
Discovering the available search terms
The data that is extracted into RDF and therefore available for searching in Chempound is determined by the convention and dictionaries that apply to the files in question.
Please follow these links for more information on conventions and dictionaries.
For CIF files, the CIF dictionary lists all the terms that are available.
For Computational Chemistry outputs, the CompChem dictionary lists the indexed terms.
To determine how best to search for data, it is usually useful to go to the splash page for a representative structure in Chempound and download the RDF file. This shows the form of the RDF, and hence how a query needs to be constructed.
For example, if we wish to search on the cell_measurement_temperature, looking at the RDF for a CIF file, we see it is structured as shown below:
<iucr:cell_measurement_temperature rdf:parseType="Resource">
<rdf:value rdf:datatype="http://www.w3.org/2001/XMLSchema#double">173.0</rdf:value>
<cml:units rdf:resource="http://www.xml-cml.org/unit/sik"/>
<cml:errorValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double">2.0</cml:errorValue>
</iucr:cell_measurement_temperature>
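To see at a glance every term used in a downloaded RDF file, the property elements can be listed with ElementTree. This is a rough sketch (predicates is a hypothetical helper): it treats every namespaced element other than rdf:RDF and rdf:Description as a property, which holds for files shaped like the snippet above, but typed node elements would also be listed. Each URI it returns is a candidate for the predicate position of a query, written out in full or via a PREFIX.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def predicates(rdf_xml):
    """Return the sorted property URIs used as elements in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    found = set()
    for element in root.iter():
        if not element.tag.startswith("{"):
            continue  # skip elements with no namespace
        namespace, local = element.tag[1:].split("}", 1)
        if namespace == RDF_NS and local in ("RDF", "Description"):
            continue  # structural RDF/XML elements, not properties
        found.add(namespace + local)
    return sorted(found)
```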
If we just search for cell_measurement_temperature, we are returned the RDF resource (a blank node) rather than the number, so we also need to extract its rdf:value. This is done with the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cif: <http://www.xml-cml.org/dictionary/cif/>
SELECT ?entry ?value
{
?entry cif:cell_measurement_temperature ?temp .
?temp rdf:value ?value .
}
A similar example for a CompChem file is shown below. This searches on a term in the compchem dictionary, and then filters the value for only those structures with a charge of 0.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX compchem: <http://www.xml-cml.org/dictionary/compchem/>
SELECT ?molecule ?charge
{
?molecule compchem:charge ?chargeR .
?chargeR rdf:value ?charge
FILTER ( ?charge = 0 )
}
Remote Chempound SPARQL queries with Python
The Chempound SPARQL page will return the results as html or rdf/xml. The rdf/xml can of course be saved and processed offline, but it is more useful to be able to query and download the results all from within a single script.
As the Chempound SPARQL endpoint exposes a RESTful API, we can query it directly. The following Python script executes a SPARQL query against Chempound and then saves the result as a CSV (comma-separated values) file, so that the results of the query can be imported into a spreadsheet program for (e.g.) plotting a graph of the results.
import csv
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
# The SPARQL query we want to execute
query = """SELECT ?molecule ?inchi
WHERE
{
?molecule <http://www.xmlcml.org/rdf-schema#formula> "H 2 O 1" .
?molecule <http://www.xmlcml.org/rdf-schema#inchi> ?inchi .
}"""
# URL of the Chempound SPARQL endpoint we want to query
baseurl = "http://quixote.ch.cam.ac.uk/sparql/"
# The "comma-separated values" file where the results should
# go so they can be imported into a spreadsheet program
csvFile = "/Users/jmht/sparql_results.csv"
# The real work starts here!
# SPARQL results namespace - shouldn't need to change this
namespace = "http://www.w3.org/2005/sparql-results#"
# NB: results format based on: http://www.w3.org/2001/sw/DataAccess/rf1/
# Set up our POST query to the SPARQL endpoint:
# encode the query string into a form suitable for POST
urlparam = {"query": query}
querystr = urllib.parse.urlencode(urlparam).encode("utf-8")
request = urllib.request.Request(baseurl, querystr)
# Add the header to state what we want back
request.add_header('Accept', 'application/sparql-results+xml')
# Get the results
response = urllib.request.urlopen(request)
# We now have the results in SPARQL XML, so we need to turn them into
# a csv file - we use ElementTree to do this:
# http://effbot.org/zone/element-index.htm
# Parse results to create etree & get root element
etree = ET.parse(response)
root = etree.getroot()
# A SPARQL results document has 2 child elements: head and results
head, results = root[:]
# Create a dictionary entry for each variable listed in the head
resultsDict = {}
for var in head.findall("{%s}variable" % namespace):
    resultsDict[var.get("name")] = []
# Loop through results adding the relevant bindings to the dictionary.
# Each binding holds exactly one element: uri, literal or bnode.
# Only uri and literal values are supported (InChIs are literals)
uri_tag = "{%s}uri" % namespace
literal_tag = "{%s}literal" % namespace
nresults = len(results)
for result in results:
    for binding in result:
        if len(binding) == 1 and binding[0].tag in (uri_tag, literal_tag):
            resultsDict[binding.get("name")].append(binding[0].text)
        else:
            raise RuntimeError("Only uri and literal results are supported!")
# Output as a csv file; the csv module quotes any value containing a
# comma, which InChI strings often do
headers = list(resultsDict.keys())
with open(csvFile, 'w', newline='') as rfile:
    writer = csv.writer(rfile)
    # column headers
    writer.writerow(headers)
    # data
    for i in range(nresults):
        writer.writerow([resultsDict[header][i] for header in headers])
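The script above keeps a separate list of values per variable, which assumes that every result binds every variable. SPARQL simply omits unbound variables from a result (for example when a pattern is OPTIONAL), so those columns can drift out of step. A more reusable sketch, split into two hypothetical helpers, builds one dictionary per result instead, so a missing value just becomes an empty cell. It accepts uri, literal and bnode values alike, keeping only their text (literal datatypes and language tags are dropped).

```python
import csv
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format
SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"


def parse_sparql_xml(source):
    """Parse SPARQL XML results from a file name or file-like object.

    Returns (variables, rows), where each row is a dict mapping every
    variable name to a string ("" if the variable was unbound).
    """
    root = ET.parse(source).getroot()
    variables = [v.get("name") for v in root.iter(SPARQL_NS + "variable")]
    rows = []
    for result in root.iter(SPARQL_NS + "result"):
        row = dict.fromkeys(variables, "")
        for binding in result.findall(SPARQL_NS + "binding"):
            # Each binding holds exactly one <uri>, <literal> or <bnode>
            value = binding[0]
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return variables, rows


def write_csv(variables, rows, path):
    """Write rows to a CSV file with one column per SPARQL variable."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=variables)
        writer.writeheader()
        writer.writerows(rows)
```

In the script above, everything after the urlopen call could then be replaced by variables, rows = parse_sparql_xml(response) followed by write_csv(variables, rows, csvFile).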
Hacking Chempound
This section is for those who may be interested in altering or extending Chempound. It isn't intended to be a programmer's manual; it is more a brief overview of Chempound's current structure and a walk-through of how to add additional CML data to the repository, which is expected to be the main reason most people would currently want to extend Chempound.
NB: additional information can be found in Jorge Estrada's repository.
Chempound is actually a very general tool for managing collections of objects (collected as ORE aggregates) and their associated data and metadata, using RDF for the data model. As such, almost all of the chemistry functionality is implemented using plugins, so the code that needs to be modified to change the chemistry behaviour is very localised.
Overview of the repositories
The repositories for the Chempound packages are hosted on bitbucket:
https://bitbucket.org/chempound
Currently, there are nine repositories, as detailed below:
- https://bitbucket.org/chempound/ - this contains the main server code. There is almost no chemistry-specific code here, apart from in the chempound-rdf-cml directory, which has a very small class to add some CML data to the RDF model.
- https://bitbucket.org/chempound/chemistry - this is where the most general chemistry code lives, including the general functions that handle the conversion of data from CML.
- https://bitbucket.org/chempound/chempound-client - the base classes for the command-line client are here, although there is no chemistry-specific code in this repository (it is the client that actually handles the conversion of logfiles into CML, the generation of the Jmol pictures, etc.).
- https://bitbucket.org/chempound/chempound-parent - this just contains the central Maven pom.xml that is used to configure Maven for Chempound.
- https://bitbucket.org/chempound/compchem - all the code to handle the data associated with computational chemistry calculations (both server and client) lives here.
- https://bitbucket.org/chempound/crystallography - all the code to handle the crystallography-specific aspects of the data.
- https://bitbucket.org/chempound/deposit-client - TODO - not had to look at this yet.
- https://bitbucket.org/chempound/quixote-client - the code to drive the code-specific imports of compchem logfiles.
- https://bitbucket.org/chempound/quixote-repository - this contains further code to package Chempound for use by the Quixote project and to create the stand-alone Chempound server war file.
A slightly more detailed view of the chemistry-specific repositories and their modules follows below.
| Repository | Modules | Description and important classes |
| | | Classes to handle the generic processing of CML datatypes and the conversion to RDF |
| | | * net.chempound.chemistry.cmlChemicalMine.java - mime types |
| | | * net.chempound.chemistry.Cml2RdfConverter.java - code to handle the conversion of generic, simple CML datatypes into RDF |
| | | Base classes for the client-side conversion of files and the generation of images |
| | | Classes to drive Jmol to generate the images, and also the Jmol code itself |
| | | Classes to handle the chemistry-specific search page; if you want to add more chemistry search boxes, then you'll need to edit things here |
| | | General code related to the compchem RDF data structures. The utility functions used by the FreeMarker templates to access the compchem data live here. |
| | | Code to handle the processing of chemical data on the server, such as display of the HTML pages and the FreeMarker templates. |
| | | The classes to handle importing code-specific logfiles (NWChem, Gaussian etc.) using the jumbo-classes. These classes are used by the client, not Chempound itself. The test cases for checking the imports also live here. |
| | | Code to test the various compchem-specific modules, as most do not contain any test code themselves. |
Adding New Data and editing the Splash page
Chempound extracts data from CML in accordance with the compchem convention. Provided that the data is a CML scalar, is in one of the job's environment, initialization or finalization modules, and has a dictRef (ideally) in the compchem dictionary, the data will already be extracted into RDF.
If additional data needs to be extracted (as is currently done for basis sets and DFT functionals), then all that may be necessary is to edit the file CmlComp2RdfConverter.java to add the additional data to the RDF.
The HTML pages in Chempound are generated using the FreeMarker template engine. The FreeMarker template that is used to generate the HTML page for each individual structure is the file comp.ftl (other template and CSS files are in the parent directory).
To make it easier to extract key RDF data for use in the FreeMarker templates, several classes are used. To add new terms, the following files need to be edited:
- CompChemCalculation.java - this defines the interface that the FreeMarker template uses to access the data.
- CompChem.java - this creates the RDF terms that are used.
- CompChemCalculationImpl.java - this implements the functions that get the data.
When the new terms have been added, the tests should be updated, or a new test added in the directory
https://bitbucket.org/chempound/compchem/src/ef32d64ba51b/compchem-importer/src/test/java/net/chempound/compchem
If the new terms are to be added to the chemistry search page, then the CompChemSearchProvider.java file will need to be edited, and suitable tests added to the file CompChemSearchIntegrationTest.java.
Installing Chempound
For installing Chempound for personal use on a local machine, the getting-started notes should be sufficient.
The following instructions apply for installing Chempound on an existing server, for use by an institution or group.
Installing Chempound into an existing Jetty server
Chempound is a pure Java program, so it can be run in any Java servlet container. These instructions are specific to installing it into jetty on Unix systems.
The latest version of the war file for Chempound can be downloaded here.
There are any number of ways to configure jetty, so this just describes one way, with some pointers to the other possibilities.
If you do not already have jetty installed on your server, and it is not available within the package management software for your distribution, a jetty-hightide distribution can be downloaded from codehaus.
The following instructions assume that you have a jetty server, with a directory structure similar to the following (only the relevant files and directories are listed).
+-jetty-hightide-8.1.4.v20120524/
  +-start.ini
  +-start.jar
  +-contexts/
  |  +-quixote.xml
  +-etc/
  |  +-jetty.xml
  +-webapps/
  +-chempound/
     +-workspace/
        +-cache/
        +-content/
        +-tdb/
The file start.jar is the java file used to start jetty with the command:
java -jar start.jar
By default, on startup, jetty parses the file start.ini, which contains command-line options for the server, including the list of modules to include and a list of XML configuration files that determine various options (these are listed one per line in the start.ini file, and can be disabled by commenting the line out with the # character). By default, the XML files reside in the etc directory. In this example, only one configuration file is used: the jetty.xml file in the etc directory.
This file contains the following:
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<!-- =============================================================== -->
<!-- Configure the Jetty Server -->
<!-- -->
<!-- Documentation of this file format can be found at: -->
<!-- http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax -->
<!-- -->
<!-- Additional configuration files are available in $JETTY_HOME/etc -->
<!-- and can be mixed in. For example: -->
<!-- java -jar start.jar etc/jetty-ssl.xml -->
<!-- -->
<!-- See start.ini file for the default configuraton files -->
<!-- =============================================================== -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">
<!-- =========================================================== -->
<!-- Server Thread Pool -->
<!-- =========================================================== -->
<Set name="ThreadPool">
<!-- Default queued blocking threadpool -->
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
<!-- =========================================================== -->
<!-- Set connectors -->
<!-- =========================================================== -->
<Call name="addConnector">
<Arg>
<New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
<Set name="host"><Property name="jetty.host" /></Set>
<Set name="port"><Property name="jetty.port" default="8181"/></Set>
<Set name="maxIdleTime">300000</Set>
<Set name="Acceptors">2</Set>
<Set name="statsOn">false</Set>
<Set name="confidentialPort">8443</Set>
<Set name="lowResourcesConnections">20000</Set>
<Set name="lowResourcesMaxIdleTime">5000</Set>
</New>
</Arg>
</Call>
<!-- =========================================================== -->
<!-- Set handler Collection Structure -->
<!-- =========================================================== -->
<Set name="handler">
<New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
<Set name="handlers">
<Array type="org.eclipse.jetty.server.Handler">
<Item>
<New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
</Item>
<Item>
<New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
</Item>
</Array>
</Set>
</New>
</Set>
<!-- =========================================================== -->
<!-- extra options -->
<!-- =========================================================== -->
<Set name="stopAtShutdown">true</Set>
<Set name="sendServerVersion">true</Set>
<Set name="sendDateHeader">true</Set>
<Set name="gracefulShutdown">1000</Set>
<Set name="dumpAfterStart">false</Set>
<Set name="dumpBeforeStop">false</Set>
<!-- =============================================================== -->
<!-- Create the deployment manager -->
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- The deplyment manager handles the lifecycle of deploying web -->
<!-- applications. Apps are provided by instances of the -->
<!-- AppProvider interface. Typically these are provided by -->
<!-- one or more of: -->
<!-- jetty-webapps.xml - monitors webapps for wars and dirs -->
<!-- jetty-contexts.xml - monitors contexts for context xml -->
<!-- jetty-templates.xml - monitors contexts and templates -->
<!-- =============================================================== -->
<Call name="addBean">
<Arg>
<New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
<Set name="contexts">
<Ref id="Contexts" />
</Set>
<!--
<Call name="setContextAttribute">
<Arg>org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern</Arg>
<Arg>.*/servlet-api-[^/]*\.jar$</Arg>
</Call>
-->
</New>
</Arg>
</Call>
<!-- =============================================================== -->
<!-- Add a ContextProvider to the deployment manager -->
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<!-- This scans the contexts directory for xml files descrbing an app -->
<!-- =============================================================== -->
<Ref id="DeploymentManager">
<Call name="addAppProvider">
<Arg>
<New class="org.eclipse.jetty.deploy.providers.ContextProvider">
<Set name="monitoredDirName"><Property name="jetty.home" default="." />/contexts</Set>
<Set name="scanInterval">1</Set>
</New>
</Arg>
</Call>
</Ref>
</Configure>
There are two ways that jetty is usually configured to serve applications:
- jetty can monitor a directory (by default the webapps directory), and any .war files placed there will be served at a URL determined from the name of the war file (e.g. quixote-repository-webapp-0.1-SNAPSHOT.war would be served at the URL /quixote-repository-webapp-0.1-SNAPSHOT relative to the base server URL).
- jetty can monitor a directory (by default the contexts directory) for XML files, which are then parsed to determine the location of the application's war file and the options required for serving the application.
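The naming rule in the first option amounts to dropping the .war extension and adding a leading slash. A simplified sketch of that rule (default_context_path is a hypothetical helper; it ignores any special cases jetty applies, for example to a war file intended for the server root):

```python
import posixpath


def default_context_path(war_path):
    """Return the context path derived from a war file name (simplified).

    Strips the directory and the .war extension and adds a leading slash.
    """
    name = posixpath.basename(war_path)
    if name.lower().endswith(".war"):
        name = name[: -len(".war")]
    return "/" + name
```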
This example uses the second approach: the final block of XML in the jetty.xml file above configures jetty to monitor the contexts directory. The contexts directory contains one file, quixote.xml, the contents of which are shown below (with comments to explain the relevant parts):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
<!-- This is the URL where the application will be served from -->
<Set name="contextPath">/chempound</Set>
<!-- The absolute location of the war file for chempound on the server's filesystem -->
<Set name="war">/home/jens/jetty-hightide-8.1.4.v20120524/quixote/quixote-repository-webapp-0.1-SNAPSHOT.war</Set>
<!-- Set startup parameters -->
<Get name="ServletContext">
<Call name="setAttribute">
<Arg>chempound.uri</Arg>
<Arg>http://cdsora4.dl.ac.uk/chempound</Arg>
</Call>
<Call name="setAttribute">
<Arg>chempound.workspace</Arg>
<Arg>/home/jens/jetty-hightide-8.1.4.v20120524/quixote/workspace</Arg>
</Call>
</Get>
</Configure>
NB: For general information and examples of contexts files, please see the jetty wiki.
The first two Set commands should be self-explanatory.
The next block sets two important variables that are needed by chempound:
- chempound.uri - this is a string included in the HTML pages served by Chempound, and is used to set the URL where various files (such as the CSS files) are expected to be found. It should be the full URL where the base Chempound server will be found, such as http://cdsora4.dl.ac.uk/chempound.
- chempound.workspace - this is the path to a locally accessible directory on the server where all the files needed by Chempound will be stored. The actual files held by Chempound (such as the logfiles, CML files, etc.) are stored in the content folder within this directory.
These two variables can also be set as environment variables before the server is started, or on the command line (as Java system properties) when the server is started, as shown below:
java -Dchempound.uri="http://cdsora4.dl.ac.uk/chempound" -Dchempound.workspace="/home/jens/jetty-hightide-8.1.4.v20120524/quixote/workspace" -jar start.jar
Security Considerations
In order to make Chempound available on a standard URL (such as http://cdsora4.dl.ac.uk/chempound), the server needs to listen for TCP requests on port 80.
On Unix systems, only processes running as root are permitted to bind to ports numbered below 1024, which would mean running the Chempound jetty server as root. However, this is not considered good security practice, and there is no other reason why the server needs to run as root.
On Debian-based systems, one way around this is the authbind package, which allows non-root users to bind servers to low-numbered ports.
Another approach is to start the server under a non-root user, binding to a high-numbered port and to use a firewall to redirect requests from port 80 to the port the server is listening on. If the server was started on port 8080, then the iptables rule to accomplish this would be:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
NB: if using this method, remember that the chempound.uri variable will need to be set to the URL as visible externally, and should not include the port number, as otherwise the CSS files will not be found.
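A quick way to catch the mistake described in the NB above, before starting the server, is to parse the intended value with Python's urllib.parse. A minimal sketch (check_chempound_uri is a hypothetical helper, not part of Chempound) that flags a missing scheme or host and any explicit port:

```python
from urllib.parse import urlsplit


def check_chempound_uri(uri):
    """Return a list of likely problems with a chempound.uri value.

    An empty list means these simple checks found nothing wrong.
    """
    problems = []
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parts.hostname:
        problems.append("no host name")
    if parts.port is not None:
        # Behind the port-80 redirect, the public URL carries no port
        problems.append("includes port %d; use the externally visible URL" % parts.port)
    return problems
```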
Debugging Chempound
If Chempound is not working as expected, the logging facility can be used to increase the amount of information printed, which is useful for tracking down the causes of problems.
The logging subsystem consists of the interface SLF4J 1.6.1 (Simple Logging Facade for Java) and the implementation LOG4J 1.2.
The included configuration for DepositNWChem is:
log4j.rootLogger = WARN, A
log4j.appender.A = org.apache.log4j.ConsoleAppender
log4j.appender.A.layout = org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern = %-4r [%t] %-5p %c %x - %m%n
log4j.appender.A.target = System.err
However, you can change the logging behavior of the application by adding your own log4j.properties file to the classpath. For instance, the following configuration file will set the general log level to INFO and, for class uk.ac.cam.ch.wwmm.chempound.compchem.CmlComp2RdfConverter, the level will be DEBUG.
log4j.rootLogger = INFO, A
log4j.appender.A = org.apache.log4j.ConsoleAppender
log4j.appender.A.layout = org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern = %-4r [%t] %-5p %c %x - %m%n
log4j.appender.A.target = System.err
log4j.logger.uk.ac.cam.ch.wwmm.chempound.compchem.CmlComp2RdfConverter=DEBUG
For example, if you place the log4j.properties configuration file in your current working directory and you run DepositNWChem from there, you can add the current directory to the classpath as follows (it assumes you have the jar file of DepositNWChem with its dependencies in a target subdirectory):
$ java -cp .:target/quixote-utils-0.1-SNAPSHOT-jar-with-dependencies.jar net.quixote.utils.DepositNWChem http://localhost:8080/sword/collection/ n2.out
You can use logging anywhere in the code. You will need to grab a Logger object to pass the logging messages. Simply import the Logger and LoggerFactory classes, and call LoggerFactory.getLogger to obtain a Logger object. Then call any of the debug, info, warn or error methods to log your message at the appropriate log level.
The following code snippet shows how to get the root Logger as well as another child Logger (identified by the uk.ac.cam.ch.wwmm.chempound.compchem.CmlComp2RdfConverter class name), and how to emit an INFO-level message.
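import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// LoggingExample is a hypothetical class name: use the class you are working on
public class LoggingExample {
    public void logSomething() {
        // The root logger
        Logger rootL = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
        rootL.info("Using root logger.");
        // A child logger named after the CmlComp2RdfConverter class
        Logger otherL = LoggerFactory.getLogger(uk.ac.cam.ch.wwmm.chempound.compchem.CmlComp2RdfConverter.class);
        otherL.info("Using CmlComp2RdfConverter logger.");
    }
}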
Debugging Chempound running under Jetty
A simple way to debug Chempound when running under jetty is to add the following lines to the jetty start.ini file, which is used to prepend command-line arguments to jetty (the arguments can also be added to the command line when starting jetty, or indeed to any Java program that uses log4j):
-Dlog4j.debug
-Dlog4j.configuration=file:/Users/jmht/Documents/quixote/jetty-hightide-8.1.4.v20120524/quixote/qc-log4j.properties
The first line turns on debugging for log4j itself; this is useful, as it causes log4j to print which configuration file it is using. The second line gives the path to a log4j configuration file, which should contain the directives described above.

