Planning

  1. Overview
  2. Requirements
  3. Vision and goals
    1. Short term
    2. Mid term
    3. Long term
    4. Longest term
  4. Scope
  5. Stakeholders
  6. Use cases and scenarios
  7. Design(s)
    1. Jorge Estrada and Pablo Echenique
    2. Peter Murray-Rust
        1. Philosophy
    3. Pablo Echenique
  8. Roadmap
  9. Meetings

Overview

Each day, countless calculations are run by thousands of computational chemistry researchers around the world, on everything from ageing, dusty desktops to the most powerful supercomputers on the planet.

It might be supposed that this would lead to a deluge of valuable data, but the surprising fact remains that most of this data, if it is archived at all, usually lies hidden away on hard disks or buried on tape backups – often lost to the original researcher and never seen by the wider chemistry community at all.

However, it is widely accepted that making the results of all these calculations publicly accessible would be extremely valuable, as it would:

In the rare cases when data is made openly available, the output of the calculations is inevitably in a code-specific format, as there is currently no accepted output standard. This means that interpreting or reusing the data requires knowledge of the code, or the use of specific software that understands the output.

A standard output format would:

The benefits of a common data standard and of results databases are obvious, but several previous efforts have failed to deliver them, largely because they could not settle on a data standard or provide tools useful enough to make it worthwhile for code developers to spend the time making their codes compatible.

The Quixote project aims to tackle both of these problems in a pragmatic way, building an infrastructure that can be used either to archive and search calculations on a local hard drive, or to expose the data on publicly accessible servers and so make it available to the wider community.

The data standard will be consolidated around the tools, and the tools will in turn encourage its adoption by giving code and tool developers an obvious reason to adopt it: the "If you build it, they will come" approach.

The project is rooted in the belief that scientific codes and data should be "open", and we are therefore focussing our efforts on using existing open-source solutions and standards where possible, and then developing any additional tools within the project.

The Quixote project is itself completely open, de-centralised and community-driven. We are composed of passionate researchers from around the globe and are happy to collaborate with anyone who shares our aims.

Requirements

The requirements of our solution are that:

Vision and goals

See "The different ways of looking at the world" by Peter Murray-Rust.

Some goals are practical, some are midterm, some are wild:

Short term

Mid term

Long term

Longest term

Scope

Limits and context of Quixote, to keep it focused and manageable.

Note: This scope statement is still under revision.

Quixote currently focuses on small molecules (no periodicity), and avoids dynamics and relativistic studies.

Starting from an existing quantum chemical calculation, a user would upload the files to a public server running Quixote. The server would parse the results (using, for instance, Jumbo converters), structure them with the help of a markup language and dictionary (such as CML and the related compchem dictionary), and store them in a database (perhaps using the RDF format). Users of the public servers could then retrieve the structured data through web queries (in the style of the SPARQL language, for instance) and through HTML browsing.
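As a rough illustration of the flow described above (parse, structure, store, query), the following Python sketch uses only the standard library. Everything in it is an assumption made for illustration: the toy log format, the `parse_output`, `to_triples` and `query` helpers, and the `calc:1` identifier are hypothetical. The in-memory list of triples merely stands in for an RDF store, and the pattern-matching `query` function stands in for a SPARQL query; no Jumbo converter or CML dictionary is involved.

```python
import re
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical output text; real codes each have their own format.
SAMPLE_OUTPUT = """\
Program: HypotheticalQC 1.0
Method: HF
Basis: 6-31G*
Final energy: -76.0107 hartree
"""


def parse_output(text: str) -> Dict[str, str]:
    """Extract 'key: value' pairs from the toy log format above."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2)
    return fields


def to_triples(calc_id: str, fields: Dict[str, str]) -> List[Triple]:
    """Turn parsed fields into subject-predicate-object triples."""
    return [(calc_id, key, value) for key, value in sorted(fields.items())]


def query(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Upload" one calculation into the store, then retrieve it by method.
store: List[Triple] = []
store.extend(to_triples("calc:1", parse_output(SAMPLE_OUTPUT)))
hf_calculations = query(store, predicate="method", obj="HF")
```

A real deployment would replace each stage with the corresponding component (a code-specific parser, a dictionary-backed structuring step, a persistent store and a proper query language), but the division of responsibilities would stay the same.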

In the future, Quixote may be extended to work with further chemical systems and to offer more kinds of retrieval and analysis web utilities.

Stakeholders

How Quixote may help you, and how you can help Quixote.

Note: This list of stakeholders is currently under revision.

If you are ...

If you want to collaborate with the Quixote project, join the mailing lists mentioned on the Front Page and share your ideas, or contact one of the project members listed on the People page (though that list is not up to date).

Use cases and scenarios

These use cases and scenarios will help in identifying what we want Quixote for.

Note: The set of use cases and scenarios is currently a draft.

The simple use cases we are currently gathering try to show how, starting from an existing QC calculation, we can upload the files to a public server, parse the results with Quixote modules (Jumbo converters), structure them with CML and a related compchem dictionary, store them in a database (RDF format), and retrieve them through web queries (at first, using the SPARQL language) or HTML browsing. The point of each use case is to show that this process is useful in some scientific or educational way, and so to guide Quixote's future development.
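The structuring step in this process can be pictured with a small Python sketch that wraps one parsed value in a CML-like XML fragment. This is only an approximation under stated assumptions: the `build_module` helper, the `cc:` and `units:` prefixes, and the term names `cc:calculation` and `cc:totalEnergy` are hypothetical placeholders rather than entries taken from the actual compchem dictionary, and the fragment omits the namespaces a schema-valid CML document would need.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def build_module(calc_id: str, properties: Dict[str, Tuple[float, str]]) -> ET.Element:
    """Wrap numeric results in a CML-like <module> element.

    `properties` maps a (hypothetical) dictionary term to a (value, units) pair.
    """
    module = ET.Element("module", {"id": calc_id, "dictRef": "cc:calculation"})
    for term, (value, units) in properties.items():
        # Each property points at a dictionary term that defines its meaning.
        prop = ET.SubElement(module, "property", {"dictRef": term})
        scalar = ET.SubElement(
            prop, "scalar", {"dataType": "xsd:double", "units": units}
        )
        scalar.text = repr(value)
    return module


# Hypothetical example: one total energy from one calculation.
module = build_module("calc1", {"cc:totalEnergy": (-76.0107, "units:hartree")})
xml_text = ET.tostring(module, encoding="unicode")
```

The design point is that the value carries a reference to a dictionary term and to its units, so software that has never seen the originating code can still interpret it.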

See the minutes of the meeting on November 26 for a list of ideas for use cases.

Currently, we are working to develop a use case around the cyclobutadiene mysteries.

Design(s)

Jorge Estrada and Pablo Echenique

Based on previous discussions and on what has been done so far, we have prepared a few diagrams showing the architecture of the proposed system.

The diagrams show components (running software), storage (databases, filesystems, etc.), layers (organization of the source code) and data flows (for components) or usage relations (for layers).

A short description of the different systems appearing in the diagrams (more suitable names are needed, but these serve as a first approximation):

The first overview diagram shows the main components of the Quixote system, and the data flows. Some components are optional (such as the RSS Feed System), and others may be missing. This diagram may help to refine the list of functionality required.

overview.png (OpenOffice.org 3 - Draw original file overview.odg)

How does this architecture map to the specific components we are using or we plan to use? This annotated overview diagram tries to answer this question:

annotatedOverview.png (OpenOffice.org 3 - Draw original file annotatedOverview.odg)

Finally, we have already tested a rough prototype using Lensfield2. It corresponds to the Publishing System of the overview diagram:

publishingSystem.png (OpenOffice.org 3 - Draw original file publishingSystem.odg)

Peter Murray-Rust

See "Components of the Quixote Open computational chemistry system and the WWMM".

Philosophy

Pablo Echenique

In this way, datafiles, datagroups and databases can sit anywhere, be public or private, and anyone can choose to use, or even code, an application to access and search any collection of them.

The only things we would need to build are:

Then, you can have users for whom this whole process is transparent (they just see the final application in a browser), people who build and aggregate databases for their particular interests, programmers who code new UIs, and so on.

Roadmap

Note: This is an incomplete list.

Current efforts in the Quixote project focus on:

Meetings
