
Data Model Overview

Overview of the CCPN data model

Getting the code:

The Data model and APIs can be downloaded separately for various languages.
They are free and open source.

Documentation:

A collected overview can be found on the CCPN Wiki; a more condensed (and slightly older) version is part of the programmer's tutorial. The documentation for the CCPN Python API also documents the data model itself (with model diagrams). There is also some additional documentation, and a tutorial on programming with the Python API.

About the CCPN Data Model

The CCPN Data Model for macromolecular NMR is intended to cover all data needed for macromolecular NMR spectroscopy, from the initial experimental data to the final validation. It serves for exchange of data between programs, for storage, data harvesting, and database deposition. The data model proper is an abstract description of the relevant data and their relationships; it is implemented in the modeling language UML. From this model CCPN autogenerates interfaces (APIs) for various languages, format descriptions and I/O routines, and documentation.

The structures of the data model, of the data access APIs in various languages, and of the XML files and databases that store the data closely parallel each other. The best documentation is the one for the CCPN Python API.

Quick Guide to the CCPN Software Development

The data model itself is an abstract description of all the data that is commonly used with NMR (extending into sample management, protein production, data tracking and pipelines, ...). For example, the NMR part of the data model describes an Experiment object - this corresponds to an NMR spectrum. This Experiment is linked to ExpDim object(s) - these describe the different dimensions in the spectrum. This abstract description of the data model is represented and maintained graphically using the Unified Modelling Language (UML). The part of the data model describing Experiment and ExpDim looks like this in UML:

[Figure: UML diagram of the Experiment and ExpDim classes (datamodelexptoexpdim.gif)]

The boxes describe the Experiment and ExpDim objects. The entries inside the boxes are attributes that give meaning to the object. For example, you can set the name for an Experiment. Objects are then linked to each other; this is shown by the line between Experiment and ExpDim. The diamond on the link means that ExpDim is a child of Experiment: a dimension in the spectrum cannot exist without having a spectrum first.
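The parent-child link described above can be sketched in plain Python. This is only an illustration of the relationship, not the CCPN API: the class layout and the names `exp_dims` and `new_exp_dim` are hypothetical stand-ins chosen for this example.

```python
class ExpDim:
    """One dimension of a spectrum; it always belongs to an Experiment."""

    def __init__(self, experiment, dim):
        self.experiment = experiment  # link back to the parent (the diamond end)
        self.dim = dim                # attribute: which dimension this is


class Experiment:
    """Stand-in for the Experiment box in the diagram (hypothetical layout)."""

    def __init__(self, name):
        self.name = name              # attribute shown inside the box
        self.exp_dims = []            # children owned by this Experiment

    def new_exp_dim(self):
        # A child is only ever created through its parent, so an ExpDim
        # cannot exist without an Experiment.
        exp_dim = ExpDim(self, dim=len(self.exp_dims) + 1)
        self.exp_dims.append(exp_dim)
        return exp_dim


experiment = Experiment(name="HSQC")
first_dim = experiment.new_exp_dim()
print(first_dim.experiment.name, first_dim.dim)
```

Creating children only through the parent is one simple way to express the rule that the diamond stands for; the generated APIs may enforce it differently.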

What are 'packages'?

The data model is split into packages. Each of these packages describes a 'unit' of information that can be shared by other packages. For example, a template molecule is described in the 'Molecule' package, and a molecular system with 'real' molecules is described in the 'MolSystem' package. The 'Nmr' package uses information from the 'MolSystem' package, which could also be shared by an 'Xray' package if one were available. For this reason the data of each package are stored in separate locations.
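As a rough sketch of why separate storage per package helps sharing, the example below writes each package to its own file and lets the 'Nmr' data refer to the 'MolSystem' data by a key. Everything here is an assumption made for illustration: the JSON format, the file names, the field names and the sample values are hypothetical and do not reflect how CCPN actually stores its data.

```python
import json
import tempfile
from pathlib import Path


def save_packages(project, directory):
    """Write each package's data to its own file and return the paths."""
    paths = {}
    for package_name, data in project.items():
        path = Path(directory) / f"{package_name}.json"
        path.write_text(json.dumps(data))
        paths[package_name] = path
    return paths


# Hypothetical project content: Nmr refers to MolSystem by its code,
# so another package could reuse the same MolSystem file.
project = {
    "Molecule": {"templates": ["lysozyme"]},
    "MolSystem": {"systems": [{"code": "MS1", "molecule": "lysozyme"}]},
    "Nmr": {"experiments": [{"name": "HSQC", "molSystem": "MS1"}]},
}

with tempfile.TemporaryDirectory() as tmp:
    written = save_packages(project, tmp)
    file_names = sorted(path.name for path in written.values())
    mol_system_reloaded = json.loads(written["MolSystem"].read_text())

print(file_names)
```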

What is the 'API'?

API stands for Application Programming Interface. With an API the objects described by the data model can be manipulated in computer memory. Basically this means that the data are organized in a way that is consistent with the 'data model'. The API therefore also handles consistency checking of the objects (e.g. an Nmr Experiment object has to be linked to at least one ExpDim (experiment dimension)). The API is currently available in Python (with XML), C (with XML), and Java (with XML or database storage).
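The consistency rule mentioned above (an Experiment needs at least one ExpDim) could be checked along the following lines. This is a minimal sketch under stated assumptions: `ConsistencyError`, `check_valid` and `exp_dims` are hypothetical names, and plain dimension numbers stand in for ExpDim objects; the real APIs perform their own checks.

```python
class ConsistencyError(ValueError):
    """Raised when an object violates a data model rule (hypothetical name)."""


class Experiment:
    def __init__(self, name):
        self.name = name
        self.exp_dims = []  # dimension numbers stand in for ExpDim objects

    def check_valid(self):
        # Rule from the text: an Experiment must have at least one ExpDim.
        if not self.exp_dims:
            raise ConsistencyError(f"Experiment {self.name!r} has no ExpDim")


incomplete = Experiment("HSQC")
try:
    incomplete.check_valid()
    problem = None
except ConsistencyError as error:
    problem = str(error)

complete = Experiment("HSQC")
complete.exp_dims.extend([1, 2])
complete.check_valid()  # passes silently: the rule is satisfied

print(problem)
```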

To continue with the example above, the objects that the API maintains in memory for a 3D spectrum are shown below (note that the values for the 'dim' attribute are filled in for the ExpDim objects to distinguish between them):

[Figure: the Experiment object and its linked ExpDim objects]
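The in-memory objects for a 3D spectrum can also be written out as a small stand-in. The attribute names `name`, `numDim`, `expDims` and `dim`, the helper `describe`, and the experiment name "HNCA" are assumptions used for illustration only; the classes are plain Python, not the generated API.

```python
from dataclasses import dataclass, field


@dataclass
class ExpDim:
    dim: int  # distinguishes the dimensions of one Experiment


@dataclass
class Experiment:
    name: str
    numDim: int
    expDims: list = field(init=False)

    def __post_init__(self):
        # One ExpDim per dimension, numbered from 1.
        self.expDims = [ExpDim(dim=i) for i in range(1, self.numDim + 1)]


def describe(experiment):
    """Return an indented text view of the Experiment and its children."""
    lines = [f"Experiment(name={experiment.name!r})"]
    lines += [f"  ExpDim(dim={exp_dim.dim})" for exp_dim in experiment.expDims]
    return "\n".join(lines)


experiment = Experiment(name="HNCA", numDim=3)
summary = describe(experiment)
print(summary)
```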

Which programs use this 'API'?

CcpNmr FormatConverter and CcpNmr Analysis are built entirely on top of the data model APIs. CcpNmr programs like ChemBuild and FormatExchange have their own internal data structures but were written to connect directly to the APIs as appropriate. Programs from third parties have their own data structures and file formats, but many have been integrated so that they can be launched from a CCPN project and the results can be read back. The most closely integrated programs include ARIA, CING, and the CcpNmr ECI deposition tool. Much of this work started in EU collaboration projects such as EUNMR and Extend-NMR, and CCPN, as a partner in the WeNMR project, is committed to extending this integration.

How do I get my data into (and out of) the 'data model'?

The CcpNmr FormatConverter and CcpNmr FormatExchange applications allow you to import existing derived data formats (not raw spectra) into the data model. Export functions are also available, so the applications can be used as format converters between existing formats.
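The idea of converting between formats by going through a shared model can be sketched as follows. The two toy peak-list formats ("toy-csv" and "toy-tab"), the dictionary layout used as the "model", and all function names are hypothetical; they are not formats or functions of the FormatConverter.

```python
def read_csv_peaks(text):
    """Import: parse 'number,H ppm,N ppm' lines into the shared model."""
    peaks = []
    for line in text.strip().splitlines():
        number, h_ppm, n_ppm = line.split(",")
        peaks.append({"number": int(number),
                      "position": (float(h_ppm), float(n_ppm))})
    return peaks


def write_tab_peaks(peaks):
    """Export: write the shared model as tab-separated lines."""
    return "\n".join(
        f"{p['number']}\t{p['position'][0]:.3f}\t{p['position'][1]:.3f}"
        for p in peaks
    )


# One importer and one exporter per format; every conversion goes
# through the model, so no format needs to know about any other.
IMPORTERS = {"toy-csv": read_csv_peaks}
EXPORTERS = {"toy-tab": write_tab_peaks}


def convert(text, source_format, target_format):
    model = IMPORTERS[source_format](text)
    return EXPORTERS[target_format](model)


converted = convert("1,8.25,120.1\n2,7.9,118.45", "toy-csv", "toy-tab")
print(converted)
```

With this layout, supporting a new format means adding one importer and one exporter rather than a converter for every pair of formats.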

What is the advantage of having data inside the 'data model'?

  • All programs that work with the data model 'understand' each other. For example, you can read data into the data model with the CcpNmr FormatConverter, start using CcpNmr Analysis straight away (provided it understands the raw spectrum data format), and transfer the information to ARIA for a structure calculation. As long as the data remain in the data model, they will stay consistent and accessible.
  • Scripts that work on the data model can be used by every application that uses a data model API. For example, if a good automatic assignment script was written it could be run from any data model based application.
  • Import/export to foreign formats comes for free (see above). This allows you to store all your data in one place throughout a project while moving back and forth between different programs. Final export to an nmrStar file ready for deposition will also be included.