
CcpNmr FormatExchange

Getting the program:

FormatExchange is part of the leading edge release. It is currently in an alpha test state - i.e. too clunky to be released. If you want to help us test it - or need to read in an NmrView or Sparky project - you can email us to ask for help, or a test version.

Documentation: Awaiting release.

About Format Exchange

Format Exchange currently lets you read entire NmrView and Sparky projects into an empty CCPN project. It is designed to import external NMR data formats into the CCPN Data Model with minimal user intervention. It uses colour coding, heuristics, and simple 'try-and-retry' functionality to provide fast and intuitive user interaction. The program features a modern graphical interface that gives the user the tools to access, compare, and correct erroneous data from multiple sources.

Development

The Format Exchange framework was primarily written by Marc van Dijk (Utrecht University, the Netherlands). The low-level Python code responsible for parsing specific data files was primarily written by Wim Vranken as part of the CcpNmr Format Converter package. The Format Exchange framework is written in Python and built around an XML-based data model written in C. The bespoke, user-friendly graphical interface, which provides the tools to interact with the Format Exchange framework, was written by Doroteya Staykova (Physics Solutions Ltd., Bulgaria) using Python bindings for Qt.

Relationship to Other CCPN Applications

The FormatExchange library uses CCPN functions where applicable and thus requires a CcpNmr Suite installation to function. In future it will be part of the CCPN package installation. FormatExchange is designed to become the main CCPN data import/export program, but will not equal the number of formats covered by FormatConverter for a while yet.

Format Exchange Plugins

The parsing of external data formats is performed by code wrapped in a plugin. The plugin uses the FormatExchange API to load the data into the Format Exchange data model. This design makes it easier for users and third parties to write and/or use their own scripts. A plugin is made available to the framework as a directory. If the plugin is set up correctly and placed in the plugin directory, it will be loaded automatically by the framework with no further configuration required.

A proper plugin requires the following files in its top-level directory:

  • header.py: plugin environment file. It defines parameters such as which formats the plugin imports and/or exports and which file types (extensions) it handles. In addition, the plugin designer can specify a property dictionary with parameters that the user may change to tweak the plugin's import/export behaviour.
  • read.py: the read class will be called by the framework to initialize format import.
  • write.py: the write class will be called by the framework to initialize export of data in the chosen file format.
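The file layout above can be checked mechanically. The following is a minimal sketch, not the framework's own loader: the function name find_plugins and the idea of reporting missing files are assumptions made for illustration. Only the three file names come from the list above.

```python
from pathlib import Path

# File names taken from the plugin layout described above.
REQUIRED_FILES = ("header.py", "read.py", "write.py")


def find_plugins(plugin_root):
    """Return {plugin directory name: list of missing required files}.

    Hypothetical helper: an empty list means the directory has the
    complete layout; it does not import or run any plugin code.
    """
    report = {}
    for entry in sorted(Path(plugin_root).iterdir()):
        if not entry.is_dir():
            continue  # plugins are directories; skip loose files
        missing = [name for name in REQUIRED_FILES
                   if not (entry / name).is_file()]
        report[entry.name] = missing
    return report
```

Calling find_plugins on a plugin directory gives a quick overview of which plugin folders are complete before the framework is started.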

 

Data Validators

A number of validator classes use the API to validate the data in different parts of the intermediate data format (experiments, molecules, peaks, shifts and spectra). This validation corrects nomenclature, tracks errors in the data, and correlates all data to ensure consistency, so that the user can make changes before committing the data to the final CCPN data model. Intelligent automation is applied wherever appropriate.

By default the data model is validated after every import, and it can be validated again at any time up to the moment the user decides to migrate the content to a new CcpNmr project. This differs from the batch-oriented import strategy used by the Format Converter framework.
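The validate-after-every-import behaviour can be illustrated with a small sketch. Everything here is hypothetical and is not the Format Exchange API: the class IntermediateModel, the method names, and the single rule in check_atom_names are assumptions chosen only to show the control flow.

```python
def check_atom_names(model):
    """Hypothetical validator: report shift records that lack an atom name."""
    return ["shift %d: missing atom name" % i
            for i, rec in enumerate(model.shifts) if not rec.get("atom")]


class IntermediateModel:
    """Toy stand-in for the intermediate data model (assumption)."""

    def __init__(self, validators):
        self.validators = list(validators)
        self.shifts = []
        self.errors = []

    def validate(self):
        # Re-run every validator over the current model content.
        self.errors = [msg for check in self.validators
                       for msg in check(self)]
        return self.errors

    def import_shifts(self, records, validate=True):
        # Validation runs after each import by default, but can also be
        # triggered later by calling validate() directly.
        self.shifts.extend(records)
        if validate:
            self.validate()
```

Because validation is cheap to repeat, the user can fix a reported record and call validate() again before migrating anything, rather than rerunning a whole batch import.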

Global mapping & Individual Atom Assignments

The Format Exchange API automatically maps all atom assignments in the CcpNmr project and initializes the following data:

  • Global maps that contain the best guess for CCPN residue and atom types, based on ChemComp, ChemAtomSet and ChemAtom types and IUPAC nomenclature.
  • Individual CCPN atom assignments that are uniquely defined by their molecule, residue code, residue number, atom code and residue offset.
  • Pointers in original assignments to individual CCPN atom assignments.
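As an illustration of the second and third points, the sketch below models an individual atom assignment as an immutable key built from the five fields named above, and lets each original assignment point at one shared key object. The names AtomAssignmentKey and link_assignments are hypothetical and are not part of the Format Exchange API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomAssignmentKey:
    """Fields that uniquely define one individual atom assignment."""
    molecule: str
    residue_code: str
    residue_number: int
    atom_code: str
    residue_offset: int = 0


def link_assignments(original_assignments):
    """Return (unique keys, pointers).

    pointers[i] is the shared key object for original_assignments[i],
    so duplicate original assignments point at the same object.
    """
    unique = {}
    pointers = []
    for fields in original_assignments:
        key = AtomAssignmentKey(*fields)
        # setdefault keeps the first key object seen for equal fields.
        pointers.append(unique.setdefault(key, key))
    return list(unique), pointers
```

A frozen dataclass is hashable and compares by value, which is what makes the five fields act as a unique identifier in this sketch.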

The Format Exchange GUI provides the user with tools to review and, if necessary, change the results of the automatic mapping.

Tree & Table Data Displays

The Format Exchange GUI is based around a tree structure with associated tables. That is to say, once a branch of the tree has been selected, the corresponding table is displayed providing real-time interaction with the underlying data part of the Format Exchange Model.

The main GUI categories, given as top-level tree elements, are:

  • Main: summary of CcpNmr Project data import, validation and assignment mapping
  • Assignments: global and individual mapping of atom assignments
  • Experiments: NMR experiments corresponding to import data
  • Molecules: molecules that have been imported or derived from data assignments
  • Peak Lists: imported data with peaks and peak assignments (if available)
  • Chemical Shifts: imported data with shifts and shift assignments (if available)
  • Plugin Settings: plugin specific settings

 

Multiple Display Feature

This feature lets the user select multiple items in the tree structure to display one or more tables on the screen at the same time. It makes it possible to compare and interact with multiple parts of the model at once without switching windows.

Current Limitations / Future Plans

The NMR data formats currently supported by the Format Exchange GUI are NmrView, Sparky, UCSF and FASTA. Additional NMR data formats will be available at a later date.

Export to external data formats will be added.

In this release, the Format Exchange program does not support the import of data into an existing CcpNmr project.

The Format Exchange Model

The Format Exchange program uses a dedicated XML-based data model as an intermediate between the external data and the CCPN Data Model. The import of external file formats is performed independently of the data model. As such, multiple files from different sources can be imported into the same data model. This approach has the following advantages:

  • Users can make changes to the data model before migration to a CcpNmr project. This includes correction or deletion of erroneous data and the addition of missing data.
  • Data is validated prior to migration to the CcpNmr project. The validation machinery safeguards the consistency of the model data, which minimises the chance of erroneous data being transferred to the CcpNmr project.
  • The intermediate data model can be saved in XML format, providing the option to redo the import at a later time with subsequent model modification.

Easy validation of the data in the intermediate model is crucial. This means avoiding heavy interdependency among the stored data and favouring a "flat" data model over a deeply hierarchical one. To accomplish this, the model is divided into data groups. All external data should be attributed to one or more of these groups:

  • Experiment: defines an NMR experiment and all data that was obtained from that experiment.
  • molSystem: contains molecule definitions with or without atom coordinates. Molecules are built up starting from molSystem > molecule > model > residue > atom. A molSystem may be built from various molecules, in which each molecule contains one chain
  • peaklist: contains peaks with resonances and possibly assignments for the resonances
  • shiftlist: contains a list of NMR chemical shifts
  • spectrum: contains spectrum information related to an NMR experiment
  • CcpMaps: contains a list of best-guess CCPN mappings for residue and atom types in CcpNmr project (global maps)
  • linkResonances: contains a list of all unique atom assignments in the CcpNmr project (individual atom assignments)
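To make the idea of a "flat" model split into data groups concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not the actual Format Exchange schema: the root element name formatExchangeModel, the child element name item, and the helper functions are assumptions. Only the group names come from the list above.

```python
import xml.etree.ElementTree as ET

# Group names as listed above; everything else in this sketch is assumed.
DATA_GROUPS = ("Experiment", "molSystem", "peaklist", "shiftlist",
               "spectrum", "CcpMaps", "linkResonances")


def build_model(records):
    """Build a flat XML tree from (group name, attribute dict) pairs."""
    root = ET.Element("formatExchangeModel")  # hypothetical root name
    groups = {name: ET.SubElement(root, name) for name in DATA_GROUPS}
    for group, attrs in records:
        if group not in groups:
            raise ValueError("unknown data group: %s" % group)
        # Each record is a direct child of its group: no deep nesting.
        ET.SubElement(groups[group], "item",
                      {key: str(value) for key, value in attrs.items()})
    return root


def roundtrip(root):
    """Serialise the model to an XML string and parse it back."""
    return ET.fromstring(ET.tostring(root, encoding="unicode"))
```

Keeping every record one level below its group is what allows data from several source files to be appended to the same model and validated group by group.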