CcpNmr Analysis Tutorial Extension - Version 2.0
An extension to the basic Analysis tutorial which covers more advanced features.
Analysis Tutorial Data
For this tutorial you will need the example spectra and project files in this archive (236MB total size): analysisTutorialData_v2_part2.tgz
Opening the extended tutorial project
For this tutorial we will be working with a separate CCPN project from the one used in the initial tutorial. Accordingly start Analysis and load the project called TutorialProject2 via the M:Project:Open option. Alternatively, on the command line, after navigating to the correct directory, you can issue the command
analysis TutorialProject2
This project relates to the same sample as was used in the first tutorial, but contains different data for some of the more elaborate analyses. Specifically there are data for a temperature series, a relaxation experiment, an HNHA scalar coupling experiment and NOE peak lists to demonstrate export to other programs.
Following chemical shifts
Firstly we will look at the situation where you have a series of spectra which are related to one another by virtue of the fact that some experimental condition is changing, which is causing the peak positions to move in response to that changing condition.
Typically this would be a titration where there is an increase in concentration of a ligand that one hopes binds to the protein sample, a pH series or a temperature series. In this instance for simplicity we will be looking at a temperature series, and aim to measure the temperature coefficient of some of the amide resonances. For other types of series the procedure within the program is broadly the same, except that the peak position depends on the changing condition in a different way; i.e. we fit a different type of curve/function to the data.
To start with have a look at window 1, and you will see four spectra, coloured blue, green, orange and red. These represent 15N-HSQC spectra recorded at 288, 298, 303 and 310 K respectively. Note that only one of these spectra is assigned. (And for brevity we have only assigned a few peaks).
Before we begin the chemical shift analysis we must group these spectra into what we refer to as an Experiment Series. This lets Analysis know which spectra are grouped together and how the condition parameter varies across the range. Accordingly go to M:Experiment:NMR Series. In the resultant popup window you will see that we already have one experiment series set up, which is a T1 relaxation series. We will come to this series later on. For now click [Add Series], double click the Condition Type column for the newly added row and set the value to "temperature" (at the bottom). This sets up a container for experiments so that we know what value varies; next we specify how the value varies.
Next click [Add Condition Point] FOUR TIMES, then in the lower Sample Conditions table, for all the newly added rows, double-click in the "Experiment" column, and set the four experiments to H[N]-310K, H[N]-303K, H[N]-298K and H[N]-288K. The order of experiment selection is unimportant. Next double click the Value column to formally enter the temperature at which the experiment was recorded. As you may have already guessed, we have cunningly given a hint as to which temperature goes with which experiment in its name. Thus enter the Values 310, 303, 298, 288 for the appropriate rows. In this instance we will neglect filling in the Error column, but if you have some measure of precision in the Value you could enter it here and Analysis would estimate the error in the temperature coefficient. Now close the Experiment Series popup.
Briefly look in the M:Experiment:Edit Experiments popup, and look at the {Experiments} tab. You should note that each experiment in our temperature series is associated with a different chemical Shift List. This is because we positively expect the chemical shifts of any resonance (i.e. group of atoms) to differ under the different conditions. It would be inappropriate for the chemical shift values to be averaged over all these experiments, which is what would happen if they only used one Shift List (i.e. shift table). If you forget to set the different shift lists for the experiments in a series, rest assured that Analysis will warn you, and give the option to allocate the individual lists, rather than contaminate the existing shift lists.
Close the Experiments popup and go to M:Data Analysis:Follow Shift Changes. In the {Settings} tab ensure that the Ref Peak List is set to H[N]-288K - this is the assigned, blue, spectrum from which the assignments will be propagated across to the other spectra. Check that the Fit Function is set to "Ax+B" - a linear relationship. (You may like to browse the available options here to see the types of curve that Analysis can fit to the data. This list is readily expanded upon request.) Check that the Expt Series is set to "1:temperature", set the Max Step Size for 1H (in the Isotope Parameters section) to 0.1 and then finally click [Group Peaks].
You will note that after a brief pause assignments of the assigned spectrum will be transferred to the peaks of the other spectra in the series and that various function-fit parameters appear in the {Peak Groups} table. This has been achieved by matching the chemical shift changes of the peaks across the related spectra to the linear "Ax+B" shift-distance expectation for the specified temperatures, bearing in mind the chemical shift weightings, step size limits and assuming a relatively straight peak 'trajectory'. Note that if we have any errors with the spreading of assignments across the spectra, then we can manually propagate assignments across selected peaks (which use different shift lists) via R:Assign:Propagate.
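The "shift distance" used for this matching can be pictured as a weighted combination of the per-dimension shift differences. The following is only a minimal sketch, assuming a Euclidean combination and illustrative isotope weightings; the function name and the weights are hypothetical and are not taken from Analysis itself.

```python
import math

# Illustrative isotope weightings (assumed values, not Analysis defaults)
WEIGHTS = {"1H": 1.0, "15N": 0.15}


def shift_distance(ref, pos, weights=WEIGHTS):
    """Weighted Euclidean distance between two peak positions.

    ref and pos map an isotope code to a chemical shift in ppm.
    """
    total = 0.0
    for isotope, ref_ppm in ref.items():
        delta = (pos[isotope] - ref_ppm) * weights[isotope]
        total += delta * delta
    return math.sqrt(total)


# Made-up positions of one amide peak at two temperatures
ref_peak = {"1H": 8.20, "15N": 120.0}
moved_peak = {"1H": 8.23, "15N": 120.4}
distance = shift_distance(ref_peak, moved_peak)
```

Scaling down the 15N difference stops the nitrogen dimension, which spans a much larger ppm range, from dominating the distance.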
For one of the peak groups, select its row in the table, then click [Show Fit Graph]. Hopefully you will see a new window appear with a graph of how the chemical shift distance for a group of peaks varies according to experiment temperature. Select Follow in window [], and Mark Ref Peak? [], then click on [Next Set]. You will see that you can zoom around the spectra of the series and check the positioning and assignment of each group of peaks for errors. If you find an understandably wonky graph you might like to adjust the peak position (and re-fit the graph) or remove the offending point.
Referring back to the Follow Shift Changes popup, in the peak groups table you now might like to extract the temperature coefficients for further analysis. You can [Export Shifts] as a text file, or generate a graph by clicking with the right mouse button over the table and selecting Graph:Assign F1:Fit Param 1, which will give you a graphical indication of how the temperature coefficient (the "A" in the "Ax+B" equation) varies across the sequence. The right mouse option Graph type:Bar Chart would be appropriate in this instance.
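If you export the data, the "Ax+B" fit is easy to check outside the program. The sketch below is an ordinary least-squares straight-line fit in plain Python; the shift distances are made-up numbers for the four temperatures of this series, and fit_line is a hypothetical helper rather than anything provided by Analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = A*x + B; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


temperatures = [288.0, 298.0, 303.0, 310.0]  # K, as in the tutorial series
# Made-up shift distances (ppm) relative to the 288 K reference peak
distances = [0.0, 0.05, 0.075, 0.11]

coefficient, offset = fit_line(temperatures, distances)
coefficient_ppb = coefficient * 1000.0  # ppm/K converted to ppb/K
```

The slope plays the role of "Fit Param 1" in the table; multiplying by 1000 gives the more familiar ppb/K units.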
Relaxation Analysis
We will determine relaxation rates in the next part of this tutorial. Accordingly we will be working with another experiment series, but this time it will be a T1 series, which you may have spotted earlier. Here we will be following changes in peak intensity rather than changes in peak position. For such an analysis, where the resonance positions do not change for the various points in the series, we can use a 3-dimensional experiment (with a single shift list) where all of the planes that correspond to different delay times are combined into one spectrum data set. Within CCPN this type of data set will be referred to as a pseudo-3D experiment; the third dimension is not in PPM, it is delay time and corresponds to the "z" axis - i.e. orthogonal to the X-Y plane that is the screen. We could have used separate 2D experiments in a similar manner to the temperature series if we had that sort of data, although we would probably use a single shift list for all experiments.
Firstly look at the T1 rates experiment by selecting window 5. Switch off all spectra except for the T1 experiment. Note that at the bottom of this window is a different kind of scrollbar, made up of buttons. These buttons control which planes of the pseudo-3D experiment are visible on screen. With the left mouse button you can shift the visible planes through the series, and with the middle mouse button (try both left and right buttons together if you don't have a middle button) you can extend the number of visible planes. Try this and switch on all of the planes of the experiment. Then pick some peaks in these planes by using <Shift> + <Ctrl> + Left mouse. Note that you will be picking peak maxima in all planes that are visible. If you do not want to pick in a specific plane, simply ensure that it is switched off.
For fine user control we could pick all the peaks in the T1 series, however, if you toggle on the [HSQC:115] spectrum (use the [Spectra] option at the top of the window), you will see that this assigned HSQC aligns well with the T1 series. We will use this fact to spread the HSQC assignments to all of the planes of the T1 experiment when we do the relaxation analysis.
Before we do the relaxation analysis go to M:Experiment:NMR Series. Clicking on the "T1" row you will note that the time values for the series are already entered. This is because the spectrum was loaded from a parameter file (AZARA format in this instance) where the time points of the series are already specified. Anyhow, in the top table set "Condition Type" to "delay time". Before you close the popup make sure that 'Unit' is set to seconds ('s').
Now select M:Data Analysis:Rates Analysis. You will note that the Experiment series is already correctly selected, by virtue of our selection of delay time as the parameter type. Ensure that the Reference Peak List is set to "HSQC:115" - the one we have assignments for, and that the Fitting Function is set to "A exp(-Bx)"; i.e. an exponential decay. Set the 1H tolerance to 0.05 and 15N tolerance to 0.1 - this indicates the size of the region within which Analysis will try to pick peaks in the pseudo-3D experiment, based on the locations in the assigned HSQC. Finally click [Group Peaks]. If some fits fail just click 'OK'.
As with the temperature series you will see that the assignment has propagated across the whole series and that the selected function/graph has been fitted to the peaks, although this time it is an exponential function that has been fitted to the peak intensity. Again, clicking on [Show Function Fit] and [Next Set] allows you to quickly investigate the quality of fit for all peaks in each group.
Note that the time constant for each group is recorded in the "TC" column. This is calculated from the inverse of the exponential decay rate; i.e. the "B" in "A exp(-Bx)". You may draw a graph for the T1 values or export the data as text in the same manner as the temperature series, and additionally you can choose to create a dedicated T1 list by pressing [Make T1 List]. This simply records the T1 values (as indicated by the TC column) in a list of NMR measurements that are saved when your CCPN project is saved. Accordingly this data may be looked at at any time in the future (and even deleted) by selecting M:Data Analysis:Measurement Lists and selecting the T1 row.
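The relationship between the fitted "B" and the value in the "TC" column can be illustrated with a small stand-alone calculation. This is a sketch only: it fits "A exp(-Bx)" by linear regression on the logarithm of the intensities, which is a simplification chosen for brevity and not necessarily the fitting method Analysis uses, and the delays and intensities are synthetic.

```python
import math


def fit_exponential_decay(times, intensities):
    """Fit I = A*exp(-B*t) by linear regression on ln(I); returns (A, B).

    All intensities must be positive for the logarithm to be defined.
    """
    logs = [math.log(value) for value in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = stl / stt
    amplitude = math.exp(mean_l - slope * mean_t)
    return amplitude, -slope


# Synthetic, noise-free data: delays in seconds and peak heights
delays = [0.05, 0.1, 0.2, 0.4, 0.8]
heights = [1000.0 * math.exp(-2.0 * t) for t in delays]

amplitude, rate = fit_exponential_decay(delays, heights)
time_constant = 1.0 / rate  # the T1 value, in seconds
```

With real, noisy data a weighted or non-linear fit is usually preferable, because taking logarithms exaggerates the noise on the weakest (longest delay) points.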
Scalar couplings
In the next part of this tutorial we shall look at a different kind of data extraction. In this case we will look at extracting coupling constants from the quantitative HNHA experiment. The basic principle here is that by measuring the relative intensities of the amide diagonal peak and the amide-Ha cross peak this special experiment gives us an indication of what the H-HA coupling constant (and hence the backbone PHI angle of the residue) might be.
Firstly have a look at window 4. You will see that we have already picked many of the HNHA peaks, including the diagonals, and have assigned them to the amide resonances. This was achieved using the M:Assignment:Link Peak Lists functionality, as described in the previous tutorial. Note that we did not have to explicitly assign indirect proton H and HA resonances (Y axis for this window).
Bearing this set-up spectrum in mind, go to M:Data Analysis:3J H-Ha Coupling. The first notable thing when the popup appears is that the main table is already filled in. This is because the experiment type is already set correctly to "H{[N]+[HA]}". You can go to M:Experiment:Edit Experiments:{Experiment Types} to verify this. In the "Calculate 3J[Hn,Ha] Couplings" popup select the {Options} tab and have a look at the settings that have been used to fill in the main table. Specifically note that you are required to have specified the right Transfer Time for the experiment (known from the pulse sequence) and have reasonable estimates for the Karplus Coefficients. Calculating such coefficients with great precision is not something I will go into here.
Returning to the {Spin System Table} note that by selecting a residue's row you are instantly zoomed to the corresponding amide location in the selected window, so that you can investigate any peaks. If you wish to exclude a particular residue from being used to make any output you can simply toggle (double-click) the "Use?" column to "No". The table shows the amide-HA/amide-amide intensity ratio, together with the 3J HNHA coupling and PHI backbone angle (as estimated from the Karplus curve) that derive from it. Of course the accuracy of such estimates very much depends on the parameters you have entered and whether there are any peak distortions or overlaps which would give unrepresentative intensities.
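To make the role of the Transfer Time and the Karplus Coefficients more concrete, here is a rough sketch of the arithmetic. It assumes the usual quantitative-J relation for HNHA, in which the cross/diagonal intensity ratio equals -tan^2(2*pi*J*T) with T the transfer time, and the Karplus form J = A cos^2(PHI-60) + B cos(PHI-60) + C. The coefficient values and the transfer time below are commonly quoted literature values used purely for illustration; they are assumptions, not the defaults of the Analysis popup, and all function names are hypothetical.

```python
import math

# Assumed, illustrative parameters; use the values appropriate to your data
KARPLUS_A, KARPLUS_B, KARPLUS_C = 6.51, -1.76, 1.60  # Hz
TRANSFER_TIME = 0.01305  # seconds


def coupling_from_ratio(ratio, transfer_time=TRANSFER_TIME):
    """3J coupling in Hz from the cross/diagonal intensity ratio (ratio <= 0)."""
    return math.atan(math.sqrt(-ratio)) / (2.0 * math.pi * transfer_time)


def karplus(phi_degrees):
    """Predicted 3J coupling in Hz for a given PHI angle."""
    cos_theta = math.cos(math.radians(phi_degrees - 60.0))
    return KARPLUS_A * cos_theta ** 2 + KARPLUS_B * cos_theta + KARPLUS_C


def phi_candidates(coupling):
    """All PHI angles (degrees, wrapped to [-180, 180)) matching a coupling."""
    disc = KARPLUS_B ** 2 - 4.0 * KARPLUS_A * (KARPLUS_C - coupling)
    if disc < 0.0:
        return []  # coupling lies below the minimum of the Karplus curve
    angles = []
    for sign in (1.0, -1.0):
        cos_theta = (-KARPLUS_B + sign * math.sqrt(disc)) / (2.0 * KARPLUS_A)
        if abs(cos_theta) <= 1.0:
            theta = math.degrees(math.acos(cos_theta))
            for value in (theta, -theta):
                angles.append((value + 60.0 + 180.0) % 360.0 - 180.0)
    return sorted(angles)


# Round trip for a helical-like angle of -60 degrees
helix_coupling = karplus(-60.0)
helix_ratio = -math.tan(2.0 * math.pi * helix_coupling * TRANSFER_TIME) ** 2
recovered_coupling = coupling_from_ratio(helix_ratio)
candidates = phi_candidates(recovered_coupling)
```

Note that a single coupling generally maps to several possible PHI angles, which is one reason why restraints derived this way need sensible error bounds.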
When you are content with your values, peaks and peak selection you have three choices of how to preserve the data for posterity. The first is to [Make Coupling List] - this simply makes an NMR measurement list (in the same vein as chemical shifts or T1 rates). The second is to use the Karplus relationship to make a PHI angle dihedral restraint list for your protein which can be used in structure calculations: click [Make Dihedral Restraints]. The last option is to [Make Coupling Constraints] which is useful if you have a protein structure calculation method that can back-calculate couplings and fit them to the experimentally derived values.
Dihedral constraints
Following on from generating dihedral restraints using an HNHA experiment, we will generate backbone dihedral restraints in a different manner, from backbone chemical shifts. We will be using a program called DANGLE (Dihedral ANgles from Global Likelihood Estimates) which is embedded within Analysis. DANGLE estimates dihedral angles from chemical shifts in a similar manner to TALOS; i.e. it matches a chemical shift & sequence query to a structural database of known PHI/PSI angles and chemical shifts. However, DANGLE uses a different (Bayesian) method to produce an angle estimate and tolerance, compared to TALOS. The idea is to use Bayesian inference to infer what the range of likely PHI/PSI angles might be (using the chemical shifts) by checking all PHI/PSI combinations in 10 degree square bins to see how well such angles can be used to explain the data. Such an analysis allows the user to see uncertainties in the angle predictions, including where the chemical shift to structure mapping is redundant and there are multiple regions in the Ramachandran plot which could explain the chemical shift data.
To run DANGLE select M:Structure:DANGLE - Predict Dihedrals. Note that at the top the Chain should be set as "MS1:A", the Shift List as "ShiftList 1:1" and Max No. of Islands as 2. This simply specifies which data to use and how strict the analysis should be. Using two islands means that we will reject predictions that result in more than two discrete regions of the Ramachandran plot. To start the analysis press [Run Prediction] and accept "Run1" as the name for the job by pressing [OK] at the opportune moment. Please be aware that DANGLE will take several minutes to finish the calculation.
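The idea of counting "islands" can be illustrated with a toy calculation. The sketch below counts connected regions of occupied 10 degree bins on a 36 x 36 grid that wraps around at +/-180 degrees in both PHI and PSI. It is an illustration of the concept only; how DANGLE itself defines and counts islands may well differ, and count_islands is a hypothetical helper.

```python
def count_islands(grid):
    """Count 4-connected regions of True cells on a periodic square grid."""
    size = len(grid)
    seen = set()
    islands = 0
    for row in range(size):
        for col in range(size):
            if not grid[row][col] or (row, col) in seen:
                continue
            # New, unvisited occupied bin: flood-fill the region it belongs to
            islands += 1
            seen.add((row, col))
            stack = [(row, col)]
            while stack:
                r, c = stack.pop()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = (r + dr) % size, (c + dc) % size  # wrap at edges
                    if grid[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return islands


BINS = 36  # 360 degrees divided into 10 degree bins
grid = [[False] * BINS for _ in range(BINS)]

# One region that straddles the +/-180 degree edge (counts as a single island)
for col in (35, 0, 1):
    grid[10][col] = True
# A second, separate region
grid[25][20] = True
grid[26][20] = True

n_islands = count_islands(grid)
accepted = n_islands <= 2  # mimics a "Max No. of Islands" setting of 2
```

The wrap-around matters because angles are periodic: a region that crosses the edge of the plot is still one region.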
Once the calculation is over you will see the main table filled in with PHI and PSI backbone dihedral angle predictions and their associated error ranges. Further, if you select a row in the main table you will see a plot in Ramachandran (PHI/PSI) space of where the likely angles are deemed to be. Click on the "9 Tyr" row and note that there is a lot of red colour in the chart, indicating that DANGLE was not able to make a distinct choice of PHI/PSI: you should not use such a prediction in a structure calculation. Click on the [Next] button to get to "10 Arg". The prediction for this residue is somewhat better, and you could use this in a structure calculation (it has one discrete region) although the error bounds for such a dihedral restraint would be suitably large. Click [Next] once more to get to "11 Glu". This residue has a very precise range of predicted PHI/PSI angles. Such a residue could be used in a structure calculation with a high degree of confidence and proportionately narrow error margins.
Note that DANGLE also predicts the secondary structure of the residues, but that this calculation is not made from the angles, but directly from the measured secondary structures in the shift-structure database. To make the restraints themselves simply press [Commit Constraints]. This will make a PHI/PSI dihedral angle restraint list in the selected Constraint Set (which you might remember corresponds to one fixed/frozen assignment state so that you always know what you actually restrained even if assignments on the peaks change).
View the generated restraints by going to M:Structure:Restraints & Violations:{Restraints} - ensuring the restraint set matches the one you put the calculation results into and the Constraint List is set to "1:Dihedral:DANGLE...". Note that if you have a structural model for your protein you can see how the model's angles match with the DANGLE prediction. You can do this by either pressing [Calculate Violations] (load a structure first), or in the DANGLE interface itself you can select a structure from which the PHI/PSI angles are superimposed on the angle plot.
Entering non-standard molecules
Now we will look again at setting up molecular information in a CCPN project, but this time we will go beyond the canonical linear protein sequence and enter some of the non-standard connectivities and residues.
We will set up a discontinuous molecule with two polypeptide sections and internal disulphide links. This would be the situation that you would find in insulin, for example. To enter a molecule start by going to M:Molecules:Molecule Setup and select the {Sequences} tab. At the top change the "Molecule:" pulldown menu to "<New>" then click [Add Polymer] and accept the name for the molecule by clicking [OK]. You are now taken to the {Add Sequence} tab. Ensure that the Input Type is set to "1-letter" and type in any arbitrary protein sequence, with the sole constraint that it must have two cysteine residues. Then click [Add Sequence!] and you will be taken back to the {Sequence} tab where you can see the section of polypeptide you have just created. Now click [Add Polymer] once again and for a second time add a protein sequence with two cysteine residues and press [Add Sequence!]. When you return once again to the {Sequence} tab you will see that there are two polypeptide regions, and by looking in the "Polymer Linking" and "Linked Residues" columns you can see that the sections are separate.
Now find the row of the first Cys residue and double-click in the "Descriptor & Stereochemistry" cell. Change the descriptor from "prot:HG" to "link:SG". Repeat this for the three remaining Cys residues. Then with a Cys row selected click on [Edit Links]. You are now taken to the links tab with the appropriate Cys selected. You will see that the "prev" and "next" links are filled in with existing residues, but the "SG" link has no destination residue set. Double-click in the "Destination Residue" column for the "SG" row and set the residue to one of the Cys residues from the other polypeptide section. We have now linked two Cys residues by a disulphide link but still have one more link to make. In the Source Residue pulldown select one of the unlinked Cys residues and set its destination residue to the last unlinked Cys. Returning to the {Sequence} tab you will see that the Cys residues are listed as having three linked residues; two from the peptide and one from the disulphide.
We will now use our fully linked molecule, which is really just a sequence template, to build a chain containing all of the atoms that can be used for NMR assignment. Accordingly select the {Chains} tab and ensure that the "Mol System for new chain:" pulldown is set to "<New>" and that the "Template for new chain" is set to the molecule we just created. Using a new molecular system is important here so that we keep the new sequence separate from the existing protein. Now click [Make Chain From Template] and accept the MolSystem code and chain code by pressing [OK], then answer [Yes] if Analysis asks about equivalent aromatic atoms. You will see that a new chain has appeared in the top table, but unlike the existing protein chain it has two chain fragments.
Click on the row of the new chain and you will see that the bottom table changes to show the two polypeptide regions. Now we will change the numbering of the second polypeptide section of our chain. Do this by double-clicking the "Start Seq Number" column for the second row. Now enter a number that is higher than the original start number.
Finally we will look at the fruits of our labour by selecting M:Molecules:Atom Browser. In the Chain pulldown menu select the last entry, which should correspond to our newly entered sequence, and ensure that the hydrogen atoms are visible by clicking the [H] button. Firstly have a look at the Cys residues; you will see that they have no gamma hydrogen, which is what we would expect given the disulphide links. Then scroll down in the table to look at the end of the first polypeptide region and the beginning of the second. Note that the Residue number is discontinuous after we set a different starting number for the second section. Note that we could also have changed the starting number of the first section, as long as there is no overlap with the second (i.e. residue numbers must be unique).
Isotope labelling schemes
We will now have a look at isotope labelling patterns. Select M:Molecules:Isotopomer Schemes. You will see that Analysis already comes with some ready-made isotope labelling patterns, such as "2-13C Glycerol" and "GAFY". We will have a look at one of these labelling schemes, so select the "13Glycerol" row and then select the {Isotopomers} tab. Select the Ala row in the top table and you will see that the bottom table is filled to show that the carbonyl "C" and "CB" have a 19:1 13C:12C labelling (i.e. 95% 13C, 5% 12C). Note that the "CA" atom remains at natural abundance ratios (98.93% 12C). The setup for Ala within this scheme is relatively simple, as it only has one pattern. Returning to the top table again you can see that whereas Ala has only one labelling variant, Arg has 8 variants (8 isotopomers). Click on each of the Arg variant rows in turn and you will see that in each case a different combination of its carbon atoms is labelled with 95% 13C. This is because the labelling scheme represents the pattern that results from feeding bacteria 1,3 13C labelled glycerol, and in the case of arginine there are multiple ways that the carbon atoms from glycerol can find their way into the amino acid. All of these residue labelling patterns must be considered when we do NMR because overall they tell us which atoms will be visible (labelled) in an experiment and individually they specify which correlations are observable; e.g. can we see a CA to CB peak? In the case of Ala with 1,3 13C Glycerol the answer is no.
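The distinction between what is labelled overall and what is labelled together in the same molecule can be made concrete with a little arithmetic. In this sketch each isotopomer is a (weight, per-atom 13C fraction) pair; the Ala numbers follow the values quoted above, while the two-variant residue, the helper names and the data layout are hypothetical and are not how Analysis stores its schemes.

```python
NATURAL_13C = 0.0107  # natural abundance, i.e. 100% minus 98.93% 12C


def atom_fraction(isotopomers, atom):
    """Overall 13C fraction of one atom, averaged over all isotopomers."""
    total_weight = sum(weight for weight, _ in isotopomers)
    labelled = sum(weight * fractions[atom] for weight, fractions in isotopomers)
    return labelled / total_weight


def pair_fraction(isotopomers, atom_a, atom_b):
    """Fraction of molecules in which both atoms are 13C at the same time."""
    total_weight = sum(weight for weight, _ in isotopomers)
    both = sum(
        weight * fractions[atom_a] * fractions[atom_b]
        for weight, fractions in isotopomers
    )
    return both / total_weight


# Ala in the 1,3 13C glycerol scheme: a single isotopomer
ala = [(1.0, {"C": 0.95, "CA": NATURAL_13C, "CB": 0.95})]
ala_ca_cb = pair_fraction(ala, "CA", "CB")  # about 1%: no CA-CB peak expected

# Hypothetical residue with two equally likely, complementary isotopomers
mixed = [
    (1.0, {"CA": 0.95, "CB": NATURAL_13C}),
    (1.0, {"CA": NATURAL_13C, "CB": 0.95}),
]
mixed_ca = atom_fraction(mixed, "CA")           # roughly half of CA is 13C
mixed_ca_cb = pair_fraction(mixed, "CA", "CB")  # yet CA and CB rarely coincide
```

In the second case both atoms are individually well labelled, but almost never in the same molecule, so a CA to CB correlation would still be very weak. This is why the individual isotopomers, and not just the averaged atom fractions, have to be considered.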
If we wanted to we could define our own isotope labelling schemes, but for now we will leave the default set and see how such schemes are used at various points within Analysis. First have a look at M:Molecules:Atom Browser, choose the Chain to be "Default:A" and select only the [C] button (turning other element types off as required). Now in the labelling scheme pulldown on the {Options} tab select "13Glycerol". You will see that selection of the labelling scheme has removed some of the atom options; these are now deemed to be unlabelled and hence invisible. Referring back to what we discovered about alanine in this labelling scheme, note that there is no Ala CA labelled. Now change the labelling scheme to "GAFY"; which represents a situation where only four kinds of residue have 13C labelling (i.e. glycine, alanine, phenylalanine & tyrosine). Looking at the carbon atoms in the sequence you will see that most are not shown, indeed only those for Gly, Ala, Phe & Tyr are shown, consistent with the labelling pattern. So you see that overall if you have a labelled molecule in your NMR sample as long as you select the correct scheme you will be presented only with those atoms which it ought to be possible to assign to.
Labelling schemes can also affect assignment options, as well as atom options. To illustrate this go to window 1 and toggle the [HSQC:115] spectrum on (you may need to select [Spectra] to do this). With the mouse over one of the assigned peaks press the <a> key to bring up the Edit Assignment popup. You will see that the F1 (1H) dimension and F2 (15N) dimension both have resonances assigned to them, as indicated in the left hand tables. In the right hand tables note that both dimensions have one or more resonance possibilities (and one of these will be chosen for the assignment). Now in the Labelling Scheme pulldown select "uni_15N13C2H" to pretend that we have a triple-labelled sample with deuterium instead of 1H. Note how with this scheme selected the hydrogen resonance possibilities disappear from the top right table; under such a labelling scheme it ought not be possible to assign 1H resonances.
Finally we will look at how a labelling scheme can also be used to filter out impossible combinations when generating distance restraints. Select M:Structure:Make Dist Restraints and then set the Peak Lists pulldown menu in the resulting popup to be "C-NOESY". Now press [Make Shift Match Restraints]. After a while you will see that this generates around 1107 distance restraints for our carbon NOESY experiment. Clicking on the {Restraints} table on the Restraints & Violations popup you will see that these restraints are highly ambiguous. Now return to the Make Distance Restraints popup and change the labelling scheme to "2Glycerol", set the minimum isotope fraction to 0.25 and then click [Make Shift Match Restraints] once again. Once the restraint generation is complete (it takes longer this time because of the labelling scheme), return to the restraint table and select the second restraint list in the pulldown menu. Note that this time we have generated fewer restraints (and that the restraints are less ambiguous). This is because we have only considered pairs of protons where the bound carbon is labelled in the scheme.
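The effect of the minimum isotope fraction can be pictured as a simple filter over the candidate assignments of each peak. The following sketch is deliberately simplified: it assumes that only the carbon bound to the first proton of each candidate pair is tested against the threshold, and the atom names and fractions are made-up numbers that are not taken from any real labelling scheme or from the tutorial project.

```python
MIN_FRACTION = 0.25  # the "minimum isotope fraction" setting

# Hypothetical 13C fractions for the carbons bound to the candidate protons
carbon_fraction = {
    "5 Ala CB": 0.95,
    "9 Tyr CB": 0.50,
    "14 Val CA": 0.0107,
    "20 Leu CD1": 0.10,
}

# Each candidate: (carbon bound to first proton, first proton, second proton)
candidates = [
    ("5 Ala CB", "5 Ala HB*", "9 Tyr H"),
    ("9 Tyr CB", "9 Tyr HB2", "10 Arg H"),
    ("14 Val CA", "14 Val HA", "15 Gly H"),
    ("20 Leu CD1", "20 Leu HD1*", "21 Ser H"),
]


def filter_candidates(candidates, fractions, minimum):
    """Keep candidates whose bound carbon meets the minimum 13C fraction."""
    return [item for item in candidates if fractions[item[0]] >= minimum]


kept = filter_candidates(candidates, carbon_fraction, MIN_FRACTION)
ambiguity_before = len(candidates)
ambiguity_after = len(kept)
```

Fewer surviving candidates per peak means less ambiguous restraints, and peaks left with no candidates at all produce no restraint.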