Spronk Contract 2010
Task description and draft for contract proposal for Chris Spronk on CCPN grant 2010-2013
Spronk Tasks
- As his first task, the new programmer hired by Chris Spronk (henceforth 'the programmer') shall make a report on issue tracking, bug tracking and project control systems, listing pros and cons and recommendations. The CCPN team will provide some initial suggestions. This shall be used to select a system to use in communicating between the programmer and the rest of the CCPN team, and as applicable with other collaborators, for bug reports, etc.
- Secondly, the programmer shall set up a testing framework as described below, subtasks and timings to be decided. This should include analysing the code to look for functions that are priority targets for testing, including utility subroutines that are particularly widely used, and in general checking for test coverage.
- Thirdly, the programmer shall write and maintain tests as practicable.
- Fourthly the programmer shall write and maintain high level documentation, as well as the contents, organisation and search facilities of the CCPN Documentation WIKI. The exact tasks etc. need to be agreed in more detail, possibly as the project goes along.
Initial progression of work
To ensure a relatively gradual introduction to the CCPN API and CcpNmr code, the initial work could be framed as follows:
- Identify a testing framework and implement a testing setup.
- Identify core functions, Python only (memops.universal), and start writing tests for these, in the meantime updating step 1.
- Identify simple core functions using CCPN objects (e.g. memops.general, ccp.general, ...), again updating step 1 if required.
- Progress to more complex tasks and testing - this will have to be detailed later.
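As an illustration of the second step, a first test for a pure-Python utility might look like the sketch below. It uses unittest from the Python standard library purely as an example; the choice of framework is still open. The function unique_in_order is a hypothetical stand-in defined here for illustration and is not an actual memops.universal function.

```python
import unittest


def unique_in_order(items):
    """Hypothetical stand-in for a pure-Python utility:
    drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


class UniqueInOrderTest(unittest.TestCase):
    def test_removes_duplicates_and_keeps_order(self):
        self.assertEqual(unique_in_order([3, 1, 3, 2, 1]), [3, 1, 2])

    def test_empty_input(self):
        self.assertEqual(unique_in_order([]), [])

    def test_unhashable_item_raises(self):
        # Wrong input: a list is unhashable, so the set lookup raises TypeError.
        with self.assertRaises(TypeError):
            unique_in_order([[1], [1]])


def run_suite():
    """Run the tests of the TestCase above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UniqueInOrderTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Each test method is small and independent, so a single one can be re-run on its own, and the wrong-input case sits next to the correct-input cases without taking priority over them.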
Testing framework
- Comprehensive tests should be run nightly on the current release with current patches, the current stable repository, and possibly the leading-edge repository ('trunk').
- Tests should start with a build, or install-and-patch, to catch errors at this stage.
- Tests should be organised in small units so they are easy to modify, and one failure does not break subsequent tests. The system must allow individual tests to be (re-)run.
- A condensed testing log should be emailed to CCPN core programmers every morning. Detailed logs may or may not be necessary if failed tests can be easily re-run.
- The latest successful build should be made available for users to use.
- Testing and building shall be on Linux in the first instance. Other operating systems can be considered later.
- Testing should be parallelised, to make use of the parallel processor capacity of Mammoth (or similar).
- The large, closely interlinked nature of CCPN projects makes testing difficult. The general framework for a test will be:
  - Load a starting project, optionally start with an empty project.
  - Carry out a number of scripted operations on the project.
  - At the end, and possibly at points along the way, compare the entire CCPN project with a stored correct version.
- More traditional unit testing may or may not be required in specific cases at some point. It is likely to be of minor importance at best, because 1) most functions involve CCPN objects, which implicitly allow changes at any point in the data structure, and 2) CCPN functions tend to be large and complex and do not lend themselves to comprehensive testing.
- Tests should include passing in wrong input and checking that the correct Exceptions are raised. Testing for correct functioning with correct input has higher priority, however.
- Result comparison code must be written by the CCPN core team. Possibilities are:
  - Adding permanent object IDs to make XML files diff-able, and comparing at the file level. This is on the TODO list anyway for other reasons.
  - Writing an in-memory tree comparator, which could be based on the work done for Major backwards compatibility.
  - In either case comparison would have to allow for exceptions to take care of guids, file paths, time stamps and sundries.
- Any model change will change the look of relevant files. This will have to be allowed for in maintaining the system.
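The comparison step described above, with allowance for guids, file paths and time stamps, could be sketched roughly as follows. This is only an illustration using xml.etree.ElementTree from the Python standard library. The element names, the attribute names in VOLATILE_ATTRIBUTES and the sample XML strings are assumptions made up for the example and do not reflect the actual CCPN storage format; the real comparator is to be written by the CCPN core team.

```python
import xml.etree.ElementTree as ET

# Assumption: names of attributes that legitimately differ between runs.
VOLATILE_ATTRIBUTES = frozenset({"guid", "timeStamp", "path"})


def tree_differences(expected, actual, location="", ignore=VOLATILE_ATTRIBUTES):
    """Return a list of readable differences between two element trees,
    skipping attributes whose names are listed in 'ignore'."""
    here = f"{location}/{expected.tag}"
    if expected.tag != actual.tag:
        return [f"{here}: found tag {actual.tag!r} instead"]
    differences = []
    exp_attrs = {k: v for k, v in expected.attrib.items() if k not in ignore}
    act_attrs = {k: v for k, v in actual.attrib.items() if k not in ignore}
    if exp_attrs != act_attrs:
        differences.append(f"{here}: attributes {act_attrs}, expected {exp_attrs}")
    if (expected.text or "").strip() != (actual.text or "").strip():
        differences.append(f"{here}: text differs")
    if len(expected) != len(actual):
        differences.append(
            f"{here}: {len(actual)} children, expected {len(expected)}")
    # Children are compared pairwise in document order.
    for exp_child, act_child in zip(expected, actual):
        differences.extend(tree_differences(exp_child, act_child, here, ignore))
    return differences


def compare_xml(expected_xml, actual_xml):
    """Parse two XML strings and compare the resulting trees."""
    return tree_differences(ET.fromstring(expected_xml), ET.fromstring(actual_xml))


# Made-up sample data: a stored reference and two results of a scripted run.
REFERENCE = '<project guid="a1" timeStamp="2010-01-01"><peak height="5.0"/></project>'
RESULT_OK = '<project guid="b2" timeStamp="2010-06-30"><peak height="5.0"/></project>'
RESULT_BAD = '<project guid="b2" timeStamp="2010-06-30"><peak height="7.5"/></project>'

if __name__ == "__main__":
    print(compare_xml(REFERENCE, RESULT_OK))
    print(compare_xml(REFERENCE, RESULT_BAD))
```

Keeping the ignore list in one place means that a model change, or a new kind of volatile field, only needs to be handled there rather than in every stored reference.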
Test tasks
- Pylint or similar code checkers should be used on all code. NB this requires the core coders to agree on a house style, lax or strict as it may be.
- FormatConverter tests. Wim already has a functioning system and test data to test the underlying FormatConverter code (not the GUI).
- Data Compatibility tests. Tests on data backwards and forwards compatibility code can easily be combined with FormatConverter tests, since the final output in non-CCPN format should be independent of any compatibility transformations along the way.
- Full system tests would involve running programs in the Extend-NMR pipeline, either singly or several in succession, and comparing the results. Wim has utility code for this purpose but it has not been combined and used yet. We cannot realistically run very many, and the testable programs would in practice be limited to ARIA, HADDOCK, and CING. We might need CCPNGrid and/or external servers for this.
- GUI testing should test every single button, switch and editable column for Analysis and FormatConverter at least once, and should go through tutorials and examples. It should be set up using Xnee, unless the programmer can make a very good case for preferring an alternative. Xnee allows you to capture mouse clicks and data entry and so replay an interactive session in X11. As all operations depend on exact pixel positions we would need to run this on a single virtual machine for reproducibility. There should be a separate test for each popup, as well as tests for each step in the tutorial. It is the plan for Tim to make screen movies for documentation, which should allow non-specialists (e.g. the programmer) to make and maintain GUI tests.
- Function testing. Testing all possible input combinations for all functions in the manner of full unit testing is impossible - both code and data are too large and complex. Exercising every line of code at least once is desirable but probably still too difficult. For practicality, code tests would have to be organised around high-level functions, where large swathes of code could be tested several times with relatively little independent input. Testing would probably require handwritten scripts for each case, so these had better be short and organised to be easy to expand and modify. Many common tasks would be covered by the functions in the various XyzBasic.py modules, and if any are missing there would be a good case for adding them. Any workable ideas of how to simplify the task in practice would be great - but the workable part is a very big problem. The specific tests would have to be written by the core CCPN team, but the programmer should do as much of the framework as possible.
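One possible shape for the framework side of function testing is sketched below: each handwritten script is a short function registered by name, scripts run in isolation so that one failure does not stop the rest, any subset can be re-run, and a condensed log is produced. All names here (scripted, set_height, run_scripts, condensed_log) are hypothetical and introduced only for illustration; set_height stands in for a high-level function and is not part of any XyzBasic.py module.

```python
SCRIPTS = {}


def scripted(func):
    """Register a short test script under its function name."""
    SCRIPTS[func.__name__] = func
    return func


def set_height(peak, height):
    """Hypothetical high-level operation, used only for this illustration."""
    if height < 0:
        raise ValueError("height must be non-negative")
    peak["height"] = height


@scripted
def set_height_stores_value():
    peak = {}  # placeholder for loading a starting project
    set_height(peak, 5.0)
    assert peak["height"] == 5.0


@scripted
def set_height_rejects_negative():
    # Wrong input should raise the documented exception.
    try:
        set_height({}, -1.0)
    except ValueError:
        return
    raise AssertionError("ValueError was not raised")


@scripted
def deliberately_failing():
    raise RuntimeError("simulated failure")


def run_scripts(names=None):
    """Run the named scripts (default: all) and return {name: status}."""
    results = {}
    for name in names or sorted(SCRIPTS):
        try:
            SCRIPTS[name]()
            results[name] = "ok"
        except Exception as error:  # keep going so later scripts still run
            results[name] = f"FAILED: {type(error).__name__}: {error}"
    return results


def condensed_log(results):
    """Summarise results: one count line, then one line per failed script."""
    failed = [name for name, status in results.items() if status != "ok"]
    lines = [f"{len(results) - len(failed)} passed, {len(failed)} failed"]
    lines += [f"  {name}: {results[name]}" for name in failed]
    return "\n".join(lines)


if __name__ == "__main__":
    print(condensed_log(run_scripts()))
```

A failed script can then be re-run on its own with, for example, run_scripts(["deliberately_failing"]), which is what would make detailed nightly logs less necessary.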