Biochemistry databases & software

Databases

Transterm. The translation control database.

Imprinted Gene Catalogue. A catalogue of imprinted genes and parent-of-origin effects in humans and animals.

Retrobase. Database of transposable genetic elements.

Online Tools

In collaboration with Dr Andrew Firth (University of Cambridge), Dr Wayne Patrick has investigated the statistics of constructing and sampling large, randomized protein-encoding libraries. They have written algorithms for estimating the diversity of libraries generated by the most commonly used randomization methods, and these are available through a user-friendly web interface.

Use GLUE, PEDEL, or DRIVeR
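
In the simplest case, where every variant is equally likely, library completeness follows from a standard occupancy argument: a library of L clones drawn at random from V equiprobable variants is expected to contain V(1 - (1 - 1/V)^L) distinct variants. The Python sketch below shows only that calculation, under that equiprobability assumption. It is not the code behind GLUE, PEDEL or DRIVeR, which address the specific randomization methods described above, and the function names are hypothetical.

```python
import math


def expected_distinct_variants(num_variants: int, library_size: int) -> float:
    """Expected number of distinct variants in a library of `library_size`
    clones, assuming each clone is an independent, uniformly random draw
    from `num_variants` equally likely variants (a simplifying assumption)."""
    if num_variants < 1 or library_size < 0:
        raise ValueError("need num_variants >= 1 and library_size >= 0")
    # Probability that one particular variant is absent from every clone.
    p_absent = (1.0 - 1.0 / num_variants) ** library_size
    return num_variants * (1.0 - p_absent)


def library_size_for_completeness(num_variants: int, fraction: float) -> int:
    """Smallest library size whose expected completeness (the fraction of
    all variants present at least once) reaches `fraction`."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if num_variants == 1:
        return 1  # a single clone already contains the only variant
    # Solve 1 - (1 - 1/V)**L >= f for L:  L >= ln(1 - f) / ln(1 - 1/V)
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - 1.0 / num_variants))


if __name__ == "__main__":
    # One NNK-randomized codon: 32 codons, treated as equally likely.
    v = 32
    size = library_size_for_completeness(v, 0.95)
    print(size, expected_distinct_variants(v, size) / v)
```

For a single NNK codon this gives a library of 95 clones, roughly 3-fold oversampling, for an expected completeness of 95%. Real libraries with biased codon or amino acid frequencies need the method-specific treatment the web tools provide.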

Software - developed by Dr Peter Stockwell

DMAP (compressed TAR archive, ~300 KB). Differential Methylation Analysis Package.

Test dataset 1 for DMAP (compressed TAR archive, ~650 MB). Reads from chromosome 1 for 2 samples (1 control and 1 disease).

Test dataset 2 for DMAP (compressed TAR archive, ~88 MB). Reads from the first 10 Mb of chromosome 21 for 6 samples (3 control and 3 disease).

DMAP (Differential Methylation Analysis Package) is a suite of tools for large-scale analysis of genomic DNA methylation. Some of the tools were described previously (Aniruddha Chatterjee, Peter A. Stockwell, Euan J. Rodger and Ian M. Morison, "Comparison of alignment software for genome-wide bisulphite sequence data", Nucleic Acids Research 40(10): e79); they have since been updated and are included in DMAP, together with new tools that complete the DNA methylation analysis workflow.

DMAP components can filter and process aligned bisulphite sequence data to generate comprehensive reference methylomes, in different units, for any genome. DMAP can process aligned SAM files from multiple samples to identify reliable, statistically significant differentially methylated regions, and can then relate them to nearby genes and CpG features reasonably quickly. The output is formatted so that bench scientists can analyse the results further without extensive bioinformatics expertise.
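
To illustrate the kind of per-region comparison involved, the Python sketch below pools methylated and unmethylated CpG calls across a region for each sample, applies a minimum-coverage filter, and compares two samples with a two-sided Fisher's exact test. It assumes the per-CpG counts have already been extracted from the aligned SAM files; the function names, the coverage threshold and the choice of test are illustrative assumptions, not a description of DMAP's own options or statistics.

```python
from math import comb
from typing import List, Optional, Tuple


def region_methylation(
    cpg_counts: List[Tuple[int, int]], min_coverage: int = 2
) -> Tuple[int, int, Optional[float]]:
    """Pool per-CpG (methylated, unmethylated) read counts over one region.

    CpGs covered by fewer than `min_coverage` reads are skipped (an
    illustrative threshold, not a DMAP default). Returns the pooled counts
    and the methylated fraction, or None if no CpG passes the filter.
    """
    meth = unmeth = 0
    for m, u in cpg_counts:
        if m + u >= min_coverage:
            meth += m
            unmeth += u
    total = meth + unmeth
    return meth, unmeth, (meth / total if total else None)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are samples, columns are methylated/unmethylated calls. The p-value
    sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical (methylated, unmethylated) counts per CpG in one region.
    control = region_methylation([(8, 2), (0, 1), (5, 5)])
    disease = region_methylation([(1, 9), (2, 6), (0, 4)])
    p = fisher_exact_two_sided(control[0], control[1], disease[0], disease[1])
    print(control, disease, p)
```

With many regions, the resulting p-values would normally also need correction for multiple testing before regions are called differentially methylated.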

DMAP is distributed as scripts and source code in a compressed tar archive, which unpacks to give the complete sources and a Makefile. DMAP has been developed and tested on MacOS X (10.6 and 10.7) using gcc v4.2.1 and on various Linux distributions (RedHat, CentOS, Fedora, Ubuntu), and will compile and run with any suitable C compiler in a 64-bit environment. The test data download unpacks to a directory test_MDS_data containing fastq sequence reads and SAM files from Bismark mapping. A PDF in the download describes these in more detail.

DMAP has been written by Peter Stockwell with considerable input from Aniruddha Chatterjee. We aim to continue development of the tool and its documentation to further facilitate DNA methylation analysis.

 

SSEdit (dmg file, ~11 MB) - Single (Short) Sequence Editor

A MacOS X application for viewing and editing single sequences. Input and output in FASTA, NBRF, Staden and other formats are supported, and some formats can be recognised from the file-name extension. SSEdit displays nucleic acid sequences in double-stranded form, can reverse and complement a sequence, and can translate it to peptide sequence using various genetic codes, including user-created ones. Sequences of arbitrary length can be processed, limited only by the memory available on the computer. Search functions are provided, and printer listings and screen displays can be extensively controlled. External sequences can be inserted, in whole or in part, into the main sequence at the active position. SSEdit v6.08 works with MacOS 10.5 (Leopard) or later and is distributed as a universal application. Documentation is included.
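
The two core sequence operations mentioned here, reverse-complementing and translating with a chosen genetic code, can be sketched in a few lines of Python. This only illustrates the operations and is not SSEdit's implementation: the function names are hypothetical, the built-in table is NCBI translation table 1 (the standard code), and a user-created code is modelled simply as a modified copy of that dictionary.

```python
from itertools import product
from typing import Dict

# NCBI translation table 1 (standard code); codons enumerated with bases in
# T, C, A, G order at each position (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Complement table for plain DNA/RNA letters: U pairs with A, N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a nucleic acid sequence (output written as DNA)."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str, code: Dict[str, str] = STANDARD_CODE, frame: int = 0) -> str:
    """Translate `seq` from offset `frame` (0, 1 or 2) using `code`.

    Codons missing from the table (e.g. containing N) become 'X';
    a trailing partial codon is ignored.
    """
    dna = seq.upper().replace("U", "T")
    return "".join(code.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


# A user-defined variant: TGA reassigned from stop to tryptophan, as in
# several mitochondrial codes.
custom_code = dict(STANDARD_CODE, TGA="W")
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, while `translate("ATGTGA", custom_code)` gives `"MW"`.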

Homed (dmg file, ~11 MB) - HOMologous sequence EDitor

A MacOS X application for displaying and editing multiple sequence
alignments.  A considerable number of MSA formats are supported,
including Nexus, Phylip and Clustal.  Nucleic acid alignments can be
displayed and edited as the corresponding peptide translation, using
any of the standard or user-specified genetic codes.  Furthermore, to
keep nucleic acid alignments on codon boundaries, a set of nucleic
acid sequences can be exported as a peptide set, aligned with any
standard multiple alignment application, and the resulting gapping
then imprinted back onto the original nucleic acid set.  This process
is often needed in phylogenetic work, where a nucleic acid multiple
alignment will not necessarily preserve the codon boundaries of the
encoded peptides.
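
The gap-imprinting step amounts to back-translating the alignment's gapping: each aligned residue is replaced by its original codon and each gap by three gap characters, so every sequence stays in frame. Below is a minimal Python sketch, assuming each nucleotide sequence is ungapped, in frame from its first base, and encodes exactly the aligned residues (for example, with any terminal stop codon removed before translation). The function name is hypothetical and this is not Homed's code.

```python
from typing import List


def imprint_gaps(aligned_peptide: str, nucleotides: str, gap_chars: str = "-.") -> str:
    """Copy the gapping of an aligned peptide back onto its coding sequence.

    Each residue consumes the next codon of `nucleotides`; each gap character
    becomes a three-base gap '---'. Assumes `nucleotides` is ungapped, starts
    in frame, and encodes exactly the non-gap residues (no extra stop codon).
    """
    codons: List[str] = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    residues = sum(1 for ch in aligned_peptide if ch not in gap_chars)
    if residues != len(codons) or len(nucleotides) % 3:
        raise ValueError(
            f"{residues} residues but {len(nucleotides)} nucleotides; "
            "expected exactly 3 nucleotides per residue"
        )
    out: List[str] = []
    next_codon = 0
    for ch in aligned_peptide:
        if ch in gap_chars:
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical peptide alignment produced by an external aligner.
    peptides = ["MA-K", "M-GK"]
    coding = ["ATGGCCAAA", "ATGGGTAAG"]
    for pep, nt in zip(peptides, coding):
        print(imprint_gaps(pep, nt))
```

Because every gap becomes exactly three bases wide, the imprinted nucleotide sequences all have three times the length of the peptide alignment, and codon columns line up across sequences.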

Homed offers various uses of colour, either to show a range of
parameters for different amino acid residues or to highlight specific
residues; various colour control functions are provided for this.
Alternatively, portions of individual sequences or of the complete
alignment can be coloured as blocks to indicate structural or
functional domains, using information contained in text files that
can be saved from the application or created or modified manually.

Homed is not a multiple sequence aligner, but will allow a new
sequence to be manually edited to conform to an existing alignment.
Substantial control is available for the appearance of printer
listings and screen displays with the ability to reorder sequences and
other elements of the display by dragging and dropping.  Interfacing
with the MacOS printing system allows Homed to write listings as PDF
files, complete with any colour features that have been set.

Homed has no formal limits on the number of sequences or their
lengths, although very large sets will take appreciable time to load
and to update after edits.  In practice, the memory available on the
computer imposes an upper limit.  Homed v6.8 works with MacOS 10.5
(Leopard) or later and is distributed as a universal application.
Documentation is included.