Resources from Chris Brown's Group at the University of Otago
Most of the tools listed here have been described in publications, and the abstracts of those publications are shown.
Transterm. A species-centric database of key mRNA regions and features (e.g. stop codon usage, stop signal usage, initiation context).
Transterm: a database to aid the analysis of regulatory sequences in mRNAs (Jacobs et al 2009).
Grant H. Jacobs, Augustine Chen, Stewart G. Stevens, Peter A. Stockwell, Michael A. Black, Warren P. Tate and Chris M. Brown
Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein- and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation, with two major unique components. The first comprises integrated results of the analysis of general features that affect translation (initiation, elongation, termination) for species or strains in GenBank, processed through a standard pipeline. The second comprises curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein-binding sites, particularly those in 3′-untranslated regions (3′-UTRs). For this release the interface has been extensively updated based on user feedback. The data are now accessible by strain rather than species; for example, there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user-defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses.
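To make the kind of per-genome summary described above concrete, the following Python sketch tallies stop codon usage, stop signal usage and initiation context from a few made-up CDS records. It is not Transterm's processing pipeline. It assumes that a "stop signal" is the stop codon plus the next nucleotide, that the initiation context is summarised by the nucleotides at positions -3 and +4 (with the A of ATG at +1), and that each record is a hypothetical (upstream, cds, downstream) tuple.

```python
from collections import Counter

def summarise_cds(records):
    """Tally stop codons, stop signals and initiation context.

    Each record is a hypothetical (upstream, cds, downstream) tuple of DNA
    strings: `upstream` ends just before the start codon, `cds` runs from
    the start codon through the stop codon, and `downstream` begins right
    after the stop codon.
    """
    stop_codons = Counter()
    stop_signals = Counter()   # stop codon + next nucleotide (assumed definition)
    init_context = Counter()   # (nt at -3, nt at +4), with the A of ATG at +1
    for upstream, cds, downstream in records:
        upstream, cds, downstream = upstream.upper(), cds.upper(), downstream.upper()
        stop = cds[-3:]
        stop_codons[stop] += 1
        if downstream:
            stop_signals[stop + downstream[0]] += 1
        if len(upstream) >= 3 and len(cds) >= 4:
            init_context[(upstream[-3], cds[3])] += 1
    return stop_codons, stop_signals, init_context

# Three made-up genes, for illustration only
example = [
    ("GCCACC", "ATGGCTTAA", "GCA"),
    ("AAAAAA", "ATGAAATGA", "TTT"),
    ("GCCGCC", "ATGGTTTAG", ""),
]
stops, signals, context = summarise_cds(example)
print(stops, signals, context)
```

In a real analysis the records would be extracted from GenBank annotations; the tuple layout here is only for illustration.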
Translation Efficiency. Understanding the relationship between proteins and the mRNAs that encode them. Genes under translational control. The poor correlation between protein and RNA levels in human cells.
In Silico Estimation of Translation Efficiency in Human Cell Lines: Potential Evidence for Widespread Translational Control (Stevens and Brown, PLoS ONE 2013)
Stewart G. Stevens, Chris M. Brown
Recently, large-scale transcriptome and proteome datasets for human cells have become available. A striking finding from these studies is that the level of an mRNA typically predicts no more than 40% of the abundance of protein. This correlation represents the overall figure for all genes. We present here a bioinformatic analysis of translation efficiency, the rate at which mRNA is translated into protein. We have analysed those human datasets that include genome-wide mRNA and protein levels determined in the same study. The analysis comprises five distinct human cell lines that together provide comparable data for 8,170 genes. For each gene we have used levels of mRNA and protein combined with protein stability data from the HeLa cell line to estimate translation efficiency. This was possible for 3,990 genes in one or more cell lines and 1,807 genes in all five cell lines. Interestingly, our analysis and modelling show that for many genes this estimated translation efficiency has considerable consistency between cell lines. Some deviations from this consistency likely result from the regulation of protein degradation. Others are likely due to known translational control mechanisms. These findings suggest it will be possible to build improved models for the interpretation of mRNA expression data. The results we present here provide a view of translation efficiency for many genes. We provide an online resource allowing the exploration of translation efficiency in genes of interest within different cell lines (http://bioanalysis.otago.ac.nz/TranslationEfficiency).
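The abstract states that mRNA and protein levels were combined with HeLa protein stability data, but does not give the model. A common way to do this, assumed here and not necessarily the one used in the paper, is a steady-state mass-action model in which synthesis (k_s * M) balances first-order degradation (k_d * P), so that k_s = k_d * P / M with k_d = ln 2 / half-life. The function names and example numbers are hypothetical.

```python
import math

def translation_efficiency(mrna, protein, protein_half_life_h):
    """Per-mRNA protein synthesis rate under an assumed steady-state model.

    Model: dP/dt = k_s * M - k_d * P. Setting dP/dt = 0 gives
    k_s = k_d * P / M, with k_d = ln(2) / protein half-life.
    With copies per cell and hours, k_s is proteins per mRNA per hour.
    """
    if mrna <= 0 or protein_half_life_h <= 0:
        raise ValueError("mRNA level and protein half-life must be positive")
    k_d = math.log(2) / protein_half_life_h   # first-order degradation rate
    return k_d * protein / mrna

def log2_ratio(te_a, te_b):
    """Compare two estimates (e.g. two cell lines) on a log2 scale."""
    return math.log2(te_a / te_b)

# Hypothetical values for one gene in two cell lines (not data from the paper)
te_a = translation_efficiency(mrna=20, protein=50_000, protein_half_life_h=10)
te_b = translation_efficiency(mrna=40, protein=100_000, protein_half_life_h=10)
print(f"TE line A: {te_a:.1f} proteins/mRNA/h; log2(A/B) = {log2_ratio(te_a, te_b):.2f}")
```

Because the same half-life is applied in every cell line (as with a single HeLa stability dataset), differences between lines come only from the protein-to-mRNA ratio under this model.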
CRISPRTarget. Discovery of the functional targets of CRISPR RNA elements in viral (bacteriophage), mobile element or chromosomal DNA. Computational or bioinformatic prediction of CRISPR targets.
More Information: CRISPRTarget: Bioinformatic prediction and analysis of crRNA targets (Biswas et al, 2013)
Ambarish Biswas, Joshua N. Gagnon, Stan J.J. Brouns, Peter C. Fineran and Chris M. Brown
The bacterial and archaeal CRISPR/Cas adaptive immune system targets specific protospacer nucleotide sequences in invading organisms. This requires base pairing between processed CRISPR RNA and the target protospacer. For type I and II CRISPR/Cas systems, protospacer adjacent motifs (PAM) are essential for target recognition, and for type III, mismatches in the flanking sequences are important in the antiviral response. In this study, we examine the properties of each class of CRISPR. We use this information to provide a tool (CRISPRTarget) that predicts the most likely targets of CRISPR RNAs (http://bioanalysis.otago.ac.nz/CRISPRTarget). This can be used to discover targets in newly sequenced genomic or metagenomic data. To test its utility, we discover features and targets of well-characterized Streptococcus thermophilus and Sulfolobus solfataricus type II and III CRISPR/Cas systems. Finally, in Pectobacterium species, we identify new CRISPR targets and propose a model of temperate phage exposure and subsequent inhibition by the type I CRISPR/Cas systems.
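The core idea of protospacer prediction, matching a spacer against both strands of a target and then checking the adjacent motif, can be sketched as an ungapped search with a mismatch limit. This is not CRISPRTarget's scoring scheme: the Hamming-distance search, the function names and the NGG-style PAM regex in the example are assumptions chosen for illustration, and real PAMs and their position differ by CRISPR/Cas type and system.

```python
import re

DNA_COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    return seq.translate(DNA_COMP)[::-1]

def find_protospacers(spacer, target, max_mismatches=2,
                      pam_regex=None, pam_len=3, pam_side="3prime"):
    """Ungapped search for protospacers matching a spacer (as DNA) on both strands.

    Positions are 0-based on the strand scanned ('-' hits are reported in
    reverse-complement coordinates). The flank is read 3' or 5' of the
    protospacer on that same strand and optionally tested against a PAM regex.
    """
    spacer, target = spacer.upper(), target.upper()
    n = len(spacer)
    hits = []
    for strand, seq in (("+", target), ("-", revcomp(target))):
        for i in range(len(seq) - n + 1):
            mismatches = sum(a != b for a, b in zip(spacer, seq[i:i + n]))
            if mismatches > max_mismatches:
                continue
            if pam_side == "3prime":
                flank = seq[i + n:i + n + pam_len]
            else:
                flank = seq[max(0, i - pam_len):i]
            pam_ok = pam_regex is None or (
                len(flank) == pam_len and re.fullmatch(pam_regex, flank) is not None)
            hits.append({"strand": strand, "start": i, "mismatches": mismatches,
                         "flank": flank, "pam_ok": pam_ok})
    return hits

# Hypothetical spacer and target; "[ACGT]GG" is an NGG-style 3' PAM used only as an example
spacer = "ACGTTGCAAGTC"
target = "TTT" + spacer + "TGG" + "AAA"
for hit in find_protospacers(spacer, target, max_mismatches=1, pam_regex="[ACGT]GG"):
    print(hit)
```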
More Information: CRISPRDetect: A flexible algorithm to define CRISPR arrays (2016)
Biswas, A., R. H. Staals, S. E. Morales, P. C. Fineran and C. M. Brown
BACKGROUND: CRISPR (clustered regularly interspaced short palindromic repeats) RNAs provide the specificity for noncoding RNA-guided adaptive immune defence systems in prokaryotes. CRISPR arrays consist of repeat sequences separated by specific spacer sequences. CRISPR arrays have previously been identified in a large proportion of prokaryotic genomes. However, currently available detection algorithms do not utilise recently discovered features regarding CRISPR loci. RESULTS: We have developed a new approach to automatically detect, predict and interactively refine CRISPR arrays. It is available as a web program and a command-line program from bioanalysis.otago.ac.nz/CRISPRDetect. CRISPRDetect discovers putative arrays, extends the array by detecting additional variant repeats, corrects the direction of arrays, refines the repeat/spacer boundaries, and annotates different types of sequence variations (e.g. insertion/deletion) in near-identical repeats. Due to these features, CRISPRDetect has significant advantages when compared to existing identification tools. As well as providing further support for small, medium and large repeats, CRISPRDetect identified a class of arrays with 'extra-large' repeats in bacteria (repeats 44-50 nt). The CRISPRDetect output is integrated with other analysis tools. Notably, the predicted spacers can be directly utilised by CRISPRTarget to predict targets. CONCLUSION: CRISPRDetect enables more accurate detection of arrays and spacers, and its gff output is suitable for inclusion in genome annotation pipelines and visualisation. It has been used to analyse all complete bacterial and archaeal reference genomes.
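A much simplified version of one step that CRISPRDetect performs, finding near-identical repeat copies and reading off the spacers between them, is sketched below. It assumes the repeat consensus is already known and uses a greedy, ungapped mismatch count; it does not extend arrays, correct direction, refine boundaries or annotate indels as CRISPRDetect does. The function names, the spacer length limits and the repeat, spacer and array sequences are all hypothetical.

```python
def find_repeat_copies(seq, repeat, max_mismatches=2):
    """Greedy left-to-right scan for non-overlapping near-identical repeat copies."""
    n = len(repeat)
    copies = []
    i = 0
    while i <= len(seq) - n:
        mismatches = sum(a != b for a, b in zip(repeat, seq[i:i + n]))
        if mismatches <= max_mismatches:
            copies.append((i, mismatches))
            i += n          # jump past this copy so copies cannot overlap
        else:
            i += 1
    return copies

def split_array(seq, repeat, max_mismatches=2, min_spacer=20, max_spacer=60):
    """Return repeat copies and the spacers between consecutive copies."""
    seq, repeat = seq.upper(), repeat.upper()
    copies = find_repeat_copies(seq, repeat, max_mismatches)
    spacers = []
    for (start1, _), (start2, _) in zip(copies, copies[1:]):
        spacer = seq[start1 + len(repeat):start2]
        if min_spacer <= len(spacer) <= max_spacer:
            spacers.append(spacer)
    return copies, spacers

# Made-up array: repeat, spacer, variant repeat (1 substitution), spacer, repeat
repeat = "GTTGAACTCCATGCAAGGTC"
variant = "GTTGAACTCCGTGCAAGGTC"
s1, s2 = "ACGTACGTACGTACGTACGTAC", "TTGACCAGTTGACCAGTTGACCAG"
array = "CCCC" + repeat + s1 + variant + s2 + repeat + "GGGG"
copies, spacers = split_array(array, repeat)
print(copies)
print(spacers)
```

The extracted spacers are the kind of input that, as the abstract notes, can be passed on for target prediction.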
Scan for Motifs. Computational models of experimentally determined translational control elements, which can be used to search your own sequences. MicroRNA and RNA-binding protein (RBP) sites are searched simultaneously.
More Information: Scan for Motifs: a webserver for the analysis of post-transcriptional regulatory elements in the 3' untranslated regions (3' UTRs) of mRNAs (Biswas et al 2014)
Biswas, A. and C. M. Brown
Scan for Motifs (SFM) simplifies the process of identifying a wide range of regulatory elements on alignments of vertebrate 3'UTRs. SFM includes identification of both RNA Binding Protein (RBP) sites and targets of miRNAs. In addition to searching pre-computed alignments, the tool provides users the flexibility to search their own sequences or alignments. The regulatory elements may be filtered by expected value cutoffs and are cross-referenced back to their respective sources and literature. The output is an interactive graphical representation, highlighting potential regulatory elements and overlaps between them. The output also provides simple statistics and links to related resources for complementary analyses. The overall process is intuitive and fast. As SFM is a free web-application, the user does not need to install any software or databases.
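As a minimal illustration of scanning a 3' UTR for both RBP-type motifs and miRNA targets, the sketch below finds an AU-rich element pentamer (AUUUA) and 7mer-m8 seed matches (the reverse complement of miRNA nucleotides 2-8). It is not SFM's implementation and ignores alignments, conservation and the expected-value filtering described above; the motif dictionary, function names and UTR fragment are hypothetical, while the miRNA used is the mature let-7a sequence.

```python
import re

RNA_COMP = str.maketrans("ACGU", "UGCA")

def seed_site(mirna):
    """7mer-m8 seed match: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(RNA_COMP)[::-1]

def scan_utr(utr, mirnas, motifs):
    """Return sorted (position, name) hits; lookahead regexes allow overlapping hits."""
    utr = utr.upper().replace("T", "U")
    hits = []
    for name, pattern in motifs.items():
        hits += [(m.start(), name) for m in re.finditer(f"(?={pattern})", utr)]
    for name, seq in mirnas.items():
        site = re.escape(seed_site(seq))
        hits += [(m.start(), name) for m in re.finditer(f"(?={site})", utr)]
    return sorted(hits)

motifs = {"ARE": "AUUUA"}                       # AU-rich element pentamer
mirnas = {"let-7a": "UGAGGUAGUAGGUUGUAUAGUU"}   # mature let-7a
utr = "GGAUUUAUUUAGGCUACCUCAA"                  # made-up 3' UTR fragment
print(scan_utr(utr, mirnas, motifs))
```

The overlapping AUUUA copies at positions 2 and 6 are both reported because the lookahead does not consume characters.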
CisRegRNA. Computational models (covariance models/Rfam) of structured RNA cis-regulatory elements.
More Information: Global or local? Predicting secondary structure and accessibility in mRNAs (Lange et al 2012)
Lange, S. J., D. Maticzka, M. Mohl, J. N. Gagnon, C. M. Brown and R. Backofen
Determining the structural properties of mRNA is key to understanding vital post-transcriptional processes. As experimental data on mRNA structure are scarce, accurate structure prediction is required to characterize RNA regulatory mechanisms. Although various structure prediction approaches are available, it is often unclear which to choose and how to set their parameters. Furthermore, no standard measure to compare predictions of local structure exists. We assessed the performance of different methods using two types of data: transcriptome-wide enzymatic probing information and a large, curated set of cis-regulatory elements. To compare the approaches, we introduced structure accuracy, a measure that is applicable to both global and local methods. Our results showed that local folding was more accurate than the classic global approach. We investigated how the locality parameters, maximum base pair span and window size, influenced the prediction performance. A span of 150 provided a reasonable balance between maximizing the number of accurately predicted base pairs, while minimizing effects of incorrect long-range predictions. We characterized the error at artificial sequence ends, which we reduced by setting the window size sufficiently greater than the maximum span. Our method, LocalFold, diminished all border effects and produced the most robust performance.
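Why a window size well above the maximum base-pair span reduces border effects can be seen from simple bookkeeping: in sliding-window folding (the approach behind the window and span parameters of ViennaRNA's RNAplfold), a base pair (i, j) can only be predicted in windows that contain both positions. The sketch below counts those windows; it does no folding, LocalFold's own implementation is not shown, and the sequence length and window sizes are arbitrary choices, with only the span of 150 taken from the text.

```python
def windows_covering(i, j, n, W):
    """Count length-W windows (0-based starts 0..n-W) containing positions i < j.

    Assumes n >= W. A pair can be formed in a window only if the window
    starts at or before i and ends at or after j.
    """
    first_start = max(0, j - W + 1)
    last_start = min(i, n - W)
    return max(0, last_start - first_start + 1)

n, L = 1000, 150   # sequence length (arbitrary) and maximum base-pair span
for W in (160, 240):
    interior = windows_covering(500, 500 + L, n, W)   # pair at the maximum span
    edge = windows_covering(0, L, n, W)               # same span at the 5' end
    print(f"W={W}: interior pair in {interior} windows, 5'-end pair in {edge}")
```

A pair at the maximum span is seen in W - L windows, so raising W relative to L gives it more window contexts in which it is not at an artificial window end; a pair at the true sequence end is still seen in only one window.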
CisRNA-SVM. Genome-wide predictions of novel structured RNA cis-regulatory elements in human 3' UTRs.
More Information: Computational identification of new structured cis-regulatory elements in the 3'-untranslated region of human protein coding genes (Chen and Brown 2012b)
Chen, X. S. and C. M. Brown
Messenger ribonucleic acids (RNAs) contain a large number of cis-regulatory RNA elements that function in many types of post-transcriptional regulation. These cis-regulatory elements are often characterized by conserved structures and/or sequences. Although some classes are well known, given the wide range of RNA-interacting proteins in eukaryotes, it is likely that many new classes of cis-regulatory elements are yet to be discovered. One approach to this is to use computational methods, which have the advantage of being able to analyse genomic data, particularly comparative data, on a large scale. In this study, a set of structural discovery algorithms was applied, followed by support vector machine (SVM) classification. We trained a new classification model (CisRNA-SVM) on a set of known structured cis-regulatory elements from 3'-untranslated regions (UTRs) and successfully distinguished these, and groups of cis-regulatory elements it had not been trained on, from control genomic and shuffled sequences. The new method outperformed previous methods in classification of cis-regulatory RNA elements. This model was then used to predict new elements from cross-species conserved regions of human 3'-UTRs. Clustering of these elements identified new classes of potential cis-regulatory elements. The model, training and testing sets and novel human predictions are available at: http://mRNA.otago.ac.nz/CisRNA-SVM.
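The abstract does not list the features used, so the sketch below shows only the generic scaffolding of such a classifier: turning sequences into fixed-length feature vectors and generating shuffled negative controls. The dinucleotide-frequency features, the mononucleotide shuffle, the function names and the example sequence are assumptions; the abstract indicates that CisRNA-SVM classified the output of structural discovery algorithms, and shuffles that preserve dinucleotide composition are often preferred as controls for structure-related signals. The resulting vectors could be passed to any SVM implementation.

```python
import random
from collections import Counter
from itertools import product

def kmer_features(seq, k=2):
    """Fixed-order vector of k-mer frequencies over ACGU (hypothetical features)."""
    seq = seq.upper().replace("T", "U")
    alphabet = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(1, len(seq) - k + 1)
    return [counts[kmer] / total for kmer in alphabet]

def shuffled_control(seq, seed=None):
    """Mononucleotide shuffle: keeps base composition, destroys order and structure."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

element = "GGCUCAGUGCCUAGGAGCC"   # made-up sequence standing in for a known element
control = shuffled_control(element, seed=1)
positives = [kmer_features(element)]
negatives = [kmer_features(control)]
print(len(positives[0]), control)
```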
Iron Responsive Elements. Genome-wide predictions in human mRNAs (Stevens et al 2011).
Disease-associated 3' UTR variants (UTRPathDB). In preparation; unpublished.
More Information: HBVRegDB: annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences (Panjaworayan et al 2007)
Panjaworayan, N., S. K. Roessner, A. E. Firth and C. M. Brown
BACKGROUND: The many Hepadnaviridae sequences available have widely varied functional annotation. The genomes are very compact (approximately 3.2 kb) but contain multiple layers of functional regulatory elements in addition to coding regions. Key regions are subject to purifying selection, as mutations in these regions will produce non-functional viruses. RESULTS: These genomic sequences have been organized into a structured database to facilitate research at the molecular level. HBVRegDB is a comparative genomic analysis tool with an integrated underlying sequence database. The database contains genomic sequence data from representative viruses. In addition to INSDC and RefSeq annotation, HBVRegDB also contains expert and systematically calculated annotations (e.g. promoters) and comparative genome analysis results (e.g. blastn, tblastx). It also contains analyses based on curated HBV alignments. Information about conserved regions - including primary conservation (e.g. CDS-Plotcon) and RNA secondary structure predictions (e.g. Alidot) - is integrated into the database. A large amount of data is graphically presented using the GBrowse (Generic Genome Browser) adapted for analysis of viral genomes. Flexible query access is provided based on any annotated genomic feature. Novel regulatory motifs can be found by analysing the annotated sequences. CONCLUSION: HBVRegDB serves as a knowledge database and as a comparative genomic analysis tool for molecular biologists investigating HBV. It is publicly available and complementary to other viral and HBV focused datasets and tools http://hbvregdb.otago.ac.nz. The availability of multiple and highly annotated sequences of viral genomes in one database combined with comparative analysis tools facilitates detection of novel genomic elements.
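Conservation information like that integrated into HBVRegDB starts from a per-column score over an alignment. The sketch below computes a simple identity score (the fraction of sequences carrying the most common residue, with gaps counting against conservation) and smooths it with a sliding window. This is simpler than the similarity scoring used by EMBOSS plotcon, and the function names and toy alignment are hypothetical.

```python
from collections import Counter

def column_identity(alignment):
    """Per-column fraction of sequences carrying the most common residue.

    Gaps ('-' or '.') are excluded from the vote but still count in the
    denominator, so gappy columns score as less conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c.upper() for c in column if c not in "-."]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

def window_mean(scores, window=5):
    """Centred moving average; windows are truncated at the alignment ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

toy_alignment = ["ACGTAC", "ACGTTC", "ACG-AC"]   # hypothetical aligned fragments
scores = column_identity(toy_alignment)
print([round(s, 2) for s in scores])
print([round(s, 2) for s in window_mean(scores, window=3)])
```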
Viral Division of Transterm. CDS sequences, compositional statistics, codon usage and codon biases in the viral division of GenBank, used for novel virus discovery and annotation.
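Codon usage and bias statistics of this kind can be computed from CDS sequences by counting in-frame codons and converting the counts to relative synonymous codon usage (RSCU: each codon's count divided by the mean count of the codons for the same amino acid). The sketch assumes the standard genetic code (NCBI translation table 1); the function names and example sequences are hypothetical, and this is not the Transterm viral division pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codon_counts(cds_list):
    """Count in-frame codons, ignoring a trailing partial codon and ambiguous bases."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts

def rscu(counts):
    """RSCU = observed count / mean count over codons for the same amino acid."""
    by_aa = defaultdict(list)
    for codon, aa in CODE.items():
        by_aa[aa].append(codon)
    values = {}
    for codons in by_aa.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue   # amino acid not observed; RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values

example_cds = ["ATGGCTGCTGCATAA", "ATGGCCTGA"]   # made-up coding sequences
counts = codon_counts(example_cds)
print({c: round(v, 2) for c, v in rscu(counts).items() if c.startswith("GC")})
```

An RSCU of 1.0 means a codon is used exactly as often as expected under equal synonymous usage; values above or below 1.0 indicate bias.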
HBV PRE. Computational models of the HBV post-transcriptional regulatory element (PRE) (Panjaworayan et al 2007).
More Information: HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences
Nattanan Panjaworayan, Stephan K Roessner, Andrew E Firth and Chris M Brown
© 2012-2016, University of Otago, Dunedin, New Zealand
Dr Chris Brown, Biochemistry and Genetics Otago