RATT: rapid annotation transfer tool aka Big Brother Annotator
RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.
It was first developed to transfer annotations between different genome assembly versions. However, it can also transfer annotations between strains and even between species, for example from Plasmodium chabaudi onto P. berghei, or from Salmonella enterica onto Salmonella virchow. RATT can transfer any entries present on a reference sequence, such as systematic IDs or an annotator’s notes; such information would be lost in a de novo annotation. Furthermore, RATT checks whether gene models have changed between the two sequences and can correct changed start and stop codons as well as frameshifts.
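To make the idea of a gene-model check concrete, here is a minimal Python sketch that inspects one transferred CDS for a start codon, a terminal stop codon, internal stop codons and a length that is not a multiple of three (a typical frameshift symptom). It is not RATT's implementation: `check_cds` is a hypothetical helper, it assumes the standard genetic code with ATG as the only start codon, and it expects the already spliced CDS sequence.

```python
# Hypothetical helper, NOT RATT's code: assumes the standard genetic code
# and ATG as the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def check_cds(seq: str) -> list[str]:
    """Return a list of problems found in a spliced CDS nucleotide sequence."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        # a length that is not a multiple of 3 suggests a frameshift
        problems.append("length not a multiple of 3 (possible frameshift)")
    # split into complete codons only, ignoring any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != START_CODON:
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    # a stop codon before the last codon would truncate the protein
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems
```

For example, `check_cds("ATGTAAAAATAG")` reports an internal stop codon. RATT itself goes further and tries to repair such models; this sketch only detects them.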
Please visit the http://ratt.sourceforge.net page for examples.
Algorithm-based automatic contiguation of assembled sequences (ABACAS)
ABACAS is intended to rapidly contiguate (align, order and orientate), visualize and design primers to close gaps on shotgun-assembled contigs based on a reference sequence. It uses MUMmer to find alignment positions and identify syntenic regions between the assembly contigs and the reference. The output is then processed to generate a pseudomolecule, taking overlapping contigs and gaps into account. MUMmer’s alignment programs, Nucmer and Promer, are used, followed by the ‘delta-filter’ utility. Users can also run tblastx on contigs that are not used to generate the pseudomolecule.
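The core idea of turning alignment coordinates into a pseudomolecule can be sketched in a few lines of Python. This is a hypothetical illustration, not ABACAS code: it assumes each contig has already been reduced to one best hit (the hypothetical `Hit` type, with reference start/end and strand) from the MUMmer output, places the contigs in reference order, reverse-complements those aligned to the reverse strand, and fills gaps with Ns. The fixed-length spacer used for overlapping or abutting contigs (`overlap_spacer`) is an assumption of this sketch.

```python
from dataclasses import dataclass


# Hypothetical input type: one best hit per contig, derived from MUMmer output.
@dataclass
class Hit:
    contig: str
    ref_start: int  # 1-based start of the alignment on the reference
    ref_end: int    # 1-based inclusive end on the reference
    forward: bool   # False if the contig aligns to the reverse strand


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(hits, contigs, overlap_spacer=100):
    """Order and orient contigs by reference position and join them with Ns."""
    parts = []
    prev_end = 0
    for hit in sorted(hits, key=lambda h: h.ref_start):
        seq = contigs[hit.contig]
        if not hit.forward:
            seq = revcomp(seq)  # orientate to match the reference strand
        if parts:
            gap = hit.ref_start - prev_end - 1
            # positive gap: as many Ns as the gap on the reference;
            # overlap or abutment: fixed spacer (assumption of this sketch)
            parts.append("N" * (gap if gap > 0 else overlap_spacer))
        parts.append(seq)
        prev_end = max(prev_end, hit.ref_end)
    return "".join(parts)
```

Because gap lengths are taken from reference coordinates, the result keeps roughly the layout of the reference. Reference regions before the first contig and after the last one are not padded in this sketch.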
abacas.pl -r <reference file: single fasta> -q <query sequence file: fasta> -p <nucmer/promer> [Options]
IMAGE (Iterative Mapping and Assembly for Gap Elimination) is a top-secret pipeline to improve existing capillary/454 genome assemblies using Illumina reads. But it is not a secret any more.
RATT output files
There are several types of output file: statistics that report differences, files that refer to the query and files that refer to the reference. All file names start with the resultName prefix specified by the user when starting RATT. Report files end with .csv and can be imported into spreadsheet programs. Files ending with .gff or .embl can be loaded into Artemis or ACT, see below. Files whose names contain the name of a reference replicon are relative to the reference; files whose names contain the name of a query replicon are relative to the query sequence.
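The naming rules above can be expressed as a small classifier. This Python sketch is illustrative only: `classify_output` is a hypothetical helper, not part of RATT, and it assumes that replicon names contain no dots and that reference and query replicons have distinct names.

```python
import os


def classify_output(path, prefix, ref_replicons, query_replicons):
    """Hypothetical helper: return (side, kind) for a RATT output file name.

    side: "reference", "query" or "unknown"
    kind: "spreadsheet" (.csv), "artemis" (.gff/.embl) or "other"
    """
    name = os.path.basename(path)  # drop Reference/ or Query/ directories
    if not name.startswith(prefix + "."):
        return ("unknown", "other")
    if name.endswith(".csv"):
        kind = "spreadsheet"
    elif name.endswith((".gff", ".embl")):
        kind = "artemis"
    else:
        kind = "other"
    # assumption: the replicon name is the dot-free field right after the prefix
    replicon = name[len(prefix) + 1:].split(".")[0]
    if replicon in ref_replicons:
        side = "reference"
    elif replicon in query_replicons:
        side = "query"
    else:
        side = "unknown"
    return (side, kind)
```

For example, with prefix `run1`, a reference replicon `refA` and a query replicon `chr1`, the file `run1.chr1.Final.embl` is classified as a query file that can be opened in Artemis.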
The first report is printed while the program is running. It tells the user how many regions of the reference are syntenic with the query and vice versa, and how many tags were transferred and how many were not. Tags include features such as ncRNA, UTR, gap tags, repetitive regions and CDS.
ResultName-prefix.replicon.report.csv – Reports how many gene models were wrong after the transfer and how they could be corrected.
Files for the reference:
ResultName-prefix.replicon.NOTTransfered.embl – Annotations that could not be transferred. These can be whole genes or just exons.
Reference/ResultName-prefix.replicon.Mutations.gff – This file contains all the differences of the query compared to the reference. It also shows the regions that are not syntenic between the two genomes, which can be due to insertions/deletions, low similarity or 100% identical repeats. Important: the annotation of these regions cannot be transferred!
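Because the Mutations.gff files are plain GFF, a quick per-type summary shows how many differences were found and how much of a genome falls into non-syntenic regions. This Python sketch (`summarize_gff` is a hypothetical helper) assumes the standard tab-separated GFF columns, with the feature type in column 3 and 1-based inclusive start and end in columns 4 and 5. It does not know RATT's actual feature type names.

```python
from collections import Counter


def summarize_gff(lines):
    """Hypothetical helper: count features and total bases per GFF feature type."""
    counts, bases = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line (e.g. sequence after ##FASTA)
        ftype, start, end = fields[2], int(fields[3]), int(fields[4])
        counts[ftype] += 1
        bases[ftype] += end - start + 1  # GFF coordinates are 1-based, inclusive
    return counts, bases
```

To use it on a real file, pass an open file handle, e.g. `with open(path) as fh: counts, bases = summarize_gff(fh)`.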
Files for the query:
ResultName-prefix.replicon.embl – The uncorrected annotations transferred from the reference onto the query.
ResultName-prefix.replicon.Final.embl – These are the corrected annotations for the query.
ResultName-prefix.replicon.report.gff – An important file, as it shows where RATT has corrected CDS models and where errors remain. This includes corrections/errors in start/stop codons, splice sites, frameshifts and joined exons.
Query/ResultName-prefix.replicon.Mutations.gff – This file contains all the differences between the reference and the query. In addition, it shows regions that are not syntenic between the two genomes, which can be due to insertions/deletions, low similarity or 100% identical repeats. Important: the annotation of these regions will not be transferred! For these regions of the query, the annotation must be determined with other tools.
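One rough way to see which gene models the correction step changed is to compare feature locations in the uncorrected ResultName-prefix.replicon.embl with those in ResultName-prefix.replicon.Final.embl. The Python sketch below uses hypothetical helpers (`embl_features`, `changed_models`), not part of RATT. It reads only the feature key and location from EMBL `FT` lines, including locations continued onto following lines, and ignores qualifiers, so it is a quick check rather than a full EMBL parser.

```python
def embl_features(lines):
    """Hypothetical helper: list (key, location) pairs from EMBL FT lines."""
    features = []
    in_location = False
    for line in lines:
        if not line.startswith("FT"):
            in_location = False
            continue
        body = line[5:].rstrip("\n")
        if body[:1] not in (" ", ""):
            # new feature: key starts at column 6, location follows it
            key, _, loc = body.partition(" ")
            features.append([key, loc.strip()])
            in_location = True
        elif body.strip().startswith("/"):
            in_location = False  # qualifiers follow the location
        elif in_location:
            features[-1][1] += body.strip()  # location continued on next line
    return [tuple(f) for f in features]


def changed_models(before, after):
    """Features found only in the first list, and only in the second."""
    b, a = set(embl_features(before)), set(embl_features(after))
    return sorted(b - a), sorted(a - b)
```

Features with identical locations in both files cancel out, so only moved or reshaped models remain. Identical duplicate features collapse into one entry, so counts should be checked separately if they matter.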
Sequencing tools for genomes (<500M)