Name: Dr. Ruhul Amin
Affiliation: Assistant Professor, Computer and Information Sciences Department, Fordham University.
Date: 16/02/2022
Title: Homology Based Sequence Annotation Algorithms
Abstract: Bioinformatics research centers on analyzing and interpreting biological sequences, and interpreting a sequence's biological function begins with annotation. However, published genomes are very uneven in both sequencing and annotation quality, and both decline sharply in the long tail of non-model organisms. In this talk, I will present methods for identifying large numbers of prokaryotic sequence annotation errors in public databases and demonstrate that homology and pattern-matching techniques can correct them. In summary, we have re-annotated 12,495 16S rRNA 3' ends, increasing the number of prokaryotes with 16S rRNAs containing anti-Shine-Dalgarno (antiSD) sequences from 8,153 to 20,648 and the number of organisms known to lack an antiSD from 15 to 128. Following this discussion, I will present DeepAnnotator, a deep learning method for genome annotation at large scale. DeepAnnotator uses a recurrent neural network with long short-term memory (LSTM) to predict the start, stop, and coding sequences of a gene, then combines those scores in a downstream algorithm to annotate genome sequences. DeepAnnotator establishes a generalized computational approach for genome annotation using deep learning and achieves an F-score of 94%.
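
As a rough illustration of the pattern-matching idea mentioned in the abstract, the sketch below checks whether the 3' end of a 16S rRNA sequence contains an antiSD core motif. It is not the speaker's method: the function name `find_anti_sd`, the default motif `CCTCC` (the commonly cited antiSD core, written as DNA), and the 20-nucleotide search window are all assumptions chosen for the example.

```python
from typing import Optional


def find_anti_sd(rrna_16s: str, motif: str = "CCTCC", window: int = 20) -> Optional[int]:
    """Return the 0-based index of the last `motif` occurrence that lies
    entirely within the final `window` nucleotides, or None if absent.

    Hypothetical helper: motif and window defaults are illustrative only.
    """
    # Normalize case and accept RNA input by mapping U to T.
    seq = rrna_16s.upper().replace("U", "T")
    # Restrict the search to the 3' end of the sequence.
    start = max(0, len(seq) - window)
    pos = seq.rfind(motif, start)
    return pos if pos != -1 else None


# Toy sequences (made up for the example, not real annotations).
complete = "ACGT" * 10 + "GATCACCTCCTTA"   # 3' end present
truncated = "ACGT" * 10 + "GATCA"          # 3' end cut short

has_motif = find_anti_sd(complete) is not None
missing_motif = find_anti_sd(truncated) is None
```

A sequence whose annotated 3' end is truncated before the motif would be flagged by a check of this kind as a candidate for re-annotation; deciding whether the motif is genuinely absent in the organism would need additional evidence, such as homology to related genomes.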