Interactive visual tools for multiple sequence alignments

*Click the Title above to view complete article on https://www.nation.com.pk/.

2014-12-07T01:35:09+05:00


Muhammad Tariq Pervez - Protein multiple sequence alignment is a well-studied approach to identifying evolutionarily related positions in a set of amino acid sequences. Although the protein alignment problem has been investigated for decades, several recent studies have demonstrated significant progress in improving the accuracy and scalability of multiple and pairwise alignment methods. A multiple sequence alignment (MSA) is normally an arrangement of three or more biological sequences (protein or nucleic acid) of similar length. The output reveals homology and the evolutionary relationships between the sequences studied.
All characteristics are subject to changes in the arrangement of components such as protein, carbon, hydrogen and potassium. Seeing this need, I carried out research on multiple sequence alignments (MSAs). MSAs are an essential first step for a number of computational approaches such as protein secondary structure and function prediction, phylogeny inference and many other common tasks in sequence analysis. Several software tools for generating multiple sequence alignments are available worldwide, but none of them is suitable for all situations. Consequently, to obtain a true alignment, alignments must be inspected and adjusted by hand, which is very laborious. Handling large alignments is another problem in bioinformatics. Many popular alignment editors such as Jalview, STRAP, SeaView, PFAAT, MEGA, CINEMA and Base-By-Base are available. All of these tools either do not support big alignments or lack user-friendly editing features. Jalview and STRAP do not work when the size of an alignment exceeds 30.01 MB. CINEMA and Base-By-Base are good for alignments with 500 sequences, but they hang if the size of an alignment exceeds 3 MB. SeaView and MEGA can load big alignments, but their editing features involve multiple steps. Many software tools provide graphical user interfaces for MSA reconstruction tools, but they do not allow loading multiple sequence files at the same time. MSA comparison tools such as SuiteMSA can compare multiple MSAs directly, but SuiteMSA does not support an alignment comprising more than 1000 sequences. Multiple MSA formats exist, of which FASTA is the most popular, and currently there is no tool that can convert an alignment of unlimited size into FASTA format. Many tools for calculating an identity matrix, such as MatGAT, exist, but they do not support big alignments.
Therefore, there is a pressing need for software that can display, process and analyze very big alignments in a very short time. The Institute of Biochemistry & Biotechnology, University of Veterinary and Animal Sciences, Lahore (IBBT-UVAS) took the lead in developing software that lets the user view and analyze an alignment with tens of thousands of sequences in very little time.
IVisTMSA is a software package of seven interactive visual tools for multiple sequence alignments. It is written in the Java programming language. The main feature of this software is its ability to manipulate alignments with hundreds of thousands of sequences. MSApad is an editing and analysis tool for multiple sequence alignments. It can load alignments 409% bigger than those handled by Jalview, STRAP, CINEMA and Base-By-Base. It uses a divide and conquer approach (implemented through Java threads) for efficient computation of the consensus and conserved sequences, the distance matrix for the phylogenetic tree and the identity of sequences. It also provides several unique editing features: a user can insert a sequence at any place in an alignment and can edit a single residue without opening a new interface. MSA comparator allows the user to compute the sum-of-pairs score and column score of alignments with several thousand sequences in a very short time. It is 5200% more efficient than the program written in C by the BAliBASE developers. It also allows the user to see where the reference and test alignments have conserved regions. MSA reconstruction tool provides graphical user interfaces for Clustal Omega, ClustalW2, MAFFT, MUSCLE and the BioJava implementation of the Smith-Waterman algorithm. It can also load several sequence files simultaneously and then align them one by one. FASTA generator converts alignments of unlimited size in ClustalW, MSF, Phylip, PIR, GDE and Nexus formats into FASTA format. MSA ID calculator can calculate the identity matrix of more than 11000 sequences with a sequence length of 2696 base pairs in less than 100 seconds. The Tree and Distance Matrix calculation tools generate a phylogenetic tree and a distance matrix respectively, using neighbor joining with % identity and the BLOSUM62 matrix.
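To make the two scores computed by MSA comparator more concrete, here is a minimal Java sketch. It is not the IVisTMSA or BAliBASE code. It assumes one common definition: the sum-of-pairs score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the column score is the fraction of reference columns whose residues all share a single column in the test alignment. The class and method names are hypothetical, gaps are assumed to be written as '-', and both alignments are assumed to hold the same sequences in the same order.

```java
/** Hypothetical sketch of sum-of-pairs and column scores; not IVisTMSA source code. */
final class AlignmentScoreSketch {

    /** colOf[s][k] is the column that holds the k-th residue of sequence s. */
    static int[][] residueColumns(String[] aln) {
        int[][] colOf = new int[aln.length][];
        for (int s = 0; s < aln.length; s++) {
            int n = 0;
            for (int c = 0; c < aln[s].length(); c++) {
                if (aln[s].charAt(c) != '-') n++;
            }
            colOf[s] = new int[n];
            int k = 0;
            for (int c = 0; c < aln[s].length(); c++) {
                if (aln[s].charAt(c) != '-') colOf[s][k++] = c;
            }
        }
        return colOf;
    }

    /** Returns {sumOfPairsScore, columnScore}, each between 0 and 1. */
    static double[] score(String[] reference, String[] test) {
        int[][] testCol = residueColumns(test);
        int nSeq = reference.length;
        int[] next = new int[nSeq]; // index of the next unread residue per sequence
        long refPairs = 0, sharedPairs = 0;
        int refColumns = 0, sharedColumns = 0;
        for (int c = 0; c < reference[0].length(); c++) {
            // Test-alignment column of each residue in reference column c (-1 for a gap).
            int[] cols = new int[nSeq];
            for (int s = 0; s < nSeq; s++) {
                cols[s] = reference[s].charAt(c) == '-' ? -1 : testCol[s][next[s]++];
            }
            long pairs = 0, shared = 0;
            for (int i = 0; i < nSeq; i++) {
                if (cols[i] < 0) continue;
                for (int j = i + 1; j < nSeq; j++) {
                    if (cols[j] < 0) continue;
                    pairs++;
                    if (cols[i] == cols[j]) shared++;
                }
            }
            if (pairs == 0) continue; // columns without a residue pair are skipped
            refPairs += pairs;
            sharedPairs += shared;
            refColumns++;
            if (shared == pairs) sharedColumns++;
        }
        return new double[] {
            refPairs == 0 ? 0.0 : (double) sharedPairs / refPairs,
            refColumns == 0 ? 0.0 : (double) sharedColumns / refColumns
        };
    }

    public static void main(String[] args) {
        String[] reference = {"AC-GT", "ACAGT", "A--GT"};
        String[] test = {"ACG-T", "ACAGT", "A-G-T"};
        double[] r = score(reference, test);
        System.out.println("sum-of-pairs score = " + r[0] + ", column score = " + r[1]);
    }
}
```

This version compares every pair of sequences in every column, so it is only meant to show the definitions. A tool aimed at thousands of sequences would need a faster counting scheme.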
We claim that MSApad, MSA comparator, FASTA generator and MSA ID calculator should process extra-large alignments if they are run on a machine with higher specifications than the one described in this article.
All tools of IVisTMSA were written in the Java programming language using NetBeans IDE (Integrated Development Environment) 7.4. XML was used to save the state of work performed in MSApad and MSA comparator. BioJava was used to embed Jmol in IVisTMSA. All tools were developed on a Dell Vostro 1510 running MS Windows 7 Professional with 3 GB of RAM and a 2.0 GHz Intel processor.
Most of the tools of IVisTMSA use the divide and conquer (DnC) approach to perform efficient computations on MSAs. The DnC approach is implemented using the multithreading support of the Java programming language. It divides an alignment horizontally into sub-alignments, and a Java thread is created for each sub-alignment. All threads return their results to the main thread, which computes the final value. MSApad uses the DnC approach to compute the consensus, the conserved sequence(s) and the distance matrix used to construct the phylogenetic tree.
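The idea described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the MSApad implementation: the class name, method names and thread-pool usage are hypothetical, residues are assumed to be ASCII characters, all rows are assumed to have equal length, and the consensus is taken to be the most frequent symbol in each column. The alignment is split into blocks of rows, each block is counted in its own task, and the calling thread merges the counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical divide-and-conquer consensus sketch; not IVisTMSA source code. */
final class ConsensusSketch {

    /** Counts symbols per column for rows [from, to) of the alignment. */
    static int[][] countBlock(List<String> rows, int from, int to, int width) {
        int[][] counts = new int[width][128]; // assumes ASCII symbols only
        for (int r = from; r < to; r++) {
            String row = rows.get(r);
            for (int c = 0; c < width; c++) {
                counts[c][row.charAt(c)]++;
            }
        }
        return counts;
    }

    /** Splits the rows into blocks, counts each block in a task, then merges. */
    static String consensus(List<String> rows, int threads) {
        int width = rows.get(0).length();
        int block = (rows.size() + threads - 1) / threads; // rows per sub-alignment
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<int[][]>> parts = new ArrayList<>();
            for (int from = 0; from < rows.size(); from += block) {
                final int start = from;
                final int end = Math.min(rows.size(), from + block);
                Callable<int[][]> task = () -> countBlock(rows, start, end, width);
                parts.add(pool.submit(task));
            }
            // The calling thread combines the partial counts into the final value.
            int[][] total = new int[width][128];
            for (Future<int[][]> part : parts) {
                int[][] counts = part.get();
                for (int c = 0; c < width; c++) {
                    for (int s = 0; s < 128; s++) {
                        total[c][s] += counts[c][s];
                    }
                }
            }
            StringBuilder sb = new StringBuilder(width);
            for (int c = 0; c < width; c++) {
                int best = 0; // ties go to the symbol with the lowest character code
                for (int s = 1; s < 128; s++) {
                    if (total[c][s] > total[c][best]) best = s;
                }
                sb.append((char) best);
            }
            return sb.toString();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("consensus computation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ACGT", "ACGA", "TCGA", "AC-A");
        System.out.println(consensus(rows, 2)); // prints ACGA
    }
}
```

Because each task only reads its own block of rows and writes to its own count table, no locking is needed until the merge step in the calling thread.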
IVisTMSA has a high impact in the domain of bioinformatics. A user can now view, edit and analyze very big alignments efficiently. The user can compare several reference and test alignments simultaneously. The user can view consistent and inconsistent regions of two alignments having several thousand sequences. The user can generate alignments with several well-known sequence aligners using the mouse or keyboard. The user can generate the identity matrix of a big alignment very efficiently. The user can convert six popular formats of protein alignment of unlimited size into FASTA format.
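For readers unfamiliar with the identity matrix mentioned above, the following Java sketch shows one way it could be computed. It is not the MSA ID calculator code. Several definitions of percent identity exist; this sketch assumes identity is measured over the columns where both sequences have a residue, with gaps written as '-'. The class and method names are hypothetical.

```java
/** Hypothetical percent-identity matrix sketch; not IVisTMSA source code. */
final class IdentityMatrixSketch {

    /** Percent identity over columns where neither sequence has a gap. */
    static double percentIdentity(String a, String b) {
        int compared = 0, identical = 0;
        for (int c = 0; c < a.length(); c++) {
            char x = a.charAt(c), y = b.charAt(c);
            if (x == '-' || y == '-') continue; // skip columns with a gap
            compared++;
            if (x == y) identical++;
        }
        return compared == 0 ? 0.0 : 100.0 * identical / compared;
    }

    /** Symmetric matrix of pairwise percent identities for an alignment. */
    static double[][] identityMatrix(String[] aln) {
        int n = aln.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            m[i][i] = 100.0;
            for (int j = i + 1; j < n; j++) {
                m[i][j] = m[j][i] = percentIdentity(aln[i], aln[j]);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] m = identityMatrix(new String[] {"ACGT", "ACGA", "A-GA"});
        for (double[] row : m) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

Only the upper triangle is computed and then mirrored, since the identity of sequence i with sequence j equals that of j with i.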
(Research by the Institute of Biochemistry and Biotechnology, UVAS Lahore)

 
 