Phylogeny and ALIgnment of homologous protein structures

(Version 3.0)

Database Description

The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms derived primarily from structural alignments at the domain level [1, 2]. PALI version 3.0 uses the protein domain decomposition of SCOPe (version 2.04) [3, 4]. For each SCOP domain family, the structural alignments use non-redundant entries. These are selected on the basis of unique UniProt IDs (excluding mutants) from X-ray structures (resolution better than 3.0 Å) and NMR structures (first model).
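The entry-selection rules above can be expressed as a simple filter. This is a minimal Python sketch, not PALI's own code. The `Entry` record, its field names and the method strings are hypothetical. The tie-breaking rule is also an assumption: when several entries share a UniProt ID, the sketch keeps the X-ray entry with the best resolution and falls back to NMR. The description does not say which entry is retained in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    # Hypothetical record for one PDB entry; field names are illustrative only.
    pdb_id: str
    uniprot_id: str
    method: str                  # "X-RAY" or "NMR" (assumed labels)
    resolution: Optional[float]  # in Å; None for NMR entries
    is_mutant: bool


def non_redundant(entries):
    """Keep at most one entry per UniProt ID.

    Assumption: the best-resolution X-ray entry wins; NMR entries rank after
    any qualifying X-ray entry. For NMR entries only the first model would be
    used downstream (model selection is not modelled here).
    """
    best = {}  # uniprot_id -> (rank_key, entry)
    for e in entries:
        if e.is_mutant:
            continue  # mutants are excluded
        if e.method == "X-RAY":
            if e.resolution is None or e.resolution >= 3.0:
                continue  # resolution must be better than 3.0 Å
            key = e.resolution
        elif e.method == "NMR":
            key = float("inf")
        else:
            continue  # other experimental methods are not considered
        current = best.get(e.uniprot_id)
        if current is None or key < current[0]:
            best[e.uniprot_id] = (key, e)
    return [e for _, e in best.values()]


entries = [
    Entry("1abc", "P12345", "X-RAY", 2.1, False),
    Entry("2abc", "P12345", "X-RAY", 1.8, False),
    Entry("3abc", "P12345", "X-RAY", 1.5, True),   # mutant: excluded
    Entry("4xyz", "Q99999", "NMR", None, False),
    Entry("5xyz", "Q88888", "X-RAY", 3.2, False),  # resolution too low
]
print(sorted(e.pdb_id for e in non_redundant(entries)))  # ['2abc', '4xyz']
```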

The current version of PALI contains 2167 multi-member families and 1663 orphans (single-member families), comprising about 17,300 domains in total. Over 132,004 pair-wise and 2167 multiple structural alignments have been generated for the multi-member families. Every family with at least three members is associated with two dendrograms generated using the APE library in R [5]. One is based on the structural dissimilarity metric (SDM) [6, 7], which is defined for every pair-wise superposition. The other is based on the TM-score [8, 9]. For orphan families, the domain-level sequences are provided.

Every family in PALI also has alignments of its protein domains of known 3-D structure, integrated with homologous sequences from the UniProt (Universal Protein Resource) database [10, 11]. A PSI-BLAST [12] search with a query sequence can be run against either of two targets: the structural members of PALI, or the PALI structural members integrated with their sequence homologues from UniProt. All pair-wise structural superpositions were generated using the TM-align program [13]. The structure alignment program MultiProt [14] was used to superimpose multiple homologous protein domain structures. For every family in PALI, a graphical interface (Jmol applet) is also provided to view the structure-based multiple alignment and the pair-wise structural alignments.
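A dendrogram of the kind based on TM-scores can be built by converting the pair-wise scores to distances and clustering them hierarchically. The sketch below makes several assumptions:

- Each pair has one symmetric TM-score. TM-align itself reports scores normalized by each chain's length.
- The distance between two domains is 1 - TM-score.
- The tree is built with UPGMA (average-linkage clustering) in plain Python.

The description states only that the APE library in R was used, not which tree-building method. UPGMA is a stand-in for illustration, not PALI's procedure.

```python
from itertools import combinations


def tm_distances(tm_scores):
    """Map {(a, b): TM-score} to {frozenset({a, b}): 1 - TM-score} (assumed conversion)."""
    return {frozenset(pair): 1.0 - tm for pair, tm in tm_scores.items()}


def upgma_newick(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: domain identifiers.
    dist:  frozenset({a, b}) -> dissimilarity, for every pair of names.
    """
    # Each cluster label maps to (Newick subtree, number of leaves, node height).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    next_id = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)       # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2.0         # ultrametric node height
        ta, na, ha = clusters.pop(a)
        tb, nb, hb = clusters.pop(b)
        new = f"_c{next_id}"
        next_id += 1
        subtree = f"({ta}:{height - ha:.3f},{tb}:{height - hb:.3f})"
        # Average linkage: distance to the merged cluster is the size-weighted mean.
        for other in clusters:
            d[frozenset((new, other))] = (
                na * d.pop(frozenset((a, other))) + nb * d.pop(frozenset((b, other)))
            ) / (na + nb)
        del d[pair]
        clusters[new] = (subtree, na + nb, height)
    tree = next(iter(clusters.values()))[0]
    return tree + ";"


scores = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.6}
print(upgma_newick(["A", "B", "C"], tm_distances(scores)))
# (C:0.225,(A:0.050,B:0.050):0.175);
```

The same clustering function would accept SDM values directly, since SDM is already a dissimilarity.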

Recent Developments

In the current version (v3.0), the PALI structural families have been integrated with sequences from the latest UniProt release available at the time (UniRef90, November 2014). This was done in two steps. First, the structure-based multiple sequence alignment of each family was used as a query against UniRef90 in a 10-iteration PSI-BLAST search. The resulting hits were then filtered using a query-coverage threshold of 70% and an E-value cutoff of 0.0001. Second, an HMM profile was built for each family from its structure-based multiple sequence alignment using hmmbuild (HMMER 3.0) [15]. These profiles were used to obtain integrated sequence-structure alignments at the family level, both with hmmalign (HMMER 3.0) and with Clustal Omega [16]. For the 1663 orphan PALI families, the sequences obtained from the PSI-BLAST runs were aligned using MAFFT [17] and Clustal Omega [16]. In the current version, the entire pipeline has been automated, from structural alignment to the generation of integrated sequence-structure alignments.
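The hit-filtering step and the HMMER calls can be sketched as follows. This minimal Python sketch only builds argument lists and parses tabular output; it does not run the tools. The options shown are real options of NCBI BLAST+ and HMMER 3: `psiblast -in_msa`, `-num_iterations`, `-outfmt` with the `qcovs` column, and `hmmalign --mapali`. Several details are assumptions, however:

- the file and database names;
- the three-column output layout;
- applying the thresholds as "coverage at least 70%" and "E-value at most 0.0001".

The exact invocations used for PALI are not given in the text. Extracting the filtered hit sequences from UniRef90 into a FASTA file is not shown.

```python
# Thresholds quoted in the text. Applying them as "coverage >= 70 %" and
# "E-value <= 0.0001" is an assumption about how the cutoffs were used.
MIN_QUERY_COVERAGE = 70.0
MAX_EVALUE = 1e-4


def psiblast_cmd(family_msa, db="uniref90", out="hits.tsv"):
    """Argument list for a 10-iteration PSI-BLAST search started from a family MSA.

    File and database names are placeholders. Tabular output columns:
    subject ID, E-value, query coverage per subject.
    """
    return ["psiblast", "-in_msa", family_msa, "-db", db,
            "-num_iterations", "10",
            "-outfmt", "6 sseqid evalue qcovs", "-out", out]


def filter_hits(lines):
    """Return unique subject IDs passing both cutoffs, in order of first appearance."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank lines and any non-tabular text between iterations
        sseqid, evalue, qcovs = fields
        if (float(evalue) <= MAX_EVALUE and float(qcovs) >= MIN_QUERY_COVERAGE
                and sseqid not in kept):
            kept.append(sseqid)
    return kept


def hmmer_cmds(family_msa, hits_fasta, hmm_file="family.hmm", out="integrated.sto"):
    """Argument lists for building a family HMM and aligning the filtered hits to it.

    --mapali merges the alignment the HMM was built from into the output,
    so the structural members appear alongside the sequence homologues.
    """
    return [
        ["hmmbuild", hmm_file, family_msa],
        ["hmmalign", "--mapali", family_msa, "-o", out, hmm_file, hits_fasta],
    ]


rows = ["UniRef90_A\t1e-10\t85", "UniRef90_B\t0.01\t90", "UniRef90_C\t1e-20\t40"]
print(filter_hits(rows))  # ['UniRef90_A']
```

Each argument list could be passed to `subprocess.run` once the tools and databases are installed.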

