PALI

Phylogeny and ALIgnment of homologous protein structures

(Version 3.0)

Database Description

The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains structure-based sequence alignments and dendrograms derived primarily from structural alignments at the domain level [1, 2]. The PALI database (version 3.0) uses the protein domain decomposition proposed by SCOPe (version 2.04) [3, 4]. For each SCOP domain family, the structural alignments use non-redundant entries selected by unique UniProt IDs (excluding mutants) from X-ray structures (resolution better than 3.0 Å) and NMR structures (first model).
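The selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DomainEntry` record, its field names, and the rule of keeping the first qualifying entry per UniProt ID are hypothetical and are not taken from the PALI pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainEntry:
    """Hypothetical record for one domain structure (field names are assumptions)."""
    domain_id: str
    uniprot_id: str
    method: str                          # "XRAY" or "NMR"
    resolution: Optional[float] = None   # in Å; None for NMR structures
    is_mutant: bool = False


def select_nonredundant(entries: List[DomainEntry],
                        max_resolution: float = 3.0) -> List[DomainEntry]:
    """Keep one entry per UniProt ID, skipping mutants and X-ray
    structures whose resolution is not better than max_resolution."""
    chosen = {}
    for entry in entries:
        if entry.is_mutant:
            continue
        if entry.method == "XRAY" and (
            entry.resolution is None or entry.resolution >= max_resolution
        ):
            continue
        # Assumption: the first qualifying entry for a UniProt ID is kept.
        chosen.setdefault(entry.uniprot_id, entry)
    return list(chosen.values())
```

A different tie-breaking rule, such as keeping the best-resolution structure for each UniProt ID, would fit the same description equally well.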

There are 2167 multi-member families and 1663 orphans (single-member families) comprising about 17,300 domains in the current version of PALI. Over 132,004 pair-wise and 2167 multiple structural alignments have been generated for all multi-member families. Every family with at least three members is associated with two dendrograms generated using the APE library in R [5], one based on the structural dissimilarity metric (SDM) [6, 7] defined for every pair-wise superposition and the other based on the TM-score [8, 9]. For orphan families, the domain-level sequences are provided. Alignments of protein domains of known 3-D structure from PALI integrated with homologous sequences from the UniProt (Universal Protein Resource) database [10, 11] are also available for every family in PALI. A PSI-BLAST [12] search with a query sequence can be performed either against the structural members in PALI or against the structural members integrated with their sequence homologues from the UniProt database (PALI). All the pair-wise structural superpositions were generated using the TM-align [13] program. The structure alignment program MultiProt [14] was used to superimpose multiple homologous protein domain structures. A graphical interface (Jmol applet) is also provided for every family in PALI to view the structure-based multiple alignment and the pair-wise structural alignments.
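To illustrate how a dendrogram can be derived from pair-wise TM-scores, the following minimal sketch converts scores to distances and clusters them. It rests on assumptions that are not stated in the description above: the distance is taken as 1 - TM-score, each pair has a single symmetric score, and average-linkage (UPGMA) clustering is used in plain Python. PALI itself builds its dendrograms with the APE library in R, and the tree-building method it uses there is not specified here.

```python
def tm_to_distance(tm_score):
    """Assumed conversion of a TM-score in (0, 1] into a dissimilarity."""
    return 1.0 - tm_score


def upgma_newick(names, dist):
    """Average-linkage (UPGMA) clustering returning a Newick topology.

    names: list of unique leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Branch lengths are omitted to keep the sketch short.
    """
    clusters = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"
```

The same function could be applied to SDM values directly, since SDM is already a dissimilarity and needs no conversion.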

Recent Developments

In the current version (v3.0), the PALI structural families have been integrated with sequences from the latest UniProt database (UniRef90, November 2014). This was achieved in two steps. First, the structure-based multiple sequence alignment for each family was used to query UniRef90 with PSI-BLAST for 10 iterations, and the resulting hits were filtered using thresholds of 70% query coverage and an E-value of 0.0001. Second, HMM profiles for each family were generated from the structure-based multiple sequence alignment using hmmbuild (HMMER 3.0) [15]. These profiles were used to obtain integrated sequence-structure alignments at the family level using hmmalign (HMMER 3.0) and also using Clustal Omega [16]. For the 1663 orphan PALI families, the sequences obtained from the PSI-BLAST runs were aligned using MAFFT [17] and Clustal Omega [16]. The entire process, from structural alignment to generation of the integrated sequence-structure alignments, has been automated in the current version.
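The hit-filtering step can be sketched as follows. This is a hedged illustration, not the PALI implementation: the `Hit` record and its field names are hypothetical, coverage is assumed to be the aligned query span divided by the query length, and both thresholds are assumed to be inclusive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical record for one PSI-BLAST hit (field names are assumptions)."""
    subject_id: str
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    evalue: float


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query covered by the aligned region of the hit."""
    return (hit.query_end - hit.query_start + 1) / query_length


def filter_hits(hits: List[Hit], query_length: int,
                min_coverage: float = 0.70,
                max_evalue: float = 1e-4) -> List[Hit]:
    """Keep hits meeting both the coverage and the E-value threshold."""
    return [
        h for h in hits
        if query_coverage(h, query_length) >= min_coverage
        and h.evalue <= max_evalue
    ]
```

For a query of length 100, a hit aligned over query positions 1 to 90 with an E-value of 1e-10 would pass both thresholds, while a hit covering only positions 1 to 50 would be discarded regardless of its E-value.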

Database development: Rakesh Ramachandran rakesh@mbu.iisc.ernet.in

Project led by: N. Srinivasan ns@mbu.iisc.ernet.in and S. Balaji sbalaji@mbu.iisc.ernet.in


To cite PALI:

Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 29, 61-65.


References

1. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res. 29, 61-65.

2. Gowri, V.S., Pandit, S.B., Karthik, P.S., Srinivasan, N. and Balaji, S. (2003) Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res. 31, 486-488.

3. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.

4. Fox, N.K., Brenner, S.E. and Chandonia, J.M. (2014) SCOPe: Structural Classification of Proteins – extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 42, D304-D309.

5. Paradis, E., Claude, J. & Strimmer, K. (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290.

6. Levitt, M. and Gerstein, M. (1998) A unified structural framework for sequence comparison and structure comparison. Proc. Natl Acad. Sci. USA 95, 5913–5920.

7. Johnson, M.S., Sutcliffe, M.J. and Blundell, T.L. (1990) Molecular anatomy: phyletic relationships derived from three-dimensional structures of proteins. J. Mol. Evol. 30, 43–59.

8. Zhang, Y. and Skolnick, J. (2004) Scoring function for automated assessment of protein structure template quality. Proteins 57, 702-710.

9. Xu, J. and Zhang, Y. (2010) How significant is a protein structure similarity with TM-score=0.5? Bioinformatics 26, 889-895.

10. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004) UniProt: the Universal Protein Knowledgebase. Nucleic Acids Res. 32, D115-D119.

11. Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, D154-D159.

12. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

13. Zhang, Y. and Skolnick, J. (2005) TM-align: A protein structure alignment algorithm based on TM-score. Nucleic Acids Res. 33, 2302-2309.

14. Shatsky, M., Nussinov, R. and Wolfson, H.J. (2004) A method for simultaneous alignment of multiple protein structures. Proteins 56, 143-156.

15. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763.

16. Sievers, F., Wilm, A., Dineen, D., Gibson, T.J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.D. and Higgins, D.G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539.

17. Katoh, K., Misawa, K., Kuma, K. and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059-3066.