Protein design with fragment databases
Highlights
► Casting protein structures in fragment libraries reveals structural symmetries.
► Current databases cover the known variation in regular secondary structure.
► Exhaustive protein fragment libraries can produce conformational backbone ensembles.
► We describe interactions between polypeptide fragments as recurrent patterns.
► Intramolecular protein architectures can serve as protein-interaction scaffolds.
Introduction
Although the number of single protein structures in the Protein Data Bank (PDB) is steadily increasing [1], experimental methods for determining these intricate atomic constructions have not truly entered the high-throughput era [2]. This has reinforced interest in protein modeling methods: protein structures hold crucial information needed to understand the function of proteins and protein-interaction networks, and they enable computational methods such as rational drug design, mutation annotation and protein design. Computational protein design has made significant progress, and current approaches can solve simple design tasks with relative ease, such as predicting the effect of single mutations on protein stability or interaction specificity. However, more difficult tasks, such as accumulating multiple non-conservative mutations that require adaptive changes to the backbone of the target model, are less straightforward and computationally expensive [3]. Likewise, de novo design of selective protein-binding peptides in the absence of a reliable template structure remains impossible, or very cumbersome at best [4]. In general, the availability of good backbone templates seems crucial for most challenging protein design problems, since for most intents and purposes the side-chain placement problem has been solved. The key breakthrough there came from the insight that amino acid conformations can be simplified into a limited number of preferred states, called rotamers, derived by mining large collections of high-quality protein structures. The continued popularity of the rotamer concept nicely illustrates the power of classifying conformational diversity. Moreover, it has inspired the idea of charting protein structure space into structural catalogs at many different levels.
Thus far, protein structures have been decomposed and classified at the level of atomic interactions (e.g. knowledge-based potentials [5]), amino acid conformations (the rotamer libraries mentioned above [6, 7, 8]), amino acid interactions (e.g. the Catalytic Site Atlas [9]), polypeptide fragments [10••], fragment interactions [11••], protein domains [12] and domain–domain interactions [13]. Given the vast amount and apparent complexity of structural data, relational databases have been the obvious choice for representing these catalogs and for facilitating the search for simplifying architectural rules underlying the dazzling diversity of protein structures. At the same time, computational algorithms have become increasingly popular because they can sample multiple conformational states very efficiently. These algorithms classically divide protein design into two steps: first, the search for one or more suitable protein backbone templates to serve as structural scaffolds; and second, in silico sequence mutagenesis to select optimal side chains for the template, guided by a scoring function that calculates the resulting changes in binding energy. In this review we mainly focus on the first step, the search for optimal backbone scaffolds using properties discovered through fragmentation and classification of protein structures, and on algorithms that exploit these properties to produce structural scaffolds for protein design (Table 1).
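To make the second design step concrete, the toy sketch below picks one rotamer per position on a fixed backbone template by minimizing a sum of self and pairwise energy terms. It is a minimal illustration under stated assumptions, not the procedure of any particular design program: the rotamer labels and every energy value are hypothetical, and exhaustive enumeration is only feasible for a handful of positions (practical tools prune or sample this combinatorial space instead).

```python
from itertools import product

# Hypothetical self energies (arbitrary units) for each design position on a
# fixed backbone template: position -> {rotamer label: energy}.
# Offering different amino acids at one position models in silico mutagenesis.
SELF = {
    0: {"LEU_a": -1.0, "LEU_b": -0.4, "VAL_a": -0.8},
    1: {"PHE_a": -1.5, "PHE_b": -0.2},
    2: {"ILE_a": -0.9, "ILE_b": -1.1},
}

# Hypothetical pairwise energies between rotamers at two positions;
# any pair not listed contributes 0.
PAIR = {
    ((0, "LEU_a"), (1, "PHE_a")): 2.0,   # toy steric clash
    ((0, "VAL_a"), (1, "PHE_a")): -0.3,  # toy favourable contact
    ((1, "PHE_a"), (2, "ILE_b")): 1.5,   # toy steric clash
}


def pair_energy(a, b, pair_terms):
    """Pair energy for two (position, rotamer) tuples, looked up in either order."""
    return pair_terms.get((a, b), pair_terms.get((b, a), 0.0))


def total_energy(assignment, self_terms, pair_terms):
    """Self terms plus all pairwise terms for one rotamer choice per position."""
    picked = list(zip(sorted(self_terms), assignment))  # [(position, rotamer), ...]
    energy = sum(self_terms[pos][rot] for pos, rot in picked)
    for i in range(len(picked)):
        for j in range(i + 1, len(picked)):
            energy += pair_energy(picked[i], picked[j], pair_terms)
    return energy


def best_rotamers(self_terms, pair_terms):
    """Enumerate every rotamer combination and keep the lowest-energy one."""
    positions = sorted(self_terms)
    choices = [list(self_terms[pos]) for pos in positions]
    best = min(product(*choices),
               key=lambda combo: total_energy(combo, self_terms, pair_terms))
    return best, total_energy(best, self_terms, pair_terms)


if __name__ == "__main__":
    combo, energy = best_rotamers(SELF, PAIR)
    print(combo, round(energy, 2))  # prints ('VAL_a', 'PHE_a', 'ILE_a') -3.5
```

In this toy case the individually best rotamers (LEU_a at position 0, ILE_b at position 2) are not chosen, because of their clashes with PHE_a; capturing such couplings is exactly why the scoring function includes pairwise terms, and why a good backbone template matters for what combinations are feasible at all.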
Describing tertiary structure space with a parts-list of fragments
Fragmentation of protein structures is an appealing strategy for reducing the complexity of the structural protein universe to a ‘parts list’ of polypeptide building blocks (see Figure 1a). The ability to describe the seemingly continuous combinatorial space of all possible conformations with a discrete alphabet is of great interest as a source of leads for de novo protein design, mainly because it greatly reduces the size of the conformational landscape that needs to be
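A minimal sketch of the ‘parts list’ idea, assuming only C-alpha coordinates are used: the chain is cut into overlapping windows of length k, and each window is labelled with the library fragment it superimposes onto best (lowest RMSD after Kabsch superposition). The two-fragment library built from idealised helix and strand traces is hypothetical and far smaller than real fragment collections; it only shows how a continuous backbone becomes a string over a discrete alphabet.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate arrays after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)                                # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)                     # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # forbid reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T               # rotation taking p onto q
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def helix_ca(n, radius=2.3, rise=1.5, twist_deg=100.0):
    """Idealised alpha-helix C-alpha trace (approximate textbook geometry)."""
    t = np.deg2rad(twist_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])


def strand_ca(n, rise=3.3, zigzag=1.8):
    """Toy extended-strand C-alpha trace: a planar zigzag along x."""
    i = np.arange(n)
    return np.column_stack([rise * i, zigzag * (i % 2), np.zeros(n)])


# Hypothetical two-letter library of length-5 fragments.
LIBRARY = {"helix": helix_ca(5), "strand": strand_ca(5)}


def assign_fragments(ca: np.ndarray, library: dict) -> list:
    """Label every overlapping window of a C-alpha trace with its closest library fragment."""
    k = len(next(iter(library.values())))
    labels = []
    for start in range(len(ca) - k + 1):
        window = ca[start:start + k]
        scores = {name: kabsch_rmsd(window, frag) for name, frag in library.items()}
        best = min(scores, key=scores.get)
        labels.append((best, scores[best]))
    return labels
```

For instance, `assign_fragments(helix_ca(10), LIBRARY)` yields six windows, each labelled "helix" with an RMSD of essentially zero, because every window of an ideal helix is congruent to the library helix fragment. With a realistic library of many more fragments, the loop over `library.items()` is where indexing or pre-filtering would be needed.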
Discussion and outlook
In this review, we have shown that casting protein structures as a collection of polypeptide fragments is an attractive level of description, because it balances simplicity with retention of the critical details of atomic interactions. It has been shown that a limited set of protein fragments (about 2000, of length 5–10 residues) suffices to capture all the shapes observed in known protein structures, and analysis of recent structures suggests that this number will remain stable even as more
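How a finite library size emerges from a similarity cutoff can be illustrated with greedy leader clustering of fragments. This is a generic sketch, not the clustering procedure used to build any of the databases discussed here: fragments are compared by distance-matrix RMSD (dRMSD), so no superposition is needed, and the 1 Å cutoff, the fragment length of 5 and the idealised, noise-perturbed toy traces are all assumptions. Note that dRMSD cannot distinguish a fragment from its mirror image.

```python
import numpy as np


def helix_ca(n, radius=2.3, rise=1.5, twist_deg=100.0):
    """Idealised alpha-helix C-alpha trace (approximate textbook geometry)."""
    t = np.deg2rad(twist_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])


def strand_ca(n, rise=3.3, zigzag=1.8):
    """Toy extended-strand C-alpha trace: a planar zigzag along x."""
    i = np.arange(n)
    return np.column_stack([rise * i, zigzag * (i % 2), np.zeros(n)])


def drmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Distance-matrix RMSD over all residue pairs i < j (no superposition needed)."""
    da = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    db = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    iu = np.triu_indices(len(a), k=1)
    return float(np.sqrt(((da[iu] - db[iu]) ** 2).mean()))


def greedy_library(fragments, cutoff):
    """Leader clustering: keep a fragment as a new representative only if it is
    farther than `cutoff` (in dRMSD) from every representative kept so far."""
    reps = []
    for frag in fragments:
        if all(drmsd(frag, rep) > cutoff for rep in reps):
            reps.append(frag)
    return reps


def windows(ca, k):
    """All overlapping length-k fragments of a C-alpha trace."""
    return [ca[i:i + k] for i in range(len(ca) - k + 1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fragments = []
    for chain in (helix_ca(20), strand_ca(20)):
        noisy = chain + rng.normal(scale=0.1, size=chain.shape)  # toy coordinate noise
        fragments.extend(windows(noisy, 5))
    library = greedy_library(fragments, cutoff=1.0)
    print(len(fragments), "fragments ->", len(library), "representatives")
```

Lowering the cutoff yields more, finer-grained representatives. Whether the representative count saturates as more structures are added is the empirical question behind the estimate of about 2000 fragments.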
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
References (57)
Knowledge-based potentials in protein design. Curr Opin Struct Biol (2006)
Rotamer libraries in the 21st century. Curr Opin Struct Biol (2002)
Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res (2008)
Structural alphabets for protein structure classification: a comparison study. J Mol Biol (2009)
Structural characteristics of novel protein folds. PLoS Comput Biol (2010)
CC+: a relational database of coiled-coil structures. Nucleic Acids Res (2009)
Architectures and functional coverage of protein–protein interfaces. J Mol Biol (2008)
Designing ensembles in conformational and sequence space to characterize and engineer proteins. Curr Opin Struct Biol (2010)
RosettaBackrub — a web server for flexible backbone protein structure modeling and design. Nucleic Acids Res (2010)
The ASTRAL compendium in 2004. Nucleic Acids Res (2004)
Novel peptide-mediated interactions derived from high-resolution 3-dimensional structures. PLoS Comput Biol
Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science
The multiple-specificity landscape of modular peptide recognition domains. Mol Syst Biol
Structural genomics is the largest contributor of novel structural leverage. J Struct Funct Genomics
Structural domain–domain interactions: assessment and comparison with protein–protein interaction data to improve the interactome. Proteins: Struct Funct Bioinform
Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel: PEDS
Computational design of peptide ligands. Trends Biotechnol
The penultimate rotamer library. Proteins
Improved prediction of protein side-chain conformations with SCWRL4. Proteins
The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res
BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res
Protein–peptide interactions adopt the same structural motifs as monomeric protein folds. Struct/Fold Des
3did: identification and classification of domain-based interactions of known three-dimensional structure. Nucleic Acids Res
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol
Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol
FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc Natl Acad Sci U S A
A supersecondary structure library and search algorithm for modeling loops in protein structures. Nucleic Acids Res
Cited by (27)
Protein structural motifs in prediction and design
2017, Current Opinion in Structural Biology. Citation excerpt: Another recent study revealed that interacting helical segments in TM and soluble proteins can be described with a small number of shared structural classes, even though there were important differences in sequence preferences and hydrogen bonding patterns [27•]. Recent work has also found considerable modularity at the level of quaternary structure [19,21,22,28–30]. For example, pairs of interacting BRIX fragments from monomeric proteins were used to show that 65% of protein–peptide interactions were similar to structural motifs in monomeric folds [19].
Adaptive Assembly: Maximizing the Potential of a Given Functional Peptide with a Tailor-Made Protein Scaffold
2015, Chemistry and Biology. Citation excerpt: Rational and/or combinatorial approaches have allowed for the generation of functional proteins for biomedical and industrial applications (Smith and Hecht, 2011). Generating proteins from scratch, however, still remains challenging because of the enormous sequential diversity of a protein; thus de novo protein engineering has been generally performed in a stepwise manner from several components (Blaber and Lee, 2012; Verschueren et al., 2011) such as protein scaffolds (Binz et al., 2005; Gronwall and Stahl, 2009) or functional peptides (Eichler, 2008). Because a number of peptides possess a wide range of functions, protein engineering that exploits those peptides holds great promise for further development of novel functional proteins.
Mining tertiary structural motifs for assessment of designability
2013, Methods in Enzymology. Citation excerpt: Larger structural motifs, which contain pairs of segments not in contact and free to evolve independently, may not be sampled well either in the PDB or indeed in nature. However, this concern is greatly diminished for compact structural motifs, whose possible geometries are more likely to be well represented in the known structural universe (Fernandez-Fuentes et al., 2010; Grigoryan & Degrado, 2011; Vanhee et al., 2009; Verschueren et al., 2011). Further, even without any assumptions on the saturation of the PDB, if we do observe a motif to be highly recurrent, it is very likely designable.
Recent advances in rational approaches for enzyme engineering
2012, Computational and Structural Biotechnology Journal. Citation excerpt: Recently, the thermostability of a Baeyer-Villiger monooxygenase (BVMO) phenylacetone monooxygenase was combined with the broader substrate range of other BVMOs by structure-guided subdomain exchange [55]. Interested readers can find more information about ‘Protein design with fragment databases’ in the recent review by Verschueren and coauthors [56]. Generally, enzymes can be promiscuous concerning their reaction conditions (reaction condition promiscuity), their substrate range (substrate promiscuity), show an additional activity in the same active site (catalytic promiscuity) or due to a second active site (alternate site promiscuity) [57].
De novo protein fold design through sequence-independent fragment assembly simulations
2023, Proceedings of the National Academy of Sciences of the United States of America