Lipsh-Sokolik R. & Fleishman S. J.
(2024)
Proceedings of the National Academy of Sciences.
121,
34,
e231499912.
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
King L. D., Pulido D., Barrett J. R., Davies H., Quinkert D., Lias A. M., Silk S. E., Pattinson D. J., Diouf A., Williams B. G., McHugh K., Rodrigues A., Rigby C. A., Strazza V., Suurbaar J., Rees-Spear C., Dabbs R. A., Ishizuka A. S., Zhou Y., Gupta G., Jin J., Li Y., Carnrot C., Minassian A. M., Campeotto I., Fleishman S. J., Noe A. R., MacGill R. S., King C. R., Birkett A. J., Soisson L. A., Long C. A., Miura K., Ashfield R., Skinner K., Howarth M. R., Biswas S. & Draper S. J.
(2024)
Cell Reports Medicine.
5,
7,
101654.
Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) is a leading blood-stage malaria vaccine antigen target, currently in a phase 2b clinical trial as a full-length soluble protein/adjuvant vaccine candidate called RH5.1/Matrix-M. We identify that disordered regions of the full-length RH5 molecule induce non-growth inhibitory antibodies in human vaccinees and that a re-engineered and stabilized immunogen (including just the alpha-helical core of RH5) induces a qualitatively superior growth inhibitory antibody response in rats vaccinated with this protein formulated in Matrix-M adjuvant. In parallel, bioconjugation of this immunogen, termed “RH5.2,” to hepatitis B surface antigen virus-like particles (VLPs) using the “plug-and-display” SpyTag-SpyCatcher platform technology also enables superior quantitative antibody immunogenicity over soluble protein/adjuvant in vaccinated mice and rats. These studies identify a blood-stage malaria vaccine candidate that may improve upon the current leading soluble protein vaccine candidate RH5.1/Matrix-M. The RH5.2-VLP/Matrix-M vaccine candidate is now under evaluation in phase 1a/b clinical trials.
Weinstein J. J., Saikia C., Karbat I., Goldenzweig A., Reuveny E. & Fleishman S. J.
(2024)
Protein Science.
33,
6,
e4995.
Membrane proteins play critical physiological roles as receptors, channels, pumps, and transporters. Despite their importance, however, low expression levels often hamper the experimental characterization of membrane proteins. We present an automated and web-accessible design algorithm called mPROSS (https://mPROSS.weizmann.ac.il), which uses phylogenetic analysis and an atomistic potential, including an empirical lipophilicity scale, to improve native-state energy. As a stringent test, we apply mPROSS to the Kv1.2–Kv2.1 paddle chimera voltage-gated potassium channel. Four designs, encoding 9–26 mutations relative to the parental channel, were functional and maintained potassium-selective permeation and voltage dependence in Xenopus oocytes with up to 14-fold increase in whole-cell current densities. Additionally, single-channel recordings reveal no significant change in either the channel-opening probability or the unitary conductance, indicating that functional expression levels increase without impacting the activity profile of individual channels. Our results suggest that the expression levels of other dynamic channels and receptors may be enhanced through one-shot design calculations.
Münch J., Dietz N., Barber-Zucker S., Seifert F., Matschi S., Püllmann P., Fleishman S. J. & Weissenborn M. J.
(2024)
ACS Catalysis.
14,
7,
p. 4738-4748
Unspecific peroxygenases (UPOs) are fungal enzymes that attract significant attention for their ability to perform versatile oxyfunctionalization reactions using H2O2. Unlike other oxygenases, UPOs do not require additional reductive equivalents or electron transfer chains that complicate basic and applied research. Nevertheless, UPOs generally exhibit low to no heterologous production levels, and only four UPO structures have been determined to date by crystallography, limiting their usefulness and obstructing research. To overcome this bottleneck, we implemented a workflow that applies PROSS stability design to AlphaFold2 model structures of 10 unique and diverse UPOs, followed by signal-peptide shuffling to enable heterologous production. Nine UPOs were functionally produced in Pichia pastoris, including the recalcitrant CciUPO and three UPOs derived from oomycetes, the first nonfungal UPOs to be experimentally characterized. We conclude that the high accuracy and reliability of new modeling and design workflows dramatically expand the pool of enzymes for basic and applied research.
Listov D., Goverde C. A., Correia B. E. & Fleishman S. J.
(2024)
Nature Reviews Molecular Cell Biology.
25,
8,
p. 639-653
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
King L. D. W., Pulido D., Barrett J. R., Davies H., Quinkert D., Lias A. M., Silk S. E., Pattinson D. J., Diouf A., Williams B. G., McHugh K., Rodrigues A., Rigby C. A., Strazza V., Suurbaar J., Rees-Spear C., Dabbs R. A., Ishizuka A. S., Zhou Y., Gupta G., Jin J., Li Y., Carnrot C., Minassian A. M., Campeotto I., Fleishman S. J., Noe A. R., MacGill R. S., King C. R., Birkett A. J., Soisson L. A., Long C. A., Miura K., Ashfield R., Skinner K., Howarth M., Biswas S. & Draper S. J.
(2024)
BioRxiv.
The development of a highly effective vaccine against the pathogenic blood-stage infection of human malaria will require a delivery platform that can induce an antibody response of both maximal quantity and functional quality. One strategy to achieve this includes presenting antigens to the immune system on virus-like particles (VLPs). Here we sought to improve the design and delivery of the blood-stage Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) antigen, which is currently in a Phase 2 clinical trial as a full-length soluble protein-in-adjuvant vaccine candidate called RH5.1/Matrix-M™. We identify that disordered regions of the full-length RH5 molecule induce non-growth inhibitory antibodies in human vaccinees, and that a re-engineered and stabilized immunogen that includes just the alpha-helical core of RH5 induces a qualitatively superior growth-inhibitory antibody response in rats vaccinated with this protein formulated in Matrix-M™ adjuvant. In parallel, bioconjugation of this new immunogen, termed “RH5.2”, to hepatitis B surface antigen VLPs using the “plug-and-display” SpyTag-SpyCatcher platform technology also enabled superior quantitative antibody immunogenicity over soluble antigen/adjuvant in vaccinated mice and rats. These studies identify a new blood-stage malaria vaccine candidate that may improve upon the current leading soluble protein vaccine candidate RH5.1/Matrix-M™. The RH5.2-VLP/Matrix-M™ vaccine candidate is now under evaluation in Phase 1a/b clinical trials.
Competing Interest Statement: SJD is an inventor on patent applications relating to RH5 malaria vaccines and antibodies; is a co-founder of and shareholder in SpyBiotech; and has been a consultant to GSK on malaria vaccines.
AMM has been a consultant to GSK on malaria vaccines; and has an immediate family member who is an inventor on patent applications relating to RH5 malaria vaccines and antibodies and is a co-founder of and shareholder in SpyBiotech. MH is an inventor on patents relating to peptide targeting via spontaneous amide bond formation, and is a co-founder of and shareholder in SpyBiotech. SB is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and is a co-founder of, shareholder in and employee of SpyBiotech. JJ is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and is a co-founder of and shareholder in SpyBiotech. RAD is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and shareholder in SpyBiotech. LDWK, JRB, DQ, AML, SES, BGW, KMc, IC, SJF and DP are inventors on patent applications relating to RH5 malaria vaccines and/or antibodies. All other authors have declared that no conflict of interest exists.
Tennenhouse A., Khmelnitsky L., Khalaila R., Yeshaya N., Noronha A., Lindzen M., Makowski E. K., Zaretsky I., Sirkis Y. F., Galon-Wolfenson Y., Tessier P. M., Abramson J., Yarden Y., Fass D. & Fleishman S. J.
(2024)
Nature Biomedical Engineering.
8,
1,
p. 30-44
Conventional methods for humanizing animal-derived antibodies involve grafting their complementarity-determining regions onto homologous human framework regions. However, this process can substantially lower antibody stability and antigen-binding affinity, and requires iterative mutational fine-tuning to recover the original antibody properties. Here we report a computational method for the systematic grafting of animal complementarity-determining regions onto thousands of human frameworks. The method, which we named CUMAb (for computational human antibody design; available at http://CUMAb.weizmann.ac.il), starts from an experimental or model antibody structure and uses Rosetta atomistic simulations to select designs by energy and structural integrity. CUMAb-designed humanized versions of five antibodies exhibited similar affinities to those of the parental animal antibodies, with some designs showing marked improvement in stability. We also show that (1) non-homologous frameworks are often preferred to highest-homology frameworks, and (2) several CUMAb designs that differ by dozens of mutations and that use different human frameworks are functionally equivalent.
Zelnik I. D., Mestre B., Weinstein J. J., Dingjan T., Izrailov S., Ben-Dor S., Fleishman S. J. & Futerman A. H.
(2023)
Nature Communications.
14,
1,
2330.
Until now, membrane-protein stabilization has relied on iterations of mutations and screening. We now validate a one-step algorithm, mPROSS, for stabilizing membrane proteins directly from an AlphaFold2 model structure. Applied to the lipid-generating enzyme, ceramide synthase, 37 designed mutations lead to a more stable form of human CerS2. Together with molecular dynamics simulations, we propose a pathway by which substrates might be delivered to the ceramide synthases.
Füzesi-Levi M. G., Ben-Nissan G., Listov D., Fridmann Sirkis Y., Hayouka Z., Fleishman S. & Sharon M.
(2023)
Life Science Alliance.
6,
10,
e202201634.
Protein degradation is one of the essential mechanisms that enables reshaping of the proteome landscape in response to various stimuli. The largest E3 ubiquitin ligase family that targets proteins to degradation by catalyzing ubiquitination is the cullin-RING ligases (CRLs). Many of the proteins that are regulated by CRLs are central to tumorigenesis and tumor progression, and dysregulation of the CRL family is frequently associated with cancer. The CRL family comprises ∼300 complexes, all of which are regulated by the COP9 signalosome complex (CSN). Therefore, CSN is considered an attractive target for therapeutic intervention. Research efforts for targeted CSN inhibition have been directed towards inhibition of the complex's enzymatic subunit, CSN5. Here, we have taken a fresh approach focusing on CSNAP, the smallest CSN subunit. Our results show that the C-terminal region of CSNAP is tightly packed within the CSN complex, in a groove formed by CSN3 and CSN8. We show that a 16-amino-acid C-terminal peptide, derived from this CSN-interacting region, can displace the endogenous CSNAP subunit from the complex. This, in turn, leads to a CSNAP-null phenotype that attenuates CSN activity and, consequently, CRL function. Overall, our findings emphasize the potential of a CSNAP-based peptide for CSN inhibition as a new therapeutic avenue.
Khersonsky O., Goldsmith M., Zaretsky I., Hamer-Rogotner S., Dym O., Unger T., Yona M., Fridmann-Sirkis Y. & Fleishman S. J.
(2023)
Journal of Molecular Biology.
435,
17,
168191.
Albumin is the most abundant protein in the blood serum of mammals and has essential carrier and physiological roles. Albumins are also used in a wide variety of molecular and cellular experiments and in the cultivated meat industry. Despite their importance, however, albumins are challenging for heterologous expression in microbial hosts, likely due to 17 conserved intramolecular disulfide bonds. Therefore, albumins used in research and biotechnological applications either derive from animal serum, despite severe ethical and reproducibility concerns, or from recombinant expression in yeast or rice. We use the PROSS algorithm to stabilize human and bovine serum albumins, finding that all are highly expressed in E. coli. Design accuracy is verified by crystallographic analysis of a human albumin variant with 16 mutations. This albumin variant exhibits ligand binding properties similar to those of the wild type. Remarkably, a design with 73 mutations relative to human albumin exhibits a stability improvement of over 40 °C and is stable beyond the boiling point of water. Our results suggest that proteins with many disulfide bridges have the potential to exhibit extreme stability when subjected to design. The designed albumins may be used to make economical, reproducible, and animal-free reagents for molecular and cell biology. They also open the way to high-throughput screening to study and enhance albumin carrier properties.
Pokorna S., Khersonsky O., Lipsh-Sokolik R., Goldenzweig A., Nielsen R., Ashani Y., Peleg Y., Unger T., Albeck S., Dym O., Tirosh A., Tarayra R., Hocquemiller M., Laufer R., Ben-Dor S., Silman I., Sussman J. L., Fleishman S. J. & Futerman A. H.
(2023)
FEBS Journal.
290,
13,
p. 3383-3399
Acid-β-glucosidase (GCase, EC 3.2.1.45), the lysosomal enzyme which hydrolyzes the simple glycosphingolipid, glucosylceramide (GlcCer), is encoded by the GBA1 gene. Biallelic mutations in GBA1 cause the human inherited metabolic disorder, Gaucher disease (GD), in which GlcCer accumulates, while heterozygous GBA1 mutations are the highest genetic risk factor for Parkinson's disease (PD). Recombinant GCase (e.g., Cerezyme®) is produced for use in enzyme replacement therapy for GD and is largely successful in relieving disease symptoms, except for the neurological symptoms observed in a subset of patients. As a first step towards developing an alternative to the recombinant human enzymes used to treat GD, we applied the PROSS stability-design algorithm to generate GCase variants with enhanced stability. One of the designs, containing 55 mutations compared to wild type human GCase, exhibits improved secretion and thermal stability. Furthermore, the design has higher enzymatic activity than the clinically used human enzyme when incorporated into an AAV vector, resulting in a larger decrease in the accumulation of lipid substrates in cultured cells. Based on stability-design calculations, we also developed a machine-learning based approach to distinguish benign from deleterious (i.e., disease-causing) GBA1 mutations. This approach gave remarkably accurate predictions of the enzymatic activity of single nucleotide polymorphisms in the GBA1 gene that are not currently associated with GD or PD. This latter approach could be applied to other diseases to determine risk factors in patients carrying rare mutations.
Deshmukh F. K., Ben-Nissan G., Olshina M. A., Füzesi-Levi M. G., Polkinghorn C., Arkind G., Leushkin Y., Fainer I., Fleishman S. J., Tawfik D. & Sharon M.
(2023)
Nature Communications.
14,
1,
3126.
Controlled degradation of proteins is necessary for ensuring their abundance and sustaining a healthy and accurately functioning proteome. One of the degradation routes involves the uncapped 20S proteasome, which cleaves proteins with a partially unfolded region, including those that are damaged or contain intrinsically disordered regions. This degradation route is tightly controlled by a recently discovered family of proteins named Catalytic Core Regulators (CCRs). Here, we show that CCRs function through an allosteric mechanism, coupling the physical binding of the PSMB4 β-subunit with attenuation of the complex's three proteolytic activities. In addition, by dissecting the structural properties that are required for CCR-like function, we could recapitulate this activity using a designed protein that is half the size of natural CCRs. These data uncover an allosteric path that does not involve the proteasome's enzymatic subunits but rather propagates through the non-catalytic subunit PSMB4. This mode of 20S proteasome-specific attenuation opens avenues for decoupling the 20S and 26S proteasome degradation pathways as well as for developing selective 20S proteasome inhibitors.
Weinstein J. Y., Martí-Gómez C., Lipsh-Sokolik R., Hoch S. Y., Liebermann D., Nevo R., Weissman H., Petrovich-Kopitman E., Margulies D., Ivankov D., McCandlish D. M. & Fleishman S. J.
(2023)
Nature Communications.
14,
2890.
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Duart G., Elazar A., Weinstein J. Y., Gadea-Salom L., Ortiz-Mateu J., Fleishman S. J., Mingarro I. & Martinez-Gil L.
(2023)
Proceedings of the National Academy of Sciences of the United States of America.
120,
11,
e221964812.
Several methods have been developed to explore interactions among water-soluble proteins or regions of proteins. However, techniques to target transmembrane domains (TMDs) have not been examined thoroughly despite their importance. Here, we developed a computational approach to design sequences that specifically modulate protein-protein interactions in the membrane. To illustrate this method, we demonstrated that BclxL can interact with other members of the B cell lymphoma 2 (Bcl2) family through the TMD and that these interactions are required for BclxL control of cell death. Next, we designed sequences that specifically recognize and sequester the TMD of BclxL. Hence, we were able to prevent BclxL intramembrane interactions and cancel its antiapoptotic effect. These results advance our understanding of protein-protein interactions in membranes and provide a means to modulate them. Moreover, the success of our approach may trigger the development of a new generation of inhibitors targeting interactions between TMDs.
Gomez de Santos P., Mateljak I., Hoang M. D., Fleishman S. J., Hollmann F. & Alcalde M.
(2023)
Journal of the American Chemical Society.
145,
6,
p. 3443-3453
The generation of enantiodivergent biocatalysts for C-H oxyfunctionalizations is ever more important in modern synthetic chemistry. Here, we have applied the FuncLib algorithm based on phylogenetic and Rosetta calculations to design a diverse repertoire of active, stable, and enantiodivergent fungal peroxygenases. Twenty-four designs, each carrying 4–5 mutations in the catalytic core, were expressed functionally in yeast and benchmarked against characteristic model compounds. Several designs were active and stable in a range of temperature and pH, displaying unprecedented enantiodivergence, changing regioselectivity from alkyl to aromatic hydroxylation, and increasing catalytic efficiencies by up to 10-fold, with 15-fold improvements in total turnover numbers over the parental enzyme. We find that this dramatic functional divergence stems from beneficial epistasis among the mutations and an extensive reorganization of the heme channel. Our work demonstrates that FuncLib can rapidly design highly functional libraries enriched in enantioselective peroxygenases not seen in nature for a range of biotechnological applications.
Lipsh-Sokolik R., Khersonsky O., Schröder S. P., de Boer C., Hoch S., Davies G. J., Overkleeft H. S. & Fleishman S. J.
(2023)
Science.
379,
6628,
p. 195-201
The design of structurally diverse enzymes is constrained by long-range interactions that are necessary for accurate folding. We introduce an atomistic and machine learning strategy for the combinatorial assembly and design of enzymes (CADENZ) to design fragments that combine with one another to generate diverse, low-energy structures with stable catalytic constellations. We applied CADENZ to endoxylanases and used activity-based protein profiling to recover thousands of structurally diverse enzymes. Functional designs exhibit high active-site preorganization and more stable and compact packing outside the active site. Implementing these lessons into CADENZ led to a 10-fold improved hit rate and more than 10,000 recovered enzymes. This design-test-learn loop can be applied, in principle, to any modular protein family, yielding huge diversity and general lessons on protein design principles.
Barber-Zucker S., Mateljak I., Goldsmith M., Kupervaser M., Alcalde M. & Fleishman S. J.
(2022)
ACS Catalysis.
12,
21,
p. 13164-13173
White-rot fungi secrete an impressive repertoire of high-redox potential laccases (HRPLs) and peroxidases for efficient oxidation and utilization of lignin. Laccases are attractive enzymes for the chemical industry due to their broad substrate range and low environmental impact. Since expression of functional recombinant HRPLs is challenging, however, iterative directed-evolution protocols have been applied to improve their expression, activity, and stability. We apply a rational stabilize-and-diversify strategy to two HRPLs that we could not functionally express. First, we use the PROSS stability-design algorithm to allow functional expression in yeast. Second, we use the stabilized enzymes as starting points for FuncLib active-site design to improve their activity and substrate diversity. Four of the FuncLib-designed HRPLs and their PROSS progenitor exhibit substantial diversity in reactivity profiles against high-redox potential substrates, including lignin monomers. Combinations of 3–4 subtle mutations that change the polarity, solvation, and sterics of the substrate-oxidation site result in orders-of-magnitude changes in reactivity profiles. These stable and versatile HRPLs are a step toward generating an effective lignin-degrading consortium of enzymes that can be secreted from yeast. The stabilize-and-diversify strategy can be applied to other challenging enzyme families to study and expand the utility of natural enzymes.
Cohen-Dvashi H., Weinstein J., Katz M., Ashkenazy-Eilon M., Mor Y., Shimon A., Achdout H., Tamir H., Israely T., Strobelt R., Shemesh M., Stoler-Barak L., Shulman Z., Paran N., Fleishman S. J. & Diskin R.
(2022)
iScience.
25,
10,
105193.
Blocking the interaction of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with its angiotensin-converting enzyme 2 (ACE2) receptor has proven to be an effective therapeutic option. Various protein binders as well as monoclonal antibodies that effectively target the receptor-binding domain (RBD) of SARS-CoV-2 to prevent interaction with ACE2 have been developed. The emergence of SARS-CoV-2 variants that accumulate alterations in the RBD can severely affect the efficacy of such immunotherapeutic agents, as is indeed the case with Omicron, which resists many of the previously isolated monoclonal antibodies. Here, we evaluate an ACE2-based immunoadhesin, which we developed early in the pandemic, against some of the recent variants of concern (VoCs), including the Delta and the Omicron variants. We show that our ACE2-immunoadhesin remains effective in neutralizing these variants, suggesting that immunoadhesin-based immunotherapy is less prone to escape by the virus and has the potential to remain effective against future VoCs.
Marciano S., Dey D., Listov D., Fleishman S. J., Sonn-Segev A., Mertens H., Busch F., Kim Y., Harvey S. R., Wysocki V. H. & Schreiber G.
(2022)
Chemical Science.
13,
39,
p. 11680-11695
Over half the proteins in the E. coli cytoplasm form homo- or hetero-oligomeric structures. Experimentally determined structures are often considered in determining a protein's oligomeric state, but static structures miss the dynamic equilibrium between different quaternary forms. The problem is exacerbated in homo-oligomers, where the oligomeric states are challenging to characterize. Here, we re-evaluated the oligomeric state of 17 different bacterial proteins across a broad range of protein concentrations and solutions by native mass spectrometry (MS), mass photometry (MP), size exclusion chromatography (SEC), and small-angle X-ray scattering (SAXS), finding that most exhibit several oligomeric states. Surprisingly, some proteins did not show mass-action-driven equilibrium between the oligomeric states. For approximately half the proteins, the predicted oligomeric forms described in publicly available databases underestimated the complexity of protein quaternary structures in solution. Conversely, AlphaFold-Multimer provided an accurate description of the potential multimeric states for most proteins, suggesting that it could help resolve uncertainties on the solution state of many proteins.
Listov D., Lipsh R., Rosset S. R., Yang C., Correia B. E. & Fleishman S.
(2022)
Protein Science.
31,
9,
e4400.
Recent advances in protein-design methodology have led to a dramatic increase in reliability and scale. With these advances, dozens and even thousands of designed proteins are automatically generated and screened. Nevertheless, the success rate, particularly in design of functional proteins, is low and fundamental goals such as reliable de novo design of efficient enzymes remain beyond reach. Experimental analyses have consistently indicated that a major reason for design failure is inaccuracy and misfolding relative to the design conception. To address this challenge, we describe complementary methods to diagnose and ameliorate suboptimal regions in designed proteins: first, we develop a Rosetta atomistic computational mutation scanning approach to detect energetically suboptimal positions in designs (available on a web server); second, we demonstrate that AlphaFold2 ab initio structure prediction flags regions that may misfold in designed enzymes and binders; and third, we focus FuncLib design calculations on suboptimal positions in a previously designed low-efficiency enzyme, improving its catalytic efficiency by 330-fold. Furthermore, applied to a de novo designed protein that exhibited limited stability, the same approach markedly improved stability and expressibility. Thus, foldability analysis and enhancement may dramatically increase the success rate in design of functional proteins.
Lv Y., Zheng S., Goldenzweig A., Liu F., Gao Y., Yang X., Kandale A., McGeary R. P., Williams S., Kobe B., Schembri M. A., Landsberg M. J., Wu B., Brück T. B., Sieber V., Boden M., Rao Z., Fleishman S. J., Schenk G. & Guddat L. W.
(2022)
Applied Biosciences.
1,
2,
p. 163-178
The branched-chain amino acids (BCAAs) leucine, isoleucine and valine are synthesized via a common biosynthetic pathway. Ketol-acid reductoisomerase (KARI) is the second enzyme in this pathway. In addition to its role in BCAA biosynthesis, KARI catalyzes two rate-limiting steps that are key components of a cell-free biofuel biosynthesis route. For industrial applications, reaction temperature and enzyme stability are key factors that affect process robustness and product yield. Here, we have solved the cryo-EM structure (2.94 Å resolution) of a homododecameric Class I KARI (from Campylobacter jejuni) and demonstrated how a triad of amino acid side chains plays a crucial role in promoting the oligomerization of this enzyme. Importantly, both its thermal and solvent stability are greatly enhanced in the dodecameric state when compared to its dimeric counterpart (apparent melting temperatures (Tm) of 83.1 °C and 51.5 °C, respectively). We also employed protein design (PROSS) for a tetrameric Class II KARI (from Escherichia coli) to generate a variant with improved thermal and solvent stabilities. In total, 34 mutations were introduced, which did not affect the oligomeric state of this enzyme but resulted in a fully functional catalyst with a significantly elevated Tm (58.5 °C vs. 47.9 °C for the native version).
Allouche-Arnon H., Khersonsky O., Tirukoti N. D., Peleg Y., Dym O., Albeck S., Brandis A., Mehlman T., Avram L., Harris T., Yadav N. N., Fleishman S. J. & Bar-Shir A.
(2022)
Nature Biotechnology.
40,
7,
p. 1143-1149
Imaging of gene-expression patterns in live animals is difficult to achieve with fluorescent proteins because tissues are opaque to visible light. Imaging of transgene expression with magnetic resonance imaging (MRI), which penetrates to deep tissues, has been limited by single reporter visualization capabilities. Moreover, the low-throughput capacity of MRI limits large-scale mutagenesis strategies to improve existing reporters. Here we develop an MRI system, called GeneREFORM, comprising orthogonal reporters for two-color imaging of transgene expression in deep tissues. Starting from two promiscuous deoxyribonucleoside kinases, we computationally designed highly active, orthogonal enzymes ('reporter genes') that specifically phosphorylate two MRI-detectable synthetic deoxyribonucleosides ('reporter probes'). Systemically administered reporter probes exclusively accumulate in cells expressing the designed reporter genes, and their distribution is displayed as pseudo-colored MRI maps based on dynamic proton exchange for noninvasive visualization of transgene expression. We envision that future extensions of GeneREFORM will pave the way to multiplexed deep-tissue mapping of gene expression in live animals.
D V. P., Giulia R., L M. J., J S. S., P G. A., Mitchell B., A R. S., Olga K., J F. S., Aleksandra F. & Oliver R.
(2022)
Nature Communications.
13,
3023.
The ability to alter the genomes of living cells is key to understanding how genes influence the functions of organisms and will be critical to modify living systems for useful purposes. However, this promise has long been limited by the technical challenges involved in genetic engineering. Recent advances in gene editing have bypassed some of these challenges but they are still far from ideal. Here we use FuncLib to computationally design Cas9 enzymes with substantially higher donor-independent editing activities. We use genetic circuits linked to cell survival in yeast to quantify Cas9 activity and discover synergistic interactions between engineered regions. These hyperactive Cas9 variants function efficiently in mammalian cells and introduce larger and more diverse pools of insertions and deletions into targeted genomic regions, providing tools to enhance and expand the possible applications of CRISPR-based gene editing.
Lassa virus (LASV) is a human pathogen, causing substantial morbidity and mortality [1,2]. Similar to other Arenaviridae, it presents a class-I spike complex on its surface that facilitates cell entry. The virus’s cellular receptor is matriglycan, a linear carbohydrate that is present on α-dystroglycan [3,4], but the molecular mechanism that LASV uses to recognize this glycan is unknown. In addition, LASV and other arenaviruses have a unique signal peptide that forms an integral and functionally important part of the mature spike [5,6,7,8]; yet the structure, function and topology of the signal peptide in the membrane remain uncertain [9,10,11]. Here we solve the structure of a complete native LASV spike complex, finding that the signal peptide crosses the membrane once and that its amino terminus is located in the extracellular region. Together with a double-sided domain-switching mechanism, the signal peptide helps to stabilize the spike complex in its native conformation. This structure reveals that the LASV spike complex is preloaded with matriglycan, suggesting the mechanism of binding and rationalizing receptor recognition by α-dystroglycan-tropic arenaviruses. This discovery further informs us about the mechanism of viral egress and may facilitate the rational design of novel therapeutics that exploit this binding site.
Barber-Zucker S., Mindel V., Garcia-Ruiz E., Weinstein J. J., Alcalde M. & Fleishman S. J.
(2022)
Journal of the American Chemical Society.
144,
8,
p. 3564-3571
White-rot fungi secrete a repertoire of high-redox potential oxidoreductases to efficiently decompose lignin. Of these enzymes, versatile peroxidases (VPs) are the most promiscuous biocatalysts. VPs are attractive enzymes for research and industrial use but their recombinant production is extremely challenging. To date, only a single VP has been structurally characterized and optimized for recombinant functional expression, stability, and activity. Computational enzyme optimization methods can be applied to many enzymes in parallel but they require accurate structures. Here, we demonstrate that model structures computed by deep-learning-based structure prediction methods are reliable starting points for one-shot PROSS stability-design calculations. Four designed VPs encoding as many as 43 mutations relative to the wildtype enzymes are functionally expressed in yeast, whereas their wildtype parents are not. Three of these designs exhibit substantial and useful diversity in their reactivity profiles and tolerance to environmental conditions. The reliability of the new generation of structure predictors and design methods increases the scale and scope of computational enzyme optimization, enabling efficient discovery and exploitation of the functional diversity in natural enzyme families directly from genomic databases.
Mechaly A., Diamant E., Alcalay R., Ben David A., Dor E., Torgeman A., Barnea A., Girshengorn M., Levin L., Epstein E., Tennenhouse A., Fleishman S. J., Zichel R. & Mazor O.
(2022)
Antibodies (Basel).
11,
1,
21.
Botulinum neurotoxin type E (BoNT/E), the fastest acting toxin of all BoNTs, cleaves the 25 kDa synaptosomal-associated protein (SNAP-25) in motor neurons, leading to flaccid paralysis. The specific detection and quantification of the BoNT/E-cleaved SNAP-25 neoepitope can facilitate the development of cell-based assays for the characterization of anti-BoNT/E antibody preparations. In order to isolate highly specific monoclonal antibodies suitable for the in vitro immuno-detection of the exposed neoepitope, mice and rabbits were immunized with an eight-amino-acid peptide composed of the C-terminus of the cleaved SNAP-25. The immunized rabbits developed a specific and robust polyclonal antibody response, whereas the immunized mice mostly demonstrated a weak antibody response that could not discriminate between the two forms of SNAP-25. An immune scFv phage-display library was constructed from the immunized rabbits and a panel of antibodies was isolated. The sequence alignment of the isolated clones revealed high similarity between both heavy and light chains with exceptionally short HCDR3 sequences. A chimeric scFv-Fc antibody was further expressed and characterized, exhibiting a selective, ultra-high affinity (pM) towards the SNAP-25 neoepitope. Moreover, this antibody enabled the sensitive detection of cleaved SNAP-25 in BoNT/E-treated SiMa cells with no cross-reactivity with the intact SNAP-25. Thus, by applying an immunization and selection procedure, we have isolated a novel, specific and high-affinity antibody against the BoNT/E-derived SNAP-25 neoepitope. This novel antibody can be applied in in vitro assays that determine the potency of antitoxin preparations and reduce the use of laboratory animals for these purposes.
Nikitin D., Mican J., Toul M., Bednar D., Peskova M., Kittova P., Thalerova S., Vitecek J., Damborsky J., Mikulik R., Fleishman S. J., Prokop Z. & Marek M.
(2022)
Computational and Structural Biotechnology Journal.
20,
p. 1366-1377
Cardio- and cerebrovascular diseases are leading causes of death and disability, resulting in one of the highest socio-economic burdens of any disease type. The discovery of bacterial and human plasminogen activators and their use as thrombolytic drugs have revolutionized treatment of these pathologies. Fibrin-specific agents have an advantage over non-specific factors because of lower rates of deleterious side effects. Specifically, staphylokinase (SAK) is a pharmacologically attractive indirect plasminogen activator protein of bacterial origin that forms stoichiometric noncovalent complexes with plasmin, promoting the conversion of plasminogen into plasmin. Here we report a computer-assisted re-design of the molecular surface of SAK to increase its affinity for plasmin. A set of computationally designed SAK mutants was produced recombinantly and biochemically characterized. Screening revealed a pharmacologically interesting SAK mutant with ∼7-fold enhanced affinity toward plasmin, ∼10-fold improved plasmin selectivity and moderately higher plasmin-generating efficiency in vitro. Collectively, the results obtained provide a framework for SAK engineering using computational affinity design that could pave the way to the next generation of effective, highly selective, and less toxic thrombolytics.
Leonard A. C., Weinstein J. J., Steiner P. J., Erbse A. H., Fleishman S. J. & Whitehead T. A.
(2022)
Protein Engineering, Design and Selection.
35,
gzac002.
Stabilizing antigenic proteins as vaccine immunogens or diagnostic reagents is a stringent case of protein engineering and design as the exterior surface must maintain recognition by receptor(s) and antigen-specific antibodies at multiple distinct epitopes. This is a challenge, as stability-enhancing mutations must be focused on the protein core, whereas successful computational stabilization algorithms typically select mutations at solvent-facing positions. In this study, we report the stabilization of SARS-CoV-2 Wuhan Hu-1 Spike receptor binding domain using a combination of deep mutational scanning and computational design, including the FuncLib algorithm. Our most successful design encodes I358F, Y365W, T430I, and I513L receptor binding domain mutations, maintains recognition by the receptor ACE2 and a panel of different anti-receptor binding domain monoclonal antibodies, is between 1 and 2 °C more thermally stable than the original receptor binding domain using a thermal shift assay, and is less proteolytically sensitive to chymotrypsin and thermolysin than the original receptor binding domain. Our approach could be applied to the computational stabilization of a wide range of proteins without requiring detailed knowledge of active sites or binding epitopes. We envision that this strategy may be particularly powerful for cases when there are multiple or unknown binding sites.
Elazar A., Chandler N. J., Davey A. S., Weinstein J. Y., Nguyen J. V., Trenker R., Cross R. S., Jenkins M. R., Call M. J., Call M. E. & Fleishman S. J.
(2022)
eLife.
11,
e75660.
De novo-designed receptor transmembrane domains (TMDs) present opportunities for precise control of cellular receptor functions. We developed a de novo design strategy for generating programmed membrane proteins (proMPs): single-pass α-helical TMDs that self-assemble through computationally defined and crystallographically validated interfaces. We used these proMPs to program specific oligomeric interactions into a chimeric antigen receptor (CAR) that we expressed in mouse primary T cells and found that both in vitro CAR T cell cytokine release and in vivo antitumor activity scaled linearly with the oligomeric state encoded by the receptor TMD, from monomers up to tetramers. All programmed CARs stimulated substantially lower T cell cytokine release relative to the commonly used CD28 TMD, which we show elevated cytokine release through lateral recruitment of the endogenous T cell costimulatory receptor CD28. Precise design using orthogonal and modular TMDs thus provides a new way to program receptor structure and predictably tune activity for basic or applied synthetic biology.
Khersonsky O. & Fleishman S. J.
(2022)
BioDesign Research.
2022,
9787581.
The overarching goal of computational protein design is to gain complete control over protein structure and function. The majority of sophisticated binders and enzymes, however, are large and exhibit diverse and complex folds that defy atomistic design calculations. Encouragingly, recent strategies that combine evolutionary constraints from natural homologs with atomistic calculations have significantly improved design accuracy. In these approaches, evolutionary constraints mitigate the risk of misfolding and aggregation, focusing atomistic design calculations on a small but highly enriched sequence subspace. Such methods have dramatically optimized diverse proteins, including vaccine immunogens, enzymes for sustainable chemistry, and proteins with therapeutic potential. The new generation of deep learning-based ab initio structure predictors can be combined with these methods to extend the scope of protein design, in principle, to any natural protein of known sequence. We envision that protein engineering will come to rely on completely computational methods to efficiently discover and optimize biomolecular activities.
Vongsouthi V., Whitfield J. H., Unichenko P., Mitchell J. A., Breithausen B., Khersonsky O., Kremers L., Janovjak H., Monai H., Hirase H., Fleishman S. J., Henneberger C. & Jackson C. J.
(2021)
ACS Sensors.
6,
11,
p. 4193-4205
Solute-binding proteins (SBPs) have evolved to balance the demands of ligand affinity, thermostability, and conformational change to accomplish diverse functions in small molecule transport, sensing, and chemotaxis. Although the ligand-induced conformational changes that occur in SBPs make them useful components in biosensors, they are challenging targets for protein engineering and design. Here, we have engineered a D-alanine-specific SBP into a fluorescence biosensor with specificity for the signaling molecule D-serine (D-serFS). This was achieved through binding site and remote mutations that improved affinity (KD = 6.7 ± 0.5 μM), specificity (40-fold increase vs glycine), thermostability (Tm = 79 °C), and dynamic range (∼14%). This sensor allowed measurement of physiologically relevant changes in D-serine concentration using two-photon excitation fluorescence microscopy in rat brain hippocampal slices. This work illustrates the functional trade-offs between protein dynamics, ligand affinity, and thermostability and how these must be balanced to achieve desirable activities in the engineering of complex, dynamic proteins.
In recent decades, antibodies (Abs) have attracted the attention of academia and the biopharmaceutical industry due to their therapeutic properties and versatility in binding a vast spectrum of antigens. Different engineering strategies have been developed for optimizing Ab specificity, efficacy, affinity, stability and production, enabling systematic screening and analysis procedures for selecting lead candidates. This quality assessment is critical but usually demands time-consuming and labor-intensive purification procedures. Here, we harnessed the direct-mass spectrometry (direct-MS) approach, in which the analysis is carried out directly from the crude growth media, for the rapid, structural characterization of designed Abs. We demonstrate that properties such as stability, specificity and interactions with antigens can be defined, without the need for prior purification.
Fleishman S. J. & Horovitz A.
(2021)
Journal of Molecular Biology.
433,
20,
167007.
Recent progress in structure-prediction methods that rely on deep learning suggests that the atomic structure of almost any protein may soon be predictable directly from its amino acid sequence. This much-awaited revolution was driven by substantial improvements in the reliability of methods for inferring the spatial distances between amino acid pairs from an analysis of homologous sequences. Improved reliability has been accompanied, however, by a reduced ability to detect amino acid relationships that are not due to direct spatial contacts, such as those that arise from protein dynamics or allostery. Given the central importance of dynamics and allostery to protein activity, we argue that an important future advance would extend modeling beyond predicting a single static structure. Here, we briefly review some of the developments that have led to the remarkable recent achievement in structure prediction and speculate what methods and sources of information may be leveraged in the future to develop a modeling framework that addresses protein dynamics and allostery.
Makdasi E., Zvi A., Alcalay R., Noy-Porat T., Peretz E., Mechaly A., Levy Y., Epstein E., Chitlaru T., Tennenhouse A., Aftalion M., Gur D., Paran N., Tamir H., Zimhony O., Weiss S., Mandelboim M., Mendelson E., Zuckerman N., Nemet I., Kliker L., Yitzhaki S., Shapira S. C., Israely T., Fleishman S. J., Mazor O. & Rosenfeld R.
(2021)
Cell Reports.
36,
10,
109679.
A wide range of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) neutralizing monoclonal antibodies (mAbs) have been reported, most of which target the spike glycoprotein. Therapeutic implementation of these antibodies has been challenged by emerging SARS-CoV-2 variants harboring mutated spike versions. Consequently, re-assessment of previously identified mAbs is of high priority. Four previously selected mAbs targeting non-overlapping epitopes are now evaluated for binding potency to mutated RBD versions, reported to mediate escape from antibody neutralization. In vitro neutralization potencies of these mAbs, and two NTD-specific mAbs, are evaluated against two frequent SARS-CoV-2 variants of concern, the B.1.1.7 Alpha and the B.1.351 Beta. Furthermore, we demonstrate therapeutic potential of three selected mAbs by treatment of K18-human angiotensin-converting enzyme 2 (hACE2) transgenic mice 2 days post-infection with each virus variant. Thus, despite the accumulation of spike mutations, the highly potent MD65 and BL6 mAbs retain their ability to bind the prevalent viral mutants, effectively protecting against B.1.1.7 and B.1.351 variants.
Adams E. M., Pezzotti S., Ahlers J., Rüttermann M., Levin M., Goldenzweig A., Peleg Y., Fleishman S. J., Sagi I. & Havenith M.
(2021)
JACS Au.
1,
7,
p. 1076-1085
Although it is well-known that limited local mutations of enzymes, such as matrix metalloproteinases (MMPs), may change both enzyme activity (by orders of magnitude) and stability, the completely rational design of proteins is still challenging. These local changes alter the electrostatic potential and thus local electrostatic fields, which impacts the dynamics of water molecules close to the protein surface. Here we show by a combined computational design, experimental, and molecular dynamics (MD) study that local mutations have not only a local but also a global effect on the solvent: In the specific case of the matrix metalloprotease MMP14, we found that the nature of local mutations, coupled with surface morphology, has the ability to influence large patches of the water hydrogen-bonding network at the protein surface, which is correlated with stability. The solvent contribution can be experimentally probed via terahertz (THz) spectroscopy, thus opening the door to the exciting prospect of rational protein design in which a systematic tuning of hydration water properties allows manipulation of protein stability and enzymatic activity.
Borenstein-Katz A., Warszawski S., Amon R., Eilon M., Cohen-Dvashi H., Leviatan Ben-Arye S., Tasnima N., Yu H., Chen X., Padler-Karavani V., Fleishman S. J. & Diskin R.
(2021)
Journal of Molecular Biology.
433,
15,
167099.
Glycans decorate the cell surface, secreted glycoproteins and glycolipids, and altered glycans are often found in cancers. Despite their high diagnostic and therapeutic potential, however, glycans are polar and flexible molecules that are quite challenging for the development and design of high-affinity binding antibodies. To understand the mechanisms by which glycan neoantigens are specifically recognized by antibodies, we analyze the biomolecular recognition of the tumor-associated carbohydrate antigen CA19-9 by two distinct antibodies using X-ray crystallography. Despite the potential plasticity of glycans and the very different antigen-binding surfaces presented by the antibodies, both structures reveal an essentially identical extended CA19-9 conformer, suggesting that this conformer's stability selects the antibodies. Starting from the bound structure of one of the antibodies, we use the AbLIFT computational algorithm to design a variant with seven core mutations in the variable domain's light-heavy chain interface that exhibits tenfold improved affinity for CA19-9. The results reveal strategies used by antibodies to specifically recognize glycan antigens and show how automated antibody-optimization methods may be used to enhance the clinical potential of existing antibodies.
Aharoni A. & Fleishman S. J.
(2021)
The FEBS journal.
288,
13,
p. 3880-3883
Dan (Danny) Tawfik, a leader in biochemistry and protein evolution, sadly died due to a fatal climbing accident on May 4th, 2021. Apart from science, rock climbing was Danny's passion and a source of pride as only a handful of researchers are active climbers. Danny made unique and long‐lasting contributions to our understanding of molecular evolution. He was also incredibly generous with his time and insights, and many researchers around the world are indebted to him, not least the two authors of this obituary.
Peleg Y., Vincentelli R., Collins B. M., Chen K., Livingstone E. K., Weeratunga S., Leneva N., Guo Q., Remans K., Perez K., Bjerga G. E., Larsen Ø., Vaněk O., Skořepa O., Jacquemin S., Poterszman A., Kjaer S., Christodoulou E., Albeck S., Dym O., Ainbinder E., Unger T., Schuetz A., Matthes S., Bader M., de Marco A., Storici P., Semrau M. S., Stolt-Bergner P., Aigner C., Suppmann S., Goldenzweig A. & Fleishman S. J.
(2021)
Journal of Molecular Biology.
433,
13,
166964.
Recent years have seen a dramatic improvement in protein-design methodology. Nevertheless, most methods demand expert intervention, limiting their widespread adoption. By contrast, the PROSS algorithm for improving protein stability and heterologous expression levels has been successfully applied to a range of challenging enzymes and binding proteins. Here, we benchmark the application of PROSS as a stand-alone tool for protein scientists with no or limited experience in modeling. Twelve laboratories from the Protein Production and Purification Partnership in Europe (P4EU) challenged the PROSS algorithm with 14 unrelated protein targets without support from the PROSS developers. For each target, up to six designs were evaluated for expression levels and in some cases, for thermal stability and activity. In nine targets, designs exhibited increased heterologous expression levels in prokaryotic and/or eukaryotic expression systems under experimental conditions that were tailored for each target protein. Furthermore, we observed increased thermal stability in nine of ten tested targets. In two prime examples, the human Stem Cell Factor (hSCF) and human Cadherin-Like Domain (CLD12) from the RET receptor, the wild type proteins were not expressible as soluble proteins in E. coli, yet the PROSS designs exhibited high expression levels in E. coli and HEK293 cells, respectively, and improved thermal stability. We conclude that PROSS may improve stability and expressibility in diverse cases, and that improvement typically requires target-specific expression conditions. This study demonstrates the strengths of community-wide efforts to probe the generality of new methods and recommends areas for future research to advance practically useful algorithms for protein science.
Scherer M., Fleishman S. J., Jones P. R., Dandekar T. & Bencurova E.
(2021)
Frontiers in Bioengineering and Biotechnology.
9,
673005.
To enable a sustainable supply of chemicals, novel biotechnological solutions are required that replace the reliance on fossil resources. One potential solution is to utilize tailored biosynthetic modules for the metabolic conversion of CO2 or organic waste to chemicals and fuel by microorganisms. Currently, it is challenging to commercialize biotechnological processes for renewable chemical biomanufacturing because of a lack of highly active and specific biocatalysts. As experimental methods to engineer biocatalysts are time- and cost-intensive, it is important to establish efficient and reliable computational tools that can speed up the identification or optimization of selective, highly active, and stable enzyme variants for utilization in the biotechnological industry. Here, we review and suggest combinations of effective state-of-the-art software and online tools available for computational enzyme engineering pipelines to optimize metabolic pathways for the biosynthesis of renewable chemicals. Using examples relevant for biotechnology, we explain the underlying principles of enzyme engineering and design and illuminate future directions for automated optimization of biocatalysts for the assembly of synthetic metabolic pathways.
Weinstein J. J., Goldenzweig A., Hoch S. & Fleishman S. J.
(2021)
Bioinformatics.
37,
1,
p. 123-125
Many natural and designed proteins are only marginally stable, limiting their usefulness in research and applications. Recently, we described an automated structure and sequence-based design method, called PROSS, for optimizing protein stability and heterologous expression levels that has since been validated on dozens of proteins. Here, we introduce improvements to the method, workflow and presentation, including more accurate sequence analysis, error handling and automated analysis of the quality of the sequence alignment that is used in design calculations.
VanDrisse C. M., Lipsh-Sokolik R., Khersonsky O., Fleishman S. J. & Newman D. K.
(2021)
Proceedings of the National Academy of Sciences - PNAS.
118,
12,
e202201211.
Pseudomonas aeruginosa is an opportunistic human pathogen that develops difficult-to-treat biofilms in immunocompromised individuals, cystic fibrosis patients, and in chronic wounds. P. aeruginosa has an arsenal of physiological attributes that enable it to evade standard antibiotic treatments, particularly in the context of biofilms where it grows slowly and becomes tolerant to many drugs. One of its survival strategies involves the production of the redox-active phenazine, pyocyanin, which promotes biofilm development. We previously identified an enzyme, PodA, that demethylated pyocyanin and disrupted P. aeruginosa biofilm development in vitro. Here, we asked if this protein could be used as a potential therapeutic for P. aeruginosa infections together with tobramycin, an antibiotic typically used in the clinic. A major roadblock to answering this question was the poor yield and stability of wild-type PodA purified from standard Escherichia coli overexpression systems. We hypothesized that the insufficient yields were due to poor packing within PodA’s obligatory homotrimeric interfaces. We therefore applied the protein design algorithm, AffiLib, to optimize the symmetric core of this interface, resulting in a design that incorporated five mutations leading to a 20-fold increase in protein yield from heterologous expression and purification and a substantial increase in stability to environmental conditions. The addition of the designed PodA with tobramycin led to increased killing of P. aeruginosa cultures under oxic and hypoxic conditions in both the planktonic and biofilm states. This study highlights the potential for targeting extracellular metabolites to assist the control of P. aeruginosa biofilms that tolerate conventional antibiotic treatment.
Lipsh-Sokolik R., Listov D. & Fleishman S. J.
(2021)
Protein Science.
30,
1,
p. 151-159
The functional sites of many protein families are dominated by diverse backbone regions that lack secondary structure (loops) but fold stably into their functionally competent state. Nevertheless, the design of structured loop regions from scratch, especially in functional sites, has met with great difficulty. We therefore developed an approach, called AbDesign, to exploit the natural modularity of many protein families and computationally assemble a large number of new backbones by combining naturally occurring modular fragments. This strategy yielded large, atomically accurate, and highly efficient proteins, including antibodies and enzymes exhibiting dozens of mutations from any natural protein. The combinatorial backbone-conformation space that can be accessed by AbDesign even for a modestly sized family of homologs may exceed the diversity in the entire PDB, providing the sub-Ångstrom level of control over the positioning of active-site groups that is necessary for obtaining highly active proteins. This manuscript describes how to implement the pipeline using code that is freely available at https://github.com/Fleishman-Lab/AbDesign_for_enzymes .
Warszawski S., Katz A. B., Lipsh R., Khmelnitsky L., Nissan G. B., Javitt G., Dym O., Unger T., Knop O., Albeck S., Diskin R., Fass D., Sharon M. & Fleishman S. J.
(2020)
PLoS Computational Biology.
16,
10,
e1008382.
The funding statement for this article should read as follows: “The research was supported by grants from the European Research Council (335439 to SJF, 636752 to MS, and 310649 and 825076 to DF), the Israel Science Foundation to MS (300/17) and through its Center of Excellence in Structural Cell Biology to SJF and DF (1775/12), a research grant from Sheri and David E. Stone and by a charitable donation from Sam Switzer and family. M.S. is an incumbent of the Aharon and Ephraim Katzir Memorial Professorial Chair. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”
Our ability to design new or improved biomolecular activities depends on understanding the sequence-function relationships in proteins. The large size and fold complexity of most proteins, however, obscure these relationships, and protein-optimization methods continue to rely on laborious experimental iterations. Recently, a deeper understanding of the roles of stability-threshold effects and biomolecular epistasis in proteins has led to the development of hybrid methods that combine phylogenetic analysis with atomistic design calculations. These methods enable reliable and even single-step optimization of protein stability, expressibility, and activity in proteins that were considered outside the scope of computational design. Furthermore, ancestral-sequence reconstruction produces insights on missing links in the evolution of enzymes and binders that may be used in protein design. Through the combination of phylogenetic and atomistic calculations, the long-standing goal of general computational methods that can be universally applied to study and optimize proteins finally seems within reach.
Warszawski S., Dekel E., Campeotto I., Marshall J. M., Wright K. E., Lyth O., Knop O., Regev-Rudzki N., Higgins M. K., Draper S. J., Baum J. & Fleishman S. J.
(2020)
Proteins-Structure Function And Bioinformatics.
88,
1,
p. 187-195
Many human pathogens use host cell-surface receptors to attach and invade cells. Often, the host-pathogen interaction affinity is low, presenting opportunities to block invasion using a soluble, high-affinity mimic of the host protein. The Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) provides an exciting candidate for mimicry: it is highly conserved and its moderate affinity binding to the human receptor basigin (KD ≥ 1 μM) is an essential step in erythrocyte invasion by this malaria parasite. We used deep mutational scanning of a soluble fragment of human basigin to systematically characterize point mutations that enhance basigin affinity for RH5 and then used Rosetta to design a variant within the sequence space of affinity-enhancing mutations. The resulting seven-mutation design exhibited 1900-fold higher affinity (KD ≈ 1 nM) for RH5 with a very slow binding off rate (0.23 h⁻¹) and reduced the effective Plasmodium growth-inhibitory concentration by at least 10-fold compared to human basigin. The design provides a favorable starting point for engineering on-rate improvements that are likely to be essential to reach therapeutically effective growth inhibition.
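As a rough back-of-envelope reading of the kinetic figures for the basigin design, assuming simple 1:1 binding kinetics (our assumption, not a statement from the paper) and taking KD ≈ 1 nM and koff = 0.23 h⁻¹, the complex half-life and the implied association rate constant are:

$$
t_{1/2} = \frac{\ln 2}{k_{\mathrm{off}}} = \frac{0.693}{0.23\ \mathrm{h^{-1}}} \approx 3.0\ \mathrm{h},
\qquad
k_{\mathrm{on}} = \frac{k_{\mathrm{off}}}{K_D} = \frac{(0.23/3600)\ \mathrm{s^{-1}}}{1\times10^{-9}\ \mathrm{M}} \approx 6.4\times10^{4}\ \mathrm{M^{-1}\,s^{-1}}
$$

Under these assumptions the high affinity comes mainly from slow dissociation rather than fast association, which is one way to read the closing remark about on-rate engineering.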
Malladi S. K., Schreiber D., Pramanick I., Sridevi M. A., Goldenzweig A., Dutta S., Fleishman S. J. & Varadarajan R.
(2020)
Current Research in Structural Biology.
2,
p. 45-55
Stabilization of the metastable envelope glycoprotein (Env) of HIV-1 is hypothesized to improve induction of broadly neutralizing antibodies. We improved the expression yield and stability of the HIV-1 envelope glycoprotein BG505SOSIP.664 gp140 by means of a previously described automated sequence and structure-guided computational thermostabilization approach, PROSS. This combines sequence conservation information with computational assessment of mutant stabilization, thus taking advantage of the extensive natural sequence variation present in HIV-1 Env. PROSS is used to design three gp140 variants with 17–45 mutations relative to the parental construct. One of the designs is experimentally observed to have a fourfold improvement in yield and a 4 °C increment in thermostability. In addition, the designed immunogens have similar antigenicity profiles to the native flexible linker version of wild type, BG505SOSIP.664 gp140 (NFL Wt) to major epitopes targeted by broadly neutralizing antibodies. PROSS eliminates the laborious process of screening many variants for stability and functionality, providing a proof of principle of the method for stabilization and improvement of yield without compromising antigenicity for next generation complex, highly glycosylated vaccine candidates.
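The two-stage idea described here, restricting design to mutations that are common among natural homologs and then keeping only those predicted to stabilize, can be illustrated with a toy filter. This is a minimal sketch, not PROSS itself: the `Mutation` record, the scores, and the thresholds `pssm_min` and `ddg_max` are hypothetical, whereas real calculations draw conservation scores from a sequence alignment and stability estimates from Rosetta, then run a combinatorial design step over the surviving mutations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    position: int  # residue number in the parental sequence
    wt: str        # wild-type amino acid (one-letter code)
    mut: str       # proposed amino acid
    pssm: float    # hypothetical conservation score; higher = more common in homologs
    ddg: float     # hypothetical predicted stability change; negative = stabilizing


def filter_mutations(candidates, pssm_min=0.0, ddg_max=-0.5):
    """Keep mutations that pass an evolutionary filter, then an energy filter."""
    allowed = [m for m in candidates if m.pssm >= pssm_min]  # seen among homologs
    return [m for m in allowed if m.ddg <= ddg_max]          # predicted to stabilize


def design_space_size(mutations):
    """Count sequences if each position takes its wild type or one retained mutation."""
    by_position = {}
    for m in mutations:
        by_position.setdefault(m.position, set()).add(m.mut)
    size = 1
    for choices in by_position.values():
        size *= len(choices) + 1  # +1 for keeping the wild-type residue
    return size


# Hypothetical candidates with made-up scores, for illustration only.
candidates = [
    Mutation(10, "A", "V", pssm=2.0, ddg=-1.2),   # passes both filters
    Mutation(10, "A", "W", pssm=-3.0, ddg=-2.0),  # stabilizing but rare in homologs
    Mutation(25, "S", "T", pssm=1.0, ddg=0.3),    # common but destabilizing
    Mutation(42, "K", "R", pssm=0.5, ddg=-0.8),   # passes both filters
    Mutation(42, "K", "E", pssm=1.5, ddg=-0.6),   # passes both filters
]

kept = filter_mutations(candidates)
print([(m.position, m.mut) for m in kept])  # [(10, 'V'), (42, 'R'), (42, 'E')]
print(design_space_size(kept))              # 6
```

Applying the evolutionary filter first means the energy calculation only has to judge mutations that nature already tolerates, which is the sense in which the sequence subspace is "enriched"; the order of the two filters does not change the result here, only the amount of computation in a real pipeline.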
Weinstein J. Y., Elazar A. & Fleishman S. J.
(2019)
PLoS Computational Biology.
15,
8,
e1007318.
Membrane-protein design is an exciting and increasingly successful research area which has led to landmarks including the design of stable and accurate membrane-integral proteins based on coiled-coil motifs. Design of topologically more complex proteins, such as most receptors, channels, and transporters, however, demands an energy function that balances contributions from intra-protein contacts and protein-membrane interactions. Recent advances in water-soluble all-atom energy functions have increased the accuracy in structure-prediction benchmarks. The plasma membrane, however, imposes different physical constraints on protein solvation. To understand these constraints, we recently developed a high-throughput experimental screen, called dsTβL, and inferred apparent insertion energies for each amino acid at dozens of positions across the bacterial plasma membrane. Here, we express these profiles as lipophilicity energy terms in Rosetta and demonstrate that the new energy function outperforms previous ones in modelling and design benchmarks. Rosetta ab initio simulations starting from an extended chain recapitulate two-thirds of the experimentally determined structures of membrane-spanning homo-oligomers with
Warszawski S., Katz A. B., Lipsh R., Khmelnitsky L., Ben Nissan G., Javitt G., Dym O., Unger T., Knop O., Albeck S., Diskin R., Fass D., Sharon M. & Fleishman S. J.
(2019)
PLoS Computational Biology.
15,
8,
e1007207.
Antibodies developed for research and clinical applications may exhibit suboptimal stability, expressibility, or affinity. Existing optimization strategies focus on surface mutations, whereas natural affinity maturation also introduces mutations in the antibody core, simultaneously improving stability and affinity. To systematically map the mutational tolerance of an antibody variable fragment (Fv), we performed yeast display and applied deep mutational scanning to an anti-lysozyme antibody and found that many of the affinity-enhancing mutations clustered at the variable light-heavy chain interface, within the antibody core. Rosetta design combined enhancing mutations, yielding a variant with tenfold higher affinity and substantially improved stability. To make this approach broadly accessible, we developed AbLIFT, an automated web server that designs multipoint core mutations to improve contacts between specific Fv light and heavy chains (http://AbLIFT.weizmann.ac.il). We applied AbLIFT to two unrelated antibodies targeting the human antigens VEGF and QSOX1. Strikingly, the designs improved stability, affinity, and expression yields. The results provide proof-of-principle for bypassing laborious cycles of antibody engineering through automated computational affinity and stability design.
Methods for antibody structure prediction rely on sequence homology to experimentally determined structures. Resulting models may be accurate but are often stereochemically strained, limiting their usefulness in modeling and design workflows. We present the AbPredict 2 web-server, which instead of using sequence homology, conducts a Monte Carlo-based search for low-energy combinations of backbone conformations to yield accurate and unstrained antibody structures.
Trudeau D. L., Edlich-Muth C., Zarzycki J., Scheffen M., Goldsmith M., Khersonsky O., Avizemer Z., Fleishman S. J., Cotton C. A. R., Erb T. J., Tawfik D. S. & Bar-Even A.
(2018)
Proceedings of the National Academy of Sciences of the United States of America (Physical Sciences).
115,
49,
p. E11455-E11464
Photorespiration recycles 2-phosphoglycolate, the oxygenation product of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), back into the Calvin cycle. Natural photorespiration, however, limits agricultural productivity by dissipating energy and releasing CO2. Several photorespiration bypasses have been previously suggested but were limited to existing enzymes and pathways that release CO2. Here, we harness the power of enzyme and metabolic engineering to establish synthetic routes that bypass photorespiration without CO2 release. By defining specific reaction rules, we systematically identified promising routes that assimilate 2-phosphoglycolate into the Calvin cycle without carbon loss. We further developed a kinetic-stoichiometric model that indicates that the identified synthetic shunts could potentially enhance carbon fixation rate across the physiological range of irradiation and CO2, even if most of their enzymes operate at a tenth of Rubisco's maximal carboxylation activity. Glycolate reduction to glycolaldehyde is essential for several of the synthetic shunts but is not known to occur naturally. We therefore used computational design and directed evolution to establish this activity in two sequential reactions. An acetyl-CoA synthetase was engineered for higher stability and glycolyl-CoA synthesis. A propionyl-CoA reductase was engineered for higher selectivity for glycolyl-CoA and for use of NADPH over NAD+, thereby favoring reduction over oxidation. The engineered glycolate reduction module was then combined with downstream condensation and assimilation of glycolaldehyde to ribulose 1,5-bisphosphate, thus providing proof of principle for a carbon-conserving photorespiration pathway.
Ben-Nissan G., Vimer S., Warszawski S., Katz A., Yona M., Unger T., Peleg Y., Morgenstern D., Cohen-Dvashi H., Diskin R., Fleishman S. J. & Sharon M.
(2018)
Communications Biology.
1,
1,
213.
Characterization of overexpressed proteins is essential for assessing their quality and providing input for iterative redesign and optimization. This process is typically carried out following purification procedures that are costly in time and labor. Therefore, quality assessment of recombinant proteins with no prior purification offers a major advantage. Here, we report a native mass spectrometry method that enables characterization of overproduced proteins directly from culture media. Properties such as solubility, molecular weight, folding, assembly state, overall structure, post-translational modifications and binding to relevant biomolecules are immediately revealed. We show the applicability of the method for in-depth characterization of secreted recombinant proteins from eukaryotic systems such as yeast, insect, and human cells. This method, which can be readily extended to high-throughput analysis, considerably shortens the time gap between protein production and characterization, and is particularly suitable for characterizing engineered and mutated proteins and for optimizing the yield and quality of overexpressed proteins.
Kantaev R., Riven I., Goldenzweig A., Barak Y., Dym O., Peleg Y., Albeck S., Fleishman S. J. & Haran G.
(2018)
Journal of Physical Chemistry B.
122,
49,
p. 11030-11038
Folding of proteins to their functional conformation is paramount to life. Although 75% of the proteome consists of multi-domain proteins, our knowledge of folding has been based primarily on studies of single-domain and fast-folding proteins. Nonetheless, the complexity of folding landscapes exhibited by multi-domain proteins has received increased scrutiny in recent years. We study the three-domain protein adenylate kinase from E. coli (AK), which has been shown to fold through a series of pathways involving several intermediate states. We use a protein design method to manipulate the folding landscape of AK, and single-molecule FRET spectroscopy to study the effects on the folding process. Mutations introduced in the NMP binding (NMPbind) domain of the protein are found to have unexpected effects on the folding landscape: while stabilizing mutations in the core of the NMPbind domain retain the main folding pathways of wild-type AK, a destabilizing mutation at the interface between the NMPbind and the CORE domains causes a significant repartition of the flux between the folding pathways. Our results demonstrate the outstanding plasticity of the folding landscape of AK, and reveal how specific mutations in the primary structure are translated into changes in folding dynamics. The combination of methodologies introduced in this work should prove useful for deepening our understanding of the folding process of multi-domain proteins.
Netzer R., Listov D., Lipsh R., Dym O., Albeck S., Knop O., Kleanthous C. & Fleishman S. J.
(2018)
Nature Communications.
9,
1,
5286.
Protein networks in all organisms comprise homologous interacting pairs. In these networks, some proteins are specific, interacting with one or a few binding partners, whereas others are multispecific and bind a range of targets. We describe an algorithm that starts from an interacting pair and designs dozens of new pairs with diverse backbone conformations at the binding site as well as new binding orientations and sequences. Applied to a high-affinity bacterial pair, the algorithm yields 18 new pairs, with cognate affinities ranging from pico- to micromolar. Three pairs exhibit a 3–5 orders-of-magnitude switch in specificity relative to the wild type, whereas others are multispecific, collectively forming a protein-interaction network. Crystallographic analysis confirms design accuracy, including in new backbones and polar interactions. Preorganized polar interaction networks are responsible for high specificity, thus defining design principles that can be applied to program synthetic cellular interaction networks of desired affinity and specificity.
Khersonsky O., Lipsh R., Avizemer Z., Ashani Y., Goldsmith M., Leader H., Dym O., Rogotner S., Trudeau D. L., Prilusky J., Amengual-Rigo P., Guallar V., Tawfik D. S. & Fleishman S. J.
(2018)
Molecular Cell.
72,
1,
p. 178-186.e5
Substantial improvements in enzyme activity demand multiple mutations at spatially proximal positions in the active site. Such mutations, however, often exhibit unpredictable epistatic (non-additive) effects on activity. Here we describe FuncLib, an automated method for designing multipoint mutations at enzyme active sites using phylogenetic analysis and Rosetta design calculations. We applied FuncLib to two unrelated enzymes, a phosphotriesterase and an acetyl-CoA synthetase. All designs were active, and most showed activity profiles that significantly differed from the wild-type and from one another. Several dozen designs with only 3–6 active-site mutations exhibited 10- to 4,000-fold higher efficiencies with a range of alternative substrates, including hydrolysis of the toxic organophosphate nerve agents soman and cyclosarin and synthesis of butyryl-CoA. FuncLib is implemented as a web server (http://FuncLib.weizmann.ac.il); it circumvents iterative, high-throughput experimental screens and opens the way to designing highly efficient and diverse catalytic repertoires.
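FuncLib's premise, that active-site mutations must be chosen in combination because of epistasis, can be illustrated with a toy enumeration. The sketch below is an assumption-laden illustration, not FuncLib's implementation: the allowed sets stand in for phylogenetically permitted residues, and the single and pairwise energies are invented numbers standing in for Rosetta calculations. The pairwise terms make the score non-additive, which is the point of the example.

```python
from itertools import combinations, product

def enumerate_designs(allowed, single, pair, min_muts=3, max_muts=6, top_k=5):
    """Enumerate multipoint mutants over the allowed residues and rank them by a
    toy energy: a sum of single-mutation terms plus pairwise (epistatic) terms.

    allowed: {position: set of residues permitted at that position}
    single:  {(position, residue): energy}
    pair:    {((pos_i, aa_i), (pos_j, aa_j)): coupling}, with pos_i < pos_j
    Lower energy is better; returns the top_k (energy, mutations) tuples."""
    positions = sorted(allowed)
    designs = []
    for n in range(min_muts, max_muts + 1):
        for pos_set in combinations(positions, n):
            for residues in product(*(sorted(allowed[p]) for p in pos_set)):
                muts = tuple(zip(pos_set, residues))
                energy = sum(single.get(m, 0.0) for m in muts)
                energy += sum(pair.get((a, b), 0.0) for a, b in combinations(muts, 2))
                designs.append((energy, muts))
    designs.sort()
    return designs[:top_k]

# Invented example: V at 10 looks best alone but clashes with L at 20.
allowed = {10: {"A", "V"}, 20: {"L"}, 30: {"F"}}
single = {(10, "A"): -0.5, (10, "V"): -1.0, (20, "L"): -0.8, (30, "F"): 0.2}
pair = {((10, "V"), (20, "L")): 1.5}
for energy, muts in enumerate_designs(allowed, single, pair, min_muts=2, max_muts=3, top_k=3):
    print(round(energy, 2), muts)
```

In this toy case V at position 10 scores best on its own, but its clash with L at position 20 pushes every design containing both down the ranking, so ranking single mutations alone would miss the best combination. Real active sites have many more positions and options, so exhaustive enumeration quickly becomes expensive.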
Cveticanin J., Netzer R., Arkind G., Fleishman S. J., Horovitz A. & Sharon M.
(2018)
Analytical Chemistry.
90,
17,
p. 10090-10094
A powerful method to determine the energetic coupling between amino acids is double mutant cycle analysis. In this method, two residues are mutated separately and in combination and the energetic effects of the mutations are determined. A deviation of the effect of the double mutation from the sum of effects of the single mutations indicates that the two residues are interacting directly or indirectly. Here, we show that double mutant cycle analysis by native mass spectrometry can be carried out for interactions in crude Escherichia coli cell extracts, generating binding isotherms without the need for protein purification. Our results indicate that intermolecular hydrogen bond strengths are not affected by the more crowded conditions in cell lysates.
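The double mutant cycle described above reduces to a line of arithmetic once binding free energies are known. A minimal sketch, assuming each variant's affinity is summarized by a dissociation constant (for example, fitted from a binding isotherm) and that ΔG = RT ln KD; the function name and the KD values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def coupling_energy(kd_wt, kd_a, kd_b, kd_ab, temp_k=298.15):
    """Double-mutant-cycle coupling energy in kcal/mol:
    ddG_int = ddG(AB) - ddG(A) - ddG(B), where ddG(X) = RT ln(KD_X / KD_wt).
    Zero means the two mutations act additively."""
    rt = R_KCAL * temp_k

    def ddg(kd_mut):
        return rt * math.log(kd_mut / kd_wt)

    return ddg(kd_ab) - ddg(kd_a) - ddg(kd_b)

# Invented dissociation constants (molar) for wild type, single and double mutants.
print(round(coupling_energy(1e-9, 1e-8, 1e-8, 1e-7), 2))  # additive case
print(round(coupling_energy(1e-9, 1e-8, 1e-8, 1e-8), 2))  # coupled case
```

The first call gives zero up to floating-point rounding (the double mutant's effect equals the sum of the single-mutant effects); the second gives about −1.36 kcal/mol, indicating coupling between the two positions. Sign conventions for the coupling energy vary between studies.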
Lapidoth G., Khersonsky O., Lipsh R., Dym O., Albeck S., Rogotner S. & Fleishman S. J.
(2018)
Nature Communications.
9,
2780.
Automated design of enzymes with wild-type-like catalytic properties has been a long-standing but elusive goal. Here, we present a general, automated method for enzyme design through combinatorial backbone assembly. Starting from a set of homologous yet structurally diverse enzyme structures, the method assembles new backbone combinations and uses Rosetta to optimize the amino acid sequence, while conserving key catalytic residues. We apply this method to two unrelated enzyme families with TIM-barrel folds, glycoside hydrolase 10 (GH10) xylanases and phosphotriesterase-like lactonases (PLLs), designing 43 and 34 proteins, respectively. Twenty-one GH10 and seven PLL designs are active, including designs derived from templates with
Amon R., Grant O. C., Ben-Arye S. L., Makeneni S., Nivedha A. K., Marshanski T., Norn C., Yu H., Glushka J. N., Fleishman S. J., Chen X., Woods R. J. & Padler-Karavani V.
(2018)
Scientific Reports.
8,
10786.
Anti-carbohydrate monoclonal antibodies (mAbs) hold great promise as cancer therapeutics and diagnostics. However, their specificity can be mixed, and detailed characterization is problematic, because antibody-glycan complexes are challenging to crystallize. Here, we developed a generalizable approach employing high-throughput techniques for characterizing the structure and specificity of such mAbs, and applied it to the mAb TKH2 developed against the tumor-associated carbohydrate antigen sialyl-Tn (STn). The mAb specificity was defined by apparent KD values determined by quantitative glycan microarray screening. Key residues in the antibody combining site were identified by site-directed mutagenesis, and the glycan-antigen contact surface was defined using saturation transfer difference NMR (STD-NMR). These features were then employed as metrics for selecting the optimal 3D model of the antibody-glycan complex out of thousands of plausible options generated by automated docking and molecular dynamics simulation. STn specificity was further validated by computationally screening the selected antibody 3D model against the human sialyl-Tn-glycome. This computational-experimental approach could allow rational design of potent antibodies targeting carbohydrates.
Goldenzweig A. & Fleishman S. J.
(2018)
Annual Review of Biochemistry.
87,
p. 105-129
Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amounts, thus hampering research and applications. Research has revealed the thermodynamic, cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic analysis with atomistic design, have shown drastic improvements in solubility, thermal stability, and aggregation resistance while maintaining the protein's primary molecular activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodology to large proteins and molecular activities that have proven challenging in the past.
Bandyopadhyay B., Goldenzweig A., Unger T., Adato O., Fleishman S. J., Unger R. & Horovitz A.
(2017)
Journal of Biological Chemistry.
292,
50,
p. 20583-20591
The GroE chaperonin system in Escherichia coli comprises GroEL and GroES and facilitates ATP-dependent protein folding in vivo and in vitro. Proteins with very similar sequences and structures can differ in their dependence on GroEL for efficient folding. One potential but unverified source for GroEL dependence is frustration, wherein not all interactions in the native state are optimized energetically, thereby potentiating slow folding and misfolding. Here, we chose enhanced green fluorescent protein as a model system and subjected it to random mutagenesis, followed by screening for variants whose in vivo folding displays increased or decreased GroEL dependence. We confirmed the altered GroEL dependence of these variants with in vitro folding assays. Strikingly, mutations at positions predicted to be highly frustrated were found to correlate with decreased GroEL dependence. Conversely, mutations at positions with low frustration were found to correlate with increased GroEL dependence. Further support for this finding was obtained by showing that folding of an enhanced green fluorescent protein variant designed computationally to have reduced frustration is indeed less GroEL-dependent. Our results indicate that changes in local frustration also affect partitioning in vivo between spontaneous and chaperonin-mediated folding. Hence, the design of minimally frustrated sequences can reduce chaperonin dependence and improve protein expression levels.
Baran D., Pszolla M. G., Lapidoth G. D., Norn C., Dym O., Unger T., Albeck S., Tyka M. D. & Fleishman S. J.
(2017)
Proceedings of the National Academy of Sciences of the United States of America.
114,
41,
p. 10900-10905
Natural proteins must both fold into a stable conformation and exert their molecular function. To date, computational design has successfully produced stable and atomically accurate proteins by using so-called "ideal" folds rich in regular secondary structures and almost devoid of loops and destabilizing elements, such as cavities. Molecular function, such as binding and catalysis, however, often demands nonideal features, including large and irregular loops and buried polar interaction networks, which have remained challenging for fold design. Through five design/experiment cycles, we learned principles for designing stable and functional antibody variable fragments (Fvs). Specifically, we (i) used sequence-design constraints derived from antibody multiple-sequence alignments, and (ii) during backbone design, maintained stabilizing interactions observed in natural antibodies between the framework and loops of complementarity-determining regions (CDRs) 1 and 2. Designed Fvs bound their ligands with midnanomolar affinities and were as stable as natural antibodies, despite having >30 mutations from mammalian antibody germlines. Furthermore, crystallographic analysis demonstrated atomic accuracy throughout the framework and in four of six CDRs in one design and atomic accuracy in the entire Fv in another. The principles we learned are general, and can be implemented to design other nonideal folds, generating stable, specific, and precise antibodies and enzymes.
Rosenfeld R., Alcalay R., Mechaly A., Lapidoth G., Epstein E., Kronman C., Fleishman S. J. & Mazor O.
(2017)
Protein Engineering, Design and Selection.
30,
9,
p. 611-617
Although potent monoclonal antibodies against ricin have been introduced over the years, whether increasing antibody affinity enables better toxin neutralization has not yet been fully addressed. The aim of this study was to characterize the contribution of antibody affinity to the ricin neutralization potential of the antibody. The cHD23 monoclonal antibody, which targets the toxin B-subunit and interferes with its binding to membrane receptors, was isolated. To create antibody clones with improved affinity toward ricin, an scFv-phage display library containing mutated versions of the variable regions of cHD23 was constructed, and clones with improved binding of ricin were isolated. Structural modeling of these mutants suggests that the inserted mutations may increase the antibody's conformational flexibility, thus improving its ability to bind ricin. Although the selected clones exhibited improved neutralization of ricin, the correlation between their KD values and potency was only minor (r = 0.55). However, a positive correlation (r = 0.84) exists between the off-rate values (koff) of the affinity-matured clones and their ability to neutralize ricin. As cell membranes display inordinately large amounts of potential surface binding sites for ricin, it is suggested that antibodies with improved off-rate values block the ability of the toxin to bind to target receptors in a highly efficient manner. Currently, antibody-based therapy is the most effective treatment for ricin intoxication, and it is anticipated that the findings of this study will provide useful information and a possible strategy to design an improved antibody-based therapy for the toxin.
Goldsmith M., Aggarwal N., Ashani Y., Jubran H., Greisen P. J., Ovchinnikov S., Leader H., Baker D., Sussman J., Goldenzweig A., Fleishman S. J. & Tawfik D.
(2017)
Protein Engineering, Design and Selection.
30,
4,
p. 333-345
Improving an enzyme's initially low catalytic efficiency with a new target substrate by an order of magnitude or two may require only a few rounds of mutagenesis and screening or selection. However, subsequent rounds of optimization tend to yield decreasing degrees of improvement (diminishing returns), eventually leading to an optimization plateau. We aimed to optimize the catalytic efficiency of bacterial phosphotriesterase (PTE) toward V-type nerve agents. Previously, we improved the catalytic efficiency of wild-type PTE toward the nerve agent VX by 500-fold, to a catalytic efficiency (kcat/KM) of 5 × 10⁶ M⁻¹ min⁻¹. However, effective in vivo detoxification demands an enzyme with a catalytic efficiency of >10⁷ M⁻¹ min⁻¹. Here, following eight additional rounds of directed evolution and the computational design of a stabilized variant, we evolved PTE variants that detoxify VX with a kcat/KM ≥ 5 × 10⁷ M⁻¹ min⁻¹ and Russian VX (RVX) with a kcat/KM ≥ 10⁷ M⁻¹ min⁻¹. These final 10-fold improvements were the most time-consuming and laborious, as most libraries yielded either minor or no improvements. Stabilizing the evolving enzyme, and avoiding tradeoffs in activity with different substrates, enabled us to obtain further improvements beyond the optimization plateau and evolve PTE variants that were overall improved by >5000-fold with VX and by >17,000-fold with RVX. The resulting variants also hydrolyze G-type nerve agents with high efficiency (GA, GB at kcat/KM > 5 × 10⁷ M⁻¹ min⁻¹) and can thus serve as candidates for broad-spectrum nerve-agent prophylaxis and post-exposure therapy using low enzyme doses.
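The efficiency figures above can be cross-checked with simple arithmetic, assuming the 500-fold improvement is measured relative to wild-type PTE with VX:

```latex
\begin{aligned}
\left(\frac{k_{\mathrm{cat}}}{K_{\mathrm{M}}}\right)_{\mathrm{wt}} &\approx \frac{5\times10^{6}}{500} = 10^{4}\ \mathrm{M^{-1}\,min^{-1}} \\
\frac{5\times10^{7}}{5\times10^{6}} &= 10 \quad \text{(the final 10-fold step)} \\
\frac{5\times10^{7}}{10^{4}} &= 5\times10^{3} \quad \text{(in line with the reported }{>}5000\text{-fold gain with VX)}
\end{aligned}
```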
Khersonsky O. & Fleishman S. J.
(2017)
Protein Science.
26,
4,
p. 807-813
Allosteric regulation underlies living cells' ability to sense changes in nutrient and signaling-molecule concentrations, but the ability to computationally design allosteric regulation into non-allosteric proteins has been elusive. Allosteric-site design is complicated by the requirement to encode the relative stabilities of active and inactive conformations of the same protein in the presence and absence of both ligand and effector. To address this challenge, we used Rosetta to design the backbone of the flexible heavy-chain complementarity-determining region 3 (HCDR3), and used geometric matching and sequence optimization to place a Zn²⁺-coordination site in a fluorescein-binding antibody. We predicted that, due to HCDR3's flexibility, the fluorescein-binding pocket would configure properly only upon Zn²⁺ application. We found that regulation by Zn²⁺ was reversible and sensitive to the divalent ion's identity, and came at the cost of reduced antibody stability and fluorescein-binding affinity. Fluorescein bound with an order of magnitude higher affinity in the presence of Zn²⁺ than in its absence, and the increase in fluorescein affinity was due almost entirely to a faster fluorescein on-rate, suggesting that Zn²⁺ preorganized the antibody for fluorescein binding. Mutation analysis demonstrated the extreme sensitivity of Zn²⁺ regulation to the atomic details in and around the metal-coordination site. The designed antibody could serve to study how allosteric regulation evolved from non-allosteric binding proteins, and suggests a way to design molecular sensors for environmental and biomedical targets.
Gaines J. C., Virrueta A., Buch D. A., Fleishman S. J., O'Hern C. S. & Regan L.
(2017)
Protein Engineering, Design and Selection.
30,
5,
p. 387-394
Protein core repacking is a standard test of protein modeling software. A recent study of six different modeling software packages showed that they are more successful at predicting the side-chain conformations of core residues than those of surface residues. All the modeling software tested uses multicomponent energy functions, typically including contributions from solvation, electrostatics, hydrogen bonding and Lennard-Jones interactions in addition to statistical terms based on observed protein structures. We investigated to what extent a simplified energy function that includes only stereochemical constraints and repulsive hard-sphere interactions can correctly repack protein cores. For single-residue and collective repacking, the hard-sphere model accurately recapitulates the observed side-chain conformations for Ile, Leu, Phe, Thr, Trp, Tyr and Val. This result shows that there are no alternative, sterically allowed side-chain conformations of core residues. Analysis of the same set of protein cores using the Rosetta software suite revealed that the hard-sphere model and Rosetta perform equally well on Ile, Leu, Phe, Thr and Val; the hard-sphere model performs better on Trp and Tyr, and Rosetta performs better on Ser. We conclude that the high prediction accuracy in protein cores obtained by protein modeling software and our simplified hard-sphere approach reflects the high density of protein cores and the dominance of steric repulsion.
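The abstract contrasts multicomponent energy functions with one that keeps only stereochemistry and steric repulsion. A toy version of the repulsion-only idea picks, among candidate side-chain placements, the one with the least overlap against fixed neighboring atoms. The (1 − r/σ)² penalty, atomic radii, coordinates, and rotamer labels are illustrative assumptions, not the paper's model; stereochemical constraints are assumed to be already built into the candidate placements.

```python
import math

def repulsive_energy(atoms, environment):
    """Purely repulsive overlap penalty: each pair of atoms closer than the sum of
    their radii (sigma) contributes (1 - r/sigma)^2; separated pairs contribute 0.
    Atoms are (x, y, z, radius) tuples in angstroms."""
    total = 0.0
    for x1, y1, z1, r1 in atoms:
        for x2, y2, z2, r2 in environment:
            sigma = r1 + r2
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            if r < sigma:
                total += (1.0 - r / sigma) ** 2
    return total

def best_rotamer(rotamers, environment):
    """Pick the candidate side-chain placement with the least steric overlap."""
    return min(rotamers, key=lambda name: repulsive_energy(rotamers[name], environment))

# Invented single-atom "rotamers" packed against one fixed carbon-sized atom.
environment = [(0.0, 0.0, 0.0, 1.7)]
rotamers = {
    "chi1=60": [(1.0, 0.0, 0.0, 1.7)],
    "chi1=180": [(4.0, 0.0, 0.0, 1.7)],
    "chi1=-60": [(2.5, 0.0, 0.0, 1.7)],
}
print(best_rotamer(rotamers, environment))  # prints chi1=180
```

Only the chi1=180 placement avoids overlap with the fixed atom, so it is selected; a realistic test would score full side chains against the entire packed core.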
Campeotto I., Goldenzweig A., Davey J., Barfod L., Marshall J. M., Silk S. E., Wright K. E., Draper S. J., Higgins M. K. & Fleishman S. J.
(2017)
Proceedings of the National Academy of Sciences of the United States of America.
114,
5,
p. 998-1002
Many promising vaccine candidates from pathogenic viruses, bacteria, and parasites are unstable and cannot be produced cheaply for clinical use. For instance, Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) is essential for erythrocyte invasion, is highly conserved among field isolates, and elicits antibodies that neutralize in vitro and protect in an animal model, making it a leading malaria vaccine candidate. However, functional RH5 is only expressible in eukaryotic systems and exhibits moderate temperature tolerance, limiting its usefulness in hot and low-income countries where malaria prevails. Current approaches to immunogen stabilization involve iterative application of rational or semirational design, random mutagenesis, and biochemical characterization. Typically, each round of optimization yields minor improvement in stability, and multiple rounds are required. In contrast, we developed a one-step design strategy using phylogenetic analysis and Rosetta atomistic calculations to design PfRH5 variants with improved packing and surface polarity. To demonstrate the robustness of this approach, we tested three PfRH5 designs, all of which showed improved stability relative to wild type. The best, bearing 18 mutations relative to PfRH5, expressed in a folded form in bacteria at >1 mg of protein per L of culture, and had 10-15 °C higher thermal tolerance than wild type, while also retaining ligand binding and immunogenic properties indistinguishable from wild type, proving its value as an immunogen for a future generation of vaccines against the malaria blood stage. We envision that this efficient computational stability design methodology will also be used to enhance the biophysical properties of other recalcitrant vaccine candidates from emerging pathogens.
Current methods for antibody structure prediction rely on sequence homology to known structures. Although this strategy often yields accurate predictions, models can be stereo-chemically strained. Here, we present a fully automated algorithm, called AbPredict, that disregards sequence homology, and instead uses a Monte Carlo search for low-energy conformations built from backbone segments and rigid-body orientations that appear in antibody molecular structures. We find cases where AbPredict selects accurate loop templates with sequence identity as low as 10%, whereas the template of highest sequence identity diverges substantially from the query's conformation. Accordingly, in several cases reported in the recent Antibody Modeling Assessment benchmark, AbPredict models were more accurate than those from any participant, and the models' stereo-chemical quality was consistently high. Furthermore, in two blind cases provided to us by crystallographers prior to structure determination, the method achieved
Reichen C., Hansen S., Forzani C., Honegger A., Fleishman S. J., Zhou T., Parmeggiani F., Ernst P., Madhurantakam C., Ewald C., Mittl P. R., Zerbe O., Baker D., Caflisch A. & Plückthun A.
(2016)
Journal of Molecular Biology.
428,
22,
p. 4467-4489
Armadillo repeat proteins (ArmRPs) recognize their target peptide in extended conformation and bind, in a first approximation, two residues per repeat. Thus, they may form the basis for building a modular system, in which each repeat is complementary to a piece of the target peptide. Accordingly, preselected repeats could be assembled into specific binding proteins on demand and thereby avoid the traditional generation of every new binding molecule by an independent selection from a library. Stacked armadillo repeats, each consisting of 42 aa arranged in three α-helices, build an elongated superhelical structure. Here, we analyzed the curvature variations in natural ArmRPs and identified a repeat pair from yeast importin-α as having the optimal curvature geometry that is complementary to a peptide over its whole length. We employed a symmetric in silico design to obtain a uniform sequence for a stackable repeat while maintaining the desired curvature geometry. Computationally designed ArmRPs (dArmRPs) had to be stabilized by mutations to remove regions of higher flexibility, which were identified by molecular dynamics simulations in explicit solvent. Using an N-capping repeat from the consensus-design approach, two different crystal structures of dArmRP were determined. Although the experimental structures of dArmRP deviated from the designed curvature, the insertion of the most conserved binding pockets of natural ArmRPs onto the surface of dArmRPs resulted in binders against the expected peptide with low nanomolar affinities, similar to the binders from the consensus-design series.
Assaf E., Weinstein J. J., Prilusky J. & Fleishman S.
(2016)
Proceedings of the National Academy of Sciences of the United States of America.
113,
37,
p. 10340-10345
The energetics of membrane-protein interactions determine protein topology and structure: hydrophobicity drives the insertion of helical segments into the membrane, and positive charges orient the protein with respect to the membrane plane according to the positive-inside rule. Until recently, however, quantifying these contributions met with difficulty, precluding systematic analysis of the energetic basis for membrane-protein topology. We recently developed the dsTβL method, which uses deep sequencing and in vitro selection of segments inserted into the bacterial plasma membrane to infer insertion-energy profiles for each amino acid residue across the membrane, and quantified the insertion contribution from hydrophobicity and the positive-inside rule. Here, we present a topology-prediction algorithm called TopGraph, which is based on a sequence search for minimum dsTβL insertion energy. Whereas the average insertion energy assigned by previous experimental scales was positive (unfavorable), the average assigned by TopGraph in a nonredundant set is -6.9 kcal/mol. By quantifying contributions from both hydrophobicity and the positive-inside rule we further find that in about half of large membrane proteins polar segments are inserted into the membrane to position more positive charges in the cytoplasm, suggesting an interplay between these two energy contributions. Because membrane-embedded polar residues are crucial for substrate binding and conformational change, the results implicate the positive-inside rule in determining the architectures of membrane-protein functional sites. This insight may aid structure prediction, engineering, and design of membrane proteins. TopGraph is available online (topgraph.weizmann.ac.il).
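TopGraph is described above as a sequence search for minimum insertion energy that accounts for both hydrophobicity and the positive-inside rule. The sketch below shows that idea in its simplest form, a sliding-window search for a single membrane segment. It is a hypothetical simplification, not TopGraph: the energy values are invented rather than dsTβL profiles, the score does not depend on depth within the segment, and the residues preceding the segment are assumed to face the cytoplasm.

```python
# Hypothetical per-residue insertion energies (kcal/mol; negative favors insertion).
HYDROPHOBICITY = {"L": -1.0, "I": -1.0, "F": -0.9, "V": -0.8, "A": -0.2,
                  "D": 1.5, "E": 1.4, "K": 1.2, "R": 1.3}
POSITIVE_INSIDE_BONUS = -0.5  # hypothetical reward per K/R on the assumed cytoplasmic flank
FLANK = 3                     # number of residues preceding the segment treated as the flank

def window_energy(sequence, start, length):
    """Toy insertion energy of sequence[start:start+length] plus a positive-inside term."""
    segment = sequence[start:start + length]
    energy = sum(HYDROPHOBICITY.get(aa, 0.0) for aa in segment)
    flank = sequence[max(0, start - FLANK):start]
    energy += POSITIVE_INSIDE_BONUS * sum(aa in "KR" for aa in flank)
    return energy

def min_energy_segment(sequence, length=18):
    """Return (start, energy) of the lowest-energy window of the given length."""
    if len(sequence) < length:
        raise ValueError("sequence is shorter than the segment length")
    start = min(range(len(sequence) - length + 1),
                key=lambda i: window_energy(sequence, i, length))
    return start, window_energy(sequence, start, length)

print(min_energy_segment("DDKRK" + "L" * 18 + "DDDDD"))  # prints (5, -19.5)
```

The all-leucine stretch is selected, and the K/R flank lowers its score further; predicting full topologies would additionally require searching over multiple segments and both orientations.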
Goldenzweig A., Goldsmith M., Hill S. E., Gertman O., Laurino P., Ashani Y., Dym O., Unger T., Albeck S., Prilusky J., Lieberman R. L., Aharoni A., Silman I., Sussman J., Tawfik D. & Fleishman S. J.
(2016)
Molecular Cell.
63,
2,
p. 337-346
Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymatic properties or in the active-site configuration as determined by crystallography. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of soluble and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il.
Khersonsky O. & Fleishman S. J.
(2016)
Protein Science.
p. 1179-1187
We protein engineers are ambivalent about evolution: on the one hand, evolution inspires us with myriad examples of biomolecular binders, sensors, and catalysts; on the other hand, these examples are seldom well-adapted to the engineering tasks we have in mind. Protein engineers have therefore modified natural proteins by point substitutions and fragment exchanges in an effort to generate new functions. A counterpoint to such design efforts, which is being pursued now with greater success, is to completely eschew the starting materials provided by nature and to design new protein functions from scratch by using de novo molecular modeling and design. While important progress has been made in both directions, some areas of protein design are still beyond reach. To this end, we advocate a synthesis of these two strategies: by using design calculations to both recombine and optimize fragments from natural proteins, we can build stable and as yet unsampled structures, thereby granting access to an expanded repertoire of conformations and desired functions. We propose that future methods that combine phylogenetic analysis, structure and sequence bioinformatics, and atomistic modeling may well succeed where any one of these approaches has failed on its own.
Over the past decade, scientists have made exciting progress in designing protein folds entirely on the computer and then successfully synthesizing them in the laboratory (1–5). These designer proteins had the same structure in experiment as in the model and were very stable; however, they lacked important structural features seen in protein interfaces and enzyme active sites. In two reports on pages 680 and 687 of this issue, Boyken et al. (6) and Jacobs et al. (7) use the Rosetta biomolecular modeling software to design proteins that include some of these features. Experiments show that these new designs retain high structural precision and stability.
Assaf E., Weinstein J., Biran I., Fridman Y., Bibi E. & Fleishman S.
(2016)
eLife.
5,
January 2016,
e12125.
Insertion of helix-forming segments into the membrane and their association determines the structure, function, and expression levels of all plasma membrane proteins. However, systematic and reliable quantification of membrane-protein energetics has been challenging. We developed a deep mutational scanning method to monitor the effects of hundreds of point mutations on helix insertion and self-association within the bacterial inner membrane. The assay quantifies insertion energetics for all natural amino acids at 27 positions across the membrane, revealing that the hydrophobicity of biological membranes is significantly higher than appreciated. We further quantitate the contributions to membrane-protein insertion from positively charged residues at the cytoplasm-membrane interface and reveal large and unanticipated differences among these residues. Finally, we derive comprehensive mutational landscapes in the membrane domains of Glycophorin A and the ErbB2 oncogene, and find that insertion and self-association are strongly coupled in receptor homodimers.
Grossman I., Ilani T., Fleishman S. & Fass D.
(2016)
Protein Engineering, Design & Selection: PEDS.
29,
4,
p. 135-147
The secreted disulfide catalyst Quiescin sulfhydryl oxidase-1 (QSOX1) affects extracellular matrix organization and is overexpressed in various adenocarcinomas and associated stroma. Inhibition of extracellular human QSOX1 by a monoclonal antibody decreased tumor cell migration in a cell co-culture model and hence may have therapeutic potential. However, the species specificity of the QSOX1 monoclonal antibody has been a setback in assessing its utility as an anti-metastatic agent in vivo, a common problem in the antibody therapy industry. We therefore used structurally guided engineering to expand the antibody species specificity, improving its affinity toward mouse QSOX1 by at least four orders of magnitude. A crystal structure of the re-engineered variant, complexed with its mouse antigen, revealed that the antibody accomplishes dual-species targeting through altered contacts between its heavy and light chains, plus replacement of bulky aromatics by flexible side chains and versatile water-bridged polar interactions. In parallel, we produced a surrogate antibody targeting mouse QSOX1 that exhibits a new QSOX1 inhibition mode. This set of three QSOX1 inhibitory antibodies is compatible with various mouse models for pre-clinical trials and biotechnological applications. In this study we provide insights into structural blocks to cross-reactivity and set up guideposts for successful antibody design and re-engineering.
Computational design of protein function has made substantial progress, generating new enzymes, binders, inhibitors, and nanomaterials not previously seen in nature. However, the ability to design new protein backbones for function, which is essential to exert control over all polypeptide degrees of freedom, remains a critical challenge. Most previous attempts to design new backbones computed the mainchain from scratch. Here, instead, we describe a combinatorial backbone and sequence optimization algorithm called AbDesign, which leverages the large number of sequences and experimentally determined molecular structures of antibodies to construct new antibody models, dock them against target surfaces and optimize their sequence and backbone conformation for high stability and binding affinity. We used the algorithm to produce antibody designs that target the same molecular surfaces as nine natural, high-affinity antibodies; in five cases interface sequence identity is above 30%, and in four of those the backbone conformation at the core of the antibody binding surface is within 1 angstrom root-mean-square deviation from the natural antibodies. Designs recapitulate polar interaction networks observed in natural complexes, and amino acid sidechain rigidity at the designed binding surface, which is likely important for affinity and specificity, is high compared to previous design studies. In designed anti-lysozyme antibodies, complementarity-determining regions (CDRs) at the periphery of the interface, such as L1 and H2, show greater backbone conformation diversity than the CDRs at the core of the interface, and increase the binding surface area compared to the natural antibody, potentially enhancing affinity and specificity.
Cameron K., Weinstein J., Zhivin O., Bule P., Fleishman S., Alves V., Gilbert H., Ferreira L., Fontes C., Bayer E. & Najmudin S.
(2015)
Journal of Biological Chemistry.
290,
26,
p. 16215-16225
Background: Cellulosomal cohesin-dockerin types are reversed in Bacteroides cellulosolvens. Results: Combined crystallographic and computational approaches of a lone cohesin yielded a structural model of the cohesin-dockerin complex that was verified experimentally. Conclusion: The dockerin dual-binding mode is not exclusive to enzyme integration into cellulosomes; it also characterizes cell-surface attachment. Significance: This combined approach provides a platform for generating testable hypotheses of the high affinity cohesin-dockerin interaction. Cohesin-dockerin interactions orchestrate the assembly of one of nature's most elaborate multienzyme complexes, the cellulosome. Cellulosomes are produced exclusively by anaerobic microbes and mediate highly efficient hydrolysis of plant structural polysaccharides, such as cellulose and hemicellulose. In the canonical model of cellulosome assembly, type I dockerin modules of the enzymes bind to reiterated type I cohesin modules of a primary scaffoldin. Each type I dockerin contains two highly conserved cohesin-binding sites, which confer quaternary flexibility to the multienzyme complex. The scaffoldin also bears a type II dockerin that anchors the entire complex to the cell surface by binding type II cohesins of anchoring scaffoldins. In Bacteroides cellulosolvens, however, the organization of the cohesin-dockerin types is reversed, whereby type II cohesin-dockerin pairs integrate the enzymes into the primary scaffoldin, and type I modules mediate cellulosome attachment to an anchoring scaffoldin. Here, we report the crystal structure of a type I cohesin from B. cellulosolvens anchoring scaffoldin ScaB to 1.84-angstrom resolution. The structure resembles other type I cohesins, and the putative dockerin-binding site, centered at β-strands 3, 5, and 6, is likely to be conserved in other B. cellulosolvens type I cohesins.
Combined computational modeling, mutagenesis, and affinity-based binding studies revealed similar hydrogen-bonding networks between putative Ser/Asp recognition residues in the dockerin at positions 11/12 and 45/46, suggesting that a dual-binding mode is not exclusive to the integration of enzymes into primary cellulosomes but can also characterize polycellulosome assembly and cell-surface attachment. This general approach may provide valuable structural information of the cohesin-dockerin interface, in lieu of a definitive crystal structure.
Oftedal B., Hellesen A., Erichsen M., Bratland E., Vardi A., Perheentupa J., Kemp E., Fiskerstrand T., Viken M., Weetman A., Fleishman S., Banka S., Newman W., Sewell W., Sozaeva L., Zayats T., Haugarvoll K., Orlova E., Haavik J., Johansson S., Knappskog P., Lovas K., Wolff A., Abramson J. & Husebye E.
(2015)
Immunity.
42,
6,
p. 1185-1196
The autoimmune regulator (AIRE) gene is crucial for establishing central immunological tolerance and preventing autoimmunity. Mutations in AIRE cause a rare autosomal-recessive disease, autoimmune polyendocrine syndrome type 1 (APS-1), distinguished by multi-organ autoimmunity. We have identified multiple cases and families with mono-allelic mutations in the first plant homeodomain (PHD1) zinc finger of AIRE that followed dominant inheritance, typically characterized by later onset, milder phenotypes, and reduced penetrance compared to classical APS-1. These missense PHD1 mutations suppressed gene expression driven by wild-type AIRE in a dominant-negative manner, unlike CARD or truncated AIRE mutants that lacked such dominant capacity. Exome array analysis revealed that the PHD1 dominant mutants were found with relatively high frequency (>0.0008) in mixed populations. Our results provide insight into the molecular action of AIRE and demonstrate that disease-causing mutations in the AIRE locus are more common than previously appreciated and cause more variable autoimmune phenotypes.
Warszawski S., Netzer R., Tawfik D. S. & Fleishman S. J.
(2014)
Journal of Molecular Biology.
426,
24,
p. 4125-4138
To carry out their activities, biological macromolecules balance different physical traits, such as stability, interaction affinity, and selectivity. How such often opposing traits are encoded in a macromolecular system is critical to our understanding of evolutionary processes and ability to design new molecules with desired functions. We present a framework for constraining design simulations to balance different physical characteristics. Each trait is represented by the equilibrium fractional occupancy of the desired state relative to its alternatives, ranging from none to full occupancy, and the different traits are combined using Boolean operators to effect a "fuzzy"-logic language for encoding any combination of traits. In another paper, we presented a new combinatorial backbone design algorithm AbDesign where the fuzzy-logic framework was used to optimize protein backbones and sequences for both stability and binding affinity in antibody-design simulation. We now extend this framework and find that fuzzy-logic design simulations reproduce sequence and structure design principles seen in nature to underlie exquisite specificity on the one hand and multispecificity on the other hand. The fuzzy-logic language is broadly applicable and could help define the space of tolerated and beneficial mutations in natural biomolecular systems and design artificial molecules that encode complex characteristics.
Strauch E. M., Fleishman S. J. & Baker D.
(2014)
Proceedings of the National Academy of Sciences of the United States of America.
111,
2,
p. 675-680
Computational design provides the opportunity to program protein-protein interactions for desired applications. We used de novo protein interface design to generate a pH-dependent Fc domain binding protein that buries immunoglobulin G (IgG) His-433. Using next-generation sequencing of naive and selected pools of a library of design variants, we generated a molecular footprint of the designed binding surface, confirming the binding mode and guiding further optimization of the balance between affinity and pH sensitivity. In biolayer interferometry experiments, the optimized design binds IgG with a Kd of ~4 nM at pH 8.2, and approximately 500-fold more weakly at pH 5.5. The protein is extremely stable, heat-resistant and highly expressed in bacteria, and allows pH-based control of binding for IgG affinity purification and diagnostic devices.
Schreiber G. & Fleishman S. J.
(2013)
Current Opinion in Structural Biology.
23,
6,
p. 903-910
A long-term aim of computational design is to generate specific protein-protein interactions at desired affinity, specificity, and kinetics. The past three years have seen the first reports on atomically accurate de novo interactions. These were based on advances in design algorithms and the ability to harness high-throughput experimental characterization of design variants to optimize binding. The current state of the art in computational design lacks precision and therefore requires intensive experimental optimization to achieve parity with natural binders. Recent successes (and failures) point the way to future progress in design methodology that would enable routine and robust design of binders and inhibitors, while also shedding light on the essential features of biomolecular recognition.
Moretti R., Fleishman S. J., Agius R., Torchala M., Bates P. A., Kastritis P. L., Rodrigues J. P. G. L. M., Trellet M., Bonvin A. M. J. J., Cui M., Rooman M., Gillis D., Dehouck Y., Moal I., Romero-Durana M., Perez-Cano L., Pallara C., Jimenez B. & Fernandez-Recio J.
(2013)
Proteins: Structure, Function and Bioinformatics.
81,
11,
p. 1980-1987
Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.
Procko E., Hedman R., Hamilton K., Seetharaman J., Fleishman S. J., Su M., Aramini J., Kornhaber G., Hunt J. F., Tong L., Montelione G. T. & Baker D.
(2013)
Journal of Molecular Biology.
425,
18,
p. 3563-3575
While there has been considerable progress in designing protein-protein interactions, the design of proteins that bind polar surfaces is an unmet challenge. We describe the computational design of a protein that binds the acidic active site of hen egg lysozyme and inhibits the enzyme. The design process starts with two polar amino acids that fit deep into the enzyme active site, identifies a protein scaffold that supports these residues and is complementary in shape to the lysozyme active-site region, and finally optimizes the surrounding contact surface for high-affinity binding. Following affinity maturation, a protein designed using this method bound lysozyme with low nanomolar affinity, and a combination of NMR studies, crystallography, and knockout mutagenesis confirmed the designed binding surface and orientation. Saturation mutagenesis with selection and deep sequencing demonstrated that specific designed interactions extending well beyond the centrally grafted polar residues are critical for high-affinity binding.
Khare S. D. & Fleishman S. J.
(2013)
FEBS Letters.
587,
8,
p. 1147-1154
Recent years have seen the first applications of computational protein design to generate novel catalysts, binding pairs of proteins, protein inhibitors, and large oligomeric assemblies. At their core these methods rely on a similar hybrid energy function, composed of physics-based and database-derived terms, while different sequence and conformational sampling approaches are used for each design category. Although these are first steps for the computational design of novel function, crystal structures and biochemical characterization already point out where success and failure are likely in the application of protein design. Contrasting failed and successful design attempts has been used to diagnose deficiencies in the approaches and in the underlying hybrid energy function. In this manner, design provides an inherent mechanism by which crucial information is obtained on pressing areas where focused efforts to improve methods are needed. Of the successful designs, many feature pre-organized sites that are poised to perform their intended function, and improvements often result from disfavoring alternative functionally suboptimal states. These rapid developments and fundamental insights obtained thus far promise to make computational design of novel molecular function general, robust, and routine.
Fridman Y., Gur E., Fleishman S. J. & Aharoni A.
(2013)
Proteins: Structure, Function and Bioinformatics.
81,
2,
p. 341-348
Increasing the affinity of binding proteins is invaluable for basic and applied biological research. Currently, directed protein evolution experiments are the main approach for generating such proteins through the construction and screening of large mutant libraries. Proliferating cell nuclear antigen (PCNA) is an essential hub protein that interacts with many different partners to tightly regulate DNA replication and repair in all eukaryotes. Here, we used computational design to generate human PCNA mutants with enhanced affinity for several different partners. We identified double mutations in PCNA, outside the main partner binding site, that were predicted to increase PCNA-partner binding affinities compared to the wild-type protein by forming additional hydrophobic interactions with conserved residues in the PCNA partners. Affinity increases were experimentally validated with four different PCNA partners, demonstrating that computational design can reveal unexpected regions where affinity enhancements in natural systems are possible. The designed PCNA mutants can be used as a valuable tool for further examination of the regulation of PCNA-partner interactions during DNA replication and repair both in vitro and in vivo. More broadly, the ability to engineer affinity increases toward several PCNA partners suggests that interaction affinity is not an evolutionarily optimized trait of this system.
Whitehead T. A., Baker D. & Fleishman S. J.
(2013)
Methods in Protein Design.
p. 1-19
Computational design of novel protein binders has recently emerged as a useful technique to study biomolecular recognition and generate molecules for use in biotechnology, research, and biomedicine. Current limitations in computational design methodology have led to the adoption of high-throughput screening and affinity maturation techniques to diagnose modeling inaccuracies and generate high activity binders. Here, we scrutinize this combination of computational and experimental aspects and propose areas for future methodological improvements.
Whitehead T. A., Chevalier A., Song Y., Dreyfus C., Fleishman S. J., De Mattos C., Myers C. A., Kamisetty H., Blair P., Wilson I. A. & Baker D.
(2012)
Nature Biotechnology.
30,
6,
p. 543-548
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
Fleishman S. J. & Baker D.
(2012)
Cell.
149,
2,
p. 262-273
The folding of natural biopolymers into unique three-dimensional structures that determine their function is remarkable considering the vast number of alternative states and requires a large gap in the energy of the functional state compared to the many alternatives. This Perspective explores the implications of this energy gap for computing the structures of naturally occurring biopolymers, designing proteins with new structures and functions, and optimally integrating experiment and computation in these endeavors. Possible parallels between the generation of functional molecules in computational design and natural evolution are highlighted.
Wojdyla J. A., Fleishman S. J., Baker D. & Kleanthous C.
(2012)
Journal of Molecular Biology.
417,
1-2,
p. 79-94
How proteins achieve high-affinity binding to a specific protein partner while simultaneously excluding all others is a major biological problem that has important implications for protein design. We report the crystal structure of the ultra-high-affinity protein-protein complex between the endonuclease domain of colicin E2 and its cognate immunity (Im) protein, Im2 (Kd ∼ 10⁻¹⁵ M), which, by comparison to previous structural and biophysical data, provides unprecedented insight into how high affinity and selectivity are achieved in this model family of protein complexes. Our study pinpoints the role of structured water molecules in conjoining hotspot residues that govern stability with residues that control selectivity. A key finding is that a single residue, which in a noncognate context massively destabilizes the complex through frustration, does not participate in specificity directly but rather acts as an organizing center for a multitude of specificity interactions across the interface, many of which are water mediated.
Fleishman S. J., Whitehead T. A., Strauch E., Corn J. E., Qin S., Zhou H., Mitchell J. C., Demerdash O. N. A., Takeda-Shitaka M., Terashi G., Moal I. H., Li X., Bates P. A., Zacharias M., Park H., Ko J., Lee H., Seok C., Bourquard T., Bernauer J., Poupon A., Aze J., Soner S., Ovali S. K., Ozbek P., Ben Tal N., Haliloglu T., Hwang H., Vreven T., Pierce B. G., Weng Z., Perez-Cano L., Pons C., Fernandez-Recio J., Jiang F., Yang F., Gong X., Cao L., Xu X., Liu B., Wang P., Li C., Wang C., Robert C. H., Guharoy M., Liu S., Huang Y., Li L., Guo D., Chen Y., Xiao Y., London N., Itzhaki Z., Schueler-Furman O., Inbar Y., Potapov V., Cohen M., Schreiber G., Tsuchiya Y., Kanamori E., Standley D. M., Nakamura H., Kinoshita K., Driggers C. M., Hall R. G., Morgan J. L., Hsu V. L., Zhan J., Yang Y., Zhou Y., Kastritis P. L., Bonvin A. M. J. J., Zhang W., Camacho C. J., Kilambi K. P., Sircar A., Gray J. J., Ohue M., Uchikoga N., Matsuzaki Y., Ishida T., Akiyama Y., Khashan R., Bush S., Fouches D., Tropsha A., Esquivel-Rodriguez J., Kihara D., Stranges P. B., Jacak R., Kuhlman B., Huang S., Zou X., Wodak S. J., Janin J. & Baker D.
(2011)
Journal of Molecular Biology.
414,
2,
p. 289-302
The CAPRI (Critical Assessment of Predicted Interactions) and CASP (Critical Assessment of protein Structure Prediction) experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting that there may be important physical chemistry missing in the energy calculations. A total of 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the nonpolar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a nonbinder.
Protein-protein interactions play critical roles in biology, and computational design of interactions could be useful in a range of applications. We describe in detail a general approach to de novo design of protein interactions based on computed, energetically optimized interaction hotspots, which was recently used to produce high-affinity binders of influenza hemagglutinin. We present several alternative approaches to identify and build the key hotspot interactions within both core secondary structural elements and variable loop regions and evaluate the method's performance in natural-interface recapitulation. We show that the method generates binding surfaces that are more conformationally restricted than previous design methods, reducing opportunities for off-target interactions.
Fleishman S. J., Leaver-Fay A., Corn J. E., Strauch E. M., Khare S. D., Koga N., Ashworth J., Murphy P., Richter F., Lemmon G., Meiler J. & Baker D.
(2011)
PLoS ONE.
6,
6,
e20161.
Macromolecular modeling and design are increasingly useful in basic research, biotechnology, and teaching. However, the absence of a user-friendly modeling framework that provides access to a wide range of modeling capabilities is hampering the wider adoption of computational methods by non-experts. RosettaScripts is an XML-like language for specifying modeling tasks in the Rosetta framework. RosettaScripts provides access to protocol-level functionalities, such as rigid-body docking and sequence redesign, and allows fast testing and deployment of complex protocols without need for modifying or recompiling the underlying C++ code. We illustrate these capabilities with RosettaScripts protocols for the stabilization of proteins, the generation of computationally constrained libraries for experimental selection of higher-affinity binding proteins, loop remodeling, small-molecule ligand docking, design of ligand-binding proteins, and specificity redesign in DNA-binding proteins.
We describe a general computational method for designing proteins that bind a surface patch of interest on a target macromolecule. Favorable interactions between disembodied amino acid residues and the target surface are identified and used to anchor de novo designed interfaces. The method was used to design proteins that bind a conserved surface patch on the stem of the influenza hemagglutinin (HA) from the 1918 H1N1 pandemic virus. After affinity maturation, two of the designed proteins, HB36 and HB80, bind H1 and H5 HAs with low nanomolar affinity. Further, HB80 inhibits the HA fusogenic conformational changes induced at low pH. The crystal structure of HB36 in complex with 1918/H1 HA revealed that the actual binding interface is nearly identical to that in the computational design model. Such designed binding proteins may be useful for both diagnostics and therapeutics.
Fleishman S. J., Khare S. D., Koga N. & Baker D.
(2011)
Protein Science.
20,
4,
p. 753-757
Protein-design methodology can now generate models of protein structures and interfaces with computed energies in the range of those of naturally occurring structures. Comparison of the properties of native structures and complexes to isoenergetic design models can provide insight into the properties of the former that reflect selection pressure for factors beyond the energy of the native state. We report here that sidechains in native structures and interfaces are significantly more constrained than designed interfaces and structures with equal computed binding energy or stability, which may reflect selection against potentially deleterious non-native interactions. Published by Wiley-Blackwell.
Leaver-Fay A., Tyka M., Lewis S. M., Lange O. F., Thompson J., Jacak R., Kaufman K., Renfrew P. D., Smith C. A., Sheffler W., Davis I. W., Cooper S., Treuille A., Mandell D. J., Richter F., Ban Y. E. A., Fleishman S. J., Corn J. E., Kim D. E., Lyskov S., Berrondo M., Mentzer S., Popović Z., Havranek J. J., Karanicolas J., Das R., Meiler J., Kortemme T., Gray J. J., Kuhlman B., Baker D. & Bradley P.
(2011)
Computer Methods, Part C.
p. 545-574
We have recently completed a full rearchitecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
Fleishman S. J., Corn J. E., Strauch E. M., Whitehead T. A., Andre I., Thompson J., Havranek J. J., Das R., Bradley P. & Baker D.
(2010)
Proteins: Structure, Function and Bioinformatics.
78,
15,
p. 3212-3218
Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13-19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high-accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
Meenan N. A., Sharma A., Fleishman S. J., MacDonald C. J., Morel B., Boetzel R., Moore G. R., Baker D. & Kleanthous C.
(2010)
Proceedings of the National Academy of Sciences of the United States of America.
107,
22,
p. 10080-10085
High-affinity, high-selectivity protein-protein interactions that are critical for cell survival present an evolutionary paradox: How does selectivity evolve when acquired mutations risk a lethal loss of high-affinity binding? A detailed understanding of selectivity in such complexes requires structural information on weak, non-cognate complexes, which can be difficult to obtain due to their transient and dynamic nature. Using NMR-based docking as a guide, we deployed a disulfide-trapping strategy on a noncognate complex between the colicin E9 endonuclease (E9 DNase) and immunity protein 2 (Im2), which binds seven orders of magnitude more weakly than the cognate femtomolar E9 DNase-Im9 interaction. The 1.77 Å crystal structure of the E9 DNase-Im2 complex reveals an entirely noncovalent interface where the intersubunit disulfide merely supports the crystal lattice. In combination with computational alanine scanning of interfacial residues, the structure reveals that the driving force for binding is so strong that a severely unfavorable specificity contact is tolerated at the interface and as a result the complex becomes weakened through "frustration." As well as rationalizing past mutational and thermodynamic data, comparing our noncognate structure with previous cognate complexes highlights the importance of loop regions in developing selectivity and accentuates the multiple roles of buried water molecules that stabilize, ameliorate, or aggravate interfacial contacts. The study provides direct support for dual-recognition in colicin DNase-Im protein complexes and shows that weakened noncognate complexes are primed for high-affinity binding, which can be achieved by economical mutation of a limited number of residues at the interface.
McBeth C., Seamons A., Pizarro J. C., Fleishman S. J., Baker D., Kortemme T., Goverman J. M. & Strong R. K.
(2008)
Journal of Molecular Biology.
375,
5,
p. 1306-1319
We report crystal structures of a negatively selected T cell receptor (TCR) that recognizes two I-Au-restricted myelin basic protein peptides and one of its peptide/major histocompatibility complex (pMHC) ligands. Unusual complementarity-determining region (CDR) structural features revealed by our analyses identify a previously unrecognized mechanism by which the highly variable CDR3 regions define ligand specificity. In addition to the pMHC contact residues contributed by CDR3, the CDR3 residues buried deep within the Vα/Vβ interface exert indirect effects on recognition by influencing the Vα/Vβ interdomain angle. This phenomenon represents an additional mechanism for increasing the potential diversity of the TCR repertoire. Both the direct and indirect effects exerted by CDR residues can impact global TCR/MHC docking. Analysis of the available TCR structures in light of these results highlights the significance of the Vα/Vβ interdomain angle in determining specificity and indicates that TCR/pMHC interface features do not distinguish autoimmune from non-autoimmune class II-restricted TCRs.
Fuchs A., Martin-Galiano A. J., Kalman M., Fleishman S., Ben-Tal N. & Frishman D.
(2007)
Bioinformatics.
23,
24,
p. 3312-3319
Motivation: The analysis of co-evolving residues has been exhaustively evaluated for the prediction of intramolecular amino acid contacts in soluble proteins. Although a variety of different methods for the detection of these co-evolving residues have been developed, the fraction of correctly predicted contacts remained insufficient for their reliable application in the construction of structural models. Membrane proteins, which constitute between one-fourth and one-third of all proteins in an organism, had been considered in only a few individual case studies. Results: We present the first general study of correlated mutations in α-helical membrane proteins. Using seven different prediction algorithms, we extracted co-evolving residues for 14 membrane proteins having a solved 3D structure. On average, distances between correlated pairs of residues lying on different transmembrane segments were found to be significantly smaller compared to a random prediction. Covariation of residues was frequently found in direct sequence neighborhood to helix-helix contacts. Based on the results obtained from individual prediction methods, we constructed a consensus prediction for every protein in the dataset that combines obtained correlations from different prediction algorithms and simultaneously removes likely false positives. Using this consensus prediction, 53% of all predicted residue pairs were found within one helix turn of an observed helix-helix contact. Based on the combination of co-evolving residues detected with the four best prediction algorithms, interacting helices could be predicted with a specificity of 83% and sensitivity of 42%.
Wang C., Schueler-Furman O., Andre I., London N., Fleishman S. J., Bradley P., Qian B. & Baker D.
(2007)
Proteins: Structure, Function and Bioinformatics.
69,
4,
p. 758-763
A challenge in protein-protein docking is to account for the conformational changes in the monomers that occur upon binding. The RosettaDock method, which incorporates sidechain flexibility but keeps the backbone fixed, was found in previous CAPRI rounds (4 and 5) to generate docking models with atomic accuracy, provided that conformational changes were mainly restricted to protein sidechains. In the recent rounds of CAPRI (6-12), large backbone conformational changes occur upon binding for several target complexes. To address these challenges, we explicitly introduced backbone flexibility in our modeling procedures by combining rigid-body docking with protein structure prediction techniques such as modeling variable loops and building homology models. Encouragingly, using this approach we were able to correctly predict a significant backbone conformational change of an interface loop for Target 20 (12 Å rmsd between those in the unbound monomer and complex structures), but accounting for backbone flexibility in protein-protein docking is still very challenging because of the significantly larger conformational space that must be surveyed. Motivated by these CAPRI challenges, we have made progress in reformulating RosettaDock using a "fold-tree" representation, which provides a general framework for treating a wide variety of flexible-backbone docking problems.
Enosh A., Fleishman S. J., Ben-Tal N. & Halperin D.
(2007)
Bioinformatics.
23,
2,
p. e212-e218
Motivation: Motion in transmembrane (TM) proteins plays an essential role in a variety of biological phenomena. Thus, developing an automated method for predicting and simulating motion in this class of proteins should result in an increased level of understanding of crucial physiological mechanisms. We have developed an algorithm for predicting and simulating motion in TM proteins of the α-helix bundle type. Our method employs probabilistic motion-planning techniques to suggest possible collision-free motion paths. The resulting paths are ranked according to the quality of the van der Waals interactions between the TM helices. Our algorithm considers a wide range of degrees of freedom (dofs) involved in the motion, including external and internal moves. However, in order to handle the vast dimensionality of the problem, we employ some constraints on these dofs in a way that is unlikely to rule out the native motion of the protein. Our algorithm simulates the motion, including all the dofs, and automatically produces a movie that demonstrates it. Results: Overexpression of the RTK ErbB2 has been implicated in causing a variety of human cancers. Recently, a molecular mechanism for rotation-coupled activation of the receptor was suggested. We applied our algorithm to investigate the TM domain of this protein, and compared our results with this mechanism. A motion pathway that was similar to the proposed mechanism ranked first, and motions with partial overlap to this pathway followed in rank order. In addition, we conducted a negative-control computational experiment using Glycophorin A. Our results confirmed the immobility of this TM protein, resulting in degenerate paths comprising native-like conformations.
Fleishman S. J., Harrington S. E., Enosh A., Halperin D., Tate C. G. & Ben-Tal N.
(2006)
Journal of Molecular Biology.
364,
1,
p. 54-67
Small multidrug resistance (SMR) transporters contribute to bacterial resistance by coupling the efflux of a wide range of toxic aromatic cations, some of which are commonly used as antibiotics and antiseptics, to proton influx. EmrE is a prototypical small multidrug resistance transporter comprising four transmembrane segments (M1-M4) that forms dimers. It was suggested recently that EmrE molecules in the dimer have different topologies, i.e. monomers have opposite orientations with respect to the membrane plane. A 3-D structure of EmrE acquired by electron cryo-microscopy (cryo-EM) at 7.5 Å resolution in the membrane plane showed that parts of the structure are related by quasi-symmetry. We used this symmetry relationship, combined with sequence conservation data, to assign the transmembrane segments in EmrE to the densities seen in the cryo-EM structure. A Cα model of the transmembrane region was constructed by considering the evolutionary conservation pattern of each helix. The model is validated by much of the biochemical data on EmrE, with most of the positions that were identified as affecting substrate translocation being located around the substrate-binding cavity. A suggested mechanism for proton-coupled substrate translocation in small multidrug resistance antiporters provides a mechanistic rationale for the experimentally observed inverted topology.
Fleishman S. J., Sabag A. D., Ophir E., Avraham K. B. & Ben-Tal N.
(2006)
Journal of Biological Chemistry.
281,
39,
p. 28958-28963
Gap junctions form intercellular channels that mediate metabolic and electrical signaling between neighboring cells in a tissue. Lack of an atomic resolution structure of the gap junction has made it difficult to identify interactions that stabilize its transmembrane domain. Using a recently computed model of this domain, which specifies the locations of each amino acid, we postulated the existence of several interactions and tested them experimentally. We introduced mutations within the transmembrane domain of the gap junction-forming protein connexin that were previously implicated in genetic diseases and that apparently destabilized the gap junction, as evidenced here by the absence of the protein from the sites of cell-cell apposition. The model structure helped identify positions on adjacent helices where second-site mutations restored membrane localization, revealing possible interactions between residue pairs. We thus identified two putative salt bridges and one pair involved in packing interactions in which one disease-causing mutation suppressed the effects of another. These results seem to reveal some of the physical forces that underlie the structural stability of the gap junction transmembrane domain and suggest that abrogation of such interactions brings about some of the effects of disease-causing mutations.
Magidovich E., Fleishman S. J. & Yifrach O.
(2006)
Bioinformatics.
22,
13,
p. 1546-1550
Membrane-embedded voltage-activated potassium channels (Kv) bind intracellular scaffold proteins, such as the Post Synaptic Density 95 (PSD-95) protein, using a conserved PDZ-binding motif located at the channels' C-terminal tip. This interaction underlies Kv-channel clustering, and is important for the proper assembly and functioning of the synapse. Here we demonstrate that the C-terminal segments of Kv channels adjacent to the PDZ-binding motif are intrinsically disordered. Phylogenetic analysis of the Kv channel family reveals a cluster of channel sequences belonging to three out of the four main channel families, for which an association is demonstrated between the presence of the consensus terminal PDZ-binding motif and the intrinsically disordered nature of the immediately adjacent C-terminal segment. Our observations, combined with a structural analogy to the N-terminal intra-molecular ball-and-chain mechanism for Kv channel inactivation, suggest that the C-terminal disordered segments of these channel families encode an inter-molecular fishing rod-like mechanism for K+ channel binding to scaffold proteins.
Shental-Bechor D., Fleishman S. J. & Ben-Tal N.
(2006)
Trends in Biochemical Sciences.
31,
4,
p. 192-196
Polypeptide chains are segregated by the translocon channel into secreted or membrane-inserted proteins. Recent reports claim that an in vivo system has been used to break the 'amino acid code' used by translocons to make the determination of protein type (i.e. secreted or membrane-inserted). However, the experimental setup used in these studies could have confused the derivation of this code, in particular for polar amino acids. These residues are likely to undergo stabilizing interactions with other protein components in the experiment, shielding them from direct contact with the inhospitable membrane. Hence, it is our view that the 'code' for protein translocation has not yet been deciphered and that further experiments are required for teasing apart the various energetic factors contributing to protein translocation.
Fleishman S. J., Unger V. M. & Ben-Tal N.
(2006)
Trends in Biochemical Sciences.
31,
2,
p. 106-113
Transmembrane (TM) proteins constitute 15-30% of the genome, but 50% of the membrane protein families in eukaryotes lack bacterial homologs. Therefore, it is conceivable that many more years will elapse before high-resolution structures of eukaryotic TM proteins emerge. Until then, integrated approaches that combine biochemical and computational analyses with low-resolution structures are likely to have increasingly important roles in providing frameworks for the mechanistic understanding of membrane-protein structure and function.
Landau M., Fleishman S. J. & Ben-Tal N.
(2004)
Structure.
12,
12,
p. 2265-2275
Tyrosine kinase receptors of the EGFR family play a significant role in vital cellular processes and in various cancers. EGFR members are unique among kinases, as the regulatory elements of their kinase domains are constitutively ready for catalysis. Nevertheless, the receptors are not constantly active. This apparent paradox has prompted us to seek mechanisms of regulation in EGFR's cytoplasmic domain that do not involve conformational changes of the kinase domain. Our computational analyses, based on the three-dimensional structure of EGFR's kinase domain, suggest that direct contact between the kinase and a segment from the C-terminal regulatory domains inhibits enzymatic activity. EGFR activation would then involve transient dissociation of this stable complex, for example, via ligand-induced contact formation between the extracellular domains, leading to the reorientation of the transmembrane and intracellular domains. The model provides an explanation at the molecular level for the effects of several cancer-causing EGFR mutations.
Fleishman S. J., Harrington S., Friesner R. A., Honig B. & Ben-Tal N.
(2004)
Biophysical Journal.
87,
5,
p. 3448-3459
The transmembrane (TM) domains of many integral membrane proteins are composed of α-helix bundles. Structure determination at high resolution is difficult for these proteins, but many can be resolved at low (about 10 Å) resolutions using, for example, cryo-electron microscopy (cryo-EM). These structures reveal the packing arrangement of the TM domain, but cannot be used to determine the positions of individual amino acids. The observation that the lipid-exposed faces of TM proteins are typically evolutionarily more variable and less charged than their core provides a simple rule for orienting their constituent helices. Based on this rule, we developed score functions and automated methods for orienting TM helices, for which locations and tilt angles have been determined using, e.g., cryo-EM data. The method was parameterized with the aim of retrieving the native structure of bacteriorhodopsin among near- and far-from-native templates. It was then tested on proteins that differ from bacteriorhodopsin in their sequences, architectures, and functions, such as the acetylcholine receptor and rhodopsin. The predicted structures were within 1.5-3.5 Å of the native state in all cases. We conclude that the computational method can be used in conjunction with cryo-EM data to obtain approximate model structures of TM domains of proteins for which a sufficiently heterogeneous set of homologs is available. We also show that in those proteins in which relatively short loops connect neighboring helices, the scoring functions can discriminate between near- and far-from-native conformations even without the constraints imposed on helix locations and tilt angles that are derived from cryo-EM.
Fleishman S. J., Unger V. M., Yeager M. & Ben-Tal N.
(2004)
Molecular Cell.
15,
6,
p. 879-888
Gap junction channels connect the cytoplasms of apposed cells via an intercellular conduit formed by the end-to-end docking of two hexameric hemichannels called connexons. We used electron cryomicroscopy to derive a three-dimensional density map at 5.7 Å in-plane and 19.8 Å vertical resolution, allowing us to identify the positions and tilt angles for the 24 α helices within each hemichannel. The four hydrophobic segments in connexin sequences were assigned to the α helices in the map based on biochemical and phylogenetic data. Analyses of evolutionary conservation and compensatory mutations in connexin evolution identified the packing interfaces between the helices. The final model, which specifies the coordinates of Cα atoms in the transmembrane domain, provides a structural basis for understanding the different physiological effects of almost 30 mutations and polymorphisms in terms of structural deformations at the interfaces between helices, revealing an intimate connection between molecular structure and disease.
Enosh A., Fleishman S. J., Ben-Tal N. & Halperin D.
(2004)
Bioinformatics.
20,
SUPPL. 1,
p. i122-i129
Motivation: Transmembrane (TM) proteins that form α-helix bundles constitute approximately 50% of contemporary drug targets. Yet, it is difficult to determine their high-resolution (< 4 Å) structures. Some TM proteins yield more easily to structure determination using cryo-electron microscopy (cryo-EM), though this technique most often results in lower resolution structures, precluding an unambiguous assignment of TM amino acid sequences to the helices seen in the structure. We present computational tools for assigning the TM segments in the protein's sequence to the helices seen in cryo-EM structures. Results: The method examines all feasible TM helix assignments and ranks each one based on a score function that was derived from loops in the structures of soluble α-helix bundles. A set of the most likely assignments is then suggested. We tested the method on eight TM chains of known structures, such as bacteriorhodopsin and the lactose permease. Our results indicate that many assignments can be rejected at the outset, since they involve the connection of pairs of remotely placed TM helices. The correct assignment received a high score and was ranked highly among the remaining assignments. For example, in the lactose permease, which contains 12 TM helices, most of which are connected by short loops, only 12 out of 479 million assignments were found to be feasible, and the native one was ranked first. Availability: The program and the non-redundant set of protein structures used here are available at http://www.cs.tau.ac.il/~angela.
Artzy-Randrup Y., Fleishman S. J., Ben-Tal N. & Stone L.
(2004)
Science.
305,
5687,
p. 1107; author reply 1107
Recently, excitement has surrounded the application of null-hypothesis approaches for identifying evolutionary design principles in biological, technological, and social networks (1–13) and for classifying diverse networks into distinctive superfamilies (2). Here, we argue that the basic method suggested by Milo et al. (1, 2) often has limitations in identifying evolutionary design principles.
Oren I., Fleishman S. J., Kessel A. & Ben-Tal N.
(2004)
Biophysical Journal.
87,
2,
p. 768-779
Steroid hormones such as progesterone, testosterone, and estradiol are derived from cholesterol, a major constituent of biomembranes. Although the hormones might be expected to associate with the bilayer in a fashion similar to that of cholesterol, their biological action in regulating transcription of target genes involves transbilayer transfer by free diffusion, which is not observed for cholesterol. We used a novel combination of a continuum-solvent model and the downhill simplex search method for the calculation of the free energy of interaction of these hormones with lipid membranes, and compared these values to that of cholesterol-membrane interaction. The hormones were represented in atomic detail and the membrane as a structureless hydrophobic slab embedded in implicit water. A deep free-energy minimum of ∼-15 kcal/mol was obtained for cholesterol at its most favorable location in the membrane, whereas the most favorable locations for the hormones were associated with shallower minima of -5.0 kcal/mol or higher. The free-energy difference, which is predominantly due to the substitution of cholesterol's hydrophobic tail with polar groups, explains the different manner in which cholesterol and the hormones interact with the membrane. Further calculations were conducted to estimate the rate of transfer of the hormones from the aqueous phase into hexane, and from hexane back into the aqueous phase. The calculated rates agreed reasonably well with measurements in closely related systems. Based on these calculations, we suggest putative pathways for the free diffusion of the hormones across biomembranes. Overall, the calculations imply that the hormones may rapidly cross biomembrane barriers. Implications for gastrointestinal absorption and transfer across the blood-brain barrier and for therapeutic uses are discussed.
Fleishman S. J., Yifrach O. & Ben-Tal N.
(2004)
Journal of Molecular Biology.
340,
2,
p. 307-318
A novel sequence-analysis technique for detecting correlated amino acid positions in intermediate-size protein families (50-100 sequences) was developed, and applied to study voltage-dependent gating of potassium channels. Most contemporary methods for detecting amino acid correlations within proteins use very large sets of data, typically comprising hundreds or thousands of evolutionarily related sequences, to overcome the relatively low signal-to-noise ratio in the analysis of co-variations between pairs of amino acid positions. Such methods are impractical for voltage-gated potassium (Kv) channels and for many other protein families that have not yet been sequenced to that extent. Here, we used a phylogenetic reconstruction of paralogous Kv channels to follow the evolutionary history of every pair of amino acid positions within this family, thus increasing detection accuracy of correlated amino acids relative to contemporary methods. In addition, we used a bootstrapping procedure to eliminate correlations that were statistically insignificant. These and other measures allowed us to increase the method's sensitivity, and opened the way to reliable identification of correlated positions even in intermediate-size protein families. Principal-component analysis applied to the set of correlated amino acid positions in Kv channels detected a network of inter-correlated residues, a large fraction of which were identified as gating-sensitive upon mutation. Mapping the network of correlated residues onto the 3D structure of the Kv channel from Aeropyrum pernix disclosed correlations between residues in the voltage-sensor paddle and the pore region, including regions that are involved in the gating transition. We discuss these findings with respect to the evolutionary constraints acting on the channel's various domains. The software is available on our website http://ashtoret.tau.ac.il/~sarel/CorrMut.html
Fleishman S. J., Dagan T. & Graur D.
(2003)
Molecular Biology and Evolution.
20,
11,
p. 1876-1880
We present a method for pairwise Assessment of Nonfunctionalization Times (pANT) in processed pseudogenes. Unlike existing methods for estimating nonfunctionalization times, pANT utilizes previously calculated probabilities of nucleotide substitution as explicit rate measurements, rather than assuming that the substitution rates are the same for all nucleotides. Thus, the method allows a more accurate computation of the time that has elapsed since the nonfunctionalization of a pseudogene. Whereas existing methods require the sequence of an orthologous functional gene, which is not always at hand, pANT only uses the pairwise alignment of the gene/pseudogene pair, thus expanding the range of problems that can be tackled. To estimate evolutionary times in nonfunctional sequences, pANT measures the differences in the pairwise alignment of a gene and its paralogous processed pseudogene, using only the first and second codon positions. It assumes that, because of functional constraints, these positions in the sequence of the functional homolog have not changed since the time of nonfunctionalization of the pseudogene. Hence, the sequence of the gene may be used as the ancestor of the pseudogene. We show that the method's reliance on a detailed substitution matrix, which is derived separately for each species, makes it more accurate than existing methods. We applied pANT to the case of the unitary α-1,3-galactosyltransferase human pseudogene and found that our estimate of the nonfunctionalization time was in agreement with that obtained by taxonomic and paleontological considerations pertaining to the divergence between platyrrhines (New World monkeys) and catarrhines (Old World monkeys).
Fleishman S. J., Schlessinger J. & Ben-Tal N.
(2002)
Proceedings of the National Academy of Sciences of the United States of America.
99,
25,
p. 15937-15940
Overexpression of the receptor tyrosine kinase (RTK) erbB2 (also designated neu or HER2) has been implicated in causing a variety of human cancers, including mammary and ovarian carcinomas. Ligand-induced receptor dimerization is critical for stimulation of the intrinsic protein tyrosine kinase (PTK) of RTKs. It was therefore proposed that PTK activity is stimulated as a result of the reorientation of the cytoplasmic domains within receptor dimers, leading to trans-autophosphorylation and stimulation of enzymatic activity. Here, we propose a molecular mechanism for rotation-coupled activation of the erbB2 receptor. Using a computational exploration of conformation space of the transmembrane (TM) segments of an erbB2 homodimer, we found two stable conformations of the TM domain. We suggest that these conformations correspond to the active and inactive states of erbB2, and that the receptor molecules may switch from one conformation to the other without crossing exceedingly unfavorable states. This model provides an explanation for the biochemical and oncogenic properties of erbB2, such as the effects of erbB2 overexpression on kinase activity and cell transformation. Furthermore, the opposing effects of the neu* activating oncogenic point mutation and the Val-655→Ile single-nucleotide polymorphism shown to be linked to reduced risk of breast cancer are explained in terms of shifts in the equilibrium between the active and inactive states of erbB2 in vivo.
Fleishman S. J. & Ben-Tal N.
(2002)
Journal of Molecular Biology.
321,
2,
p. 363-378
Pairs of helices in transmembrane (TM) proteins are often tightly packed. We present a scoring function and a computational methodology for predicting the tertiary fold of a pair of α-helices such that its chances of being tightly packed are maximized. Since the number of TM protein structures solved to date is small, it seems unlikely that a reliable scoring function derived statistically from the known set of TM protein structures will be available in the near future. We therefore constructed a scoring function based on the qualitative insights gained in the past two decades from the solved structures of TM and soluble proteins. In brief, we reward the formation of contacts between small amino acid residues such as Gly, Cys, and Ser, that are known to promote dimerization of helices, and penalize the burial of large amino acid residues such as Arg and Trp. As a case study, we show that our method predicts the native structure of the TM homodimer glycophorin A (GpA) to be, in essence, at the global score optimum. In addition, by correlating our results with empirical point mutations on this homodimer, we demonstrate that our method can be a helpful adjunct to mutation analysis. We present a data set of canonical α-helices from the solved structures of TM proteins and provide a set of programs for analyzing it (http://ashtoret.tau.ac.il/∼sarel). From this data set we derived 11 helix pairs, and conducted searches around their native states as a further test of our method. Approximately 73% of our predictions showed a reasonable fit (RMS deviation