Munch J., Soler J., Gildor-Cristal O., Fleishman S. J., Garcia-Borras M. & Weissenborn M. J.
(2025)
ACS Catalysis.
15,
15,
p. 12741-12755
The selective oxyfunctionalization of terpenes remains a major challenge in chemical synthesis and is of significant industrial importance. This study presents a computational enzyme design approach based on an AlphaFold2 model of an unspecific peroxygenase (MthUPO). Using the FuncLib algorithm, only 50 variants were required, and they exhibited remarkable improvements. All 50 designs were active, each showing measurable activity on at least one substrate in the tested panel. Among the terpene substrates, improvements in activity varied considerably: while some substrates had only a single design exhibiting a ≥2-fold increase in activity, the top-performing substrate had 26 such designs. The most active design per terpene substrate showed enhancements ranging from 2.2-fold to 7.1-fold relative to the wild type. In addition to increased activity, many designs also demonstrated useful and dramatic shifts in regio-, chemo-, and stereoselectivity. Regioselectivity for the energetically less favored 3-hydroxy-beta-damascone increased from 3 to 46%. Particularly striking is the dramatic improvement in chemoselectivity for the oxidation of geraniol and nerol to citral A (>99%) and citral B (89%), respectively. While wild-type MthUPO exhibited only a moderate selectivity of 40% for citral A and 72% for citral B, our computationally designed variants displayed significantly enhanced product preference and up to a 4.5-fold increase in activity. Additionally, further products not found with the wild-type enzyme, such as isopiperitenol from limonene and epoxides from geraniol and nerol, were synthesized. For the hydroxylation of beta-ionone, the enantioselectivity was inverted to a ratio of 1:99 from (R)- to (S)-4-hydroxy-beta-ionone. FuncLib-enabled active-site remodeling allowed us to generate a small yet highly diverse enzyme panel that significantly outperformed the wild type across multiple synthetic challenges.
The best-performing variants, such as design 4 and design 11 (both 4 mutations), exhibit improvements that result from epistatic effects. MD simulations demonstrated that these mutations collectively reshape the active site, allowing for regio- and chemoselectivities that are difficult to achieve by single-point mutations. Herein, we demonstrate the potential of in silico-guided approaches to rapidly develop highly selective biocatalysts for synthetic applications.
Avizemer Z., Martí-Gómez C., Hoch S. Y., McCandlish D. M. & Fleishman S. J.
(2025)
Cell Systems.
16,
5,
101262.
Some protein-binding pairs exhibit extreme specificities that functionally insulate them from homologs. Such pairs evolve mostly by accumulating single-point mutations, and mutants are selected if they exhibit sufficient affinity. Until now, finding a fully functional single-mutation path connecting orthogonal pairs could only be achieved by full enumeration of intermediates and was restricted to pairs that were mutationally close. We present a computational framework for discovering single-mutation paths with low molecular strain and apply it to two orthogonal bacterial endonuclease-immunity pairs separated by 17 interfacial mutations. By including mutations that bridge identities that could not be exchanged by single-nucleotide mutations, we discovered a strain-free 19-mutation path that was fully functional in vivo. The change in binding preference occurred remarkably abruptly, resulting from only one radical mutation in each partner. Furthermore, each of the specificity-switch mutations increased fitness, demonstrating that functional divergence could be driven by positive Darwinian selection.
D Sa J., Krauss L., Smith L., D'Andrea L., Chan L. J., Abraham A., Kiernan-Walker N., Mazhari R., Lamont M., Lim P. S., Sattabongkot J., Lacerda M. V., Wini L., Mueller I., Longley R. J., Pymm P., Fleishman S. J. & Tham W. H.
(2025)
Journal of Biological Chemistry.
301,
3,
108290.
Plasmodium vivax is emerging as the most prevalent species causing malaria outside Africa. Most P. vivax infections are relapses due to the reactivation of the dormant liver stage parasites (hypnozoites). Hypnozoites are a major reservoir for transmission but undetectable by commercial diagnostic tests. Antibodies against P. vivax reticulocyte-binding protein 2b (PvRBP2b) are among the most reliable serological biomarkers for recent P. vivax infections in the prior 9 months and act as indirect biomarkers for risk of relapse. We sought to design stabilized variants of PvRBP2b, under stringent conditions of minimally perturbing the solvent-accessible surfaces to maintain its antigenicity profile. Furthermore, for some of the designs, due to limited diversity of natural PvRBP2b homologs, we combined AI-based ProteinMPNN and PROSS atomistic design calculations. The best, bearing 19 core mutations relative to PvRBP2b, expressed 16-fold greater amounts (up to 11 mg/l), and had 14 °C higher thermal tolerance than the parental protein. Critically, the stabilized designs retained binding to naturally acquired human mAbs with nanomolar affinities, suggesting that the immunologically competent surfaces were retained as was confirmed by crystallographic analyses. Using longitudinal observational cohorts from malaria endemic regions of Thailand, Brazil, and the Solomon Islands, we show that antibody responses against the designs are highly correlated with those against the parental protein and can classify individuals as recently infected with P. vivax. This efficient computational stability design methodology can be used to enhance the biophysical properties of other recalcitrant proteins for use as diagnostics or vaccine immunogens.
Lipsh-Sokolik R. & Fleishman S. J.
(2025)
Journal of Molecular Biology.
437,
15,
169011.
Protein function relies on accurate and densely packed constellations of amino acids within the active site. The high density in the active site optimizes activity but reduces tolerance to mutations, thereby frustrating efforts to engineer or design new or dramatically improved activity. Introducing new activities may therefore require simultaneous multipoint mutations. Still, in a phenomenon known as epistasis, the outcome of combinations of mutations can differ significantly from, and even reverse, the impact of the individual mutations, limiting predictability. To address these challenges, we previously developed FuncLib, a method for the computational design of multipoint mutants in active sites. We recently extended FuncLib to enable the design of large combinatorial mutation libraries for high-throughput screening in a method called htFuncLib that generates compatible sets of mutations likely to yield functional multipoint mutants. htFuncLib enables scalable library design and experimental screening of hundreds and up to millions of active-site variants. This approach has generated thousands of active enzymes and fluorescent proteins with diverse functional properties. We have updated the FuncLib web server (https://FuncLib.weizmann.ac.il/) to support htFuncLib and introduced an electronic notebook (https://github.com/Fleishman-Lab/htFuncLib-web-server) for customizable library design, making those tools easily accessible for protein engineering and design. The new FuncLib web server enables reliable and scalable design of function for low-, medium- and high-throughput experiments through a single computational platform. We envision that this server will accelerate the optimization and discovery of function in enzymes, antibodies, and other proteins.
Milenkovic I., Blumenreich S., Hochfelder A., Azulay A., Biton I. E., Zerbib M., Oren R., Tsoory M., Joseph T., Fleishman S. J. & Futerman A. H.
(2024)
Gene Therapy.
31,
9-10,
p. 439-444
Almost all attempts to date at gene therapy approaches for monogenetic disease have used the amino acid sequences of the natural protein. In the current study, we use a designed, thermostable form of glucocerebrosidase (GCase), the enzyme defective in Gaucher disease (GD), to attempt to alleviate neurological symptoms in a GD mouse that models type 3 disease, i.e. the chronic neuronopathic juvenile subtype. Upon injection of an AAVrh10 (adeno-associated virus, serotype rh10) vector containing the designed GCase (dGCase) into the left lateral ventricle of Gba−/−;Gbatg mice, a significant improvement in body weight and life-span was observed, compared to injection of the same mouse with the wild type enzyme (wtGCase). Moreover, a reduction in levels of glucosylceramide (GlcCer), and an increase in levels of GCase activity were seen in the right hemisphere of Gba−/−;Gbatg mice, concomitantly with a significant improvement in motor function, reduction of neuroinflammation and a reduction in mRNA levels of various genes shown previously to be elevated in the brain of mouse models of neurological forms of GD. Together, these data pave the way for the possible use of modified proteins in gene therapy for lysosomal storage diseases and other monogenetic disorders.
Lipsh-Sokolik R. & Fleishman S. J.
(2024)
Proceedings of the National Academy of Sciences.
121,
34,
e231499912.
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
King L. D., Pulido D., Barrett J. R., Davies H., Quinkert D., Lias A. M., Silk S. E., Pattinson D. J., Diouf A., Williams B. G., McHugh K., Rodrigues A., Rigby C. A., Strazza V., Suurbaar J., Rees-Spear C., Dabbs R. A., Ishizuka A. S., Zhou Y., Gupta G., Jin J., Li Y., Carnrot C., Minassian A. M., Campeotto I., Fleishman S. J., Noe A. R., MacGill R. S., King C. R., Birkett A. J., Soisson L. A., Long C. A., Miura K., Ashfield R., Skinner K., Howarth M. R., Biswas S. & Draper S. J.
(2024)
Cell Reports Medicine.
5,
7,
101654.
Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) is a leading blood-stage malaria vaccine antigen target, currently in a phase 2b clinical trial as a full-length soluble protein/adjuvant vaccine candidate called RH5.1/Matrix-M. We identify that disordered regions of the full-length RH5 molecule induce non-growth inhibitory antibodies in human vaccinees and that a re-engineered and stabilized immunogen (including just the alpha-helical core of RH5) induces a qualitatively superior growth inhibitory antibody response in rats vaccinated with this protein formulated in Matrix-M adjuvant. In parallel, bioconjugation of this immunogen, termed “RH5.2,” to hepatitis B surface antigen virus-like particles (VLPs) using the “plug-and-display” SpyTag-SpyCatcher platform technology also enables superior quantitative antibody immunogenicity over soluble protein/adjuvant in vaccinated mice and rats. These studies identify a blood-stage malaria vaccine candidate that may improve upon the current leading soluble protein vaccine candidate RH5.1/Matrix-M. The RH5.2-VLP/Matrix-M vaccine candidate is now under evaluation in phase 1a/b clinical trials.
Weinstein J. J., Saikia C., Karbat I., Goldenzweig A., Reuveny E. & Fleishman S. J.
(2024)
Protein Science.
33,
6,
e4995.
Membrane proteins play critical physiological roles as receptors, channels, pumps, and transporters. Despite their importance, however, low expression levels often hamper the experimental characterization of membrane proteins. We present an automated and web-accessible design algorithm called mPROSS (https://mPROSS.weizmann.ac.il), which uses phylogenetic analysis and an atomistic potential, including an empirical lipophilicity scale, to improve native-state energy. As a stringent test, we apply mPROSS to the Kv1.2-Kv2.1 paddle chimera voltage-gated potassium channel. Four designs, encoding 9-26 mutations relative to the parental channel, were functional and maintained potassium-selective permeation and voltage dependence in Xenopus oocytes with up to 14-fold increase in whole-cell current densities. Additionally, single-channel recordings reveal no significant change in the channel-opening probability nor in unitary conductance, indicating that functional expression levels increase without impacting the activity profile of individual channels. Our results suggest that the expression levels of other dynamic channels and receptors may be enhanced through one-shot design calculations.
Münch J., Dietz N., Barber-Zucker S., Seifert F., Matschi S., Püllmann P., Fleishman S. J. & Weissenborn M. J.
(2024)
ACS Catalysis.
14,
7,
p. 4738-4748
Unspecific peroxygenases (UPOs) are fungal enzymes that attract significant attention for their ability to perform versatile oxyfunctionalization reactions using H2O2. Unlike other oxygenases, UPOs do not require additional reductive equivalents or electron transfer chains that complicate basic and applied research. Nevertheless, UPOs generally exhibit low to no heterologous production levels and only four UPO structures have been determined to date by crystallography, limiting their usefulness and obstructing research. To overcome this bottleneck, we implemented a workflow that applies PROSS stability design to AlphaFold2 model structures of 10 unique and diverse UPOs followed by a signal peptide shuffling to enable heterologous production. Nine UPOs were functionally produced in Pichia pastoris, including the recalcitrant CciUPO and three UPOs derived from oomycetes, the first nonfungal UPOs to be experimentally characterized. We conclude that the high accuracy and reliability of new modeling and design workflows dramatically expand the pool of enzymes for basic and applied research.
King L. D. W., Pulido D., Barrett J. R., Davies H., Quinkert D., Lias A. M., Silk S. E., Pattinson D. J., Diouf A., Williams B. G., McHugh K., Rodrigues A., Rigby C. A., Strazza V., Suurbaar J., Rees-Spear C., Dabbs R. A., Ishizuka A. S., Zhou Y., Gupta G., Jin J., Li Y., Carnrot C., Minassian A. M., Campeotto I., Fleishman S. J., Noe A. R., MacGill R. S., King C. R., Birkett A. J., Soisson L. A., Long C. A., Miura K., Ashfield R., Skinner K., Howarth M., Biswas S. & Draper S. J.
(2024)
BioRxiv.
The development of a highly effective vaccine against the pathogenic blood-stage infection of human malaria will require a delivery platform that can induce an antibody response of both maximal quantity and functional quality. One strategy to achieve this includes presenting antigens to the immune system on virus-like particles (VLPs). Here we sought to improve the design and delivery of the blood-stage Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) antigen, which is currently in a Phase 2 clinical trial as a full-length soluble protein-in-adjuvant vaccine candidate called RH5.1/Matrix-M™. We identify that disordered regions of the full-length RH5 molecule induce non-growth inhibitory antibodies in human vaccinees, and that a re-engineered and stabilized immunogen that includes just the alpha-helical core of RH5 induces a qualitatively superior growth-inhibitory antibody response in rats vaccinated with this protein formulated in Matrix-M™ adjuvant. In parallel, bioconjugation of this new immunogen, termed “RH5.2”, to hepatitis B surface antigen VLPs using the “plug-and-display” SpyTag-SpyCatcher platform technology also enabled superior quantitative antibody immunogenicity over soluble antigen/adjuvant in vaccinated mice and rats. These studies identify a new blood-stage malaria vaccine candidate that may improve upon the current leading soluble protein vaccine candidate RH5.1/Matrix-M™. The RH5.2-VLP/Matrix-M™ vaccine candidate is now under evaluation in Phase 1a/b clinical trials.
Competing Interest Statement: SJD is an inventor on patent applications relating to RH5 malaria vaccines and antibodies; is a co-founder of and shareholder in SpyBiotech; and has been a consultant to GSK on malaria vaccines.
AMM has been a consultant to GSK on malaria vaccines; and has an immediate family member who is an inventor on patent applications relating to RH5 malaria vaccines and antibodies and is a co-founder of and shareholder in SpyBiotech. MH is an inventor on patents relating to peptide targeting via spontaneous amide bond formation, and is a co-founder of and shareholder in SpyBiotech. SB is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and is a co-founder of, shareholder in and employee of SpyBiotech. JJ is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and is a co-founder of and shareholder in SpyBiotech. RAD is an inventor on patent applications relating to vaccines made using spontaneous amide bond formation and shareholder in SpyBiotech. LDWK, JRB, DQ, AML, SES, BGW, KMc, IC, SJF and DP are inventors on patent applications relating to RH5 malaria vaccines and/or antibodies. All other authors have declared that no conflict of interest exists.
Tennenhouse A., Khmelnitsky L., Khalaila R., Yeshaya N., Noronha A., Lindzen M., Makowski E. K., Zaretsky I., Sirkis Y. F., Galon-Wolfenson Y., Tessier P. M., Abramson J., Yarden Y., Fass D. & Fleishman S. J.
(2024)
Nature Biomedical Engineering.
8,
1,
p. 30-44
Conventional methods for humanizing animal-derived antibodies involve grafting their complementarity-determining regions onto homologous human framework regions. However, this process can substantially lower antibody stability and antigen-binding affinity, and requires iterative mutational fine-tuning to recover the original antibody properties. Here we report a computational method for the systematic grafting of animal complementarity-determining regions onto thousands of human frameworks. The method, which we named CUMAb (for computational human antibody design; available at http://CUMAb.weizmann.ac.il), starts from an experimental or model antibody structure and uses Rosetta atomistic simulations to select designs by energy and structural integrity. CUMAb-designed humanized versions of five antibodies exhibited similar affinities to those of the parental animal antibodies, with some designs showing marked improvement in stability. We also show that (1) non-homologous frameworks are often preferred to highest-homology frameworks, and (2) several CUMAb designs that differ by dozens of mutations and that use different human frameworks are functionally equivalent.
Zelnik I. D., Mestre B., Weinstein J. J., Dingjan T., Izrailov S., Ben-Dor S., Fleishman S. J. & Futerman A. H.
(2023)
Nature Communications.
14,
1,
2330.
Until now, membrane-protein stabilization has relied on iterations of mutations and screening. We now validate a one-step algorithm, mPROSS, for stabilizing membrane proteins directly from an AlphaFold2 model structure. Applied to the lipid-generating enzyme, ceramide synthase, 37 designed mutations lead to a more stable form of human CerS2. Together with molecular dynamics simulations, we propose a pathway by which substrates might be delivered to the ceramide synthases.
Füzesi-Levi M. G., Ben-Nissan G., Listov D., Fridmann Sirkis Y., Hayouka Z., Fleishman S. & Sharon M.
(2023)
Life Science Alliance.
6,
10,
e202201634.
Protein degradation is one of the essential mechanisms that enables reshaping of the proteome landscape in response to various stimuli. The largest E3 ubiquitin ligase family that targets proteins to degradation by catalyzing ubiquitination is the cullin-RING ligases (CRLs). Many of the proteins that are regulated by CRLs are central to tumorigenesis and tumor progression, and dysregulation of the CRL family is frequently associated with cancer. The CRL family comprises ∼300 complexes, all of which are regulated by the COP9 signalosome complex (CSN). Therefore, CSN is considered an attractive target for therapeutic intervention. Research efforts for targeted CSN inhibition have been directed towards inhibition of the complex enzymatic subunit, CSN5. Here, we have taken a fresh approach focusing on CSNAP, the smallest CSN subunit. Our results show that the C-terminal region of CSNAP is tightly packed within the CSN complex, in a groove formed by CSN3 and CSN8. We show that a 16 amino acid C-terminal peptide, derived from this CSN-interacting region, can displace the endogenous CSNAP subunit from the complex. This, in turn, leads to a CSNAP null phenotype that attenuates CSN activity and consequently CRL function. Overall, our findings emphasize the potential of a CSNAP-based peptide for CSN inhibition as a new therapeutic avenue.
Khersonsky O., Goldsmith M., Zaretsky I., Hamer-Rogotner S., Dym O., Unger T., Yona M., Fridmann-Sirkis Y. & Fleishman S. J.
(2023)
Journal of Molecular Biology.
435,
17,
168191.
Albumin is the most abundant protein in the blood serum of mammals and has essential carrier and physiological roles. Albumins are also used in a wide variety of molecular and cellular experiments and in the cultivated meat industry. Despite their importance, however, albumins are challenging for heterologous expression in microbial hosts, likely due to 17 conserved intramolecular disulfide bonds. Therefore, albumins used in research and biotechnological applications either derive from animal serum, despite severe ethical and reproducibility concerns, or from recombinant expression in yeast or rice. We use the PROSS algorithm to stabilize human and bovine serum albumins, finding that all are highly expressed in E. coli. Design accuracy is verified by crystallographic analysis of a human albumin variant with 16 mutations. This albumin variant exhibits ligand binding properties similar to those of the wild type. Remarkably, a design with 73 mutations relative to human albumin exhibits over 40 °C improved stability and is stable beyond the boiling point of water. Our results suggest that proteins with many disulfide bridges have the potential to exhibit extreme stability when subjected to design. The designed albumins may be used to make economical, reproducible, and animal-free reagents for molecular and cell biology. They also open the way to high-throughput screening to study and enhance albumin carrier properties.
Pokorna S., Khersonsky O., Lipsh-Sokolik R., Goldenzweig A., Nielsen R., Ashani Y., Peleg Y., Unger T., Albeck S., Dym O., Tirosh A., Tarayra R., Hocquemiller M., Laufer R., Ben-Dor S., Silman I., Sussman J. L., Fleishman S. J. & Futerman A. H.
(2023)
FEBS Journal.
290,
13,
p. 3383-3399
Acid-β-glucosidase (GCase, EC 3.2.1.45), the lysosomal enzyme which hydrolyzes the simple glycosphingolipid, glucosylceramide (GlcCer), is encoded by the GBA1 gene. Biallelic mutations in GBA1 cause the human inherited metabolic disorder, Gaucher disease (GD), in which GlcCer accumulates, while heterozygous GBA1 mutations are the highest genetic risk factor for Parkinson's disease (PD). Recombinant GCase (e.g., Cerezyme®) is produced for use in enzyme replacement therapy for GD and is largely successful in relieving disease symptoms, except for the neurological symptoms observed in a subset of patients. As a first step towards developing an alternative to the recombinant human enzymes used to treat GD, we applied the PROSS stability-design algorithm to generate GCase variants with enhanced stability. One of the designs, containing 55 mutations compared to wild type human GCase, exhibits improved secretion and thermal stability. Furthermore, the design has higher enzymatic activity than the clinically used human enzyme when incorporated into an AAV vector, resulting in a larger decrease in the accumulation of lipid substrates in cultured cells. Based on stability-design calculations, we also developed a machine-learning based approach to distinguish benign from deleterious (i.e., disease-causing) GBA1 mutations. This approach gave remarkably accurate predictions of the enzymatic activity of single nucleotide polymorphisms in the GBA1 gene that are not currently associated with GD or PD. This latter approach could be applied to other diseases to determine risk factors in patients carrying rare mutations.
Weinstein J. Y., Martí-Gómez C., Lipsh-Sokolik R., Hoch S. Y., Liebermann D., Nevo R., Weissman H., Petrovich-Kopitman E., Margulies D., Ivankov D., McCandlish D. M. & Fleishman S. J.
(2023)
Nature Communications.
14,
2890.
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Duart G., Elazar A., Weinstein J. Y., Gadea-Salom L., Ortiz-Mateu J., Fleishman S. J., Mingarro I. & Martinez-Gil L.
(2023)
Proceedings of the National Academy of Sciences of the United States of America.
120,
11,
e221964812.
Several methods have been developed to explore interactions among water-soluble proteins or regions of proteins. However, techniques to target transmembrane domains (TMDs) have not been examined thoroughly despite their importance. Here, we developed a computational approach to design sequences that specifically modulate protein-protein interactions in the membrane. To illustrate this method, we demonstrated that BclxL can interact with other members of the B cell lymphoma 2 (Bcl2) family through the TMD and that these interactions are required for BclxL control of cell death. Next, we designed sequences that specifically recognize and sequester the TMD of BclxL. Hence, we were able to prevent BclxL intramembrane interactions and cancel its antiapoptotic effect. These results advance our understanding of protein-protein interactions in membranes and provide a means to modulate them. Moreover, the success of our approach may trigger the development of a generation of inhibitors targeting interactions between TMDs.
Gomez de Santos P., Mateljak I., Hoang M. D., Fleishman S. J., Hollmann F. & Alcalde M.
(2023)
Journal of the American Chemical Society.
145,
6,
p. 3443-3453
The generation of enantiodivergent biocatalysts for C-H oxyfunctionalizations is ever more important in modern synthetic chemistry. Here, we have applied the FuncLib algorithm based on phylogenetic and Rosetta calculations to design a diverse repertoire of active, stable, and enantiodivergent fungal peroxygenases. 24 designs, each carrying 4-5 mutations in the catalytic core, were expressed functionally in yeast and benchmarked against characteristic model compounds. Several designs were active and stable in a range of temperature and pH, displaying unprecedented enantiodivergence, changing regioselectivity from alkyl to aromatic hydroxylation, and increasing catalytic efficiencies up to 10-fold, with 15-fold improvements in total turnover numbers over the parental enzyme. We find that this dramatic functional divergence stems from beneficial epistasis among the mutations and an extensive reorganization of the heme channel. Our work demonstrates that FuncLib can rapidly design highly functional libraries enriched in enantioselective peroxygenases not seen in nature for a range of biotechnological applications.
Lipsh-Sokolik R., Khersonsky O., Schröder S. P., de Boer C., Hoch S., Davies G. J., Overkleeft H. S. & Fleishman S. J.
(2023)
Science.
379,
6628,
p. 195-201
The design of structurally diverse enzymes is constrained by long-range interactions that are necessary for accurate folding. We introduce an atomistic and machine learning strategy for the combinatorial assembly and design of enzymes (CADENZ) to design fragments that combine with one another to generate diverse, low-energy structures with stable catalytic constellations. We applied CADENZ to endoxylanases and used activity-based protein profiling to recover thousands of structurally diverse enzymes. Functional designs exhibit high active-site preorganization and more stable and compact packing outside the active site. Implementing these lessons into CADENZ led to a 10-fold improved hit rate and more than 10,000 recovered enzymes. This design-test-learn loop can be applied, in principle, to any modular protein family, yielding huge diversity and general lessons on protein design principles.
Barber-Zucker S., Mateljak I., Goldsmith M., Kupervaser M., Alcalde M. & Fleishman S. J.
(2022)
ACS Catalysis.
12,
21,
p. 13164-13173
White-rot fungi secrete an impressive repertoire of high-redox potential laccases (HRPLs) and peroxidases for efficient oxidation and utilization of lignin. Laccases are attractive enzymes for the chemical industry due to their broad substrate range and low environmental impact. Since expression of functional recombinant HRPLs is challenging, however, iterative directed-evolution protocols have been applied to improve their expression, activity, and stability. We apply a rational, stabilize-and-diversify strategy to two HRPLs that we could not functionally express. First, we use the PROSS stability-design algorithm to allow functional expression in yeast. Second, we use the stabilized enzymes as starting points for FuncLib active-site design to improve their activity and substrate diversity. Four of the FuncLib-designed HRPLs and their PROSS progenitor exhibit substantial diversity in reactivity profiles against high-redox potential substrates, including lignin monomers. Combinations of 3-4 subtle mutations that change the polarity, solvation, and sterics of the substrate-oxidation site result in orders of magnitude changes in reactivity profiles. These stable and versatile HRPLs are a step toward generating an effective lignin-degrading consortium of enzymes that can be secreted from yeast. The stabilize-and-diversify strategy can be applied to other challenging enzyme families to study and expand the utility of natural enzymes.
Cohen-Dvashi H., Weinstein J., Katz M., Ashkenazy-Eilon M., Mor Y., Shimon A., Achdout H., Tamir H., Israely T., Strobelt R., Shemesh M., Stoler-Barak L., Shulman Z., Paran N., Fleishman S. J. & Diskin R.
(2022)
iScience.
25,
10,
105193.
Blocking the interaction of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with its angiotensin-converting enzyme 2 (ACE2) receptor was proved to be an effective therapeutic option. Various protein binders as well as monoclonal antibodies that effectively target the receptor-binding domain (RBD) of SARS-CoV-2 to prevent interaction with ACE2 were developed. The emergence of SARS-CoV-2 variants that accumulate alterations in the RBD can severely affect the efficacy of such immunotherapeutic agents, as is indeed the case with Omicron that resists many of the previously isolated monoclonal antibodies. Here, we evaluate an ACE2-based immunoadhesin that we have developed early in the pandemic against some of the recent variants of concern (VoCs), including the Delta and the Omicron variants. We show that our ACE2-immunoadhesin remains effective in neutralizing these variants, suggesting that immunoadhesin-based immunotherapy is less prone to escape by the virus and has a potential to remain effective against future VoCs.
Marciano S., Dey D., Listov D., Fleishman S. J., Sonn-Segev A., Mertens H., Busch F., Kim Y., Harvey S. R., Wysocki V. H. & Schreiber G.
(2022)
Chemical Science.
13,
39,
p. 11680-11695
Over half the proteins in the E. coli cytoplasm form homo or hetero-oligomeric structures. Experimentally determined structures are often considered in determining a protein's oligomeric state, but static structures miss the dynamic equilibrium between different quaternary forms. The problem is exacerbated in homo-oligomers, where the oligomeric states are challenging to characterize. Here, we re-evaluated the oligomeric state of 17 different bacterial proteins across a broad range of protein concentrations and solutions by native mass spectrometry (MS), mass photometry (MP), size exclusion chromatography (SEC), and small-angle X-ray scattering (SAXS), finding that most exhibit several oligomeric states. Surprisingly, some proteins did not show mass-action driven equilibrium between the oligomeric states. For approximately half the proteins, the predicted oligomeric forms described in publicly available databases underestimated the complexity of protein quaternary structures in solution. Conversely, AlphaFold multimer provided an accurate description of the potential multimeric states for most proteins, suggesting that it could help resolve uncertainties on the solution state of many proteins.
Lv Y., Zheng S., Goldenzweig A., Liu F., Gao Y., Yang X., Kandale A., McGeary R. P., Williams S., Kobe B., Schembri M. A., Landsberg M. J., Wu B., Brück T. B., Sieber V., Boden M., Rao Z., Fleishman S. J., Schenk G. & Guddat L. W.
(2022)
Applied Biosciences.
1,
2,
p. 163-178
The branched-chain amino acids (BCAAs) leucine, isoleucine and valine are synthesized via a common biosynthetic pathway. Ketol-acid reductoisomerase (KARI) is the second enzyme in this pathway. In addition to its role in BCAA biosynthesis, KARI catalyzes two rate-limiting steps that are key components of a cell-free biofuel biosynthesis route. For industrial applications, reaction temperature and enzyme stability are key factors that affect process robustness and product yield. Here, we have solved the cryo-EM structure (2.94 Å resolution) of a homododecameric Class I KARI (from Campylobacter jejuni) and demonstrated how a triad of amino acid side chains plays a crucial role in promoting the oligomerization of this enzyme. Importantly, both its thermal and solvent stability are greatly enhanced in the dodecameric state when compared to its dimeric counterpart (apparent melting temperatures (Tm) of 83.1 °C and 51.5 °C, respectively). We also employed protein design (PROSS) for a tetrameric Class II KARI (from Escherichia coli) to generate a variant with improved thermal and solvent stabilities. In total, 34 mutations were introduced, which did not affect the oligomeric state of this enzyme but resulted in a fully functional catalyst with a significantly elevated Tm (58.5 °C vs. 47.9 °C for the native version).
Allouche-Arnon H., Khersonsky O., Tirukoti N. D., Peleg Y., Dym O., Albeck S., Brandis A., Mehlman T., Avram L., Harris T., Yadav N. N., Fleishman S. J. & Bar-Shir A.
(2022)
Nature Biotechnology.
40,
7,
p. 1143-1149
Imaging of gene-expression patterns in live animals is difficult to achieve with fluorescent proteins because tissues are opaque to visible light. Imaging of transgene expression with magnetic resonance imaging (MRI), which penetrates to deep tissues, has been limited by single reporter visualization capabilities. Moreover, the low-throughput capacity of MRI limits large-scale mutagenesis strategies to improve existing reporters. Here we develop an MRI system, called GeneREFORM, comprising orthogonal reporters for two-color imaging of transgene expression in deep tissues. Starting from two promiscuous deoxyribonucleoside kinases, we computationally designed highly active, orthogonal enzymes ('reporter genes') that specifically phosphorylate two MRI-detectable synthetic deoxyribonucleosides ('reporter probes'). Systemically administered reporter probes exclusively accumulate in cells expressing the designed reporter genes, and their distribution is displayed as pseudo-colored MRI maps based on dynamic proton exchange for noninvasive visualization of transgene expression. We envision that future extensions of GeneREFORM will pave the way to multiplexed deep-tissue mapping of gene expression in live animals.
Elazar A., Chandler N. J., Davey A. S., Weinstein J. Y., Nguyen J. V., Trenker R., Cross R. S., Jenkins M. R., Call M. J., Call M. E. & Fleishman S. J.
(2022)
eLife.
11,
e75660.
De novo-designed receptor transmembrane domains (TMDs) present opportunities for precise control of cellular receptor functions. We developed a de novo design strategy for generating programmed membrane proteins (proMPs): single-pass α-helical TMDs that self-assemble through computationally defined and crystallographically validated interfaces. We used these proMPs to program specific oligomeric interactions into a chimeric antigen receptor (CAR) that we expressed in mouse primary T cells and found that both in vitro CAR T cell cytokine release and in vivo antitumor activity scaled linearly with the oligomeric state encoded by the receptor TMD, from monomers up to tetramers. All programmed CARs stimulated substantially lower T cell cytokine release relative to the commonly used CD28 TMD, which we show elevated cytokine release through lateral recruitment of the endogenous T cell costimulatory receptor CD28. Precise design using orthogonal and modular TMDs thus provides a new way to program receptor structure and predictably tune activity for basic or applied synthetic biology.
Vos P. D., Rossetti G., Mantegna J. L., Siira S. J., Gandadireja A. P., Bruce M., Raven S. A., Khersonsky O., Fleishman S. J., Filipovska A. & Rackham O.
(2022)
Nature Communications.
13,
3023.
The ability to alter the genomes of living cells is key to understanding how genes influence the functions of organisms and will be critical to modify living systems for useful purposes. However, this promise has long been limited by the technical challenges involved in genetic engineering. Recent advances in gene editing have bypassed some of these challenges but they are still far from ideal. Here we use FuncLib to computationally design Cas9 enzymes with substantially higher donor-independent editing activities. We use genetic circuits linked to cell survival in yeast to quantify Cas9 activity and discover synergistic interactions between engineered regions. These hyperactive Cas9 variants function efficiently in mammalian cells and introduce larger and more diverse pools of insertions and deletions into targeted genomic regions, providing tools to enhance and expand the possible applications of CRISPR-based gene editing.
Barber-Zucker S., Mindel V., Garcia-Ruiz E., Weinstein J. J., Alcalde M. & Fleishman S. J.
(2022)
Journal of the American Chemical Society.
144,
8,
p. 3564-3571
White-rot fungi secrete a repertoire of high-redox potential oxidoreductases to efficiently decompose lignin. Of these enzymes, versatile peroxidases (VPs) are the most promiscuous biocatalysts. VPs are attractive enzymes for research and industrial use but their recombinant production is extremely challenging. To date, only a single VP has been structurally characterized and optimized for recombinant functional expression, stability, and activity. Computational enzyme optimization methods can be applied to many enzymes in parallel but they require accurate structures. Here, we demonstrate that model structures computed by deep-learning-based structure prediction methods are reliable starting points for one-shot PROSS stability-design calculations. Four designed VPs encoding as many as 43 mutations relative to the wildtype enzymes are functionally expressed in yeast, whereas their wildtype parents are not. Three of these designs exhibit substantial and useful diversity in their reactivity profiles and tolerance to environmental conditions. The reliability of the new generation of structure predictors and design methods increases the scale and scope of computational enzyme optimization, enabling efficient discovery and exploitation of the functional diversity in natural enzyme families directly from genomic databases.
Mechaly A., Diamant E., Alcalay R., Ben David A., Dor E., Torgeman A., Barnea A., Girshengorn M., Levin L., Epstein E., Tennenhouse A., Fleishman S. J., Zichel R. & Mazor O.
(2022)
Antibodies (Basel).
11,
1,
21.
Botulinum neurotoxin type E (BoNT/E), the fastest acting toxin of all BoNTs, cleaves the 25 kDa synaptosomal-associated protein (SNAP-25) in motor neurons, leading to flaccid paralysis. The specific detection and quantification of the BoNT/E-cleaved SNAP-25 neoepitope can facilitate the development of cell-based assays for the characterization of anti-BoNT/E antibody preparations. In order to isolate highly specific monoclonal antibodies suitable for the in vitro immuno-detection of the exposed neoepitope, mice and rabbits were immunized with an eight amino acid peptide composed of the C-terminus of the cleaved SNAP-25. The immunized rabbits developed a specific and robust polyclonal antibody response, whereas the immunized mice mostly demonstrated a weak antibody response that could not discriminate between the two forms of SNAP-25. An immune scFv phage-display library was constructed from the immunized rabbits and a panel of antibodies was isolated. The sequence alignment of the isolated clones revealed high similarity between both heavy and light chains with exceptionally short HCDR3 sequences. A chimeric scFv-Fc antibody was further expressed and characterized, exhibiting a selective, ultra-high affinity (pM) towards the SNAP-25 neoepitope. Moreover, this antibody enabled the sensitive detection of cleaved SNAP-25 in BoNT/E treated SiMa cells with no cross reactivity with the intact SNAP-25. Thus, by applying an immunization and selection procedure, we have isolated a novel, specific and high-affinity antibody against the BoNT/E-derived SNAP-25 neoepitope. This novel antibody can be applied in in vitro assays that determine the potency of antitoxin preparations and reduce the use of laboratory animals for these purposes.
Nikitin D., Mican J., Toul M., Bednar D., Peskova M., Kittova P., Thalerova S., Vitecek J., Damborsky J., Mikulik R., Fleishman S. J., Prokop Z. & Marek M.
(2022)
Computational and Structural Biotechnology Journal.
20,
p. 1366-1377
Cardio- and cerebrovascular diseases are leading causes of death and disability, resulting in one of the highest socio-economic burdens of any disease type. The discovery of bacterial and human plasminogen activators and their use as thrombolytic drugs have revolutionized treatment of these pathologies. Fibrin-specific agents have an advantage over non-specific factors because of lower rates of deleterious side effects. Specifically, staphylokinase (SAK) is a pharmacologically attractive indirect plasminogen activator protein of bacterial origin that forms stoichiometric noncovalent complexes with plasmin, promoting the conversion of plasminogen into plasmin. Here we report a computer-assisted re-design of the molecular surface of SAK to increase its affinity for plasmin. A set of computationally designed SAK mutants was produced recombinantly and biochemically characterized. Screening revealed a pharmacologically interesting SAK mutant with ∼7-fold enhanced affinity toward plasmin, ∼10-fold improved plasmin selectivity and moderately higher plasmin-generating efficiency in vitro. Collectively, the results obtained provide a framework for SAK engineering using computational affinity design that could pave the way to a next generation of effective, highly selective, and less toxic thrombolytics.
Khersonsky O. & Fleishman S. J.
(2022)
BioDesign Research.
2022,
9787581.
The overarching goal of computational protein design is to gain complete control over protein structure and function. The majority of sophisticated binders and enzymes, however, are large and exhibit diverse and complex folds that defy atomistic design calculations. Encouragingly, recent strategies that combine evolutionary constraints from natural homologs with atomistic calculations have significantly improved design accuracy. In these approaches, evolutionary constraints mitigate the risk from misfolding and aggregation, focusing atomistic design calculations on a small but highly enriched sequence subspace. Such methods have dramatically optimized diverse proteins, including vaccine immunogens, enzymes for sustainable chemistry, and proteins with therapeutic potential. The new generation of deep learning-based ab initio structure predictors can be combined with these methods to extend the scope of protein design, in principle, to any natural protein of known sequence. We envision that protein engineering will come to rely on completely computational methods to efficiently discover and optimize biomolecular activities.
Leonard A. C., Weinstein J. J., Steiner P. J., Erbse A. H., Fleishman S. J. & Whitehead T. A.
(2022)
Protein Engineering, Design and Selection.
35,
gzac002.
Stabilizing antigenic proteins as vaccine immunogens or diagnostic reagents is a stringent case of protein engineering and design as the exterior surface must maintain recognition by receptor(s) and antigen-specific antibodies at multiple distinct epitopes. This is a challenge, as stability-enhancing mutations must be focused on the protein core, whereas successful computational stabilization algorithms typically select mutations at solvent-facing positions. In this study, we report the stabilization of SARS-CoV-2 Wuhan Hu-1 Spike receptor binding domain using a combination of deep mutational scanning and computational design, including the FuncLib algorithm. Our most successful design encodes I358F, Y365W, T430I, and I513L receptor binding domain mutations, maintains recognition by the receptor ACE2 and a panel of different anti-receptor binding domain monoclonal antibodies, is between 1 and 2 °C more thermally stable than the original receptor binding domain using a thermal shift assay, and is less proteolytically sensitive to chymotrypsin and thermolysin than the original receptor binding domain. Our approach could be applied to the computational stabilization of a wide range of proteins without requiring detailed knowledge of active sites or binding epitopes. We envision that this strategy may be particularly powerful for cases when there are multiple or unknown binding sites.
In recent decades, antibodies (Abs) have attracted the attention of academia and the biopharmaceutical industry due to their therapeutic properties and versatility in binding a vast spectrum of antigens. Different engineering strategies have been developed for optimizing Ab specificity, efficacy, affinity, stability and production, enabling systematic screening and analysis procedures for selecting lead candidates. This quality assessment is critical but usually demands time-consuming and labor-intensive purification procedures. Here, we harnessed the direct-mass spectrometry (direct-MS) approach, in which the analysis is carried out directly from the crude growth media, for the rapid, structural characterization of designed Abs. We demonstrate that properties such as stability, specificity and interactions with antigens can be defined, without the need for prior purification.
Fleishman S. J. & Horovitz A.
(2021)
Journal of Molecular Biology.
433,
20,
167007.
Recent progress in structure-prediction methods that rely on deep learning suggests that the atomic structure of almost any protein may soon be predictable directly from its amino acid sequence. This much-awaited revolution was driven by substantial improvements in the reliability of methods for inferring the spatial distances between amino acid pairs from an analysis of homologous sequences. Improved reliability has been accompanied, however, by a reduced ability to detect amino acid relationships that are not due to direct spatial contacts, such as those that arise from protein dynamics or allostery. Given the central importance of dynamics and allostery to protein activity, we argue that an important future advance would extend modeling beyond predicting a single static structure. Here, we briefly review some of the developments that have led to the remarkable recent achievement in structure prediction and speculate what methods and sources of information may be leveraged in the future to develop a modeling framework that addresses protein dynamics and allostery.
Makdasi E., Zvi A., Alcalay R., Noy-Porat T., Peretz E., Mechaly A., Levy Y., Epstein E., Chitlaru T., Tennenhouse A., Aftalion M., Gur D., Paran N., Tamir H., Zimhony O., Weiss S., Mandelboim M., Mendelson E., Zuckerman N., Nemet I., Kliker L., Yitzhaki S., Shapira S. C., Israely T., Fleishman S. J., Mazor O. & Rosenfeld R.
(2021)
Cell Reports.
36,
10,
109679.
A wide range of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) neutralizing monoclonal antibodies (mAbs) have been reported, most of which target the spike glycoprotein. Therapeutic implementation of these antibodies has been challenged by emerging SARS-CoV-2 variants harboring mutated spike versions. Consequently, re-assessment of previously identified mAbs is of high priority. Four previously selected mAbs targeting non-overlapping epitopes are now evaluated for binding potency to mutated RBD versions, reported to mediate escape from antibody neutralization. In vitro neutralization potencies of these mAbs, and two NTD-specific mAbs, are evaluated against two frequent SARS-CoV-2 variants of concern, the B.1.1.7 Alpha and the B.1.351 Beta. Furthermore, we demonstrate therapeutic potential of three selected mAbs by treatment of K18-human angiotensin-converting enzyme 2 (hACE2) transgenic mice 2 days post-infection with each virus variant. Thus, despite the accumulation of spike mutations, the highly potent MD65 and BL6 mAbs retain their ability to bind the prevalent viral mutants, effectively protecting against B.1.1.7 and B.1.351 variants.
Aharoni A. & Fleishman S. J.
(2021)
The FEBS journal.
288,
13,
p. 3880-3883
Dan (Danny) Tawfik, a leader in biochemistry and protein evolution, sadly died in a climbing accident on May 4th, 2021. Apart from science, rock climbing was Danny's passion and a source of pride as only a handful of researchers are active climbers. Danny made unique and long-lasting contributions to our understanding of molecular evolution. He was also incredibly generous with his time and insights, and many researchers around the world are indebted to him, not least the two authors of this obituary.
Borenstein-Katz A., Warszawski S., Amon R., Eilon M., Cohen-Dvashi H., Leviatan Ben-Arye S., Tasnima N., Yu H., Chen X., Padler-Karavani V., Fleishman S. J. & Diskin R.
(2021)
Journal of Molecular Biology.
433,
15,
167099.
Glycans decorate the cell surface, secreted glycoproteins and glycolipids, and altered glycans are often found in cancers. Despite their high diagnostic and therapeutic potential, however, glycans are polar and flexible molecules that are quite challenging for the development and design of high-affinity binding antibodies. To understand the mechanisms by which glycan neoantigens are specifically recognized by antibodies, we analyze the biomolecular recognition of the tumor-associated carbohydrate antigen CA19-9 by two distinct antibodies using X-ray crystallography. Despite the potential plasticity of glycans and the very different antigen-binding surfaces presented by the antibodies, both structures reveal an essentially identical extended CA19-9 conformer, suggesting that this conformer's stability selects the antibodies. Starting from the bound structure of one of the antibodies, we use the AbLIFT computational algorithm to design a variant with seven core mutations in the variable domain's light-heavy chain interface that exhibits tenfold improved affinity for CA19-9. The results reveal strategies used by antibodies to specifically recognize glycan antigens and show how automated antibody-optimization methods may be used to enhance the clinical potential of existing antibodies.
Peleg Y., Vincentelli R., Collins B. M., Chen K., Livingstone E. K., Weeratunga S., Leneva N., Guo Q., Remans K., Perez K., Bjerga G. E., Larsen Ø., Vaněk O., Skořepa O., Jacquemin S., Poterszman A., Kjaer S., Christodoulou E., Albeck S., Dym O., Ainbinder E., Unger T., Schuetz A., Matthes S., Bader M., de Marco A., Storici P., Semrau M. S., Stolt-Bergner P., Aigner C., Suppmann S., Goldenzweig A. & Fleishman S. J.
(2021)
Journal of Molecular Biology.
433,
13,
166964.
Recent years have seen a dramatic improvement in protein-design methodology. Nevertheless, most methods demand expert intervention, limiting their widespread adoption. By contrast, the PROSS algorithm for improving protein stability and heterologous expression levels has been successfully applied to a range of challenging enzymes and binding proteins. Here, we benchmark the application of PROSS as a stand-alone tool for protein scientists with no or limited experience in modeling. Twelve laboratories from the Protein Production and Purification Partnership in Europe (P4EU) challenged the PROSS algorithm with 14 unrelated protein targets without support from the PROSS developers. For each target, up to six designs were evaluated for expression levels and in some cases, for thermal stability and activity. In nine targets, designs exhibited increased heterologous expression levels in prokaryotic and/or eukaryotic expression systems under experimental conditions that were tailored for each target protein. Furthermore, we observed increased thermal stability in nine of ten tested targets. In two prime examples, the human Stem Cell Factor (hSCF) and human Cadherin-Like Domain (CLD12) from the RET receptor, the wild-type proteins were not expressible as soluble proteins in E. coli, yet the PROSS designs exhibited high expression levels in E. coli and HEK293 cells, respectively, and improved thermal stability. We conclude that PROSS may improve stability and expressibility in diverse cases, and that improvement typically requires target-specific expression conditions. This study demonstrates the strengths of community-wide efforts to probe the generality of new methods and recommends areas for future research to advance practically useful algorithms for protein science.
Scherer M., Fleishman S. J., Jones P. R., Dandekar T. & Bencurova E.
(2021)
Frontiers in Bioengineering and Biotechnology.
9,
673005.
To enable a sustainable supply of chemicals, novel biotechnological solutions are required that replace the reliance on fossil resources. One potential solution is to utilize tailored biosynthetic modules for the metabolic conversion of CO2 or organic waste to chemicals and fuel by microorganisms. Currently, it is challenging to commercialize biotechnological processes for renewable chemical biomanufacturing because of a lack of highly active and specific biocatalysts. As experimental methods to engineer biocatalysts are time- and cost-intensive, it is important to establish efficient and reliable computational tools that can speed up the identification or optimization of selective, highly active, and stable enzyme variants for utilization in the biotechnological industry. Here, we review and suggest combinations of effective state-of-the-art software and online tools available for computational enzyme engineering pipelines to optimize metabolic pathways for the biosynthesis of renewable chemicals. Using examples relevant for biotechnology, we explain the underlying principles of enzyme engineering and design and illuminate future directions for automated optimization of biocatalysts for the assembly of synthetic metabolic pathways.
Weinstein J. J., Goldenzweig A., Hoch S. & Fleishman S. J.
(2021)
Bioinformatics.
37,
1,
p. 123-125
Many natural and designed proteins are only marginally stable, limiting their usefulness in research and applications. Recently, we described an automated structure and sequence-based design method, called PROSS, for optimizing protein stability and heterologous expression levels that has since been validated on dozens of proteins. Here, we introduce improvements to the method, workflow and presentation, including more accurate sequence analysis, error handling and automated analysis of the quality of the sequence alignment that is used in design calculations.
Warszawski S., Katz A. B., Lipsh R., Khmelnitsky L., Nissan G. B., Javitt G., Dym O., Unger T., Knop O., Albeck S., Diskin R., Fass D., Sharon M. & Fleishman S. J.
(2020)
PLoS Computational Biology.
16,
10,
e1008382.
The funding statement for this article should read as follows: “The research was supported by grants from the European Research Council (335439 to SJF, 636752 to MS, and 310649 and 825076 to DF), the Israel Science Foundation to MS (300/17) and through its Center of Excellence in Structural Cell Biology to SJF and DF (1775/12), a research grant from Sheri and David E. Stone and by a charitable donation from Sam Switzer and family. M.S. is an incumbent of the Aharon and Ephraim Katzir Memorial Professorial Chair. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”
Our ability to design new or improved biomolecular activities depends on understanding the sequence-function relationships in proteins. The large size and fold complexity of most proteins, however, obscure these relationships, and protein-optimization methods continue to rely on laborious experimental iterations. Recently, a deeper understanding of the roles of stability-threshold effects and biomolecular epistasis in proteins has led to the development of hybrid methods that combine phylogenetic analysis with atomistic design calculations. These methods enable reliable and even single-step optimization of protein stability, expressibility, and activity in proteins that were considered outside the scope of computational design. Furthermore, ancestral-sequence reconstruction produces insights on missing links in the evolution of enzymes and binders that may be used in protein design. Through the combination of phylogenetic and atomistic calculations, the long-standing goal of general computational methods that can be universally applied to study and optimize proteins finally seems within reach.
Warszawski S., Dekel E., Campeotto I., Marshall J. M., Wright K. E., Lyth O., Knop O., Regev-Rudzki N., Higgins M. K., Draper S. J., Baum J. & Fleishman S. J.
(2020)
Proteins-Structure Function And Bioinformatics.
88,
1,
p. 187-195
Many human pathogens use host cell-surface receptors to attach and invade cells. Often, the host-pathogen interaction affinity is low, presenting opportunities to block invasion using a soluble, high-affinity mimic of the host protein. The Plasmodium falciparum reticulocyte-binding protein homolog 5 (RH5) provides an exciting candidate for mimicry: it is highly conserved and its moderate affinity binding to the human receptor basigin (K_D ≥ 1 μM) is an essential step in erythrocyte invasion by this malaria parasite. We used deep mutational scanning of a soluble fragment of human basigin to systematically characterize point mutations that enhance basigin affinity for RH5 and then used Rosetta to design a variant within the sequence space of affinity-enhancing mutations. The resulting seven-mutation design exhibited 1900-fold higher affinity (K_D approximately 1 nM) for RH5 with a very slow binding off rate (0.23 h⁻¹) and reduced the effective Plasmodium growth-inhibitory concentration by at least 10-fold compared to human basigin. The design provides a favorable starting point for engineering on-rate improvements that are likely to be essential to reach therapeutically effective growth inhibition.
Malladi S. K., Schreiber D., Pramanick I., Sridevi M. A., Goldenzweig A., Dutta S., Fleishman S. J. & Varadarajan R.
(2020)
Current Research in Structural Biology.
2,
p. 45-55
Stabilization of the metastable envelope glycoprotein (Env) of HIV-1 is hypothesized to improve induction of broadly neutralizing antibodies. We improved the expression yield and stability of the HIV-1 envelope glycoprotein BG505SOSIP.664 gp140 by means of a previously described automated sequence and structure-guided computational thermostabilization approach, PROSS. This combines sequence conservation information with computational assessment of mutant stabilization, thus taking advantage of the extensive natural sequence variation present in HIV-1 Env. PROSS is used to design three gp140 variants with 17-45 mutations relative to the parental construct. One of the designs is experimentally observed to have a fourfold improvement in yield and a 4 °C increment in thermostability. In addition, the designed immunogens have similar antigenicity profiles to the native flexible linker version of wild type, BG505SOSIP.664 gp140 (NFL Wt) to major epitopes targeted by broadly neutralizing antibodies. PROSS eliminates the laborious process of screening many variants for stability and functionality, providing a proof of principle of the method for stabilization and improvement of yield without compromising antigenicity for next generation complex, highly glycosylated vaccine candidates.
Warszawski S., Katz A. B., Lipsh R., Khmelnitsky L., Ben Nissan G., Javitt G., Dym O., Unger T., Knop O., Albeck S., Diskin R., Fass D., Sharon M. & Fleishman S. J.
(2019)
PLoS Computational Biology.
15,
8,
e1007207.
Antibodies developed for research and clinical applications may exhibit suboptimal stability, expressibility, or affinity. Existing optimization strategies focus on surface mutations, whereas natural affinity maturation also introduces mutations in the antibody core, simultaneously improving stability and affinity. To systematically map the mutational tolerance of an antibody variable fragment (Fv), we performed yeast display and applied deep mutational scanning to an anti-lysozyme antibody and found that many of the affinity-enhancing mutations clustered at the variable light-heavy chain interface, within the antibody core. Rosetta design combined enhancing mutations, yielding a variant with tenfold higher affinity and substantially improved stability. To make this approach broadly accessible, we developed AbLIFT, an automated web server that designs multipoint core mutations to improve contacts between specific Fv light and heavy chains (http://AbLIFT.weizmann.ac.il). We applied AbLIFT to two unrelated antibodies targeting the human antigens VEGF and QSOX1. Strikingly, the designs improved stability, affinity, and expression yields. The results provide proof-of-principle for bypassing laborious cycles of antibody engineering through automated computational affinity and stability design.
Trudeau D. L., Edlich-Muth C., Zarzycki J., Scheffen M., Goldsmith M., Khersonsky O., Avizemer Z., Fleishman S. J., Cotton C. A. R., Erb T. J., Tawfik D. S. & Bar-Even A.
(2018)
Proceedings Of The National Academy Of Sciences Of The United States Of America-Physical Sciences.
115,
49,
p. E11455-E11464
Photorespiration recycles ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) oxygenation product, 2-phosphoglycolate, back into the Calvin Cycle. Natural photorespiration, however, limits agricultural productivity by dissipating energy and releasing CO2. Several photorespiration bypasses have been previously suggested but were limited to existing enzymes and pathways that release CO2. Here, we harness the power of enzyme and metabolic engineering to establish synthetic routes that bypass photorespiration without CO2 release. By defining specific reaction rules, we systematically identified promising routes that assimilate 2-phosphoglycolate into the Calvin Cycle without carbon loss. We further developed a kinetic-stoichiometric model that indicates that the identified synthetic shunts could potentially enhance carbon fixation rate across the physiological range of irradiation and CO2, even if most of their enzymes operate at a tenth of Rubisco's maximal carboxylation activity. Glycolate reduction to glycolaldehyde is essential for several of the synthetic shunts but is not known to occur naturally. We, therefore, used computational design and directed evolution to establish this activity in two sequential reactions. An acetyl-CoA synthetase was engineered for higher stability and glycolyl-CoA synthesis. A propionyl-CoA reductase was engineered for higher selectivity for glycolyl-CoA and for use of NADPH over NAD(+), thereby favoring reduction over oxidation. The engineered glycolate reduction module was then combined with downstream condensation and assimilation of glycolaldehyde to ribulose 1,5-bisphosphate, thus providing proof of principle for a carbon-conserving photorespiration pathway.
Ben-Nissan G., Vimer S., Warszawski S., Katz A., Yona M., Unger T., Peleg Y., Morgenstern D., Cohen-Dvashi H., Diskin R., Fleishman S. J. & Sharon M.
(2018)
Communications Biology.
1,
1,
213.
Characterization of overexpressed proteins is essential for assessing their quality, and providing input for iterative redesign and optimization. This process is typically carried out following purification procedures that require pronounced cost of time and labor. Therefore, quality assessment of recombinant proteins with no prior purification offers a major advantage. Here, we report a native mass spectrometry method that enables characterization of overproduced proteins directly from culture media. Properties such as solubility, molecular weight, folding, assembly state, overall structure, post-translational modifications and binding to relevant biomolecules are immediately revealed. We show the applicability of the method for in-depth characterization of secreted recombinant proteins from eukaryotic systems such as yeast, insect, and human cells. This method, which can be readily extended to high-throughput analysis, considerably shortens the time gap between protein production and characterization, and is particularly suitable for characterizing engineered and mutated proteins, and optimizing yield and quality of overexpressed proteins.
Kantaev R., Riven I., Goldenzweig A., Barak Y., Dym O., Peleg Y., Albeck S., Fleishman S. J. & Haran G.
(2018)
Journal of Physical Chemistry B.
122,
49,
p. 11030-11038
Folding of proteins to their functional conformation is paramount to life. Though 75% of the proteome consists of multi-domain proteins, our knowledge of folding has been based primarily on studies conducted on single-domain and fast-folding proteins. Nonetheless, the complexity of folding landscapes exhibited by multi-domain proteins has received increased scrutiny in recent years. We study the three-domain protein adenylate kinase from E. coli (AK), which has been shown to fold through a series of pathways involving several intermediate states. We use a protein design method to manipulate the folding landscape of AK, and single-molecule FRET spectroscopy to study the effects on the folding process. Mutations introduced in the NMP binding (NMPbind) domain of the protein are found to have unexpected effects on the folding landscape. Thus, while stabilizing mutations in the core of the NMPbind domain retain the main folding pathways of wild-type AK, a destabilizing mutation at the interface between the NMPbind and the CORE domains causes a significant repartition of the flux between the folding pathways. Our results demonstrate the outstanding plasticity of the folding landscape of AK, and reveal how specific mutations in the primary structure are translated into changes in folding dynamics. The combination of methodologies introduced in this work should prove useful for deepening our understanding of the folding process of multi-domain proteins.
Netzer R., Listov D., Lipsh R., Dym O., Albeck S., Knop O., Kleanthous C. & Fleishman S. J.
(2018)
Nature Communications.
9,
1,
5286.
Protein networks in all organisms comprise homologous interacting pairs. In these networks, some proteins are specific, interacting with one or a few binding partners, whereas others are multispecific and bind a range of targets. We describe an algorithm that starts from an interacting pair and designs dozens of new pairs with diverse backbone conformations at the binding site as well as new binding orientations and sequences. Applied to a high-affinity bacterial pair, the algorithm results in 18 new ones, with cognate affinities from pico- to micromolar. Three pairs exhibit 3-5 orders of magnitude switch in specificity relative to the wild type, whereas others are multispecific, collectively forming a protein-interaction network. Crystallographic analysis confirms design accuracy, including in new backbones and polar interactions. Preorganized polar interaction networks are responsible for high specificity, thus defining design principles that can be applied to program synthetic cellular interaction networks of desired affinity and specificity.
Khersonsky O., Lipsh R., Avizemer Z., Ashani Y., Goldsmith M., Leader H., Dym O., Rogotner S., Trudeau D. L., Prilusky J., Amengual-Rigo P., Guallar V., Tawfik D. S. & Fleishman S. J.
(2018)
Molecular Cell.
72,
1,
p. 178-186.e5
Substantial improvements in enzyme activity demand multiple mutations at spatially proximal positions in the active site. Such mutations, however, often exhibit unpredictable epistatic (non-additive) effects on activity. Here we describe FuncLib, an automated method for designing multipoint mutations at enzyme active sites using phylogenetic analysis and Rosetta design calculations. We applied FuncLib to two unrelated enzymes, a phosphotriesterase and an acetyl-CoA synthetase. All designs were active, and most showed activity profiles that significantly differed from the wild-type and from one another. Several dozen designs with only 3-6 active-site mutations exhibited 10- to 4,000-fold higher efficiencies with a range of alternative substrates, including hydrolysis of the toxic organophosphate nerve agents soman and cyclosarin and synthesis of butyryl-CoA. FuncLib is implemented as a web server (http://FuncLib.weizmann.ac.il); it circumvents iterative, high-throughput experimental screens and opens the way to designing highly efficient and diverse catalytic repertoires.
Cveticanin J., Netzer R., Arkind G., Fleishman S. J., Horovitz A. & Sharon M.
(2018)
Analytical Chemistry.
90,
17,
p. 10090-10094
A powerful method to determine the energetic coupling between amino acids is double mutant cycle analysis. In this method, two residues are mutated separately and in combination and the energetic effects of the mutations are determined. A deviation of the effect of the double mutation from the sum of effects of the single mutations indicates that the two residues are interacting directly or indirectly. Here, we show that double mutant cycle analysis by native mass spectrometry can be carried out for interactions in crude Escherichia coli cell extracts, thereby obviating the need for protein purification and generating binding isotherms. Our results indicate that intermolecular hydrogen bond strengths are not affected by the more crowded conditions in cell lysates.
Lapidoth G., Khersonsky O., Lipsh R., Dym O., Albeck S., Rogotner S. & Fleishman S. J.
(2018)
Nature Communications.
9,
2780.
Automated design of enzymes with wild-type-like catalytic properties has been a long-standing but elusive goal. Here, we present a general, automated method for enzyme design through combinatorial backbone assembly. Starting from a set of homologous yet structurally diverse enzyme structures, the method assembles new backbone combinations and uses Rosetta to optimize the amino acid sequence, while conserving key catalytic residues. We apply this method to two unrelated enzyme families with TIM-barrel folds, glycoside hydrolase 10 (GH10) xylanases and phosphotriesterase-like lactonases (PLLs), designing 43 and 34 proteins, respectively. Twenty-one GH10 and seven PLL designs are active, including designs derived from templates with
Amon R., Grant O. C., Ben-Arye S. L., Makeneni S., Nivedha A. K., Marshanski T., Norn C., Yu H., Glushka J. N., Fleishman S. J., Chen X., Woods R. J. & Padler-Karavani V.
(2018)
Scientific Reports.
8,
10786.
Anti-carbohydrate monoclonal antibodies (mAbs) hold great promise as cancer therapeutics and diagnostics. However, their specificity can be mixed, and detailed characterization is problematic, because antibody-glycan complexes are challenging to crystallize. Here, we developed a generalizable approach employing high-throughput techniques for characterizing the structure and specificity of such mAbs, and applied it to the mAb TKH2 developed against the tumor-associated carbohydrate antigen sialyl-Tn (STn). The mAb specificity was defined by apparent KD values determined by quantitative glycan microarray screening. Key residues in the antibody combining site were identified by site-directed mutagenesis, and the glycan-antigen contact surface was defined using saturation transfer difference NMR (STD-NMR). These features were then employed as metrics for selecting the optimal 3D-model of the antibody-glycan complex, out of thousands of plausible options generated by automated docking and molecular dynamics simulation. STn-specificity was further validated by computational screening of the selected antibody 3D-model against the human sialyl-Tn-glycome. This computational-experimental approach would allow rational design of potent antibodies targeting carbohydrates.
Goldenzweig A. & Fleishman S. J.
(2018)
Annual Review of Biochemistry.
87,
p. 105-129
Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amounts, thus hampering research and applications. Research has revealed the thermodynamic, cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic analysis with atomistic design, have shown drastic improvements in solubility, thermal stability, and aggregation resistance while maintaining the protein's primary molecular activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodology to large proteins and molecular activities that have proven challenging in the past.
Bandyopadhyay B., Goldenzweig A., Unger T., Adato O., Fleishman S. J., Unger R. & Horovitz A.
(2017)
Journal of Biological Chemistry.
292,
50,
p. 20583-20591
The GroE chaperonin system in Escherichia coli comprises GroEL and GroES and facilitates ATP-dependent protein folding in vivo and in vitro. Proteins with very similar sequences and structures can differ in their dependence on GroEL for efficient folding. One potential but unverified source for GroEL dependence is frustration, wherein not all interactions in the native state are optimized energetically, thereby potentiating slow folding and misfolding. Here, we chose enhanced green fluorescent protein as a model system and subjected it to random mutagenesis, followed by screening for variants whose in vivo folding displays increased or decreased GroEL dependence. We confirmed the altered GroEL dependence of these variants with in vitro folding assays. Strikingly, mutations at positions predicted to be highly frustrated were found to correlate with decreased GroEL dependence. Conversely, mutations at positions with low frustration were found to correlate with increased GroEL dependence. Further support for this finding was obtained by showing that folding of an enhanced green fluorescent protein variant designed computationally to have reduced frustration is indeed less GroEL-dependent. Our results indicate that changes in local frustration also affect partitioning in vivo between spontaneous and chaperonin-mediated folding. Hence, the design of minimally frustrated sequences can reduce chaperonin dependence and improve protein expression levels.
Baran D., Pszolla M. G., Lapidoth G. D., Norn C., Dym O., Unger T., Albeck S., Tyka M. D. & Fleishman S. J.
(2017)
Proceedings of the National Academy of Sciences of the United States of America.
114,
41,
p. 10900-10905
Natural proteins must both fold into a stable conformation and exert their molecular function. To date, computational design has successfully produced stable and atomically accurate proteins by using so-called "ideal" folds rich in regular secondary structures and almost devoid of loops and destabilizing elements, such as cavities. Molecular function, such as binding and catalysis, however, often demands nonideal features, including large and irregular loops and buried polar interaction networks, which have remained challenging for fold design. Through five design/experiment cycles, we learned principles for designing stable and functional antibody variable fragments (Fvs). Specifically, we (i) used sequence-design constraints derived from antibody multiple-sequence alignments, and (ii) during backbone design, maintained stabilizing interactions observed in natural antibodies between the framework and loops of complementarity-determining regions (CDRs) 1 and 2. Designed Fvs bound their ligands with midnanomolar affinities and were as stable as natural antibodies, despite having >30 mutations from mammalian antibody germlines. Furthermore, crystallographic analysis demonstrated atomic accuracy throughout the framework and in four of six CDRs in one design and atomic accuracy in the entire Fv in another. The principles we learned are general, and can be implemented to design other nonideal folds, generating stable, specific, and precise antibodies and enzymes.
Rosenfeld R., Alcalay R., Mechaly A., Lapidoth G., Epstein E., Kronman C., Fleishman S. J. & Mazor O.
(2017)
Protein Engineering, Design and Selection.
30,
9,
p. 611-617
While potent monoclonal antibodies against ricin have been introduced over the years, the question of whether increasing antibody affinity enables better toxin neutralization has not yet been fully addressed. The aim of this study was to characterize the contribution of antibody affinity to the ricin neutralization potential of the antibody. The monoclonal antibody cHD23, which targets the toxin B-subunit and interferes with its binding to membranal receptors, was isolated. In order to create antibody clones with improved affinity toward ricin, a scFv-phage display library containing mutated versions of the variable regions of cHD23 was constructed and clones with improved binding of ricin were isolated. Structural modeling of these mutants suggests that the inserted mutations may increase the antibody conformational flexibility, thus improving its ability to bind ricin. While it was found that the selected clones exhibited improved neutralization of ricin, the correlation between the KD values and potency was only minor (r = 0.55). However, a positive correlation (r = 0.84) exists between the off-rate values (koff) of the affinity matured clones and their ability to neutralize ricin. As cell membranes display inordinately large amounts of potential surface binding sites for ricin, it is suggested that antibodies with improved off-rate values block the ability of the toxin to bind to target receptors in a highly efficient manner. Currently, antibody-based therapy is the most effective treatment for ricin intoxication and it is anticipated that the findings of this study will provide useful information and a possible strategy to design an improved antibody-based therapy for the toxin.
Khersonsky O. & Fleishman S. J.
(2017)
Protein Science.
26,
4,
p. 807-813
Allosteric regulation underlies living cells' ability to sense changes in nutrient and signaling-molecule concentrations, but the ability to computationally design allosteric regulation into non-allosteric proteins has been elusive. Allosteric-site design is complicated by the requirement to encode the relative stabilities of active and inactive conformations of the same protein in the presence and absence of both ligand and effector. To address this challenge, we used Rosetta to design the backbone of the flexible heavy-chain complementarity-determining region 3 (HCDR3), and used geometric matching and sequence optimization to place a Zn2+-coordination site in a fluorescein-binding antibody. We predicted that due to HCDR3's flexibility, the fluorescein-binding pocket would configure properly only upon Zn2+ application. We found that regulation by Zn2+ was reversible and sensitive to the divalent ion's identity, and came at the cost of reduced antibody stability and fluorescein-binding affinity. Fluorescein bound at an order of magnitude higher affinity in the presence of Zn2+ than in its absence, and the increase in fluorescein affinity was due almost entirely to faster fluorescein on-rate, suggesting that Zn2+ preorganized the antibody for fluorescein binding. Mutation analysis demonstrated the extreme sensitivity of Zn2+ regulation on the atomic details in and around the metal-coordination site. The designed antibody could serve to study how allosteric regulation evolved from non-allosteric binding proteins, and suggests a way to designing molecular sensors for environmental and biomedical targets.
Goldsmith M., Aggarwal N., Ashani Y., Jubran H., Greisen P. J., Ovchinnikov S., Leader H., Baker D., Sussman J., Goldenzweig A., Fleishman S. J. & Tawfik D.
(2017)
Protein Engineering, Design and Selection.
30,
4,
p. 333-345
Improving an enzyme's initially low catalytic efficiency with a new target substrate by an order of magnitude or two may require only a few rounds of mutagenesis and screening or selection. However, subsequent rounds of optimization tend to yield decreasing degrees of improvement (diminishing returns) eventually leading to an optimization plateau. We aimed to optimize the catalytic efficiency of bacterial phosphotriesterase (PTE) toward V-type nerve agents. Previously, we improved the catalytic efficiency of wild-type PTE toward the nerve agent VX by 500-fold, to a catalytic efficiency (kcat/KM) of 5 × 10⁶ M⁻¹ min⁻¹. However, effective in vivo detoxification demands an enzyme with a catalytic efficiency of > 10⁷ M⁻¹ min⁻¹. Here, following eight additional rounds of directed evolution and the computational design of a stabilized variant, we evolved PTE variants that detoxify VX with a kcat/KM ≥ 5 × 10⁷ M⁻¹ min⁻¹ and Russian VX (RVX) with a kcat/KM ≥ 10⁷ M⁻¹ min⁻¹. These final 10-fold improvements were the most time consuming and laborious, as most libraries yielded either minor or no improvements. Stabilizing the evolving enzyme, and avoiding tradeoffs in activity with different substrates, enabled us to obtain further improvements beyond the optimization plateau and evolve PTE variants that were overall improved by > 5,000-fold with VX and by > 17,000-fold with RVX. The resulting variants also hydrolyze G-type nerve agents with high efficiency (GA, GB at kcat/KM > 5 × 10⁷ M⁻¹ min⁻¹) and can thus serve as candidates for broad-spectrum nerve-agent prophylaxis and post-exposure therapy using low enzyme doses.
Gaines J. C., Virrueta A., Buch D. A., Fleishman S. J., O'Hern C. S. & Regan L.
(2017)
Protein Engineering, Design and Selection.
30,
5,
p. 387-394
Protein core repacking is a standard test of protein modeling software. A recent study of six different modeling software packages showed that they are more successful at predicting side chain conformations of core compared to surface residues. All the modeling software tested have multicomponent energy functions, typically including contributions from solvation, electrostatics, hydrogen bonding and Lennard-Jones interactions in addition to statistical terms based on observed protein structures. We investigated to what extent a simplified energy function that includes only stereochemical constraints and repulsive hard-sphere interactions can correctly repack protein cores. For single residue and collective repacking, the hard-sphere model accurately recapitulates the observed side chain conformations for Ile, Leu, Phe, Thr, Trp, Tyr and Val. This result shows that there are no alternative, sterically allowed side chain conformations of core residues. Analysis of the same set of protein cores using the Rosetta software suite revealed that the hard-sphere model and Rosetta perform equally well on Ile, Leu, Phe, Thr and Val; the hard-sphere model performs better on Trp and Tyr and Rosetta performs better on Ser. We conclude that the high prediction accuracy in protein cores obtained by protein modeling software and our simplified hard-sphere approach reflects the high density of protein cores and dominance of steric repulsion.
Campeotto I., Goldenzweig A., Davey J., Barfod L., Marshall J. M., Silk S. E., Wright K. E., Draper S. J., Higgins M. K. & Fleishman S. J.
(2017)
Proceedings of the National Academy of Sciences of the United States of America.
114,
5,
p. 998-1002
Many promising vaccine candidates from pathogenic viruses, bacteria, and parasites are unstable and cannot be produced cheaply for clinical use. For instance, Plasmodium falciparum reticulocyte-binding protein homolog 5 (PfRH5) is essential for erythrocyte invasion, is highly conserved among field isolates, and elicits antibodies that neutralize in vitro and protect in an animal model, making it a leading malaria vaccine candidate. However, functional RH5 is only expressible in eukaryotic systems and exhibits moderate temperature tolerance, limiting its usefulness in hot and low-income countries where malaria prevails. Current approaches to immunogen stabilization involve iterative application of rational or semirational design, random mutagenesis, and biochemical characterization. Typically, each round of optimization yields minor improvement in stability, and multiple rounds are required. In contrast, we developed a one-step design strategy using phylogenetic analysis and Rosetta atomistic calculations to design PfRH5 variants with improved packing and surface polarity. To demonstrate the robustness of this approach, we tested three PfRH5 designs, all of which showed improved stability relative to wild type. The best, bearing 18 mutations relative to PfRH5, expressed in a folded form in bacteria at >1 mg of protein per L of culture, and had 10-15 °C higher thermal tolerance than wild type, while also retaining ligand binding and immunogenic properties indistinguishable from wild type, proving its value as an immunogen for a future generation of vaccines against the malaria blood stage. We envision that this efficient computational stability design methodology will also be used to enhance the biophysical properties of other recalcitrant vaccine candidates from emerging pathogens.
Current methods for antibody structure prediction rely on sequence homology to known structures. Although this strategy often yields accurate predictions, models can be stereo-chemically strained. Here, we present a fully automated algorithm, called AbPredict, that disregards sequence homology, and instead uses a Monte Carlo search for low-energy conformations built from backbone segments and rigid-body orientations that appear in antibody molecular structures. We find cases where AbPredict selects accurate loop templates with sequence identity as low as 10%, whereas the template of highest sequence identity diverges substantially from the query's conformation. Accordingly, in several cases reported in the recent Antibody Modeling Assessment benchmark, AbPredict models were more accurate than those from any participant, and the models' stereo-chemical quality was consistently high. Furthermore, in two blind cases provided to us by crystallographers prior to structure determination, the method achieved
Goldenzweig A., Goldsmith M., Hill S. E., Gertman O., Laurino P., Ashani Y., Dym O., Unger T., Albeck S., Prilusky J., Lieberman R. L., Aharoni A., Silman I., Sussman J., Tawfik D. & Fleishman S. J.
(2016)
Molecular Cell.
63,
2,
p. 337-346
Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymatic properties or in the active-site configuration as determined by crystallography. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of soluble and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il.
Khersonsky O. & Fleishman S. J.
(2016)
Protein Science.
p. 1179-1187
We protein engineers are ambivalent about evolution: on the one hand, evolution inspires us with myriad examples of biomolecular binders, sensors, and catalysts; on the other hand, these examples are seldom well-adapted to the engineering tasks we have in mind. Protein engineers have therefore modified natural proteins by point substitutions and fragment exchanges in an effort to generate new functions. A counterpoint to such design efforts, which is being pursued now with greater success, is to completely eschew the starting materials provided by nature and to design new protein functions from scratch by using de novo molecular modeling and design. While important progress has been made in both directions, some areas of protein design are still beyond reach. To this end, we advocate a synthesis of these two strategies: by using design calculations to both recombine and optimize fragments from natural proteins, we can build stable and as of yet un-sampled structures, thereby granting access to an expanded repertoire of conformations and desired functions. We propose that future methods that combine phylogenetic analysis, structure and sequence bioinformatics, and atomistic modeling may well succeed where any one of these approaches has failed on its own.
Over the past decade, scientists have made exciting progress in designing protein folds entirely on the computer and then successfully synthesizing them in the laboratory (1-5). These designer proteins had the same structure in experiment as in the model and were very stable; however, they lacked important structural features seen in protein interfaces and enzyme active sites. In two reports on pages 680 and 687 of this issue, Boyken et al. (6) and Jacobs et al. (7) use the Rosetta biomolecular modeling software to design proteins that include some of these features. Experiments show that these new designs retain high structural precision and stability.
Assaf E., Weinstein J., Biran I., Fridman Y., Bibi E. & Fleishman S.
(2016)
eLife.
5,
JANUARY2016,
e12125.
Insertion of helix-forming segments into the membrane and their association determines the structure, function, and expression levels of all plasma membrane proteins. However, systematic and reliable quantification of membrane-protein energetics has been challenging. We developed a deep mutational scanning method to monitor the effects of hundreds of point mutations on helix insertion and self-association within the bacterial inner membrane. The assay quantifies insertion energetics for all natural amino acids at 27 positions across the membrane, revealing that the hydrophobicity of biological membranes is significantly higher than appreciated. We further quantitate the contributions to membrane-protein insertion from positively charged residues at the cytoplasm-membrane interface and reveal large and unanticipated differences among these residues. Finally, we derive comprehensive mutational landscapes in the membrane domains of Glycophorin A and the ErbB2 oncogene, and find that insertion and self-association are strongly coupled in receptor homodimers.
Grossman I., Ilani T., Fleishman S. & Fass D.
(2016)
Protein engineering, design & selection : PEDS.
29,
4,
p. 135-147
The secreted disulfide catalyst Quiescin sulfhydryl oxidase-1 (QSOX1) affects extracellular matrix organization and is overexpressed in various adenocarcinomas and associated stroma. Inhibition of extracellular human QSOX1 by a monoclonal antibody decreased tumor cell migration in a cell co-culture model and hence may have therapeutic potential. However, the species specificity of the QSOX1 monoclonal antibody has been a setback in assessing its utility as an anti-metastatic agent in vivo, a common problem in the antibody therapy industry. We therefore used structurally guided engineering to expand the antibody species specificity, improving its affinity toward mouse QSOX1 by at least four orders of magnitude. A crystal structure of the re-engineered variant, complexed with its mouse antigen, revealed that the antibody accomplishes dual-species targeting through altered contacts between its heavy and light chains, plus replacement of bulky aromatics by flexible side chains and versatile water-bridged polar interactions. In parallel, we produced a surrogate antibody targeting mouse QSOX1 that exhibits a new QSOX1 inhibition mode. This set of three QSOX1 inhibitory antibodies is compatible with various mouse models for pre-clinical trials and biotechnological applications. In this study we provide insights into structural blocks to cross-reactivity and set up guideposts for successful antibody design and re-engineering.
Warszawski S., Netzer R., Tawfik D. S. & Fleishman S. J.
(2014)
Journal of Molecular Biology.
426,
24,
p. 4125-4138
To carry out their activities, biological macromolecules balance different physical traits, such as stability, interaction affinity, and selectivity. How such often opposing traits are encoded in a macromolecular system is critical to our understanding of evolutionary processes and ability to design new molecules with desired functions. We present a framework for constraining design simulations to balance different physical characteristics. Each trait is represented by the equilibrium fractional occupancy of the desired state relative to its alternatives, ranging from none to full occupancy, and the different traits are combined using Boolean operators to effect a "fuzzy"-logic language for encoding any combination of traits. In another paper, we presented a new combinatorial backbone design algorithm AbDesign where the fuzzy-logic framework was used to optimize protein backbones and sequences for both stability and binding affinity in antibody-design simulation. We now extend this framework and find that fuzzy-logic design simulations reproduce sequence and structure design principles seen in nature to underlie exquisite specificity on the one hand and multispecificity on the other hand. The fuzzy-logic language is broadly applicable and could help define the space of tolerated and beneficial mutations in natural biomolecular systems and design artificial molecules that encode complex characteristics.
Strauch E. M., Fleishman S. J. & Baker D.
(2014)
Proceedings of the National Academy of Sciences of the United States of America.
111,
2,
p. 675-680
Computational design provides the opportunity to program protein-protein interactions for desired applications. We used de novo protein interface design to generate a pH-dependent Fc domain binding protein that buries immunoglobulin G (IgG) His-433. Using next-generation sequencing of naive and selected pools of a library of design variants, we generated a molecular footprint of the designed binding surface, confirming the binding mode and guiding further optimization of the balance between affinity and pH sensitivity. In biolayer interferometry experiments, the optimized design binds IgG with a Kd of ~4 nM at pH 8.2, and approximately 500-fold more weakly at pH 5.5. The protein is extremely stable, heat-resistant and highly expressed in bacteria, and allows pH-based control of binding for IgG affinity purification and diagnostic devices.
Schreiber G. & Fleishman S. J.
(2013)
Current Opinion in Structural Biology.
23,
6,
p. 903-910
A long-term aim of computational design is to generate specific protein-protein interactions at desired affinity, specificity, and kinetics. The past three years have seen the first reports on atomically accurate de novo interactions. These were based on advances in design algorithms and the ability to harness high-throughput experimental characterization of design variants to optimize binding. Current state-of-the-art in computational design lacks precision, and therefore requires intensive experimental optimization to achieve parity with natural binders. Recent successes (and failures) point the way to future progress in design methodology that would enable routine and robust design of binders and inhibitors, while also shedding light on the essential features of biomolecular recognition.
Procko E., Hedman R., Hamilton K., Seetharaman J., Fleishman S. J., Su M., Aramini J., Kornhaber G., Hunt J. F., Tong L., Montelione G. T. & Baker D.
(2013)
Journal of Molecular Biology.
425,
18,
p. 3563-3575
While there has been considerable progress in designing protein-protein interactions, the design of proteins that bind polar surfaces is an unmet challenge. We describe the computational design of a protein that binds the acidic active site of hen egg lysozyme and inhibits the enzyme. The design process starts with two polar amino acids that fit deep into the enzyme active site, identifies a protein scaffold that supports these residues and is complementary in shape to the lysozyme active-site region, and finally optimizes the surrounding contact surface for high-affinity binding. Following affinity maturation, a protein designed using this method bound lysozyme with low nanomolar affinity, and a combination of NMR studies, crystallography, and knockout mutagenesis confirmed the designed binding surface and orientation. Saturation mutagenesis with selection and deep sequencing demonstrated that specific designed interactions extending well beyond the centrally grafted polar residues are critical for high-affinity binding.
Khare S. D. & Fleishman S. J.
(2013)
FEBS Letters.
587,
8,
p. 1147-1154
Recent years have seen the first applications of computational protein design to generate novel catalysts, binding pairs of proteins, protein inhibitors, and large oligomeric assemblies. At their core these methods rely on a similar hybrid energy function, composed of physics-based and database-derived terms, while different sequence and conformational sampling approaches are used for each design category. Although these are first steps for the computational design of novel function, crystal structures and biochemical characterization already point out where success and failure are likely in the application of protein design. Contrasting failed and successful design attempts has been used to diagnose deficiencies in the approaches and in the underlying hybrid energy function. In this manner, design provides an inherent mechanism by which crucial information is obtained on pressing areas where focused efforts to improve methods are needed. Of the successful designs, many feature pre-organized sites that are poised to perform their intended function, and improvements often result from disfavoring alternative functionally suboptimal states. These rapid developments and fundamental insights obtained thus far promise to make computational design of novel molecular function general, robust, and routine.
Whitehead T. A., Baker D. & Fleishman S. J.
(2013)
Methods in Protein Design.
p. 1-19
Computational design of novel protein binders has recently emerged as a useful technique to study biomolecular recognition and generate molecules for use in biotechnology, research, and biomedicine. Current limitations in computational design methodology have led to the adoption of high-throughput screening and affinity maturation techniques to diagnose modeling inaccuracies and generate high activity binders. Here, we scrutinize this combination of computational and experimental aspects and propose areas for future methodological improvements.
Whitehead T. A., Chevalier A., Song Y., Dreyfus C., Fleishman S. J., De Mattos C., Myers C. A., Kamisetty H., Blair P., Wilson I. A. & Baker D.
(2012)
Nature Biotechnology.
30,
6,
p. 543-548
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
Fleishman S. J. & Baker D.
(2012)
Cell.
149,
2,
p. 262-273
The folding of natural biopolymers into unique three-dimensional structures that determine their function is remarkable considering the vast number of alternative states and requires a large gap in the energy of the functional state compared to the many alternatives. This Perspective explores the implications of this energy gap for computing the structures of naturally occurring biopolymers, designing proteins with new structures and functions, and optimally integrating experiment and computation in these endeavors. Possible parallels between the generation of functional molecules in computational design and natural evolution are highlighted.
Wojdyla J. A., Fleishman S. J., Baker D. & Kleanthous C.
(2012)
Journal of Molecular Biology.
417,
1-2,
p. 79-94
How proteins achieve high-affinity binding to a specific protein partner while simultaneously excluding all others is a major biological problem that has important implications for protein design. We report the crystal structure of the ultra-high-affinity protein-protein complex between the endonuclease domain of colicin E2 and its cognate immunity (Im) protein, Im2 (Kd ∼ 10⁻¹⁵ M), which, by comparison to previous structural and biophysical data, provides unprecedented insight into how high affinity and selectivity are achieved in this model family of protein complexes. Our study pinpoints the role of structured water molecules in conjoining hotspot residues that govern stability with residues that control selectivity. A key finding is that a single residue, which in a noncognate context massively destabilizes the complex through frustration, does not participate in specificity directly but rather acts as an organizing center for a multitude of specificity interactions across the interface, many of which are water mediated.
Protein-protein interactions play critical roles in biology, and computational design of interactions could be useful in a range of applications. We describe in detail a general approach to de novo design of protein interactions based on computed, energetically optimized interaction hotspots, which was recently used to produce high-affinity binders of influenza hemagglutinin. We present several alternative approaches to identify and build the key hotspot interactions within both core secondary structural elements and variable loop regions and evaluate the method's performance in natural-interface recapitulation. We show that the method generates binding surfaces that are more conformationally restricted than previous design methods, reducing opportunities for off-target interactions.
Fleishman S. J., Leaver-Fay A., Corn J. E., Strauch E. M., Khare S. D., Koga N., Ashworth J., Murphy P., Richter F., Lemmon G., Meiler J. & Baker D.
(2011)
PLoS ONE.
6,
6,
e20161.
Macromolecular modeling and design are increasingly useful in basic research, biotechnology, and teaching. However, the absence of a user-friendly modeling framework that provides access to a wide range of modeling capabilities is hampering the wider adoption of computational methods by non-experts. RosettaScripts is an XML-like language for specifying modeling tasks in the Rosetta framework. RosettaScripts provides access to protocol-level functionalities, such as rigid-body docking and sequence redesign, and allows fast testing and deployment of complex protocols without need for modifying or recompiling the underlying C++ code. We illustrate these capabilities with RosettaScripts protocols for the stabilization of proteins, the generation of computationally constrained libraries for experimental selection of higher-affinity binding proteins, loop remodeling, small-molecule ligand docking, design of ligand-binding proteins, and specificity redesign in DNA-binding proteins.
We describe a general computational method for designing proteins that bind a surface patch of interest on a target macromolecule. Favorable interactions between disembodied amino acid residues and the target surface are identified and used to anchor de novo designed interfaces. The method was used to design proteins that bind a conserved surface patch on the stem of the influenza hemagglutinin (HA) from the 1918 H1N1 pandemic virus. After affinity maturation, two of the designed proteins, HB36 and HB80, bind H1 and H5 HAs with low nanomolar affinity. Further, HB80 inhibits the HA fusogenic conformational changes induced at low pH. The crystal structure of HB36 in complex with 1918/H1 HA revealed that the actual binding interface is nearly identical to that in the computational design model. Such designed binding proteins may be useful for both diagnostics and therapeutics.
Leaver-Fay A., Tyka M., Lewis S. M., Lange O. F., Thompson J., Jacak R., Kaufman K., Renfrew P. D., Smith C. A., Sheffler W., Davis I. W., Cooper S., Treuille A., Mandell D. J., Richter F., Ban Y. E. A., Fleishman S. J., Corn J. E., Kim D. E., Lyskov S., Berrondo M., Mentzer S., Popović Z., Havranek J. J., Karanicolas J., Das R., Meiler J., Kortemme T., Gray J. J., Kuhlman B., Baker D. & Bradley P.
(2011)
Computer Methods: Part C.
C ed.
Vol. C.
p. 545-574
We have recently completed a full rearchitecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
Meenan N. A., Sharma A., Fleishman S. J., MacDonald C. J., Morel B., Boetzel R., Moore G. R., Baker D. & Kleanthous C.
(2010)
Proceedings of the National Academy of Sciences of the United States of America.
107,
22,
p. 10080-10085
High-affinity, high-selectivity protein-protein interactions that are critical for cell survival present an evolutionary paradox: How does selectivity evolve when acquired mutations risk a lethal loss of high-affinity binding? A detailed understanding of selectivity in such complexes requires structural information on weak, non-cognate complexes which can be difficult to obtain due to their transient and dynamic nature. Using NMR-based docking as a guide, we deployed a disulfide-trapping strategy on a noncognate complex between the colicin E9 endonuclease (E9 DNase) and immunity protein 2 (Im2), which is seven orders of magnitude weaker binding than the cognate femtomolar E9 DNase-Im9 interaction. The 1.77 Å crystal structure of the E9 DNase-Im2 complex reveals an entirely noncovalent interface where the intersubunit disulfide merely supports the crystal lattice. In combination with computational alanine scanning of interfacial residues, the structure reveals that the driving force for binding is so strong that a severely unfavorable specificity contact is tolerated at the interface and as a result the complex becomes weakened through "frustration." As well as rationalizing past mutational and thermodynamic data, comparing our noncognate structure with previous cognate complexes highlights the importance of loop regions in developing selectivity and accentuates the multiple roles of buried water molecules that stabilize, ameliorate, or aggravate interfacial contacts. The study provides direct support for dual-recognition in colicin DNase-Im protein complexes and shows that weakened noncognate complexes are primed for high-affinity binding, which can be achieved by economical mutation of a limited number of residues at the interface.
McBeth C., Seamons A., Pizarro J. C., Fleishman S. J., Baker D., Kortemme T., Goverman J. M. & Strong R. K.
(2008)
Journal of Molecular Biology.
375,
5,
p. 1306-1319
We report crystal structures of a negatively selected T cell receptor (TCR) that recognizes two I-Au-restricted myelin basic protein peptides and one of its peptide/major histocompatibility complex (pMHC) ligands. Unusual complementarity-determining region (CDR) structural features revealed by our analyses identify a previously unrecognized mechanism by which the highly variable CDR3 regions define ligand specificity. In addition to the pMHC contact residues contributed by CDR3, the CDR3 residues buried deep within the Vα/Vβ interface exert indirect effects on recognition by influencing the Vα/Vβ interdomain angle. This phenomenon represents an additional mechanism for increasing the potential diversity of the TCR repertoire. Both the direct and indirect effects exerted by CDR residues can impact global TCR/MHC docking. Analysis of the available TCR structures in light of these results highlights the significance of the Vα/Vβ interdomain angle in determining specificity and indicates that TCR/pMHC interface features do not distinguish autoimmune from non-autoimmune class II-restricted TCRs.
Fuchs A., Martin-Galiano A. J., Kalman M., Fleishman S., Ben-Tal N. & Frishman D.
(2007)
Bioinformatics.
23,
24,
p. 3312-3319
Motivation: The analysis of co-evolving residues has been exhaustively evaluated for the prediction of intramolecular amino acid contacts in soluble proteins. Although a variety of different methods for the detection of these co-evolving residues have been developed, the fraction of correctly predicted contacts remained insufficient for their reliable application in the construction of structural models. Membrane proteins, which constitute between one-fourth and one-third of all proteins in an organism, were only considered in few individual case studies. Results: We present the first general study of correlated mutations in α-helical membrane proteins. Using seven different prediction algorithms, we extracted co-evolving residues for 14 membrane proteins having a solved 3D structure. On average, distances between correlated pairs of residues lying on different transmembrane segments were found to be significantly smaller compared to a random prediction. Covariation of residues was frequently found in direct sequence neighborhood to helix-helix contacts. Based on the results obtained from individual prediction methods, we constructed a consensus prediction for every protein in the dataset that combines obtained correlations from different prediction algorithms and simultaneously removes likely false positives. Using this consensus prediction, 53% of all predicted residue pairs were found within one helix turn of an observed helix-helix contact. Based on the combination of co-evolving residues detected with the four best prediction algorithms, interacting helices could be predicted with a specificity of 83% and sensitivity of 42%.
Wang C., Schueler-Furman O., Andre I., London N., Fleishman S. J., Bradley P., Qian B. & Baker D.
(2007)
Proteins: Structure, Function and Genetics.
69,
4,
p. 758-763
A challenge in protein-protein docking is to account for the conformational changes in the monomers that occur upon binding. The RosettaDock method, which incorporates sidechain flexibility but keeps the backbone fixed, was found in previous CAPRI rounds (4 and 5) to generate docking models with atomic accuracy, provided that conformational changes were mainly restricted to protein sidechains. In the recent rounds of CAPRI (6-12), large backbone conformational changes occur upon binding for several target complexes. To address these challenges, we explicitly introduced backbone flexibility in our modeling procedures by combining rigid-body docking with protein structure prediction techniques such as modeling variable loops and building homology models. Encouragingly, using this approach we were able to correctly predict a significant backbone conformational change of an interface loop for Target 20 (12 Å rmsd between those in the unbound monomer and complex structures), but accounting for backbone flexibility in protein-protein docking is still very challenging because of the significantly larger conformational space that must be surveyed. Motivated by these CAPRI challenges, we have made progress in reformulating RosettaDock using a "fold-tree" representation, which provides a general framework for treating a wide variety of flexible-backbone docking problems.
Fleishman S. J., Sabag A. D., Ophir E., Avraham K. B. & Ben-Tal N.
(2006)
Journal of Biological Chemistry.
281,
39,
p. 28958-28963
Gap junctions form intercellular channels that mediate metabolic and electrical signaling between neighboring cells in a tissue. Lack of an atomic resolution structure of the gap junction has made it difficult to identify interactions that stabilize its transmembrane domain. Using a recently computed model of this domain, which specifies the locations of each amino acid, we postulated the existence of several interactions and tested them experimentally. We introduced mutations within the transmembrane domain of the gap junction-forming protein connexin that were previously implicated in genetic diseases and that apparently destabilized the gap junction, as evidenced here by the absence of the protein from the sites of cell-cell apposition. The model structure helped identify positions on adjacent helices where second-site mutations restored membrane localization, revealing possible interactions between residue pairs. We thus identified two putative salt bridges and one pair involved in packing interactions in which one disease-causing mutation suppressed the effects of another. These results seem to reveal some of the physical forces that underlie the structural stability of the gap junction transmembrane domain and suggest that abrogation of such interactions brings about some of the effects of disease-causing mutations.
Magidovich E., Fleishman S. J. & Yifrach O.
(2006)
Bioinformatics.
22,
13,
p. 1546-1550
Membrane-embedded voltage-activated potassium channels (Kv) bind intracellular scaffold proteins, such as the Post Synaptic Density 95 (PSD-95) protein, using a conserved PDZ-binding motif located at the channels' C-terminal tip. This interaction underlies Kv-channel clustering, and is important for the proper assembly and functioning of the synapse. Here we demonstrate that the C-terminal segments of Kv channels adjacent to the PDZ-binding motif are intrinsically disordered. Phylogenetic analysis of the Kv channel family reveals a cluster of channel sequences belonging to three out of the four main channel families, for which an association is demonstrated between the presence of the consensus terminal PDZ-binding motif and the intrinsically disordered nature of the immediately adjacent C-terminal segment. Our observations, combined with a structural analogy to the N-terminal intra-molecular ball-and-chain mechanism for Kv channel inactivation, suggest that the C-terminal disordered segments of these channel families encode an inter-molecular fishing rod-like mechanism for K+ channel binding to scaffold proteins.
Shental-Bechor D., Fleishman S. J. & Ben-Tal N.
(2006)
Trends in Biochemical Sciences.
31,
4,
p. 192-196
Polypeptide chains are segregated by the translocon channel into secreted or membrane-inserted proteins. Recent reports claim that an in vivo system has been used to break the 'amino acid code' used by translocons to make the determination of protein type (i.e. secreted or membrane-inserted). However, the experimental setup used in these studies could have confused the derivation of this code, in particular for polar amino acids. These residues are likely to undergo stabilizing interactions with other protein components in the experiment, shielding them from direct contact with the inhospitable membrane. Hence, it is our view that the 'code' for protein translocation has not yet been deciphered and that further experiments are required for teasing apart the various energetic factors contributing to protein translocation.
Fleishman S. J., Unger V. M. & Ben-Tal N.
(2006)
Trends in Biochemical Sciences.
31,
2,
p. 106-113
Transmembrane (TM) proteins constitute 15-30% of the genome, but 50% of the membrane protein families in eukaryotes lack bacterial homologs. Therefore, it is conceivable that many more years will elapse before high-resolution structures of eukaryotic TM proteins emerge. Until then, integrated approaches that combine biochemical and computational analyses with low-resolution structures are likely to have increasingly important roles in providing frameworks for the mechanistic understanding of membrane-protein structure and function.
Landau M., Fleishman S. J. & Ben-Tal N.
(2004)
Structure.
12,
12,
p. 2265-2275
Tyrosine kinase receptors of the EGFR family play a significant role in vital cellular processes and in various cancers. EGFR members are unique among kinases, as the regulatory elements of their kinase domains are constitutively ready for catalysis. Nevertheless, the receptors are not constantly active. This apparent paradox has prompted us to seek mechanisms of regulation in EGFR's cytoplasmic domain that do not involve conformational changes of the kinase domain. Our computational analyses, based on the three-dimensional structure of EGFR's kinase domain suggest that direct contact between the kinase and a segment from the C-terminal regulatory domains inhibits enzymatic activity. EGFR activation would then involve temporal dissociation of this stable complex, for example, via ligand-induced contact formation between the extracellular domains, leading to the reorientation of the transmembrane and intracellular domains. The model provides an explanation at the molecular level for the effects of several cancer-causing EGFR mutations.
Fleishman S. J., Harrington S., Friesner R. A., Honig B. & Ben-Tal N.
(2004)
Biophysical Journal.
87,
5,
p. 3448-3459
The transmembrane (TM) domains of many integral membrane proteins are composed of α-helix bundles. Structure determination at high resolution is often not feasible, and some structures are available only at lower resolutions (on the order of 10 Å) obtained using, for example, cryo-electron microscopy (cryo-EM). These structures reveal the packing arrangement of the TM domain, but cannot be used to determine the positions of individual amino acids. The observation that typically, the lipid-exposed faces of TM proteins are evolutionarily more variable and less charged than their core provides a simple rule for orienting their constituent helices. Based on this rule, we developed score functions and automated methods for orienting TM helices, for which locations and tilt angles have been determined using, e.g., cryo-EM data. The method was parameterized with the aim of retrieving the native structure of bacteriorhodopsin among near- and far-from-native templates. It was then tested on proteins that differ from bacteriorhodopsin in their sequences, architectures, and functions, such as the acetylcholine receptor and rhodopsin. The predicted structures were within 1.5-3.5 Å from the native state in all cases. We conclude that the computational method can be used in conjunction with cryo-EM data to obtain approximate model structures of TM domains of proteins for which a sufficiently heterogeneous set of homologs is available. We also show that in those proteins in which relatively short loops connect neighboring helices, the scoring functions can discriminate between near- and far-from-native conformations even without the constraints imposed on helix locations and tilt angles that are derived from cryo-EM.
Fleishman S. J., Unger V. M., Yeager M. & Ben-Tal N.
(2004)
Molecular Cell.
15,
6,
p. 879-888
Gap junction channels connect the cytoplasms of apposed cells via an intercellular conduit formed by the end-to-end docking of two hexameric hemichannels called connexons. We used electron cryomicroscopy to derive a three-dimensional density map at 5.7 Å in-plane and 19.8 Å vertical resolution, allowing us to identify the positions and tilt angles for the 24 α helices within each hemichannel. The four hydrophobic segments in connexin sequences were assigned to the α helices in the map based on biochemical and phylogenetic data. Analyses of evolutionary conservation and compensatory mutations in connexin evolution identified the packing interfaces between the helices. The final model, which specifies the coordinates of Cα atoms in the transmembrane domain, provides a structural basis for understanding the different physiological effects of almost 30 mutations and polymorphisms in terms of structural deformations at the interfaces between helices, revealing an intimate connection between molecular structure and disease.
Enosh A., Fleishman S. J., Ben-Tal N. & Halperin D.
(2004)
Bioinformatics.
20,
SUPPL. 1,
p. i122-i129
Motivation: Transmembrane (TM) proteins that form α-helix bundles constitute approximately 50% of contemporary drug targets. Yet, it is difficult to determine their high-resolution (< 4 Å) structures. Some TM proteins yield more easily to structure determination using cryo electron microscopy (cryo-EM), though this technique most often results in lower resolution structures, precluding an unambiguous assignment of TM amino acid sequences to the helices seen in the structure. We present computational tools for assigning the TM segments in the protein's sequence to the helices seen in cryo-EM structures. Results: The method examines all feasible TM helix assignments and ranks each one based on a score function that was derived from loops in the structures of soluble α-helix bundles. A set of the most likely assignments is then suggested. We tested the method on eight TM chains of known structures, such as bacteriorhodopsin and the lactose permease. Our results indicate that many assignments can be rejected at the outset, since they involve the connection of pairs of remotely placed TM helices. The correct assignment received a high score, and was ranked highly among the remaining assignments. For example, in the lactose permease, which contains 12 TM helices, most of which are connected by short loops, only 12 out of 479 million assignments were found to be feasible, and the native one was ranked first. Availability: The program and the non-redundant set of protein structures used here are available at http://www.cs.tau.ac.il/~angela.
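The figure of 479 million assignments quoted above for the lactose permease matches the number of ways to map 12 TM segments onto 12 helices, 12! = 479,001,600, presumably because each assignment is a permutation. The sketch below checks that arithmetic and shows, on a toy example, how a loop-length constraint can prune the search. It is an illustration under stated assumptions, not the published method: the helper `feasible_assignments`, the helix coordinates, and the loop-span limits are all hypothetical.

```python
import math
from itertools import permutations

# Number of ways to assign 12 TM segments to 12 helices.
total_assignments = math.factorial(12)  # 479001600


def feasible_assignments(helix_xy, max_loop_span):
    """Yield assignments in which every connecting loop can bridge its two helices.

    helix_xy: (x, y) positions of the helices seen in the map (hypothetical values).
    max_loop_span[i]: longest distance the loop between sequence segments
        i and i + 1 could span (hypothetical values).
    Each yielded tuple gives, for segment i, the index of the helix it occupies.
    """
    n = len(helix_xy)
    for perm in permutations(range(n)):
        if all(
            math.dist(helix_xy[perm[i]], helix_xy[perm[i + 1]]) <= max_loop_span[i]
            for i in range(n - 1)
        ):
            yield perm


# Toy example: four helices in a row, 10 Å apart, all loops limited to 12 Å.
toy_helices = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
toy_feasible = list(feasible_assignments(toy_helices, [12.0, 12.0, 12.0]))
```

In the toy case only the two sequential orderings survive out of 24, which mirrors the abstract's observation that short loops exclude most assignments at the outset. Brute-force enumeration is used here for clarity only; it would be impractical for 12 helices without pruning during the search.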
Artzy-Randrup Y., Fleishman S. J., Ben-Tal N. & Stone L.
(2004)
Science.
305,
5687,
p. 1107; author reply 1107
Recently, excitement has surrounded the application of null-hypothesis approaches for identifying evolutionary design principles in biological, technological, and social networks (113) and for classifying diverse networks into distinctive superfamilies (2). Here, we argue that the basic method suggested by Milo et al. (1, 2) often has limitations in identifying evolutionary design principles.
Oren I., Fleishman S. J., Kessel A. & Ben-Tal N.
(2004)
Biophysical Journal.
87,
2,
p. 768-779
Steroid hormones such as progesterone, testosterone, and estradiol are derived from cholesterol, a major constituent of biomembranes. Although the hormones might be expected to associate with the bilayer in a fashion similar to that of cholesterol, their biological action in regulating transcription of target genes involves transbilayer transfer by free diffusion, which is not observed for cholesterol. We used a novel combination of a continuum-solvent model and the downhill simplex search method for the calculation of the free energy of interaction of these hormones with lipid membranes, and compared these values to that of cholesterol-membrane interaction. The hormones were represented in atomic detail and the membrane as a structureless hydrophobic slab embedded in implicit water. A deep free-energy minimum of ∼-15 kcal/mol was obtained for cholesterol at its most favorable location in the membrane, whereas the most favorable locations for the hormones were associated with shallower minima of -5.0 kcal/mol or higher. The free-energy difference, which is predominantly due to the substitution of cholesterol's hydrophobic tail with polar groups, explains the different manner in which cholesterol and the hormones interact with the membrane. Further calculations were conducted to estimate the rate of transfer of the hormones from the aqueous phase into hexane, and from hexane back into the aqueous phase. The calculated rates agreed reasonably well with measurements in closely related systems. Based on these calculations, we suggest putative pathways for the free diffusion of the hormones across biomembranes. Overall, the calculations imply that the hormones may rapidly cross biomembrane barriers. Implications for gastrointestinal absorption and transfer across the blood-brain barrier and for therapeutic uses are discussed.
Fleishman S. J., Yifrach O. & Ben-Tal N.
(2004)
Journal of Molecular Biology.
340,
2,
p. 307-318
A novel sequence-analysis technique for detecting correlated amino acid positions in intermediate-size protein families (50-100 sequences) was developed, and applied to study voltage-dependent gating of potassium channels. Most contemporary methods for detecting amino acid correlations within proteins use very large sets of data, typically comprising hundreds or thousands of evolutionarily related sequences, to overcome the relatively low signal-to-noise ratio in the analysis of co-variations between pairs of amino acid positions. Such methods are impractical for voltage-gated potassium (Kv) channels and for many other protein families that have not yet been sequenced to that extent. Here, we used a phylogenetic reconstruction of paralogous Kv channels to follow the evolutionary history of every pair of amino acid positions within this family, thus increasing detection accuracy of correlated amino acids relative to contemporary methods. In addition, we used a bootstrapping procedure to eliminate correlations that were statistically insignificant. These and other measures allowed us to increase the method's sensitivity, and opened the way to reliable identification of correlated positions even in intermediate-size protein families. Principal-component analysis applied to the set of correlated amino acid positions in Kv channels detected a network of inter-correlated residues, a large fraction of which were identified as gating-sensitive upon mutation. Mapping the network of correlated residues onto the 3D structure of the Kv channel from Aeropyrum pernix disclosed correlations between residues in the voltage-sensor paddle and the pore region, including regions that are involved in the gating transition. We discuss these findings with respect to the evolutionary constraints acting on the channel's various domains. The software is available on our website http://ashtoret.tau.ac.il/~sarel/CorrMut.html
Fleishman S. J., Dagan T. & Graur D.
(2003)
Molecular Biology and Evolution.
20,
11,
p. 1876-1880
We present a method for pairwise Assessment of Nonfunctionalization Times (pANT) in processed pseudogenes. Contrary to existing methods for estimating nonfunctionalization times, pANT utilizes previously calculated probabilities of nucleotide substitution as explicit rate measurements, rather than assuming that the substitution rates are the same for all nucleotides. Thus, the method allows a more accurate computation of the time that has elapsed since the nonfunctionalization of a pseudogene. Whereas existing methods require the sequence of an orthologous functional gene, which is not always at hand, pANT only uses the pairwise alignment of the gene/pseudogene pair, thus expanding the range of problems that can be tackled. To estimate evolutionary times in nonfunctional sequences, pANT measures the differences in the pairwise alignment of a gene and its paralogous processed pseudogene, using only the first and second codon positions. It assumes that, because of functional constraints, these positions in the sequence of the functional homolog have not changed since the time of nonfunctionalization of the pseudogene. Hence, the sequence of the gene may be used as the ancestor of the pseudogene. We show that the method's reliance on a detailed substitution matrix, which is derived separately for each species, makes it more accurate than existing methods. We applied pANT to the case of the unitary α-1,3-galactosyltransferase human pseudogene and found that our estimate of the nonfunctionalization time was in agreement with that obtained by taxonomic and paleontological considerations pertaining to the divergence between platyrrhines (New World monkeys) and catarrhines (Old World monkeys).
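The counting step described in this abstract (comparing a gene with its processed pseudogene at first and second codon positions only) can be illustrated with a minimal Python sketch. This is not the pANT implementation: the function name, the gap handling, and the use of the gene's reading frame to assign codon positions are assumptions made for illustration, and the species-specific substitution matrix that pANT uses to turn differences into a time estimate is deliberately left out.

```python
def codon_position_differences(gene, pseudogene):
    """Count mismatches at codon positions 1 and 2 of an aligned pair.

    Hypothetical helper, not part of pANT. Both inputs are rows of one
    pairwise alignment ('-' marks a gap). Codon positions follow the
    gene's reading frame; columns with a gap in either row are skipped.
    Returns (differences, compared_sites).
    """
    if len(gene) != len(pseudogene):
        raise ValueError("aligned sequences must have equal length")

    differences = 0
    compared = 0
    gene_index = 0  # ungapped gene nucleotides seen so far

    for g, p in zip(gene.upper(), pseudogene.upper()):
        if g == "-":
            # Insertion in the pseudogene: no codon position in the gene.
            continue
        codon_position = gene_index % 3  # 0, 1, 2 = first, second, third
        gene_index += 1
        if p == "-" or codon_position == 2:
            # Skip deletions in the pseudogene and all third positions.
            continue
        compared += 1
        if g != p:
            differences += 1

    return differences, compared


if __name__ == "__main__":
    diffs, sites = codon_position_differences("ATGGCCAAA", "ATGGTCAAG")
    print(diffs, sites)
```

In the toy alignment above, the change at a second codon position is counted, while the change at a third position is ignored. Under the abstract's assumption that the functional gene has not changed at these positions, the counted differences are attributed to the pseudogene lineage alone.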
Fleishman S. J., Schlessinger J. & Ben-Tal N.
(2002)
Proceedings of the National Academy of Sciences of the United States of America.
99,
25,
p. 15937-15940
Overexpression of the receptor tyrosine kinase (RTK) erbB2 (also designated neu or HER2) was implicated in causing a variety of human cancers, including mammary and ovarian carcinomas. Ligand-induced receptor dimerization is critical for stimulation of the intrinsic protein tyrosine kinase (PTK) of RTKs. It was therefore proposed that PTK activity is stimulated as a result of the reorientation of the cytoplasmic domains within receptor dimers, leading to transautophosphorylation and stimulation of enzymatic activity. Here, we propose a molecular mechanism for rotation-coupled activation of the erbB2 receptor. Using a computational exploration of conformation space of the transmembrane (TM) segments of an erbB2 homodimer, we found two stable conformations of the TM domain. We suggest that these conformations correspond to the active and inactive states of erbB2, and that the receptor molecules may switch from one conformation to the other without crossing exceedingly unfavorable states. This model provides an explanation for the biochemical and oncogenic properties of erbB2, such as the effects of erbB2 overexpression on kinase activity and cell transformation. Furthermore, the opposing effects of the neu* activating oncogenic point mutation and the Val-655→Ile single-nucleotide polymorphism shown to be linked to reduced risk of breast cancer are explained in terms of shifts in the equilibrium between the active and inactive states of erbB2 in vivo.
Fleishman S. J. & Ben-Tal N.
(2002)
Journal of Molecular Biology.
321,
2,
p. 363-378
Pairs of helices in transmembrane (TM) proteins are often tightly packed. We present a scoring function and a computational methodology for predicting the tertiary fold of a pair of α-helices such that its chances of being tightly packed are maximized. Since the number of TM protein structures solved to date is small, it seems unlikely that a reliable scoring function derived statistically from the known set of TM protein structures will be available in the near future. We therefore constructed a scoring function based on the qualitative insights gained in the past two decades from the solved structures of TM and soluble proteins. In brief, we reward the formation of contacts between small amino acid residues such as Gly, Cys, and Ser, that are known to promote dimerization of helices, and penalize the burial of large amino acid residues such as Arg and Trp. As a case study, we show that our method predicts the native structure of the TM homodimer glycophorin A (GpA) to be, in essence, at the global score optimum. In addition, by correlating our results with empirical point mutations on this homodimer, we demonstrate that our method can be a helpful adjunct to mutation analysis. We present a data set of canonical α-helices from the solved structures of TM proteins and provide a set of programs for analyzing it (http://ashtoret.tau.ac.il/~sarel). From this data set we derived 11 helix pairs, and conducted searches around their native states as a further test of our method. Approximately 73% of our predictions showed a reasonable fit (RMS deviation