‘Stunning advance’ on ‘protein folding’: A 50-year-old science problem solved and that could mean big things – USA TODAY

A breakthrough on protein folding could unlock new possibilities into disease understanding and drug discovery, among other fields. (Photo: DeepMind)

A new discovery about "protein folding" could unlock a world of possibilities into the understanding of everything from diseases to drugs, researchers say.

The breakthrough that is sending ripples of excitement through the science and medical communities this week deals with the shapes that tiny proteins in our bodies, essential to all life, fold into.

The so-called "protein-folding problem" has puzzled scientists for five decades, and the discovery this week from the London-based artificial intelligence lab DeepMind has been heralded as a major milestone.

"This computational work represents a stunning advance on the protein-folding problem, a 50-year old grand challenge in biology," said Venki Ramakrishnan, president of the U.K.'s Royal Society. "It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research."

Proteins are essential to life, supporting practically all of its functions, according to DeepMind, which is owned by Google. They are large, complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure.

The ability to predict protein structures accurately enables a better understanding of what they do and how they work.

When proteins are translated from their DNA codes, they quickly transform from a non-functional, unfolded state into their folded, functional state. Problems in folding can lead to diseases such as Alzheimer's and Parkinson's.

The company's breakthrough essentially means that it figured out how to use artificial intelligence to deliver relatively quick answers to questions about protein structure and function that would take many months or years to solve using currently available methods, according to STAT News.

DeepMind's program, called AlphaFold, outperformed about 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction, according to the journal Nature.

"We have been stuck on this one problem, how do proteins fold up, for nearly 50 years," said University of Maryland professor John Moult, co-founder and chair of CASP. "To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts wondering if we'd ever get there, is a very special moment."

Researchers from DeepMind plan to publish their results in a peer-reviewed journal in the near future.

Read or Share this story: https://www.usatoday.com/story/news/nation/2020/12/03/protein-folding-discovery-major-breakthrough-deepmind/3809693001/

Image of the Month: The right place of human Man1b1 – Baylor College of Medicine News

Location, location, location! It is especially important in the world of cells. The Man1b1 protein, known to be involved in regulating a balanced, functional network of cellular proteins, was assumed to localize in the endoplasmic reticulum.

Dr. Richard Sifers's group challenged this widespread view by showing that Man1b1 is actually located in the Golgi, a cellular structure functionally associated with but physically separate from the endoplasmic reticulum. The findings sharpened the appreciation of the dynamic process that regulates protein folding and the handling of misfolded, defective proteins, known to be involved in a number of conditions such as conformational diseases.

Conformational diseases include common conditions associated with accumulation of defective proteins, including neurological disorders, such as Alzheimer's disease. Human Man1b1 has been linked to the causes of multiple congenital disorders of intellectual disability and HIV infection, and to poor prognosis in patients with bladder cancer. A better understanding of how Man1b1 works can potentially open new doors into developing improved treatments.

Learn more about the research conducted at the Sifers lab here, including the recent discovery of an unexpected new function of Man1b1.

Dr. Richard Sifers is professor of pathology & immunology and member of the Dan L Duncan Comprehensive Cancer Center at Baylor College of Medicine.

By Ana María Rodríguez, Ph.D.

Real Progress In Crowdsourcing Scientific Tasks To Gamers – Bio-IT World

By Deborah Borfitz

November 4, 2020 | Gaming and science, two seemingly incompatible areas of activity, have come together nicely in the case of citizen science games such as Foldit, Phylo, and Borderlands Science, as reported by academics close to the action who presented at the recent Bio-IT World Conference & Expo Virtual. The games are all played online, involve analyzing large sets of data, and endeavor to solve real scientific problems. And players get credit individually (when willing) or as a crowd when findings appear in scholarly, peer-reviewed publications.

What's not to love about the concept? It's certainly a great way to redirect the attention of people already spending untold hours on video games, says Seth Cooper, assistant professor in the Khoury College of Computer Sciences at Northeastern University. A pioneer of the field of scientific discovery games, he has demonstrated that video game enthusiasts are able to outperform purely computational methods for certain types of structural biochemistry problems, effectively codify their strategies, and integrate with the lab to help design real synthetic proteins.

Cooper is co-creator of Foldit, where the competition is about protein folding and design. It's hard for a computer to search all the possibilities without the aid of human creativity and reason, he says. The game is built on chemistry software called Rosetta and has been out for over a decade with more than half a million players, Cooper continues. It has evolved into a multi-institutional collaboration.

The goal, as with most games, is to get a high score, Cooper says. Players compete, and often collaborate, to build the best protein structures.

The process begins with a biochemist identifying a problem that gets turned into a game or puzzle that gets posted online, he explains. Each puzzle is only available for about a week, and generally a couple are up for play at any one time. Data generated by the Foldit players continually improve the game for better scientific results, Cooper notes. The levels of play get progressively harder.

Anyone can participate and most have no formal background in biochemistry, yet they're contributing to science, he says. Back in 2011, players famously came up with an elegant, low-energy model for a monkey-virus enzyme, solving a longstanding scientific problem potentially useful for the design of retroviral drugs for AIDS, and accomplished the feat inside of three weeks.

Players have also successfully redesigned existing enzymes, Cooper adds, as well as designed several protein structures from scratch that have been confirmed by X-ray crystallography. They're now working on designing an enzyme that will bind to the spike protein of SARS-CoV-2.

Vanderbilt University is also using Foldit to design small molecules and the University of California, Davis is studying the impact of adding a narrative to the competition. In the future, Cooper says, Foldit users might start working in a virtual reality environment. An educational version of Foldit with more contextual science information is available for classroom use, says Cooper, as is a standalone version that is completely separate from the game.

Burning Task Use

At McGill University, associate professor and computational scientist Jerome Waldispuehl is championing the gamification of genomics research with citizen science video game Phylo and its newest iteration called Borderlands Science. His focus is on multiple sequence alignment, one of the most challenging problems in bioinformatics that involves discovering similarities between a set of protein or DNA sequences.

Phylo presents players with DNA puzzles where they manipulate patterns consisting of colored tiles so that they almost forget the scientific context, Waldispuehl says. The abstraction task is to minimize the mismatch of colors to avoid a penalty.
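The kind of penalty Waldispuehl describes can be illustrated with a small sketch. It is not Phylo's actual scoring rule, which the article does not give. It is a generic sum-of-pairs penalty in which each row is one sequence, each letter stands for a tile color, and `-` is an empty slot. The function name `alignment_penalty` and the penalty weights are assumptions made for illustration.

```python
from itertools import combinations

def alignment_penalty(rows, mismatch=1, gap=1):
    """Sum-of-pairs penalty over the columns of equal-length rows ('-' marks a gap)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    penalty = 0
    for column in zip(*rows):
        # Compare every pair of tiles that share a column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue              # two empty slots cost nothing
            if a == "-" or b == "-":
                penalty += gap        # a tile facing an empty slot
            elif a != b:
                penalty += mismatch   # two tiles of different colors
    return penalty

# Sliding the second row one slot to the left removes every mismatch.
shifted = ["ACGT-", "-ACGT"]
aligned = ["ACGT", "ACGT"]
print(alignment_penalty(shifted), alignment_penalty(aligned))
```

A player who lowers this number is doing, by eye, what an alignment algorithm does by search.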

Every alignment submitted by players is eventually reinserted into an existing algorithm as an optimization, says Waldispuehl. Alignments up for play contain sections of human DNA thought to be linked to various genetic disorders. Since 2010, Phylo has had 350,000 participants and generated one million solutions by improving alignments by 40%-95% over a computer algorithm, he reports.

Borderlands Science, launched in April for purposes of education and science outreach, quickly hit the one million mark with players and has come up with 50 million solutions, he adds. Collaborators include video game science company Massively Multiplayer Online Science, Gearbox Software and The Microsetta Initiative of the University of California, San Diego.

The Borderlands version of the game is played vertically rather than horizontally and rewards success with in-game currency that is important to some players, Waldispuehl says. It is currently aimed at improving 16S ribosomal RNA gene sequences from human microbiome alignments.

Angelika Amon, cell biologist who pioneered research on chromosome imbalance, dies at 53 – MIT News

Angelika Amon, professor of biology and a member of the Koch Institute for Integrative Cancer Research, died on Oct. 29 at age 53, following a two-and-a-half-year battle with ovarian cancer.

"Known for her piercing scientific insight and infectious enthusiasm for the deepest questions of science, Professor Amon built an extraordinary career and in the process, a devoted community of colleagues, students and friends," MIT President L. Rafael Reif wrote in a letter to the MIT community.

"Angelika was a force of nature and a highly valued member of our community," reflects Tyler Jacks, the David H. Koch Professor of Biology at MIT and director of the Koch Institute. "Her intellect and wit were equally sharp, and she brought unmatched passion to everything she did. Through her groundbreaking research, her mentorship of so many, her teaching, and a host of other contributions, Angelika has made an incredible impact on the world, one that will last long into the future."

A pioneer in cell biology

From the earliest stages of her career, Amon made profound contributions to our understanding of the fundamental biology of the cell, deciphering the regulatory networks that govern cell division and proliferation in yeast, mice, and mammalian organoids, and shedding light on the causes of chromosome mis-segregation and its consequences for human diseases.

Human cells have 23 pairs of chromosomes, but as they divide they can make errors that lead to too many or too few chromosomes, resulting in aneuploidy. Amon's meticulous and rigorous experiments, first in yeast and then in mammalian cells, helped to uncover the biological consequences of having too many chromosomes. Her studies determined that extra chromosomes significantly impact the composition of the cell, causing stress in important processes such as protein folding and metabolism, and leading to additional mistakes that could drive cancer. Although stress resulting from aneuploidy affects cells' ability to survive and proliferate, cancer cells, which are nearly universally aneuploid, can grow uncontrollably. Amon showed that aneuploidy disrupts cells' usual error-repair systems, allowing genetic mutations to quickly accumulate.

Aneuploidy is usually fatal, but in some instances extra copies of specific chromosomes can lead to conditions such as Down syndrome and developmental disorders including those known as Patau and Edwards syndromes. This led Amon to work to understand how these negative effects result in some of the health problems associated specifically with Down syndrome, such as acute lymphoblastic leukemia. Her expertise in this area led her to be named co-director of the recently established Alana Down Syndrome Center at MIT.

"Angelika's intellect and research were as astonishing as her bravery and her spirit. Her lab's fundamental work on aneuploidy was integral to our establishment of the center," says Li-Huei Tsai, the Picower Professor of Neuroscience and co-director of the Alana Down Syndrome Center. "Her exploration of the myriad consequences of aneuploidy for human health was vitally important and will continue to guide scientific and medical research."

Another major focus of research in the Amon lab has been on the relationship between how cells grow, divide, and age. Among other insights, this work has revealed that once cells reach a certain large size, they lose the ability to proliferate and are unable to reenter the cell cycle. Further, this growth contributes to senescence, an irreversible cell cycle arrest, and tissue aging. In related work, Amon has investigated the relationships between stem cell size, stem cell function, and tissue age. Her lab's studies have found that in hematopoietic stem cells, small size is important to cells' ability to function and proliferate (in fact, she posted recent findings on bioRxiv earlier this week) and have been examining the same questions in epithelial cells as well.

Amon lab experiments delved deep into the mechanics of the biology, trying to understand the mechanisms behind their observations. To support this work, she established research collaborations to leverage approaches and technologies developed by her colleagues at the Koch Institute, including sophisticated intestinal organoid and mouse models developed by the Yilmaz Laboratory, and a microfluidic device developed by the Manalis Laboratory for measuring physical characteristics of single cells.

The thrill of discovery

Born in 1967, Amon grew up in Vienna, Austria, in a family of six. Playing outside all day with her three younger siblings, she developed an early love of biology and animals. She could not remember a time when she was not interested in biology, initially wanting to become a zoologist. But in high school, she saw an old black-and-white film from the 1950s about chromosome segregation, and found the moment that the sister chromatids split apart breathtaking. She knew then that she wanted to study the inner workings of the cell and decided to focus on genetics at the University of Vienna in Austria.

After receiving her BS, Amon continued her doctoral work there under Professor Kim Nasmyth at the Research Institute of Molecular Pathology, earning her PhD in 1993. From the outset, she made important contributions to the field of cell cycle dynamics. Her work on yeast genetics in the Nasmyth laboratory led to major discoveries about how one stage of the cell cycle sets up for the next, revealing that cyclins, proteins that accumulate within cells as they enter mitosis, must be broken down before cells pass from mitosis to G1, a period of cell growth.

Towards the end of her doctorate, Amon became interested in fruitfly genetics and read the work of Ruth Lehmann, then a faculty member at MIT and a member of the Whitehead Institute. Impressed by the elegance of Lehmann's genetic approach, she applied and was accepted to her lab. In 1994, Amon arrived in the United States, not knowing that it would become her permanent home or that she would eventually become a professor.

While Amon's love affair with fruitfly genetics would prove short, her promise was immediately apparent to Lehmann, now director of the Whitehead Institute. "I will never forget picking Angelika up from the airport when she was flying in from Vienna to join my lab. Despite the long trip, she was just so full of energy, ready to talk science," says Lehmann. "She had read all the papers in the new field and cut through the results to hit equally on the main points."

But as Amon frequently was fond of saying, "yeast will spoil you." Lehmann explains that "because they grow so fast and there are so many tools, your brain is the only limitation. I tried to convince her of the beauty and advantages of my slower-growing favorite organism. But in the end, yeast won and Angelika went on to establish a remarkable body of work, starting with her many contributions to how cells divide and more recently to discover a cellular aneuploidy program."

In 1996, after Lehmann had left for New York University's Skirball Institute, Amon was invited to become a Whitehead Fellow, a prestigious program that offers recent PhDs resources and mentorship to undertake their own investigations. Her work on the question of how yeast cells progress through the cell cycle and partition their chromosomes would be instrumental in establishing her as one of the world's leading geneticists. While at Whitehead, her lab made key findings centered around the role of an enzyme called Cdc14 in prompting cells to exit mitosis, including that the enzyme is sequestered in a cellular compartment called the nucleolus and must be released before the cell can exit.

"I was one of those blessed to share with her a 'eureka moment,' as she would call it," says Rosella Visintin, a postdoc in Amon's lab at the time of the discovery and now an assistant professor at the European School of Molecular Medicine in Milan. "She had so many. Most of us are lucky to get just one, and I was one of the lucky ones. I'll never forget her smile and scream (neither will the entire Whitehead Institute) when she saw for the first time Cdc14 localization: 'You did it, you did it, you figured it out!' Passion, excitement, joy: everything was in that scream."

In 1999, Amon's work as a Whitehead Fellow earned her a faculty position in the MIT Department of Biology and the MIT Center for Cancer Research, the predecessor to the Koch Institute. A full professor since 2007, she also became the Kathleen and Curtis Marble Professor in Cancer Research, associate director of the Paul F. Glenn Center for Biology of Aging Research at MIT, a member of the Ludwig Center for Molecular Oncology at MIT, and an investigator of the Howard Hughes Medical Institute.

Her pathbreaking research was recognized by several awards and honors, including the 2003 National Science Foundation Alan T. Waterman Award, the 2007 Paul Marks Prize for Cancer Research, the 2008 National Academy of Sciences (NAS) Award in Molecular Biology, and the 2013 Ernst Jung Prize for Medicine. In 2019, she won the Breakthrough Prize in Life Sciences and the Vilcek Prize in Biomedical Science, and was named to the Carnegie Corporation of New York's annual list of Great Immigrants, Great Americans. This year, she was given the Human Frontier Science Program Nakasone Award. She was also a member of the NAS and the American Academy of Arts and Sciences.

Lighting the way forward

Amon's perseverance, deep curiosity, and enthusiasm for discovery served her well in her roles as teacher, mentor, and colleague. She has worked with many labs across the world and developed a deep network of scientific collaboration and friendships. She was a sought-after speaker for seminars and the many conferences she attended. In over 20 years as a professor at MIT, she has mentored more than 80 postdocs, graduate students, and undergraduates, and received the School of Science's undergraduate teaching prize.

"Angelika was an amazing, energetic, passionate, and creative scientist, an outstanding mentor to many, and an excellent teacher," says Alan Grossman, the Praecis Professor of Biology and head of MIT's Department of Biology. "Her impact and legacy will live on and be perpetuated by all those she touched."

"Angelika existed in a league of her own," explains Kristin Knouse, one of Amon's former graduate students and a current Whitehead Fellow. "She had the energy and excitement of someone who picked up a pipette for the first time, but the brilliance and wisdom of someone who had been doing it for decades. Her infectious energy and brilliant mind were matched by a boundless heart and tenacious grit. She could glance at any data and immediately deliver a sharp insight that would never have crossed any other mind. Her positive attributes were infectious, and any interaction with her, no matter how transient, assuredly left you feeling better about yourself and your science."

Taking great delight in helping young scientists find their own eureka moments, Amon was a fearless advocate for science and the rights of women and minorities and inspired others to fight as well. She was not afraid to speak out in support of the research and causes she believed strongly in. She was a role model for young female scientists and spent countless hours mentoring and guiding them in a male-dominated field. While she graciously accepted awards for women in science, including the Vanderbilt Prize and the Women in Cell Biology Senior Award, she questioned the value of prizes focused on women as women, rather than on their scientific contributions.

"Angelika Amon was an inspiring leader," notes Lehmann, "not only by her trailblazing science but also by her fearlessness to call out sexism and other -isms in our community. Her captivating laugh and unwavering mentorship and guidance will be missed by students and faculty alike. MIT and the science community have lost an exemplary leader, mentor, friend, and mensch."

Amon's wide-ranging curiosity led her to consider new ideas beyond her own field. In recent years, she developed a love for dinosaurs and fossils, and often mentioned that she would like to study terraforming, which she considered essential for humans to succeed at life on other planets.

"It was always amazing to talk with Angelika about science, because her interests were so deep and so broad, her intellect so sharp, and her enthusiasm so infectious," remembers Vivian Siegel, a lecturer in the Department of Biology and friend since Amon's postdoctoral days. "Beyond her own work in the lab, she was fascinated by so many things, including dinosaurs (dreaming of taking her daughters on a dig), lichen, and even life on Mars."

"Angelika was brilliant; she illuminated science and scientists," says Frank Solomon, professor of biology and member of the Koch Institute. "And she was intense; she warmed the people around her, and expanded what it means to be a friend."

Amon is survived by her husband, Johannes Weis; her daughters, Theresa and Clara Weis; and her three siblings and their families.

If AlphaFold Is a Product of Design, Maybe Our Bodies Are Too – Walter Bradley Center for Natural and Artificial Intelligence

Recently, we've been looking at tech philosopher George Gilder's new Gaming AI about what AI can, and can't, do for us. It can't do our thinking for us but it can do many jobs we don't even try because no human being has enough time or patience to motor through all the calculations.

Which brings us to the massive complexity of the proteins that carry out our genetic instructions, better knowledge of which would help us battle many diseases.

Gilder notes that when DeepMind's AlphaGo beat humans at the board game Go in 2016, it wasn't just for the fun of winning a game. DeepMind cofounder Demis Hassabis is more interested in real-life uses such as medical research (p. 11). The human body is very complex and a researcher can be confronted with thousands of possibilities. Which ones matter?

The area the DeepMind team decided to focus on is protein folding: Human DNA has 64 codons that program little machines in our cells (ribosomes) to create specific proteins out of the standard twenty amino acids. But, to do their jobs, the proteins fold themselves into many, many different shapes. Figuring it all out is a real problem for researchers and the DeepMind crew hope that AI will help:

Over the past five decades, researchers have been able to determine shapes of proteins in labs using experimental techniques like cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, but each method depends on a lot of trial and error, which can take years of work, and cost tens or hundreds of thousands of dollars per protein structure. This is why biologists are turning to AI methods as an alternative to this long and laborious process for difficult proteins. The ability to predict a protein's shape computationally from its genetic code alone, rather than determining it through costly experimentation, could help accelerate research.
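The codon arithmetic mentioned above can be made concrete. The sketch below is an illustration added here, not something taken from Gilder or DeepMind. It enumerates the 64 three-letter DNA codons and maps them onto the twenty standard amino acids (plus stop signals) using the standard genetic code. The function name `translate` is hypothetical. Reading off the amino-acid chain is the easy step. Predicting the folded shape of that chain is the hard problem the passage describes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids, stopping at a stop codon."""
    protein = []
    # Walk the sequence three bases at a time, ignoring any incomplete trailing codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(len(CODON_TABLE))                # 64 codons
print(len(set(AMINO_ACIDS) - {"*"}))   # 20 amino acids
print(translate("ATGGCCTAA"))          # MA
```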

As Gilder recounts, the biotech industry conducts annual global protein-folding competitions among molecular biologists and in 2019 DeepMind defeated all teams of relatively unaided human rivals:

Advancing from the unaided human level of two or three correct protein configurations out of forty, DeepMind calculated some thirty-three correct solutions out of forty. This spectacular advance opens the way to major biotech gains in custom-built protein molecules adapted to particular people with particular needs or diseases. It is the most significant biotech invention since the complementary CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) method for using enzymes directly to edit strands of DNA.

But now that we have found a way to tackle one aspect of the immense complexity of human bodily existence, here's an interesting problem to think about: We are told by many philosophers that life came to exist on Earth purely by chance. How likely is that, given the intricacy of the machinery that governs our bodies?

Kirk Durston, a biophysicist who studies protein folds, comments:

As we all know from probabilities, you can get lucky once, but not thousands of times

As real data shows, the probability of finding a functional sequence for one average protein family is so low, there is virtually zero chance of obtaining it anywhere in this universe over its entire history, never mind finding thousands of protein families.

Yet that's what we have. All those protein families. As we learn more about the world we live in, we may find ourselves confronting more challenges like this: We had to invent a really complex machine to even begin to figure out protein folding in our bodies and we know that the machine did not happen by chance. So why should we believe that our bodies happened that way? Probably not.

Note: While medicine may be the most important way AI can help us, it also helps us in other areas where huge numbers of calculations are essential for success. For example, it can help recover lost languages and interpret charred scrolls. It can continuously scan the skies, sparing astronomers for more human-friendly work like interpreting the results. It can restore blurred images and help with cold case files. As with anything, the trick is to take advantage of what it can really do. We don't need the courtroom sentencing robot or the AI Jesus, but then we never did. As our information resources become larger and more complex, we do need some help with the sheer volume and that's where AI is bound to succeed.

Regulation of chaperone function by coupled folding and oligomerization – Science Advances

Abstract

The homotrimeric molecular chaperone Skp of Gram-negative bacteria facilitates the transport of outer membrane proteins across the periplasm. It has been unclear how its activity is modulated during its functional cycle. Here, we report an atomic-resolution characterization of the Escherichia coli Skp monomer-trimer transition. We find that the monomeric state of Skp is intrinsically disordered and that formation of the oligomerization interface initiates folding of the α-helical coiled-coil arms via a unique stapling mechanism, resulting in the formation of active trimeric Skp. Native client proteins contact all three Skp subunits simultaneously, and accordingly, their binding shifts the Skp population toward the active trimer. This activation mechanism is shown to be essential for Salmonella fitness in a mouse infection model. The coupled mechanism is a unique example of how an ATP-independent chaperone can modulate its activity as a function of the presence of client proteins.

Molecular chaperones are central for the survival of the cell in all kingdoms of life (1, 2). They are involved in many cellular processes such as helping proteins to fold, preventing protein aggregation, and reducing cellular stress (3). Some chaperones can use adenosine triphosphate (ATP) binding and hydrolysis to trigger conformational changes that, in turn, regulate their functional cycle, including their interaction with client proteins (4). ATP-independent chaperones, in turn, lack this possibility. Nonetheless, some ATP-independent chaperones were found to be regulated by major conformational changes, and the transition mechanisms for the activation of ATP-independent chaperones have been classified into three categories (5): oligomer disassembly [small heat shock protein (sHSP) (6) and trigger factor (TF) (7–9)], order-to-disorder transition {Hsp33 (10), HdeA [HNS (histone-like nucleoid structuring)-dependent expression A] (11), and HdeB (12)}, and lack of major conformational change [spheroplast protein Y (Spy) (13, 14), seventeen kilodalton protein (Skp) (15), HSP40 (16), SecB (17), and survival factor A (SurA) (18)]. These mechanisms of activation are of major biological importance, because constitutively active chaperones can interfere with protein folding processes and proteostasis due to their high affinity and low specificity for client proteins, thus representing a potential hazard to cells (19–22). An example of these detrimental effects has been reported for a constitutively active variant of the chaperone Hsp33, which led to accumulation of large amounts of insoluble aggregates and severe growth disadvantages (20).

Representative of the first category, binding of chaperone sHSP to its client proteins is regulated via a shift from an inactive oligomeric ensemble toward an ensemble of smaller multimers, representing the active species (6). The monomeric species exposes a binding motif that is shielded within the oligomeric structure, making the large oligomeric state an inactive storage form that can be activated upon dissociation (23, 24). Similarly, it has been shown that binding of TF to client proteins is accompanied by a shift from the inactive dimeric state toward the active monomeric state (7–9). By contrast, the order-to-disorder activation is found for chaperones where the active form is intrinsically disordered. Thereby, to shift from the folded inactive chaperone to the unfolded active chaperone, not only the oligomeric state but also the secondary structure of the chaperone is undergoing change, triggered either by a pH drop to acidic conditions (HdeA and HdeB) or by a redox transition (Hsp33). Once stress factors are reduced, these chaperones can return to their folded/oligomeric inactive state with a release of the client (25). The third category contains chaperones for which only one conformational state is known, and therefore, these are assumed to require no major conformational changes for their activation, as well as chaperones for which activation requires only minor conformational changes. One such example is provided by the chaperone Hsp40, which has minimal structural differences between its client-bound and apo state (16). Another example is given by the chaperone SecB, for which high-resolution structures of client-bound states showed only a minor conformational change to the inactive client-free state (17). In the client-free form, helix 2 acts as a lid of the client protein binding site. Upon client binding, this helix swings outward, thereby allowing access to the client binding groove. Similarly, the chaperone SurA has been shown to have a dynamic mechanism of activation where a domain connected to the chaperone core by linkers assists client protein recognition, binding, and release (18).

The periplasmic chaperone Skp is an integral part of outer membrane protein (Omp) biogenesis, on a parallel pathway with the chaperone SurA. Skp transports Omps in their unfolded state across the periplasm toward their insertion point into the outer membrane (26–28). Yersinia skp and Salmonella skp mutants show compromised virulence in rodent infection models, indicating a crucial role of Skp in vivo (29, 30). Skp is structurally characterized by a trimeric oligomeric state with a jellyfish-like architecture (31, 32). Each protomer contributes three β-strands toward a nine-stranded β-barrel in the trimerization interface and a long, α-helical arm, made of two α-helices in coiled-coil arrangement (31, 32). The combination of three arms from the individual subunits leads to formation of a cavity that can accommodate and bind unfolded Omps (15, 33).

The elongated arms of Skp are highly flexible in the apo state, and a recent molecular dynamics study has identified a pivot element to act as a hinge, allowing Skp to adapt to clients of different sizes (15, 34). Upon binding, the Skp arms undergo a rigidification and keep the bound Omps inside the cavity in the fluid globule state (15, 35). While Skp can accommodate differently sized protein clients, all functional complexes observed so far feature an Omp:Skp stoichiometry of 1:3 or 1:6, depending on the size of the client, suggesting that Skp always binds clients as a trimer (36). A recent study has emphasized that at physiological concentrations, Skp exists as an equilibrium between a trimeric and a monomeric form (37). The equilibrium was quantified by analytical ultracentrifugation (AUC), showing that the monomeric form is strongly dominant at 2 μM Skp, the concentration found in Escherichia coli stationary phase (38, 39). The monomeric form of Skp has been proposed to be well folded based on indirect evidence (37); however, it has so far not been possible to directly analyze its structure, because at the high concentration required for most biophysical methods, the protein is mostly trimeric. Consequently, the structural features of the Skp monomeric state and the Skp activation mechanism remain poorly understood.

Here, we bypass this analytical challenge by introducing several weakly and non-oligomerizing mutants of Skp. We characterize their monomeric states by solution nuclear magnetic resonance (NMR) spectroscopy at the atomic level. The resulting reference data can then be used to characterize monomeric Skp(WT). The data show that monomeric Skp is intrinsically disordered and inactive and that binding of a client protein triggers Skp trimerization and activation. Last, we demonstrate that this mechanism is essential for bacterial virulence under in vivo conditions in a mouse infection model. The data thus reveal an essential mechanism regulating Skp chaperone activity by a combined disorder-to-order and oligomerization transition.

To prepare samples of monomeric Skp at concentrations sufficient for structural characterization, we set out to design mutants that would destabilize the oligomerization interface to shift the oligomerization equilibrium toward the monomeric form. The structure of trimeric Skp is stabilized by a network of three β-sheets per subunit that together form the trimerization interface in the head of the molecule (Fig. 1A). We identified the conserved alanine-103 and alanine-108 as promising candidates, because they are located at the oligomerization interface with limited space for their side chains. Their replacement by a bulkier side chain such as leucine or arginine should introduce steric clashes, leading to destabilization of the trimer (Fig. 1, A and B). In addition, we designed the mutant V117P to insert a proline residue, which is a known secondary structure breaker, into the trimerization β-sheet (2). The oligomerization state of each of the Skp mutants Skp(A103L), Skp(A103R), Skp(A108L), Skp(A108R), and Skp(V117P) was determined by SEC-MALS (size-exclusion chromatography coupled to multi-angle light scattering) experiments at an elution concentration of 80 μM. At this concentration, the wild-type (WT) protein is mostly trimeric with a monomeric fraction lower than 4%. The mutant A103L was hardly distinguishable from WT, but the other Skp variants featured a gradually increased monomeric fraction, as evidenced by a smaller apparent mass, in the order A103R, A108L, V117P, and A108R (Fig. 1C). Mutants A108R and V117P were fully monomeric, and the others had effective molecular weights in between monomer and trimer, suggesting the presence of dynamic equilibria. We quantified the concentration dependence of these equilibria for Skp(A103L), Skp(A103R), and Skp(A108L) by solution NMR spectroscopy and SEC-MALS experiments (Fig. 1, C to E; Table 1; and fig. S1). 
Skp(WT) followed an equilibrium with C0.5 = 1.5 μM, the protein concentration at which half of the molecules are in the monomeric form, in agreement with published data (37). Skp(A103L) showed a trimer-monomer equilibrium that was essentially identical to WT (Fig. 1E and fig. S1), whereas Skp(A103R) had C0.5 = 7 ± 2 μM and Skp(A108L) had C0.5 = 80 ± 20 μM, indicating that these mutations shifted the equilibrium by about one to two orders of magnitude toward the monomer (Fig. 1, D and E, and fig. S1). The two mutants Skp(A108R) and Skp(V117P) were found to be monomeric at concentrations of even up to 1 mM (Fig. 1E and fig. S1).
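
The relation between C0.5 and the monomer fraction can be illustrated with a short numerical sketch. It assumes a simple two-state equilibrium between three monomers and one trimer, without a populated dimer; the function names and the bisection solver are illustrative choices and are not taken from the fitting procedure used for Fig. 1E. Concentrations are in μM of Skp molecules.

```python
def association_constant(c_half):
    """K_a = [T]/[M]^3 for 3 M <-> T, expressed through C0.5.

    At a total concentration of C0.5, half of the molecules are monomeric,
    so [M] = C0.5/2 and [T] = C0.5/6, which gives K_a = 4 / (3 * C0.5^2).
    """
    return 4.0 / (3.0 * c_half ** 2)


def monomer_fraction(c_tot, c_half):
    """Fraction f of Skp molecules in the monomeric state.

    Solves the mass balance c_tot = [M] + 3 * K_a * [M]^3 for [M]
    by bisection (the left-hand side is monotonic in [M]).
    """
    if c_tot <= 0:
        raise ValueError("total concentration must be positive")
    ka = association_constant(c_half)
    lo, hi = 0.0, c_tot
    for _ in range(200):
        m = 0.5 * (lo + hi)
        if m + 3.0 * ka * m ** 3 > c_tot:
            hi = m
        else:
            lo = m
    return 0.5 * (lo + hi) / c_tot


if __name__ == "__main__":
    # C0.5 values quoted in the text; the concentrations are arbitrary examples.
    for name, c_half in [("WT", 1.5), ("A103R", 7.0), ("A108L", 80.0)]:
        for c_tot in (5.0, 80.0, 1000.0):
            f = monomer_fraction(c_tot, c_half)
            print(f"{name:6s} c_tot={c_tot:7.1f} uM  f_monomer={f:.3f}")
```

Under this model, the fraction of molecules in the trimeric state is 1 − f, and the concentration of trimers is one-third of (1 − f) times the total concentration.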

(A) Location of the mutation sites [red boxes (I), (II), and (III)], displayed on the Skp crystal structure (Protein Data Bank: 1SG2). Secondary structure elements and termini are indicated. (B) Close-up of the interface between Skp subunits, highlighting the position of the five mutations. See text for details. (C) SEC elution profiles (solid lines, left axis) and MALS apparent molecular mass (MM) (dotted lines, right axis) at elution concentrations of 80 μM and a temperature of 25°C. Dark gray, Skp(WT); brown, Skp(A103L); green, Skp(A103R); blue, Skp(A108L); magenta, Skp(A108R); purple, Skp(V117P). Gray horizontal lines indicate the molecular masses of monomers, dimers, and trimers of Skp. (D) Experiment as in (C) for Skp(A108L) as a function of the elution concentrations: 5, 20, and 79 μM. (E) Fractional populations f of monomers in the monomer-trimer equilibrium as a function of total Skp concentration. Experimental data points from NMR and SEC-MALS are indicated by filled and open circles, respectively. These have been fitted by Eq. 4 for mutants A103R and A108L (solid lines). The corresponding fractional populations of the trimeric state, 1 − f, are shown by dashed lines. Note that the concentration of Skp trimers equals one-third of the concentration of Skp molecules in the trimeric state, i.e., [Skp]trimer = 1/3 × (1 − f) × [Skp]tot. For Skp(WT) and Skp(A103L), the data follow the WT association constant published by Sandlin et al. (37).

Error estimates have been omitted for clarity. n.d., not determined.

We then characterized the structural integrity of Skp(A103L), Skp(A103R), and Skp(A108L) in their trimeric forms by NMR spectroscopy. For each of these proteins, two-dimensional (2D) [15N,1H]-TROSY (transverse relaxation-optimized spectroscopy) fingerprint spectra show the presence of two species in slow exchange on the NMR time scale, i.e., with kinetic exchange rate constants kex < 10 s−1 (fig. S1). For each of the mutants, the overlay of the NMR spectra at 25°C (fig. S2) shows a high degree of similarity with the WT protein for most resonances, with considerable chemical shift perturbations only for some residues. Those residues are all located in spatial vicinity of the mutation site, in full agreement with the expected local distortion effects of single point mutations (fig. S2). The signals of residues located in the arms are not affected by the mutations, suggesting that symmetry and structural integrity of the trimeric form of the protein are maintained in the mutants. Oligomeric states other than the monomer and the trimer were not detected. The mutations thus shift the oligomerization equilibrium while leaving the trimeric form largely intact.

The mutant Skp(A108L) with a C0.5 of 80 ± 20 μM at 25°C allowed us to prepare the monomeric state at concentrations of 100 μM and above, which is required for solution NMR spectroscopy. The NMR spectra of monomeric Skp(A108L) overlap completely with those of the monomeric, low-abundance conformation of Skp(WT) (Fig. 2A), indicating that the conformations are essentially identical and thus validating the further analysis. Increasing the temperature from 25 to 37°C shifted the equilibrium of Skp(A108L) further toward the monomer, resulting in around 95% monomer at a concentration of 1 mM and thus further increasing the NMR signal intensity (fig. S2). A primary classification of the type of conformational state of monomeric Skp was obtained from the observation of a narrow chemical shift dispersion of backbone amide NMR signals, which is characteristic for proteins with low structural propensity (fig. S3). To quantify the secondary structure elements, we established complete sequence-specific resonance assignments of the monomeric state (fig. S3) and determined backbone 13Cα and 13Cβ secondary chemical shifts (Fig. 2B). These show that the three β-sheets that constitute the oligomerization interface in the trimeric form are in random-coil conformation in the monomeric state. Furthermore, the four α-helices forming the arms of Skp are in a fast conformational exchange between folded and unfolded conformations, as evidenced by the observation of a single set of resonances in fast exchange. Taking the fully denatured form of Skp in 8 M urea solution and the folded trimer as reference points, the residual helicity can be quantified for each residue (Fig. 2, B and C). The analysis shows that the helices 1, 3.B, and 4, which are closest to the trimerization interface, feature a residual helicity of <20%, while the helices 2.A, 2.B, and 3.A located at the tip of the arms display a helical population of 20 to 30% (Fig. 2, C and D). 
Overall, these data show that a small amount of residual α-helical structure is present in the disordered Skp monomers, but that the complete formation of the helices requires the trimerization interface. The monomeric state of Skp is thus intrinsically disordered, with some residual helical propensity located at the tip of the arms. In the trimeric structure, the circular β-barrel interface, connecting the N- and C-terminal part of the protein, brings helices 2 and 3 close together in space and thus stabilizes their secondary structure (Fig. 2E). This unique mechanism resembles a stapling of the coiled-coil helices to the barrel in the head domain.
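
The two-reference-point quantification of residual helicity described above can be sketched as a linear interpolation of secondary chemical shifts. This is a minimal illustration that assumes the monomer shift scales linearly with helical population between the urea-denatured state (0%) and the folded trimer (100%); the function name, the `min_span` cutoff, and the example shift values are hypothetical and do not reproduce the analysis behind Fig. 2C.

```python
def residual_helicity(delta_monomer, delta_urea, delta_trimer, min_span=0.5):
    """Per-residue helical population (0..1) from secondary chemical shifts.

    All inputs are sequences of secondary shifts in ppm for the same residues.
    Residues where the two reference states differ by less than `min_span`
    (ppm) carry no usable information and yield None. `min_span` is an
    illustrative cutoff, not a value from the paper.
    """
    populations = []
    for mono, urea, trimer in zip(delta_monomer, delta_urea, delta_trimer):
        span = trimer - urea
        if abs(span) < min_span:
            populations.append(None)
            continue
        fraction = (mono - urea) / span
        # Clamp to the physically meaningful range.
        populations.append(min(1.0, max(0.0, fraction)))
    return populations


if __name__ == "__main__":
    # Hypothetical secondary shifts (ppm) for four residues.
    urea = [0.0, 0.1, -0.1, 0.0]
    trimer = [3.0, 2.9, 3.1, 0.2]
    monomer = [0.6, 0.8, 0.3, 0.1]
    print(residual_helicity(monomer, urea, trimer))
```

Averaging the returned values over the residues of one helix would give a per-helix population of the kind reported as 10 to 20% or 20 to 30% in the text.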

(A) Sections of 2D [15N,1H]-TROSY spectra of [U-2H,15N] Skp(WT) (dark gray) and Skp(A108L) (blue) at a concentration of 1 mM and 37°C in NMR buffer (20 mM MES, pH 6.5, and 150 mM NaCl). NMR signals of the monomeric state of Skp(WT) overlay with those from Skp(A108L). The assignments of the overlapping NMR signals of the monomeric state are indicated in the panel. (B) Residue-specific secondary backbone chemical shifts of Skp(WT) in 8 M urea solution, Skp(A108L) in its monomeric form, and Skp(WT) in its trimeric form. Positive and negative values indicate α-helical and β-sheet secondary structure elements, respectively. The gray-shaded area indicates the positions of helices in the Skp trimer. (C) Percentage of helical population in the conformational ensemble of the Skp monomer. Helical regions with 10 to 20% helicity or 20 to 30% helicity are highlighted with light or dark yellow, respectively. (D) Structural model of the Skp monomer. On a configuration of Skp with α-helices formed, the degree of residual helical population present in the conformational ensemble is indicated. The large majority of monomeric Skp is disordered. (E) Schematic model of coupled oligomerization and folding mechanism of Skp. Monomeric Skp explores an ensemble of conformations with a low propensity for the formation of the arm α-helices. The formation of the oligomerization interface brings the N and C termini together (red arrows), thus stabilizing the coiled-coil structure of the α-helical arms.

It has been previously proposed that the monomeric state of Skp would be well folded rather than disordered (37). That conclusion was obtained from indirect measurements of the molar heat capacity change ΔCp of trimer formation by a van't Hoff analysis of temperature-dependent AUC data, which indicated a value of ΔCp = 0.62 ± 0.11 kcal mol−1 K−1 for the Skp monomer-trimer transition. Because the authors expected a value for a coupled folding and oligomerization of ΔCp = 8.01 ± 3.3 kcal mol−1 K−1, they concluded that only trimerization, but not folding, would take place during oligomerization. To resolve these different views, we determined ΔCp of Skp(WT) directly by differential scanning calorimetry (DSC) to ΔCp = 2.9 ± 0.4 kcal mol−1 K−1 at 37°C (fig. S3). Considering the average residual helicity of 21% in the monomer, this corresponds to a value of 1.1 kcal mol−1 K−1 for folding of one monomer subunit, which is a similar value to proteins of the same size (40, 41). We note that ΔCp is strongly temperature dependent (fig. S3), which may have perturbed the precision of the van't Hoff analysis by Sandlin et al. (37).

Having established that Skp activation comprises an equilibrium between a folded trimer and a disordered monomer, it appears relevant to understand how this equilibrium contributes to Skp chaperone activity. As a model client, we use the native client protein tOmpA, an eight-stranded transmembrane domain of OmpA. tOmpA, when bound to Skp, adopts a conformational ensemble of rapidly reorienting conformers (15). To investigate whether tOmpA binds to the trimeric or the monomeric state of Skp, or to both, we used an activity assay with all mutants. In a first step, we measured the chaperone activity by quantifying the amount of Skp-bound tOmpA. Intriguingly, the activity correlated with the concentration of the trimer for all Skp variants, such that, e.g., Skp(A108L) has around 50% of the Skp(WT) activity and that no chaperone activity could be detected for Skp(V117P) and Skp(A108R) (Fig. 3, A and B, and fig. S4).

(A) Holdase activity of Skp variants as determined by the amount of aggregation-prone tOmpA solubilized in equilibrium. Values are normalized to the activity of Skp(WT). Error bars represent the SD of 15 individual signals of tOmpA. (B) 2D [15N,1H]-TROSY fingerprint spectra of [U-2H,15N]-tOmpA bound to unlabeled Skp(WT) or Skp(A108L). Spectra were recorded at a temperature of 37°C in NMR buffer (20 mM MES, pH 6.5, and 150 mM NaCl). A 1D 1H cross section shows the intensity of alanine-176. (C) Combined amide chemical shift differences between [U-2H,15N]-Skp(WT) and [U-2H,15N]-Skp(A108L) with bound unlabeled tOmpA. The magnitude of 2 SDs [0.053 parts per million (ppm)] is indicated by a dashed line. (D) Structural model of Skp(A108L) with bound tOmpA. Amide groups with chemical shift changes larger than 2 SDs upon binding of tOmpA to Skp(A108L) are marked in light blue. The position of A108 is indicated by a blue circle. (E and F) 2D [15N,1H]-TROSY fingerprint spectra of [U-2H,15N] Skp(A108L) in the absence (E) and presence (F) of unlabeled tOmpA. Spectra were recorded at 37°C in NMR buffer. The spectral area 7.5 to 8.5 ppm in 1H, corresponding to disordered protein states, is indicated by gray lines. 1D 1H cross sections of lysine-141 in the monomeric (M) and trimeric (T) state of Skp are shown, and the relative fractions are indicated.

We then selected Skp(A108L) to characterize structure and arrangement of the tOmpA-Skp(A108L) complex. First, addition of tOmpA to Skp(A108L) increases the apparent molecular mass in SEC-MALS experiments (fig. S4). Second, the 2D [15N,1H]-TROSY NMR spectra of isotope-labeled tOmpA bound to unlabeled Skp(A103L), Skp(A103R), Skp(A108L), or unlabeled Skp(WT) are highly similar (Fig. 3B and fig. S4). Because the chemical shift is a population-weighted average over the individual conformers in the tOmpA ensemble, this observation indicates that the client conformational ensemble inside the chaperone is essentially unperturbed by the local structural adaptations resulting from the mutation A103L, A103R, or A108L. Third, a direct spectral comparison showed that the chemical shift perturbations that occur on the Skp trimeric state upon tOmpA binding are highly similar for Skp(WT), Skp(A103L), Skp(A103R), and Skp(A108L) (Fig. 3, C and D, and fig. S4). As in the apo state, only one set of NMR signals is present for the trimeric state, showing that the complex with tOmpA does not involve other stable oligomeric states (Fig. 4, D to G). Furthermore, for all mutants with a considerable population of the trimeric state, binding of tOmpA induces similar chemical shift perturbations, confirming a similar mode of binding (Fig. 3, C and D, and fig. S3). As a consequence, the structural description that was previously established for the Skp-tOmpA complex (15) can be assumed in good first-order approximation also for Skp(A108L), although the thermodynamics and kinetics of the ensemble are somewhat different (Fig. 3, B to D).

(A) Fitness of Salmonella strains with various chromosomal skp mutations in rich lysogeny broth. Data for individual cultures and means are shown. (B) Fitness of Salmonella strains in a mouse infection model. Each circle represents data for one mouse from a total of two independent infection experiments (****P < 0.0001 and ***P < 0.001; statistical significance of difference to values for WT based on t test with Holm-Šidák correction for multiple comparisons). Corresponding competitive index data are shown in fig. S4. (C) Functional cycle of Skp. In the absence of client proteins, Skp populates the periplasm in monomeric form up to low micromolar concentrations. These partially disordered monomers are functionally inactive. An emerging Omp client at the inner membrane recruits an active trimeric chaperone from the ensemble equilibrium. Upon release of the client, trimeric Skp dissociates and the monomers enter the pool of inactive disordered conformations. See text for details.

Then, we investigated the effect of tOmpA binding on the Skp monomer-trimer equilibrium at a temperature of 37°C, where Skp(A108L) is more than 80% in its monomeric state and Skp(A108R) and Skp(V117P) are completely monomeric (Fig. 3, E and F, and fig. S4). For Skp(A108L), binding of tOmpA resulted in a strong shift of the population levels from the monomeric toward the trimeric state, while no change was observed for Skp(A108R) and Skp(V117P) (Fig. 3, E and F, and fig. S4). Furthermore, for all Skp variants with considerable population of the monomeric state, the NMR signal positions of the monomeric state were not perturbed by the addition of tOmpA, confirming that there is no detectable interaction between monomeric Skp and the Omp client (fig. S3). This provides additional evidence that only the structured trimer, but not the disordered monomer, has chaperone activity.

Because a bound tOmpA client is in direct contact with all three arms of Skp simultaneously (15), client binding contributes by avidity to the thermodynamic stability of the trimeric state of the chaperone. We quantified the difference in free energy of apo-Skp(WT) in comparison to tOmpA-Skp(WT) by a denaturation titration (fig. S4). Binding of tOmpA to Skp(WT) increased its stability by 1.7 kJ mol−1, corroborating the stabilization effect of the trimeric state by the binding of its client protein. Overall, the data show that monomeric, disordered Skp does not interact with the Omp client and that client binding increases the stability of the Skp trimer by avidity, thus shifting the conformational equilibrium toward the trimeric state.

Skp is dispensable for growth of various bacterial species under rich laboratory conditions. However, bacterial pathogens such as Yersinia and Salmonella require Skp for growth in hostile host tissue. To determine whether the Skp activation mechanism that we identified is important under these physiologically relevant conditions, we engineered analogous point mutants in Salmonella enterica serovar Typhimurium. Salmonella Skp is highly homologous to E. coli Skp, with 91% identity (fig. S4). We selected three of the mutations for these experiments, the two mildest ones A103R and A103L, as well as V117P, and also engineered a strain with complete genetic deletion of the skp gene (Δskp). As expected, neither the point mutants nor a full skp deletion affects Salmonella fitness in rich lysogeny broth (Fig. 4A and Table 1). We then tested the same mutants in competitive infections in a mouse typhoid fever model. In competitive infections, mice are infected with a mixture of WT and mutant strains. Plating of bacteria retrieved from spleen of these mice yields the fitness of mutants relative to the WT bacteria in each mouse. This approach reduces interindividual variance and offers higher statistical power with limited numbers of experimental animals compared to single-strain infections. The data reveal a slight but significant fitness defect of Salmonella skp(A103L) compared to WT and strong fitness defects for mutants skp(A103R) and skp(V117P), which are comparable to the full skp deletion (Fig. 4B and Table 1; competitive index data in fig. S4). These results show that even subtle perturbations of the Skp monomer-trimer equilibrium diminish Skp function in vivo and that perturbation of this equilibrium by less than an order of magnitude in C0.5 completely abolishes Skp function, rendering bacteria nonvirulent.

In this work, we have elucidated the activation mechanism of the molecular chaperone Skp at atomic resolution. The monomer state of Skp is intrinsically disordered, with a limited residual propensity of α-helicity in the coiled-coil tentacle arms. This low inherent stability of helices 2 and 3 is particularly interesting, because they are not involved in inter-subunit contacts in the trimer structure. The formation of the head domain trimer merely fixes the positions of the end points of the α-helices in space, thus stabilizing them by reducing the conformational entropy of the unfolded state. This unique mechanism resembles a stapling of the coiled-coil helices to the barrel in the head domain. A directly related effect is being exploited in peptide chemistry to stabilize helical conformation of small peptides by a suitably chosen covalent circularization, the so-called stapled peptides (42). Furthermore, because the tOmpA client is in simultaneous direct contact with all three Skp subunits, its binding stabilizes and shifts the oligomer equilibrium of Skp toward the trimeric state. Last, the disordered Skp monomer does not exhibit chaperone activity.

These mechanistic insights integrate into an improved picture of the functional cycle of Skp in the bacterial periplasm (Fig. 4C). Monomeric, disordered Skp molecules populate the periplasmic space. As soon as a client protein emerges from the Sec translocase, the inactive monomers fold and assemble into a trimeric state around the unfolded client protein. Skp directly or indirectly transports the chain to the Bam complex for folding and insertion in the membrane and possibly also to DegP for degradation (27, 43). The exact mechanism of client release is not understood, but besides direct migration to a higher-affinity target, one exciting possibility to enhance the release kinetics could be a destabilization of the oligomeric state of Skp or a stabilization of the monomeric state of Skp in the vicinity of the downstream receptor of the substrate. This may include negative charges on membranes or BamA (36, 44, 45). After client release, the disordered Skp monomers enter the periplasmic reservoir of individually inactive chaperones. The absence of a chaperoning activity of the monomer ensures that only Skp molecules with complete cavity bind clients, providing maximal chaperoning effect in an all-or-none fashion. At the same time, it introduces a directionality of the chaperoning effect toward the center of the cavity, avoiding spurious binding effects that would not be directed into the Skp cavity. These could potentially destabilize periplasmic proteins that are not intended client proteins. Last, the disordered nature of monomeric Skp might facilitate its import into the periplasm through the Sec complex upon its own biogenesis. Additional impact for this type of activation mechanism comes from a direct comparison to the activation mechanism of the chaperone SurA (18). SurA is constitutively active with just a dynamic modulation of its activity upon rotation of a domain connected by linkers to its chaperone core, i.e., its activity is only weakly regulated (18). 
Skp activity, in turn, is strongly regulated, with a switch between a completely inactive and an active state, as shown in this work. This stark difference matches a fundamental difference in function of these two periplasmic chaperones. Skp has high affinity for its client proteins and a strong tendency to prevent their folding and therefore presumably needs to be tightly regulated to avoid unspecific chaperone activity under no-stress conditions, whereas SurA binds unfolded OMPs with lower affinity while promoting their folding and therefore presumably does not require a strong regulation of its chaperone activity (15, 46–49).

The Skp activation mechanism provides an elegant example of how a chaperone can regulate its functional cycle in an environment depleted of any source of energy. For ATP-independent chaperones, only three types of activation mechanisms have so far been described: an order-to-disorder transition [Hsp33 (10), HdeA (11), and HdeB (12)], oligomer disassembly [sHSP (6) and TF (7–9)], and no or minor conformational change [Spy (13, 14), HSP40 (16), SecB (17), and SurA (18)]. Skp is the first chaperone found to feature these activation mechanisms in the opposite direction and even combine them, i.e., by a disorder-to-order transition that is coupled to oligomerization. The high (nM) affinity that Skp has for its client proteins and the strong tendency to prevent their folding could represent a potential hazard to the cell (15, 49). The coupled folding and oligomerization mechanism ensures that holdase function is only present in the trimer where it is geometrically oriented only toward the chaperone cavity. Under nonstressed conditions, Skp exists as an inactive disordered monomer with a minor population of active folded trimer to avoid detrimental effects for the cells. Conversely, under stress conditions, up-regulation of the Skp concentration and binding to client proteins shift the equilibrium toward the trimeric folded active state, protecting the cells by preventing aggregation of unfolded protein. While most chaperones use strategies to cover a preexisting client binding site in their inactive state, Skp has thus evolved a more extreme mechanism where the client binding area exists only in the active state. This strong regulation allows the tight control of Skp activity while providing at the same time a fast mechanism for client release upon dissociation into the disordered monomeric state. The chaperone activity of Skp is thus regulated in dynamic response to chaperone concentration and client availability.

Skp, lacking its signal sequence, was cloned from genomic DNA through Nde I and Xho I into the pET28b expression vector (Novagen) containing a thrombin-cleavable N-terminal His6-tag (15). Skp was expressed in BL21-(DE3)-Lemo cells [New England Biolabs (NEB)] transformed with the Skp plasmid and grown at 37°C in M9 minimal medium containing kanamycin (30 μg/ml) to OD600 (optical density at 600 nm) = 0.6, and then the expression was induced by adding 0.4 mM isopropyl-β-d-thiogalactopyranoside (IPTG) at 25°C for 12 hours. Uniformly [2H, 13C, 15N]-labeled protein was prepared by growing cells in D2O-based M9 minimal medium, with 1 g of 15NH4Cl and 2 g of [U-13C,2H] glucose per liter of medium. Cells were harvested by centrifugation at 5000g for 20 min. The pellet was resuspended in 20 ml of lysis buffer A per liter of culture [20 mM tris (pH 7.5), 500 mM NaCl, deoxyribonuclease (DNase) (0.01 mg/ml), ribonuclease (RNase) (0.02 mg/ml), and inhibitor cocktail (cOmplete EDTA-free protease inhibitor; Roche)]. Cell lysis was performed using a microfluidizer (Microfluidics) for three cycles at 4°C. The soluble bacterial lysate was separated from cell debris and other components by centrifugation at 14,000g for 60 min and loaded onto a Ni-NTA (nitrilotriacetic acid) column (Qiagen). Skp eluted at 250 mM imidazole concentration and was dialyzed against buffer [20 mM tris (pH 7.5) and 500 mM NaCl] overnight to remove the imidazole. In a final step, a size exclusion chromatography (Superdex-200 16/600 PG) step was applied to further purify the proteins and adjust the protein to its final buffer [20 mM MES (pH 6.5) and 150 mM NaCl]. Note that the His6-tag was consistently not cleaved from all Skp constructs, because in both our hands and published work by others (37), the presence of the His6-tag was found to not change the monomer-trimer equilibrium constant and because monomeric, disordered Skp was found to be sensitive to proteolytic degradation. 
Afterward, Skp was concentrated by ultrafiltration and stored at −20°C until use. Final yield of purified protein was 25 mg for Skp(WT) and mutants per liter of deuterated M9 minimal medium.

The transmembrane domain of OmpA (residues 1 to 177) was cloned through Nco I and Xho I into the pET28b expression vector without any affinity tag and lacking its signal sequence (15). BL21-(DE3)-Lemo cells (NEB) were transformed with the tOmpA expression plasmid and grown at 37°C in medium containing kanamycin (30 μg/ml) to OD600 = 0.8. Expression was induced by 1 mM IPTG. Cells were harvested 4 hours after induction and resuspended in 20 ml of buffer B per liter of culture (20 mM tris-HCl and 5 mM EDTA, pH 8.5). Cell lysis was performed using a microfluidizer (Microfluidics) for three cycles at 4°C. Purification from inclusion bodies was done as described (50). The ion-exchange elution fractions containing tOmpA were pooled and dialyzed against buffer B. The precipitate was resuspended in 6 M Gdm/HCl and stored at −20°C until usage. Final yield of purified protein was 50 mg of tOmpA per liter of deuterated M9 minimal medium.

The QuikChange II mutagenesis protocol (Stratagene) was used to introduce the mutations A108L, A108R, A103L, A103R, or V117P into Skp. Polymerase chain reaction (PCR) primers (Table 2) were obtained from Microsynth. The expression and purification of the mutant proteins was performed as described for the WT proteins. The final yield of purified mutants was similar to WT.

Salmonella strains used in this study were based on S. enterica serovar Typhimurium SL1344 hisG xyl (51, 52). Salmonella mutants with gene deletions were obtained by two consecutive single crossovers with positive selection for resistance to kanamycin and negative selection for levansucrase-mediated sensitivity to sucrose. Salmonella was grown in lysogeny broth containing NaCl (5 g/liter; Lennox LB). Each strain was transformed with a low-copy plasmid expressing a distinct fluorescent protein (mtagBFP2, mNeonGreen, YPet, or mCherry). These plasmids have no impact on in vivo fitness (53, 54). All animal experiments were approved (license 2239, Kantonales Veterinäramt Basel) and performed according to local guidelines (Tierschutz-Verordnung, Basel) and the Swiss animal protection law (Tierschutz-Gesetz). Eight 10- to 16-week-old female BALB/c mice (Charles River Laboratories) were infected by tail vein injection of mixtures containing WT Salmonella and different combinations of three mutants with about 1000 colony-forming units (CFU) each per strain. The exact inoculum size for each strain was determined by plating. After 4 days, mice were euthanized with carbon dioxide, and Triton X-100 detergent-treated spleen homogenates were prepared as described previously (55). Total Salmonella loads were determined by plating dilution series on agar plates. Mutant-to-WT ratios were determined by flow cytometry counting of bacterial cells falling into gates indicative for the various fluorescent proteins using optical filters (55). Fitness was calculated as log2(FI), with FI corresponding to the fold increase starting from the initial spleen colonization [around 20% of the inoculum (56)] to the final spleen load for each strain. The relative fitness value of co-administered WT Salmonella was set to 100%. We also determined the more commonly used readout competitive index by dividing the output ratio (mutant/WT) by the inoculum ratio (mutant/WT).
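
The two readouts defined above can be written out as a short sketch. The function names and the CFU numbers in the example are hypothetical; the 20% colonization fraction is the approximate value quoted in the text, and expressing relative fitness as the ratio of mutant to WT log2(FI) is an assumption about how the WT value was "set to 100%".

```python
import math


def fitness(inoculum_cfu, final_spleen_cfu, colonization_fraction=0.2):
    """Fitness as log2(FI), with FI the fold increase from the initial
    spleen colonization (a fraction of the inoculum) to the final load."""
    initial_spleen_cfu = colonization_fraction * inoculum_cfu
    return math.log2(final_spleen_cfu / initial_spleen_cfu)


def relative_fitness_percent(mutant_fitness, wt_fitness):
    """Mutant fitness relative to co-administered WT (WT = 100%).
    Assumption: normalization by a simple ratio of log2(FI) values."""
    return 100.0 * mutant_fitness / wt_fitness


def competitive_index(output_mutant, output_wt, inoculum_mutant, inoculum_wt):
    """Output ratio (mutant/WT) divided by inoculum ratio (mutant/WT)."""
    return (output_mutant / output_wt) / (inoculum_mutant / inoculum_wt)


if __name__ == "__main__":
    # Hypothetical CFU counts for one mouse.
    wt = fitness(inoculum_cfu=1000, final_spleen_cfu=2.0e5)
    mutant = fitness(inoculum_cfu=1000, final_spleen_cfu=2.5e4)
    print(f"WT fitness      : {wt:.2f}")
    print(f"mutant fitness  : {mutant:.2f}")
    print(f"relative fitness: {relative_fitness_percent(mutant, wt):.1f}%")
    print(f"competitive idx : {competitive_index(2.5e4, 2.0e5, 1000, 1000):.3f}")
```

Because both strains are measured in the same animal, the ratio-based readouts cancel much of the mouse-to-mouse variation in total bacterial load.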

Complex assembly was carried out following a modified version of the protocol published by Burmann et al. (15). A 1.5 M excess of denatured tOmpA was added to Skp(WT) or mutants in 20 ml of assembly buffer [20 mM MES (pH 6.5) and 150 mM NaCl] in a dropwise fashion under continuous stirring. The solution was then stirred for another 1 hour to ensure saturation of the chaperones. After centrifugation at 10,000g for 30 min, the supernatant fraction, containing the Skp-tOmpA complexes, was separated from the pellet, containing the precipitated tOmpA. The supernatant was exchanged by ultrafiltration to NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl], and after concentration, the volume was adjusted to 250 µl. The chaperone activity of Skp(WT) and mutants was determined by quantifying the NMR signals in 2D [15N,1H]-TROSY spectra of [U-2H,15N]-tOmpA bound to unlabeled Skp. A control sample of [U-2H,15N]-tOmpA in NMR buffer was prepared following the reference protocol, showing that, in the absence of functional Skp(WT), less than 2% of the [U-2H,15N]-tOmpA signals were observed in comparison to [U-2H,15N]-tOmpA bound to Skp(WT).

All NMR experiments for Skp-Omp complexes were performed in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl]. The experiments were recorded at the specified temperature on a Bruker AscendII 700 MHz or Avance 800 MHz spectrometer running Topspin 3.0 and equipped with a cryogenically cooled triple-resonance probe. For the sequence-specific backbone resonance assignment of [U-99% 2H, 13C, 15N]-Skp(A108L), the following NMR experiments were recorded at 37°C: 2D [15N,1H]-TROSY, 3D TROSY-HNCA, 3D TROSY-HNCACB, 3D TROSY-HNCO, and 3D TROSY-HN(CA)CO. NMR data were processed with nmrPipe (57) and analyzed with CARA and ccpnmr (58). Secondary chemical shifts were calculated relative to the random-coil values of Kjaergaard and Poulsen (59). For the backbone assignment of the unfolded [U-2H,15N,13C]-Skp(WT), automated projection spectroscopy (APSY) experiments were recorded in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl] containing 8 M urea at 15°C. The 5D APSY-HNCOCACB (60) was recorded with 54 transients for Skp, two scans per transient, 0.7-s recycle delay, and 1024 × 150 complex points in the direct and indirect dimensions, respectively. The 4D APSY-HNCACB (60) was recorded with 46 transients, two scans per transient, 0.7-s recycle delay, and 1024 × 180 complex points in the direct and indirect dimensions, respectively. The GAPRO (geometric analysis of projections) (60) analysis of the projection spectra was carried out with Δν_min = 5.0 Hz, R_min = 15.0 Hz, S/N = 7.0, and S_min,1 = S_min,2 = 8 for the 5D APSY-HNCOCACB and with Δν_min = 5.0 Hz, R_min = 15.0 Hz, S/N = 10.0, and S_min,1 = S_min,2 = 15 for the 4D APSY-HNCACB. As the signals for glycine residues within the 4D APSY-HNCACB and the signals of residues succeeding glycines within the 5D APSY-HNCOCACB have a different sign than the other resonances, the GAPRO algorithm was run twice, once for positive and once for negative peaks, and the two resulting peak lists were combined. 
The combined peak lists were assigned by using the newest version of the MATCH algorithm within the UNIO10 software package, yielding a 65% complete assignment for Skp. By using a conventional 3D TROSY-HNCACB experiment, complete backbone assignment for Skp could be obtained. NMR data were processed using PROSA (61) and analyzed with CARA and XEASY. Combined chemical shift differences of the amide resonances in 2D [15N,1H]-TROSY spectra were calculated as
$$\Delta\delta(\mathrm{HN}) = \sqrt{\left(\Delta\delta(^{1}\mathrm{H})\right)^{2} + \left(0.2\,\Delta\delta(^{15}\mathrm{N})\right)^{2}} \tag{1}$$
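As a worked illustration of Eq. 1, the following minimal Python sketch combines a proton and a nitrogen shift difference using the 0.2 weighting stated in the text. The function name and the example shift values are hypothetical.

```python
import math

def combined_shift_difference(delta_h_ppm, delta_n_ppm, n_weight=0.2):
    """Combined amide chemical shift difference (Eq. 1), in ppm.

    delta_h_ppm: 1H shift difference; delta_n_ppm: 15N shift difference.
    n_weight=0.2 is the scaling factor applied to the 15N term in the text.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)

# Hypothetical residue: 0.03 ppm shift in 1H and 0.20 ppm shift in 15N.
print(combined_shift_difference(0.03, 0.20))
```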

SEC-MALS measurements of Skp were performed at 25°C in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl] using a GE Healthcare Superdex-200 Increase 10/300 GL column on an Agilent 1260 high-performance liquid chromatography system. Elution was monitored using an Agilent multi-wavelength absorbance detector (data collected at 280 and 254 nm), a Wyatt Heleos II 8+ multiangle light-scattering detector, and a Wyatt Optilab rEX differential refractive index detector. The column was equilibrated overnight in the running buffer to obtain stable baseline signals from the detectors before data collection. Inter-detector delay volumes, band broadening corrections, and light-scattering detector normalization were calibrated using an injection of bovine serum albumin solution (2 mg/ml; ThermoPierce) and standard protocols in ASTRA 6. Weight-averaged molar mass, elution concentration, and mass distributions of the samples were calculated using the ASTRA 6 software (Wyatt Technology).

DSC data were acquired using a Microcal VP-Capillary DSC instrument (Malvern Panalytical, Malvern, UK) at a Skp trimer concentration of 24.4 µM (i.e., 73 µM concentration in terms of monomer). After centrifugation, protein concentration was determined by ultraviolet spectrophotometry using a molar extinction coefficient of 4470 M⁻¹ cm⁻¹ at 280 nm for the trimer and correcting for minor scattering contributions to apparent absorbance. The Skp sample was scanned from 15 to 105°C at a scan rate of 1°C/min, and data points were acquired at 0.1°C increments. Multiple buffer versus buffer scans, performed before the sample scan to establish the instrumental heat capacity baseline, were averaged and subtracted from the sample scan data, which were then normalized to excess molar heat capacity using the trimer concentration. Attempts to fit the complex thermogram with standard models of oligomer dissociation and denaturation proved unsuccessful, so ΔCp for folding was estimated from the difference between the slopes of the excess molar heat capacity in low- and high-temperature regions (the apparent pre- and post-transition baselines), fitted by linear regression, and extrapolated to the temperature of interest.

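The chemical equilibrium between trimeric and monomeric Skp can be described by the reaction
$$3\,S_m \rightleftharpoons S_t \tag{2}$$
where the equilibrium constant $L_{13}$ is given by
$$L_{13} = \frac{[S_t]}{[S_m]^{3}} \tag{3}$$
where $[S_t]$ and $[S_m]$ are the molar concentrations of Skp trimers and free Skp monomers, respectively, and $L_{13}$ has units of M⁻². For this equilibrium, the concentration of trimer $[S_t]$ as a function of total Skp $[S_0]$ is given by Sandlin et al. (37)
$$[S_t]([S_0]) = \frac{[S_0]}{3} + \left(\sqrt{\gamma^{2} - \beta^{3}} + \gamma\right)^{1/3} + \left(-\sqrt{\gamma^{2} - \beta^{3}} + \gamma\right)^{1/3} \tag{4}$$
where $\alpha$, $\beta$, and $\gamma$ are given by
$$\alpha = \frac{9 L_{13} [S_0]^{2} + 1}{L_{13}} \tag{5}$$
$$\beta = \frac{[S_0]^{2}}{9} - \frac{\alpha}{81} \tag{6}$$
$$\gamma = \frac{[S_0]^{3}}{18} - \frac{\alpha [S_0]}{162} \tag{7}$$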

The fraction of total Skp protein that is trimeric at any total Skp concentration equals
$$f_{S_t} = \frac{3[S_t]}{[S_0]} \tag{8}$$
and the fraction of total Skp protein that is monomeric equals
$$f_{S_m} = 1 - f_{S_t} \tag{9}$$
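The closed-form solution in Eqs. 4 to 9 can be checked numerically. The sketch below is a minimal Python illustration, not the authors' fitting code: it evaluates the trimer concentration from the total concentration and the equilibrium constant, then confirms that the result satisfies the mass-action law of Eq. 3 together with the mass balance [S0] = [Sm] + 3[St]. The function names and the example value L13 = 1e10 M⁻² are hypothetical.

```python
import math

def real_cbrt(x):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def trimer_concentration(s0, l13):
    """Trimer concentration [St] (M) from total Skp [S0] (M) and L13 (M^-2), Eqs. 4 to 7."""
    alpha = (9.0 * l13 * s0 ** 2 + 1.0) / l13
    beta = s0 ** 2 / 9.0 - alpha / 81.0
    gamma = s0 ** 3 / 18.0 - alpha * s0 / 162.0
    root = math.sqrt(gamma ** 2 - beta ** 3)
    return s0 / 3.0 + real_cbrt(root + gamma) + real_cbrt(-root + gamma)

def trimer_fraction(s0, l13):
    """Fraction of total Skp protein that is trimeric (Eq. 8)."""
    return 3.0 * trimer_concentration(s0, l13) / s0

# Hypothetical example: 10 uM total Skp and L13 = 1e10 M^-2.
s0, l13 = 1e-5, 1e10
st = trimer_concentration(s0, l13)
sm = s0 - 3.0 * st                      # mass balance gives the free monomer
print(st, l13 * sm ** 3)                # both sides of Eq. 3 rearranged for [St]
print(trimer_fraction(s0, l13), 1.0 - trimer_fraction(s0, l13))
```

A real cube root is used because one of the two bracketed terms in Eq. 4 is negative for physical parameter values, and raising a negative float to the power 1/3 in Python returns a complex number.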

In SEC-MALS experiments in equilibrium situations, the detected molar mass represents the concentration-weighted average mass of the species involved
$$M_w = \frac{\sum_i c_i M_i}{\sum_i c_i} \tag{10}$$
where $c_i$ is the mass concentration and $M_i$ is the molar mass of the ith species. Therefore, for the monomer-trimer equilibrium, by comparison with the limits for the completely monomeric or trimeric species, the weight-averaged mass reports directly on the fractional populations as
$$f_{S_m} = \frac{M_{obs} - M_{S_t}}{M_{S_m} - M_{S_t}} \tag{11}$$
where $M_{obs}$ is the detected weight-averaged mass, and $M_{S_m}$ and $M_{S_t}$ are the detected masses of the completely monomeric and trimeric state, respectively.
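A brief Python sketch of Eqs. 10 and 11 follows. The masses of 16 and 48 kDa are hypothetical placeholders for the monomer and trimer limits, not values taken from the measurements.

```python
def weight_averaged_mass(mass_concentrations, molar_masses):
    """Concentration-weighted average molar mass (Eq. 10)."""
    weighted = sum(c * m for c, m in zip(mass_concentrations, molar_masses))
    return weighted / sum(mass_concentrations)

def monomer_fraction_from_mass(m_obs, m_monomer, m_trimer):
    """Monomeric fraction of total protein from the observed mass (Eq. 11)."""
    return (m_obs - m_trimer) / (m_monomer - m_trimer)

# Hypothetical limits: 16 kDa monomer and 48 kDa trimer.
# A sample with 25% of its protein mass as monomer and 75% as trimer:
m_obs = weight_averaged_mass([0.25, 0.75], [16.0, 48.0])
print(m_obs)
print(monomer_fraction_from_mass(m_obs, 16.0, 48.0))
```

Because Eq. 10 weights by mass concentration, the fraction recovered by Eq. 11 is a fraction of total protein, which is the same quantity defined in Eqs. 8 and 9.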

For the estimation of the population of monomeric and trimeric states for the WT and mutants, the residue lysine-141 was chosen, because its signals are well resolved in each state and it is located in a nonstructured, locally flexible region in the trimer. The fractions were estimated by calculating the ratio of the intensity of the signals in the monomeric and trimeric state according to the equation
$$f_{S_m} = \frac{I_{S_m}}{I_{S_m} + I_{S_t}} \tag{12}$$
where $I_{S_m}$ and $I_{S_t}$ are the intensities of the residue lysine-141 signal in the monomeric and trimeric state, respectively. Similarly, for the denaturation titration, the fractions of folded and unfolded Skp were determined from the signals of residue lysine-141, and for each titration point, ΔG was calculated assuming a two-state model according to the equation
$$\Delta G = -RT \ln\frac{f_{S_t}}{f_{S_m}} \tag{13}$$

The data were fitted by linear regression, and ΔG was extrapolated to a concentration of 0 M urea.
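The two-state analysis and the linear extrapolation can be sketched in Python as follows. This is an illustration only: the temperature, urea concentrations, and synthetic free energies are hypothetical, and the least-squares fit is a plain textbook implementation rather than the software used in the study.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def delta_g(f_folded, f_unfolded, temperature_k):
    """Two-state free energy (kJ/mol) from fractional populations (Eq. 13)."""
    return -R_KJ * temperature_k * math.log(f_folded / f_unfolded)

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def folded_fraction(dg, temperature_k):
    """Helper for synthetic data: folded fraction implied by a given delta G."""
    k = math.exp(-dg / (R_KJ * temperature_k))
    return k / (1.0 + k)

# Synthetic, hypothetical titration: delta G = -10 + 3 * [urea] kJ/mol at 310.15 K.
temperature_k = 310.15
urea_molar = [1.0, 2.0, 3.0, 4.0]
fractions = [folded_fraction(-10.0 + 3.0 * u, temperature_k) for u in urea_molar]
dg_values = [delta_g(f, 1.0 - f, temperature_k) for f in fractions]
slope, dg_at_zero_urea = linear_fit(urea_molar, dg_values)
print(slope, dg_at_zero_urea)  # the intercept is the extrapolated value at 0 M urea
```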

Acknowledgments: We thank C. Johnson for help in setting up the DSC experiments and the Biophysics Facility of the MRC Laboratory of Molecular Biology, Cambridge, for access to the DSC instrument. Funding: This work was supported by the Swiss National Science Foundation (grants 310030B_185388 and 407240_167125 to S.H. and 310030_182315 to D.B.). Author contributions: G.M., S.H., and D.B. designed the study, analyzed the data, discussed the results, and wrote the paper. G.M. and T.S. conducted the SEC-MALS experiments. T.S. conducted the DSC experiment. B.M.B. conducted the assignment of urea-unfolded Skp(WT). B.C. engineered the Salmonella mutants and conducted the mouse experiments. G.M. conducted all other experimental work. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. Sequence-specific resonance assignments have been submitted to the Biological Magnetic Resonance Data Bank under the following accession codes: Skp(WT) in 8 M urea, 26613; monomeric Skp(A108L), 50195.

See the article here:
Regulation of chaperone function by coupled folding and oligomerization - Science Advances

Discovery of a previously unknown biosynthetic capacity of naringenin chalcone synthase by heterologous expression of a tomato gene cluster in yeast -…

INTRODUCTION

Plant specialized metabolism is a rich source of structurally and functionally diverse small molecules, also known as plant natural products. These specialized metabolites play important roles in plant communication and defense and have been widely applied as phytomedicines, antibiotics, antivirals, nutraceuticals, and cosmetics (1, 2). Recent developments in synthetic biology and metabolic engineering have enabled the assembly and expression of plant genes in heterologous hosts as a sustainable and efficient alternative for production of complex chemicals, including plant natural products and their synthetic derivatives (3, 4). However, the broader potential of these engineering efforts is challenged partially due to our limited knowledge of plant biosynthetic pathways and associated enzyme activities.

The elucidation of plant specialized metabolic pathways has been challenging, particularly in comparison to the elucidation of natural product pathways in microbes. In part, this has been due to the differences in the genomic organization of these pathways, where the genes encoding the biosynthetic pathway in plants are generally dispersed across the plant genome, whereas, in contrast, those in microbes tend to be tightly clustered in operons. However, recent work has revealed that certain genes constituting a number of plant natural product pathways are colocalized in the genome in operon-like structures. These plant biosynthetic gene clusters range from ~35 to several hundred kilobases (i.e., 3 to more than 10 genes) in size (5) and comprise genes that are physically colocalized and potentially coregulated. These gene clusters encode species-specific and/or specialized biochemical pathways modifying metabolites from primary metabolism, contributing to the vast chemical space present in the plant kingdom (6). Characterization of putative gene cluster activities and their resulting products assisted by genome mining and analytical chemistry may thus provide an abundant source for the discovery of enzyme activities and compound structures (7, 8).

Gene cluster prediction in plants has been challenging because plant genomes are larger than those of bacteria and fungi, and plant genes are sparsely distributed along the genome, separated by a substantial amount of intergenic, noncoding sequences (7). A general approach for identifying plant gene clusters involves defining a cluster core by searching for backbone-generating enzymes, e.g., nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), hybrid NRPS-PKS, and terpene synthase, from genome sequences and then expanding the cluster components based on catalytic domain analysis, physical colocalization, gene coexpression, and/or shared regulatory patterns (7, 8). Recently developed cluster-mining algorithms such as PhytoClust (9), PlantiSMASH (10), and PlantClusterFinder (11) have demonstrated automated detection of hundreds to thousands of putative gene clusters from various plant genomes.

Despite the increasing number of putative plant biosynthetic gene clusters arising from computational prediction tools, characterization of the potential functionality of these clusters and associated enzymes in their host organisms has been limited. In particular, in planta pathway characterization can be hindered by cryptic pathway gene expression, low concentrations of targeted compounds embedded in complex mixtures, and difficulties in genetically manipulating the native host for cluster activation (7). Facilitated by well-developed tools for genetic manipulation and pathway expression, baker's yeast (Saccharomyces cerevisiae) has proven to be a powerful platform for expression of heterologous gene clusters. Previous research has used yeast to characterize the biosynthetic activities of several gene clusters from various plant species, including triterpene biosynthetic clusters from Arabidopsis thaliana (12), a 10-gene noscapine-producing cluster from poppy (Papaver somniferum) (13), partial pathway genes for vinblastine and vincristine biosynthesis from Madagascan periwinkle (Catharanthus roseus) (14), cucurbitacin from cucurbit (Cucurbitaceae) (15), and a cyanogenic glycoside biosynthetic cluster from sorghum (Sorghum bicolor) (16). In these earlier studies, the previously identified plant gene clusters were heterologously expressed in yeast to validate the production of the compounds as expected from their plant hosts.

In this work, we use yeast as a plant natural product discovery platform to characterize the biosynthetic potential of a putative tomato gene cluster predicted from PlantClusterFinder (11), the activity of which has not been reported previously. By coexpressing the cluster genes with an early-step flavonoid pathway gene in yeast, we identified two previously unknown compounds in the yeast culture when fed p-coumaric acid, specifically 3-hydroxyanthranilic acid (3-HAA) methyl ester (1) and a hydroxycinnamic acid amide (HCAA) compound, dihydro-coumaroyl anthranilate amide (2) (Fig. 1A). Further analysis confirmed that a methyltransferase (SlMT2) catalyzes the conversion of 3-HAA, a native yeast metabolite involved in tryptophan metabolism, to (1), and a naringenin chalcone synthase (SlCHS) catalyzes the condensation of (1) and p-dihydro-coumaroyl-coenzyme A (CoA), reduced from p-coumaroyl-CoA by a yeast endogenous enoyl-CoA reductase (ECR), leading to production of (2). Knocking out the native ECR in yeast restored the production of an oxidized form of (2), coumaroyl anthranilate amide (3). Our characterization results reveal a previously uncharacterized amide synthesis activity for SlCHS. In vivo site-directed mutagenesis results suggest that SlCHS uses the same active site for synthesis of (3) and for canonical synthesis of naringenin chalcone. Our work demonstrates the potential of yeast as a characterization tool for computationally aided discovery of compound structures and enzymatic activities from plant genomes.

Fig. 1. (A) Discovery of two previously unidentified compound structures by heterologous expression of genes from the tomato cluster in yeast. Gene color: red, putative gene cluster; white, plant flavonoid pathway. (B) Validation of (1) and (2) production in yeast. CEN.PK2, wild-type yeast strain; CSY1210, strain expressing SlCHS, SlCYP, and SlMT1/2/3. (C) Characterization of (1) and (2) production with individual tomato methyltransferases in yeast. SlCHS and SlCYP are coexpressed with SlMT1 (CSY1301), SlMT2 (CSY1302), or SlMT3 (CSY1303). (D) Summary of compound production with SlMT1/2/3. (E) Proposed pathway for biosynthesis of (1) and (2) in yeast. Enzyme color: red, tomato; yellow, yeast. (F) Proposed activity of SlCHS in TSC13 knockout strains. (G) Summary of compound production by TSC13 knockout strains. TIC, total ion chromatogram; EIC, extracted ion chromatograms; ** indicates a thorough MS scan from m/z 10 to 168.0 or 316.1. +/− indicates the presence/absence of a gene or a gene fragment. Data show the mean of two biologically independent replicates, with error bars indicating the SD. Compound color: purple, (1) methyl 3-hydroxyanthranilic acid; blue, (2) dihydro-coumaroyl anthranilate amide; green, (3) coumaroyl anthranilate amide. Enzyme abbreviations: SlMT2, methyltransferase 2; Sl4CL, 4-coumarate-CoA ligase; SlCHS, naringenin chalcone synthase; ATR1, NADPH-cytochrome P450 reductase 1; ECR, enoyl-CoA reductase.

Our study investigated the biosynthetic potential of a tomato-derived putative gene cluster that was predicted to produce hydroxylated naringenin chalcone and/or methyl esters of hydroxylated naringenin chalcone, natural compounds that are found in tomato but without an elucidated pathway for biosynthesis (11). The putative tomato gene cluster predicted from PlantClusterFinder [referred to as C584_4 (11)] consists of a CHS (SlCHS, SOLYC09G091510), a putative cytochrome P450 (SlCYP, SOLYC09G091570), and three methyltransferases (SlMT1/2/3; SOLYC09G091530, SOLYC09G091540, and SOLYC09G091550). SlCHS is a well-studied type III PKS, which is known to sequentially condense one p-coumaroyl-CoA and three malonyl-CoA molecules to make naringenin chalcone, the first committed intermediate in the biosynthesis of flavonoids and anthocyanins (17). Among the three methyltransferases, SlMT3 was previously characterized as a putative salicylic acid methyltransferase potentially regulating tomato hormone emission (18). To our knowledge, no studies have been reported characterizing SlMT1, SlMT2, and SlCYP from the cluster.

We examined the biosynthetic capacity of the predicted tomato gene cluster in yeast. Yeast expression cassettes for complementary DNAs encoding the five genes identified in the cluster (SlCHS, SlCYP, and SlMT1/2/3) were designed and assembled into a yeast artificial chromosome and transformed into a wild-type yeast strain (CEN.PK2), resulting in yeast strain CSY1210. Two additional enzymes supporting the putative pathway enzymes were expressed in CSY1210 from low-copy plasmids: (i) a yeast codon-optimized 4-coumarate-CoA ligase from tomato (Sl4CL), a precursor-producing gene from the flavonoid pathway, and (ii) an Arabidopsis NADPH-cytochrome P450 reductase (AtATR1), a reductase partner to support the activity of the putative cytochrome P450 (SlCYP). We cultured CSY1210 transformed with the additional plasmids and a control strain (transformed with the plasmids but not harboring the reconstructed tomato cluster) in synthetic dropout media supplemented with 100 µM p-coumaric acid (the substrate for Sl4CL) for 72 hours at 25°C and analyzed the yeast media. The metabolites produced by the strain harboring the reconstructed tomato cluster were identified using an untargeted metabolomics analysis by qToF-MS (quadrupole time-of-flight hybrid mass spectrometry) with a mass accuracy of 50 parts per million.

We observed two differential peaks representing compounds only produced in the strain harboring the reconstructed tomato cluster, one at mass/charge ratio (m/z) 168.0655 ([M + H]+) (1) and the other at 316.1179 ([M + H]+) (2) (fig. S1, A and B). To validate production of the two compounds in yeast, we analyzed the yeast culture media for production of (1) and (2) by liquid chromatography-tandem MS (LC-MS/MS). A product ion scan with a precursor ion set at 168.0 m/z showed two peaks at retention times of 4.291 and 5.872 min, respectively, and a product ion scan with a precursor ion set at 316.1 m/z showed a single peak at 5.872 min (Fig. 1B). On the basis of retention times and fragmentation patterns of (1) and (2) from qToF-MS analysis (fig. S1, A and B), we hypothesized that the peak at 4.291 min corresponds to (1) and that the peak at 5.872 min (for both precursor ion settings) corresponds to (2).

We next identified the genes from the predicted tomato cluster and supporting flavonoid pathway (i.e., Sl4CL and AtATR1) that participated in the production of (1) and (2) in yeast. We first examined whether the methyltransferases individually participated in the biosynthesis of (1) and (2). To enable stable expression of the gene cassettes, Sl4CL, SlCHS, and SlMT1/2/3 were chromosomally integrated into the wild-type yeast strain (CEN.PK2) such that each engineered strain harbors Sl4CL, SlCHS, and one of the methyltransferases, leading to construction of CSY1301 (SlMT1), CSY1302 (SlMT2), and CSY1303 (SlMT3). As a control, we eliminated SlCYP (and AtATR1) from the integration to isolate their functions in compound synthesis. We cultured the strains in synthetic complete media supplemented with 100 µM p-coumaric acid for 72 hours at 30°C and analyzed the yeast culture media for production of (1) and (2). A product ion scan on LC-MS/MS with precursor ion set at 168.0 showed two peaks for SlMT1 and SlMT2 transformants at 4.324 and 5.864 min, respectively (Fig. 1C). A product ion scan with a precursor ion set at 316.1 showed a single peak at 5.864 min for SlMT1 and SlMT2 transformants (Fig. 1C). As previously hypothesized, the peak at 5.864 min detected at 168 m/z may be a molecular fragment of (2). Production of (1) and (2) in the absence of SlCYP (and AtATR1) indicates that SlCYP and AtATR1 are not involved in the production of the compounds. From the data, we observed production of (1) and (2) in both CSY1301 and CSY1302, and the product ion detected in CSY1302 was 14-fold greater than that in CSY1301 (Fig. 1D). The results indicate that SlMT1 and SlMT2 participate individually in the production of (1) and (2) and that SlMT2 leads to ~21-fold higher level of (1) and ~14-fold higher level of (2) than SlMT1. 
Since the activities of SlMT1 and SlMT2 appear to be redundant in the context of characterizing the production of (1) and (2), we focused on the activity of SlMT2 for subsequent characterizations. Together, the results of methyltransferase characterizations revealed that (1) and (2) can be produced from a minimal set of genes consisting of Sl4CL, SlCHS, and SlMT2.

We next elucidated a biosynthetic scheme for the synthesis of (1) and (2) in yeast. Low-copy plasmids encoding the expression of Sl4CL, SlCHS, and SlMT2 were cotransformed in different combinations into yeast, and the production of (1) and (2) was monitored in the presence and absence of fed p-coumaric acid after 72 hours of growth at 30°C (table S1). We first coexpressed the three genes with or without fed p-coumaric acid (groups 1 and 2). We then coexpressed all pairs of genes, i.e., SlCHS and SlMT2, SlMT2 and Sl4CL, and SlCHS and Sl4CL, with fed p-coumaric acid (groups 3 to 5). Last, we expressed each single gene in the absence of fed p-coumaric acid (groups 6 to 8). We observed that (i) the removal of fed p-coumaric acid eliminates the production of (2) (groups 1 and 2), (ii) the removal of the expression of Sl4CL or SlCHS eliminates the production of (2) (groups 3 and 4), (iii) the removal of the expression of SlMT2 eliminates the production of both (1) and (2) (group 5), and (iv) the single expression of SlMT2 without fed p-coumaric acid leads to production of (1) (groups 6 to 8). Observations (i) and (ii) indicate that p-coumaric acid is a precursor for the production of (2) and that both Sl4CL and SlCHS are required for the production of (2). Observations (iii) and (iv) indicate that SlMT2 is responsible for the production of (1), which is independent of fed p-coumaric acid, and that (1) is likely a substrate for the production of (2).

On the basis of the production patterns of (1) and (2) under different enzyme combinations, we proposed the order of intermediates along the reconstructed pathway in yeast. Sl4CL is known to catalyze the conversion of p-coumaric acid to p-coumaroyl-CoA (19), and we observed that p-coumaric acid is an essential precursor for the production of (2) through the reconstructed pathway; thus, we hypothesized that p-coumaroyl-CoA is likely an intermediate of the pathway. A previous study reported that a group of methyltransferases from the salicylic acid/benzoic acid/theobromine (SABATH) enzyme family in maize is able to catalyze conversion of anthranilic acid to methyl anthranilate, a volatile methyl ester with potential function in plant defense (20). We hypothesized that SlMT2 may use an anthranilate analog from yeast native metabolism (as the pathway precursor) and catalyze its conversion to a methyl ester (as a pathway intermediate). By searching anthranilate-related yeast native metabolites in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, we identified 3-HAA, a primary metabolite involved in tryptophan metabolism, as a putative substrate for the SlMT2 methyltransferase and proposed the compound structure for the methyl ester (1) (Fig. 1A). We confirmed the compound structure of (1) with its chemical standard by retention time and tandem mass (MS/MS) spectrum (fig. S1C).

The data further support that (2) is the final product of the reconstructed pathway in yeast. Specifically, (2) may result from the condensation of the two identified intermediates, 3-HAA methyl ester (1) and p-coumaroyl-CoA, through the formation of an amide bond potentially catalyzed by SlCHS. However, direct condensation of the two intermediates would lead to a final m/z of 314.1023 ([M + H]+), whereas the final m/z we observed for (2) from yeast culture was m/z 316.1179 ([M + H]+). A native yeast ECR, encoded by TSC13, has been reported to reduce p-coumaroyl-CoA to p-dihydro-coumaroyl-CoA (21). We hypothesized that native Tsc13p activity in yeast may reduce p-coumaroyl-CoA to p-dihydro-coumaroyl-CoA and that SlCHS catalyzes the condensation of p-dihydro-coumaroyl-CoA with (1), leading to production of (2) (Fig. 1E).
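The 2-unit gap between the expected and observed m/z values can be reproduced from monoisotopic masses. The Python sketch below is a hedged illustration: the molecular formulas (C8H9NO3 for (1), C17H17NO5 for the reduced amide, and C17H15NO5 for the direct condensation product) are inferred here from the compound names and are assumptions, as are the function names. The atomic masses are standard monoisotopic values rounded to about eight significant figures.

```python
# Monoisotopic atomic masses (u) and the proton mass used for [M + H]+.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646

def mz_protonated(formula):
    """m/z of the singly protonated ion [M + H]+ for an element-count dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON

def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Formulas assumed from the compound names (not stated in the text).
compound_1 = {"C": 8, "H": 9, "N": 1, "O": 3}            # 3-HAA methyl ester
reduced_amide = {"C": 17, "H": 17, "N": 1, "O": 5}       # dihydro-coumaroyl amide (2)
direct_condensation = {"C": 17, "H": 15, "N": 1, "O": 5} # coumaroyl amide

for name, formula in [("(1)", compound_1),
                      ("(2)", reduced_amide),
                      ("direct condensation", direct_condensation)]:
    print(name, round(mz_protonated(formula), 4))

# The difference between the two amides corresponds to two hydrogen atoms.
print(mz_protonated(reduced_amide) - mz_protonated(direct_condensation))
```

Under these assumed formulas, the calculated values agree with the m/z values quoted in the text to four decimal places, and the difference of about 2.016 is the mass of two hydrogen atoms, consistent with reduction of one double bond.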

To validate our hypothesis, we used CSY1302 (which harbors chromosomally integrated SlCHS, SlMT2, and ySl4CL) to engineer TSC13 knockout strains. As deletion of TSC13 inhibited cellular growth due to its essential role in fatty acid synthesis (22), we partially disrupted Tsc13p activity by inserting three consecutive stop codons at two-thirds the length of the TSC13 coding sequence, resulting in strain CSY1304. The insertion of stop codons in TSC13 may lead to low activity through a low frequency of stop-codon readthrough, enabling very low expression of Tsc13p. Stop-codon readthrough has been reported in yeast, where readthrough efficiencies can be as high as 8% (23) and can be induced by stress conditions (24). We also replaced TSC13 with heterologous ECR variants from Gossypium hirsutum (GhECR2) and Malus domestica (MdECR) that were reported to have low activity on p-coumaroyl-CoA (21), resulting in CSY1305 (TSC13::GhECR2) and CSY1306 (TSC13::MdECR), respectively. We cultured CSY1304 to CSY1306 in synthetic complete media supplemented with 100 µM p-coumaric acid and 100 µM 3-HAA methyl ester for 72 hours at 30°C and analyzed the yeast culture media for production of (1) and (2) on LC-MS/MS by multiple reaction monitoring (MRM) detection. Partial disruption of the native yeast ECR Tsc13p (CSY1304) resulted in a 40% reduction in production of (2), while replacement of Tsc13p with heterologous ECR variants (CSY1305 and CSY1306) resulted in the absence of production of (2) and the presence of a previously unknown compound (3), with an expected m/z of 314.1 ([M + H]+) corresponding to the oxidized form of (2) (Fig. 1, F and G). The compound identities of (2) and (3) were validated by comparing the retention times and MS/MS spectra to those of the chemical standards (fig. S1, D and E). 
The results suggest that the yeast native enzyme participated in the tomato cluster activity and produced a derivative product (2); we eliminated this interference by knocking out the yeast native gene TSC13, thereby restoring the true product (3) resulting from the minimal gene cluster (Sl4CL, SlMT2, and SlCHS).

On the basis of our in vivo functional characterization of SlMT2, the methyltransferase recognizes yeast native 3-HAA as a substrate. According to the KEGG pathway database, 3-HAA is involved in central metabolism, i.e., tryptophan metabolism, and the metabolite is also present in tomato. Since no previous studies have been reported on the functional roles of SlMT1 and SlMT2, we investigated the activities of the methyltransferases on hydroxycinnamic acids, amines, and anthranilic acids by feeding these substrates to yeast engineered to express these methyltransferases. Among the three methyltransferases predicted in the tomato cluster, SlMT3 has been reported to catalyze the methylation of salicylic acid (19). SlMT1 and SlMT2 showed high protein sequence similarity to SlMT3 (78.12 and 81.42%, respectively), indicating that they may similarly exhibit activity on salicylic acid. In addition, the three methyltransferases were initially predicted as tailoring enzymes to modify p-coumaric acid and other moieties of hydroxycinnamic acids, contributing to the production of hydroxylated naringenin chalcone and/or methyl esters of hydroxylated naringenin chalcone in tomato flavonoid metabolism (11).

We tested the activity of SlMT1/2/3 toward a variety of candidate substrates in yeast, including hydroxycinnamic acids (cinnamic, p-coumaric, caffeic, and salicylic acids), trace amines (tyramine, tryptamine, octopamine, dopamine, and serotonin), and anthranilic acid analogs (3-HAA and p-aminobenzoic acid). Low-copy plasmids encoding the expression of SlMT1/2/3 or inactive ccdB (negative control) were transformed into the wild-type yeast strain (CEN.PK2). The transformed yeast strains expressing one of the methyltransferases (or negative control protein) were cultured in synthetic dropout media fed with 100 µM of each substrate candidate for 72 hours at 30°C. The resulting yeast media were analyzed on qToF-MS by total ion scan, and the methylation products were evaluated by analyzing differential peaks detected from the transformants compared to the negative control. A methylation product was counted if the m/z ([M + H]+) of a differential peak (between the sample and the negative control) matched that of a putative methylated product of the substrate. Among all the potential substrates tested, SlMT1 and SlMT2 exhibited detectable activities toward 3-HAA, p-coumaric acid, and p-aminobenzoic acid (a primary metabolite that shares similar functional groups with 3-HAA), and SlMT3 exhibited detectable activity only toward 3-HAA. The highest level of the methylation product was observed when supplying 3-HAA to SlMT2 (Fig. 2). Among the three methyltransferases, SlMT3 showed the lowest production of the methylation product from 3-HAA, and the methylation products catalyzed from p-coumaric acid and p-aminobenzoic acid were not detected for SlMT3 in our assay. None of SlMT1/2/3 showed detectable activity toward salicylic acid in the context of the yeast-based feeding assay. 
We hypothesized that either salicylic acid was not efficiently transported into yeast cells due to previously reported antagonism between salicylic acid and d-glucose (25) or the volatile salicylate methyl ester product may have evaporated. Our results indicate that all three methyltransferases (SlMT1/2/3) showed the highest activity toward 3-HAA (among the fed substrates tested) and that SlMT2 led to the highest production of 3-HAA methyl ester in the yeast-based feeding assay.

Fig. 2. Relative production of methylation products was calculated as a percentage of the highest production by SlMT2 from substrate 3-HAA: 100% corresponds to the concentration of 3-HAA methyl ester (146 µM) catalyzed from yeast endogenous 3-HAA and 100 µM 3-HAA fed to yeast culture medium. Compounds not detected were crossed out. Data show the mean and SD of three biologically independent replicates.

Our in vivo characterization results of the minimal gene cluster (Sl4CL, SlMT2, and SlCHS) indicate that SlCHS can potentially catalyze the condensation of p-coumaroyl-CoA and 3-HAA methyl ester, leading to the formation of a nitrogen-carbon (amide) bond. To our knowledge, this study is the first report of amide formation by CHS, which canonically catalyzes Claisen condensation (carbon-carbon bond formation) (26).

We further examined the amide bond catalytic activity of SlCHS by expressing SlCHS recombinantly in Escherichia coli, purifying the enzyme, and characterizing its activities via in vitro enzymatic assays. SlCHS activity was examined with both its canonical substrates (malonyl-CoA and p-coumaroyl-CoA) and the substrates identified in the context of the minimal tomato gene cluster (3-HAA methyl ester and p-coumaroyl-CoA). The reactions were performed by incubating 4 μg of purified enzyme with 200 μM malonyl-CoA or 3-HAA methyl ester and 200 μM p-coumaroyl-CoA for 4 hours and analyzed on LC-MS/MS by MRM detection. For SlCHS canonical activity characterization, we observed spontaneous conversion of naringenin chalcone to naringenin under the in vitro reaction conditions, and we confirmed the production of naringenin by comparing the resultant peak with an authentic standard of naringenin (fig. S2A). We observed the production of (3) when 3-HAA methyl ester was added to the reaction mixture by comparing the peaks with a chemically synthesized standard of (3). The chemical standard of (3) yielded a single peak when dissolved in water (retention time, 6.872 min) but resulted in a secondary peak (retention time of 7.484 min) when dissolved in acidic methanol (fig. S2B). The secondary peak was also detected in acidic methanol-quenched in vitro reaction mixtures, from which the detection of (3) is expected. A previous study compared nonenzymatic and chalcone isomerase–catalyzed conversion of chalcone to flavanone and the pH dependence of this reaction (27). We hypothesized that the secondary peak could result from an isomerized form of (3), similar to the isomerization process of converting naringenin chalcone to naringenin, possibly formed during the in vitro reaction. Together, these results validate that SlCHS is capable of amide formation.

We next examined whether the amide synthesis interferes with the canonical activity. We performed an in vitro reaction with SlCHS under similar conditions but incubated equimolar amounts (200 μM) of 3-HAA methyl ester and malonyl-CoA with 200 μM p-coumaroyl-CoA. Analysis of the reaction products showed an 85% decrease in production of (3) (Fig. 3, reactions 2 and 3) and a 6% decrease in production of naringenin (Fig. 3, reactions 1 and 3). The results suggest that 3-HAA methyl ester is likely competing with malonyl-CoA for a p-coumaroyl starter molecule at the SlCHS active site, indicating that SlCHS could use the same active site for amide formation as for Claisen condensation.

+/− indicates the presence/absence of 200 μM p-coumaroyl-CoA, 200 μM 3-HAA methyl ester, 200 μM malonyl-CoA, or 4 μg of purified SlCHS protein. MRM (314.1 → 147.0) and MRM (273.0 → 152.8) detect the production of coumaroyl anthranilate amide (3) and naringenin, respectively. The ion counts are normalized by the highest ion count across reactions (rxn) 1 to 5 in each column; SD shows the percentage error among two independent replicates. Enzyme abbreviation: SlCHS, naringenin chalcone synthase.

We next investigated whether SlCHS exhibited a substrate specificity toward 3-HAA methyl ester for amide synthesis. We incubated SlCHS with 200 μM anthranilic acid analog and 200 μM p-coumaroyl-CoA under similar in vitro reaction conditions, and the reaction mixture was analyzed on LC-MS/MS by product ion scan with the precursor ion set to match the m/z of the expected condensation product. We tested numerous anthranilic acid analogs in this assay, including 3-HAA methyl ester, 2-amino-3/4/5-methoxybenzoic acid, 3-HAA, 2-amino-5-hydroxybenzoic acid, 3-hydroxybenzoic methyl ester, and anthranilic acid. Analysis of the m/z ([M + H]+) of the expected product for each substrate indicated product peaks with 3-HAA methyl ester, 2-amino-5-methoxybenzoic acid, and 3-hydroxybenzoic methyl ester, among which 3-HAA methyl ester yielded more than 15-fold and 49-fold higher product ion counts than 2-amino-5-methoxybenzoic acid and 3-hydroxybenzoic methyl ester, respectively (fig. S2C). In contrast, no amide product was observed when 3-HAA and anthranilic acid, which share a very similar molecular structure with 3-HAA methyl ester, were included in the reaction mixture. A trace amount of a possible ester product was observed when 3-hydroxybenzoic methyl ester was included as a substrate. The observed substrate preferences of SlCHS on the panel of anthranilic acid analogs tested indicate that methylation on the carboxyl group of the anthranilate may facilitate substrate access to the SlCHS active site and that SlCHS exhibits a high substrate preference toward 3-HAA methyl ester.

Last, we examined whether the observed amide synthesis activity was specific to the CHS variant from tomato (SlCHS). Specifically, we performed in vitro reaction assays with the CHS variant from Arabidopsis (AtCHS). AtCHS was recombinantly expressed in E. coli and purified, and its activities on malonyl-CoA and 3-HAA methyl ester were analyzed under the same assay conditions as were used for SlCHS. AtCHS exhibits identical patterns of catalytic activity and substrate preferences as SlCHS in vitro, i.e., highest production of amide with 3-HAA methyl ester, trace amounts of amide production with 2-amino-5-methoxybenzoic acid, and ester production with 3-hydroxybenzoic methyl ester (fig. S2, D and E). Together, the results indicate that the amide synthesis activity observed in SlCHS is not unique to this variant and could be a common secondary function in plant CHS enzymes.

Type III PKSs are characterized by a conserved cysteine-histidine-asparagine catalytic triad, which corresponds to C164-H303-N336 in SlCHS. For canonical synthesis of naringenin chalcone, C164 and H303 form an imidazolium ion pair, which initiates a nucleophilic attack on the thioester carbonyl of p-coumaroyl-CoA that completes acyl transfer onto C164 (28). H303 and N336 coordinate the orientation of the incoming malonyl-CoA moieties during the process of iterative decarboxylation and condensation of the extender malonyl-CoA molecules in formation of the polyketide intermediate. In addition, F215 is an important gatekeeper residue that is reported to separate the CoA-binding tunnel from the active site cavity and help with folding and internal orientation of the tetraketide intermediate (28–30). On the basis of our in vitro assay results, we hypothesized that SlCHS is likely to use the same active site for amide synthesis as for naringenin chalcone synthesis. We therefore investigated the catalytic mechanism of amide bond formation by examining the roles of these active site residues that are important for SlCHS canonical activity.

We first evaluated which residues could potentially interact with 3-HAA methyl ester and use the substrate for amide formation. We built a homology model for SlCHS using Phyre2 (31) and simulated the docking of 3-HAA methyl ester to the homology model structure using AutoDock Vina (32). The simulation shows that 3-HAA methyl ester favorably docks at the SlCHS active site, potentially interacting with H303, N336, and G305 by hydrogen bonding (Fig. 4A, fig. S3A). As a comparison, we simulated the docking of the canonical substrate malonyl-CoA to the SlCHS active site (fig. S3B), which shows that the substrate 3-HAA methyl ester is much smaller in size (molecular weight, 153 versus 854) than the canonical substrate and therefore can readily dock at the active site cavity.

(A) Docking of (1) to SlCHS active site. Dotted line, hydrogen bond interaction. (B to D) Production of (3) and naringenin chalcone in yeast by SlCHS for C164, H303, N336, and G305 mutants (B); F215 mutants (C); and distal [within ~10 Å of the docking site of (1)] residue mutants (D). Data show the mean of two biologically independent replicates with error bar indicating the SD. Unpaired two-tailed t test was performed between each variant and the parent for production of (3): **P < 0.01 and ***P < 0.001 (D). Compound name: (1), methyl 3-hydroxyanthranilic acid; (3), coumaroyl anthranilate amide. Enzyme abbreviation: SlCHS, naringenin chalcone synthase.

On the basis of the results of the docking simulation, we first investigated the roles of C164, H303, N336 (canonical catalytic triad residues), and G305 on amide synthesis. We created a SlCHS knockout strain (CSY1307) by replacing the full sequence of SlCHS with three consecutive stop codons in CSY1305 (which harbors chromosomally integrated Sl4CL, SlMT2, SlCHS, and TSC13::GhECR2). Low-copy plasmids encoding SlCHS point mutants (C164A, C164S, H303A, N336A, and G305A) were constructed and transformed into CSY1307. Transformed CSY1307 strains harboring individual SlCHS mutants were cultured in synthetic dropout media supplemented with 100 μM p-coumaric acid and 100 μM 3-HAA methyl ester for 72 hours at 30°C. Yeast culture media was analyzed for production of naringenin chalcone and (3) on LC-MS/MS by MRM detection. C164A, C164S, and H303A mutants completely eliminated both the canonical activity and the amide synthesis activity (Fig. 4B). The N336A mutant completely abolished naringenin chalcone production but resulted in an increase in the production of (3) compared to the wild-type variant, whereas the G305A mutant abolished canonical activity but exhibited only trace amounts of amide formation. The results indicate that C164 and H303 are essential for both canonical and amide synthesis, which is expected as these two residues are responsible for the loading of p-coumaroyl-CoA. The C164S mutant confirms the importance of the thiol group of cysteine for forming the imidazolium ion pair with H303 to activate acyl transfer through nucleophilic attack during loading of p-coumaroyl-CoA onto C164. Although N336 is essential for canonical activity for binding of extender malonyl-CoA, it does not contribute to binding of 3-HAA methyl ester to the active site. This result is further supported by an uninterrupted docking of 3-HAA methyl ester to the active site of a N336A mutant homology model using AutoDock Vina (fig. S3C).
The increase in production of (3) observed from the N336A mutant relative to the parent enzyme is likely due to a lack of competition between 3-HAA methyl ester and malonyl-CoA for the p-coumaroyl starter moiety at the active site of the N336A mutant. Last, the removal of amide and canonical activities observed in the G305A mutant suggests that G305 potentially performs a stabilizing role in anchoring 3-HAA methyl ester (as predicted by the docking simulation) and malonyl-CoA during their respective condensation reactions.

We next examined potential effects of F215 on amide formation (Fig. 4C). We tested different mutants of the residue to conserve either the ring structure (F215W, F215Y, and F215H) or spatial occupancy (F215I) of the residue side chain. Low-copy plasmids encoding SlCHS mutants (F215A, F215W, F215Y, F215H, F215C, and F215I) were each transformed into CSY1307. The transformed CSY1307 strains were cultured under identical conditions, and production of naringenin chalcone and (3) was analyzed on LC-MS/MS by MRM detection. All F215 mutants except F215W completely abolished the canonical activity, where F215W maintained only 5% naringenin chalcone production as compared to the wild-type variant (Fig. 4C). The results support the previously proposed role of F215 in orienting malonyl-CoA and polyketide intermediates at the active site (29, 30). We also observed that all mutants except F215W led to 70% reduction in production of (3), while F215W maintained 90% production of (3) compared to the wild-type variant (Fig. 4C). The results suggest that the ring structure of residue 215 in wild-type and the F215W mutant may assist in orienting 3-HAA methyl ester at the active site to facilitate amide formation. However, the ring structure itself in the residue is not sufficient for 3-HAA methyl ester binding since decreased production of (3) was observed in F215Y and F215H (which conserved the ring structure); instead, spatial occupancy (F215I) by the residue may also contribute to substrate selection. Furthermore, reduced production of (3) observed in the F215Y and F215H mutants could result from a poorly oriented residue side chain shielding the active site, thus preventing the access of 3-HAA methyl ester to C164-bound p-coumaroyl moiety. 
We also scanned for the production of pyrone derivatives bis-noryangonin (BNY) and 4-coumaroyltriacetic acid lactone (CTAL), the former a triketide and the latter a tetraketide early-released derailment by-product (29, 33), by F215 mutants in yeast culture media. We observed proportional levels of CTAL production compared to that of naringenin chalcone and no detectable levels of BNY production (fig. S4A). The results suggest that inhibited production of (3) by F215 mutants is unlikely due to pyrone by-product accumulation at the SlCHS active site. In summary, the results indicate that although F215 likely performs a specific structural role in orienting malonyl-CoA during extension of the polyketide intermediate in canonical activity, its function is less specific for selecting 3-HAA methyl ester as a substrate.

Last, we investigated the potential effects of nonspecific binding by 3-HAA methyl ester to SlCHS protein. We mutated nine residues (T132A, S133A, S339A, S339T, I193A, T194A, L267A, V271A, and P272A) within ~10 Å of the 3-HAA methyl ester docking site and analyzed the effects of these mutations on production of (3) in yeast (Fig. 4D and fig. S3D). CSY1307 strains transformed with the mutants encoded on low-copy plasmids were cultured under identical conditions, and production of naringenin chalcone and (3) was analyzed on LC-MS/MS by MRM detection. The results showed that most of the nine tested residues did not show statistically significant effects on production of (3), except for S339A, T194A, and P272A (Fig. 4D). S339A completely abolished SlCHS activity, and the two distal residue mutants (T194A and P272A) significantly improved SlCHS activity for production of (3). Since S339 is located at a loop structure near the SlCHS active site, the mutation may have interrupted the correct folding of the active site cavity and therefore disrupted both naringenin chalcone and amide synthesis. Removal of the two distal residues (T194A and P272A) may have altered the entrance geometry of the active site cavity, which facilitated the access of 3-HAA methyl ester to the active site and therefore increased production of (3). Similarly, fluctuations in the production of naringenin chalcone observed among the mutants could be caused by an altered geometry around the active site, which affected the access of p-coumaroyl-CoA or malonyl-CoA to the active site.

The results of the site-directed mutagenesis studies suggest that SlCHS uses the same active site for canonical and amide synthesis. We performed in vitro enzymatic assays to further investigate the kinetic properties of SlCHS on 3-HAA methyl ester. Kinetic assays were performed by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of 3-HAA methyl ester (0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3, 5, 10, and 15 mM). The reactions were stopped at different time points, and reaction products were analyzed on LC-MS/MS to derive the kinetic curve (Fig. 5A and fig. S5A). The kinetic data show that the amide synthesis has a Km (Michaelis-Menten constant) of 3.06 mM and a Vmax of 14.47 nM min⁻¹, resulting in a kcat of 0.362 min⁻¹ and kcat/Km of 1.18 × 10⁻⁴ μM⁻¹ min⁻¹ under the in vitro reaction conditions (Fig. 5A). As a comparison, we performed in vitro enzymatic assays to characterize the kinetic properties of SlCHS canonical activity by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of malonyl-CoA (0, 5, 50, 100, 200, 300, and 500 μM). The canonical synthesis of naringenin chalcone has a Km of 21.34 μM and Vmax of 11.32 nM min⁻¹, resulting in a kcat of 0.0943 min⁻¹ and kcat/Km of 4.42 × 10⁻³ μM⁻¹ min⁻¹ (fig. S5B). The results show a 143-fold difference between SlCHS's Km for 3-HAA methyl ester and malonyl-CoA, indicating that the enzyme has a much higher affinity for malonyl-CoA than for 3-HAA methyl ester. The results also show a 37-fold higher catalytic efficiency (kcat/Km) of SlCHS for synthesis of naringenin chalcone than for that of amide. Together, the results indicate that amide synthesis is likely to be a less efficient secondary function of SlCHS.
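The fold differences quoted above follow directly from the reported constants. The short sketch below recomputes them, assuming that both catalytic efficiencies are expressed in μM⁻¹ min⁻¹ (our reading of the reported values); the variable and function names are illustrative and not part of the original analysis.

```python
# Kinetic constants as reported in the text; units are our interpretation.
KM_AMIDE_UM = 3060.0   # Km for 3-HAA methyl ester: 3.06 mM expressed in uM
KCAT_AMIDE = 0.362     # kcat for amide synthesis, min^-1
KM_CANON_UM = 21.34    # Km for malonyl-CoA, uM
KCAT_CANON = 0.0943    # kcat for naringenin chalcone synthesis, min^-1


def catalytic_efficiency(kcat_per_min: float, km_um: float) -> float:
    """Return kcat/Km in uM^-1 min^-1."""
    return kcat_per_min / km_um


eff_amide = catalytic_efficiency(KCAT_AMIDE, KM_AMIDE_UM)
eff_canon = catalytic_efficiency(KCAT_CANON, KM_CANON_UM)

# Ratio of Km values (amide substrate over canonical substrate).
km_fold = KM_AMIDE_UM / KM_CANON_UM
# Ratio of catalytic efficiencies (canonical over amide).
eff_fold = eff_canon / eff_amide

print(f"kcat/Km (amide):     {eff_amide:.3g} uM^-1 min^-1")
print(f"kcat/Km (canonical): {eff_canon:.3g} uM^-1 min^-1")
print(f"Km fold difference:  {km_fold:.0f}")
print(f"Efficiency fold:     {eff_fold:.1f}")
```

Expressing both Km values in the same unit before dividing is the only step that needs care here, since one is reported in mM and the other in μM.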

(A) Kinetic characterization of SlCHS synthesis of coumaroyl anthranilate amide (3). (B) Kinetic characterization of SlCHS synthesis of naringenin chalcone, inhibited with 0, 3, or 5 mM 3-HAA methyl ester. (C) Proposed inhibition mechanisms of 3-HAA methyl ester to SlCHS canonical activity. E, enzyme (SlCHS); EC, enzyme-coumaroyl complex; I, inhibitor (3-HAA methyl ester); ECI, enzyme-coumaroyl-inhibitor complex; CAA, coumaroyl anthranilate amide; M, malonyl-CoA; ECM, enzyme-diketide complex; ECM2, enzyme-triketide complex; ECM3, enzyme-tetraketide complex; NC, naringenin chalcone; ECMI, enzyme-diketide-inhibitor complex; ECM2I, enzyme-triketide-inhibitor complex; ECM3I, enzyme-tetraketide-inhibitor complex. Equation notations: v0, initial velocity; Vmax, maximal velocity; Km, Michaelis-Menten constant; S, substrate (i.e., malonyl-CoA); Kc, competitive inhibition coefficient; Ku, uncompetitive inhibition coefficient; n, Hill coefficient that simulates cooperativity effect by sequential binding of malonyl-CoA to the coumaroyl-bound enzyme complex. (D and E) Analysis on mode of inhibition by 3 mM (D) and 5 mM (E) 3-HAA methyl ester. Eq. 1, no inhibition; Eq. 2, competitive inhibition; Eq. 3, uncompetitive inhibition; Eq. 4, mixed-type inhibition. Data show the mean of two independent replicates, with error bar indicating the SD.

We next examined the mechanism of 3-HAA methyl ester inhibition of SlCHS canonical activity. Kinetic assays were performed by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of malonyl-CoA (5, 50, and 100 μM) as the substrate and 3-HAA methyl ester (0, 3, and 5 mM) as the inhibitor. The reactions were stopped at different time points, and reaction products were analyzed on LC-MS/MS to derive the kinetic curve for each inhibitor concentration (Fig. 5B). For the purpose of curve-fitting, only malonyl-CoA was considered as the substrate, since the reactions were performed under saturated concentrations of p-coumaroyl-CoA (200 μM). We first fit all data points (measured under 0, 3, and 5 mM inhibitor) to Eq. 1 (Fig. 5, B and C). By tuning the Hill coefficient, we observed that root mean square error (RMSE) is minimized for data points of 0 mM when n = 1, for data points of 3 mM when n = 1.7, and for data points of 5 mM when n = 1.5 (Fig. 5B and table S2A). The curve-fitting results suggest that the effects of cooperativity emerge only when inhibitors are present.
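The procedure of tuning the Hill coefficient to minimize RMSE can be sketched as a simple scan over candidate values of n. Eq. 1 appears only in Fig. 5 and is not reproduced in the text, so the Hill-type rate law below is an assumed form; the function names are hypothetical, and the demonstration data are synthetic, noise-free values generated from the same rate law, not measurements from this study.

```python
import math


def hill_rate(s, vmax, km, n):
    """Assumed Hill-type rate law: v0 = Vmax * S^n / (Km^n + S^n)."""
    return vmax * s**n / (km**n + s**n)


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_hill_coefficient(substrate, rates, vmax, km, candidates):
    """Return the candidate n with the lowest RMSE, holding Vmax and Km fixed."""
    scored = [
        (rmse(rates, [hill_rate(s, vmax, km, n) for s in substrate]), n)
        for n in candidates
    ]
    return min(scored)[1]


# Synthetic demonstration only: rates are generated with a known n so that
# the scan can be checked. Vmax and Km are placeholders borrowed from the
# reported uninhibited fit (nM min^-1 and uM).
substrate_um = [5.0, 50.0, 100.0]
true_n = 1.7
vmax, km = 11.32, 21.34
synthetic_rates = [hill_rate(s, vmax, km, true_n) for s in substrate_um]

candidates = [round(1.0 + 0.1 * i, 1) for i in range(11)]  # 1.0, 1.1, ..., 2.0
best_n = best_hill_coefficient(substrate_um, synthetic_rates, vmax, km, candidates)
print(best_n)
```

A grid scan is used here for transparency; a nonlinear least-squares routine would serve the same purpose when Vmax and Km are fit jointly with n.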

We then fit the data points taken under 3 and 5 mM inhibitors to competitive (Fig. 5, Eq. 2), uncompetitive (Fig. 5, Eq. 3), or mixed-type (Fig. 5, Eq. 4) inhibition modes to interpret inhibition coefficients (Kc for competitive inhibition and Ku for uncompetitive inhibition) by fixing the values for Km and kcat at those obtained at 0 mM inhibitor (Fig. 5, C to E). Here, we used the Hill coefficient n to represent the effect of cooperativity resulting from sequential binding of three molecules of malonyl-CoA to coumaroyl-bound enzyme complex. For the data points obtained under 3 and 5 mM inhibitors, we observed minimization of RMSE with the mixed-type inhibition model, and the best fits were obtained at n = 1.7 and 1.5 for 3 and 5 mM inhibitors, respectively (Fig. 5, D and E, and table S2D). For 3 mM inhibitor, Kc = 0.377 mM and Ku = 1.01 mM (Ku/Kc = 2.67). For 5 mM inhibitor, Kc = 0.341 mM and Ku = 0.897 mM (Ku/Kc = 2.63). Together, the results indicate that inhibition is dominated by competitive mode in both cases with a shift from competitive to uncompetitive mode as inhibitor concentration increases from 3 to 5 mM.
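To make the relationship between the inhibition modes concrete, the sketch below implements one common textbook form of mixed-type inhibition with a Hill coefficient. Eqs. 2 to 4 are shown only in Fig. 5, so this rate law is an assumption rather than the exact equation used in the study; the function name is hypothetical. Under this form, setting Ku to infinity recovers competitive inhibition, and setting Kc to infinity recovers uncompetitive inhibition.

```python
import math


def mixed_inhibition_rate(s, i, vmax, km, kc, ku, n=1.0):
    """Assumed mixed-type rate law with a Hill coefficient:

        v0 = Vmax * S^n / (Km^n * (1 + I/Kc) + S^n * (1 + I/Ku))

    s and km share one concentration unit; i, kc, and ku share another.
    """
    competitive_term = 1.0 + i / kc
    uncompetitive_term = 1.0 + i / ku
    return vmax * s**n / (km**n * competitive_term + s**n * uncompetitive_term)


# Reported inhibition coefficients (mM) at each inhibitor concentration (mM).
reported = {3.0: {"kc": 0.377, "ku": 1.01, "n": 1.7},
            5.0: {"kc": 0.341, "ku": 0.897, "n": 1.5}}

# Ku/Kc > 1 means the competitive term dominates under this rate law.
ratios = {i_mm: fit["ku"] / fit["kc"] for i_mm, fit in reported.items()}
for i_mm, ratio in ratios.items():
    print(f"{i_mm} mM inhibitor: Ku/Kc = {ratio:.2f}")

# Illustrative rates at 100 uM malonyl-CoA, using the reported uninhibited
# Vmax (nM min^-1) and Km (uM) as fixed parameters.
VMAX, KM = 11.32, 21.34
fit3 = reported[3.0]
v_uninhibited = mixed_inhibition_rate(100.0, 0.0, VMAX, KM, fit3["kc"], fit3["ku"], fit3["n"])
v_inhibited = mixed_inhibition_rate(100.0, 3.0, VMAX, KM, fit3["kc"], fit3["ku"], fit3["n"])
v_competitive_only = mixed_inhibition_rate(100.0, 3.0, VMAX, KM, fit3["kc"], math.inf, fit3["n"])
print(v_uninhibited, v_inhibited, v_competitive_only)
```

Because the ratios are computed from the rounded coefficients quoted in the text, they can differ in the last digit from the ratios reported there.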

Last, we investigated the production of pyrone derivatives BNY and CTAL by SlCHS when inhibited by 3-HAA methyl ester. We scanned for BNY and CTAL production from reaction mixtures fed with 100 μM malonyl-CoA; 100 μM p-coumaroyl-CoA; and 0, 3, or 5 mM 3-HAA methyl ester inhibitor at the end of the kinetic assay time course. We detected proportional levels of CTAL production compared to that of naringenin and no detectable levels of BNY production (fig. S4, B and C). The results suggest that 3-HAA methyl ester is unlikely to promote the release of derailment by-products due to early termination in extension and/or cyclization during polyketide synthesis.

We leveraged a yeast biosynthesis platform to characterize the activity of a computationally predicted biosynthetic gene cluster from tomato, which led to the discovery of a previously undocumented HCAA compound and the potential of CHS for nitrogen-carbon bond synthesis. The HCAA compound is generated by the condensation of a hydroxycinnamic acid moiety and anthranilic acid moiety through formation of an amide bond. We showed that one of the substrates for HCAA production in yeast was 3-HAA methyl ester, which was converted from the native metabolite, 3-HAA, by each of the three methyltransferases in the predicted tomato gene cluster. Among the methyltransferases, SlMT2 exhibited the highest activity toward 3-HAA in yeast. Through systematic mutagenesis, in vivo activity screens, and in vitro substrate competition assays, we showed that SlCHS uses the same active site for its canonical naringenin chalcone synthesis activity to catalyze the condensation of 3-HAA methyl ester and p-coumaroyl-CoA, leading to the production of coumaroyl anthranilate amide (3). To our knowledge, this is the first report of a type III PKS enzyme exhibiting amide bond formation activity. In vitro kinetic assays indicate that SlCHS catalyzes the formation of (3) with a Km of 3.06 mM for 3-HAA methyl ester.

To examine the catalytic mechanism of CHS for HCAA synthesis, we referred to mechanisms of other classes of enzymes that catalyze similar reactions. Specifically, the acyl-CoA N-acyltransferases are a category of benzylalcohol acetyl-, anthocyanin-O-hydroxy-cinnamoyl-, anthranilate-N-hydroxy-cinnamoyl/benzoyl-, deacetylvindoline (BAHD) acyltransferases that catalyze the formation of HCAA in plants (34–41) and share a conserved HXXXDG domain, positioned near the center of the enzyme (38). A histidine residue in the HXXXDG motif deprotonates the oxygen or nitrogen atom on the corresponding acceptor substrate, thereby allowing a nucleophilic attack on the carbonyl carbon of the CoA thioester and leading to the formation of a tetrahedral intermediate between the CoA thioester and acceptor substrate (39). The intermediate is reprotonated to release the free CoA and the acylated ester or amide. The aspartic acid residue in the conserved motif plays a structural rather than catalytic role by forming a salt bridge with a conserved arginine residue (39). Another family of enzymes, arylamine N-acetyltransferases (NATs), catalyzes a similar reaction that transfers an acetyl group from acetyl-CoA to the terminal nitrogen group of an arylamine substrate (42). The reaction is catalyzed by a cysteine-histidine-aspartic acid catalytic triad and is initiated by nucleophilic attack of the carbonyl group on acetyl-CoA by cysteine, activated by the histidine residue likely through formation of a thiolate-imidazolium ion pair (43, 44). The incoming arylamine attacks the carbonyl group bound to cysteine in forming a tetrahedral intermediate, with a general base deprotonating the amine group. Similarly to BAHD acyltransferases, it has been suggested that the deprotonation in NATs is assisted by the histidine residue in the catalytic triad (43).
The aspartic acid residue was proposed to form a low-barrier hydrogen bond with the histidine residue to increase the basicity of the histidine for cysteine activation (43).

The catalytic mechanisms for BAHD acyltransferases and NATs suggest the potential roles of histidine at the SlCHS catalytic triad (C164-H303-N336) in (i) cysteine activation before nucleophilic attack of the carbonyl group of p-coumaroyl-CoA and (ii) deprotonating the incoming amine nucleophile in formation of a tetrahedral intermediate bound to cysteine. Previous studies on CHS catalytic mechanisms support (i) that H303 and C164 form a thiolate-imidazolium ion pair, which facilitates the nucleophilic attack of the thiolate anion on the thioester carbonyl of p-coumaroyl-CoA, resulting in transfer of the acyl moiety to C164 (28). Our in vivo mutagenesis data indicate that C164 and H303 are critical for canonical and amide synthesis. Therefore, it is likely that the mechanism for cysteine activation and acyl transfer is conserved for amide formation (fig. S6, A and B). In the next step, incoming 3-HAA methyl ester forms a covalent bond with the coumaroyl moiety bound to C164 by nucleophilic attack of the amine group on the carbonyl group of the coumaroyl moiety, leading to formation of a tetrahedral intermediate (fig. S6, C and D). The positively charged amide is then deprotonated by an unidentified general base (fig. S6, D and E), followed by release of the amide product (fig. S6F). H303 may play the role of the unidentified general base in deprotonating the incoming amine nucleophile as suggested for NATs (43); however, this process requires H303 to be regenerated (deprotonation of the imidazolium) after accepting a proton from a thiol group upon acyl transfer from p-coumaroyl-CoA to cysteine, the exact mechanism for which was not determined in this study.

Prior studies on CHS activity reported that mutations in an active site residue (F215) and acidification of in vitro reaction mixtures before extraction can lead to an increase in production of BNY and CTAL (29). In this work, we observed proportional levels of CTAL production compared to that of naringenin chalcone and no detectable levels of BNY production from CHS in vitro reaction mixtures. We also did not observe increases in BNY and CTAL from the F215 mutants expressed in yeast, in contrast to previously reported in vitro characterization of F215 mutants (29). The study reported the production of BNY from F215A and F215H mutants and CTAL from F215Y mutant, where BNY production was maximized at pH 7.0, and CTAL production was prominent within a pH range of 6.0 to 6.5 (29). The absence of detectable BNY and CTAL production by F215 mutants in our work may be due to differences in characterization conditions, i.e., yeast versus in vitro, and specifically may be due to the acidic pH 5.8 of yeast synthetic complete media. The observation also indicates that inhibited production of (3) observed with F215 mutants is not likely due to pyrone by-product accumulation at the CHS active site.

We observed that CHS exhibits catalytic promiscuity by catalyzing the synthesis of two different families of compounds: polyketide through its canonical activity and HCAA through the secondary activity characterized here. The syntheses of other HCAA compounds (e.g., p-coumaroyltyramine, p-coumaroyldopamine, and feruloyldopamine) by hydroxycinnamoyl-CoA:tyramine N-hydroxycinnamoyl transferase (THT) have been reported in tomato for defense against bacterial and fungal pathogens (45, 46). There is currently limited evidence to support that this secondary activity of CHS may be adapted by the plant host for HCAA synthesis, considering that the secondary activity shows ~40-fold lower efficiency (kcat/Km) compared to the canonical activity. However, this catalytic promiscuity may indicate a starting point for evolution of the enzyme to become an alternative route for HCAA compound production (47). For example, future work can compare the amine substrate specificity of both THT and CHS for HCAA synthesis, which may indicate an evolutionary advantage of CHS to catalyze hydroxycinnamoyl anthranilate-type HCAA if CHS shows higher activity toward anthranilic acid analogs than THT. Additional future work may focus on validating a role of the gene cluster in the native host by knocking out individual genes in tomato and performing metabolomics to search for metabolites that may be associated with the gene cluster. However, if the genes in the cluster are associated with a cryptic pathway, identification of a proper elicitor treatment would be required to induce the silent gene cluster and production of the target compound(s) in the host.

As more than 1000 putative plant gene clusters have now been predicted via computational tools (7, 9–11), future advances that further streamline high-throughput characterization workflows will be critical to characterizing activities encoded within these clusters. For example, future efforts may develop systematic criteria to prioritize gene clusters for yeast-based characterization and reliable high-throughput metabolite screening methods to accelerate the exploration of previously unidentified chemical space. Parallel genomic integration of multiple gene clusters can be facilitated by multiplexed CRISPR technology (48). Yeast harboring multiple gene clusters can then be screened for compound production using high-precision metabolomics, where improved computational workflows for untargeted metabolomics analysis can enable more efficient identification of novel low-abundance metabolites and distinguish them robustly from background metabolite profiles. Thus, the integration of computational plant genome analysis, yeast-based heterologous pathway expression, and advances in analytics will allow for the streamlined characterization and discovery of biosynthetic routes that may be difficult to uncover in planta.

DNA sequences for heterologous biosynthetic enzymes were codon-optimized to improve expression in S. cerevisiae using GeneArt GeneOptimizer software (Thermo Fisher Scientific, Waltham, MA) and were synthesized as gene fragments (Twist Bioscience, San Francisco, CA). For guide RNA (gRNA)/Cas9 plasmids, 20–base pair (bp) gRNAs targeting the genomic site were synthesized as primers (TSC13 gRNA1: AACAGCTCAAATGTACGCAT; TSC13 gRNA2: ATAACTTAGCATTCCCAAAG; SlCHS gRNA: TGTTGGTACATCATCAATCT), overlap polymerase chain reaction (PCR)–amplified with tRNA promoter/hepatitis delta virus (HDV) ribozyme PCR fragment (pCS3411) and trans-activating CRISPR RNA (tracrRNA)/terminator PCR fragment (pCS3414), and cloned into a SpCas9 expression vector with G418 resistance (pCS3410) via Gibson assembly (49).
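As a quick sanity check on the gRNA sequences listed above, the following sketch confirms that each is a 20-nucleotide DNA string and reports its GC content. The helper names are illustrative, and the check is not part of the original workflow; it does not assess on-target efficiency or off-target risk.

```python
# gRNA sequences copied from the text above.
GRNAS = {
    "TSC13 gRNA1": "AACAGCTCAAATGTACGCAT",
    "TSC13 gRNA2": "ATAACTTAGCATTCCCAAAG",
    "SlCHS gRNA": "TGTTGGTACATCATCAATCT",
}


def is_valid_grna(seq: str, length: int = 20) -> bool:
    """True if seq has the expected length and contains only A, C, G, T."""
    return len(seq) == length and set(seq) <= set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in GRNAS.items():
    print(f"{name}: valid={is_valid_grna(seq)}, GC={gc_content(seq):.0%}")
```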

Plasmids for protein expression in E. coli were constructed by inserting DNA fragments encoding At4CL, SlCHS, and AtCHS into pET28 vector via Gibson assembly, for which the PCR-amplified pET28 vector backbone and the protein inserts share a 40–base pair (bp) overhang at both ends of the linear DNA components. Plasmid encoding the parent SlCHS protein in the site-directed mutagenesis study was constructed using Gibson assembly. The plasmid vector (pCS3305) was digested by restriction enzymes Xba I and Xho I, and the SlCHS gene insert was amplified from a gene fragment.

Plasmids for single amino acid mutant variants were constructed either via Gibson assembly or blunt-end ligation. For the Gibson assembly method, primers encoding the single amino acid substitution were used to amplify the parent plasmid and the linear DNA product. The linear DNA product contained a 15-bp overlap between its 5′ and 3′ ends and was annealed by Gibson assembly. For the blunt-end ligation method, a primer pair without overhang was used to amplify the parent plasmid, with the 5′ primer encoding the single amino acid substitution. The linear DNA product was then incubated with T4 polynucleotide kinase [New England Biolabs (NEB), Ipswich, MA] at 37°C for 30 min and subsequently with T4 DNA ligase (NEB, Ipswich, MA) at room temperature for 2 hours.

All the primers in this work were synthesized by the Stanford Protein and Nucleic Acid Facility (Stanford, CA). PCR amplifications were performed with Q5 High-Fidelity DNA polymerase (NEB, Ipswich, MA), and PCR products were purified using the DNA Clean and Concentrator Kit (Zymo Research, Irvine, CA). Plasmids generated in this work are listed in table S3.

The chemical standards for methyl 3-hydroxy-2-(3-(4-hydroxyphenyl)propanamido)benzoate [dihydro-coumaroyl anthranilate amide (2)] and (E)-methyl 3-hydroxy-2-(3-(4-hydroxyphenyl)acrylamido)benzoate [coumaroyl anthranilate amide (3)] were purchased from Toronto Research Chemicals (Canada). Methyl 2-amino-3-hydroxybenzoate [3-HAA methyl ester (1)] was purchased from Apollo Scientific (UK). p-Coumaric acid, malonyl-CoA, 3-HAA, methyl 3-hydroxybenzoate, 2-amino-3-methoxybenzoic acid, 2-amino-4-methoxybenzoic acid, 2-amino-5-methoxybenzoic acid, 2-amino-5-hydroxybenzoic acid, and 2-anthranilic acid were purchased from Sigma-Aldrich (St. Louis, MO). Naringenin chalcone was purchased from Biosynth Carbosynth (USA). Naringenin was purchased from MedChemExpress (USA). The p-coumaroyl-CoA standard was purchased from PlantMetaChem (Germany).

Yeast strains used in this study are listed in table S3. All yeast strains are haploid, derived from CEN.PK2-1D (50) (MATα URA3-52, TRP1-289, LEU2-3/112, HIS3Δ1, MAL2-8C, and SUC2), referred to as CEN.PK2. Genes in the predicted tomato cluster were codon-optimized and assembled with corresponding promoter/terminator fragments and integrated into pYES1L (Life Technologies, Carlsbad, CA). To create the minimal pathway strain, the pathway genes (SlCHS, Sl4CL, and SlMT1/2/3) were first cloned into pAG414-GDP1p/ADHt, pAG414-PGK1p/PHO5t, pAG414-PYK1p/MFA1t, or pAG414-TEF1p/CYC1t expression vector with Gibson assembly, and the linear DNA fragment for each pathway gene expression cassette with 30-bp overlap between each fragment was PCR-amplified from the pAG vectors, assembled, and integrated into YMR206W:: locus with SpHIS5 selection marker.

TSC13 and SlCHS knockout strains were created by CRISPR-Cas9 genome editing method as previously described (51). The linear DNA repair templates were PCR-amplified and harbor a 30- to 45-bp overlap with the target genomic site. Two hundred nanograms of gRNA/SpCas9 plasmid and 500 ng of linear DNA template were cotransformed into yeast competent cells prepared from the Frozen-EZ Yeast Transformation II Kit (Zymo Research, Irvine, CA), as described in the Yeast strain construction and transformation section. Colonies picked from G418 plate after 3 days were screened for metabolite production.

For yeast transformations, a single colony of the parent strain was inoculated in yeast peptone with 2% dextrose (YPD) media and incubated overnight at 30°C and 220 rpm. The saturated overnight culture was then diluted 50-fold in fresh YPD media and incubated for 4 to 6 hours. Cells (2.5 ml) were used per transformation. The cells were then harvested by centrifugation at 3500g for 4 min and prepared for transformation using the Frozen-EZ Yeast Transformation II Kit (Zymo Research, Irvine, CA). For plasmid transformations, 50 ng of DNA was used per transformation. The transformed cells were plated directly onto synthetic dropout agar plates after 45-min incubation with EZ3 solution. For Cas9-based chromosomal integrations, 100 ng of the Cas9 plasmid (encodes G418 resistance) and 500 ng of the linear DNA fragments were used per transformation, and the transformed cells were subject to a 2-hour recovery at 30°C in YPD media after 45-min incubation with EZ3 solution. The cells were plated onto synthetic dropout plates supplemented with G418 (400 mg/liter) to select for colonies with successfully integrated constructs. The plate cultures were incubated 2 to 3 days before colonies were picked for metabolite production assays.

To screen for metabolite production, two or three colonies were inoculated for each strain (or transformed strain) into 400 µl of synthetic complete or dropout media with 2% dextrose in 2-ml 96-well plates, grown for 16 to 20 hours to saturation, diluted at a 1:8 ratio into fresh media with corresponding feeding conditions, and grown for 72 hours at 25 or 30°C, as indicated, before metabolite analysis of culture supernatant on LC-MS/qToF-MS.

For targeted metabolite production assays, 100 µl of supernatant of yeast culture from 96-well plates was obtained by centrifugation at 4000g for 5 min. The sample was analyzed by an Agilent 1260 Infinity Binary high-performance LC (HPLC) paired with an Agilent 6420 Triple Quadrupole LC-MS, with a reversed-phase column (Agilent EclipsePlus C18, 2.1 × 50 mm, 1.8 µm), water with 0.1% formic acid as solvent A, and acetonitrile with 0.1% formic acid as solvent B, at a constant flow rate of 0.4 ml/min and an injection volume of 5 µl. The following gradient was used for compound separation: 0 to 6 min, 3 to 50% B; 6 to 9 min, 50 to 97% B; 9 to 10 min, 97% B; 10 to 10.5 min, 97 to 3% B; 10.5 to 11 min, equilibration with 3% B. The liquid chromatogram eluent was directed to the MS for 1 to 10 min with electrospray ionization (ESI) source in positive mode, gas temperature at 350°C, gas flow rate at 10 liters/min, and nebulizer pressure at 50 psi. LC-MS data files were analyzed in Agilent MassHunter Workstation software. The liquid chromatograms and product ion scans were extracted either by specified precursor ion from total ion current or by multiple reaction monitoring (MRM) with ion transitions and related parameters specified in table S4. All the MRM transitions in this work were derived from product ion scan with specified precursor ion, and the most abundant product ion was chosen for MRM transition quantification. For each compound, production was quantified by integrating the peak area under the ion count curve. The ion counts were calibrated to a chemical standard curve and converted to measurements of titer (ng/ml or µg/ml) and molar concentration (nM) for in vivo and in vitro assays, respectively.

For untargeted metabolite production assays, 200 µl of yeast culture from 96-well plates was flash-frozen, lyophilized overnight, and dissolved in 100 µl of 75% methanol (with 25% water) with 0.1% formic acid. The sample was analyzed by the Agilent 1260 Infinity Binary HPLC paired with an Agilent 6545 Quadrupole Time-of-Flight LC-MS, with a reversed-phase column (Agilent EclipsePlus C18, 2.1 × 50 mm, 1.8 µm), water with 0.1% formic acid as solvent A, and acetonitrile with 0.1% formic acid as solvent B, at a constant flow rate of 0.6 ml/min and an injection volume of 1 µl. The following gradient was used for compound separation: 0 to 0.40 min, 5% B; 0.40 to 8.40 min, 5 to 95% B; 8.40 to 10.40 min, 95% B; 10.40 to 10.41 min, 95 to 5% B; 10.41 to 12.00 min, 5% B. The liquid chromatogram eluent was directed to the MS for 1 to 12 min with ESI source in positive mode, gas temperature at 250°C, gas flow rate at 12 liters/min, nebulizer pressure at 10 psig, Vcap at 3500 V, fragmentor at 100 V, skimmer at 50 V, octupole 1 RF Vpp at 750 V, and acquisition scan rate at 2.50 spectra/s.

SlCHS homology model was built using Phyre2 (31) from amino acid sequence, with 85% identity with template c1cml chain A from Protein Data Bank. Docking simulation was performed by AutoDock Vina (32), and docking results were visualized using PyMOL. Geometry optimizations of substrate structures before docking simulations were conducted using Gaussian 16 (DFT, B3LYP, and LANL2DZ).

Protein expression plasmids were transformed into E. coli BL21(DE3) cells. For each protein construct, a single colony was inoculated into 5 ml of Luria-Bertani (LB) media with kanamycin (50 mg/liter) and incubated at 37°C and 220 rpm for 16 hours (overnight). The overnight culture (5 ml) was then inoculated into 1 liter of LB media with kanamycin (50 mg/liter) and incubated at 37°C and 200 rpm for around 5 hours until the optical density at 600 nm (OD600) reached 0.6. The culture was then cooled to 18°C, induced with 0.5 mM isopropyl-β-d-thiogalactopyranoside, and incubated for 16 hours at 200 rpm. The cells were harvested by centrifugation at 4000g for 15 min, and all the following steps were performed on ice with prechilled buffers and reagents. The cell pellet was first washed in 50 mM tris buffer (pH 8.0), resuspended in lysis buffer [10 mM imidazole, 50 mM sodium phosphate, and 300 mM sodium chloride (pH 7.4)], and lysed by sonication. The cellular debris was removed from cell lysate by centrifugation at 16,000g and 4°C for 1 hour. The enzyme proteins were purified from the supernatant using Ni-NTA agarose affinity chromatography and eluted using a range of imidazole concentrations (40, 100, 150, 200, 250, and 450 mM) with the target protein most efficiently eluted at 200 mM imidazole. The purified proteins were then buffer-exchanged and concentrated to storage buffer [50 mM potassium phosphate, 100 mM NaCl, and 10% (v/v) glycerol (pH 7.5)]. The protein concentration was determined by NanoDrop and corrected by extinction coefficient. The final yield for all three proteins is ~2.2 mg/ml. Aliquots of the purified proteins were flash-frozen and stored at −80°C.

p-Coumaroyl-CoA was synthesized by a batch of in vitro reactions with purified protein (40 µg/ml) of At4CL, 400 µM p-coumaric acid, 400 µM CoA, 4 mM adenosine 5′-triphosphate, and 5 mM MgCl2, added to a buffer with 50 mM potassium phosphate and 100 mM NaCl at pH 7.5. The reaction mixture was incubated at 37°C and 500 rpm for 4 hours. Aliquots of the reaction products were stored at −20°C.

For SlCHS and AtCHS in vitro activity validation, 4 µg of purified protein and 200 µM p-coumaroyl-CoA were incubated with 200 µM malonyl-CoA and/or 3-HAA methyl ester in a 50-µl reaction volume at 30°C and 450 rpm for 4 hours in the dark. The reaction volume was quenched in equal volume of acidic methanol (with 0.1% formic acid), the mixture was centrifuged at 32,000g for 10 min, and the supernatant was used for LC-MS analysis. For the specificity assay, 4 µg of purified protein and 200 µM p-coumaroyl-CoA were incubated with 200 µM 3-HAA, methyl 3-hydroxybenzoate, 2-amino-3-methoxybenzoic acid, 2-amino-4-methoxybenzoic acid, 2-amino-5-methoxybenzoic acid, 2-amino-5-hydroxybenzoic acid, or 2-anthranilic acid, with the same incubation and extraction protocol described above.

For amide synthesis kinetic assays, 680 or 40 nM purified SlCHS protein and 200 or 500 µM p-coumaroyl-CoA were incubated with 0, 1, 5, 10, 50, 100, and 200 µM or 0, 0.4, 0.8, 1.6, 3, 5, 10, and 15 mM 3-HAA methyl ester. For canonical activity kinetic assay, 120 nM purified SlCHS protein and 200 µM p-coumaroyl-CoA were incubated with 0, 5, 50, 100, 200, 300, and 500 µM malonyl-CoA. For each assay, duplicates were performed in 50-µl reaction volumes; incubated at 30°C and 450 rpm in the dark; and quenched by adding equal volume of acidic methanol (with 0.1% formic acid) at 5, 10, 15, 20, and 25 min (for amide synthesis with low concentration range of 3-HAA methyl ester); at 6, 24, 30, and 36 min (for amide synthesis with high concentration range of 3-HAA methyl ester); or at 5, 10, 17, 24, and 31 min (for canonical activity). The samples were further diluted by adding 30 µl of water and filtered using 0.2-µm filter plates before measurements on LC-MS/MS.

For enzymatic inhibition assays, 108 nM purified SlCHS protein was incubated with 200 µM p-coumaroyl-CoA and 5, 50, or 100 µM malonyl-CoA and 0, 3, or 5 mM 3-HAA methyl ester. For each assay, duplicates were performed in 40-µl reaction volumes; incubated at 30°C and 450 rpm in the dark; and quenched by adding equal volume of acidic methanol (with 0.1% formic acid) at 5, 10, 17, 24, and 31 min. The samples were further diluted by adding 30 µl of water and filtered using 0.2-µm filter plates before measurements on LC-MS/MS.

For untargeted metabolomic analysis, data were obtained from n = 3 biologically independent replicates. Biological independence refers to individual colonies of a yeast strain inoculated into separate culture volumes under the same feeding and growth conditions. qToF-MS data files were converted to mzXML files using MSConvert, and untargeted metabolomics differential analysis was performed using the xcms package in R (52). The differential peaks were then identified by sorting the diffreport generated from xcms differential analysis by fold parameter, with a filter set for a P value smaller than 0.01.
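The selection of differential peaks described above (filter by P value, then sort by fold change) can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the xcms workflow itself; the rows below are hypothetical, and the column names `fold`, `pvalue`, `mzmed`, and `rtmed` are assumed to match an exported diffreport table.

```python
def top_differential_peaks(rows, p_cutoff=0.01):
    """Keep peaks with P value below the cutoff, ranked by descending fold change."""
    kept = [row for row in rows if row["pvalue"] < p_cutoff]
    return sorted(kept, key=lambda row: row["fold"], reverse=True)


# Hypothetical rows standing in for an exported diffreport table.
report = [
    {"name": "M316T312", "fold": 85.0, "pvalue": 0.0004, "mzmed": 316.118, "rtmed": 312.0},
    {"name": "M273T401", "fold": 3.2, "pvalue": 0.2000, "mzmed": 273.076, "rtmed": 401.0},
    {"name": "M314T330", "fold": 240.0, "pvalue": 0.0020, "mzmed": 314.102, "rtmed": 330.0},
]

ranked = top_differential_peaks(report)
for peak in ranked:
    print(peak["name"], peak["fold"], peak["pvalue"])
```

With only three replicates per condition, a P value filter of this kind is best treated as a ranking aid for candidate peaks that are then confirmed against chemical standards.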

For metabolite production, each liquid chromatogram trace is representative of two biologically independent replicates. Ion count data show the mean of n = 2 or 3 biologically independent replicates, with error bar indicating the SD. Biological independence refers to individual colonies of a yeast strain inoculated into separate culture volumes under the same feeding and growth conditions. Statistical significance analysis was performed (for selected data) by unpaired two-tailed t test.

For in vitro kinetic assay, progress curve data show the mean of compound produced from n = 2 independent replicates performed simultaneously in separated reaction volumes, with error bar indicating the SD. For amide synthesis kinetic assays, initial reaction rates and error bars were calculated by fitting progress curves with a built-in linear regression tool in GraphPad Prism 7 for amide formation reactions. For canonical activity inhibition assay by 3-HAA methyl ester, progress curves were fitted using DynaFit (53) through an ordinary differential equation (ODE)-based system derived from the kinetic model specified in fig. S5E. Because of an initial lag phase in the progress curve, the reaction rates were obtained from the first derivative of the progress curve (calculated by DynaFit) and then fitted to the general equation M(1 − exp(−ax)) in MATLAB 2017a, in which M, i.e., plateau of the rate function, represents the reaction rate at steady state, i.e., linear region of the progress curve. For kinetic curves, data show the slope or M obtained from progress curve data analysis, with error bar representing the relative error (%) of the slope (calculated by GraphPad Prism 7 linear regression tool) or relative RMS (%) for progress curve fitting (calculated by DynaFit). Km and Vmax for kinetic data were estimated using built-in Michaelis-Menten kinetic nonlinear regression tool in GraphPad Prism 7 (for amide synthesis) or MATLAB 2017b by fitting data with kinetic equations as specified in Fig. 5C (for canonical activity inhibition assay).
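As an illustration of the two curve shapes used in this analysis, the following Python sketch evaluates the saturating rate function M(1 − exp(−ax)) and fits the Michaelis-Menten equation v = Vmax[S]/(Km + [S]) by least squares. It is a generic stand-in, not the GraphPad Prism, DynaFit, or MATLAB procedure used in the study: the fitting routine (a closed-form Vmax for each trial Km combined with a repeatedly refined log-spaced grid over Km) is an assumption chosen to avoid external libraries, and the rates are synthetic values generated from hypothetical parameters at the malonyl-CoA concentrations listed for the canonical activity assay.

```python
import math


def lagged_rate(x, M, a):
    """Rate function M * (1 - exp(-a * x)); it plateaus at M (steady-state rate)."""
    return M * (1.0 - math.exp(-a * x))


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_michaelis_menten(substrate, rates, km_lo=1e-3, km_hi=1e4, rounds=10, points=50):
    """Least-squares fit of (Vmax, Km) using a refined log-spaced grid over Km."""

    def best_vmax(km):
        # For a fixed Km the model is linear in Vmax, so Vmax has a closed form.
        f = [s / (km + s) for s in substrate]
        return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)

    def sse(km):
        vmax = best_vmax(km)
        return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(substrate, rates))

    lo, hi = math.log(km_lo), math.log(km_hi)
    for _ in range(rounds):
        grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
        best = min(range(points), key=lambda i: sse(math.exp(grid[i])))
        # Narrow the search window to the neighbors of the best grid point.
        lo = grid[max(best - 1, 0)]
        hi = grid[min(best + 1, points - 1)]
    km = math.exp((lo + hi) / 2.0)
    return best_vmax(km), km


# Substrate concentrations (uM) from the canonical activity assay; the rates are
# synthetic, generated from hypothetical Vmax = 2.0 and Km = 50.0.
substrate = [0.0, 5.0, 50.0, 100.0, 200.0, 300.0, 500.0]
rates = [mm_rate(s, 2.0, 50.0) for s in substrate]

vmax_fit, km_fit = fit_michaelis_menten(substrate, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f}")
```

With real, noisy data a dedicated nonlinear regression tool is preferable because it also reports confidence intervals for Km and Vmax, which this sketch does not.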

Acknowledgments: We thank A. Cravens for providing the Cas9/single-guide RNA plasmids (pCS3410, 3411, and 3414) for yeast genomic editing, J. Payne for performing the geometry optimizations of substrate structures for docking simulations, T. Valentic and J. Payne for training in protein purification and valuable discussions on protocol design for in vitro experiments, J. E. Jeon and X. Guan for assistance with tomato metabolomics analyses, and the Stanford ChEM-H Metabolic Chemistry Analysis Center and C. Fischer for instrument (qToF-MS) access and training. We thank E. Sattely, S. Y. Rhee, and C. Khosla for discussions and advice on experimental design. We thank T. Valentic, P. Srinivasan, and B. Kotopka for feedback in the preparation of this manuscript. Funding: This work was supported by the NIH U01GM110699 Genome to Natural Products Initiative and Chan-Zuckerberg Biohub Foundation. Author contributions: All authors designed the research, analyzed the data, and wrote the paper. D.K. and S.L. performed the research. S.L. performed untargeted metabolomics analysis and found the new compounds. D.K. and S.L. proposed and characterized the tomato cluster activity in yeast. D.K. performed and analyzed CHS in vivo site-directed mutagenesis studies and in vitro enzyme assays. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Read this article:
Discovery of a previously unknown biosynthetic capacity of naringenin chalcone synthase by heterologous expression of a tomato gene cluster in yeast -...

A computer’s all you need: Folding@Home joins the race to find a COVID-19 cure – The Stanford Daily

"Today, you just need a computer. That's all you need. You don't need to have a fancy computer [or a] super modern computer. Anything will do," said Anton Thynell, head of collaboration and communication at Folding@home.

Founded by chemistry, structural biology and computer science professor Vijay Pande in 2000 at Stanford, the global computing research community Folding@home (FAH) is now joining the race to find a cure for COVID-19. Volunteers from across the globe are downloading the FAH software, which is accessible to everyone, to run simulations of protein folding in the background of their computers. The simultaneous running of these simulations contributes to researchers' efforts to find treatments for certain diseases, illnesses and COVID-19.

Folding@home was originally a computing project that studied and simulated biomolecular systems. In 2006, collaborators from Stanford University joined the project and later increased computing performance to a level that rivaled that of a supercomputer.

Upon downloading the FAH software, volunteers are given specific proteins to run simulations on. They can then start folding by contributing their extra CPU power (the CPU is the part of the computer that executes instructions), and later they upload the results. The word "folding" comes from the process that proteins undergo when they are created. During that process, a protein molecule transforms from a long chain of amino acids into a complex shape (it "folds up"). The resulting structure allows researchers to understand the protein's properties and functions.

The FAH community aims to apply their professional knowledge along with volunteers' computing power to understand the role of proteins' dynamics in their function and dysfunction, and to aid in the design of new proteins and therapeutics. It is established as the world's fastest supercomputer, according to Ethan Zuo, president of [emailprotected], a group of volunteers who contribute to the Folding@home research project.

Thynell, who joined the Folding@home community in 2013, said that since COVID-19 began, he and his team have created a separate project that relies on the Folding@home concept to understand SARS-CoV-2.

"[COVID-19] was really an all-hands-on-deck situation," Thynell said. "I stopped working at my regular job and started full-time at Folding@home. We grew our community [to] about 150 times [our past size] in three months. That's where we are today."

Thynell broke down the process and importance of understanding protein dynamics when trying to find treatments or solutions to diseases, pandemics and more.

"Most of the time, when you're studying biology, you look at proteins as a fixed structure, but they're actually moving around," Thynell said. "And there are tons of reactions happening in our cell structure all the time. So these proteins are actually like small machines. We wanted to understand more about the virus and hopefully find some hidden pockets. It's like a treasure map, and sometimes you find a treasure."

"These hidden pockets can open up for a certain period of time and you can look at them at potential[ly] druggable sites, which is very interesting for developing therapeutics," he added.

Zuo added that Folding@home is helping researchers study spike proteins, a type of protein that is part of SARS-CoV-2 and allows the coronavirus to enter host cells. Zuo said that using extra computing power to run simulations of the virus can speed up the process of studying how these proteins work, which can then help researchers find ways to manipulate them using medicine.

"When you download our software from our website and you have Wi-Fi or internet, you connect with our servers and download the small work unit that's a small part of a large simulation and your computer starts crunching away at it," Thynell said. "You can decide how much computing power you want to dedicate or when you want to start. It's all up to you."

Recently, Zuo has been very active in volunteering for the Folding@home COVID-19 project. He leaves his computer on 24 hours a day so that it can build computational models to help identify sites of the spike protein that researchers can target through a therapeutic antibody.

"[When] school shut down, everyone was doing online learning," Zuo said. "When doing online learning, I realized that everyone is using their computers for a large fraction of the day [but] not 100% of their computing potential is used. So I decided to [help] put that extra compute power to good use. Even though Folding@home is the world's largest supercomputer, a surprising number of people don't know about [it]."

"By reaching out to more people, you'll make the supercomputer more powerful [in] finding a cure for COVID-19 more quickly and gain knowledge more effectively," he added.

Recently, Folding@home has been working with COVID Moonshot, an organization aiming to develop inexpensive patent-free therapeutics for COVID-19, to identify key compounds that may stop the main viral protease (Mpro), an enzyme that breaks down proteins of COVID-19. As of now, over 800 compounds have been simulated and tested. Volunteers are actively participating in weekly sprints in which they donate their computing power to crunching work units to collect and generate new designs for proteins. Additionally, researchers are constantly discovering new things about the virus and are actively publishing them on their home website.

To see and measure progress within Folding@home teams, volunteers are able to collect individual points for their contributions, which are displayed on a universal leaderboard. Depending on the computation power and system, certain amounts of points may also be awarded to teams, which puts them higher on the leaderboard.

According to Thynell, the leaderboard also shows what communities are participating in folding; these include tech companies such as Google, Reddit, Linus, NVIDIA and Intel. Global teams include China Folding@Home Power, Overclockers Australia and TSC Russia.

"What was really interesting is that Folding@home is global," Thynell said. "We have people contributing from every part of the world. And it's really amazing to see a global community coming together and fighting the virus, with the spare computing power of your home computer. That has been really nice to see."

Contact Rachel Jiang at racheljiang310 at gmail.com


Protein domain – Wikipedia

Conserved part of a protein

A protein domain is a conserved part of a given protein sequence and tertiary structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids.[1] The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

The concept of the domain was first proposed in 1973 by Wetlaufer after X-ray crystallographic studies of hen lysozyme[2] and papain[3] and by limited proteolysis studies of immunoglobulins.[4][5] Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past, domains have been described as units of compact structure, function and evolution, and folding.

Each definition is valid and will often overlap, i.e. a compact structural domain that is found amongst diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities.[9] In a multidomain protein, each domain may fulfill its own function independently, or in a concerted manner with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or can provide specific catalytic or binding sites as found in enzymes or regulatory proteins.

An appropriate example is pyruvate kinase (see first figure), a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β nucleotide-binding domain (in blue), an α/β-substrate binding domain (in grey) and an α/β-regulatory domain (in olive green),[10] connected by several polypeptide linkers.[11] Each domain in this protein occurs in diverse sets of protein families.[12]

The central α/β-barrel substrate binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions.[13] The α/β-barrel is commonly called the TIM barrel, named after triose phosphate isomerase, which was the first such structure to be solved.[14] It is currently classified into 26 homologous families in the CATH domain database.[15] The TIM barrel is formed from a sequence of β-α-β motifs closed by the first and last strand hydrogen bonding together, forming an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families,[16] while another suggests that a stable TIM-barrel structure has evolved through convergent evolution.[17]

The TIM-barrel in pyruvate kinase is 'discontinuous', meaning that more than one segment of the polypeptide is required to form the domain. This is likely to be the result of the insertion of one domain into another during the protein's evolution. It has been shown from known structures that about a quarter of structural domains are discontinuous.[18][19] The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch of polypeptide.

The primary structure (string of amino acids) of a protein ultimately encodes its uniquely folded three-dimensional (3D) conformation.[20] The most important factor governing the folding of a protein into 3D structure is the distribution of polar and non-polar side chains.[21] Folding is driven by the burial of hydrophobic side chains into the interior of the molecule so as to avoid contact with the aqueous environment. Generally proteins have a core of hydrophobic residues surrounded by a shell of hydrophilic residues. Since the peptide bonds themselves are polar they are neutralised by hydrogen bonding with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called secondary structure. There are two main types of secondary structure: α-helices and β-sheets.

Some simple combinations of secondary structure elements have been found to frequently occur in protein structure and are referred to as supersecondary structure or motifs. For example, the β-hairpin motif consists of two adjacent antiparallel β-strands joined by a small loop. It is present in most antiparallel β structures both as an isolated ribbon and as part of more complex β-sheets. Another common super-secondary structure is the β-α-β motif, which is frequently used to connect two parallel β-strands. The central α-helix connects the C-termini of the first strand to the N-termini of the second strand, packing its side chains against the β-sheet and therefore shielding the hydrophobic residues of the β-strands from the surface.

Covalent association of two domains represents a functional and structural advantage since there is an increase in stability when compared with the same structures non-covalently associated.[22] Other advantages are the protection of intermediates within inter-domain enzymatic clefts that may otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of the enzymatic activity necessary for a sequential set of reactions.[23]

Structural alignment is an important tool for determining domains.

Several motifs pack together to form compact, local, semi-independent units called domains.[6] The overall 3D structure of the polypeptide chain is referred to as the protein's tertiary structure. Domains are the fundamental units of tertiary structure, each domain containing an individual hydrophobic core built from secondary structural units connected by loop regions. The packing of the polypeptide is usually much tighter in the interior than the exterior of the domain, producing a solid-like core and a fluid-like surface.[24] Core residues are often conserved in a protein family, whereas the residues in loops are less conserved, unless they are involved in the protein's function. Protein tertiary structure can be divided into four main classes based on the secondary structural content of the domain.[25]

Domains have limits on size.[27] The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1,[18] but the majority, 90%, have fewer than 200 residues[28] with an average of approximately 100 residues.[29] Very short domains, less than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, greater than 300 residues, are likely to consist of multiple hydrophobic cores.[30]

Many proteins have a quaternary structure, which consists of several polypeptide chains that associate into an oligomeric molecule. Each polypeptide chain in such a protein is called a subunit. Hemoglobin, for example, consists of two α and two β subunits. Each of the four chains has an all-α globin fold with a heme pocket.

Domain swapping is a mechanism for forming oligomeric assemblies.[31] In domain swapping, a secondary or tertiary element of a monomeric protein is replaced by the same element of another protein. Domain swapping can range from secondary structure elements to whole structural domains. It also represents a model of evolution for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site at subunit interfaces.[32]

Nature is a tinkerer and not an inventor;[33] new sequences are adapted from pre-existing sequences rather than invented. Domains are the common material used by nature to generate new sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the C and N termini of domains are close together in space, allowing them to easily be "slotted into" parent structures during the process of evolution. Many domain families are found in all three forms of life, Archaea, Bacteria and Eukarya.[34] Protein modules are a subset of protein domains which are found across a range of different proteins with a particularly versatile structure. Examples can be found among extracellular proteins associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion molecules and cytokine receptors.[35] Four concrete examples of widespread protein modules are the following domains: SH2, immunoglobulin, fibronectin type 3 and the kringle.[36]

Molecular evolution gives rise to families of related proteins with similar sequence and structure. However, sequence similarities can be extremely low between proteins that share the same structure. Protein structures may be similar because proteins have diverged from a common ancestor. Alternatively, some folds may be more favored than others as they represent stable arrangements of secondary structures and some proteins may converge towards these folds over the course of evolution. There are currently about 110,000 experimentally determined protein 3D structures deposited within the Protein Data Bank (PDB).[37] However, this set contains many identical or very similar structures. All proteins should be classified to structural families to understand their evolutionary relationships. Structural comparisons are best achieved at the domain level. For this reason many algorithms have been developed to automatically assign domains in proteins with known 3D structure; see 'Domain definition from structural co-ordinates'.

The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity.[38] The most populated is the α/β-barrel super-fold, as described previously.

The majority of proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins.[39] However, other studies concluded that 40% of prokaryotic proteins consist of multiple domains while eukaryotes have approximately 65% multi-domain proteins.[40]

Many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes,[41] suggesting that domains in multidomain proteins have once existed as independent proteins. For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase domains (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.[42]

Multidomain proteins are likely to have emerged from selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move about, within and between biological systems, through mechanisms of genetic shuffling.

The simplest multidomain organization seen in proteins is that of a single domain repeated in tandem.[46] The domains may interact with each other (domain-domain interaction) or remain isolated, like beads on a string. The giant 30,000 residue muscle protein titin comprises about 120 fibronectin-III-type and Ig-type domains.[47] In the serine proteases, a gene duplication event has led to the formation of a two β-barrel domain enzyme.[48] The repeats have diverged so widely that there is no obvious sequence similarity between them. The active site is located at a cleft between the two β-barrel domains, in which functionally important residues are contributed from each domain. Genetically engineered mutants of the chymotrypsin serine protease were shown to have some proteinase activity even though their active site residues were abolished, and it has therefore been postulated that the duplication event enhanced the enzyme's activity.[48]

Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain.[49] ABC transporters are built with up to four domains consisting of two unrelated modules, ATP-binding cassette and an integral membrane module, arranged in various combinations.

Not only do domains recombine, but there are many examples of a domain having been inserted into another. Sequence or structural similarities to other domains demonstrate that homologues of inserted and parent domains can exist independently. An example is that of the 'fingers' inserted into the 'palm' domain within the polymerases of the Pol I family.[50] Since a domain can be inserted into another, there should always be at least one continuous domain in a multidomain protein. This is the main difference between definitions of structural domains and evolutionary/functional domains. An evolutionary domain will be limited to one or two connections between domains, whereas structural domains can have unlimited connections, within a given criterion of the existence of a common core. Several structural domains could be assigned to an evolutionary domain.

A superdomain consists of two or more conserved domains of nominally independent origin, but subsequently inherited as a single structural/functional unit.[51] This combined superdomain can occur in diverse proteins that are not related by gene duplication alone. An example of a superdomain is the protein tyrosine phosphatase-C2 domain pair in PTEN, tensin, auxilin and the membrane protein TPTE2. This superdomain is found in proteins in animals, plants and fungi. A key feature of the PTP-C2 superdomain is amino acid residue conservation in the domain interface.

Protein folding - the unsolved problem: Since the seminal work of Anfinsen in the early 1960s,[20] the goal to completely understand the mechanism by which a polypeptide rapidly folds into its stable native conformation remains elusive. Many experimental folding studies have contributed much to our understanding, but the principles that govern protein folding are still based on those discovered in the very first studies of folding. Anfinsen showed that the native state of a protein is thermodynamically stable, the conformation being at a global minimum of its free energy.

Folding is a directed search of conformational space allowing the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years.[52] Proteins typically fold within 0.1 to 1000 seconds. Therefore, the protein folding process must be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.[53]
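
To get a feel for the scale behind the Levinthal paradox, the arithmetic can be sketched in a few lines. The figures used here (100 residues, 3 conformations per residue, 10^13 conformations sampled per second) are illustrative assumptions of the kind used in textbook versions of the argument, not values taken from the text above, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_years(n_residues=100, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once in an exhaustive search.

    All default values are assumptions chosen for illustration only.
    """
    # Each residue is assumed to adopt its states independently of the others.
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{levinthal_years():.2e} years for an exhaustive search")
```

Under these assumptions the exhaustive search would take far longer than the age of the universe, while observed folding takes seconds or less, which is why a directed search is inferred.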

Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,[54][55] where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and there are fewer states available to the folded protein. A funnel implies that for protein folding there is a decrease in energy and loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free-energies by increasing its compactness. The chain's conformational options become increasingly narrowed ultimately toward one native structure.

The organisation of large proteins by structural domains represents an advantage for protein folding, with each domain being able to individually fold, accelerating the folding process and reducing a potentially large combination of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins,[56] domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.[57][58]

However, the role of inter-domain interactions in protein folding and in the energetics of stabilisation of the native structure probably differs for each protein. In T4 lysozyme, the influence of one domain on the other is so strong that the entire molecule is resistant to proteolytic cleavage. In this case, folding is a sequential process where the C-terminal domain is required to fold independently in an early step, and the other domain requires the presence of the folded C-terminal domain for folding and stabilisation.[59]

It has been found that the folding of an isolated domain can take place at the same rate or sometimes faster than that of the integrated domain,[60] suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains.[30] This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction are energetically unfavourable,[61] such as the removal of water from the domain interface.

Protein domain dynamics play a key role in a multitude of molecular recognition and signaling processes. Protein domains, connected by intrinsically disordered flexible linker domains, induce long-range allostery via protein domain dynamics. The resultant dynamic modes cannot generally be predicted from static structures of either the entire protein or individual domains. They can, however, be inferred by comparing different structures of a protein (as in the Database of Molecular Motions). They can also be suggested by sampling in extensive molecular dynamics trajectories[62] and principal component analysis,[63] or they can be directly observed using spectra[64][65] measured by neutron spin echo spectroscopy.

The importance of domains as structural building blocks and elements of evolution has brought about many automated methods for their identification and classification in proteins of known structure. Automatic procedures for reliable domain assignment are essential for the generation of the domain databases, especially as the number of known protein structures is increasing. Although the boundaries of a domain can be determined by visual inspection, construction of an automated method is not straightforward. Problems occur when faced with domains that are discontinuous or highly associated.[66] The fact that there is no standard definition of what a domain really is has meant that domain assignments have varied enormously, with each researcher using a unique set of criteria.[67]

A structural domain is a compact, globular sub-structure with more interactions within it than with the rest of the protein.[68] Therefore, a structural domain can be determined by two visual characteristics: its compactness and its extent of isolation.[69] Measures of local compactness in proteins have been used in many of the early methods of domain assignment[70][71][72][73] and in several of the more recent methods.[28][74][75][76][77]

One of the first algorithms[70] used a Cα-Cα distance map together with a hierarchical clustering routine that considered proteins as several small segments, 10 residues in length. The initial segments were clustered one after another based on inter-segment distances; segments with the shortest distances were clustered and considered as single segments thereafter. The stepwise clustering finally included the full protein. Go[73] also exploited the fact that inter-domain distances are normally larger than intra-domain distances; all possible Cα-Cα distances were represented as diagonal plots in which there were distinct patterns for helices, extended strands and combinations of secondary structures.
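
The stepwise clustering described above can be sketched as follows. This is a minimal illustration, not the published algorithm: the text does not say how the inter-segment distance was defined, so the closest-pair (single-linkage) distance is assumed here, and the function names and input format (a list of 3D Cα coordinate tuples) are hypothetical.

```python
import math
from itertools import combinations


def split_into_segments(ca_coords, seg_len=10):
    """Cut the chain into consecutive segments of seg_len residues."""
    return [ca_coords[i:i + seg_len] for i in range(0, len(ca_coords), seg_len)]


def min_distance(group_a, group_b):
    """Shortest distance between any two points of two groups (assumed metric)."""
    return min(math.dist(p, q) for p in group_a for q in group_b)


def cluster_segments(ca_coords, seg_len=10):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merge history as (segment_ids_a, segment_ids_b, distance).
    """
    segments = split_into_segments(ca_coords, seg_len)
    clusters = {(i,): list(seg) for i, seg in enumerate(segments)}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters separated by the shortest distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        dist = min_distance(clusters[a], clusters[b])
        # The merged cluster is treated as a single segment from now on.
        clusters[tuple(sorted(a + b))] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dist))
    return merges
```

In this sketch the last entry of the merge history joins the two groups that stayed separate the longest, which for a well-separated two-domain chain would be a natural candidate for the domain boundary.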

The method by Sowdhamini and Blundell clusters secondary structures in a protein based on their Cα-Cα distances and identifies domains from the pattern in their dendrograms.[66] As the procedure does not consider the protein as a continuous chain of amino acids, there are no problems in treating discontinuous domains. Specific nodes in these dendrograms are identified as tertiary structural clusters of the protein; these include both super-secondary structures and domains. The DOMAK algorithm is used to create the 3Dee domain database.[75] It calculates a 'split value' from the number of each type of contact when the protein is divided arbitrarily into two parts. This split value is large when the two parts of the structure are distinct.
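
A contact-based split value of the kind described for DOMAK can be illustrated with a short sketch. The exact formula is not given in the text; the ratio used here (internal contacts of each part divided by the contacts between the parts, multiplied together) is an assumption for illustration, as are the distance cutoff, the single cut position and the function names.

```python
import math
from itertools import combinations


def internal_contacts(part, cutoff):
    """Count residue pairs inside one part that lie within the cutoff."""
    return sum(1 for p, q in combinations(part, 2) if math.dist(p, q) <= cutoff)


def external_contacts(part_a, part_b, cutoff):
    """Count residue pairs across the two parts that lie within the cutoff."""
    return sum(1 for p in part_a for q in part_b if math.dist(p, q) <= cutoff)


def split_value(coords, cut, cutoff=3.5):
    """Assumed split value: (int_A / ext_AB) * (int_B / ext_AB).

    coords is a list of 3D points, cut is the index where the chain is divided.
    The value grows when each part is internally well connected and the two
    parts share few contacts.
    """
    part_a, part_b = coords[:cut], coords[cut:]
    int_a = internal_contacts(part_a, cutoff)
    int_b = internal_contacts(part_b, cutoff)
    ext_ab = external_contacts(part_a, part_b, cutoff)
    if ext_ab == 0:
        return math.inf  # the two parts do not touch at all
    return (int_a / ext_ab) * (int_b / ext_ab)
```

Scanning `cut` over every position and keeping the maximum would give a candidate boundary in this toy setting; a real implementation would also need to handle discontinuous domains, which a single cut point cannot represent.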

The method of Wodak and Janin[78] was based on the calculated interface areas between two chain segments repeatedly cleaved at various residue positions. Interface areas were calculated by comparing surface areas of the cleaved segments with that of the native structure. Potential domain boundaries can be identified at a site where the interface area was at a minimum. Other methods have used measures of solvent accessibility to calculate compactness.[28][79][80]

The PUU algorithm[19] incorporates a harmonic model used to approximate inter-domain dynamics. The underlying physical concept is that many rigid interactions will occur within each domain and loose interactions will occur between domains. This algorithm is used to define domains in the FSSP domain database.[74]

Swindells (1995) developed a method, DETECTIVE, for identification of domains in protein structures based on the idea that domains have a hydrophobic interior. Deficiencies were found to occur when hydrophobic cores from different domains continue through the interface region.

RigidFinder is a method for identification of protein rigid blocks (domains and loops) from two different conformations. Rigid blocks are defined as blocks where all inter-residue distances are conserved across conformations.
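
The rigid-block definition above translates directly into a distance-conservation check. The sketch below only tests whether a given set of residues satisfies that definition; it is not RigidFinder's search procedure, and the tolerance value, function name and input format (two lists of 3D coordinates in the same residue order) are assumptions.

```python
import math
from itertools import combinations


def is_rigid_block(conf_a, conf_b, indices, tol=0.5):
    """Return True if every pairwise distance among the selected residues
    differs by at most tol between the two conformations.

    tol is an assumed tolerance in the same units as the coordinates.
    """
    return all(
        abs(math.dist(conf_a[i], conf_a[j]) - math.dist(conf_b[i], conf_b[j])) <= tol
        for i, j in combinations(indices, 2)
    )
```

Because only internal distances are compared, a block that is translated or rotated as a whole between the two conformations still counts as rigid, without any prior superposition of the structures.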

The method RIBFIND, developed by Pandurangan and Topf, identifies rigid bodies in protein structures by performing spatial clustering of secondary structural elements in proteins.[81] The RIBFIND rigid bodies have been used to flexibly fit protein structures into cryo-electron microscopy density maps.[82]

A general method to identify dynamical domains, that is, protein regions that behave approximately as rigid units in the course of structural fluctuations, has been introduced by Potestio et al.[62] and, among other applications, was also used to compare the consistency of the dynamics-based domain subdivisions with standard structure-based ones. The method, termed PiSQRD, is publicly available in the form of a webserver.[83] The latter allows users to optimally subdivide single-chain or multimeric proteins into quasi-rigid domains[62][83] based on the collective modes of fluctuation of the system. By default the latter are calculated through an elastic network model;[84] alternatively, pre-calculated essential dynamical spaces can be uploaded by the user.

A large fraction of domains are of unknown function. A domain of unknown function (DUF) is a protein domain that has no characterized function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, with examples being DUF2992 and DUF1220. There are now over 3,000 DUF families within the Pfam database, representing over 20% of known families.[86]

This article incorporates text and figures from George, R. A. (2002) "Predicting Structural Domains in Proteins" Thesis, University College London, which were contributed by its author.

See original here:
Protein domain - Wikipedia

The Cyberlaw Podcast: It’s Time to Pay Attention When Attention Stops Paying – Lawfare

Did you ever wonder where all that tech money came from all of a sudden? Turns out, a lot of it comes from online programmatic ads, an industry that gets little attention even from the companies, such as Google, that it made wealthy. That lack of attention is pretty ironic, because lack of attention is what's going to kill the industry, according to Tim Hwang, former Google policy maven and current research fellow at the Center for Security and Emerging Technology (CSET).

In our interview, Tim Hwang explains the remarkably complex industry and the dynamics that are gradually leaching the value out of its value proposition. Tim thinks we're in an attention bubble, and the popping will be messy. I'm persuaded the bubble is here but not that its end will be disastrous outside of Silicon Valley.

Sultan Meghji and I celebrate what seems like excellent news about a practical artificial intelligence (AI) achievement in predicting protein folding. It's a big deal, and an ideal problem for AI, with one exception. The parts of the problem that AI hasn't solved would be a lot easier for humans to work on if AI could tell us how it solved the parts it did figure out. Explainability, it turns out, is the key to collaborative AI-human work.

We welcome first-time participant and long-time listener Jordan Schneider to the panel. Jordan is the host of the unmissable ChinaTalk podcast. Given his expertise, we naturally ask him about Australia. Actually, it's natural, because Australia is now the testing ground for many of China's efforts to exercise power over independent countries using cyber power along with trade. Among the highlights: Chinese tweets highlighting a report about Australian war crimes followed by ham-handed tweet-boosting bot campaigns. And in a move that ought to be featured in future justifications of the Trump administration's ban on WeChat, the platform refused to carry the Australian prime minister's criticism of the war-crimes tweet.

Sen. Ted Cruz, call your office! And this will have to be Sen. Cruz's fight, because it looks more and more as though the Trump administration has thrown in the towel. Its claim that it is negotiating a TikTok sale after ordering divestment is getting thinner; now the divestment deadline has completely disappeared, as the government simply says that negotiations continue. Nick Weaver is on track to win his bet with me that CFIUS won't make good on its order before the mess is shoveled onto President-elect Joe Biden's plate.

Whoever was in charge of beating up WeChat and TikTok may have left the government early, but the team that's sticking pins in other Chinese companies is still hard at work. Jordan and Brian Egan talk about the addition of SMIC to the amorphous defense blacklist. And Congress has passed a law (awaiting the president's signature) that will make life hard for Chinese firms listed on U.S. exchanges.

China, meanwhile, isn't taking this lying down, Jordan reports. It is mirror-imaging all the Western laws that it sees as targeting China, including bans on exports of Chinese products and technology. It is racing (on what Jordan thinks is a twenty-year pace) to create its own chip design capabilities. And with some success. Sultan takes some of the hype out of China's claims to quantum supremacy. Though even dehyped, China's achievement should be making those who rely on RSA-style crypto just a bit nervous (that's all of us, by the way).

Michael Weiner previews the still-veiled state antitrust lawsuit against Facebook and promises to come back with details as soon as it's filed.

In quick hits, I explain why we haven't covered the Iranian claim that their scientist was rubbed out by an Israeli killer robot machine gun: I don't actually believe them. Brian explains that another law aimed at China and its use of Xinjiang forced labor is attracting lobbyists but likely to pass. Apple, Nike, and Coca-Cola have all taken hits for lobbying on the bill; none of them say they oppose the bill, but it turns out there's a reason for that. Lobbyists have largely picked the bones clean.

President Trump is leaving office in typical fashion, gesturing in the right direction but uninterested in actually getting there. In a Too Much Too Late negotiating move, the President has threatened to veto the defense authorization act if it doesn't include a repeal of Section 230 of the Communications Decency Act. If he's yearning to wield the veto, the Democrats and GOP alike seem willing to give him the chance. They may even override, or wait until Jan. 20 to pass it again.

Finally, I commend to interested listeners the oral argument in the Supreme Court's Van Buren case, about the Computer Fraud and Abuse Act. The solicitor general's footwork in making up quasi-textual limitations on the more sweeping readings of the act is admirable, and it may well be enough to keep Van Buren in jail, where he probably belongs for some crime, if not this one.

And more.

Download the 341st Episode (mp3)

You can subscribe to The Cyberlaw Podcast using iTunes, Google Play, Spotify, Pocket Casts, or our RSS feed. As always, The Cyberlaw Podcast is open to feedback. Be sure to engage with @stewartbaker on Twitter. Send your questions, comments, and suggestions for topics or interviewees to CyberlawPodcast@steptoe.com. Remember: If your suggested guest appears on the show, we will send you a highly coveted Cyberlaw Podcast mug!

The views expressed in this podcast are those of the speakers and do not reflect the opinions of their institutions, clients, friends, families, or pets.

Originally posted here:
The Cyberlaw Podcast: It's Time to Pay Attention When Attention Stops Paying - Lawfare

A math problem stumped experts for 50 years. This grad student from Maine solved it in days – The Boston Globe

The problem had to do with proving whether the Conway knot was something called "slice," an important concept in knot theory that we'll get to a little later. Of all the many thousands of knots with 12 or fewer crossings, mathematicians had been able to determine the sliceness of all but one: the Conway knot. For more than 50 years, the knot stubbornly resisted every attempt to untangle its secret, along the way achieving a kind of mythical status. A sculpture of it even adorns a gate at the University of Cambridge's Isaac Newton Institute for Mathematical Sciences.

Then, two years ago, a little-known graduate student named Lisa Piccirillo, who grew up in Maine, learned about the knot problem while attending a math conference. A speaker mentioned the Conway knot during a discussion about the challenges of studying knot theory. "For example," the speaker said, "we still don't know whether this 11-crossing knot is slice."

"That's ridiculous," Piccirillo thought while she listened. "This is 2018. We should be able to do that." A week later, she produced a proof that stunned the math world.

__________

Knot theory is a sub-specialty of a field of mathematics known as topology, which is concerned with the study of spaces. What's it used for? "The answer one memorizes is that topology is useful for understanding DNA and protein folding," Piccirillo tells me in May as we sit wearing masks and maintaining a good 10 feet of distance in an outdoor courtyard not far from where she lives in Harvard Square. "Apparently these things are very long and they like to stick to themselves, so they get all knotted up."

When topologists think of knots, however, they don't imagine a length of rope with a gnarled twist in the middle. To them, a knot is more like an extension cord in which the two ends have been plugged together and the whole thing has been tossed onto the floor in a mess of crisscrosses. It's essentially a closed loop with various places where the loop crosses over itself.

Now let's take one of these knots and think for a moment about the space in which it exists. That space has a fourth dimension, such as time, and to a topologist, our knot is a kind of sphere that sits within it. Topologists see spheres everywhere, but in a specialized way: A circle is a one-dimensional sphere, while the skin surrounding an orange is a two-dimensional sphere. And here is where minds tend to get blown: If we were to take that whole orange and glue it to another one, topologists would see the resulting object as a three-dimensional sphere, one that could be viewed as the skin of a four-dimensional orange. Don't worry if you are unable to conjure such a higher-dimension image for yourself. There are only a couple hundred specialists doing this work in the world, and not even all of them can.

Piccirillo, who graduated from Boston College in 2013, was already well on her way to joining the ranks of those specialists when, in the summer of 2018, the speaker at the math conference said something that would change the trajectory of her career.

The speaker showed a slide depicting the Conway knot and explained that mathematicians had long suspected that the knot was not, in fact, slice, but no one had been able to prove it. So what does it mean for a knot to be slice? Let's return for a moment to that four-dimensional orange. Inside of it there are disks (think of them as the surface of a plate). If a three-dimensional knot, like Conway's, can bound such a disk, then the knot is slice. If it cannot, then it is not slice.

Topologists use mathematical tools called invariants to try to determine sliceness, but for half a century, those tools had been unable to help them prove the prevailing belief that the Conway knot wasn't slice. Sitting in that lecture hall two years ago, however, Piccirillo sensed right away that the techniques she was using in a different area of topology might help these invariants better apply to the Conway knot problem. "I immediately knew that some work that I was doing for totally other reasons could at least try to answer this question," she says. She started on the problem the very next day.

__________

Piccirillo, who is 29, grew up in Greenwood, Maine, a town with a population of less than 900. She was an excellent student and her mom taught middle school math, but there was little in her interests to suggest that she would become a world-class mathematician.

"I was an overachiever," she says. "I rode dressage. I was very active in the youth group at my church. I did drama. I was in band. I did everything." Which is another way of saying that she wasn't one of those math prodigies who's programming computers and building algorithms at age 4.

When Piccirillo arrived on campus for her first year at Boston College in 2009, she was as interested in theater and other subjects as she was in math. During a calculus class that year, though, she made a connection with professor J. Elisenda Grigsby. (Disclosure: I am the editor of Boston College's alumni magazine.)

Piccirillo stood out, even if she lacked a certain polish, Grigsby recalls. "Golden-child mathematicians usually went to math camp when they were in high school and had been groomed from a young age," she says. "That wasn't Piccirillo's background, but I felt a kinship to her."

"She really encouraged me," Piccirillo says of Grigsby. "Eli really pushed me into trying another math class, and then liking the next class. I had already started on a progression." By her senior year, she was taking graduate-level topology courses. After graduating in 2013, she chose to pursue her doctorate at the University of Texas because of the university's excellent topology program and its reputation as a great place for female math students. In 2014, just 28.9 percent of math and science doctorates were awarded to women, according to the National Science Foundation, but at Texas, something like 40 percent of graduate math students were women.

By and large, Piccirillo has felt welcomed and encouraged as a female mathematician. "But now and again, things happen," she tells me. "For example, in grad school, I would receive notes in my department mailbox commenting on my appearance."

Overall, Piccirillo excelled during her six years at the University of Texas, finding both strong mentorship and a supportive research community. The time coincided with her deepening connection to the math itself. She loved to turn problems over in her mind, thinking about how one higher-dimension shape might be manipulated to resemble an entirely different one. It was thrilling, creative work, as much about aesthetic as arriving at a particular result. "When you perform a calculation, sometimes there's really clever tricks you can use or some ways that you can be an actual human and not a computer in the performing of the calculation," Piccirillo says. "But when you make a logical argument, that's entirely yours."

Outside of her studies, Piccirillo liked to make beautiful things. She carved wooden spoons for a while, as well as large-scale woodcut prints of fish and vegetables. She and her roommate, Wiley Jennings, built a dining room table together. For a while, she was obsessed with buying and repairing '70s Japanese motorcycles.

"She has a very, very strong sense of aesthetic," says James Farre, a friend of Piccirillo's from the University of Texas who specializes in geometry and is a postdoc at Yale. At Piccirillo's level, math that people like is often thought of and talked about as beautiful or deep.

The day after hearing about the Conway knot problem, Piccirillo, then 27, sat down at her desk and began looking for a solution. Because much of her graduate work involved building pairs of knots that were different but shared some 4-D properties, she already knew that any two knots that share the same 4-D space also share sliceness: they're either both slice or both not slice. Since her goal was to prove that the Conway knot wasn't slice, her first step was to "come up with an entirely different knot with the same four-dimensional space," she explains. "Then I'll try to show that the other knot isn't slice."

She spent spare time over the next several days hand-sketching and manipulating configurations of the 4-D space occupied by the Conway knot. "I didn't allow myself to work on it during the day," she told Quanta Magazine earlier this year, "because I didn't consider it to be real math. I thought it was, like, my homework."

The next step was to try to prove that the knot she drew was not slice. "There are lots of tools already in the literature for doing that," she says. She would feed the knot iterations into a computer, and "based on the data of the knot, maybe based on how its crossings look or other data that you can pull from the knot, the algorithm spits out an integer." In less than a week, Piccirillo had created a knot that hit the sweet spot: It had the same 4-D properties as the Conway knot, and it was found by the algorithm to be not slice.

She had suddenly succeeded where countless mathematicians had failed for five decades. She had solved the Conway knot problem.

__________

Not long after the breakthrough, Piccirillo attended a meeting with Cameron Gordon, a University of Texas math professor. When she mentioned her solution, Gordon was skeptical. He asked Piccirillo to walk him through the steps. "Then he made me write it down, like all up on the board," she recalls, "and then he got very excited and started yelling."

Piccirillo submitted her solution to the Annals of Mathematics, and the prestigious math journal agreed to publish her paper. When I asked James Farre, the Yale postdoc, to explain the significance of having a paper published in the Annals, he laughed for several seconds. "It's head and shoulders the most important and influential journal in mathematics," he says. "That's why I'm laughing. It's amazing and it's so cool!"

By the time Piccirillo's paper appeared in the journal about a year later, word of her solution had already spread throughout the math world. After graduating from UT in 2019, Piccirillo started her postdoctoral work at Brandeis. "The last time I saw her was in January," says Wiley Jennings, her roommate in Austin, who recently completed a doctorate at Stanford. "She was out at a faculty visit here at Stanford. To be invited, as someone who has done one year or less [of postdoc study], just finished their PhD essentially, I mean, that's insane. It's unheard of . . . I think that's when I first got a hint that like, Oh my gosh, she's really a hotshot."

Postdoc positions typically run for three or four years, but Piccirillo found herself in high demand. In July, she started a new tenure-track position as an assistant professor at MIT. It's been a whirlwind, and I wondered how her life has changed. "The practical answer is not too much," she says. She still teaches undergrads and conducts her research. She acknowledges, though, that there sometimes is a feeling of pressure, based on what she's already accomplished. "In practice, math for everyone is about trying to prove simple statements and failing, basically all of the time. So," she says, "I'm having to relearn how to be OK with the fact that most of the time I'm failing to prove really simple stuff when I'm feeling the weight of these expectations."

When I ask her about her goals, Piccirillo says one of her priorities is to help grow and broaden the mathematics community. "There certainly are many young women, people of color, non-heterosexual, or non-gender binary people who feel put at an arm's length by the institution of mathematics," she says. "It's really important to me to help mitigate that in any small ways I can." One important way to do that, she continues, is to help shatter the myth of the math prodigy.

When universities organize math conferences, she says, they should avoid inviting speakers who give talks "where they go really fast and they try to show you how smart they are and how hard their research is. That's not good for anyone, but it's especially not good for young people or people who are feeling maybe like they don't belong here." What those people in the audience don't know, she says, is that "nobody else really understands it either."

"You don't have to be really smart, whatever that means, to be a successful mathematician," Piccirillo says. "There's this idea that mathematicians are geniuses. A lot of them seem to be child prodigies that do these Olympiads. In fact, you don't have to come from that background at all to be very good at math, and most mathematicians, including many of the really great ones, don't come from that sort of background."

And as Piccirillo herself proves, some of them even go on to produce work that alters the course of mathematics.

__________

John Wolfson is the editor of Boston College Magazine. Follow him on Twitter @johnwolfson and send comments to magazine@globe.com.

Link:
A math problem stumped experts for 50 years. This grad student from Maine solved it in days - The Boston Globe

Scientists discover protein linked to depression and brain disorders – The Irish Times

Earlier diagnosis and better treatments for people with depression and certain brain disorders may be possible following a research breakthrough involving Belfast-based scientists.

They have found how a specific protein plays a crucial role in the generation of neurons, the nerve cells that relay electrical signals in the brain. This was made possible by focusing on a specific time and location during brain development, and on how its disruption can lead to intellectual disability and depression in adults.

A research team led by Queen's University Belfast (QUB) in collaboration with the Centre for Regenerative Therapies at Dresden University in Germany have published their findings in the journal Genes & Development.

"It is expected this breakthrough will have a major impact on our fundamental understanding of brain development and lead to earlier diagnosis and better treatments for people with certain brain disorders," said Dr Vijay Tiwari, who is based at the Wellcome-Wolfson Institute for Experimental Medicine at QUB.

"Our study reveals the key role this protein plays during the birth of probably one of the most important cells in our body, the neuron."

"Brain development is a highly complex process that involves generating various types of cells at defined time points and locations during embryonic development," he explained. "Any kind of interference during these processes is known to cause diseases including a range of intellectual disabilities."

"Among these brain cell types, neurons are the working unit of the brain, designed to transmit information to other nerve cells and various tissues in the body, such as the muscles as well as storage of memory in our brain," he added.

While the field has rapidly advanced, the mechanisms creating the birth of neurons from their mother cells, called neural stem cells, in time and space during development has not been well understood until now.

To conduct their study, the researchers looked at brain samples to closely determine the development of various cell types within the brain.

"The study showed how the presence of a specific protein (called Phf21b), within a defined time window of brain development and in a specific location in the brain, signals the birth of neurons from neural stem cells in the right place and at the right time," said Dr Tiwari, who is a molecular biologist working in neuroscience.

The researchers found that removal of Phf21b stopped production of neurons from neural stem cells and led to severe defects in brain development. They also found the importance of this protein, in particular in the folding of DNA in cells going on to form neurons.

"Understanding how a cell type in the brain is born at a specific point and in a specific place during development is crucial in our understanding how neurological issues arise later in life. We hope this discovery will pave the way for earlier diagnosis, earlier interventions and better treatment for people with a brain disorder, such as depression," he said.

Their research suggested screening for certain genetic variants would enable earlier diagnosis, in contrast to a scenario where depression in adults is not usually detected until a person is seriously depressed.

Here is the original post:
Scientists discover protein linked to depression and brain disorders - The Irish Times

ProMIS Neurosciences adds Dr. David Wishart to its Scientific Advisory Board – GlobeNewswire

TORONTO and CAMBRIDGE, Mass., Oct. 29, 2020 (GLOBE NEWSWIRE) -- ProMIS Neurosciences, Inc. (TSX: PMN) (OTCQB: ARFXF), a biotechnology company focused on the discovery and development of antibody therapeutics targeting toxic oligomers implicated in the development of neurodegenerative diseases, welcomes Dr. David Wishart, Distinguished University Professor in the Departments of Biological Sciences and Computing Science at the University of Alberta, to its Scientific Advisory Board (SAB). Identified as one of the world's most highly cited scientists for each of the past 7 years, Dr. Wishart brings more than three decades in protein folding and misfolding research to ProMIS, creating industry-leading depth in this area of therapeutic development for neurodegenerative and other diseases.

"The commitment and talent of our advisory board has been instrumental to the ongoing development of our broad portfolio of highly specific therapeutic, vaccine and diagnostic candidates," said Eugene Williams, Executive Chairman of ProMIS Neurosciences. "Dr. Wishart's world-recognized expertise in protein folding and misfolding combined with Dr. Neil Cashman's complementary leadership will place ProMIS among the most accomplished within this arena. Their combined expertise will advance our platform's application to an even broader scope of diseases caused by protein misfolding."

Dr. Wishart will play a pivotal role in advising ProMIS on the application and further development of its drug discovery and development platform, which is uniquely capable of identifying the sequence and shape (conformation) of novel binding targets, called peptide antigens, on misfolded proteins implicated in the development of neurodegenerative diseases such as Alzheimer's, Parkinson's and ALS. ProMIS has leveraged its novel platform to create a portfolio of antibody, intrabody and vaccine candidates that are highly selective for the misfolded protein aggregates driving pathogenesis. With Dr. Wishart's support, ProMIS will continue to expand the application of its platform to the biology of additional misfolded protein diseases.

"Never before has there been a more urgent need for therapy, diagnostic and vaccine candidates that are highly specific for their intended target," said Dr. Wishart. "I look forward to working with Dr. Neil Cashman and his team and such an accomplished SAB as we continue to seek new opportunities to apply ProMIS' unique platform technology to misfolded protein diseases with high unmet need."

ProMIS' SAB includes distinguished, highly published and cited contributors to the current scientific understanding of Alzheimer's, Parkinson's, ALS, protein misfolding diseases in general, vaccines and diagnostics. Dr. Wishart joins the following current members:

About Dr. David Wishart

Dr. Wishart has been studying protein folding and misfolding for more than 30 years using a combination of computational and experimental approaches. These experimental approaches include NMR spectroscopy, circular dichroism, fluorescence spectroscopy, electron microscopy, protein engineering and molecular biology. The computational methods include molecular dynamics, agent-based modeling, bioinformatics and machine learning. Over the course of his career, Dr. Wishart has published more than 430 scientific papers, cited more than 78,000 times, covering many areas of protein science including structural biology, protein metabolism and computational biochemistry. He has been with the University of Alberta since 1995 and is currently a Distinguished University Professor in the Departments of Biological Sciences and Computing Science. He also holds adjunct appointments with the Faculty of Pharmaceutical Sciences and the Department of Pathology and Laboratory Medicine.

Dr. Wishart has been awarded research grants totaling more than $130 million from a number of funding agencies. He has also led or directed a number of core facilities and centers and currently co-directs The Metabolomics Innovation Centre (TMIC), Canada's national metabolomics laboratory. Dr. Wishart held the Bristol-Myers Squibb Research Chair in Pharmaceutical Sciences from 1995-2005, received the Astra-Zeneca-CFPS Young Investigator Prize in 2001, was awarded a Lifetime Honorary Fellowship by the Metabolomics Society in 2014 and was elected as a Fellow of the Royal Society of Canada in 2017.

About ProMIS Neurosciences

ProMIS Neurosciences, Inc. is a development stage biotechnology company whose unique core technology is the ability to rationally predict the site and shape (conformation) of novel targets known as Disease Specific Epitopes (DSEs) on the molecular surface of proteins. In neurodegenerative diseases, such as Alzheimer's, ALS and Parkinson's disease, the DSEs are misfolded regions on toxic forms of otherwise normal proteins. In the infectious disease setting, these DSEs represent peptide antigens that can be used as an essential component to create accurate and sensitive serological assays to detect the presence of antibodies that arise in response to a specific infection, such as COVID-19. ProMIS' proprietary peptide antigens can also be used to create potential therapeutic antibodies, as well as serve as the basis for development of vaccines. ProMIS is headquartered in Toronto, Ontario, with offices in Cambridge, Massachusetts. ProMIS is listed on the Toronto Stock Exchange under the symbol PMN, and on the OTCQB Venture Market under the symbol ARFXF. Visit us at www.promisneurosciences.com, follow us on Twitter and LinkedIn. To learn more about protein misfolding diseases, listen to Episodes 11, 24, of Saving Minds, a podcast available at iTunes or Spotify.

For media inquiries, please contact: Shanti Skiffington, shanti.skiffington@gmail.com, Tel. 617 921-0808

The TSX has not reviewed and does not accept responsibility for the adequacy or accuracy of this release. This information release contains certain forward-looking information. Such information involves known and unknown risks, uncertainties and other factors that may cause actual results, performance or achievements to be materially different from those implied by statements herein, and therefore these statements should not be read as guarantees of future performance or results. All forward-looking statements are based on the Company's current beliefs as well as assumptions made by and information currently available to it as well as other factors. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of the date of this press release. Due to risks and uncertainties, including the risks and uncertainties identified by the Company in its public securities filings, actual events may differ materially from current expectations. The Company disclaims any intention or obligation to update or revise any forward-looking statements, whether as a result of new information, future events or otherwise.

More:
ProMIS Neurosciences adds Dr. David Wishart to its Scientific Advisory Board - GlobeNewswire

Silent Mutations Identified That Give the COVID-19 Coronavirus an Evolutionary Edge – SciTechDaily

RNA folding may help explain how the coronavirus became so hard to stop after it spilled over from wildlife to humans.

We know that the coronavirus behind the COVID-19 crisis lived harmlessly in bats and other wildlife before it jumped the species barrier and spilled over to humans.

Now, researchers at Duke University have identified a number of "silent" mutations in the roughly 30,000 letters of the virus's genetic code that helped it thrive once it made the leap and possibly helped set the stage for the global pandemic. The subtle changes involved how the virus folded its RNA molecules within human cells.

For the study, published October 16, 2020, in the journal PeerJ, the researchers used statistical methods they developed to identify adaptive changes that arose in the SARS-CoV-2 genome in humans, but not in closely related coronaviruses found in bats and pangolins.

"We're trying to figure out what made this virus so unique," said lead author Alejandro Berrio, a postdoctoral associate in biologist Greg Wray's lab at Duke.

Previous research detected fingerprints of positive selection within a gene that encodes the "spike" proteins studding the coronavirus's surface, which play a key role in its ability to infect new cells.

The new study likewise flagged mutations that altered the spike proteins, suggesting that viral strains carrying these mutations were more likely to thrive. But with their approach, study authors Berrio, Wray and Duke Ph.D. student Valerie Gartner also identified additional culprits that previous studies failed to detect.

The researchers report that so-called silent mutations in two other regions of the SARS-CoV-2 genome, dubbed Nsp4 and Nsp16, appear to have given the virus a biological edge over previous strains without altering the proteins they encode.

Instead of affecting proteins, Berrio said, the changes likely affected how the virus's genetic material, which is made of RNA, folds up into 3-D shapes and functions inside human cells.
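
As a rough illustration of what "silent" means here (this is not the Duke team's statistical method), several RNA codons can encode the same amino acid, so a single-letter change can leave the resulting protein untouched. The sketch below assumes a small excerpt of the standard genetic code rather than the full 64-codon table, and the function name `is_silent` is hypothetical.

```python
# Partial excerpt of the standard genetic code (RNA codon -> amino acid).
# Only a handful of codons are listed; this is an illustrative assumption,
# not the full 64-codon table.
CODON_TABLE = {
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "GCU": "Ala", "GCC": "Ala",
    "GAU": "Asp", "GAA": "Glu",
}


def is_silent(before: str, after: str) -> bool:
    """Return True if the codon change leaves the encoded amino acid unchanged."""
    if before == after:
        return False  # identical codons: no mutation occurred
    return CODON_TABLE[before] == CODON_TABLE[after]


print(is_silent("CUU", "CUC"))  # both codons encode leucine
print(is_silent("GAU", "GAA"))  # aspartate becomes glutamate
```

A change like CUU to CUC alters the RNA sequence, and potentially how the RNA folds, without changing the protein, which is the kind of mutation the study describes.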

What these changes in RNA structure might have done to set the SARS-CoV-2 virus in humans apart from other coronaviruses is still unknown, Berrio said. But they may have contributed to the virus's ability to spread before people even know they have it, a crucial difference that made the current situation so much more difficult to control than the SARS coronavirus outbreak of 2003.

The research could lead to new molecular targets for treating or preventing COVID-19, Berrio said.

"Nsp4 and Nsp16 are among the first RNA molecules that are produced when the virus infects a new person," Berrio said. "The spike protein doesn't get expressed until later. So they could make a better therapeutic target because they appear earlier in the viral life cycle."

More generally, by pinpointing the genetic changes that enabled the new coronavirus to thrive in human hosts, scientists hope to better predict future zoonotic disease outbreaks before they happen.

"Viruses are constantly mutating and evolving," Berrio said. "So it's possible that a new strain of coronavirus capable of infecting other animals may come along that also has the potential to spread to people, like SARS-CoV-2 did. We'll need to be able to recognize it and make efforts to contain it early."

Reference: "Positive selection within the genomes of SARS-CoV-2 and other Coronaviruses independent of impact on protein function" by Alejandro Berrio, Valerie Gartner and Gregory A. Wray, 16 October 2020, PeerJ. DOI: 10.7717/peerj.10234

Read the original post:
Silent Mutations Identified That Give the COVID-19 Coronavirus an Evolutionary Edge - SciTechDaily

Scientists discover new organic compounds that could have helped form the first cells – Science Codex

Chemists studying how life started often focus on how modern biopolymers like peptides and nucleic acids contributed, but modern biopolymers don't form easily without help from living organisms. A possible solution to this paradox is that life started using different components, and many non-biological chemicals were likely abundant in the environment. A new survey conducted by an international team of chemists from the Earth-Life Science Institute (ELSI) at Tokyo Institute of Technology and other institutes from Malaysia, the Czech Republic, the US and India, has found that a diverse set of such compounds easily form polymers under primitive environmental conditions, and some even spontaneously form cell-like structures.

Understanding how life started on Earth is one of the most challenging questions modern science attempts to explain. Scientists presently study modern organisms and try to see what aspects of their biochemistry are universal, and thus were probably present in the organisms from which they descended. The best guess is that life has thrived on Earth for at least 3.5 billion of Earth's 4.5 billion year history since the planet formed, and most scientists would say life likely began before there is good evidence for its existence. Problematically, since Earth's surface is dynamic, the earliest traces of life on Earth have not been preserved in the geological record. However, the earliest evidence for life on Earth tells us little about what the earliest organisms were made of, or what was going on inside their cells. "There is clearly a lot left to learn from prebiotic chemistry about how life may have arisen," says the study's co-author Jim Cleaves.

A hallmark of life is evolution, and the mechanisms of evolution suggest that common traits can suddenly be displaced by rare and novel mutations which allow mutant organisms to survive better and proliferate, often replacing previously common organisms very rapidly. Paleontological, ecological and laboratory evidence suggests this occurs commonly and quickly. One example is an invasive organism like the dandelion, which was introduced to the Americas from Europe and is now a common weed causing lawn-concerned homeowners to spend countless hours of effort and dollars to eradicate. Another less whimsical example is the coronavirus behind COVID-19, a virus (technically not living, but technically an organism) which was probably confined to a small population of bats for years, but suddenly spread among humans around the world. Organisms which reproduce faster than their competitors, even only slightly faster, quickly send their competitors to what Leon Trotsky termed the "ash heap of history." As most organisms which have ever existed are extinct, co-author Tony Z. Jia suggests that "to understand how modern biology emerged, it is important to study plausible non-biological chemistries or structures not currently present in modern biology which potentially went extinct as life complexified."

This idea of evolutionary replacement is pushed to an extreme when scientists try to understand the origins of life. All modern organisms have a few core commonalities: all life is cellular, life uses DNA as an information storage molecule, and uses DNA to make ribonucleic acid (RNA) as an intermediary step in making proteins. Proteins perform most of the catalysis in modern biochemistry, and they are created using a very nearly universal "code" to make them from RNA. How this code came to be is in itself enigmatic, but these deep questions point to there possibly having been a very murky period in early biological evolution ~4 billion years ago during which almost none of the molecular features observed in modern biochemistry were present, and few if any of the ones that were present have been carried forward.

Proteins are linear polymers of amino acids. These floppy strings of polymerised amino acids fold into unique three-dimensional shapes, forming extremely efficient catalysts which foster precise chemical reactions. In principle, many types of polymerised molecules could form similar strings and fold to form similar catalytic shapes, and synthetic chemists have already discovered many examples. "The point of this kind of study is finding functional polymers in plausibly prebiotic systems without the assistance of biology, including grad students," says co-author Irena Mamajanov.

Scientists have found many ways to make biological organic compounds without the intervention of biology, and these mechanisms help explain these compounds' presence in samples like carbonaceous meteorites, which are relics of the early solar system, and which scientists don't think ever hosted life. These primordial meteorite samples also contain many other types of molecules which could have formed complex folded polymers like proteins, which could have helped steer primitive chemistry. Proteins, by virtue of their folding and catalysis, mediate much of the complex biochemical evolution observed in living systems. The ELSI team reasoned that alternative polymers could have helped this occur before the coding between DNA and protein evolved. "Perhaps we cannot reverse-engineer the origin of life; it may be more productive to try and build it from scratch, and not necessarily using modern biomolecules. There were large reservoirs of non-biological chemicals that existed on the primeval Earth. How they helped in the formation of life-as-we-know-it is what we are interested in," says co-author Kuhan Chandru.

The ELSI team did something simple yet profound: they took a large set of structurally diverse small organic molecules which could plausibly be made by prebiotic processes and tried to see if they could form polymers when evaporated from dilute solution. To their surprise, they found many of the primitive compounds could, though they also found some of them decomposed rapidly. This simple criterion, whether a compound is able to be dried without decomposing, may have been one of the earliest evolutionary selection pressures for primordial molecules.

The team conducted one further simple test. They took these dried reactions, added water and looked at them under a microscope. To their surprise, some of the products of these reactions formed cell-sized compartments. That simple starting materials containing 10 to 20 atoms can be converted to self-organised cell-like aggregates containing millions of atoms provides startling insight into how simple chemistry may have led to complex chemistry bordering on the kind of complexity associated with living systems, while not using modern biochemicals.

"We didn't test every possible compound, but we tested a lot of possible compounds. The diversity of chemical behaviors we found was surprising, and suggests this kind of small-molecule to functional-aggregate behavior is a common feature of organic chemistry, which may make the origin of life a more common phenomenon than previously thought," concludes co-author Niraja Bapat.

See the article here:
Scientists discover new organic compounds that could have helped form the first cells - Science Codex

Breakout Paper in Journal of Theoretical Biology Explicitly Supports Intelligent Design – Discovery Institute

Photo: Red poppy, Auckland Botanic Gardens, Auckland, New Zealand, by Sandy Millar via Unsplash.

As John West noted here last week, the Journal of Theoretical Biology has published an explicitly pro-intelligent design article, "Using statistical methods to model the fine-tuning of molecular machines and systems." Let's take a closer look at the contents. The paper is math-heavy, discussing statistical models of making inferences, but it is also groundbreaking for this crucial reason: it considers and proposes intelligent design, by name, as a viable explanation for the origin of "fine-tuning" in biology. This is a major breakthrough for science, but also for freedom of speech. If the paper is any indication, appearing as it does in a prominent peer-reviewed journal, some of the suffocating constraints on ID advocacy may be coming off.

The authors are Steinar Thorvaldsen, a professor of information science at the University of Tromsø in Norway, and Ola Hössjer, a professor of mathematical statistics at Stockholm University. The paper, which is open access, begins by noting that while fine-tuning is widely discussed in physics, it needs to be considered more in the context of biology:

Fine-tuning has received much attention in physics, and it states that the fundamental constants of physics are finely tuned to precise values for a rich chemistry and life permittance. It has not yet been applied in a broad manner to molecular biology.

The authors explain the paper's main thrust:

However, in this paper we argue that biological systems present fine-tuning at different levels, e.g. functional proteins, complex biochemical machines in living cells, and cellular networks. This paper describes molecular fine-tuning, how it can be used in biology, and how it challenges conventional Darwinian thinking. We also discuss the statistical methods underpinning finetuning and present a framework for such analysis.

They explain how fine-tuning is defined. The definition is essentially equivalent to specified complexity:

We define fine-tuning as an object with two properties: it must a) be unlikely to have occurred by chance, under the relevant probability distribution (i.e. complex), and b) conform to an independent or detached specification (i.e. specific).

They then introduce the concept of design, and explain how humans are innately able to recognize it:

A design is a specification or plan for the construction of an object or system, or the result of that specification or plan in the form of a product. The very term "design" is from the Medieval Latin word "designare" (denoting "mark out, point out, choose"); from "de" (out) and "signum" (identifying mark, sign). Hence, a public notice that advertises something or gives information. The design usually has to satisfy certain goals and constraints. It is also expected to interact with a certain environment, and thus be realized in the physical world. Humans have a powerful intuitive understanding of design that precedes modern science. Our common intuitions invariably begin with recognizing a pattern as a mark of design. The problem has been that our intuitions about design have been unrefined and pre-theoretical. For this reason, it is relevant to ask ourselves whether it is possible to turn the tables on this disparity and place those rough and pre-theoretical intuitions on a firm scientific foundation.

That last sentence is key: the purpose is to understand if there is a scientific method by which design can be inferred. They propose that design can be identified by uncovering fine-tuning. The paper explicates statistical methods for understanding fine-tuning, which they argue reflects design:

Fine-tuning and design are related entities. Fine-tuning is a bottom-up method, while design is more like a top-down approach. Hence, we focus on the topic of fine-tuning in the present paper and address the following questions: Is it possible to recognize fine-tuning in biological systems at the levels of functional proteins, protein groups and cellular networks? Can fine-tuning in molecular biology be formulated using state of the art statistical methods, or are the arguments just in the eyes of the beholder?

They cite the work of multiple leading theorists in the ID research community.

They return to physics and the anthropic principle, the idea that the laws of nature are precisely suited for life:

Suppose the laws of physics had been a bit different from what they actually are, what would the consequences be? (Davies, 2006). The chances that the universe should be life permitting are so infinitesimal as to be incomprehensible and incalculable. The finely tuned universe is like a panel that controls the parameters of the universe with about 100 knobs that can be set to certain values. If you turn any knob just a little to the right or to the left, the result is either a universe that is inhospitable to life or no universe at all. If the Big Bang had been just slightly stronger or weaker, matter would not have condensed, and life never would have existed. The odds against our universe developing were enormous and yet here we are, a point that equates with religious implications

However, rather than getting into religion, they apply statistics to consider the possibility of design as an explanation for the fine-tuning of the universe. They cite ID theorist William Dembski:

William Dembski regards the fine-tuning argument as suggestive, as pointers to underlying design. We may describe this inference as abductive reasoning or inference to the best explanation. This reasoning yields a plausible conclusion that is relatively likely to be true, compared to competing hypotheses, given our background knowledge. In the case of fine-tuning of our cosmos, design is considered to be a better explanation than a set of multi-universes that lacks any empirical or historical evidence.

The article offers additional reasons why the multiverse is an unsatisfying explanation for fine-tuning, namely that multiverse hypotheses do not predict fine-tuning for this particular universe any better than a single-universe hypothesis, and that we should prefer those theories which best predict (for this or any universe) the phenomena we observe in our universe.

The paper reviews the lines of evidence for fine-tuning in biology, including information, irreducible complexity, protein evolution, and the "waiting-time problem." Along the way it considers the arguments of many ID theorists, starting with a short review showing how the literature uses words such as "sequence code," "information," and "machine" to describe life's complexity:

One of the surprising discoveries of modern biology has been that the cell operates in a manner similar to modern technology, while biological information is organized in a manner similar to plain text. Words and terms like "sequence code", and "information", and "machine" have proven very useful in describing and understanding molecular biology (Wills, 2016). The basic building blocks of life are proteins, long chain-like molecules consisting of varied combinations of 20 different amino acids. Complex biochemical machines are usually composed of many proteins, each folded together and configured in a unique 3D structure dependent upon the exact sequence of the amino acids within the chain. Proteins employ a wide variety of folds to perform their biological function, and each protein has a highly specified shape with some minor variations.

The paper cites and reviews the work of Michael Behe, Douglas Axe, Stephen Meyer, and Günter Bechly. Some of these discussions are quite long and extensive. First, the article contains a lucid explanation of irreducible complexity and the work of Michael Behe:

Michael Behe and others presented ideas of design in molecular biology, and published evidence of "irreducibly complex biochemical machines" in living cells. In his argument, some parts of the complex systems found in biology are exceedingly important and do affect the overall function of their mechanism. The fine-tuning can be outlined through the vital and interacting parts of living organisms. In "Darwin's Black Box" (Behe, 1996), Behe exemplified systems, like the flagellum bacteria use to swim and the blood-clotting cascade, that he called irreducibly complex, configured as a remarkable teamwork of several (often dozen or more) interacting proteins. Is it possible on an incremental model that such a system could evolve for something that does not yet exist? Many biological systems do not appear to have a functional viable predecessor from which they could have evolved stepwise, and the occurrence in one leap by chance is extremely small. To rephrase the first man on the moon: "That's no small steps of proteins, no giant leap for biology."

[...]

A Behe-system of irreducible complexity was mentioned in Section 3. It is composed of several well-matched, interacting modules that contribute to the basic function, wherein the removal of any one of the modules causes the system to effectively cease functioning. Behe does not ignore the role of the laws of nature. Biology allows for changes and evolutionary modifications. Evolution is there, irreducible design is there, and they are both observed. The laws of nature can organize matter and force it to change. Behe's point is that there are some irreducibly complex systems that cannot be produced by the laws of nature:

If a biological structure can be explained in terms of those natural laws [reproduction, mutation and natural selection] then we cannot conclude that it was designed. . . however, I have shown why many biochemical systems cannot be built up by natural selection working on mutations: no direct, gradual route exist to these irreducible complex systems, and the laws of chemistry work strongly against the undirected development of the biochemical systems that make molecules such as AMP1 (Behe, 1996, p. 203).

Then, even if the natural laws work against the development of these irreducible complexities, they still exist. The strong synergy within the protein complex makes it irreducible to an incremental process. They are rather to be acknowledged as finetuned initial conditions of the constituting protein sequences. These structures are biological examples of nano-engineering that surpass anything human engineers have created. Such systems pose a serious challenge to a Darwinian account of evolution, since irreducibly complex systems have no direct series of selectable intermediates, and in addition, as we saw in Section 4.1, each module (protein) is of low probability by itself.

The article also reviews the peer-reviewed research of protein scientist Douglas Axe, as well as his 2016 book Undeniable, on the evolvability of protein folds:

An important goal is to obtain an estimate of the overall prevalence of sequences adopting functional protein folds, i.e. the right folded structure, with the correct dynamics and a precise active site for its specific function. Douglas Axe worked on this question at the Medical Research Council Centre in Cambridge. The experiments he performed showed a prevalence between 1 in 10^50 and 1 in 10^74 of protein sequences forming a working domain-sized fold of 150 amino acids (Axe, 2004). Hence, functional proteins require highly organised sequences, as illustrated in Fig. 2. Though proteins tolerate a range of possible amino acids at some positions in the sequence, a random process producing amino-acid chains of this length would stumble onto a functional protein only about once in every 10^50 to 10^74 attempts due to genetic variation. This empirical result is quite analogous to the inference from fine-tuned physics.

[…]

The search space turns out to be too impossibly vast for blind selection to have even a slight chance of success. The contrasting view is innovations based on ingenuity, cleverness and intelligence. An element of this is what Axe calls "functional coherence," which always involves hierarchical planning, hence is a product of fine-tuning. He concludes: "Functional coherence makes accidental invention fantastically improbable and therefore physically impossible" (Axe, 2016, p. 160).

They conclude that the literature shows the probability of finding a functional protein in sequence space can vary broadly, but commonly remains far beyond the reach of Darwinian processes (Axe, 2010a).

Citing the work of Günter Bechly and Stephen Meyer, the paper also reviews the question of whether sufficient time is allowed by the fossil record for complex systems to arise via Darwinian mechanisms. This is known as the "waiting-time problem":

Achieving fine-tuning in a conventional Darwinian model: The waiting time problem

In this section we will elaborate further on the connection between the probability of an event and the time available for that event to happen. In the context of living systems, we need to ask the question whether conventional Darwinian mechanisms have the ability to achieve fine-tuning during a prescribed period of time. This is of interest in order to correctly interpret the fossil record, which is often interpreted as having long periods of stasis interrupted by very sudden abrupt changes (Bechly and Meyer, 2017). Examples of such sudden changes include the origin of photosynthesis, the Cambrian explosion, the evolution of complex eyes and the evolution of animal flight. The accompanying genetic changes are believed to have happened very rapidly, at least on a macroevolutionary timescale, during a time period of length t. In order to test whether this is possible, a mathematical model is needed in order to estimate the prevalence P(A) of the event A that the required genetic changes in a species take place within a time window of length t.

Throughout the discussions are multiple citations of BIO-Complexity, a journal dedicated to investigating the scientific evidence for intelligent design.

Lastly, the authors consider intelligent design as a possible explanation of biological fine-tuning, citing heavily the work of William Dembski, Winston Ewert, Robert J. Marks, and other ID theorists:

Intelligent Design (ID) has gained a lot of interest and attention in recent years, mainly in USA, by creating public attention as well as triggering vivid discussions in the scientific and public world. ID aims to adhere to the same standards of rational investigation as other scientific and philosophical enterprises, and it is subject to the same methods of evaluation and critique. ID has been criticized, both for its underlying logic and for its various formulations (Olofsson, 2008; Sarkar, 2011).

William Dembski originally proposed what he called an explanatory filter for distinguishing between events due to chance, lawful regularity or design (Dembski, 1998). Viewed on a sufficiently abstract level, its logics is based on well-established principles and techniques from the theory of statistical hypothesis testing. However, it is hard to apply to many interesting biological applications or contexts, because a huge number of potential but unknown scenarios may exist, which makes it difficult to phrase a null hypothesis for a statistical test (Wilkins and Elsberry, 2001; Olofsson, 2008).

The re-formulated version of a complexity measure published by Dembski and his coworkers is named Algorithmic Specified Complexity (ASC) (Ewert et al., 2013; 2014). ASC incorporates both Shannon and Kolmogorov complexity measures, and it quantifies the degree to which an event is improbable and follows a pattern. Kolmogorov complexity is related to compression of data (and hence patterns), but suffers from the property of being unknowable as there is no general method to compute it. However, it is possible to give upper bounds for the Kolmogorov complexity, and consequently ASC can be bounded without being computed exactly. ASC is based on context and is measured in bits. The same authors have applied this method to natural language, random noise, folding of proteins, images, etc. (Marks et al., 2017).
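As a rough illustration of how such a bound can be obtained: ASC is usually written as ASC(x) = -log2 P(x) - K(x|C), so any compressor, whose output length bounds K from above (up to an additive constant), gives a lower bound on ASC. The sketch below is only an illustration under stated assumptions. It assumes a uniform chance model over symbols, ignores the context C, and uses zlib as a stand-in compressor; the function names are hypothetical and this is not the procedure of Ewert et al.

```python
import math
import zlib


def surprisal_bits(data: bytes, alphabet_size: int) -> float:
    """-log2 P(x) under an ASSUMED uniform, independent chance model."""
    return len(data) * math.log2(alphabet_size)


def compressed_bits(data: bytes) -> int:
    """Bit length of a zlib encoding: a crude upper bound on K(x),
    up to an additive constant, with the context C ignored."""
    return 8 * len(zlib.compress(data, 9))


def asc_lower_bound(data: bytes, alphabet_size: int) -> float:
    """Lower bound on ASC = -log2 P(x) - K(x|C), with K replaced by
    the compressed length."""
    return surprisal_bits(data, alphabet_size) - compressed_bits(data)


# A highly regular 1000-symbol string over a 4-letter alphabet.
patterned = b"ACGT" * 250
print(asc_lower_bound(patterned, alphabet_size=4))
```

The result depends entirely on the chance model and on the compressor chosen, which is why the measure is described above as context-dependent.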

[…]

The laws, constants, and primordial initial conditions of nature present the flow of nature. These purely natural objects discovered in recent years show the appearance of being deliberately fine-tuned. Functional proteins, molecular machines and cellular networks are both unlikely when viewed as outcomes of a stochastic model, with a relevant probability distribution (having a small P(A)), and at the same time they conform to an independent or detached specification (the set A being defined in terms of specificity). These results are important and deduced from central phenomena of basic science. In both physics and molecular biology, fine-tuning emerges as a uniting principle and synthesis, an interesting observation by itself.

In this paper we have argued that a statistical analysis of fine-tuning is a useful and consistent approach to model some of the categories of design: "irreducible complexity" (Michael Behe), and "specified complexity" (William Dembski). As mentioned in Section 1, this approach requires a) that a probability distribution for the set of possible outcomes is introduced, and b) that a set A of fine-tuned events or more generally a specificity function f is defined. Here b) requires some a priori understanding of what fine-tuning means, for each type of application, whereas a) requires a naturalistic model for how the observed structures would have been produced by chance. The mathematical properties of such a model depend on the type of data that is analyzed. Typically a stochastic process should be used that models a dynamic feature such as stellar, chemical or biological (Darwinian) evolution. In the simplest case the state space of such a stochastic process is a scalar (one nucleotide or amino acid), a vector (a DNA or amino acid string) or a graph (protein complexes or cellular networks).

A major conclusion of our work is that fine-tuning is a clear feature of biological systems. Indeed, fine-tuning is even more extreme in biological systems than in inorganic systems. It is detectable within the realm of scientific methodology. Biology is inherently more complicated than the large-scale universe and so fine-tuning is even more a feature. Still more work remains in order to analyze more complicated data structures, using more sophisticated empirical criteria. Typically, such criteria correspond to a specificity function f that not only is a helpful abstraction of an underlying pattern, such as biological fitness. One rather needs a specificity function that, although of non-physical origin, can be quantified and measured empirically in terms of physical properties such as functionality. In the long term, these criteria are necessary to make the explanations both scientifically and philosophically legitimate. However, we have enough evidence to demonstrate that fine-tuning and design deserve attention in the scientific community as a conceptual tool for investigating and understanding the natural world. The main agenda is to explore some fascinating possibilities for science and create room for new ideas and explorations. Biologists need richer conceptual resources than the physical sciences until now have been able to initiate, in terms of complex structures having non-physical information as input (Ratzsch, 2010). Yet researchers have more work to do in order to establish fine-tuning as a sustainable and fully testable scientific hypothesis, and ultimately a Design Science.

This is a significant development. The article gives the arguments of intelligent design theorists a major hearing in a mainstream scientific journal. And don't miss the purpose of the article, which is stated in its final sentence: to work towards "establish[ing] fine-tuning as a sustainable and fully testable scientific hypothesis, and ultimately a Design Science." The authors present compelling arguments that biological fine-tuning cannot arise via unguided Darwinian mechanisms. Some explanation is needed to account for why biological systems "show the appearance of being deliberately fine-tuned." Despite the noise that often surrounds this debate, for ID arguments to receive such a thoughtful and positive treatment in a prominent journal is itself convincing evidence that ID has intellectual merit. Claims of ID's critics notwithstanding, design science is being taken seriously by scientists.

Read the original here:
Breakout Paper in Journal of Theoretical Biology Explicitly Supports Intelligent Design - Discovery Institute

The structural basis for Z α1-antitrypsin polymerization in the liver – Science Advances

Abstract

The serpinopathies are among a diverse set of conformational diseases that involve the aberrant self-association of proteins into ordered aggregates. α1-Antitrypsin deficiency is the archetypal serpinopathy and results from the formation and deposition of mutant forms of α1-antitrypsin as "polymer" chains in liver tissue. No detailed structural analysis has been performed of this material. Moreover, there is little information on the relevance of well-studied artificially induced polymers to these disease-associated molecules. We have isolated polymers from the liver tissue of Z α1-antitrypsin homozygotes (E342K) who have undergone transplantation, labeled them using a Fab fragment, and performed single-particle analysis of negative-stain electron micrographs. The data show structural equivalence between heat-induced and ex vivo polymers and that the intersubunit linkage is best explained by a carboxyl-terminal domain swap between molecules of α1-antitrypsin.

The misfolding of proteins and their spontaneous ordered aggregation underlie the pathology of Alzheimer's, Huntington's, and Parkinson's diseases; amyloidoses; and serpinopathies, the latter involving self-association of mutant members of the serine protease inhibitor (serpin) superfamily. α1-Antitrypsin is a 52-kDa serpin expressed and secreted predominantly by hepatocytes and is the most abundant circulating protease inhibitor. The primary physiological role of α1-antitrypsin is the inhibition of neutrophil elastase, a protease whose production is increased during the acute phase inflammatory response (fig. S1, A and B). However, genetic variants such as the severe Z (E342K) allele of α1-antitrypsin promote proteasomal degradation and the formation of ordered linear polymers (1, 2). Despite the pronounced retention in the endoplasmic reticulum (ER), α1-antitrypsin polymers do not typically initiate the unfolded protein response. Instead, these ordered aggregates can be sequestered into ER-derived inclusion bodies that are associated with liver disease. The lack of circulating α1-antitrypsin results in dysregulation of neutrophil elastase and hence tissue destruction and emphysema (2).

The structure of the pathological polymers that accumulate in patients has not been demonstrated. The observation that α1-antitrypsin polymers show a similar degree of stabilization to the cleaved form (3) (fig. S1B, EI) and that peptide analogs of the inserted portion of the reactive center loop (RCL) could similarly stabilize the protein (4) and prevent polymerization (1, 3) suggested that polymers were the product of an interaction between the RCL of one molecule and sheet A of the next (1). This "loop-sheet" model (Fig. 1A, hypotheses H1 and H2) is consistent with nuclear magnetic resonance and H/D (hydrogen-deuterium) exchange data showing that polymerization proceeds via a compact, rather than an expanded, intermediate (5, 6). The subsequently proposed β-hairpin hypothesis (Fig. 1A, H3) was based on the crystal structure of a self-terminating dimer of a homologous protein, generated artificially at low pH, and extrapolated to α1-antitrypsin using limited proteolysis and recombinant mutants with stabilizing disulfide bonds (7). The C-terminal model (Fig. 1A, hypothesis H4) posits that the C terminus fails to form properly in the donor molecule and is instead incorporated into an acceptor molecule, with latent-like self-insertion of the RCL providing the extreme stability found in polymers (8). This model is based on a crystal structure of a denaturant-induced circular trimer of recombinant disulfide-bonded α1-antitrypsin. The circular arrangement of subunits provides a rigid structure that is tractable for crystallography but reflects a minor component of the source sample that is not generally enriched in polymer preparations (1), although there is an absence of the latent conformation in humans that would be predicted to be a by-product of this mechanism (9).

(A) Different linkages hypothesized for the pathological polymer, H1 to H4, with the intermolecular interface proposed between one monomeric subunit and the next shown in black. (B) (i) Analysis of polymers isolated from intrahepatic inclusion bodies (denoted as ZZ) by 4–12% (w/v) acrylamide SDS-PAGE in comparison with the monomeric wild-type (M) variant purified from human plasma and visualized by Coomassie blue R stain. (ii, iv, and v) Western blots of ex vivo polymers (ZZ), polymers of the M variant induced by heating (H), and monomeric M variant (M) separated by denaturing SDS-PAGE (top) and nondenaturing native PAGE (bottom) and probed with a conformation-insensitive rabbit polyclonal antibody (pAb AAT, left) or a mouse monoclonal selective for polymeric α1-antitrypsin (mAb 2C1, right). No monomer is visible by native PAGE in the heat or ZZ preparation. (iii) Sensitivity of ex vivo Z α1-antitrypsin to PNGase F (+P) or EndoH (+E), the latter preferentially cleaving high-mannose glycans. (C) Representative micrograph of polymers isolated from ex vivo liver tissue, visualized by 2% (w/v) uranyl acetate negative stain using a Tecnai 120-keV transmission electron microscope at a magnification of 92,000×. The image has been low-pass-filtered to 30 Å. Black scale bar, 50 nm. Details of some polymers are shown at the right. (D) Same material, labeled with the Fab fragment of the 4B12 monoclonal antibody (Fab4B12), and visualized under the same conditions. Scale bar, 50 nm. Details from micrographs are shown at the right; readily discernible Fab protrusions are highlighted by arrows.

The question remains unresolved as to which polymerization model, if any, describes a realistic organization of the pathological polymer. To address this issue, we have performed a structural characterization of polymers from explant liver tissue of individuals homozygous for the Z allele who had undergone orthotopic transplantation. This has allowed us to define structural limits on the pathological polymer and to critically evaluate the proposed models in this pathological context.

Tissue samples were obtained from the explanted livers of individuals homozygous for the Z allele of α1-antitrypsin. After isolation of inclusion bodies, polymers released by sonication were found to contain a major component that resolved at ~50 kDa when dissociated and visualized by denaturing SDS–polyacrylamide gel electrophoresis (SDS-PAGE) (Fig. 1B, i). It was confirmed to be α1-antitrypsin by Western blot analysis (Fig. 1B, ii). The difference in migration with respect to monomeric material purified from human plasma (Fig. 1B, i and ii) was no longer observed following treatment with PNGase F or EndoH (Fig. 1B, iii). This is diagnostic for glycosylated material that has not undergone maturation in the trans-Golgi network and therefore has been retained by the cell. When visualized by nondenaturing PAGE, the protein migrated with a broad size profile with some discrete bands visible; it was reactive with the polymer-specific (10) monoclonal antibody mAb2C1, and it was free of detectable monomer (Fig. 1B, iv and v).

The liver-derived polymers were applied to carbon-coated copper grids and negatively stained with 2% (w/v) uranyl acetate; polymers could easily be distinguished in the resultant electron microscopy (EM) images by a "beads-on-a-string" appearance (1), with a curvature of the chain and an absence of branching (Fig. 1C). While some circular forms were present, in contrast to a small-angle x-ray scattering (SAXS) analysis of polymeric material produced in the cytoplasm of Pichia pastoris (11), most (~80%) were non-self-terminating with clearly separated termini.

Polymer subunits are ~50 kDa in size, their ellipsoidal shape has few distinct features that would aid orientation, and they are connected by linkages that appear flexible. These properties provide confounding factors to processing by single-particle analysis. To facilitate subsequent image processing, we doubled the effective size of the polymer subunits and introduced an orienting feature by labeling polymers with the antigen-binding fragment of the 4B12 monoclonal antibody (Fab4B12) (12). This antibody was selected as it recognizes all folded forms of α1-antitrypsin including the polymer, and the location of its epitope is well established (12–14).

Following the addition of Fab4B12 at a stoichiometric excess to the α1-antitrypsin subunits and removal of unbound material, the polymer sample was visualized using negative-stain EM (NS-EM) (Fig. 1D). Fab4B12-labeled polymer subunits demonstrated additional density visible as tooth-like protrusions (Fig. 1D, insets). On consecutive subunits, Fabs were, in general, present on the same side of the polymer chain, potentially indicating a preference of the angular relationship around the polymer axis. Conversely, opposing α1-antitrypsin–Fab4B12 orientations, which would report substantial orientational freedom around the intersubunit linkage, were observed only infrequently.

The heterogeneity and flexibility of ex vivo polymers make them unsuitable for crystallography. Modern protocols for single-particle reconstruction of three-dimensional (3D) objects using EM images enable us to explicitly address heterogeneity in samples, and we therefore sought to structurally characterize the pathological polymers using this technique. A NS-EM image dataset of Fab4B12-labeled polymers was compiled from 100 30-frame movies that had been collected using a DE-20 direct detector and a Tecnai 200-keV transmission electron microscope. Preliminary experiments indicated that polymer flexibility would represent a challenge for a single-particle reconstruction approach. Thus, a minimal segment required to investigate the linkage between monomers (a "dimer" of adjacent subunits) was chosen for the subsequent structural analysis.

The processing pathway for single-particle reconstruction is described in more detail in the Supplementary Materials and in fig. S2 and is summarized here. Initially, images of dimer particles were manually selected from regions of polymers that appeared by eye to be side views with relatively little curvature (fig. S2b) and divided into classes using the Class2D function of RELION (15). The class sums included dimers in which the subunits appeared as adjacent ellipses, and many subunits exhibited a protuberance with the characteristic narrow midriff present in Fab structures (fig. S2d). In some classes, these Fab4B12 subunits were poorly resolved, suggesting variability in rotation between adjacent subunits. Seven classes with well-defined Fab4B12 components were used as references for autopicking from the same set of micrographs; after removal of poorly defined components, this yielded ~100,000 230 × 230 particle images. This dataset, DA,100K, was found by 2D classification to be more diverse and less dominated by long-axis dimer views (fig. S2f). Later in the course of processing, a subset of 69,000 dimer images (DB,69K) was extracted from a 2D reclassification of the same dataset (fig. S2k).

One class in particular showed two well-resolved Fab subunits (fig. S2h). To generate an initial model-agnostic reference map for 3D classification, we converted this 2D image to a 3D surface representation (fig. S2h, right) with the height (along z in both directions) at each x,y coordinate proportional to the grayscale value of the corresponding pixel in the image. This map was used as a reference for 3D classification of the DA,100K dataset (fig. S2i). In two of eight resulting maps, both α1-antitrypsin molecules exhibited Fab4B12 protrusions. The best-defined map was divided in half, and one subunit was used as a monomer input reference in a reclassification of DA,100K (fig. S2j). Following several iterations of 3D classification, five of eight classes exhibited either one or two well-defined α1-antitrypsin–Fab4B12 subunits (fig. S2n). These maps were divided in half, and the monomer subunits were individually superimposed and averaged together, providing a consensus density for the α1-antitrypsin–Fab4B12 monomer subunit Monav (fig. S2o, left). Monav was used as the reference map in successive rounds of 3D classification. Eventually, two classes were identified that showed connected α1-antitrypsin molecules with clear Fab4B12 subunits, comprising 9200 and 6200 particle images, respectively (fig. S2, p and q).
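The 2D-to-3D conversion described at the start of this passage can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the image is a list of rows of non-negative grayscale values and that the output is a binary occupancy grid; the function name, the `depth` parameter, and the default scaling are all hypothetical choices.

```python
def image_to_volume(image, depth, scale=None):
    """Extrude a 2D grayscale image into a 3D occupancy grid.

    Each (x, y) column is filled symmetrically about the central z-plane,
    with a half-height proportional to the pixel's grayscale value.
    Returns volume[z][y][x] with entries 0 or 1.
    """
    peak = max(max(row) for row in image)
    half = (depth - 1) / 2.0
    if scale is None:
        # ASSUMED default: the brightest pixel spans the full depth.
        scale = half / peak if peak > 0 else 0.0
    volume = []
    for z in range(depth):
        dz = abs(z - half)  # distance of this slice from the central plane
        plane = [[1 if v > 0 and dz <= scale * v else 0 for v in row]
                 for row in image]
        volume.append(plane)
    return volume


# Tiny example: brighter pixels produce taller columns.
image = [[0, 1],
         [2, 4]]
volume = image_to_volume(image, depth=5)
heights = [[sum(volume[z][y][x] for z in range(5)) for x in range(2)]
           for y in range(2)]
print(heights)  # [[0, 1], [3, 5]]
```

Such a map carries no prior structural model, which is the sense in which the reference is described as model-agnostic.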

These 3D classes differed in the angles between the two α1-antitrypsin–Fab4B12 subunits (approximately 60° and 90°) and were accordingly termed Dim60 and Dim90 (Fig. 2A). Both showed clear Fab4B12 protuberances and connectivity between the volumes representing the α1-antitrypsin molecules. 3D refinement using "gold-standard" FSC (Fourier shell correlation) analysis provided estimated resolutions of 19.1 and 24.8 Å, respectively (at a FSC threshold of 0.33) (fig. S3). Other attempts to obtain dimer reconstructions using variations of the processing pathway described above also converged on these two forms and no others.

(A) Orthogonal views of the reconstruction of Dim60 (left) and Dim90 (right) contoured at 3.9 × 10^5 Å^3. In this orientation, the connected α1-antitrypsin density is situated at the bottom, and the Fab domains are at the top. Calculated resolutions (using FSC = 0.33) are 19.1 and 24.8 Å, respectively (fig. S3). (B) Particle images, clustered by view and averaged, that are the basis for the reconstructions. The relative support for each cluster, calculated from the sum of the weights of the constituent images, is shown as circles colored according to a heatmap, highlighting the enrichment of views orthogonal to the dimer axis.

A summary of the constituent particle images, clustered by orientation relative to the 3D reconstructions, can be seen in Fig. 2B. In both cases, the assigned views show that the datasets contain a larger number of side-on views of the dimers, consistent with the observed alignment of most polymers in the micrographs.

Polymers artificially induced at an elevated temperature have often been used to study the process of polymerization (3, 6, 12, 16–18). It has been shown that this form shares a common epitope with ex vivo polymers in the vicinity of helices E and F (fig. S1A) (14); the epitope is not recognized when polymerization is artificially induced using a denaturant (10, 16). The lack of discrimination between heat and liver polymers does not, however, demonstrate structural equivalence, and a means of direct comparison between the two has been lacking.

Heat-induced polymers of the plasma-purified M variant were induced and purified, labeled with Fab4B12, and visualized by NS-EM using 2% (w/v) uranyl acetate stain. The resulting images showed the same flexible beads-on-a-string appearance (Fig. 3A), with a greater proportion exhibiting a circularized morphology. The Fab domains once again appeared as teeth-like protuberances with a general preferred orientation on the same side of the polymer axis in adjacent subunits with an occasional apparent ~90° to 180° inversion (Fig. 3B). A new dataset comprising 169 micrograph images was obtained, compiled from 30-frame movies collected using the DE-20 direct detector and the Tecnai 200-keV transmission electron microscope.

(A) Representative micrograph of polymers of M α1-antitrypsin induced at 55°C for 48 hours, visualized by 2% (w/v) uranyl acetate negative stain using the Tecnai 120-keV transmission electron microscope at a magnification of 92,000×. The image has been low-pass-filtered to 30 Å. Black scale bar, 50 nm. Details of selected polymers are shown at the right. (B) Heat-induced polymers labeled with Fab4B12 and visualized in the same manner. Details from micrographs are shown at the right; discernible Fab protrusions are highlighted by arrows. (C) Orthogonal views of the reconstruction of a Dim60-like structure, with a calculated resolution of 26.4 Å (FSC = 0.33) (fig. S3). (D) Particles upon which the reconstruction is based, clustered by imputed orientation and with the relative sum of their weights shown as a spectrum. (E) Orthogonal projections of the aligned and contoured Dim60 (blue) and Dim60H (red) structures, with axes shown; overlapping regions appear as magenta. (F) 2D class sums from the liver and heat-induced polymer particle datasets arranged in pairs with columns denoted by L and H, respectively. For each liver polymer class, the most similar heat-induced polymer class by cross-correlation coefficient is shown; gray vertical lines through the images denote identified intensity peaks. (G) Distribution of the interpeak distances for the liver (blue) and heat (red) polymer distances. Dashed lines indicate the means of both sets of data.

We performed autopicking in RELION from the new micrographs using the same 2D references as with the ex vivo dataset (fig. S2d, right) because the heat-induced polymer subunits were of a similar size. Following rounds of 2D classification and cleaning of the image dataset, 25,000 dimer particles were extracted for further image analysis. In 3D classification, the monomeric subunit Monav (fig. S2o, left), obtained from the ex vivo dataset, was used as the reference; monomer rather than dimer was chosen to avoid introducing bias in the relative rotation and translation between subunits. At the final step of classification, a Dim60-type class was identified (Dim60H; Fig. 3C), comprising 6750 particles and with a nominal resolution of 26.4 Å (at FSC = 0.33; fig. S3). Clustering of particles by their orientation relative to the 3D volume again showed a preference for side views (Fig. 3D). Attempts at reclassification of the residual 18,000 particles failed to reveal further well-defined 3D classes.

In a preliminary model-free analysis, the α1-antitrypsin–Fab4B12 dimer structure identified from the heat polymer data exhibited a somewhat different intersubunit distance and Fab4B12 orientation to that seen with the liver-derived dataset (Fig. 3, C and E): Translations and rotations of 64 Å/57° and 69 Å/65°, respectively, were required to superimpose a subunit volume onto the adjacent one. The correspondence more generally between the two datasets was therefore investigated. A comparison was made between all 2D classes obtained from the liver-derived polymer dataset against those calculated from the heat-induced polymer dataset by optimally aligning every possible pair and recording those with the highest correlation coefficient. Most pairs showed good visual correspondence (representative comparisons of class averages are shown in Fig. 3F). Positions of subunits were identified from peaks in the intensity profile of each image. The distribution of distances between these peaks in the aligned classes was almost identical, with a mean of 65 ± 12 and 64 ± 11 Å (SD) for liver-derived and heat-induced polymer 2D classes, respectively (Fig. 3G). The putative distinction between the dimer volumes is therefore likely accommodated within the observed geometric relationships between subunits in both samples rather than supporting separate linkage mechanisms.
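The interpeak-distance measurement can be illustrated with a short sketch. It assumes each class average has already been reduced to a 1D intensity profile along the polymer axis. The peak criterion, the `min_height` threshold, and the `pixel_size` value (Å per pixel) are hypothetical, since the source does not state how peaks were detected or the sampling used.

```python
from statistics import mean, stdev


def find_peaks(profile, min_height):
    """Indices of local maxima at or above min_height
    (the left edge is taken for any flat-topped peak)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] >= min_height
            and profile[i] > profile[i - 1]
            and profile[i] >= profile[i + 1]]


def interpeak_distances(profile, pixel_size, min_height):
    """Distances between consecutive peaks, converted from pixels
    to physical units using an ASSUMED pixel size."""
    peaks = find_peaks(profile, min_height)
    return [(b - a) * pixel_size for a, b in zip(peaks, peaks[1:])]


# Two illustrative (made-up) profiles standing in for 2D class averages.
profiles = [
    [0, 1, 5, 1, 0, 2, 6, 2, 0],
    [0, 4, 1, 0, 0, 1, 5, 1, 0, 0, 2, 6, 1],
]
pooled = []
for p in profiles:
    pooled.extend(interpeak_distances(p, pixel_size=2.0, min_height=3))

print(pooled, mean(pooled), stdev(pooled))
```

Pooling the distances per dataset and comparing their means and standard deviations mirrors the comparison reported in Fig. 3G.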

The 3D reconstructions of adjacent subunits reflect the asymmetric character of the Fab4B12-bound subunits and polarity of α1-antitrypsin within the polymer and embody shape, intersubunit distance, and rotational information. Accordingly, they could be used to challenge the different hypotheses regarding the structure of the pathological α1-antitrypsin polymer (Fig. 1A). As the foundation of this analysis, an atomic model of the Fab-antigen complex was required. Protein crystallization trials of Fab4B12 were successful and yielded a 1.9 Å structure, with the crystallographic parameters summarized in table S1. The asymmetric unit contained two molecules, one of which exhibited fully defined variable loop regions. Despite extensive efforts, it was not possible to obtain a crystal structure of the α1-antitrypsin–Fab4B12 complex; SAXS data were collected instead. The atomic model of the α1-antitrypsin–Fab4B12 subunit was then constructed using five sets of experimental data:

1) a consensus density map of the monomer generated by aligning and averaging the individual subunits of the Dim60 and Dim90 reconstructions from the liver polymer dataset (Mon60,90; shown in Fig. 4A, left);

(A) Left: Density for an α1-antitrypsin–Fab4B12 subunit calculated as the average of the Dim60 and Dim90 subunits, contoured at 1.9 × 10^5 Å^3 with a nominal resolution (at FSC = 0.33) of 15.2 Å (fig. S3). Middle: Result of modeling trials in which complexes between α1-antitrypsin and Fab4B12 molecules with random starting orientations were optimized with respect to the antibody epitope and the subunit density. The resulting structures were evaluated according to their correspondence with the experimental SAXS profile recorded for the complex. A cluster of structures maximizing both parameters are highlighted in red and circled. Right: Superposition on the α1-antitrypsin chain of these five structures showing a consistent relationship between the two components, with the heavy chain in blue and light chain in red. (B) Left: Final model of the subunit shown in the context of the experimental density, with the heavy chain in blue, the light chain in dark green, and α1-antitrypsin sheets A, B, and C in red, pink, and yellow, respectively. The orientations are according to the axes shown in Fig. 3E. Right: Correspondence between the observed SAXS data (black) and the profile calculated from the coordinates of the final subunit model (red). (C) Top: Various polymer images extracted from NS-EM micrographs are shown in red, and 2D projections of polymer models that have been refined against these images are shown in black. Bottom: Mean relative correlations (±SD) between each model and the experimental density are shown. Values were calculated for each oligomer relative to the best score observed for that oligomer. Significance was determined by one-way analysis of variance (ANOVA) and Tukey's multiple comparisons test (n = 18); ***P < 0.001 and ****P < 0.0001.

2) the experimentally determined epitope of Fab4B12 (13, 14) at α1-antitrypsin residues 32, 36, 43, 266, and 306 incorporated as a collection of distance constraints on the crystal structures of the individual components;

3) the Fab4B12 crystal structure;

4) the SAXS profile of the complex (Fig. 4B, right); and

5) the structure of cleaved α1-antitrypsin [Protein Data Bank (PDB): 1EZX (19)], as all extant polymer models propose a six-stranded sheet A configuration (Fig. 1A).

Integration of these data during modeling was performed using PyRosetta (20). One thousand randomized starting orientations for α1-antitrypsin and Fab4B12 were subjected to rigid-body energy optimization with reference to these constraints and the Mon60,90 subunit map and scored according to both the cross-correlation coefficient (CCC) with the density and their correspondence with the SAXS profile (Fig. 4A, middle). Backbone and side-chain flexibility was conferred on regions of the Fab likely to contribute to the interface (heavy chain: 27 to 33, 51 to 57, 71 to 76, and 94 to 102; light chain: 27 to 32, 49 to 54, 66 to 70, and 91 to 94) and α1-antitrypsin side chains within the boundaries of the epitope.
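As an illustration of the density-based score, the following minimal sketch computes a mean-subtracted, normalized cross-correlation between a model-derived density and a target map. The exact CCC formulation used in the modeling is not stated in the text, so this Pearson-style definition is an assumption, and the function name and toy voxel values are hypothetical.

```python
import math


def cross_correlation_coefficient(model_density, map_density):
    """Return the mean-subtracted, normalized cross-correlation of two maps.

    Both inputs are flat sequences of voxel values sampled on the same grid.
    This definition is an assumption; the text does not specify the formula.
    """
    if len(model_density) != len(map_density) or len(model_density) == 0:
        raise ValueError("maps must be non-empty and sampled on the same grid")
    n = len(model_density)
    mean_model = sum(model_density) / n
    mean_map = sum(map_density) / n
    covariance = sum(
        (m - mean_model) * (t - mean_map)
        for m, t in zip(model_density, map_density)
    )
    var_model = sum((m - mean_model) ** 2 for m in model_density)
    var_map = sum((t - mean_map) ** 2 for t in map_density)
    if var_model == 0 or var_map == 0:
        raise ValueError("correlation is undefined for a constant map")
    return covariance / math.sqrt(var_model * var_map)


# Hypothetical toy voxel values, for illustration only.
target = [0.0, 0.2, 0.9, 0.4, 0.1]
good_pose = [0.1, 0.3, 1.0, 0.5, 0.2]  # same shape, offset by a constant
poor_pose = [0.9, 0.4, 0.1, 0.0, 0.2]  # density placed in the wrong region
print(cross_correlation_coefficient(good_pose, target))
print(cross_correlation_coefficient(poor_pose, target))
```

Because the score is mean-subtracted and normalized, it is insensitive to a constant offset or overall scaling of the density, so poses are ranked by shape agreement rather than by absolute map values.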

The five models that maximized these metrics showed an unambiguous polarity (Fig. 4A, right). One model was selected that best represented this cluster by root mean square distance comparison with the others. This showed the heavy-light chain partition to be oriented off-center along helix A, with the variable-constant domain axis perpendicular to the long axis of the serpin [Fig. 4, A (right) and B (left)]. The cleft between the variable and constant domains of Fab4B12 aligned closely with a central dimple exhibited by the monomer density (denoted by an asterisk in the figure), and the complex corresponded well with the experimentally determined SAXS profile (Fig. 4B, right).
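One way to pick a representative member of such a cluster is to take the model with the lowest mean pairwise RMSD to the others. The sketch below assumes that this is the selection criterion (the text does not spell it out) and that the models have already been superposed on a common frame; the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-superposed coordinate sets of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def representative_index(models):
    """Index of the model with the lowest mean RMSD to all other models."""
    if len(models) < 2:
        raise ValueError("at least two models are required")
    best_index, best_mean = None, math.inf
    for i, model_i in enumerate(models):
        distances = [rmsd(model_i, m) for j, m in enumerate(models) if j != i]
        mean_distance = sum(distances) / len(distances)
        if mean_distance < best_mean:
            best_index, best_mean = i, mean_distance
    return best_index


# Hypothetical one-atom "models" placed along x, for illustration only.
toy_models = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
print(representative_index(toy_models))
```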

Initial models of the C-terminal (8), loop-sheet (1), and β-hairpin (7) polymer configurations (Fig. 1A) were built using the α1-antitrypsin–Fab4B12 subunit structure (representations of these can be seen in the left column of Fig. 5), differing most substantially in the linker regions connecting adjacent subunits in the polymer chain (detailed in Materials and Methods).

Fig. 5. (Top) Different polymer configurations were randomly perturbed by rotation of the subunits with respect to one another and their conformations optimized against Dim60, Dim90, and Dim60H reconstructions. The correlation coefficient after perturbation and before optimization is shown on the x axis, while that after optimization is shown on the y axis. Values are expressed relative to subunits optimized into the density without restriction by a connecting linker. Flexible regions encompassed residues 357 to 368 in all models as well as 340 to 349 (H1), 340 to 352 (H2), and 309 to 328 (H3). (Bottom) The best-fitting model for each polymer configuration and for each of the three dimer EM structures is shown (α1-antitrypsin in blue and Fab4B12 in dark green) with respect to the fit of unconstrained subunits (shown in pink). Regions treated as flexible linkers during the optimization are highlighted in light green. For all three reconstructions, the C-terminal model corresponds with the optimum arrangement of subunits.

From an examination of the representative micrographs shown in Fig. 1 (C and D), the intersubunit angular relationships along the polymer chains are not solely accounted for by the Dim60 and Dim90 configurations. Instead, these structures likely correspond to more highly populated species along a continuum of intermediate states. To investigate the compatibility of the loop-sheet, C-terminal, and β-hairpin linkages with the arrangement of polymers seen in the micrographs, we used a method that optimized the 3D models to maximize their correspondence with the 2D polymer images. Stretches of residues connecting the dimer subunits were treated as flexible (as specified in Materials and Methods), while the α1-antitrypsin–Fab4B12 cores behaved as rigid bodies. A selection of 20 oligomers was chosen with different degrees of curvature and subunit orientation (Fig. 4C). Despite a lack of information along the z axis, this approach was able to discriminate between the models on the basis of their ability to adopt the shapes seen in the 2D polymer images: The highly constrained loop-sheet eight-residue insertion model (H1) performed significantly worse than the others (P < 0.0001). The flexibility of the C-terminal domain swap (H4) provided a better fit than the loop-sheet four-residue insertion model (H2) (P < 0.001), and the β-hairpin (H3) and C-terminal models (H4) were not distinguishable by this analysis (Fig. 4D).

Next, the compatibility of loop-sheet, C-terminal, and β-hairpin configurations with the 3D Dim60, Dim90, and Dim60H reconstructions was evaluated. Each model was repeatedly randomly perturbed by rotation around the dimer long axis (through the α1-antitrypsin subunits) and energy minimized with respect to the EM structures and default stereochemical restraints using PyRosetta (20). This process was undertaken 1000 times for each combination of model and map. As before, the α1-antitrypsin–Fab4B12 subunits were treated as rigid bodies connected by a flexible linker region. The correspondence between each model and the target map was assessed by the cross-correlation function. These CCC values were denoted as ccperturbed and ccrefined for each perturbed model before and after energy minimization, respectively. Benchmark maximum CCC values were obtained by performing model-free alignments of α1-antitrypsin–Fab4B12 subunits into each map in the absence of a linker region and reported as ccoptimal, denoted by red shaded models in the bottom panels of Fig. 5.

The result of this analysis is shown in Fig. 5 (top, color-coded by hypothesis). The random rotational perturbations applied to each model resulted in a spread of preminimization CCC values along the horizontal axis, and minimization of these models generally showed a convergence over a narrow range of CCC values on the vertical axis. The minimized structure giving the highest ccrefined/ccoptimal score for each polymer configuration (in rows) with respect to each map (in columns) is shown in Fig. 5 (bottom). By this analysis, the best-scoring C-terminal polymers (H4) exhibited a value close to one, indicating that the linkage-restrained models were essentially indistinguishable from the unrestrained ones, and this was reflected by an almost direct superimposition of the model over the aligned linker-free subunits (top row). In contrast, the translational and rotational restrictions imposed by the linkers of the other models (H1 to H3) prevented them, to varying degrees, from adopting the preferred orientation inherent with respect to the data (bottom three rows).
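The relative score used in this comparison can be sketched as follows: for one combination of model and map, the best ccrefined over all perturbation trials is divided by the linker-free benchmark ccoptimal, so that a value close to one indicates that the linkage imposes no penalty on the fit. The function name and the numbers below are hypothetical and are not taken from the data.

```python
def best_relative_score(cc_refined_trials, cc_optimal):
    """Return (trial index, cc_refined / cc_optimal) for the best-fitting trial.

    cc_refined_trials: CCC values after minimization, one per perturbed start.
    cc_optimal: benchmark CCC of linker-free subunits aligned into the same map.
    """
    if not cc_refined_trials:
        raise ValueError("at least one refined trial is required")
    if cc_optimal <= 0:
        raise ValueError("cc_optimal must be positive")
    best_trial = max(
        range(len(cc_refined_trials)), key=lambda i: cc_refined_trials[i]
    )
    return best_trial, cc_refined_trials[best_trial] / cc_optimal


# Hypothetical CCC values for two linkage models against one map.
example_trials = {
    "restrictive linkage": [0.61, 0.64, 0.63],
    "permissive linkage": [0.78, 0.79, 0.77],
}
for name, trials in example_trials.items():
    trial, score = best_relative_score(trials, cc_optimal=0.80)
    print(name, trial, round(score, 3))
```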

All models entail a connection between strand 4A of one α1-antitrypsin subunit and strand 1C of the next. A distinguishing characteristic of hypotheses H1 to H3, with respect to the C-terminal model (H4), is that they involve a second unique intermolecular linkage. Having dual intermolecular constraints might be expected to reduce conformational flexibility, and this may contribute to their lesser compatibility with the density. To explore this, we performed a variation on the experiment in Fig. 5 in which the dual-linkage models were converted to single linkage by breaking the peptide bond between residues 358 and 359 of the strand 4A-1C connection, leaving the unique second linker that each model embodies intact. During iterative rounds of optimization, displacement between residues adjacent to the site of cleavage confirmed that this modification allowed additional freedom of movement of the subunits. At the conclusion of the experiment, the scores obtained were very similar to those obtained with the intact models (fig. S4, top). We also performed the converse experiment, in which the strand 4A-1C connection was kept intact, and the second unique linker of each model was broken (between residues 344 and 345 for H1 and H2 and 324 and 325 for H3). This provided comparable results to the single-linkage C-terminal model (H4) (fig. S4, middle).

These results demonstrate that the head-to-tail orientation of α1-antitrypsin subunits, with the base of sheet A and the top of sheet C in proximity to one another, is an intrinsic feature of the dimer density. Therefore, for the dual-linker models, it is not the reduced flexibility that distinguishes them but the inconsistency of their second linkage with this subunit orientation.

Thus, the orientation provided by the C-terminal model is most compatible with the Dim60 and Dim90 structures present in liver-derived polymers. In the final structure, there are translations of 71 and 73 Å between the centers of mass of the α1-antitrypsin molecules and a final calculated rotation around the dimer axis of 65° and 81°, respectively (Fig. 6A, top and middle). The same analysis, performed using the Dim60H model derived from the heat-induced dataset, gave the same conclusion: The C-terminal model (H4) provided a fit consistent with the model-free aligned subunits (Fig. 6A, bottom). While there was a relative improvement in the fit of the loop-sheet 4 dimer, this model remained unable to adopt an optimal alignment to the experimental data (Fig. 5, right, and fig. S4, top right).

Fig. 6. (A) Best-fitting C-terminal model (H4) displayed against the Dim60 (top), Dim90 (middle), and Dim60H (bottom) density, annotated with intersubunit translations and rotations. Dashed lines represent vectors passing through the centers of mass of the α1-antitrypsin and Fab molecules. (B) Electrophoretic mobility shift assay comparing the affinity of the polymer-specific mAb2C1 for polymers of different origin. Binding of the antibody results in a cathodal shift of α1-antitrypsin polymers. Arrows highlight that cleavage-induced polymers, which are structurally analogous to C-terminal polymers, are readily recognized by the antibody with respect to denaturant-induced polymers. A schematic representation of P9-cleavage–induced polymers is shown at the left, with the domain-swapped peptide in black, based on PDB 1D5S (21). (C) Results of sandwich ELISA experiments showing the relative affinity of mAb2C1 for liver-derived, cleavage-induced, and denaturant-induced polymers, normalized to the half-maximal effective concentration (EC50) of the interaction with heat-induced polymers. The affinity of monomeric M and Z, denoted by open circles, was outside the maximum antigen concentration used in the experiment and, correspondingly, not less than two orders of magnitude worse than that of heat-induced polymers. Independent experiments are denoted by the markers, and the means ± SD are indicated by the bars (liver-derived and denaturant-induced, n = 3; cleaved, n = 6); heat-induced by definition is 1, represented by the dotted line; w.r.t, with respect to.

A neoepitope is recognized by the mAb2C1 antibody that is present in liver-derived and heat-induced polymers but not in those induced in the presence of a denaturant. Thus, the latter conditions produce a polymer structure not representative of pathological material (14, 16). Cleavage of the RCL of α1-antitrypsin in a noncognate position can also induce polymerization (3), and the atomic details of the resulting polymer linkage, defined by crystallography (21, 22), show that it produces a molecule that mimics a noncircular form of the C-terminal trimer (8). To determine whether mAb2C1 recognizes the open C-terminal configuration identified from the EM analyses, polymers mimicking this structure were produced by limited proteolysis of a recombinant Ala350Arg α1-antitrypsin mutant by thrombin. This material was readily recognized by mAb2C1 as demonstrated in a mobility shift experiment (Fig. 6B). The relative affinity of mAb2C1 for the different forms was then determined by enzyme-linked immunosorbent assay (ELISA). These experiments exhibited comparable recognition of liver, heat-induced, and C-terminal–mimicking cleaved polymers by the antibody, with a markedly lower affinity for denaturant-induced polymers and monomer (Fig. 6C).

α1-Antitrypsin deficiency is characterized by the accumulation of mutant protein as inclusions within hepatocytes. Extraction and disruption of these inclusions release chains of unbranched polymers, which, when isolated, exhibit pronounced flexibility and apparently lack higher-order interactions. Several models have been proposed for the molecular basis of the formation and properties of these polymers from in vitro experiments. On the basis of the observation that polymers are extremely stable and that artificially induced polymerization can be prevented by peptide mimics of the RCL, the first proposed loop-sheet molecular mechanism posited that the RCL of one molecule would incorporate into a sheet of the adjacent molecule (H1 and H2 in Figs. 1A and 5) (1). Since that time, while biophysical studies have attempted to address the question of mechanism, the only crystal structures that have been obtained of α1-antitrypsin oligomers are of forms produced artificially from recombinant nonglycosylated material: a chain of molecules spontaneously assembled following fortuitous cleavage by a contaminating protease (21, 22) and a circular trimer of a disulfide mutant produced by heating (H4) (8). Hence, there has been no direct evidence of the structure of the pathological polymers that deposit in the livers of patients with α1-antitrypsin deficiency.

The in vivo mechanism of α1-antitrypsin polymerization and accumulation in the liver has important consequences for the development of therapeutics that interfere with this process. The loop-sheet hypothesis (H1 and H2) involves relatively minor and reversible perturbations with respect to the native conformation to adopt a polymerization-prone state (1), the C-terminal model (H4) predicates a preceding substantial and irreversible conformational change (8), and the β-hairpin model (H3) lies somewhere between the two (7). This has implications for the nature of the site and mode of ligand binding capable of blocking polymerization and, indeed, for the question as to whether the process can be reversed at all.

Polymer material obtained from liver tissue is heterogeneous in size, glycosylated, and difficult to obtain in substantial quantity, making it unsuitable for crystallography. Without the requirement to form a crystal lattice, single-particle reconstruction using EM images represents an excellent option to obtain structural information. The negative-stain approach used here for the analysis of small protein complexes provided a strong contrast between protein and background and, in conjunction with decoration by Fab moieties, made angular information easier to retrieve, revealing the interactions between the components of the flexible polymer chains present in explant liver tissue.

Interrogation of the extant models of polymerization revealed that the loop-sheet dimer model (H1), despite its general compatibility with many biophysical observations, was unable to adopt the intersubunit translation or rotation observed in the 2D and 3D data (Figs. 4 and 5). A less stringent test of this model, a four-residue insertion loop-sheet configuration (H2) with an interchain interface analogous to one binding site of a tetrameric peptide blocker of polymerization (23), still provided an incomplete fit to the data. The β-hairpin domain swap model (H3), based on the structure of a self-terminating dimer of antithrombin, has been proposed to extend to α1-antitrypsin polymerization by limited proteolysis and the stability of a disulfide mutant against polymerization (7), a conclusion that has been questioned (16, 24) and is not supported by peptide fragment folding data (25). Owing to its longer predicted linking regions, the fit to the Dim60 and Dim90 data was better than that seen with the loop-sheet models (Fig. 5), but it required 20 residues to lose their native structure with respect to the antithrombin crystal structure from which this model is derived. While the crystal structure unequivocally demonstrates the ability of this form to adopt a 180° inversion orthogonal to the dimer axis, there was no evidence in the micrographs, either Fab-bound or unbound, of a chain inversion of this magnitude.

In contrast, the NS-EM data were best explained by the location, length, and flexibility of the C-terminal linkage (H4). The C-terminal mechanism involves displacement (or delayed formation) of the C-terminal 4-kDa fragment of α1-antitrypsin comprising strands 1C, 4B, and 5B (fig. S1) and self-insertion of the RCL, which results in a monomeric latent-like intermediate conformation (8). The open, non–self-terminating arrangement of the subunits (Fig. 6A) contrasts with the observation that oligomeric components of recombinant material purified from P. pastoris were circular (11).

The data obtained, including the intersubunit orientation and distance (Figs. 3, F and G, 5, and 6A) and the presence of the mAb2C1 epitope (Fig. 6B), support a structural equivalence of heat-induced and liver-derived polymers. Hence, it follows that there will be components shared between their respective polymerization pathways; it should accordingly be possible to extend mechanistic observations made in vitro to the mechanisms that produce polymers in vivo, and here, we draw on observations made in the literature regarding the role of strands 5A, 1C, 4B, and 5B and the breach region (Fig. 7). The ability to induce polymers from folded native α1-antitrypsin by displacement of the C-terminal region at modestly elevated temperatures in the Z variant implies that core packing interactions are readily destabilized when the molecule is in a five-stranded sheet A configuration. In the native conformation (Fig. 7, i), the Z variant has been noted to increase the mobility of strand 5A (26) and the solvation and rotational freedom (27) of the solvent-accessible (28) Trp194 residue that is situated in the breach region (Fig. 7, ii, bottom). The breach is bounded by a hydrophobic cluster of residues including some contributed by strands 5A as well as C-terminal 4B and 5B, on which solvation (as reported by Trp194) would be expected to exert destabilizing effects. This is supported by sequential polypeptide folding experiments, suggesting that engagement of ~36 residues at the C terminus is predicated on a properly formed strand 5A (25). A related process likely occurs on the opposing side of the molecule: Helices A, G, and H form a trihelix clamp over this region, and disruption of stabilizing interactions by the S (Glu264Val) and I (Arg39Cys) mutations (Fig. 7, ii, top) also leads to an increased tendency to polymerize upon the application of heat. Moreover, the fact that S, I, and Z are able to copolymerize (29, 30) indicates that this occurs by a common mechanism and supports the mutual destabilization of the C-terminal region that is situated between them (Fig. 7, iii). This process is consistent with the site of polymerization-prone latch mutations clustered near the end of the polypeptide chain (31).

Fig. 7. From the native state (i), the evidence suggests that during heating, decreased affinity for the C terminus can be induced by destabilization of the adjacent breach region with increased solvation of the hydrophobic core (26, 27), destabilization in the adjacent trihelix region (as in the S and I variants), and associated loss of strand 1C native interactions (ii and iii) (6, 24, 32). Upon dissociation of the C terminus, the molecule is equivalent to a final stage of folding of the nascent polypeptide chain (iv) (25). This (reversible) displacement is unable to immediately lead to self-insertion and generate the hyper-stable six-stranded sheet A (25) despite delayed folding (34) (v), but such a change is able to proceed rapidly and irreversibly upon incorporation of the C terminus of another molecule (vi) (25, 33). Under appropriate conditions, the latent conformation is generated as an off-pathway species (vii) that is expected to be inaccessible once full RCL insertion has taken place (v) (17, 36). Asterisks denote Trp194 (blue) and Glu264/Arg39 (red), regions colored as black and yellow arrows highlight structural changes, and symbols indicate the application of heat (triangle) or a hypothesized point of convergence with the nascent chain folding pathway (R).

The early (6) and necessary (24, 32) loss of native strand 1C contacts is consistent with the displacement of the C-terminal region (Fig. 7, iv). In this state, current evidence indicates that the molecule is equivalent to a final stage of the folding pathway (25). While the displaced C terminus (Fig. 7, iv) is relatively hydrophobic, in isolation, the equivalent C36 peptide has been found to be soluble, albeit fibrillogenic over a period of hours, and readily incorporated into native α1-antitrypsin at room temperature, inducing an increase in thermal stability consistent with transition to a self-inserted form (33). This suggests that displacement of this region even at ambient temperature is possible. While, by analogy with release of the RCL by proteolytic cleavage (fig. S1), it might be expected that the release of the C terminus would immediately give rise to self-insertion of the untethered RCL as strand 4A, there is evidence that the absence of an engaged C terminus will prevent this from occurring (25). This is congruent with the preferential folding of the protein to the kinetically stabilized five-stranded sheet A conformation rather than the loop-inserted six-stranded thermodynamically favored state (25) despite the adoption of the hyperstable form upon administration of exogenous C-terminal peptide (33) and the fact that some material does fold correctly to the active form even with the delayed folding of the Z variant (34).

Upon incorporation of the C terminus of another molecule (Fig. 7, v), self-insertion of strand 4A would be expected to follow (Fig. 7, vi) (33). The RCL of α1-antitrypsin is shorter than those of serpins known to undergo latency as a competing process to polymerization (35); once insertion has proceeded beyond a molecular decision point near the center of sheet A (17, 36), the molecule would no longer be able to (re-) incorporate its own C-terminal fragment (Fig. 7, vii), and it would effectively become irreversibly activated for oligomerization (Fig. 7, v). This mechanism is consistent with the suppression of polymerization in cells by a single-chain antibody fragment that alters the behavior of sheet A in the vicinity of the helix F (12, 13) and mutations that inhibit loop self-insertion (17).

Thus, of the proposed polymerization linkage models, our data most strongly support the C-terminal domain swap as the structural basis for pathological polymers of Z α1-antitrypsin. It remains to be determined how common or rare the exceptions are to this mechanism among other members of the serpin family. Serpins share a highly conserved core structure and exhibit common folding behaviors, and mutations that are associated with instability and deficiency tend to cluster within defined structural regions (37, 38). These factors likely place constraints on the mechanism by which mutations can induce polymerization. It is difficult to overlook the central role of the C terminus in both latency and the C-terminal domain swap, with the former essentially a monomeric self-terminating form of the latter (Fig. 7, v to vii). While a shorter RCL likely renders these two states mutually exclusive in α1-antitrypsin, it has been suggested that the greater tendency of plasminogen activator inhibitor-1 (PAI-1) to adopt the latent conformation is due to a common origin in the polymerogenic intermediate (35). In support of this, PAI-1 and the neuroserpin L49P variant can form polymers from the latent state (35, 39), a notable observation given the high stability of this conformation and inconsistent with the loop-sheet polymerization mechanism (which is predicated on a five-stranded native-like molecule) and the intermolecular strand 5A/4A linkage of the β-hairpin model.

On the other hand, it has been shown that distinct alternative polymerization pathways are accessible in vitro depending on the nature of the destabilizing conditions used. The crystal structure of a β-hairpin–swapped self-terminating dimer of antithrombin (7) produced by incubation of this protein in vitro at low pH provides evidence of this. Similarly, induction of polymerization at acidic pH or with denaturants causes α1-antitrypsin to adopt a polymer form inconsistent with that seen upon heating or with pathological specimens from ZZ homozygotes (16). Biochemical evidence indicates that this may reflect the conformation of the rare α1-antitrypsin Trento variant (14).

From the data presented here, we expect the C-terminal domain swap to reflect the basis of pathological polymers in carriers of the Z α1-antitrypsin allele (and, by extension, the S and I variants) and therefore to account for more than 95% of cases of severe α1-antitrypsin deficiency. Because of its intimate association with the folding pathway and relationship with the latent structure more readily adopted by other serpins, it is probable that this form will be relevant to other serpin pathologies. Whether the same linkage underlies the shutter region mutants of α1-antitrypsin [such as Siiyama, Mmalton, and King's (2, 10)] that also cause polymer formation and severe plasma deficiency remains to be determined.

Human M and Z α1-antitrypsin were purified from donor plasma, and recombinant α1-antitrypsin was purified from Escherichia coli as previously described (24, 40). Monoclonal antibodies were purified from hybridomas according to published methods (12) and stored in phosphate-buffered saline (PBS) with 0.02% (w/v) sodium azide. Fab fragments were generated by limited proteolysis using ficin or papain as appropriate with commercial kits according to the manufacturer's instructions (Thermo Fisher Scientific) with the subsequent addition of 1 mM E-64 inhibitor.

Explanted liver tissue (5 to 10 g) from individuals homozygous for the Z allele was homogenized and incubated at 37°C for 1 hour in 10 ml of Hanks' modified balanced salt solution with 5 mg of Clostridium histolyticum collagenase, and fibrous tissue was removed from the resultant suspension by filtration through BioPrepNylon synthetic cheesecloth with a 50-μm pore size (Biodesign). The filtrate was centrifuged at 3000g at 4°C for 15 min, the pellet was resuspended in 3 ml of 0.25 M sucrose in buffer E [5 mM EDTA, 50 mM NaCl, and 50 mM tris (pH 7.4)], and the sample was layered onto the top of two 14-ml centrifuge tubes (Beckman Coulter) containing a preformed 0.3 to 1.3 M sucrose gradient in buffer E and centrifuged at 25,000g for 2 hours at 4°C. The supernatant was discarded, and the pelleted inclusion bodies were washed with buffer E. Previous approaches to polymer extraction (41) have made use of detergents and denaturants, compounds that have been shown, under certain conditions, to induce conformational change in α1-antitrypsin (3), and therefore, we omitted their use. Soluble polymers were extracted by sonication on ice using a SoniPrep 150 with a nominal amplitude of 2.5 μm (giving a probe displacement of 17.5 μm) in bursts of 15 s and 15-s rest for a total of 6 min. The solution was repeatedly centrifuged for 5 min at 13,000g in a benchtop centrifuge to remove insoluble material. Purity of the soluble component was assessed by SDS- and nondenaturing PAGE.

For heat-induced polymers, purified plasma M α1-antitrypsin was buffer-exchanged into PBS to 0.2 mg/ml, and polymerization was induced by heating at 55°C for 48 hours. Denaturant-induced polymers were formed by incubation at 0.4 mg/ml and 25°C for 48 hours in 3 M guanidine hydrochloride and 40 mM tris-HCl (pH 8) buffer. Following dialysis, anion exchange chromatography using a HiTrap Q Sepharose column with a 0 to 0.5 M NaCl gradient in 20 mM tris (pH 8.0) was used to remove residual monomer, as confirmed by native PAGE.

An arginine residue was introduced at the P9 position (residue 350) of α1-antitrypsin in a pQE-30–based (Qiagen) expression system (17) using the QuikChange mutagenesis kit according to the manufacturer's instructions (Agilent). Following purification from E. coli, the protein was subjected to limited proteolysis by a 50-fold substoichiometric concentration of bovine thrombin (Merck) at 37°C overnight, and the polymer was isolated by anion exchange chromatography using a HiTrap Q Sepharose column with a 0 to 0.5 M NaCl gradient in 20 mM tris (pH 8.0).

Polymers were incubated with a threefold molar excess (with respect to subunit concentration) of Fab4B12 (12) for 2.5 hours at room temperature and repurified by anion exchange chromatography as described above or dialyzed overnight at 4°C into buffer E using a 300-kDa molecular weight cutoff membrane (Spectrum). Copper grids (300 mesh, Electron Microscopy Services) were covered with a continuous carbon film of thickness ~50 μm and glow discharged for 30 s. Three microliters of the prepared sample at ~0.05 to 0.1 mg/ml concentration was applied to the prepared grids for 1 min before blotting. Samples were negatively stained for 1 min using 5 μl of 2% (w/v) uranyl acetate and blotted, and the staining step was repeated. For single-frame high-contrast micrographs, grids were visualized using an FEI Tecnai T12 BioTWIN LaB6 microscope operating at 120 keV, and images were recorded on an FEI Eagle 4K × 4K charge-coupled device camera under low-dose conditions (25 electrons/Å²) at an effective magnification of 91,463× (1.64 Å per pixel) and a defocus range of 0.8 to 3.5 μm. Micrographs for single-particle reconstruction were recorded as averages of 30-frame, 30-frames/s movies using a Tecnai F20 field emission gun transmission electron microscope at 200 keV with a Direct Electron DE-20 direct detector at a calibrated 41,470× magnification (1.54 Å per pixel) under low-dose conditions (~1 electron/Å² per frame). Frames were motion-corrected using MotionCorr (42). Resulting images were corrected for the effects of the contrast transfer function of the microscope using CTFFIND3 (43). Micrographs with greater than 5% astigmatism were discarded. Manual particle picking was undertaken using EMAN (44). General processing scripts in Python made use of the EMAN2 (44), NumPy, SciPy, OpenCV, and Matplotlib libraries.
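The relationship between magnification and sampling can be checked with a short calculation: the pixel size at the specimen is the physical detector pixel pitch divided by the magnification. The detector pitches used below (15 μm for the Eagle 4K and 6.4 μm for the DE-20) are not given in the text and are assumed from the commonly quoted specifications of these cameras; the function names are hypothetical.

```python
ANGSTROM_PER_MICROMETRE = 1.0e4


def pixel_size_angstrom(detector_pitch_um, magnification):
    """Specimen-level pixel size (Å) from detector pitch (μm) and magnification."""
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return detector_pitch_um * ANGSTROM_PER_MICROMETRE / magnification


def accumulated_dose(dose_per_frame, n_frames):
    """Total dose (electrons/Å²) accumulated over an averaged movie."""
    return dose_per_frame * n_frames


# Assumed detector pitches (not stated in the text): 15 μm and 6.4 μm.
eagle = pixel_size_angstrom(15.0, 91463)
de20 = pixel_size_angstrom(6.4, 41470)
print(round(eagle, 2), round(de20, 2))

# ~1 electron/Å² per frame over a 30-frame movie.
print(accumulated_dose(1.0, 30))
```

Under these assumed pitches, the calculation reproduces the stated sampling of 1.64 and 1.54 Å per pixel, and a 30-frame movie at ~1 electron/Å² per frame corresponds to roughly 30 electrons/Å² in the averaged micrograph.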

RELION v2.1 and v3.0.6 (15) were used for single-particle reconstruction including automated particle picking, 2D and 3D classification, and 3D refinement, with the final processing path described in detail in Results and fig. S2. In general, classification in RELION used a regularization parameter T = 2 and 25 iterations or 50 iterations where convergence of statistics was not observed to have occurred. Image boxes were 230 × 230 Å in size; for 2D processing, a mask diameter of 180 Å was used, and alignment was performed using an initial 7.5° interval with an offset search range of five pixels; for 3D processing, the mask diameter was 195 Å with a sampling of 15° and eight pixels; and 3D refinement used 195 Å, 7.5°, and five pixels, respectively. Masks were generated for 3D dimer references by contouring at ~3.8 × 10⁵ Å³ (or at noise), for monomer references at ~1.9 × 10⁵ Å³, and both with the addition of a 7-voxel/7-voxel hard and soft edge. A 30-Å low-pass filter was applied to the resulting masked volumes before classification or refinement. After obtaining the Dim60 and Dim90 structures, the subsets of particle images on which they were based were subjected to a reference-free stochastic gradient-driven de novo reconstruction in RELION (sampling 15° and two-pixel increments; 50 initial, 200 in-between, and 50 final iterations from 40 Å down to 20 Å). An equivalent model was returned in each case. Similarly, combining the two particle sets together and performing a 3D reclassification using the monomeric Monav reference (fig. S2o, left) effectively returned the same two models.

Proteins were resolved under denaturing conditions by NuPAGE 4 to 12% (w/v) acrylamide bis-tris SDS-PAGE gels and under nondenaturing conditions using NativePAGE 3 to 12% (w/v) acrylamide bis-tris gels (Thermo Fisher Scientific). Typical loading was 1 to 4 µg for visualization by Coomassie dye and 0.1 to 0.4 µg for Western blot. Western blot transfer to a polyvinylidene difluoride membrane was undertaken using the iBlot system (Thermo Fisher Scientific) or by wet transfer (Bio-Rad), followed by these steps: soaking in PBS for 10 min; blocking for 1 hour at room temperature with 5% (w/v) nonfat milk powder in PBS; incubation with primary antibody (rabbit polyclonal at 0.8 µg/ml or mouse monoclonal at 0.2 µg/ml) overnight at 4°C in PBS with 0.1% Tween (PBST), 5% (w/v) bovine serum albumin, and 0.1% sodium azide; washing with PBST; incubation with secondary antibodies at 1:5000 to 1:10,000 in PBST with 5% (w/v) bovine serum albumin and 0.1% sodium azide; and development by Pierce enhanced chemiluminescence (Thermo Fisher Scientific) or fluorescence (LiCor).

High-binding enzyme immunoassay microplates (Sigma-Aldrich) were coated with 50 µl per well of anti-polymer mAb2C1 (2 µg/ml) in PBS with incubation overnight at room temperature, washed once with distilled water and twice with wash buffer [0.9% (w/v) sodium chloride and 0.025% (v/v) Tween 20], and blocked for 1 hour with 300 µl per well of PBST buffer [PBS, 0.025% (v/v) Tween 20, and 0.1% (w/v) sodium azide] supplemented with 0.25% (w/v) bovine serum albumin at room temperature (PBSTB). After washing the plates, antigens in PBSTB were applied by 1:1 serial dilution at a final volume of 50 µl across the plate, incubated for 2 hours at room temperature, and washed. Fifty microliters of rabbit anti-human α1-antitrypsin polyclonal antibody (1 µg/ml) (DAKO) in PBSTB was added to each well, the plates were incubated for 2 hours at room temperature and washed, 50 µl of a 1:2000 dilution of goat anti-rabbit horseradish peroxidase antibody in PBSTB (without sodium azide) was added to each well, and the plates were incubated in the dark for 75 min at room temperature and then washed again. For detection, 3,3′,5,5′-tetramethylbenzidine substrate solution (Sigma-Aldrich) was added at 50 µl per well, the plates were incubated for ~7 min in the dark, the reaction was stopped by adding 50 µl per well of 1 M H2SO4, and the absorbance was promptly measured at 450 nm in a SpectraMax M5 plate reader (Molecular Devices).
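
In a 1:1 serial dilution, each well receives an equal volume of the previous well's sample and of diluent, so the antigen concentration halves at every step across the plate. A minimal sketch of the resulting concentration series follows. The starting concentration and the number of wells are hypothetical values chosen for illustration, as the text does not state them.

```python
# Illustrative sketch: the starting concentration and well count are
# hypothetical; only the twofold (1:1) dilution step comes from the text.

def serial_dilution(start_conc, n_wells, dilution_factor=2.0):
    """Concentration in each well of a serial dilution, first well undiluted."""
    return [start_conc / dilution_factor ** i for i in range(n_wells)]


if __name__ == "__main__":
    # Example: an assumed 8 ug/ml top standard diluted across four wells.
    print(serial_dilution(8.0, 4))
```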

For crystallization trials, protein was buffer-exchanged into buffer C [10 mM tris (pH 7.4), 50 mM NaCl, and 0.02% (w/v) sodium azide] and concentrated to 10 mg/ml. Broad-screen sitting drop approaches against commercially available buffer formulations (Molecular Dimensions and Hampton Research) were performed with 100-nl protein:100-nl buffer drops dispensed using a Mosquito robot (TTP LabTech) and equilibrated against 75 µl of buffer at 16°C with automatic image acquisition by a CrystalMation system (Rigaku). Hanging-drop screens were performed at 20°C with 1 µl of protein:1 µl of buffer equilibrated against 250 µl of buffer. Crystals mounted on nylon loops were briefly soaked in the respective crystallization buffer supplemented by 10% (v/v) glycerol ethoxylate or 10% (v/v) ethylene glycol before plunge-freezing into liquid nitrogen. Data collection was undertaken at the European Synchrotron Radiation Facility (ESRF) ID30B beamline (with enabling work at the Diamond I03 beamline). Data reduction, integration, scaling, and merging were performed using autoPROC (45); the structures were solved by molecular replacement using Phaser (46); model refinement was undertaken with PHENIX (47); and model visualization and building were performed with Coot (48).

Recombinant α1-antitrypsin was incubated at a substoichiometric ratio to Fab4B12 for an hour at room temperature, and excess Fab was removed by anion exchange as described above. After concentration of the complex to 10 mg/ml, 50 µl was applied to a Superdex 200 Increase 5/150 column (GE Life Sciences) at a rate of 0.3 ml/min in 30 mM NaCl and 50 mM tris (pH 7.4) buffer at the P12 BioSAXS beamline, European Molecular Biology Laboratory (EMBL) Hamburg (49). The x-ray scatter (λ = 1.24 Å) was recorded on a Pilatus 6M detector at 1 frame/s. The buffer baseline-corrected scatter profile was produced by integration over time points corresponding with elution of the complex from the size exclusion column using the ATSAS software package (50).

For initial working subunit and dimer models, Coot (48) and PyMOL (Schrödinger Software) were used to position crystal structures of α1-antitrypsin [PDB: cleaved, 1EZX (19); cleaved polymer, 1D5S (21)] or mAb4B12 (PDB: 6QU9) and modify chain boundaries, repair gaps, and improve stereochemistry of intermolecular segments. The initial β-hairpin and loop-sheet models (Fig. 1A, H1–3) were further optimized in PyRosetta (Fig. 1A) (20). Superposition of the model of the α1-antitrypsin–Fab4B12 complex onto the dimer was undertaken using PyMOL. Modifications had to be made to each model to reconcile observations made here and in recent studies:

H1 and H2. Loop-sheet models have been represented with various degrees of insertion of the donor RCL into the site of strand 4A in the acceptor molecule. To explore the compatibility of this parameter with the flexibility and periodicity of the polymers visualized here, two forms were generated, one with a substantial eight-residue insertion (loop-sheet 8, H1) and one with a marginal interaction at the base of sheet A based on the observation that tetrameric peptides are able to block polymerization and induce stabilization of α1-antitrypsin (loop-sheet 4, H2) (18, 23). The loop insertion site is permissive of noncognate peptide residues; however, such out-of-register insertion has not been observed crystallographically for intra- or interprotein loop insertion. For the arrangements used here, inserted residues were maintained in register at their cognate positions as observed for the structures of the cleaved protein, cleavage-induced polymer (21), and the self-terminating dimer (7) and trimer (8).

H3. The hypothesized unwinding of helix I in the β-hairpin polymer has been challenged (16) and is inconsistent with the role of this element in the 4B12 epitope (13). The ability of Fab4B12 to bind to the ex vivo polymers is unequivocal from the images recorded here; thus, if the pathological polymer is reflected by the β-hairpin model, then helix I must remain intact.

H4. Contrary to a proposal that circular polymers are the predominant species (8, 11), most of those extracted from liver tissue were observed to be linear. Accordingly, the C-terminal dimer was arranged in an open configuration through redefinition of the chain boundaries in the crystal structure of a cleavage-generated polymer (21).

During optimization of Fab-bound α1-antitrypsin dimer models, the constituent subunits were treated as rigid bodies connected by flexible linker regions. As much intersubunit linker flexibility was allowed as possible while maintaining the integrity of the core α1-antitrypsin fold, consistent with serpin monomer and oligomer crystal structures and with the high stability of the polymer. Divergence from the canonical structure was permitted where this accorded with the characteristics of the model being tested and other experimental data. Specifically:

1) Although crystal structures of cleaved antitrypsin polymers (21, 22), an antithrombin dimer (7), and antitrypsin trimer (8) all have an intact strand 1C, it has been shown that during the process of (heat-induced) polymerization, this element is labile (24, 32). Accordingly, we allowed the residues of this element (362 to 368) to move in all models.

2) All models of polymerization, either structurally defined or modeled, propose a connection between the C terminus of strand 4A and the N terminus of strand 1C (residues 357 to 362). The evidence is that this is a region that lacks secondary structure: In the cleaved form, it is not part of strand 4A or strand 1C; in the native structure, it does not form polar contacts with the body of the molecule; and it forms an extended chain in the latent conformation (36). Thus, this was treated as a flexible region.

3) The β-hairpin model (H3) involves a connection between helix I of the donor subunit and strand 5A of the acceptor. Limited proteolysis data were interpreted to support the unraveling of helix I in this polymer linkage, yet this is not a feature observed in the crystal structure of the antithrombin dimer on which the model is based (7), and this conclusion has been disputed (16). If the β-hairpin model is indeed representative of the polymers considered in this study, then helix I should be intact as it is integral to the epitope of the nonconformation-selective Fab4B12 that decorates them (13). Hence, the region 309 to 328 between helix I and strand 5A was provided with full flexibility, which maintains the integrity of elements seen in the original crystal structure but allows all other linker residues to move.

4) All crystal structures exhibit an intact strand 5A, and while there is evidence of some lability of this structural element in the native conformation of a Z-like Glu342Ala mutant, this is not shared by the wild-type protein (26). For the loop-sheet models (H1–2) that propose connections between strand 5A of the donor subunit and strand 4A embedded in the acceptor, all connecting residues between residues 340 to 348 (H1) and 340 to 352 (H2) were provided full torsional freedom during refinement.

The selection of polymers was performed manually by visual inspection of micrographs, followed by automatic thresholding and excision of regions of interest from the individual polymer images. Where a region of interest contained more than one chain, the image was postprocessed to remove density not related to the polymer of interest. Starting models of each polymer configuration at an appropriate length were generated by permutation of a seed dimer structure according to the number of subunits in an oligomer. The PyRosetta application programming interface (20) was then used, in which the α1-antitrypsin–Fab4B12 subunits were treated as rigid bodies connected by flexible linker regions; a full-backbone centroid model was used in which each side chain was represented by a single pseudoatom. Following an initial rigid-body step to approximately align the model with the image, loose positional constraints were applied to subunits according to the polymer path determined during the manual selections from the micrographs. Angular relationships with respect to the underlying substrate plane were inferred according to the extent of the orthogonal Fab protrusion observable, from 90° (evidence of increased density along the z axis only) to 0° (full-length protrusion in the XY plane). A necessary simplification, resulting in an implicit minimization of the magnitude of the angular displacement between subunits, was that these would tend to orient away from the underlying carbon substrate. Refinement of these models used an energy term that sought to increase the correlation between the experimental reference image and a 2D projection of the target 3D molecule. Standard stereochemical, repulsive, and attractive terms, and loose positional restraints, were maintained throughout.
Iterative refinement proceeded for a minimum of 10 steps of 25 iterations, following which convergence was deemed to have occurred when the root mean square deviation between prerefined and postrefined model was less than 0.05 Å. The score for a given model-oligomer pair was calculated as the ratio of the best correlation coefficient observed during the optimization of the model against the oligomer relative to the best score observed for any model against that oligomer image.
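
The scoring and stopping rules described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the correlation is taken to be a Pearson coefficient over flattened pixel values (the text says only "correlation"), images are plain lists of numbers, and every function name is hypothetical. The model labels H1 to H4 follow the text.

```python
import math

# Illustrative sketch: Pearson correlation is an assumed choice of metric,
# and all function names are hypothetical.

def pearson(a, b):
    """Pearson correlation between two equally sized flattened images."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def normalized_scores(best_cc_by_model):
    """Divide each model's best correlation by the best across all models.

    The input maps a model label to the best coefficient seen while
    optimizing that model against one oligomer image.
    """
    top = max(best_cc_by_model.values())
    return {model: cc / top for model, cc in best_cc_by_model.items()}


def has_converged(rmsd_angstrom, steps_done, min_steps=10, tolerance=0.05):
    """True once the minimum number of steps has run and RMSD < tolerance."""
    return steps_done >= min_steps and rmsd_angstrom < tolerance


if __name__ == "__main__":
    # Made-up coefficients for one oligomer image, for illustration only.
    print(normalized_scores({"H1": 0.4, "H2": 0.5, "H3": 0.6, "H4": 0.8}))
```

With this normalization, the best-fitting model for each image scores 1 and the others fall between 0 and 1, which makes scores comparable across images of differing quality.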

For each dimer configuration [loop-sheet 8 (H1) or 4 (H2), β-hairpin (H3), and C-terminal (H4)], repeated (1000) rounds of optimization were undertaken from a starting model randomly perturbed by rotation around the dimer axis. Full-atom models were represented as rigid subunits connected by flexible linkers. Optimization (using PyRosetta) involved an alternating sequence of whole-dimer rigid body shift and torsional optimization into the experimental density. The scoring scheme used to steer the process involved default internal stereochemical, attractive, and repulsive terms as well as the correlation of the atomic configuration with the EM density, with relative weighting of these terms progressively adjusted during the iterative procedure. To avoid any contribution of the linker regions to the scores obtained, only the rigid core subunits were used in the calculation of the correlation coefficient with respect to the electron density. The van der Waals scoring term was monitored to exclude models where unresolvable clashes occurred. Structures were visualized using Chimera (51) and PyMOL (Schrödinger Software).
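
The random perturbation of a starting model by rotation around the dimer axis can be sketched with Rodrigues' rotation formula. The sketch below is not the PyRosetta procedure used in the study. It assumes that the axis passes through the origin and that the angle is drawn uniformly from a full turn, neither of which is stated in the text, and the function names are hypothetical.

```python
import math
import random

# Illustrative sketch: axis through the origin and a uniform angle over a
# full turn are assumptions; function names are hypothetical.

def rotate_about_axis(point, axis, angle_rad):
    """Rotate a 3D point about an axis through the origin (Rodrigues' formula)."""
    norm = math.sqrt(sum(c * c for c in axis))
    kx, ky, kz = (c / norm for c in axis)          # unit axis vector
    x, y, z = point
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    dot = kx * x + ky * y + kz * z                 # k . v
    cross = (ky * z - kz * y, kz * x - kx * z, kx * y - ky * x)  # k x v
    return tuple(v * cos_t + c * sin_t + k * dot * (1.0 - cos_t)
                 for v, c, k in zip(point, cross, (kx, ky, kz)))


def random_start(coords, axis, rng):
    """Return coordinates rotated by one random angle, and that angle."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return [rotate_about_axis(p, axis, angle) for p in coords], angle


if __name__ == "__main__":
    rng = random.Random(0)                         # seeded for repeatability
    start, angle = random_start([(1.0, 0.0, 0.0)], (0.0, 0.0, 1.0), rng)
    print(angle, start)
```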

Statistical analyses were performed using Prism 6 software (GraphPad, La Jolla, CA, USA). The significance of the difference in correlation between the 2D projections of the different polymer models and the polymer images in Fig. 4 was determined by a one-way analysis of variance (ANOVA) and Tukey's multiple comparisons test; ***P < 0.001 and ****P < 0.0001. Mean values are reported throughout the text with SD or SEM, as indicated.
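
For readers unfamiliar with the test, the F statistic of a one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below computes it in plain Python on made-up numbers. It is not Prism's implementation, the function name is hypothetical, and it stops short of the P value and of Tukey's post hoc comparisons, which require the F and studentized range distributions.

```python
# Illustrative sketch: textbook one-way ANOVA F statistic on invented data;
# the P value and Tukey's test are intentionally not computed here.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```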

Tissue was used with the informed consent of donors and in accordance with local Institutional Review Boards.

Acknowledgments: We are indebted to M. Carroni (now at SciLifeLab) for collection of EM micrographs and training, and we would like to thank N. Lukoyanova and S. Chen at the ISMB Birkbeck EM Laboratory for support, training, and facility access (as well as D. Clare and Y. Chaban, now at eBIC, for antecedent enabling work) and N. Pinotis at the ISMB X-Ray Crystallography Laboratory for logistical support and facility access. We acknowledge the ESRF (Grenoble) for provision of synchrotron radiation facilities, and we would like to thank G. Leonard for assistance in using beamline ID30B; enabling work was performed on beamline I03 at the Diamond Light Source (proposal mx17201), and we would like to thank the staff for facility provision and technical support. The synchrotron SAXS data were collected at beamline P12 operated by EMBL Hamburg at the PETRA III storage ring (DESY, Hamburg, Germany), and we would like to thank M. Graewert and D. Franke for assistance. We acknowledge the contribution to this publication made by the University of Birmingham's Human Biomaterials Resource Centre, which has been supported through the Birmingham Science City–Experimental Medicine Network of Excellence project. We acknowledge the use of the UCL Grace High Performance Computing Facility (Grace@UCL) and the UCL Legion High Performance Computing Facility (Legion@UCL), and associated support services, in the completion of this work. Funding: This work was supported by a grant from the Medical Research Council (UK) to D.A.L. (MR/N024842/1, also supporting J.A.I. as RCo-I and B.G. as Co-I) and the NIHR UCLH Biomedical Research Centre. D.A.L. is an NIHR Senior Investigator. S.V.F. was the recipient of an EPSRC/GSK CASE studentship. E.L.K.E. was the recipient of a Wellcome Trust Biomedical Research Studentship to the ISMB. B.G. was supported for this work by a Wellcome Trust Intermediate Clinical Fellowship and is currently supported by the NIHR Leicester Biomedical Research Centre.
This work was funded, in part, by an Alpha-1 Foundation grant to J.A.I. The equipment used at the ISMB/Birkbeck EM Laboratory was funded by the Wellcome Trust (grants 101488 and 058736). Author contributions: S.V.F., E.L.K.E., A.R., and B.G. collected EM data. J.A.I., E.L.K.E., S.V.F., B.G., E.V.O., and M.B. analyzed EM data. A.M.J. and J.A.I. collected and analyzed crystallography data. A.M.J. and J.A.I. collected and analyzed SAXS data. E.L.K.E., S.V.F., I.A., and J.A.I. collected and analyzed biochemical data. J.A.I. performed modeling and wrote the computer code. S.V.F., E.L.K.E., N.H.-C., A.M.J., I.A., A.R., E.M., and J.A.I. prepared reagents. S.T.R., G.M.R., and D.H.A. provided reagents. E.M. provided advice and training. J.A.I., E.V.O., B.G., and A.R. supervised data collection and analysis. J.A.I., D.A.L., and E.V.O. supervised the project. J.A.I., S.V.F., E.L.K.E., and D.A.L. drafted the manuscript. All authors contributed to and approved the final manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. The Dim60, Dim90, and Dim60H maps have been deposited in the EMDB with accessions EMD-4632, EMD-4631, and EMD-4620. The crystal structure of Fab4B12 has been deposited as PDB accession 6QU9. Additional data related to this paper may be requested from the authors.

View original post here:
The structural basis for Z α1-antitrypsin polymerization in the liver - Science Advances

These Enzyme-Mimicking Polymers May Have Helped Start Life on Earth – SciTechDaily

The micrograph shows uniform nanoparticles under 10nm in diameter. Credit: Tony Z. Jia, ELSI

Earth-Life Science Institute scientists find that small highly branched polymers that may have formed spontaneously on early Earth can mimic modern biological protein enzyme function. These simple catalytic structures may have helped jump start the origins of life.

Most effort in origins of life research is focused on understanding the prebiotic formation of biological building blocks. However, it is possible early biological evolution relied on different chemical structures and processes, and these were replaced gradually over time by eons of evolution. Recently, chemists Irena Mamajanov, Melina Caudan and Tony Jia at the Earth-Life Science Institute (ELSI) in Japan borrowed ideas from polymer science, drug delivery, and biomimicry to explore this possibility. Surprisingly, they found that even small highly branched polymers could serve as effective catalysts, and these may have helped life get started.

In modern biology, coded protein enzymes do most of the catalytic work in cells. These enzymes are made up of linear polymers of amino acids, which fold up and double back on themselves to form fixed three-dimensional shapes. These preformed shapes allow them to interact very specifically with the chemicals whose reactions they catalyze. Catalysts help reactions occur much more quickly than they would otherwise but don't get consumed in the reaction themselves, so a single catalyst molecule can help the same reaction happen many times. In these three-dimensional folded states, most of the structure of the catalyst doesn't directly interact with the chemicals it acts on and just helps the enzyme structure keep its shape.

Metal sulfide enzymes could have originated from globular metal-sulfide/hyperbranched polymer particles. Credit: Irena Mamajanov, ELSI

In the present work, ELSI researchers studied hyperbranched polymers, tree-like structures with a high degree and density of branching, which are intrinsically globular without the need for the informed folding that modern enzymes require. Hyperbranched polymers, like enzymes, are capable of positioning catalysts and reagents and of modulating local chemistry in precise ways.

Most effort in origins of life research is focused on understanding the prebiotic formation of modern biological structures and building blocks. The logic is that these compounds exist now, and thus understanding how they could be made in the environment might help explain how they came to be. However, we only know of one example of life, and we know that life is constantly evolving, meaning that only the most successful variants of organisms survive. Thus it may be reasonable to assume modern organisms may not be very similar to the first organisms, and it is possible prebiotic chemistry and early biological evolution relied on different chemical structures and processes than modern biology to reproduce itself. As an analogy with technological evolution, early cathode-ray TV sets performed more or less the same function as modern high definition displays, but they are fundamentally different technologies. One technology led to the creation of the other in some ways, but it was not necessarily the logical and direct precursor of the other.

If this kind of scaffolding model for biochemical evolution is true, the question becomes what sort of simpler structures, besides those used in contemporary biological systems, might have helped carry out the same sorts of catalytic functions modern life requires? Mamajanov and her team reasoned that hyperbranched polymers might be good candidates.

The team synthesized some of the hyperbranched polymers they studied from chemicals that could reasonably be expected to have been present on early Earth before life began. The team then showed that these polymers could bind small naturally occurring inorganic clusters of atoms known as zinc sulfide nanoparticles. Such nanoparticles are known to be unusually catalytic on their own.

As lead scientist Mamajanov comments, "We tried two different types of hyperbranched polymer scaffolds in this study. To make them work, all we needed to do was to mix a zinc chloride solution and a solution of polymer, then add sodium sulfide, and voila, we obtained a stable and effective nanoparticle-based catalyst."

The team's next challenge was to demonstrate that these hyperbranched polymer-nanoparticle hybrids could actually do something interesting and catalytic. They found that these metal sulfide-doped polymers, which degrade small molecules, were especially active in the presence of light; in some cases they sped up the reaction by as much as a factor of 20. As Mamajanov says, "So far we have only explored two possible scaffolds and only one dopant. Undoubtedly there are many, many more examples of this remaining to be discovered."

The researchers further noted that this chemistry may be relevant to an origins of life model known as the "Zinc World." According to this model, the first metabolism was driven by photochemical reactions catalyzed by zinc sulfide minerals. They think that with some modifications, such hyperbranched scaffolds could be adjusted to study analogs of iron- or molybdenum-containing protein enzymes, including important ones involved in modern biological nitrogen fixation. Mamajanov says, "The other question this raises is, assuming life or pre-life used this kind of scaffolding process, why did life ultimately settle upon enzymes? Is there an advantage to using linear polymers over branched ones? How, when and why did this transition occur?"

Reference: "Protoenzymes: The Case of Hyperbranched Polymer-Scaffolded ZnS Nanocrystals" by Irena Mamajanov, Melina Caudan and Tony Z. Jia, 13 August 2020, Life. DOI: 10.3390/life10080150


Solution NMR readily reveals distinct structural folds and interactions in doubly 13C- and 19F-labeled RNAs – Science Advances

Abstract

RNAs form critical components of biological processes implicated in human diseases, making them attractive for small-molecule therapeutics. Expanding the sites accessible to nuclear magnetic resonance (NMR) spectroscopy will provide atomic-level insights into RNA interactions. Here, we present an efficient strategy to introduce 19F-13C spin pairs into RNA by using a 5-fluorouridine-5′-triphosphate and T7 RNA polymerase-based in vitro transcription. Incorporating the 19F-13C label in two model RNAs produces linewidths that are twice as sharp as the commonly used 1H-13C spin pair. Furthermore, the high sensitivity of the 19F nucleus allows for clear delineation of helical and nonhelical regions as well as GU wobble and Watson-Crick base pairs. Last, the 19F-13C label enables rapid identification of a small-molecule binding pocket within human hepatitis B virus encapsidation signal epsilon (hHBV ε) RNA. We anticipate that the methods described herein will expand the size limitations of RNA NMR and aid with RNA-drug discovery efforts.

RNAs form essential regulators of biological processes and are implicated in human diseases, making them attractive therapeutic targets (1, 2). This extensive functional diversity of RNA derives from its ability to fold into complex three-dimensional (3D) structures. Yet, the number of noncoding RNA sequences far outstrips the number of solved RNA structures deposited in the Protein Data Bank (PDB) necessary for understanding RNA function (3, 4). In comparison to x-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy provides high-resolution structural and dynamic information in solution, making it an ideal biophysical technique to characterize the interactions between target RNAs and small drug-like molecules. Nonetheless, NMR studies of RNA suffer from poor spectral resolution and sensitivity, both of which worsen with increasing molecular weight. In contrast with proteins, which are made up of 20 unique amino acid building blocks, RNAs are composed of only four aromatic residues. These four resonate over a very narrow chemical shift region. At high magnetic field strengths, sizable transverse relaxation rates (R2) cause line broadening and thereby decrease both sensitivity and resolution. These problems are further exacerbated with increasing molecular weight. To overcome these limitations of RNA, novel labeling strategies that expand the number of NMR probes beyond the traditional nonradioactive and stable isotope labels such as hydrogen-1 (1H), phosphorus-31 (31P), carbon-13 (13C), hydrogen-2 (2H), and nitrogen-15 (15N) are needed.

Solution NMR of the magnetically active fluorine-19 (19F) isotope offers clear advantages in the study of RNA structure and conformational changes, which occur upon ligand binding. 19F has high NMR sensitivity (0.83 of 1H) due to a large gyromagnetic ratio that is comparable to 1H (0.94 of 1H), a 100% natural abundance, and ~6× wider chemical shift dispersion than 1H (5, 6). In addition, 19F is also sensitive to changes in its local chemical environment (5, 6). In contrast with other commonly used NMR nuclei (1H/31P/13C/15N), 19F is virtually absent in biological systems, thereby rendering 19F NMR background free. Together, 19F is an attractive probe for incorporation into nucleic acids to study their structure, interactions, and dynamics in solution.
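
The two figures quoted above are linked by a standard rule of thumb: for spin-1/2 nuclei at the same field and equal numbers of nuclei, relative sensitivity scales with the cube of the gyromagnetic ratio, so a ratio of 0.94 gives 0.94³ ≈ 0.83. The short sketch below checks that arithmetic and also converts a 1H spectrometer frequency into the corresponding 19F frequency. The function names are hypothetical, and the cubic scaling is a textbook approximation rather than a statement taken from the article.

```python
# Illustrative sketch: cubic scaling with the gyromagnetic ratio is the usual
# textbook approximation for spin-1/2 nuclei; function names are hypothetical.

def relative_sensitivity(gamma_ratio, abundance=1.0):
    """Sensitivity relative to 1H for a spin-1/2 nucleus at the same field."""
    return abundance * gamma_ratio ** 3


def larmor_frequency_mhz(proton_frequency_mhz, gamma_ratio):
    """Resonance frequency of a nucleus on a magnet rated by its 1H frequency."""
    return proton_frequency_mhz * gamma_ratio


if __name__ == "__main__":
    print(round(relative_sensitivity(0.94), 2))   # 19F relative to 1H
    print(larmor_frequency_mhz(600.0, 0.94))      # 19F frequency on a 600 MHz magnet
```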

Given its attractive spectroscopic properties, 19F was incorporated into RNA for NMR studies in the 1970s (7–9). Since then, 19F has been successfully incorporated into DNA and RNA oligonucleotides for NMR analysis and used to probe RNA and DNA structure, conformational exchange, and macromolecular interactions (10, 11). Most of these studies were conducted on short oligonucleotides [~30 nucleotides (nt)] prepared by solid-phase synthesis with only a few residues 19F labeled. Even when 2-fluoroadenine (2FA) was incorporated into a 73-nt (~22 kDa) guanine-sensing riboswitch, only 4 of the 16 signals could be assigned. This 2FA study hinted at the limitations of 19F NMR for large RNAs (12). Despite its attractiveness, the application of 19F NMR to study RNA has remained limited because the large 19F chemical shift anisotropy (CSA) contributes substantially to line broadening as a function of increasing molecular weight and polarizing magnetic fields.

To circumvent this limitation, Boeszoermenyi et al. (13) recently showed that direct coupling of 19F to 13C allowed for cancelation of CSA and dipole-dipole (DD) interactions. By incorporating this 19F-13C spin pair into aromatic moieties of proteins and a 16-nt DNA, they showed that a transverse relaxation optimized spectroscopy (TROSY) version of a 19F-13C heteronuclear single-quantum coherence (HSQC) (13) provided improved spectroscopic properties. These exciting results hinted that installing 13C-19F pairs in RNA nucleobases should also lead to improved spectroscopic features.

However, there were no facile methods to readily incorporate 19F-13C spin pairs into RNA. To overcome this technical obstacle of incorporating fluorinated aromatic moieties into RNA, we provide here a straightforward chemoenzymatic synthesis of [5-19F, 5-13C]-uridine 5′-triphosphate (5FUTP) for incorporation into RNA (Fig. 1) using phage T7 RNA polymerase-based in vitro transcription. To showcase its versatility, we transcribed two model RNAs using these labels: the 30-nt (~10-kDa) human immunodeficiency virus type 2 transactivation response (HIV-2 TAR) element (6, 14) and the 61-nt (~20-kDa) human hepatitis B virus encapsidation signal epsilon (hHBV ε) element (Fig. 1) (15, 16).

(A) Model RNA systems: HIV-2 TAR (30 nt, 10 kDa) and hHBV ε (61 nt, 20 kDa). Residues highlighted in green are labeled with 19F-13C 5-fluorouridine (5FU) shown in the box. Green circle, 19F; brown circle, 13C; blue circle, 2H. (B) Theoretical 19F,13C spectrum showing the four observable magnetization components of the 19F-13C spin pair as well as the decoupled resonance that has the average chemical shift and linewidths of all four components.

With our new labels, we demonstrate several advantages for RNA NMR studies, including improved resolution and increased sensitivity to ligand binding. We show that a 19F substitution is structurally nonperturbing and has an optimal TROSY effect at the readily available magnetic field strength of 600 MHz (1H frequency), in agreement with previous studies (13). Unlike C-H spectra, the resolving power of 19F allows for easy identification of RNA structural elements in helical and nonhelical regions, as well as in wobble GU base-paired regions. With protons substituted with deuterium and depending on the molecular weight of the RNA, the TROSY effect in the 19F-13C pair can reduce the 13C linewidth by a factor >2, compared to a 13C-1H pair, and the 19F-13C label enables detection of a small molecule binding to a 20-kDa RNA. Thus, our 19F-13C label overcomes several of the limitations in sensitivity and resolution facing RNA NMR studies with the potential to extend the application of solution NMR measurements to larger-molecular weight systems in vivo.

Given the potential utility but unavailability of 19F-13C spin pairs in aromatic moieties of RNA, we first sought to develop a reliable and scalable method that combined chemical synthesis with enzymatic coupling in almost quantitative yields. This chemoenzymatic approach is a versatile method that combines chemical synthesis of atom-specific labeled nucleobases with commercially available selectively labeled ribose using enzymes from the pentose phosphate pathways (PPPs) (3, 4). To this end, we adapted the method of Santalucia et al. (17) and Kreutz and co-workers (18) and first synthesized the uracil base (U) specifically labeled with 13C at the aromatic C5 and 15N at the N1 and N3 positions (Fig. 1). This synthesis is readily accomplished using unlabeled potassium cyanide, 13C-labeled bromoacetic acid, and 15N-labeled urea. The resulting U was converted to 5-fluorouracil (5FU) by direct fluorination with Selectfluor (19, 20). This strategy allows for efficient and cost-effective synthesis of the 5FU base with high yield of ~63%. In addition, to remove unwanted scalar coupling interactions (14), we selectively deuterated H6 (~95%) using well-established methods (21). Next, using enzymes from the PPP, we coupled 5FU to D-ribose labeled at the C1 position to give 5FUTP (Fig. 1) (3, 22) with an overall yield of ~50%. This site-specifically labeled 5FUTP was then used for DNA template-directed T7 RNA polymerase-based in vitro transcription with overall yields comparable to those obtained with unmodified nucleotides.

Fluorine substitution at uridine C5 is thought to reduce the imino N3 pKa values by about 1.7 to 1.8 units with respect to their protonated analogs (23), leading to extensive line broadening of imino protons in 5FU RNAs (24). To determine if incorporation of 5-fluorouridine alters the folding thermodynamics of our RNAs (Fig. 1), we recorded ultraviolet (UV) thermal melting profiles for both wild-type (WT) and 5FU HIV-2 TAR and hHBV ε (table S1). Both WT and 5FU RNAs showed a single transition in their melting profiles, consistent with unimolecular folding (25). WT and 5FU HIV-2 TAR had melting temperatures within ~1 K of each other (WT: Tm = 355.6 ± 0.5 K; 5FU: Tm = 357.4 ± 0.4 K). Similarly, 5FU hHBV ε had a melting temperature of 327.1 ± 0.1 K, which is within the error of the melting temperature of WT. Together, these results suggest that 5FU does not markedly alter the thermodynamic stability of HIV-2 TAR and hHBV ε, in accordance with previous studies of 5FU RNAs (6, 7, 24).

The linewidth for the aromatic 19F-13C spin pair (Fig. 1B) is expected to become dominated by the CSA mechanism with increasing polarizing magnetic fields (13). To estimate this effect for 5FU, we calculated the chemical shielding tensor (CST) for 19F-13C spin pairs using density functional theory (DFT) methods (tables S2 and S3) (26-29). Using these CST parameters and relaxation theory implemented in the Spinach library (30), we computed the TROSY R2 relaxation rates for the 19F-13C pair of 5FU (13CF and 19FC) and the 13C-1H pair of U (13CH and 1HC) (Fig. 2) assuming isotropic tumbling. The R2 of the fluorinated carbon (13CF) TROSY resonance is ~2 times smaller than that of the protonated carbon (13CH) at their respective minima of ~600 and ~950 MHz for all molecular sizes with rotational correlation times (τc) greater than 5 ns (Fig. 2A). Compared with the decoupled resonance, the R2 of the 13CF TROSY resonance is ~3 times smaller than that of protonated carbon for all τc values greater than 5 ns (fig. S1). Although the TROSY effect is quite small for 19F nuclei bonded to 13C (19FC) and for 1H nuclei bonded to 13C (1HC), the R2 of 19FC is three times larger than that of 1HC (fig. S2). Thus, sensitive, high-resolution NMR spectra for the 19F-13C pair of 5FU in RNAs can be obtained by selective detection of the 13CF TROSY resonance, as demonstrated for the 19F-13C pair in aromatic amino acids (13).

(A) Theoretical curves showing the expected R2 values for the TROSY component of 13CF (cyan) and 13CH (magenta) as a function of magnetic field strength (relative to 1H Larmor frequency) for τc = 6 ns (dashed line), 25 ns (solid line), and 100 ns (dotted line) at 25°C. (B) Theoretical R2 values taken at the commercially available magnetic field strength closest to the maximum TROSY effect (13CH = 950 MHz; 13CF = 600 MHz) for τc = 6, 25, and 100 ns at 25°C.

To validate these theoretical TROSY predictions experimentally, we adapted the 1H-15N TROSY experiment (3, 31, 32) to perform a 19F-13C TROSY experiment on ~10-kDa 5FU HIV-2 TAR and ~20-kDa 5FU hHBV RNAs (Fig. 3). Because of hardware limitations, we could only run experiments that start and end on 19F magnetization, with the 13C frequency encoded in the indirect dimension. That is, we used the so-called 19F-detected out-and-back method, rather than the more sensitive 19F-excited out-and-stay 13C-detected experiment (13). We collected spectra for each of the four components (Fig. 1B) of the 19F-13C (1H-13C) correlations for both 5FU (WT) HIV-2 TAR and hHBV (figs. S3 to S6).

(A) 19F-13C TROSY of 5FU HIV-2 TAR. (B) 1H-13C TROSY of WT HIV-2 TAR. (C) 19F-13C TROSY of 5FU hHBV. (D) 1H-13C TROSY of WT hHBV. The assignments of 5FU and WT TAR-2 are indicated, as well as the arbitrary peak numbers for 5FU and WT hHBV. The same window size was used in all four spectra to aid in comparison. Gray dashed boxes indicate signals from helical, GU, and nonhelical regions. For (D), the black box indicates a zoom-in view of poorly resolved signals.

Both HIV-2 TAR and hHBV show ~6-fold improvement in chemical shift dispersion of 19F compared with 1H and similar dispersion in 13C (Fig. 3). All six correlations of HIV-2 TAR are well resolved for both 1H-13C and 19F-13C correlations and are in agreement with previously published 1H-19F and 1H-13C RNA spectra (6, 24, 33). Nonetheless, even for this small RNA, the 19F-13C spin pair markedly improves the spectral resolution. 5FU HIV-2 TAR shows a chemical shift dispersion of 2.6 parts per million (ppm) in the 19F dimension, compared with only 0.5 ppm in the 1H dimension for WT (Fig. 3, A and B). Replacing 1H with 19F at C5 results in a slight reduction in chemical shift dispersion along the 13C dimension from 2.1 to 1.5 ppm, although this effect is much smaller than the gain in resolution for 19F over 1H (Fig. 3, A and B). Similarly, the 19F resonances of 5FU hHBV are spread over 4.5 ppm, whereas the WT 1H signals resonate over a narrow 0.8-ppm window. This represents 5.7 times better dispersion (Fig. 3, C and D). Again, substitution of 1H with 19F at C5 results in a reduction in chemical shift dispersion from 2.3 to 1.7 ppm along the 13C dimension for hHBV (Fig. 3, C and D). Of the anticipated 18 signals for hHBV, 16 are resolved for WT and 17 for 5FU. Together, these results demonstrate the marked gain in resolution afforded by the 19F-13C spin pair in 5FU RNAs compared with the 1H-13C spin pair in WT.
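
As a quick check of the dispersion figures quoted above, the following minimal Python sketch divides the 19F span of each 5FU RNA by the 1H span of its WT counterpart. The function name `dispersion_gain` is ours and purely illustrative, and the inputs are the rounded spans given in the text, so the ratios may differ slightly from values computed from unrounded peak positions.

```python
def dispersion_gain(span_new_ppm: float, span_ref_ppm: float) -> float:
    """Ratio of two chemical shift spans (both in ppm) along one dimension."""
    if span_ref_ppm <= 0:
        raise ValueError("reference span must be positive")
    return span_new_ppm / span_ref_ppm


# Spans quoted in the text: (19F span of the 5FU RNA, 1H span of the WT RNA), in ppm.
spans = {
    "HIV-2 TAR": (2.6, 0.5),
    "hHBV": (4.5, 0.8),
}

gains = {rna: dispersion_gain(f19, h1) for rna, (f19, h1) in spans.items()}

for rna, gain in gains.items():
    print(f"{rna}: 19F/1H dispersion gain = {gain:.1f}x")
```

With the rounded spans, the ratios come out near 5.2 for HIV-2 TAR and 5.6 for hHBV, in line with the ~6-fold improvement stated above.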

In addition to this considerable gain in resolution, 19F-13C labeling confers favorable 13CF TROSY linewidths. We compared the relative linewidths for both RNAs, which we assume to be Lorentzian (Figs. 4 and 5). For 5FU HIV-2 TAR, the 13CF TROSY linewidths were 1.5 times sharper on average than the anti-TROSY components, with a range of 1.3 to 1.7 (Fig. 4A). For WT HIV-2 TAR, the 13CH TROSY component was 3.7-fold narrower than the anti-TROSY component (range, 1.6 to 8.7) (Fig. 4B). Similarly, for 5FU hHBV, the 13CF TROSY linewidths were 2.2-fold narrower than the anti-TROSY ones over a range of 1.5 to 3.3 (Fig. 4C). For WT hHBV, only 5 of the 16 13CH anti-TROSY signals were observed and were 2.6 times broader than the TROSY resonances (range, 2.0 to 3.3) (Fig. 4D). As predicted from our simulations (Fig. 2), the 13CF TROSY component relaxes ~2 times slower than the 13CH TROSY component in both HIV-2 TAR and hHBV. The 19FC TROSY linewidths for 5FU HIV-2 TAR and 5FU hHBV were 1.4 (range, 1.3 to 1.6) and 1.6 (range, 1.1 to 2.5) times narrower than the anti-TROSY components, respectively (Fig. 5, A and C). For both WT HIV-2 TAR and WT hHBV, the 1HC TROSY and anti-TROSY linewidths were comparable (Fig. 5, B and D). Consistent with our simulations, the 19FC TROSY linewidth is ~2-fold larger than that of the 1HC component for both RNAs (fig. S3). Again, this is in line with the poor performance of 19F NMR experiments due to the large CSA-induced relaxation. Thus, the incorporation of the 13C label mitigates the deleterious relaxation of the 19F nuclei within a 19F-13C spin pair. However, even for medium-sized RNAs ~20 kDa, 19F TROSY detection of the 19F-13C spin pair still outperforms that for a 1H-13C spin pair. Therefore, to reap the maximum benefits of this label, it is advantageous to monitor the 13C nuclei rather than the 19F nuclei. We anticipate that the 19F-13C TROSY effect will continue to scale with molecular weight for RNAs, as was seen recently with proteins (13) and in our simulations.
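
Because the lines are assumed to be Lorentzian, a full width at half maximum Δν (in Hz) maps onto a transverse relaxation rate through R2 = πΔν. The sketch below applies that relation and forms the anti-TROSY/TROSY linewidth ratio used in the comparisons above. The function names and the 12- and 18-Hz linewidth pair are hypothetical and serve only as an illustration; the 97-Hz value is the anti-TROSY 13CH linewidth reported for U38 of WT HIV-2 TAR.

```python
import math


def r2_from_linewidth(fwhm_hz: float) -> float:
    """Transverse relaxation rate (s^-1) of a Lorentzian line with the given FWHM (Hz)."""
    return math.pi * fwhm_hz


def narrowing_factor(anti_trosy_hz: float, trosy_hz: float) -> float:
    """How many times narrower the TROSY line is than the anti-TROSY line."""
    return anti_trosy_hz / trosy_hz


# Reported value: anti-TROSY 13CH linewidth of U38 in WT HIV-2 TAR.
r2_u38_anti = r2_from_linewidth(97.0)

# Hypothetical linewidth pair, for illustration only (not measured data).
example_factor = narrowing_factor(anti_trosy_hz=18.0, trosy_hz=12.0)

print(f"U38 anti-TROSY R2 = {r2_u38_anti:.0f} s^-1")
print(f"example narrowing factor = {example_factor:.1f}")
```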

Quantification of TROSY (black) and anti-TROSY (gray) (A) 13CF and (B) 13CH linewidths for HIV-2 TAR. Note that U40 was not observed in the anti-TROSY spectrum of WT HIV-2 TAR (B). In addition, the anti-TROSY component of U38 in (B) was 97 Hz and truncated to fit in the plot. Quantification of TROSY (black) and anti-TROSY (gray) (C) 13CF and (D) 13CH linewidths for hHBV. Note that peaks 1 through 11 in WT hHBV were not observed in the anti-TROSY spectrum (D). The average ± SD in Hz is shown for the TROSY and anti-TROSY components in each plot. Peak numbers and assignments are given in Fig. 3.

Quantification of TROSY (black) and anti-TROSY (gray) (A) 19FC and (B) 1HC linewidths for HIV-2 TAR. Quantification of TROSY (black) and anti-TROSY (gray) (C) 19FC and (D) 1HC linewidths for hHBV. The average ± SD in Hz is shown for the TROSY and anti-TROSY components in each plot. Peak numbers and assignments are given in Fig. 3.

In addition to these gains in resolution and favorable linewidths, previous work suggested that 19F chemical shifts serve as sensitive markers of RNA secondary structure (10, 11). For example, GU wobble base pairs are deshielded and shifted by ~4.5 ppm to lower fields compared with AUs within Watson-Crick geometries (34). On the basis of these earlier observations, we hypothesized that 19F-13C correlations of HIV-2 TAR and hHBV can be grouped according to whether they are in helical, nonhelical, or GU base-paired regions of the RNA. As a positive control, we note that nonhelical U23, U25, and U31 in 5FU HIV-2 TAR resonate around ~−165.5 ppm in 19F and ~142.5 ppm in 13C (Fig. 3A). On the other hand, the helical residues U38, U40, and U42 of 5FU HIV-2 TAR are centered around ~−167.5 ppm in 19F and ~141.5 ppm in 13C, in line with previous observations for 19F-1H samples of HIV-2 TAR (6) and tRNA (34). Comparison of the equivalent 1H-13C spectra shown in Fig. 3B indicates that even though helical residues cannot be distinguished from nonhelical residues in the 1H dimension, nonhelical residues can be differentiated from helical base pairs in the 13C dimension for a 1H-13C spin pair.

The 17 resolved 19F-13C correlations of 5FU hHBV show clustering similar to that of 5FU HIV-2 TAR (Fig. 3C). For instance, the six most intense signals are centered around ~−165.5 ppm in 19F and ~142.5 ppm in 13C, where the nonhelical signals of HIV-2 TAR are located. On the basis of the secondary structure of hHBV (Fig. 1A), these six intense peaks belong to the six nonhelical uridines (U15, U17, U18, U32, U34, and U43) (Fig. 3C). A seventh peak is also seen in this region, most likely due to U48 or U49, both of which flank the bulge region. The weaker peaks are from the helical portions of hHBV because these signals, located at ~−167.5 ppm in 19F and ~141.5 ppm in 13C, resonate in the same region as the helical signals from 5FU HIV-2 TAR (6). HIV-2 TAR contains only Watson-Crick base pairs, and so, signals in this region of the hHBV spectrum correspond to AUs (U3, U7, U38, U39, U47, U48, U49, and U56). Of the eight anticipated peaks belonging to helical residues, only seven are observed, further suggesting that U48 or U49 may fray and resonate within the nonhelical region. Unlike HIV-2 TAR, hHBV has four noncanonical GU wobble base pairs embedded within helical regions. The three signals resonating in a distinct region centered at ~−163.5 ppm in 19F and ~142.0 ppm in 13C are from the four GUs (U4, U9, U12, and U25). This is in line with previous observations of GU base pairs in tRNA (34). Peak 5 (Fig. 3C) is most likely two overlapped GUs. Again, comparison of the equivalent 1H-13C spectra shown in Fig. 3D indicates that even though helical residues can be distinguished from nonhelical residues, nonhelical residues cannot be differentiated from GU base pairs for a 1H-13C spin pair. Thus, the spectroscopic discrimination of helical and nonhelical regions as well as GU wobble and Watson-Crick base pairs in RNA structures becomes possible with the high sensitivity of 19F to the local chemical environment of a 19F-13C spin pair. This distinguishing feature is not readily available for a 1H-13C spin pair.
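
The grouping described above can be illustrated with a minimal nearest-center sketch in Python. The three region centers are the approximate values given in the text, taking 19F shifts as negative values on the TFA-referenced scale (an assumption of this sketch). The unweighted Euclidean distance in ppm, the function name `classify_peak`, and the example peak positions are all hypothetical choices made for illustration, not the procedure used to annotate Fig. 3.

```python
import math

# Approximate region centers from the text: (19F ppm, 13C ppm).
# The negative sign of the 19F shifts is an assumption of this sketch.
REGIONS = {
    "nonhelical": (-165.5, 142.5),
    "helical AU": (-167.5, 141.5),
    "GU wobble": (-163.5, 142.0),
}


def classify_peak(f19_ppm: float, c13_ppm: float) -> str:
    """Return the region whose center is closest to the peak (unweighted ppm distance)."""
    peak = (f19_ppm, c13_ppm)
    return min(REGIONS, key=lambda name: math.dist(peak, REGIONS[name]))


# Hypothetical peak positions, for illustration only.
for peak in [(-165.3, 142.4), (-167.8, 141.6), (-163.2, 142.1)]:
    print(peak, "->", classify_peak(*peak))
```

Because the three centers are separated mainly along the 19F axis, even this crude rule separates the classes; a real analysis would also need to handle frayed or exchanging residues such as U48 and U49.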

Ligand-based (35) and protein-observed (36) 19F NMR screening methods are important for identifying small drug-like molecules that act as protein inhibitors. Although most work to date has focused on proteins, recent work suggests that RNAs also contain specific binding pockets that could be easily distinguished and targeted with small molecules (1, 2). hHBV is at the center of the viral replication cycle because the first two residues in its internal bulge are used by the virus to initiate synthesis of the minus-strand DNA. Thus, targeting this RNA structure will notably expand the repertoire of HBV drug targets beyond the current focus on viral proteins (37). Given that 19F chemical shifts serve as sensitive markers of RNA secondary structure, we reasoned that 19F-13C spectroscopy would likely distinguish loop binders from helical region binders. Rather satisfyingly, we found a small molecule that specifically binds a subset of nonhelical residues in 5FU hHBV (Fig. 6). Overlay of the full spectra of 5FU hHBV with and without the small molecule shows chemical shift perturbations (CSPs) (38) predominantly confined to nonhelical regions (Fig. 6). Within the nonhelical residues, only four of the seven signals shift with the addition of the small molecule, which suggests selectivity for certain nonhelical residues over others (Fig. 6). We propose a model whereby our small molecule binds hHBV in the 6-nt bulge formed between C14 and C19, but not anywhere else in the RNA. The minor CSPs seen in the helical portion of the 5FU hHBV spectra are from U residues flanking the 6-nt bulge, specifically U47, U48, and U49. Last, the CSP seen in the GU portion is from U12, which also flanks our proposed binding pocket.

(A) Overlay of 19F-13C-TROSY spectra for hHBV without (black) and with small molecule (SM, magenta). (B) Zoom-in of nonhelical residues showing chemical shift perturbations (CSPs) upon addition of SM. (C) Quantification of the CSPs upon addition of SM. The average (Ave) CSP is shown as a dashed line.
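
A CSP analysis of this kind can be sketched as follows. The text does not state the formula or the relative weighting of the two dimensions, so the combined CSP below, sqrt(ΔδF² + (w·ΔδC)²) with a 13C weight w = 0.3, is an assumption borrowed from common practice for two-dimensional spectra. The peak names and positions are hypothetical, and flagging peaks above the mean simply mirrors the dashed average line described in the caption.

```python
import math
from statistics import mean

C13_WEIGHT = 0.3  # hypothetical scaling of 13C shifts relative to 19F shifts


def csp(d_f19_ppm: float, d_c13_ppm: float, c13_weight: float = C13_WEIGHT) -> float:
    """Combined chemical shift perturbation (ppm) from 19F and weighted 13C changes."""
    return math.hypot(d_f19_ppm, c13_weight * d_c13_ppm)


# Hypothetical (free, bound) peak positions as (19F ppm, 13C ppm); not measured data.
peaks = {
    "peak A": ((-165.50, 142.50), (-165.70, 142.60)),
    "peak B": ((-165.40, 142.40), (-165.41, 142.40)),
    "peak C": ((-167.50, 141.50), (-167.52, 141.51)),
}

csps = {}
for name, (free, bound) in peaks.items():
    csps[name] = csp(bound[0] - free[0], bound[1] - free[1])

average = mean(csps.values())
above_average = [name for name, value in csps.items() if value > average]

for name, value in csps.items():
    print(f"{name}: CSP = {value:.3f} ppm")
print(f"average = {average:.3f} ppm; above average: {above_average}")
```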

19F is an attractive spectroscopic probe to study biomolecular structure, interactions, and dynamics in solution. Nonetheless, a number of obstacles must be overcome for it to become widely useful. First, we must be able to easily install the label into any biopolymer. While incorporation of fluorinated aromatic amino acids and nucleobases into proteins and nucleic acids is usually not a technical challenge, until now, synthesis of a carbon-labeled and fluorinated nucleobase to create a 19F-13C spin pair has been problematic for RNA. Here, we present a facile strategy to incorporate 19F-13C 5-fluorouridine into RNA using in vitro transcription for characterization of small-molecule binding interactions by NMR. Our protocol to prepare 19F-13C 5-fluorouridine-5′-triphosphate (5FUTP) involves chemically synthesizing 5FU and then enzymatically coupling it to 13C-labeled D-ribose. Our synthetic strategy can be generalized to selectively place labels in the pyrimidine nucleobase at either 15N1, 15N3, 13C2, 13C4, 13C5, or 13C6 or any combinations thereof, and then enzymatically couple ribose labeled at either 13C1′, 13C2′, 13C3′, 13C4′, or 13C5′ or any of the preceding ribose combinations to the base. The resulting isotopically enriched 5FUTP is then readily incorporated into any desired RNA using DNA template-directed T7 RNA polymerase-based in vitro transcription. This enzymatic approach, unlike solid-phase RNA synthesis, is not limited to RNAs shorter than 70 nt or to nucleotides made of labeled nucleobase coupled to unlabeled ribose. Although fluorine substitution at C5 in pyrimidines strongly affects the shielding of the nearby H6, it has little effect on the anomeric H1′ chemical shifts (24). We therefore anticipate that our unique strategy that combines a ribose 13C1′ label with 19F-13C uracil should allow the transfer of assignments from unmodified RNAs to 5-fluoropyrimidine-substituted RNAs made with our labels.

Second, because its van der Waals radius is comparable to that of 1H, 19F is considered minimally perturbing when incorporated into biopolymers (24). Although fluorine substitution in 5FU RNAs leads to sizeable line broadening of the imino protons, thermal melting analysis indicates that the 5FU RNAs are thermodynamically equivalent to the nonfluorinated RNAs (6, 7, 24). In future work, it will be important to systematically investigate the effect of fluorine substitution not only on thermodynamic stability but also on folding kinetics of RNAs. Insights derived from solving, at high resolution, the 3D structures of fluorinated and nonfluorinated RNA could potentially guide the use of these spin pairs to spy on biological processes within the cell.

Third, despite its huge potential, nucleic acid-observed 19F (NOF) NMR has remained underused because the large 19F CSA induces severe line broadening at high molecular weights and magnetic fields. Using DFT calculations of CST parameters, we show that an optimal 19F-13C TROSY enhancement occurs at 600-MHz 1H frequency to enable slow relaxation of 13C bonded to 19F. Our RNAs show an enhanced 19F-13C TROSY effect with increasing molecular weight and 13C linewidths that are twice as sharp as seen with traditional 1H-13C spin pairs. Thus, nucleobase 19F-13C TROSY will expand the applicability of RNA NMR beyond the ~30-nt (~10-kDa) average.

Fourth, the RNA secondary structure is made up of segments of nucleotides that are either base paired or not. The arrangements of base-paired with unpaired regions can leave distinct NMR chemical shift signatures that can provide low-resolution structural information with minimum expenditure of time and cost. For example, the H5 of a pyrimidine is sensitive to the nature of the residue that comes before it within a triplet of canonical Watson-Crick AU and GC base pairs. When the A in a central UA base pair is substituted by a G, the H5 resonance shifts downfield because of the formation of the GU base pair. Yet, an analysis of the commonly used 1H-13C probes fails to unambiguously separate nonhelical residues from helical ones (39). In contrast, the 19F-13C labels resonate in distinct chemical shift regions based on their secondary structure. For instance, nonhelical residues resonate in spectral regions distinct from helical ones, which are further separated into GU wobble and AU Watson-Crick base-paired regions. The ability to differentiate between different structural features in an RNA simply based on chemical shifts removes the need for the time-consuming and laborious process of resonance assignment.

Given the ubiquity and functional importance of GU wobble base pairs (40) in all kingdoms of life (41), the ability to easily distinguish GU from canonical GC and AU base pairs has several important implications. For instance, in the minor groove, a GU base pair presents a distinctive exocyclic amino group that is unpaired, and the U's C1′ atom rotates counterclockwise compared with the C's C1′ atom in a canonical GC base pair. This region serves as an important site for protein-RNA interactions. Similarly, in the major groove, G N7 and O6 together with U O4 create an area of intense negative electrostatic potential conducive for binding divalent metal ions. Furthermore, all canonical Watson-Crick base pairs are circumscribed by ~10.6-Å diameters formed by a line connecting their C1′-C1′ centers. These ribose-connected centers are superimposable with almost perfect alignment. In contrast, a GU base pair is misaligned counterclockwise by a residual twist of +14°, and a UG base pair is misaligned clockwise by a residual twist of −11° (42). That is, the GU base pair is not isosteric with canonical Watson-Crick pairs. Rather, these wobble base pairs either overtwist or undertwist the RNA double helix. 19F-13C labels might aid in elucidating the structural and dynamic basis of these twists depending on the identity of the base pairs neighboring the wobble pair. We, therefore, anticipate that our new label could potentially open up avenues for probing GU wobble pairs in the various structural contexts outlined above, such as 19F-13C-labeled RNA-protein interactions and metalloribozyme-ion interactions.

In summary, the labeling technologies presented here open the door for characterizing the structure, dynamics, and interactions of RNA, RNA-RNA, RNA-DNA, RNA-protein, and RNA-drug complexes in vitro and in vivo for complexes as large as 100 kDa or higher with the appropriate fluorine NMR hardware. This 19F-13C labeling approach will also enable correlating chemical shift-structure relationships to aid chemical shift-centered probing of RNA structure, dynamics, and interactions. We envision that the 19F-13C spin pair, by providing a clear demarcation of RNA structural elements, may facilitate the discovery and identification of small drug-like molecules that target RNA binding pockets in vitro and in vivo.

The full description of Materials and Methods can be found in the Supplementary Materials. A brief summary is provided here.

[5-19F, 5-13C, 6-2H] and [5-19F, 5-13C, 6-2H, 1,3-15N2]5FU were synthesized from unlabeled potassium cyanide, 13C-labeled bromoacetic acid, and 15N-labeled urea as described elsewhere (3, 17, 18, 24). The resulting uracil was converted to 5FU by direct fluorination with Selectfluor and deuteration (19-21). [1′,5-13C2, 5-19F, 6-2H]5FUTP and [1′,5-13C2, 5-19F, 6-2H, 1,3-15N2]5FUTP were synthesized using PPP enzymes (3, 4, 6, 22, 24, 43).

All RNAs were prepared by in vitro transcription and purified as previously described (3, 4). RNA concentrations were approximated by UV absorbance using extinction coefficients of 387.5 mM⁻¹ cm⁻¹ for HIV-2 TAR and 768.3 mM⁻¹ cm⁻¹ for hHBV. All RNA concentrations were >0.5 mM (~0.3 ml) in Shigemi NMR tubes.
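
The concentration estimate follows from the Beer-Lambert law, c = A / (ε·l). The sketch below uses the two extinction coefficients given above; the function name, the 1-cm path length default, the dilution factor, and the example absorbance are hypothetical and would need to match the actual measurement.

```python
# Extinction coefficients from the text, in mM^-1 cm^-1.
EXTINCTION_MM = {
    "HIV-2 TAR": 387.5,
    "hHBV": 768.3,
}


def rna_concentration_mM(absorbance: float, rna: str,
                         path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Beer-Lambert concentration (mM) of the undiluted stock.

    absorbance: UV absorbance of the (possibly diluted) sample
    path_cm:    cuvette path length in cm (assumed 1 cm by default)
    dilution:   fold dilution applied before the measurement
    """
    return absorbance / (EXTINCTION_MM[rna] * path_cm) * dilution


# Hypothetical reading: A = 0.775 for a 500-fold diluted HIV-2 TAR sample.
stock_mM = rna_concentration_mM(0.775, "HIV-2 TAR", dilution=500)
print(f"HIV-2 TAR stock = {stock_mM:.2f} mM")
```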

We collected thermal melting profiles for both WT and 5FU-substituted HIV-2 TAR and hHBV as previously described (24, 25).

Calculations were carried out on 1-methyl-uracil and 5-fluoro-1-methyl-uracil using optimized geometries (26-28). All calculations used the Gaussian-16 program (29). Details are provided in the Supplementary Materials.

All 19F-13C TROSY spectra were collected at 298 K using a Bruker 600 MHz Avance III spectrometer equipped with TXI (triple resonance inverse) and BBI (broad band inverse) probes. All data were processed with Bruker's Topspin 4.0.7 software. 1H chemical shifts were internally referenced to DSS (0.00 ppm), with the 13C chemical shifts referenced indirectly using the gyromagnetic ratios of 13C/1H (44). The 19F chemical shifts were internally referenced to trifluoroacetic acid (−75.51 ppm) (45). Experiments showing each component of the 1H/19F-13C correlations were adapted from a sensitivity- and gradient-enhanced 1H-15N TROSY used for proteins (31).
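
Indirect 13C referencing of this kind can be sketched as follows: the 13C frequency corresponding to 0 ppm is obtained by multiplying the 1H frequency of DSS at 0 ppm by a fixed frequency ratio Ξ. The value Ξ = 0.251449530 is the commonly tabulated 13C/1H ratio for DSS; the text does not state which value was used, so it is an assumption here, as are the function names and the example 1H frequency.

```python
XI_13C_DSS = 0.251449530  # commonly tabulated 13C/1H frequency ratio for DSS (assumed)


def indirect_zero_mhz(h1_zero_mhz: float, xi: float = XI_13C_DSS) -> float:
    """Frequency (MHz) of 0 ppm for the indirectly referenced nucleus."""
    return h1_zero_mhz * xi


def shift_ppm(freq_mhz: float, zero_mhz: float) -> float:
    """Chemical shift (ppm) of a resonance relative to the 0-ppm frequency."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical 1H frequency of DSS at 0 ppm on a nominal 600-MHz spectrometer.
h1_zero = 600.130000
c13_zero = indirect_zero_mhz(h1_zero)

# A resonance 142.5 ppm above the 13C zero frequency, for illustration.
c13_peak = c13_zero * (1 + 142.5e-6)
print(f"13C zero = {c13_zero:.6f} MHz; peak at {shift_ppm(c13_peak, c13_zero):.1f} ppm")
```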

M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery Jr., J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman, D. J. Fox, Gaussian 16, Revision A.03 (2016).

Acknowledgments: We thank P. Deshong, J. Kahn, L.-X. Wang, and P. Y. Zavalij (University of Maryland) and H. Arthanari (Harvard University) for helpful comments. We thank S. Bentz and D. Oh for help in preparing samples for thermal melt analysis, and M. Svirydava for help in analyzing samples by mass spectrometry. Funding: We thank the National Science Foundation (DBI1040158 to T.K.D. for NMR instrumentation) and the NIH (U54AI50470 to T.K.D. and D.A.C.) for support. Author contributions: T.K.D.: conceptualization. T.K.D. and O.B.B.: implementation of the project and manuscript preparation. G.Z., B.C., K.M.T., and T.K.D.: synthesis of 5FU. O.B.B.: synthesis of 5FUTP, RNA synthesis, and thermal melt analysis. T.K.D., K.M.T., B.C., and O.B.B.: TROSY measurements. O.B.B.: small-molecule titration. D.A.C.: DFT calculations. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Originally posted here:
Solution NMR readily reveals distinct structural folds and interactions in doubly 13C- and 19F-labeled RNAs - Science Advances

After COVID-19, capital will be different, stronger and more conscious – SmartCompany.com.au

VentureCrowd executive director and Maarbani Consulting managing director. Source: supplied.

Let's talk about sushi. I love the stuff. High protein, low fat, minimalist Japanese perfection. It's great with soy, pickled ginger and a sprinkle of microplastic.

Oh, you didn't know? Let me explain.

The modern lightweight shopping bag was invented by Swedish engineer Sten Gustaf Thulin in the early-1960s (stay with me, the connection is coming). Thulin developed and patented a method of forming a one-piece bag by folding, welding and die-cutting a flat tube of plastic for the packaging company Celloplast.

Nowadays, nearly 1 trillion plastic bags are consumed worldwide every year; that's over 1 million per minute. Needless to say, the lifetime value of a plastic bag customer is a rock-solid metric, and a bunch of investors are making a ship-load of cash from this little beauty.

Convenient, cheap, disposable and, as it turns out, delicious.

You see, most plastic trash in the oceans flows from land. Once at sea, sunlight, wind and wave action break down plastic waste into small particles called microplastics. They are less than 5 millimetres in length, about the size of a sesame seed, and have been found in municipal drinking water systems, drifting through the air and in the seafood we eat.

Ah, the circle of life.

In just one generation, we went from being plastic-free (pause for effect) to a level of reliance on plastic that results in 12 million tonnes of plastic entering our oceans every year. That's a full rubbish truck every minute.

But, whatever. We're all making money, right?

Oh, Thulin. Insert facepalm. As my mother would say, "I'm not angry, I'm just disappointed."

The new reality is that the global investment landscape is changing.

A new generation of investors is awakening and they don't want plastic in their sushi. Backed by the largest intergenerational transfer of wealth in modern history, this group is demanding the opportunity to support companies that fund more sustainable futures and solve real-world problems.

In the post-COVID-19 world, capital will be different, stronger, and more conscious.

Even before the pandemic changed our everyday lives, companies contributing to climate change were being called to account as Australia experienced its worst bushfire season on record.

Investors alarmed at the impact of companies damaging the environment have begun to look at the impact of their own investments, and whether those investments are aligned with their values. When people began to dig a little deeper and uncovered where their money was going, the floodgates opened.

In January, Ethical Super saw its net inflow increase fivefold compared with January 2019, with the fund citing increased awareness of climate change as the reason behind the rise.

The changes are not just being seen in retail investment.

Recently, over half of Woodside Petroleum's investors backed motions for the company to commit to hard targets for the reduction of its greenhouse gas emissions. As more people realised that the power to choose is in their hands, the shift towards more ethical investments began.

At the same time, the impact of the pandemic has caused many aspects of globalisation to come to a screeching halt, accelerating the pace of transformation for industries across the world. In times like these, innovation flourishes.

Uber, Airbnb and WhatsApp were all founded during the 2009 global financial crisis, underlining that some of the biggest disruptive opportunities arise during major economic downturns.

Square Peg co-founder Paul Bassat concurs: "Every time there's been a major crisis, we've seen this burst of innovation occur where there's a combination of problems needing to be solved as a result, as well as people having a chance to think differently about their career and their lives."

In the midst of the global pandemic, the Australian venture capital sector actually grew. The KPMG Venture Pulse Q1 2020 report found that investment in Australian startups reached a record high of $US944.7 million ($1.314 billion) in H1 2020.

Clearly VC firms are grasping at the opportunities. But they are not the only ones able to reap the potential benefits.

Changes to Australian legislation in 2017 have created investment opportunities for retail investors that were previously only available to high-net-worth individuals or sophisticated investors.

If they meet the criteria, these investors are able to invest up to $10,000 in private companies launching fundraises of up to $5 million, cementing the fact that startup investment is no longer just for VCs and angel investors.

Investors have generally been motivated by two things: the opportunity to back the companies changing the world, and the outsized returns of startup investment.

As we move towards a post-COVID world, investments also need to be good for the planet.

As a new generation of investors increasingly focuses on the positive impact its funding decisions can have on the world, startups will need to prove their social and environmental credentials as well as their ability to disrupt and grow.

When they do that, investors will follow and we can all enjoy sushi again, without the microplastic.
