Predicting equilibrium distributions for molecular systems with deep learning - Nature.com

Deep neural networks have been demonstrated to predict accurate molecular structures from descriptors $\mathcal{D}$ for many molecular systems (refs. 1,5,6,9-12). Here, DiG aims to go one step further and predict not only the most probable structure but also diverse structures with their probabilities under the equilibrium distribution. To tackle this challenge, inspired by the heating-annealing paradigm, we break this problem down into a series of simpler ones. The heating-annealing paradigm can be viewed as a pair of reciprocal stochastic processes on the structure space that simulate the transformation between the system-specific equilibrium distribution and a system-independent simple distribution $p_\mathrm{simple}$. Following this idea, we use an explicit diffusion process (forward process; Fig. 1b, orange arrows) that gradually transforms the target distribution of the molecule, $q_{\mathcal{D},0}$, as the initial distribution, towards $p_\mathrm{simple}$ over a time period $\tau$. The corresponding reverse diffusion process then transforms $p_\mathrm{simple}$ back to the target distribution $q_{\mathcal{D},0}$.
This is the generation process of DiG (Fig. 1b, blue arrows). The reverse process is performed by incorporating updates predicted by deep neural networks from the given $\mathcal{D}$, which are trained to match the forward process. The descriptor $\mathcal{D}$ is processed into node representations $\mathcal{V}$, describing the features of each system-specific individual element, and a pair representation $\mathcal{P}$, describing inter-node features. The $(\mathcal{V},\mathcal{P})$ representation is the direct input from the descriptor part to the Graphormer model (ref. 10), together with the geometric structure input $\mathbf{R}$, to produce a physically finer structure (Supplementary Information sections B.1 and B.3). Specifically, we choose $p_\mathrm{simple} := \mathcal{N}(\mathbf{0},\mathbf{I})$, the standard Gaussian distribution in the state space, and take the forward diffusion to be the Langevin diffusion process targeting this $p_\mathrm{simple}$ (the Ornstein-Uhlenbeck process) (refs. 40-42). A time dilation scheme $\beta_t$ (ref. 43) is introduced for approximate convergence to $p_\mathrm{simple}$ after a finite time $\tau$. The result is written as the following stochastic differential equation (SDE):

$$\mathrm{d}\mathbf{R}_t = -\frac{\beta_t}{2}\mathbf{R}_t\,\mathrm{d}t + \sqrt{\beta_t}\,\mathrm{d}\mathbf{B}_t \tag{1}$$

where $\mathbf{B}_t$ is the standard Brownian motion (a.k.a. the Wiener process). Choosing this forward process leads to a $p_\mathrm{simple}$ that is more concentrated than a heated distribution, hence it is easier to draw high-density samples, and the form of the process enables efficient training and sampling.

Following stochastic process theory (see, for example, ref. 44), the reverse process is also a stochastic process, written as the following SDE:

$$\mathrm{d}\mathbf{R}_{\bar t} = \frac{\beta_{\bar t}}{2}\mathbf{R}_{\bar t}\,\mathrm{d}\bar t + \beta_{\bar t}\,\nabla\log q_{\mathcal{D},\bar t}(\mathbf{R}_{\bar t})\,\mathrm{d}\bar t + \sqrt{\beta_{\bar t}}\,\mathrm{d}\mathbf{B}_{\bar t} \tag{2}$$

where $\bar t := \tau - t$ is the reversed time, $q_{\mathcal{D},\bar t} := q_{\mathcal{D},t=\tau-\bar t}$ is the forward-process distribution at the corresponding time and $\mathbf{B}_{\bar t}$ is the Brownian motion in reversed time. Note that the forward and corresponding reverse processes, equations (1) and (2), are inspired by, but are not exactly, the heating and annealing processes. In particular, there is no concept of temperature in the two processes. The temperature $T$ mentioned in the PIDP loss below is the temperature of the real target system and is not related to the diffusion processes.

From equation (2), the only obstacle that impedes the simulation of the reverse process for recovering $q_{\mathcal{D},0}$ from $p_\mathrm{simple}$ is the unknown $\nabla\log q_{\mathcal{D},\bar t}(\mathbf{R}_{\bar t})$. Deep neural networks are then used to construct a score model $\mathbf{s}^{\theta}_{\mathcal{D},t}(\mathbf{R})$, which is trained to predict the true score function $\nabla\log q_{\mathcal{D},t}(\mathbf{R})$ of each instantaneous distribution $q_{\mathcal{D},t}$ from the forward process. This formulation is called a diffusion-based generative model and has been demonstrated to generate high-quality samples of images and other content (refs. 27,28,45-47).
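To make the forward process in equation (1) concrete, here is a minimal numerical sketch: an Euler-Maruyama-style discretization of the Ornstein-Uhlenbeck process applied to a toy one-dimensional "structure" distribution. The linear noise schedule `betas` is an illustrative assumption, not the paper's time-dilated schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear noise schedule beta_1..beta_N (the paper's tau-dependent
# time-dilation scheme is not reproduced here).
N = 1000
betas = np.linspace(1e-4, 0.02, N)

# Start from a sharply peaked toy "molecular" distribution: R_0 ~ N(5, 0.01).
R = 5.0 + 0.1 * rng.standard_normal(10_000)

# Discretized forward OU process (Eq. 1): each step shrinks toward 0 and adds noise.
for beta in betas:
    R = np.sqrt(1.0 - beta) * R + np.sqrt(beta) * rng.standard_normal(R.shape)

# After N steps the samples are approximately standard Gaussian (p_simple).
print(round(float(R.mean()), 2), round(float(R.std()), 2))
```

Whatever the initial distribution, the contraction term drives the mean toward zero while the injected noise drives the variance toward one, which is exactly why a single system-independent $p_\mathrm{simple}$ can serve all systems.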
As our score model is defined in molecular conformational space, we use our previously developed Graphormer model (ref. 10) as the neural-network architecture backbone of DiG, to leverage its capabilities in modelling molecular structures and to generalize to a range of molecular systems. Note that the score model aims to approximate a gradient, which is a set of vectors. As these are equivariant with respect to the input coordinates, we designed an equivariant vector output head for the Graphormer model (Supplementary Information section B.4).

With the $\mathbf{s}^{\theta}_{\mathcal{D},t}(\mathbf{R})$ model, drawing a sample $\mathbf{R}_0$ from the equilibrium distribution of a system $\mathcal{D}$ can be done by simulating the reverse process in equation (2) on $N+1$ steps that uniformly discretize $[0,\tau]$ with step size $h=\tau/N$ (Fig. 1b, blue arrows), thus

$$\mathbf{R}_N \sim p_\mathrm{simple},\qquad \mathbf{R}_{i-1} = \frac{1}{\sqrt{1-\beta_i}}\left(\mathbf{R}_i + \beta_i\,\mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}_i)\right) + \mathcal{N}(\mathbf{0},\beta_i\mathbf{I}),\quad i=N,\cdots,1,$$

where the discrete step index $i$ corresponds to time $t=ih$, and $\beta_i := h\,\beta_{t=ih}$. Supplementary Information section A.1 provides the derivation. Note that the reverse process does not need to be ergodic. DiG models the equilibrium distribution as the instantaneous distribution at the instant $t=0$ (or $\bar t=\tau$) of the reverse process, not as a time average. As $\mathbf{R}_N$ samples can be drawn independently, DiG can generate statistically independent $\mathbf{R}_0$ samples from the equilibrium distribution.
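The sampling recursion above can be exercised end-to-end on a toy system where the true score is known in closed form; a one-dimensional Gaussian target stands in for the trained Graphormer score model, and the linear schedule is again an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 1000
betas = np.linspace(1e-4, 0.02, N)          # hypothetical schedule
alphas_sq = np.cumprod(1.0 - betas)          # alpha_i^2

# Toy system whose equilibrium distribution is N(0, s2). For a Gaussian target,
# every intermediate q_i is also Gaussian, so the exact score is available and
# stands in for the trained score model s_theta.
s2 = 4.0
var_i = alphas_sq * s2 + (1.0 - alphas_sq)   # variance of q_i

def score(R, i):
    return -R / var_i[i]

# Ancestral sampling: R_N ~ p_simple, then the reverse update at each step.
R = rng.standard_normal(10_000)
for i in range(N - 1, -1, -1):
    b = betas[i]
    R = (R + b * score(R, i)) / np.sqrt(1.0 - b)
    R = R + np.sqrt(b) * rng.standard_normal(R.shape)

print(round(float(R.std()), 2))  # close to 2.0 = sqrt(s2)
```

The recovered sample standard deviation approaches that of the target distribution, illustrating that the recursion inverts the forward diffusion when the score is accurate.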
In contrast to MD or MCMC simulations, the generation of DiG samples does not suffer from rare events that link different states and can thus be far more computationally efficient.

DiG can be trained using conformation data sampled over a range of molecular systems. However, collecting sufficient experimental or simulation data to characterize the equilibrium distribution for various systems is extremely costly. To address this data-scarcity issue, we propose a pre-training algorithm, called PIDP, which effectively optimizes DiG on an initial set of candidate structures that need not be sampled from the equilibrium distribution. The supervision comes from the energy function $E_{\mathcal{D}}$ of each system $\mathcal{D}$, which defines the equilibrium distribution $q_{\mathcal{D},0}(\mathbf{R}) \propto \exp\!\left(-\frac{E_{\mathcal{D}}(\mathbf{R})}{k_\mathrm{B}T}\right)$ at the target temperature $T$.

The key idea is that the true score function $\nabla\log q_{\mathcal{D},t}$ from the forward process in equation (1) obeys a partial differential equation, known as the Fokker-Planck equation (see, for example, ref. 48).
We then pre-train the score model $\mathbf{s}^{\theta}_{\mathcal{D},t}$ by minimizing the following loss function, which enforces that equation to hold:

$$\begin{aligned}&\sum_{i=1}^{N}\frac{1}{M}\sum_{m=1}^{M}\bigg\| \frac{\beta_i}{2}\Big( \nabla\big(\mathbf{R}^{(m)}_{\mathcal{D},i}\cdot \mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}^{(m)}_{\mathcal{D},i})\big) + \nabla\big\|\mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}^{(m)}_{\mathcal{D},i})\big\|^2 + \nabla\big(\nabla\cdot \mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}^{(m)}_{\mathcal{D},i})\big)\Big) - \frac{\partial}{\partial t}\mathbf{s}^{\theta}_{\mathcal{D},i}\big(\mathbf{R}^{(m)}_{\mathcal{D},i}\big)\bigg\|^2\\&\quad+\frac{\lambda_1}{M}\sum_{m=1}^{M}\bigg\|\frac{1}{k_\mathrm{B}T}\nabla E_{\mathcal{D}}\big(\mathbf{R}^{(m)}_{\mathcal{D},1}\big)+\mathbf{s}^{\theta}_{\mathcal{D},1}\big(\mathbf{R}^{(m)}_{\mathcal{D},1}\big)\bigg\|^2\end{aligned}$$

Here, the second term, weighted by $\lambda_1$, matches the score model at the final generation step to the score from the energy function, and the first term implicitly propagates the energy-function supervision to intermediate time steps (Fig. 1b, upper row). The structures $\{\mathbf{R}^{(m)}_{\mathcal{D},i}\}_{m=1}^{M}$ are points on a grid spanning the structure space. Since these structures are only used to evaluate the loss function on discretized points, they do not have to obey the equilibrium distribution (as is required of structures in the training dataset), so the cost of preparing them can be much lower.
As the structure spaces of molecular systems are often very high dimensional (for example, thousands of dimensions for proteins), a regular grid would have intractably many points. Fortunately, the space of actual interest is only a low-dimensional manifold of physically reasonable (low-energy) structures relevant to the problem. This allows us to train the model only on these relevant structures as $\mathbf{R}_0$ samples; $\mathbf{R}_i$ samples are then produced by passing $\mathbf{R}_0$ samples through the forward process. See Supplementary Information section C.1 for an example of acquiring relevant structures for protein systems.

We also leverage stochastic estimators, including Hutchinson's estimator (refs. 49,50), to reduce the complexity of calculating high-order derivatives of high-dimensional vector-valued functions. Note that, for each step $i$, the corresponding model $\mathbf{s}^{\theta}_{\mathcal{D},i}$ receives a training loss independent of the other steps and can be directly back-propagated. In this way, the supervision on each step improves optimization efficiency.

In addition to using the energy function for information on the probability distribution of the molecular system, DiG can also be trained with molecular structure samples obtained from experiments, MD or other simulation methods. See Supplementary Information section C for data-collection details. Even when the simulation data are limited, they still provide information about the regions of interest and about the local shape of the distribution in these regions; hence, they are helpful for improving a pre-trained DiG. To train DiG on data, the score model $\mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}_i)$ is matched to the corresponding score function $\nabla\log q_{\mathcal{D},i}$ demonstrated by data samples.
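Hutchinson's estimator replaces the trace of a Jacobian (for example, the divergence $\nabla\cdot\mathbf{s}$ appearing in the PIDP loss) by an expectation over random probe vectors, so only Jacobian-vector products are needed. A minimal sketch on a toy vector field with a known divergence (the field and the finite-difference JVP are illustrative stand-ins; in DiG the field would be the score model and the JVP would come from automatic differentiation):

```python
import numpy as np

def f(x):
    # Toy vector field; div f = d/dx0 (x0^2 + x1) + d/dx1 (3*x1) = 2*x0 + 3.
    return np.array([x[0] ** 2 + x[1], 3.0 * x[1]])

def jvp(x, v, h=1e-5):
    # Central finite-difference Jacobian-vector product (autodiff in practice).
    return (f(x + h * v) - f(x - h * v)) / (2.0 * h)

rng = np.random.default_rng(2)
x = np.array([1.5, -0.5])

# Hutchinson: div f(x) = E_eps[eps^T J_f(x) eps] for probes with identity covariance.
K = 5000
est = 0.0
for _ in range(K):
    eps = rng.choice([-1.0, 1.0], size=2)   # Rademacher probe vector
    est += eps @ jvp(x, eps)
est /= K

print(round(est, 2))  # exact divergence: 2*1.5 + 3 = 6
```

The cost is a handful of JVPs instead of a full Jacobian, which is what makes divergence terms tractable in thousands of dimensions.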
This can be done by minimizing $\mathbb{E}_{q_{\mathcal{D},i}(\mathbf{R}_i)}\big\|\mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}_i)-\nabla\log q_{\mathcal{D},i}(\mathbf{R}_i)\big\|^2$ for each diffusion time step $i$. Although a precise calculation of $\nabla\log q_{\mathcal{D},i}$ is impractical, the loss function can be equivalently reformulated into a denoising score-matching form (refs. 51,52):

$$\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{q_{\mathcal{D},0}(\mathbf{R}_0)}\,\mathbb{E}_{p(\boldsymbol{\epsilon}_i)}\big\|\sigma_i\,\mathbf{s}^{\theta}_{\mathcal{D},i}(\alpha_i\mathbf{R}_0+\sigma_i\boldsymbol{\epsilon}_i)+\boldsymbol{\epsilon}_i\big\|^2$$

where $\alpha_i := \prod_{j=1}^{i}\sqrt{1-\beta_j}$, $\sigma_i := \sqrt{1-\alpha_i^2}$ and $p(\boldsymbol{\epsilon}_i)$ is the standard Gaussian distribution. The expectation under $q_{\mathcal{D},0}$ can be estimated using the simulation dataset.

We remark that this score-predicting formulation is equivalent (Supplementary Information section A.1.2) to the noise-predicting formulation (ref. 28) in the diffusion-model literature. Note that this function allows direct loss estimation and back-propagation for each $i$ at constant (with respect to $i$) cost, recovering the efficient step-specific supervision again (Fig. 1b, bottom).

The computation of many thermodynamic properties of a molecular system (for example, free energy or entropy) also requires the density function of the equilibrium distribution, which is another aspect of the distribution beyond a sampling method.
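The denoising score-matching objective can be sketched on Gaussian toy data, where the optimal score is analytic: the sketch below (with an assumed schedule and a 1-D data distribution) checks that the true score attains a lower loss than a deliberately mis-scaled one, which is the property the training relies on.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 50
betas = np.linspace(1e-4, 0.05, N)           # hypothetical schedule
alphas = np.sqrt(np.cumprod(1.0 - betas))    # alpha_i
sigmas = np.sqrt(1.0 - alphas ** 2)          # sigma_i

# Toy data distribution q_{D,0} = N(0, s2); then q_i = N(0, v_i) with the
# analytic score -R/v_i, standing in for a trained score model.
s2 = 4.0
v = alphas ** 2 * s2 + sigmas ** 2

def dsm_loss(score_fn, n=20_000):
    # Monte Carlo estimate of the denoising score-matching objective.
    R0 = 2.0 * rng.standard_normal(n)        # samples from q_{D,0}
    loss = 0.0
    for i in range(N):
        eps = rng.standard_normal(n)
        Ri = alphas[i] * R0 + sigmas[i] * eps
        loss += np.mean((sigmas[i] * score_fn(Ri, i) + eps) ** 2)
    return loss / N

true_loss = dsm_loss(lambda R, i: -R / v[i])
off_loss = dsm_loss(lambda R, i: -R / (2.0 * v[i]))   # deliberately wrong scale
print(true_loss < off_loss)
```

No access to $\nabla\log q_{\mathcal{D},i}$ itself is needed: only clean samples, noise draws and the model's output enter the loss, which is what makes the reformulation practical.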
DiG allows for this by tracking the distribution change along the diffusion process (ref. 45):

$$\log p^{\theta}_{\mathcal{D},0}(\mathbf{R}_0) = \log p_\mathrm{simple}\big(\mathbf{R}^{\theta}_{\mathcal{D},\tau}(\mathbf{R}_0)\big) - \int_0^{\tau}\frac{\beta_t}{2}\,\nabla\cdot\mathbf{s}^{\theta}_{\mathcal{D},t}\big(\mathbf{R}^{\theta}_{\mathcal{D},t}(\mathbf{R}_0)\big)\,\mathrm{d}t - \frac{D}{2}\int_0^{\tau}\beta_t\,\mathrm{d}t$$

where $D$ is the dimension of the state space and $\mathbf{R}^{\theta}_{\mathcal{D},t}(\mathbf{R}_0)$ is the solution to the ordinary differential equation (ODE)

$$\mathrm{d}\mathbf{R}_t = -\frac{\beta_t}{2}\big(\mathbf{R}_t + \mathbf{s}^{\theta}_{\mathcal{D},t}(\mathbf{R}_t)\big)\,\mathrm{d}t \tag{3}$$

with initial condition $\mathbf{R}_0$, which can be solved using standard ODE solvers or more efficient specialized solvers (Supplementary Information section A.6).

There is a growing demand for the design of materials and molecules that possess desired properties, such as intrinsic electronic band gaps, elastic moduli and ionic conductivities, without going through a forward searching process. DiG provides a feature that enables such property-guided structure generation, by directly predicting the conditional structural distribution given a value $c$ of a microscopic property.

To achieve this goal, regarding the data-generating process in equation (2), we only need to adapt the score function from $\nabla\log q_{\mathcal{D},t}(\mathbf{R})$ to $\nabla_{\mathbf{R}}\log q_{\mathcal{D},t}(\mathbf{R}\,|\,c)$.
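The log-density formula and ODE (3) can be checked on a 1-D toy system with an analytic score, where the exact equilibrium density is known. The constant schedule and Euler integration below are illustrative simplifications; they are not the paper's solver.

```python
import numpy as np

# Probability-flow ODE likelihood, checked on a 1-D toy system with
# equilibrium distribution N(0, s2), whose intermediate scores are analytic.
s2 = 4.0
beta = 1.0            # constant (non-dilated) schedule, an illustrative choice
tau = 8.0
n_steps = 4000
h = tau / n_steps

def v(t):             # variance of q_t for the Gaussian toy case
    a2 = np.exp(-beta * t)
    return a2 * s2 + (1.0 - a2)

R0 = 1.0
R, t = R0, 0.0
int_div = 0.0         # accumulates (beta_t/2) * div s along the trajectory
for _ in range(n_steps):
    score = -R / v(t)
    div = -1.0 / v(t)                      # d(score)/dR in one dimension
    int_div += 0.5 * beta * div * h
    R += -0.5 * beta * (R + score) * h     # Euler step of the ODE
    t += h

log_p_simple = -0.5 * np.log(2 * np.pi) - 0.5 * R ** 2
log_p = log_p_simple - int_div - 0.5 * beta * tau   # D = 1
exact = -0.5 * np.log(2 * np.pi * s2) - R0 ** 2 / (2 * s2)
print(round(log_p, 3), round(exact, 3))
```

The two printed values agree closely, confirming that integrating the divergence of the score along the deterministic trajectory recovers the density of the equilibrium distribution.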
Using Bayes' rule, the latter can be reformulated as $\nabla_{\mathbf{R}}\log q_{\mathcal{D},t}(\mathbf{R}\,|\,c) = \nabla\log q_{\mathcal{D},t}(\mathbf{R}) + \nabla_{\mathbf{R}}\log q_{\mathcal{D}}(c\,|\,\mathbf{R})$, where the first term can be approximated by the learned (unconditioned) score model; that is, the new score model is

$$\mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}_i\,|\,c) = \mathbf{s}^{\theta}_{\mathcal{D},i}(\mathbf{R}_i) + \nabla_{\mathbf{R}_i}\log q_{\mathcal{D}}(c\,|\,\mathbf{R}_i)$$

Hence, only a $q_{\mathcal{D}}(c\,|\,\mathbf{R})$ model is additionally needed (refs. 45,46), which is a property predictor or classifier that is much easier to train than a generative model.

In a normal workflow for ML inverse design, a dataset must first be generated to match the conditional distribution, and an ML model is then trained on this dataset to predict structure distributions. The ability to generate structures for a conditional distribution without requiring a conditional dataset places DiG in an advantageous position compared with normal workflows, in terms of both efficiency and computational cost.

Given two states, DiG can approximate a reaction path that corresponds to reaction coordinates or collective variables, and find intermediate states along the path. This exploits the fact that the distribution-transformation process described in equation (1) is equivalent to the process in equation (3) if $\mathbf{s}^{\theta}_{\mathcal{D},i}$ is well learned; the latter process is deterministic and invertible, hence establishing a correspondence between the structure space and the latent space.
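The Bayes-rule decomposition of the conditional score can be verified on a fully analytic Gaussian toy model; the prior over structures and the Gaussian property predictor below are hypothetical choices made so that every term has a closed form.

```python
import numpy as np

# Toy model: prior q(R) = N(0, s2) over structures and a property predictor
# q(c|R) = N(c; R, sc2). Both are assumptions for illustration only.
s2, sc2 = 4.0, 0.5
c = 1.2

def score_prior(R):        # grad_R log q(R)
    return -R / s2

def score_classifier(R):   # grad_R log q(c|R)
    return (c - R) / sc2

# Exact Gaussian posterior: q(R|c) = N(mu, w) with precision 1/s2 + 1/sc2.
w = 1.0 / (1.0 / s2 + 1.0 / sc2)
mu = w * c / sc2

for R in np.linspace(-2.0, 2.0, 9):
    guided = score_prior(R) + score_classifier(R)
    exact = -(R - mu) / w
    assert abs(guided - exact) < 1e-12
print("guided score matches exact conditional score")
```

The unconditional score plus the predictor gradient reproduces the conditional score exactly, which is why only the (cheap) property predictor needs to be trained in addition.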
We can then uniquely map the two given states in the structure space to the latent space, approximate the path in the latent space by linear interpolation and then map the path back to the structure space. Since the distribution in the latent space is Gaussian, which has a convex contour, the linearly interpolated path goes through high-probability or low-energy regions, so it gives an intuitive guess of the real reaction path.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Original article: https://www.nature.com/articles/s42256-024-00837-3
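The latent-interpolation idea above can be sketched numerically on a 1-D double-well surrogate, a Gaussian mixture whose diffused score is analytic and stands in for the trained score model; the constant schedule and Euler integration are again illustrative assumptions.

```python
import numpy as np

# Reaction-path sketch via the deterministic ODE: encode two metastable states
# into the Gaussian latent space, interpolate linearly, decode back.
# Toy surrogate: q_0 = 0.5*N(-2, 0.25) + 0.5*N(+2, 0.25).
modes, s0_sq, beta, tau, n = np.array([-2.0, 2.0]), 0.25, 1.0, 8.0, 4000
h = tau / n

def score(R, t):
    a = np.exp(-0.5 * beta * t)                  # alpha_t
    v = a * a * s0_sq + (1.0 - a * a)            # per-mode variance of q_t
    d = R - a * modes
    w = np.exp(-0.5 * d * d / v)                 # unnormalized responsibilities
    return float((w * (-d / v)).sum() / w.sum())

def flow(R, forward):
    # Euler integration of the deterministic ODE, forward (encode) or
    # backward (decode) in time.
    ts = np.arange(n) * h if forward else tau - np.arange(n) * h
    sign = 1.0 if forward else -1.0
    for t in ts:
        R += sign * (-0.5 * beta * (R + score(R, t)) * h)
    return R

z_a, z_b = flow(-2.0, True), flow(2.0, True)     # encode the two states
mid = flow(0.5 * (z_a + z_b), False)             # decode the latent midpoint
print(round(flow(z_b, False), 2), round(mid, 2))
```

The round trip recovers the starting state, and the decoded midpoint lies between the two wells, illustrating how the convex latent geometry yields plausible intermediate states.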