{"id":1027299,"date":"2023-08-04T10:44:39","date_gmt":"2023-08-04T14:44:39","guid":{"rendered":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/uncategorized\/platform-reduces-barriers-biologists-face-in-accessing-machine-bio-it-world.php"},"modified":"2023-08-04T10:44:39","modified_gmt":"2023-08-04T14:44:39","slug":"platform-reduces-barriers-biologists-face-in-accessing-machine-bio-it-world","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/platform-reduces-barriers-biologists-face-in-accessing-machine-bio-it-world.php","title":{"rendered":"Platform Reduces Barriers Biologists Face In Accessing Machine &#8230; &#8211; Bio-IT World"},"content":{"rendered":"<p>      August 1, 2023 | A group of scientists at the Wyss      Institute for Biologically Inspired Engineering at Harvard      University and MIT are convinced that automated machine      learning (autoML) is going to revolutionize biology by      removing many of the technical barriers to using      computational models to answer fundamental questions about      sequences of nucleic acids, peptides, and glycans. Machine      learning can be complicated, but it doesn't have to be, and      sometimes simpler is better, according to graduate student      Jackie Valeri, a big believer in the power of autoML to solve      real-world problems.    <\/p>\n<p>      AutoML is a machine learning concept that helps users transfer      data to training algorithms and automatically search for the      best ML architecture for a given problem, lowering the demand      for expert-level computational knowledge that currently      outpaces the supply. It can also be pretty competitive with      even the best manually designed ML models, which can take      months if not years to develop, says Valeri, as she and her      colleagues recently demonstrated in a paper published in Cell      Systems (DOI:      10.1016\/j.cels.2023.05.007).    
<\/p>\n<p>      The article showcased the potential of their novel      BioAutoMATED platform, which, unlike other autoML tools,      accommodates more than one type of ML model and is designed      to accept biological sequences. Its intended users are      systems and synthetic biologists with little or no ML      experience, says Valeri, who works in the lab of Jim Collins,      Ph.D., at the Wyss Institute.    <\/p>\n<p>      The all-in-one BioAutoMATED platform modifies three existing      autoML tools (AutoKeras, which searches for optimal neural      networks; DeepSwarm, which looks for convolutional neural      networks; and TPOT, which hunts for a variety of other,      simpler modeling techniques such as linear regression and      random forest classifiers) to come up with the most      appropriate model for a user's dataset, she explains.      Standardized output results are presented as a set of      folders, each associated with one of those search techniques,      revealing the best-performing model in graphic and text file      format.    <\/p>\n<p>      The tool is very meta, says Valeri, in that it is learning      on the learning. Model selection is often the part of      research projects that requires a lot of computational      expertise that biologists generally do not possess, and the task      can't be easily passed to an ML specialist, even if one can      be found, because domain knowledge is needed in the      model-building process.    <\/p>\n<p>      Overall, biological researchers are excited about using      machine learning but until now have been stymied by the      amount of coding needed to get started, she says, noting      that it is not uncommon for ML models to have a codebase of      over 750 lines. The installation of packages alone can be a      huge barrier.    
<\/p>\n<p>      Interest in ML has skyrocketed over the past year thanks      largely to the introduction of ChatGPT with its user-friendly      interface, but people have also quickly discovered they can't      trust everything the large language model has to offer, says      Valeri. Similarly, BioAutoMATED is useful but not a magic      bullet that erases data problems and, like ML in general,      should be approached with a healthy amount of skepticism to      ensure it is learning what's intended.    <\/p>\n<p>      BioAutoMATED will in the future likely be used together with      ChatGPT, predicts Wyss postdoctoral fellow Luis Soenksen,      Ph.D., co-lead author on the Cell Systems paper. Researchers      will simply articulate what they want to do and be presented      with the best questions, required data, and ML models to get      the job done.    <\/p>\n<p>      When put to the test, BioAutoMATED outperformed not only      other autoML tools but also some of the models created by a      professional ML expert, and did it in under 30 minutes using      only 10 lines of input code from the user. The required      coding is for the basics, says Valeri, to specify the target      folder for results, the file name where input data can be      found, the column name where sequences can be found within      that file, and run times for these extensions.    <\/p>\n<p>      Users are instructed to first install Docker on their      computer, if they have not done so already, and are walked      through the process of doing that, she adds. The open      software platform sets up its own environment for running      applications, requiring only two lines of code to access the      Jupyter notebooks preloaded on BioAutoMATED that contain      everything needed to run the autoML tool. It's a quick      start for most people accustomed to using a computer.    <\/p>\n<p>      With a bit more coding, users can access some of the embedded      extras, says Valeri. 
These include the outputs from scrambled      control tests where BioAutoMATED generates sequences by      shuffling the order of nucleotides, answering the frequently      asked question of whether models are picking up on real      order- and sequence-specific biology.    <\/p>\n<p>      Half of the battle in biological research is knowing how to      ask the right questions, says Soenksen. The platform helps      users do that and also provides insights leading to new      questions, hypotheses, models, and experiments.    <\/p>\n<p>      Users can also opt for data saturation tests where      BioAutoMATED sequentially reduces the dataset size to see the      effect on model performance, Valeri says. If you can say the      models do great with 20,000 sequences, maybe you don't have      to go to the effort of collecting 50,000 or 100,000      sequences, which is a real impactful finding for a biologist      actually doing the experiments.    <\/p>\n<p>      Two of the most exciting outputs from the tool, in Valeri's      mind, are the interpretation and design results.      Interpretation results indicate what a model is learning      (e.g., nucleotides of elevated importance), including      sequence logos where the larger the size of the letter in      the sequence, the more important it is to whatever function of      interest is being examined. Sequence logos of the raw data      can also be generated to facilitate comparisons across ML      tools.    <\/p>\n<p>      Biologists using BioAutoMATED in this way can expect some      actionable outputs, says Valeri. They might want to pay more      attention to a motif that pops up through all these sequence      logos, for example, or do a deep mutational scanning of a      targeted region of the sequence that appears to be most      important.    <\/p>\n<p>      The other key output is a list of de novo design sequences      that are optimized for whatever function the model has been      trained on, she says. 
For the newly published study, this      focused on the downstream efficiency of a ribosome binding      site to translate RNA into protein in E. coli bacteria.    <\/p>\n<p>      BioAutoMATED was also used to identify areas of the sequence      most important in determining translation efficiency, and to      design new sequences that could be tested experimentally.      Further, the platform generated highly accurate information      about the amino acids in a peptide sequence most critical in      determining an antibody's ability to bind to the drug      ranibizumab (Lucentis), and classified different types      of glycans into immunogenic and non-immunogenic groups based      on their sequences.    <\/p>\n<p>      Finally, the team had the platform optimize the sequences of      RNA-based toehold switches. This informed the design of new      toehold switches for experimental testing with minimal input      coding required.    <\/p>\n<p>      The time it takes to obtain results from BioAutoMATED depends      on several factors, including the question being asked and      the size of the dataset for model training, says Valeri.      We've found the length of the sequence is a really big      factor... and the compute resources you have      available.    <\/p>\n<p>      The maximum user-allowed time for obtaining results is      another important consideration, adds Soenksen. The platform      can search for hours or days, as circumstances dictate. Time      constraints are routinely employed when training ML models as      a matter of practicality.    <\/p>\n<p>      Soenksen and Valeri both use BioAutoMATED as a benchmark for      their own custom-built models, and friends who have tested      the platform on different machines are enthusiastic about its      potential, they say. In the manuscript, the platform also      performed well on many different datasets, including ones      specific to sequence lengths and types.    
<\/p>\n<p>      I have personally used it for some quick paper explorations,      trying to see what data are available... [without] having to      take the time to code up my own machine learning models,      says Valeri. Although it is too soon to know how the tool      will be used by biologists elsewhere, it is already being      used regularly by a handful of scientists at Harvard      investigating short DNA, RNA, peptide, and glycan      sequences.    <\/p>\n<p>      BioAutoMATED is available to download from GitHub. If we get a lot of traction [with it], and I      think we will, our team will probably put more resources into      the user interface, notes Soenksen, a serial entrepreneur in      the science and technology space. The long-term goal is to      make the tool usable by clicking buttons to further lower      barriers to access.    <\/p>\n<p>      If you're a machine learning expert, you'll probably be able      to beat the output of BioAutoMATED, adds Valeri. We are      just trying to make it easy for people with limited machine      learning expertise to [quickly] get to a pretty good      model.    <\/p>\n<p>      Complicated neural networks and big language models, which      have a lot of parameters and require large amounts of data,      are not always best, she says. The simple-model techniques      identified by TPOT can be quite well suited to the      often-limited datasets biologists have available and can      perform as well as, if not better than, systems with more      advanced ML architecture.    <\/p>\n<p>Continue reading here: <\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/www.bio-itworld.com\/news\/2023\/08\/01\/platform-reduces-barriers-biologists-face-in-accessing-machine-learning\" title=\"Platform Reduces Barriers Biologists Face In Accessing Machine ... - Bio-IT World\">Platform Reduces Barriers Biologists Face In Accessing Machine ... 
- Bio-IT World<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> August 1, 2023 | A group of scientists at the Wyss Institute for Biologically Inspired Engineering at Harvard University and MIT are convinced that automated machine learning (autoML) is going to revolutionize biology by removing many of the technical barriers to using computational models to answer fundamental questions about sequences of nucleic acids, peptides, and glycans. Machine learning can be complicated, but it doesn't have to be, and sometimes simpler is better, according to graduate student Jackie Valeri, a big believer in the power of autoML to solve real-world problems.  <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/platform-reduces-barriers-biologists-face-in-accessing-machine-bio-it-world.php\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[1231415],"tags":[],"class_list":["post-1027299","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027299"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=1027299"}],"version-history":[{"count":0,"href":"https:\/\/
www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1027299\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=1027299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=1027299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=1027299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}