{"id":1122778,"date":"2024-03-06T15:57:23","date_gmt":"2024-03-06T20:57:23","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/a-global-survey-of-prokaryotic-genomes-reveals-the-eco-evolutionary-pressures-driving-horizontal-gene-transfer-nature-com\/"},"modified":"2024-03-06T15:57:23","modified_gmt":"2024-03-06T20:57:23","slug":"a-global-survey-of-prokaryotic-genomes-reveals-the-eco-evolutionary-pressures-driving-horizontal-gene-transfer-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/evolution\/a-global-survey-of-prokaryotic-genomes-reveals-the-eco-evolutionary-pressures-driving-horizontal-gene-transfer-nature-com\/","title":{"rendered":"A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer &#8211; Nature.com"},"content":{"rendered":"<p><p>Genome selection and pangenome generation    <\/p>\n<p>    We based our analysis on the proGenomes v.2.2 dataset    containing 82,400 genomes grouped into 11,562 species (that is,    specI clusters) that were defined based on 40 single-copy    marker genes20. The    corresponding species tree generated based on concatenated    marker gene sequences was kindly provided by the authors of the    proGenomes article20.  <\/p>\n<p>    From this initial selection, we filtered out    metagenome-assembled genomes, single-amplified genomes, genomes    flagged as chimeric by GUNC39, genomes that    were not taxonomically cohesive with the rest of the specI    cluster according to GTDB26, genomes with no    16S rRNA gene sequence detected and genomes we could not    confidently map to the MicrobeAtlas database (see Mapping    genomes to MicrobeAtlas database OTUs below). The species tree    was pruned to remove these genomes using the ETE Toolkit v.3    (ref. 40). As a result,    we obtained 78,315 genomes grouped into 8,790 species. For each    species, a pangenome was generated by clustering all gene    sequences on 95% nucleotide sequence identity as described in    ref. 41.  <\/p>\n<p>    All gene sequences were clustered using MMseqs2 (ref.    42) with minimum    overlap of 50%, minimum identity threshold of 80% and    clustering mode 0. The rest of the parameters were left as    default. For each gene cluster, whenever sequences originated    from more than one genome within a species, we only retained    sequences that were most similar to those from other species    within the gene cluster. We then proceeded with gene clusters    containing sequences from at least five different species.    Sequences were then aligned using the automatic strategy    selection option in MAFFT v.7.471 (ref. 43), with all other    parameters left as default. On the basis of the multiple    sequence alignment, a gene tree was generated using FastTree    v.2.1.11 (ref. 44) using the    generalised time-reversible model45 of nucleotide    evolution, with all parameters left as default.  <\/p>\n<p>    Before performing tree reconciliation, we subsampled the    species tree using ETE Toolkit v.3 (ref. 40) to decrease    computational requirements in the following manner: for each    gene cluster, the species tree node corresponding to the last    common ancestor of all species within the gene cluster was    selected. Clades within the species tree not containing any    genes from the gene cluster were collapsed for computational    efficiency. Subsequently, the subsampled species tree was used    to root the gene tree using the OptRoot module from RANGER-DTL    v.2.0 (ref. 23). We then ran    RANGER-DTL with default settings to perform gene and species    tree reconciliation for a total of 500. Gene clusters in which    more than 50 optimal roots were detected were not considered    further. Reconciliations from each optimal root were aggregated    using the AggregateRanger_recipient module from RANGER-DTL    v.2.0. We used a custom script to aggregate results across    optimal roots and detect tree nodes that were labelled as    transfers. For downstream analysis, we considered only transfer    events detected in 80% reconciliations that contained gene    pairs with 0.5 minimum branch support in the gene tree. In    addition, all multifurcations containing 100% identical genes    from different species were considered to be transfer events.  <\/p>\n<p>    For each genome, we counted a gene as having undergone transfer    as long as its pangenome-representative gene was involved in a    transfer event. For the denominator (that is, total number of    genes assessed), we only considered genes if their    pangenome-representative genes had passed all steps described    above in HGT event detection. The number of genes transferred    was then divided by the total number of genes assessed and the    average based on all genomes within a species was calculated.    For the examples mentioned in the main text, we used data from    specI_v3_Cluster259 for A. baumannii and data from    specI_v3_Cluster712 for L. monocytogenes.  <\/p>\n<p>    The NCBI Sequence Read Archive46 was searched for    samples and studies containing any of the keywords    metagenomic, microb*, bacteria or archaea in their    metadata. The corresponding raw sequence data (as of 7 March    2020) were downloaded and quality filtered. To assign OTU    labels, quality filtered data were mapped to MAPref v.2.2.1    using MAPseq v.1.0 at a 0.5 confidence    level47. We then    filtered out samples containing less than 1,000 reads and\/or    less than 20 OTUs defined at 97% 16S rRNA gene identity and    retained samples with at least 90% community coverage    (calculated based on the formula in ref. 48).  <\/p>\n<p>    NCBI Sequence Read Archive sample metadata were parsed to    classify every sample into four general environments: animal,    aquatic, plant and soil. Subsequently, we calculated    BrayCurtis distances between all samples in the dataset and    compared community compositions in samples from independent    studies. When a sample was consistently similar to samples    assigned to a different environment, we adjusted its    environment label. In cases where samples with similar    community compositions had no general agreement between    assigned environments, we removed the environmental label.  <\/p>\n<p>    We used barrnap49 with default    settings to predict 16S rRNA gene sequences in the genome    selection, proceeding with sequences of 50% of expected    length. The sequences were then mapped to MAPref v.2.2.1 using    MAPseq v.1.0 (ref. 47), retaining only    sequences that mapped to an OTU with a 0.3 confidence level.    Genomes containing multiple 16S rRNA gene copies were mapped to    OTUs based on a majority rule (50% copies) or high confidence    (at least one copy with a 0.98 confidence level). Species    containing multiple genomes were mapped to OTUs based on    majority (50% genomes).  <\/p>\n<p>    For each OTU within the dataset, the average abundance was    calculated separately for all samples assigned to the animal,    aquatic, plant and soil environments. The OTU was then assigned    to its preferred environment based on the highest of the four    numbers.  <\/p>\n<p>    Distances between gene and species pairs were extracted from    the corresponding trees using the dist function in ETE Toolkit    v.3 (ref. 40). To plot the    distribution in Fig. 2a, only gene pairs    with 0.5 minimum branch support values and 50% sequence    overlap within the multiple sequence alignment were considered.    Gene pairs with and without transfer events were normalized    with respect to species distance by splitting the species    distance distributions into 80 bins and subsampling the group    with the larger number of pairs in each bin (either transfer    detected or no transfer detected) to the number of pairs in    the second group in the corresponding bin (either no transfer    detected or transfer detected). The same procedure was    performed for normalizing gene pairs with and without transfer    events with respect to gene distance. After normalization, the    resulting distributions were compared using the two-sided    MannWhitney U-test.  <\/p>\n<p>    To calculate gene ubiquity, we counted the number of genomes    represented by a gene in each pangenome versus the total number    of genomes in the species. For subsequent analysis, only    species encompassing ten or more genomes were considered. We    used previously defined thresholds25 to distinguish    extended core genes (90% gene ubiquity) and cloud genes (15%    gene ubiquity). In the species pair participating in HGT, the    species with the higher gene ubiquity was labelled as the    putative donor, whereas the species with the lower gene    ubiquity was labelled as the putative recipient. To compare    extended core and cloud genes with or without transfer events,    a two-sided Fishers exact test was performed.  <\/p>\n<p>    We used the COG category and KEGG pathway functional    annotations provided by the proGenomes database after running    eggNOG-mapper for eggNOG 5.0 (ref. 35). Each gene    cluster was annotated to the corresponding functional    categories based on the union of all gene annotations within    the cluster. To analyse genes associated with the mobilome, we    looked up which terms corresponded to the XMobilome: phages,    transposons category in the database of    COGs50,51 (mobilome,    curated, in Extended Data Fig. 4). In addition, we    extracted terms that contained the following keywords in the    annotations provided by the proGenomes database: phage,    transposon, transposase, transposition, transposable,    mobile, mobilization, integrase, integration,    plasmid, conjugative, conjugation, transformation and    competence (mobilome, uncurated, in Extended Data Fig.    4). To analyse genes    associated with transcription regulation, we extracted terms    from the transcription category that contained the following    keywords in the annotations provided by the proGenomes    database: regulation and regulator (transcription    regulation, uncurated, in Extended Data Fig. 4). We calculated a    functional categorys background expectation fraction by    counting the total number of genes that passed the pipeline    that were annotated to this category divided by the total    number of genes that passed the pipeline.  <\/p>\n<p>    For each detected transfer event, we calculated the average    species and gene distance by taking all average pairwise    distances between left descendants and right descendants of the    transfer event (for gene distance calculations, only gene pairs    with 50% sequence overlap were considered). The resulting    distribution of species and gene distances can be seen in Fig.    2e. For functional    enrichment analysis, minimum and maximum species and gene    distance cut-offs were selected in such a way that there were    no bins without observations, with the resulting area divided    into thirds. We also looked specifically at transfer events at    the 0.01 and 0.05 gene distance cut-offs (approximately 99%    and 95% sequence identity, respectively) as these results    would be more comparable to previous studies that detected HGT    events based on nearly identical sequences. We then counted the    number of transfer events annotated to each functional category    divided by the total number of transfer events in the area. The    observed fraction of events annotated to a specific function    was then tested with a two-sided binomial test against the    fraction of all genes on which the pipeline was run that were    annotated to this function. Resulting P values were    corrected for multiple testing using the HolmSidak method.  <\/p>\n<p>    A similar procedure was performed using KEGG ortholog    annotations, grouping them into KEGG pathway maps (0910109145)    for Extended Data Fig. 5 and antimicrobial    resistance genes (BR:ko01504) for Extended Data Fig.    6.  <\/p>\n<p>    We compared our functional enrichment analysis results with    those from refs. 9,13,18,28,29,30,31,32,33,34. In most of    these studies, functional categories were based on the COG    database, with the exception of ref. 13 (with categories    based on the SEED52) and refs.    9,28 (both with    categories based on TIGRFAMs53). The mapping    between COG categories and KEGG pathways (used in our study),    SEED and TIGRFAMs can be found in Supplementary Table    1.  <\/p>\n<p>    For our study, we considered enrichment data from the most    recent transfers, that is, gene distance bins 0.000.01,    0.000.05 and 0.000.25. These three gene distance bins    together with three species distance bins provided us with nine    data points to consider for each functional category. We    assigned a functional category to have strong evidence for    enrichment or depletion in transfers if at least seven of the    nine data points showed significant enrichment or depletion. We    assigned a functional category to have weak evidence for    enrichment or depletion in transfers if most data points showed    enrichment or depletion but this was not always statistically    significant.  <\/p>\n<p>    For ref. 18, we considered    the results depicted in Fig. 8d and Supplementary Table 13 of    the article. We calculated the first and third quartiles of the    HGT index using all genes in Supplementary Table 13. We    assigned a functional category to have strong evidence for    enrichment in transfers if the median HGT index from genes in    this category was greater than the third quartile. We assigned    a functional category to have strong evidence for depletion in    transfers if the median HGT index from genes in this category    was less than the first quartile. Only functional categories    containing at least five genes were considered.  <\/p>\n<p>    For ref. 34, we considered    the results depicted in Fig. 9 of the article. We considered    only recent HGT events (99% nucleotide sequence identity). We    assigned a functional category to have strong evidence for    enrichment in transfers if the median recent HGTs in this    category was greater than the third quartile. We assigned a    functional category to have strong evidence for depletion in    transfers if the median recent HGTs in this category was less    than the first quartile.  <\/p>\n<p>    For ref. 32, we considered    the results depicted in Fig. 4a (HTgenes row)    of the article. We considered a functional category to have    strong evidence for enrichment or depletion in transfers if the    observed-to-expected ratio of orthologous groups was    significantly different from one.  <\/p>\n<p>    For ref. 31, we considered    the results depicted in Supplementary Fig. 7 of the article. We    considered a functional category to have strong evidence for    enrichment or depletion in transfers if the relative proportion    of transferred genes was significantly over- or    underrepresented when compared with the set of all bacterial    genes.  <\/p>\n<p>    For ref. 30, we considered    the results depicted in the first two columns of Table 3 of the    article. We considered a functional category to be enriched in    transfers if its relative transferability was higher than one,    and to be depleted in transfers if its relative transferability    was lower than one. We used a P value cut-off of 0.05 to    distinguish strong and weak evidence for enrichment or    depletion.  <\/p>\n<p>    For ref. 33, we considered    the results depicted in Table 2 of the article. In the table,    functional categories were listed that significantly differed    from the background of all gene families. We used a P    value cut-off of 0.05 to distinguish strong and weak evidence    for enrichment or depletion.  <\/p>\n<p>    For ref. 29, we considered    the results depicted in Fig. 4b of the article. We    used Z-score cut-offs of 2 and 2 to distinguish strong    and weak evidence for enrichment or depletion.  <\/p>\n<p>    For ref. 13, we considered    the results depicted in Supplementary File 6 (SEED level 1 and    SEED level 2) of the article. We used a P value cut-off    of 0.05 to distinguish strong and weak evidence for enrichment    or depletion. We downweighted depletion evidence for the    transcription (regulatory) and signal transduction    categories as they both mapped to regulation and cell    signalling in the SEED. For COG categories that mapped to    multiple categories in the SEED, we indicated evidence based on    the consensus from these categories.  <\/p>\n<p>    For ref. 28, we considered    the results depicted in Table 2 of the article. We downweighted    depletion evidence for cell cycle control and mitosis and    cell motility as they both mapped to the cellular processes    in TIGRFAMs. We also downweighted enrichment evidence for    carbohydrate transport and metabolism as there was no    one-to-one mapping for this category.  <\/p>\n<p>    For ref. 9, we considered    the results depicted in Fig. 2 of the article. We    considered a functional category to be enriched in transfers if    the proportion of transferred genes was greater than 10%, and    to be depleted in transfers if the proportion of transferred    genes was less than 3%.  <\/p>\n<p>    An OTU was detected as present in a given sample if its    relative abundance was at least 0.01%. To calculate the    co-occurrence between two OTUs, we counted the number of    samples in which both OTUs were present and divided it by the    number of samples in which the less prevalent OTU was present.    Phylogenetic distances between OTUs were retrieved from the    MicrobeAtlas database 16S rRNA tree using the dist function in    ETE Toolkit v.3 (ref. 40).  <\/p>\n<p>    For modelling the relationship between co-occurrence and    phylogenetic distance, we only considered OTUs that exchanged    at least 1 gene with 30 other OTUs and OTU pairs in which both    OTUs were present in at least 20 environmental samples. The    power law equation (1) is as follows:  <\/p>\n<p>      $${{rm{CO}} approx ktimes      {rm{PD}}^{a},}$$    <\/p>\n<p>      (1)    <\/p>\n<p>    where CO stands for co-occurrence, PD stands for phylogenetic    distance, and k and a are parameters fitted using    the nlstools package in R54. Model residuals    were then used to calculate Spearman correlations with the    number of genes transferred. To generate the background    distribution, the number of genes was shuffled before    calculating Spearman correlations. The resulting distributions    of Spearman correlations generated based on raw co-occurrence    (precorrection), model residuals (postcorrection) or background    were compared with each other using the two-sided MannWhitney    U-test.  <\/p>\n<p>    The analysis depicted in Fig. 4bd has been performed    using a similar set-up as described in Gene and species    distance normalization. We used the 7 genes transferred    cut-off to denote OTU pairs with many transfer events as this    corresponded to the 80% quantile of OTU pairs with at least 1    gene transferred.  <\/p>\n<p>    Global networks of predicted interactions were computed with    FlashWeave v.0.19.0 (ref. 38). This method    uses the local-to-global learning approach55 to learn the    skeleton of a Bayesian network encoding putative ecological    relationships between species adjusted for ecological or    technical confounders. To this end, FlashWeave uses an    interleaved testing scheme that (1) heuristically determines    likely confounding variables for each pair of species (based on    univariate associations and previous iterations of the    algorithm), and (2) subsequently tests whether the focal    association holds when conditioned on these candidate    confounders.  <\/p>\n<p>    The parameters used for running FlashWeave were as follows:    sensitive=false, heterogeneous=true, and max_k=3 (with    confounder correction) or max_k=0 (without confounder    correction). With these settings, FlashWeave converts non-zero    read counts to centred log-ratio-transformed values to account    for compositionality and discretizes these values. Mutual    information tests are then run on the discretized values. We    used co-occurrence data from all 95,422 OTUs contained within    the environmental sample dataset, filtering the resulting    network for edges between the 4,380 OTUs for which transfer    event data were generated. OTU pairs with a score higher than    zero were considered as interacting. To normalize for    differences in phylogenetic distance and co-occurrence    distributions between species with at least seven genes    transferred and species with zero or one gene transfer, the    procedure described in Gene and species distance    normalization was performed with simultaneous subsampling on    phylogenetic distance and co-occurrence for 8080 bins.  <\/p>\n<p>    We used the same relative abundance numbers as calculated in    Preferred habitat assignment. For each OTU, we only    considered its abundance within its preferred environment,    denoting high-abundance OTUs as those whose abundance was above    the 80% quantile in this environment. In contrast, we denoted    low-abundance OTUs as those whose abundance was below the 20%    quantile in this environment. OTU pairs were then sorted based    on phylogenetic distance and the fraction of OTU pairs with at    least one transfer event detected was calculated for each    phylogenetic distance bin. Error bands were calculated using    Bernoullis principle of uncertainty. Resulting fractions were    then pairwise compared between the highhigh, highlow and    lowlow groups using a one-sided Wilcoxon rank-sum test.    Resulting P values were corrected for multiple testing    using the BenjaminiHochberg method.  <\/p>\n<p>    We computed a generalism index for each OTU reflecting its    environmental flexibility. This index was calculated based on    the entropy of the OTUs abundance values across the four major    environments (animal, aquatic, soil and plant). OTUs with    similar abundances across environments had higher entropy. OTUs    with uneven abundances across environments (a higher abundance    in one or a few of the environments compared with the rest) had    lower entropy.  <\/p>\n<p>    To compare inter-environmental transfers, we selected 200 OTUs    assigned to each environment (see Preferred habitat    assignment) that displayed the highest entropy (generalists)    and 200 OTUs that displayed the lowest entropy (specialists).    OTU pairs were then subsampled in such a way that phylogenetic    distance distributions were equal between all environments and    between generalists, specialists and all species. We then    counted the fraction of OTU pairs with at least one transfer    event detected. To generate the background expectation, OTU    pairs from all species were subsampled to the target    phylogenetic distance distribution 1,000. We then fit a normal    distribution to the generated data using the fitdistr function    in R56 to get an    estimate of the expected mean, s.d. and range of transfer rates    between different environments.  <\/p>\n<p>    Data from Figs. 2 and 4b,c and Extended Data    Figs. 16 were visualized    using seaborn v.0.11.2 (ref. 57) and matplotlib    v.3.5.1 (ref. 58) in Python    v.3.7.4. Data from Figs. 3, 4a, and 5    and Extended Data Fig. 7 were visualized    using ggplot2 v.3.3.5 (ref. 59) in R v.4.1.1.  <\/p>\n<p>    Further information on research design is available in the    Nature Portfolio    Reporting Summary linked to this article.  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Go here to read the rest:<\/p>\n<p><a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/www.nature.com\/articles\/s41559-024-02357-0\" title=\"A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer - Nature.com\">A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Genome selection and pangenome generation We based our analysis on the proGenomes v.2.2 dataset containing 82,400 genomes grouped into 11,562 species (that is, specI clusters) that were defined based on 40 single-copy marker genes20. The corresponding species tree generated based on concatenated marker gene sequences was kindly provided by the authors of the proGenomes article20 <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/evolution\/a-global-survey-of-prokaryotic-genomes-reveals-the-eco-evolutionary-pressures-driving-horizontal-gene-transfer-nature-com\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187748],"tags":[],"class_list":["post-1122778","post","type-post","status-publish","format-standard","hentry","category-evolution"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122778"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1122778"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122778\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1122778"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1122778"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1122778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}