Whole genomes from bacteria collected at diagnostic units around … – Nature.com

Posted: September 21, 2023 at 10:16 am

Preparation of partners to collect samples

Partners registered for participation by contributing either isolates or DNA samples to the study. Material was sent to partners according to their registered participation format. This included material for sample collection, metadata registration, DNA extraction and sample shipment to Denmark. Specific protocols were provided according to the registered participation format, and a video for partners sampling isolates was made available via the TWIW web application and YouTube.

Partners were in charge of navigating national guidelines and regulations regarding ethical approval (such as institutional review boards, ethical review boards or other) of their participation in the study. The Danish National Scientific Ethics Committee was consulted with regard to the Technical University of Denmark leading the study. Based on its assessment of the study protocol, the committee concluded that the samples were not human and that the study therefore did not require ethical approval. No patient material was transferred with the samples, and no patient identifiers were shared with the project. Only minimal metadata pertaining to the infection and the bacterial isolates or their DNA were collected.

Partners collected samples during 2020, according to their availability to do so. Due to the obstacles presented by the COVID-19 pandemic, the ability to participate and carry out sampling was prioritised over sampling during a specific period (the original study design targeted sampling during March 2020).

Approximately 60 samples were collected at each diagnostic unit over a week. Table S1 lists the participating units with their study ID, country and city of origin, the month of collection, the number of samples sent, whether the samples received were isolates or DNA, and whether the unit made alterations to the sampling protocol. The 60 samples were to be randomly selected over the course of the week. Sampling across all weekdays was intended to avoid bias arising from the internal logistics of the diagnostic unit. Random sampling was intended to avoid targeting specific species or sample source types (e.g. urine samples, blood samples). Partners did prospective random sampling by estimating how many samples to collect each day in order to reach approximately 60 samples over the week. Due to a lack of diagnostic activity related to bacterial infections, a number of units prolonged the sampling period and simply included all samples in the study until 60 samples were acquired or sampling was halted for other reasons.
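The prospective sampling described above can be sketched in Python. This is only an illustration under stated assumptions, not the protocol the partners followed: the function names, the proportional allocation rule and the example daily volumes are all hypothetical.

```python
import random


def daily_quotas(expected_per_day, target=60):
    """Split a weekly target across days in proportion to expected daily volume.

    expected_per_day: estimated number of eligible samples for each day
    (hypothetical input; the study only says partners estimated daily numbers).
    """
    total = sum(expected_per_day)
    if total == 0:
        return [0] * len(expected_per_day)
    raw = [target * n / total for n in expected_per_day]
    quotas = [int(x) for x in raw]
    # Give the remaining samples to the days with the largest fractional parts.
    leftover = target - sum(quotas)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    # A day cannot contribute more samples than it is expected to produce,
    # so a low-volume unit ends up including every sample it has.
    return [min(q, n) for q, n in zip(quotas, expected_per_day)]


def pick_samples(day_samples, quota, rng=random):
    """Randomly pick up to `quota` samples from one day's list of samples."""
    return rng.sample(day_samples, min(quota, len(day_samples)))
```

With an expected volume of `[40, 40, 40, 40, 40, 20, 20]` samples per day, the quotas sum to 60. With `[5, 5, 5, 5, 5, 0, 0]`, every available sample is included, which mirrors the units that had to prolong sampling.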

Coal swabs were used to swab from the plates on which the pathogen was cultured (a video illustrating the isolate sampling procedure can be viewed online). Parafilm was wrapped around the lid of the coal swab for extra sealing. Coal swabs were kept in the dark at 4 °C, or at room temperature if 4 °C storage was not available. Swabs were stored until shipment was possible for partners.

For partners extracting DNA, material corresponding to the DNA extraction kit and methodology used at DTU was provided (the DTU DNA extraction procedure is described under DNA extraction and library preparation). Partners were asked to provide at least 50 µl of eluted DNA, or at least 80 µl if the measured concentration was <6 ng/µl.

Metadata sheets were provided for all partners, together with labels with printed sample names, unique to each sampling location. Labels were for application on the samples (coal swabs or tubes with DNA) and on the corresponding metadata sheets. Metadata sheets were for use in a laboratory setting, where metadata could not be recorded electronically from other lab records. For most partners, the collected metadata were subsequently submitted electronically via SurveyMonkey or in Excel format. A few partners sent only the handwritten metadata sheets. The metadata variables are listed in Table 1. Under no circumstances were internal patient identifiers (IDs) or other references to individuals shared with the project.

Isolates were shipped as UN3373 biological sample category B. All coal swabs were put into absorptive pockets and into a zip lock bag labelled UN3373. The bag was placed in a shipment box labelled UN3373, together with any metadata sheets (these were also submitted electronically for the majority of samples). Shipment was performed by DHL, as Medical Express or ordinary parcel, depending on the options for the departure location. A single parcel was shipped by World Courier, from Mozambique to Denmark.

DNA samples were stored in Eppendorf tubes and sealed again with Parafilm. The tubes were placed in an 84-compartment foldable freezer box and placed in a bubble-wrap envelope. All DNA samples were shipped as ordinary parcels or letters, without cold chain.

Upon arrival in Denmark, samples were logged together with the received metadata. The metadata were validated prior to database submission, as explained in detail under Technical Validation. Logging entailed entering sample names (as written on the labels provided to partners), registering unique sample IDs, recording original as well as validated metadata, and recording processing information regarding culturing and freezing of isolates. Once validated, all information resulting from logging samples and their metadata was submitted to the MySQL database.

Isolates received on coal swabs were cultured on blood agar or chocolate agar, in the presence of CO2 if necessary, and sub-cultured until the expected species (as submitted by the sampling partner) was presumably isolated, as judged by visual recognition by experienced laboratory professionals. When it was unclear which isolate to proceed with, multiple isolates were taken forward for DNA extraction and sequencing, and the correct isolate was decided upon after bioinformatic species prediction.

DNA was extracted using the Qiagen DNeasy Blood & Tissue kit (Qiagen, Venlo, Netherlands) according to the manufacturer's protocol. DNA concentrations were measured on a Qubit using Invitrogen's Qubit dsDNA high-sensitivity (HS) assay kit (Carlsbad, CA, USA). DNA was diluted to approximately 0.2 ng/µl for library preparation. Libraries were prepared according to the Illumina Nextera XT DNA Library Prep Reference Guide (Illumina, Inc., San Diego, CA, USA) using standard normalisation.
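The dilution to approximately 0.2 ng/µl follows the usual C1·V1 = C2·V2 relationship. The Python sketch below shows the arithmetic only. The function name and the 50 µl final volume used in the example are assumptions for illustration and are not taken from the study protocol.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=0.2):
    """Return (stock volume, diluent volume) in µl for a C1*V1 = C2*V2 dilution.

    stock_ng_per_ul: measured DNA concentration (e.g. from Qubit)
    final_volume_ul: desired volume of diluted DNA (hypothetical value)
    target_ng_per_ul: target concentration, 0.2 ng/µl as stated in the text
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Example: a 10 ng/µl extract diluted into 50 µl needs about 1 µl of stock
# and about 49 µl of diluent.
stock_ul, diluent_ul = dilution_volumes(10.0, 50.0)
```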

All samples except eight were sequenced on an Illumina NextSeq 500 platform using paired-end sequencing on a medium output flow cell (NextSeq 500/550 Mid Output Kit v2.5, 300 cycles, Cat. no. 20024905). Gram-negative samples were run with 96 isolates in parallel, and Gram-positive samples with 192 isolates in parallel. A few flow cells were run with mixed Gram-negative and Gram-positive samples, with approximately 100 samples on a single flow cell. Eight samples were sequenced on an Illumina MiSeq platform using paired-end sequencing, 500 cycles (2 × 251), on a V3 flow cell.

Sequencing data were downloaded from BaseSpace (Illumina's customer cloud platform) and transferred to the Danish National Supercomputer for Life Sciences [11], a high-performance computing cluster, where the data were stored and processed and where all downstream analyses took place.

An in-house bioinformatics pipeline, FoodQCPipeline v. 1.5 [12], was used at default settings to assess the quality of the raw sequence data, trim the raw reads according to predefined quality thresholds and perform de novo assembly of the genomes. The quality assessment and trimming of raw sequencing data is further described under Technical Validation. Given the spades option, FoodQCPipeline performs de novo assembly with SPAdes v. 3.11.0 [13]. After running FoodQCPipeline, both trimmed fastq data and fasta files (draft assemblies) are available for downstream analyses. QC summary data were submitted to the MySQL database after genome validation, which is explained in detail under Technical Validation.

KmerFinder [14] was used as one of two species prediction programs. KmerFinder assesses species identity by matching k-mers from the query sequence to a k-mer-based database of reference strains. KmerFinder was run on the draft assemblies with default settings. Hits were evaluated on total query coverage, calculated as the number of unique k-mers shared between the query and the template divided by the number of unique k-mers in the query. The first hit was accepted if it had more than 80% total query coverage.
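The total query coverage defined above can be illustrated with a few lines of Python. This is a toy sketch of the ratio only and is not KmerFinder's implementation: the function names and the default k are assumptions, and the real tool works against an indexed reference database rather than a single template string.

```python
def unique_kmers(sequence, k):
    """Return the set of distinct k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def total_query_coverage(query, template, k=16):
    """Percent of the query's unique k-mers that also occur in the template.

    k=16 is an illustrative default, not a documented setting from the text.
    """
    query_kmers = unique_kmers(query, k)
    if not query_kmers:
        return 0.0
    shared = query_kmers & unique_kmers(template, k)
    return 100.0 * len(shared) / len(query_kmers)


def accept_top_hit(coverage_percent, threshold=80.0):
    """Apply the acceptance rule from the text: strictly more than 80%."""
    return coverage_percent > threshold
```

For example, with `k=4` the query `AAAACCCC` has five unique k-mers, two of which occur in the template `AAAACGGG`, giving 40% total query coverage, which would not be accepted.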

The other species prediction software used was rMLST [15]. In contrast to KmerFinder, rMLST identifies species based only on ribosomal multi-locus sequence typing, which uses the 53 genes that encode subunits of the bacterial ribosome. rMLST was run on assembled genomes through the open access API at https://pubmlst.org/species-id/species-identification-via-api. The first hit was accepted if it had more than 90% support.
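The acceptance rule for the rMLST result can be sketched as a small parser over the decoded JSON response. The field names used here (`taxon_prediction`, `taxon`, `support`) are assumptions about the response layout and should be checked against the API documentation linked above. The example response is made up for illustration and is not study data.

```python
def accepted_rmlst_taxon(response, min_support=90):
    """Return the taxon of the first hit if its support exceeds min_support.

    response: decoded JSON from the species identification API (assumed to
    hold a list of predictions under the key "taxon_prediction").
    Returns None when there is no hit or the first hit is not accepted.
    """
    predictions = response.get("taxon_prediction", [])
    if not predictions:
        return None
    first_hit = predictions[0]
    if first_hit["support"] > min_support:
        return first_hit["taxon"]
    return None


# Hypothetical response, for illustration only.
example_response = {
    "taxon_prediction": [
        {"taxon": "Escherichia coli", "rank": "SPECIES", "support": 100},
    ]
}
species = accepted_rmlst_taxon(example_response)
```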

The in silico species call was based on either species-level or genus-level concordance between the top hits from KmerFinder and rMLST, or on an acceptable hit from only one of the two programs. The point of using two different species prediction programs was to allow for a sensitive assessment of whether the genomes were contaminated (KmerFinder), complemented by a more robust but less sensitive species prediction (rMLST). For validated genomes, species that could not be exactly identified are given as NA. The genome validation is described under Technical Validation. As with the QC summary data, species prediction data were submitted to the MySQL database upon genome validation, and the concordance between KmerFinder and rMLST is given.
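One way to express the concordance logic is shown below in Python. It is a sketch under assumptions and not the authors' code: the function name, the level labels and the choice to report only the genus name on genus-level concordance are hypothetical. Each input is the accepted top hit as a binomial species name, or `None` when the program gave no acceptable hit.

```python
def consensus_species(kmerfinder_hit, rmlst_hit):
    """Combine two species predictions into (call, concordance_level).

    Levels used here (hypothetical labels): "species", "genus", "single",
    "discordant" and "none". Unresolved cases are reported as "NA".
    """
    if kmerfinder_hit and rmlst_hit:
        if kmerfinder_hit == rmlst_hit:
            return kmerfinder_hit, "species"
        # The genus is taken as the first word of the binomial name.
        genus_k = kmerfinder_hit.split()[0]
        genus_r = rmlst_hit.split()[0]
        if genus_k == genus_r:
            # Assumption: report only the genus when the species differ.
            return genus_k, "genus"
        return "NA", "discordant"
    single = kmerfinder_hit or rmlst_hit
    if single:
        return single, "single"
    return "NA", "none"
```

For instance, `consensus_species("Klebsiella pneumoniae", "Klebsiella variicola")` returns `("Klebsiella", "genus")`, while two hits from different genera return `("NA", "discordant")`.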

In order to identify acquired resistance genes in the validated bacterial genomes, ResFinder version 4.1 [16] was run on the assemblies. All samples were run with the -s other option, meaning that the samples were not run as specific species. ResFinder has the option to run samples as specific species, in which case a secondary program, PointFinder, is also run. That analysis is omitted when running with -s other, which allows for complete cross-comparability of the output data from our in-house ResFinder summary script, which in this case only encompasses acquired resistance genes. The ResFinder summary script produces different overviews of the ResFinder data, with both a class-level and a drug-level overview of acquired resistance genes, as well as the query coverage, percent identity to the reference and position in the assembly of each hit. The ResFinder summary script is available as Supplementary file 1.
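The idea of a class-level and a drug-level overview can be sketched as follows. This is not the summary script from Supplementary file 1: the record layout (`gene`, `drug_class`, `drugs`) and the placeholder gene, class and drug names are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict


def summarise_hits(hits):
    """Group resistance gene hits by antimicrobial class and by drug.

    hits: list of dicts with the hypothetical keys "gene", "drug_class"
    and "drugs" (a list of drug names the gene is associated with).
    Returns (genes_by_class, genes_by_drug) as dicts of sorted gene lists.
    """
    by_class = defaultdict(set)
    by_drug = defaultdict(set)
    for hit in hits:
        by_class[hit["drug_class"]].add(hit["gene"])
        for drug in hit["drugs"]:
            by_drug[drug].add(hit["gene"])
    genes_by_class = {c: sorted(genes) for c, genes in by_class.items()}
    genes_by_drug = {d: sorted(genes) for d, genes in by_drug.items()}
    return genes_by_class, genes_by_drug


# Placeholder records, for illustration only (not real ResFinder output).
example_hits = [
    {"gene": "geneA", "drug_class": "class_1", "drugs": ["drug_x", "drug_y"]},
    {"gene": "geneB", "drug_class": "class_2", "drugs": ["drug_z"]},
    {"gene": "geneC", "drug_class": "class_1", "drugs": ["drug_x"]},
]
class_overview, drug_overview = summarise_hits(example_hits)
```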

Genetic distance-based phylogeny was inferred for sequencing runs that passed the technical validation (see below), using Evergreen COMPARE [17,18,19] (commit b512e6e). The reference database consisted of the complete bacterial chromosomal genomes from the RefSeq collection of the National Center for Biotechnology Information (NCBI), last fetched in April 2021 and homology reduced to 98% sequence identity using kma_index from KMA with the homology reduction settings -hr 0.769 and -ht 0.769. Consequently, the threshold for accepting a matching reference was also lowered to 98% (76.90% k-mer identity), and the inclusion criterion for consensus sequence completeness was reduced to 80%. For displaying the phylogenies on the website, a custom script (Supplementary file 2) was used to select the minimum number of phylogenetic trees that together contained all possible samples.
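The pairing of 98% sequence identity with 76.90% k-mer identity can be reproduced with a simple approximation: if mismatches are independent and evenly spread, a k-mer is intact with probability identity^k. The Python sketch below uses k = 13, which is an assumption chosen because it reproduces the 0.769 figure. The text does not state the k-mer size used.

```python
def kmer_identity(sequence_identity, k):
    """Approximate expected k-mer identity for a given per-base identity.

    Assumes mismatches are independent and uniformly distributed, so a
    k-mer matches only if all k of its bases match.
    """
    return sequence_identity ** k


K = 13  # assumed k-mer size, not stated in the text
approx = kmer_identity(0.98, K)  # roughly 0.769
```

Under the same approximation, a larger k would give a lower k-mer identity for the same sequence identity, so the two thresholds only correspond for a specific k.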
