{"id":1121918,"date":"2024-02-07T06:20:41","date_gmt":"2024-02-07T11:20:41","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/audio-based-ai-classifiers-show-no-evidence-of-improved-covid-19-screening-over-simple-symptoms-checkers-nature-com\/"},"modified":"2024-02-07T06:20:41","modified_gmt":"2024-02-07T11:20:41","slug":"audio-based-ai-classifiers-show-no-evidence-of-improved-covid-19-screening-over-simple-symptoms-checkers-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/covid-19\/audio-based-ai-classifiers-show-no-evidence-of-improved-covid-19-screening-over-simple-symptoms-checkers-nature-com\/","title":{"rendered":"Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers &#8211; Nature.com"},"content":{"rendered":"<p><p>Dataset and study design    <\/p>\n<p>    This section contains an overview of how the dataset was    collected, its characteristics and its underlying study design.    More in-depth descriptions are provided in two accompanying    papers: Budd and co-workers23 report a    detailed description of the full dataset, whereas Pigoli et    al.30 present the    rationale for and full details of the statistical design of our    study.  <\/p>\n<p>    Our main sources of recruitment were the REACT study and the    NHS T+T system. REACT is a prevalence survey of SARS-CoV-2 that    is based on repeated cross-sectional samples from a    representative subpopulation defined via (stratified) random    sampling from Englands NHS patient register31. The NHS T+T    service was a key part of the UK governments COVID-19 recovery    strategy for England. It ensured that anyone developing    COVID-19 symptoms could be swab tested, followed by the tracing    of recent close contacts of any individuals testing positive    for SARS-CoV-2 (ref. 25).  <\/p>\n<p>    Enrolment for both the REACT and NHS T+T recruitment channels    was performed on an opt-in basis. Individuals participating in    the REACT study were presented with the option to volunteer for    this study. For the NHS T+T recruitment channel, individuals    receiving a PCR test from the NHS T+T pillar 2 scheme were    invited to take part in research (pillar 1 tests refer to    all swab tests performed in Public Health England    laboratories and NHS hospitals for those with a clinical need,    and health and care workers, whereas pillar 2 comprises    swab testing for the wider    population25). The guidance    provided to potential participants was that they should be at    least 18 years old, had taken a recent swab test (initially no    more than 48h, changing to 72h on 14 May 2021), agree to our    data privacy statement and have their PCR barcode identifier    available, which was then internally validated.  <\/p>\n<p>    Participants were directed to the Speak up and help beat    coronavirus web page24. Here, after    agreeing to the privacy statement and completing the survey    questions, participants were asked to record four audio clips.    The first involved the participant reading out the sentence: I    love nothing more than an afternoon cream tea, which was    designed to contain a range of different vowel and nasal    sounds. This was followed by three successive sharp    exhalations, taking the form of a ha sound. 
The final two recordings involved the participant performing volitional/forced coughs, once, and then three times in succession. Recordings were saved in .wav format. Smartphones, tablets, laptops and desktops were all permitted. The audio recording protocol was homogenized across platforms to reduce the risk of bias due to device types.

Existing metadata such as age, gender, ethnicity and location were transferred from linked T+T/REACT records. Participants were not asked to repeat this information, to avoid survey fatigue. An additional set of attributes (hypothesized to offer the most utility for evaluating the possibility of COVID-19 detection from audio) was collected in the digital survey. This was in line with General Data Protection Regulation requirements that only the personal data necessary to the task should be collected and processed. This set included the symptoms currently on display (the full set of which is detailed in Fig. 1e,f) and long-term respiratory conditions such as asthma. The participant's first language was also collected to control for different dialects/accents and to complement location and ethnicity. Finally, the test centre at which the PCR was conducted was recorded. This enabled the removal of submissions when cases were linked to faulty test centre results. A full set of the dataset attributes can be found in Budd and colleagues23.

The final dataset is downstream of a quality control filter (see Fig. 1g), in which a total of 5,157 records were removed, each with one or more of the following characteristics: (1) missing response data (missing a PCR test); (2) missing predictor data (any missing audio files or missing demographic/symptoms metadata); (3) audio submission delays exceeding ten days post test result; (4) self-inconsistent symptoms data; (5) a PCR testing laboratory under investigation for unreliable results; (6) a participant age of under 18; and (7) sensitive personal information detected in the audio signal (see Fig. 3d of ref. 23). Pigoli et al.30 present these implemented filters in full, and the rationale behind each one (a schematic code rendering of this filtering step is sketched below). The final collected dataset, after data filtration, comprised 23,514 COVID+ and 44,328 COVID− individuals recruited between March 2021 and March 2022. Please note that the sample size here differs from that in our accompanying papers: Budd et al.23 reported numbers before the data quality filter was applied, whereas our statistical study design considerations, detailed in a work by Pigoli and colleagues30, focused on data from the restricted date range spanning March to November 2021. We note that the step-like profile of the COVID− count is due to the six REACT rounds, in which a higher proportion of COVID− participants were recruited than in the T+T channel. As detailed in the geo-plots in Fig. 1a,b, the dataset achieves good coverage across England, with some areas yielding more recruited individuals than others. We are pleased to see no major correlation between geographical location and COVID-19 status (Fig. 1c), with Cornwall displaying the highest level of COVID-19 imbalance, at a 0.8% difference in the percentage proportions of COVID+ and COVID− cases.
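For illustration only, a minimal pandas sketch of this kind of record-level filter is given below. The column names (pcr_result, submission_delay_days and so on) are hypothetical stand-ins; the study's actual filters and schema are those detailed in Pigoli et al.30.

```python
import pandas as pd

def apply_quality_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Drop records failing any of the seven quality-control criteria.
    Column names are hypothetical placeholders, not the study's schema."""
    keep = (
        df["pcr_result"].notna()                                   # (1) PCR result present
        & df[["audio_path", "age", "gender"]].notna().all(axis=1)  # (2) predictors present
        & (df["submission_delay_days"] <= 10)                      # (3) within ten days of test
        & ~df["symptoms_inconsistent"]                             # (4) self-consistent symptoms
        & ~df["lab_under_investigation"]                           # (5) reliable testing laboratory
        & (df["age"] >= 18)                                        # (6) adults only
        & ~df["sensitive_audio_flag"]                              # (7) no sensitive content detected
    )
    return df[keep]
```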
In our pre-specified analysis plan, we defined three training sets and five test sets to define a range of analyses in which we investigate, characterize and control for the effects of enrolment bias in our data:

Randomized train and test sets. A participant-disjoint train and test set was randomly created from the whole dataset, similar to methods in previous works.

Standard train and test set. Designed to be a challenging, out-of-distribution evaluation procedure. Carefully selected attributes such as geographical location, ethnicity and first language are held out for the test set. The standard test set was also engineered to over-represent sparse combinations of categories, such as older COVID+ participants30. The samples included in this split exclusively consist of recordings made prior to 29 November 2021.

Matched train and test sets. The numbers of COVID− and COVID+ participants are balanced within each of several key strata. Each stratum is defined by a unique combination of measured confounders, including binned age, gender and a number of binary symptoms (for example, cough, sore throat, shortness of breath; see Methods for a full description). The samples included in this split exclusively consist of recordings made prior to 29 November 2021.

Longitudinal test set. To examine how classifiers generalized out-of-sample over time, the longitudinal test set was constructed only from participants joining the study after 29 November 2021.

Matched longitudinal test set. Within the longitudinal test set, the numbers of COVID− and COVID+ participants are balanced within each of several key strata, similarly to the matched test set above.

The supports for each of these splits are detailed in Fig. 1h.

Three separate models were implemented for the task of COVID-19 detection from audio, each representing an independent machine learning pipeline. These three models collectively span the machine learning research space, ranging from the established baseline to the current state of the art in audio classification technologies, and are visually represented in Extended Data Fig. 7. We also fitted an RF classifier to predict COVID-19 status from self-reported symptoms and demographic data. The outcome used to train and test each of the prediction models was a participant's SARS-CoV-2 PCR test result. Each model's inputs and predictors, and the details of how they are handled, can be found below. Wherever applicable, we have reported our study's findings in accordance with TRIPOD statement guidelines32. The following measures were used to assess model performance: ROC-AUC, area under the precision-recall curve (PR-AUC) and UAR (also known as balanced accuracy). Confidence intervals for ROC-AUC, PR-AUC and UAR are based on the normal approximation method33, unless otherwise stated to be calculated by the DeLong method34.

We defaulted to the widely used openSMILE-SVM approach35 for our baseline model. Here, 6,373 handcrafted features (the ComParE 2016 set), including the zero-crossing rate and shimmer, which have been shown to represent human paralinguistics well, are extracted from the raw audio. These features are concatenated to form a 6,373-dimensional vector, $f_{\mathrm{openSMILE}}(\mathbf{w}) \mapsto \mathbf{v}$, where the raw waveform $\mathbf{w} \in \mathbb{R}^{n}$ ($n$ = clip duration in seconds × sample rate) is transformed to $\mathbf{v} \in \mathbb{R}^{6{,}373}$; $\mathbf{v}$ is then normalized prior to training and inference. A linear SVM is fitted to this space and tasked with binary classification. We select the optimal SVM configuration on the basis of the validation set before retraining on the combined train-validation set.
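As a sketch of how such a baseline can be assembled, the snippet below uses the opensmile Python package to extract the ComParE 2016 functionals and scikit-learn for normalization and the linear SVM; the configuration search is reduced here to a single regularization parameter, and the function names are illustrative.

```python
import numpy as np
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# ComParE 2016 functionals: 6,373 features per audio clip.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract_features(wav_paths):
    # Each raw waveform w is mapped to a 6,373-dimensional vector v.
    return np.vstack([smile.process_file(p).to_numpy() for p in wav_paths])

def fit_baseline(train_paths, train_labels, C=1.0):
    # Normalize v, then fit a linear SVM for binary classification.
    X = extract_features(train_paths)
    model = make_pipeline(StandardScaler(), LinearSVC(C=C))
    model.fit(X, train_labels)
    return model
```

A small grid over C, scored on the held-out validation split and followed by retraining on the combined train-validation data, would mirror the selection procedure described above.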
Bayesian neural networks provide estimates of uncertainty, alongside strong supervised classification performance, which is desirable for real-world use cases, especially those involving clinical use. Bayesian neural networks are naturally suited to Bayesian decision theory, which benefits decision-making applications with different costs on error types (for example, assigning unequal weighting to errors in different COVID-19 outcome classifications)36,37. We thus supply a ResNet-50 (ref. 38) BNN model. The base ResNet-50 model showed initial strong promise for ABCS5, further motivating its inclusion in this comparison. We obtain estimates of uncertainty through Monte-Carlo dropout, which provides approximate Bayesian inference over the posterior, as in ref. 39. We opt to use the pre-trained model as a warm start for the weight approximations, and allow full retraining of all layers.

The features used to create an intermediate representation, as input to the convolutional layers, are Mel filterbank features with the default configuration from the VGGish GitHub repository (ref. 40): $\mathbf{X}_i \in \mathbb{R}^{96 \times 64}$, that is, 64 log-mel spectrogram coefficients over 96 feature frames of 10 ms duration, taken from a signal resampled at 16 kHz. Each input signal was divided into these two-dimensional windows, such that a 2,880 ms clip would produce three training examples, with the clip's label (COVID+ or COVID−) assigned to each window. Incomplete frames at the edges were discarded. As with the openSMILE-SVM, silence was not removed. For evaluation, the mean prediction over feature windows was taken per audio recording, to produce a single decision per participant. To make use of the available uncertainty metrics, Supplementary Note 3 details an uncertainty analysis over all audio modalities for a range of train-test partitions.
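The window construction and Monte-Carlo dropout averaging can be sketched as follows with torch/torchaudio. The STFT settings (25 ms window, 10 ms hop) follow the usual VGGish defaults, and the number of stochastic forward passes T is an assumption rather than a reported value.

```python
import torch
import torchaudio

# VGGish-style log-mel front end: 16 kHz input, 25 ms windows (n_fft=400),
# 10 ms hop (hop_length=160), 64 mel bins.
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16_000, n_fft=400, hop_length=160, n_mels=64
)

def to_windows(waveform: torch.Tensor) -> torch.Tensor:
    """Split a mono 16 kHz waveform into (num_windows, 96, 64) log-mel
    examples; incomplete frames at the edges are discarded."""
    logmel = torch.log(melspec(waveform) + 1e-6).squeeze(0).T  # (frames, 64)
    n = logmel.shape[0] // 96
    return logmel[: n * 96].reshape(n, 96, 64)

def enable_mc_dropout(model: torch.nn.Module) -> None:
    """Keep only the dropout layers stochastic at test time."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

def predict_participant(model, windows: torch.Tensor, T: int = 30):
    """Average T Monte-Carlo dropout passes, then average over feature
    windows to produce a single decision per recording/participant.
    Assumes `model` maps (num_windows, 96, 64) inputs to logits."""
    enable_mc_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(windows)) for _ in range(T)])
    return probs.mean(dim=(0, 1))
```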
In recent years, transformers41 have started to perform well in high-dimensional settings such as audio42,43. This is particularly the case when models are first trained in a self-supervised manner on unlabelled audio data. We adopt the SSAST44, which is on a par with the current state of the art for audio event classification. Raw audio is first resampled to 16 kHz and normalized before being transformed into Mel filter banks. Strided convolutional neural layers project the Mel filter bank to a series of patch-level representations. During self-supervised pretraining, random patches are masked before all of the patches are passed to a transformer encoder. The model is trained to jointly reconstruct the masked audio and to classify the order in which the masked audio occurs. The transformer is made up of 12 multihead attention blocks. The model is trained end to end, with gradients being passed all of the way back to the convolutional feature extractors. The model is pre-trained on a combined set of AudioSet-2M (ref. 45) and Librispeech46, representing over two million audio clips, for a total of ten epochs. The model is then fine-tuned in a supervised manner on the task of COVID-19 detection from audio. Silent sections of audio recordings are removed before the signal is resampled to 16 kHz and normalized. Clips are cut/zero-padded to a fixed length of 5.12 s, which corresponds approximately to the mean length of the audio clips. For cases in which the signal length exceeds 5.12 s (after silence is removed), the first 5.12 s are taken. At training time, the signal is augmented by applying SpecAugment47 along with the addition of Gaussian noise. The output representations are mean pooled before being fed through a linear projection head. No layers are frozen and again the model is trained end to end. The model is fine-tuned for a total of 20 epochs. The model is evaluated on the validation set at the end of each epoch and its weights are saved. At the end of training the best-performing model, over all epochs, is chosen.
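The fine-tuning preprocessing (fixed 5.12 s length, SpecAugment and Gaussian noise at training time) might look roughly as below; the mask widths, noise scale and 128-bin Mel configuration are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torchaudio

TARGET_LEN = int(5.12 * 16_000)  # 5.12 s at 16 kHz = 81,920 samples

# 128 mel bins, as is common for AST-style models (an assumption here).
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=128)
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=24)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=96)

def fix_length(wave: torch.Tensor) -> torch.Tensor:
    """Cut or zero-pad a mono waveform to exactly 5.12 s, keeping the
    first 5.12 s when the clip (after silence removal) is longer."""
    n = wave.shape[-1]
    if n >= TARGET_LEN:
        return wave[..., :TARGET_LEN]
    return torch.nn.functional.pad(wave, (0, TARGET_LEN - n))

def train_example(wave: torch.Tensor, noise_std: float = 0.005) -> torch.Tensor:
    """Training-time input: add Gaussian noise to the waveform, compute the
    Mel filter bank, then apply SpecAugment-style frequency/time masks."""
    wave = fix_length(wave) + noise_std * torch.randn_like(wave)
    fbank = torch.log(melspec(wave) + 1e-6)
    return time_mask(freq_mask(fbank))
```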
To predict SARS-CoV-2 infection status from self-reported symptoms and demographic data, we applied an RF classifier with default settings. In our dataset, predictor variables for the symptoms RF classifier comprised: cough; sore throat; asthma; shortness of breath; runny/blocked nose; a new continuous cough; chronic obstructive pulmonary disease (COPD) or emphysema; another respiratory condition; age; gender; smoker status; and ethnicity. In Han and colleagues' dataset18, predictor variables for the symptoms RF classifier comprised: tightness of chest; dry cough; wet cough; runny/blocked nose; chills; smell/taste loss; muscle ache; headache; sore throat; short breath; dizziness; fever; age; gender; smoker status; language; and location. Prior to training, categorical attributes were one-hot encoded. No hyperparameter tuning was performed, and models were trained on the combined Standard train and validation sets. For the hybrid symptoms+audio RF classifier, the predicted COVID+ probability output by an audio-trained SSAST is appended as an additional input variable to the self-reported symptoms and demographic variables listed above.

The matched test set was constructed by exactly balancing the numbers of COVID+ and COVID− individuals in each stratum, where, to be in the same stratum, individuals must be matched on all of (recruitment channel) × (10-year-wide age bins) × (gender) × (all six binary symptom covariates). The six binary symptoms matched on in the matched test set were: cough; sore throat; asthma; shortness of breath; runny/blocked nose; and at least one symptom.

Our matching algorithm proceeds as follows. First, each participant is mapped to exactly one stratum. Second, the following matching procedure is applied separately in each stratum: in stratum $s$ (of a total of $S$ strata), let $n_{s,+}$ and $n_{s,-}$ denote the numbers of COVID+ and COVID− individuals, respectively, and let $\mathcal{A}_{s,+}$ and $\mathcal{A}_{s,-}$ be the corresponding sets of individuals. Use $\mathcal{M}_{s,+}$ and $\mathcal{M}_{s,-}$ to denote random samples without replacement of size $\min\{n_{s,+}, n_{s,-}\}$ from $\mathcal{A}_{s,+}$ and $\mathcal{A}_{s,-}$, respectively. Finally, we combine matched individuals across all strata into the matched dataset $\mathcal{M}$, defined as

$$\mathcal{M} := \bigcup_{s=1}^{S} \left( \mathcal{M}_{s,+} \cup \mathcal{M}_{s,-} \right).$$

The resulting matched test set comprised 907 participants who were COVID positive and 907 who were COVID negative. The matched training set was constructed similarly to the matched test set, though with slightly different strata, so as to increase the available sample size. For the matched training set, individuals were matched on all of (10-year-wide age bins) × (gender) × (all seven binary covariates). The seven binary covariates used for the matched training set were: cough; sore throat; asthma; shortness of breath; runny/blocked nose; COPD or emphysema; and smoker status. The resulting matched training set comprised 2,599 participants who were COVID positive and 2,599 who were COVID negative.
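A compact pandas rendering of this per-stratum procedure is sketched below; the strata column names are hypothetical placeholders for the matching covariates listed above.

```python
import pandas as pd

# Illustrative stratum-defining columns (placeholders, not the study schema).
STRATA = ["age_bin", "gender", "cough", "sore_throat", "asthma",
          "short_breath", "blocked_nose", "any_symptom"]

def match(df: pd.DataFrame, label: str = "covid", seed: int = 0) -> pd.DataFrame:
    """Within each stratum s, draw random samples of size
    min(n_{s,+}, n_{s,-}) without replacement from each class,
    then pool the matched individuals across all strata."""
    matched = []
    for _, stratum in df.groupby(STRATA):
        pos = stratum[stratum[label] == 1]
        neg = stratum[stratum[label] == 0]
        k = min(len(pos), len(neg))
        if k > 0:
            matched.append(pos.sample(n=k, random_state=seed))
            matched.append(neg.sample(n=k, random_state=seed))
    return pd.concat(matched, ignore_index=True)
```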
We consider the action of applying a particular testing protocol to an individual randomly selected from a population. The four possible outcomes $O_{\hat{y},y}$ are

$$O_{\hat{y},y} := [\text{predict COVID-19 status as } \hat{y}] \;\mathrm{AND}\; [\text{true COVID-19 status is } y] \qquad (2)$$

for predicted COVID-19 status $\hat{y} \in \{0,1\}$ and true COVID-19 status $y \in \{0,1\}$. We denote the probability of outcome $O_{\hat{y},y}$ by

$$p_{\hat{y},y} := \mathbb{P}(O_{\hat{y},y}) \qquad (3)$$

and use $u_{\hat{y},y}$ to denote the combined utility of the consequences of outcome $O_{\hat{y},y}$. For a particular population prevalence proportion, $\pi$, the $p_{\hat{y},y}$ are subject to the constraints

$$p_{0,1} + p_{1,1} = \pi \qquad (4)$$

$$p_{0,0} + p_{1,0} = 1 - \pi, \qquad (5)$$

leading to the following relationships, valid for $\pi \in (0,1)$, involving the sensitivity and specificity of the testing protocol:

$$\mathrm{sensitivity} \equiv \frac{p_{1,1}}{p_{1,1} + p_{0,1}} = \frac{p_{1,1}}{\pi} \qquad (6)$$

$$\mathrm{specificity} \equiv \frac{p_{0,0}}{p_{0,0} + p_{1,0}} = \frac{p_{0,0}}{1 - \pi}. \qquad (7)$$

The expected utility is

$$\mathrm{EU} = \sum_{\hat{y} \in \{0,1\}} \sum_{y \in \{0,1\}} u_{\hat{y},y}\, p_{\hat{y},y} \qquad (8)$$

$$= u_{1,1} p_{1,1} + u_{0,1} (\pi - p_{1,1}) + u_{0,0} p_{0,0} + u_{1,0} (1 - \pi - p_{0,0}) \qquad (9)$$

$$= \pi \left[ (u_{1,1} - u_{0,1}) \times \mathrm{sensitivity} + u_{0,1} \right] + (1 - \pi) \left[ (u_{0,0} - u_{1,0}) \times \mathrm{specificity} + u_{1,0} \right], \qquad (10)$$

where equations (4) and (5) are substituted into equation (8) to obtain equation (9), and equations (6) and (7) are substituted into equation (9) to obtain equation (10).
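Equation (10) reduces to a one-line computation. The sketch below evaluates it for an arbitrary utility assignment; the numbers in the usage example are made up for illustration, not taken from the paper.

```python
def expected_utility(sens: float, spec: float, prev: float,
                     u11: float, u01: float, u00: float, u10: float) -> float:
    """Equation (10): expected utility of a testing protocol with given
    sensitivity, specificity and population prevalence pi (= prev), where
    u_{yhat,y} is the utility of predicting yhat when the true status is y."""
    return (prev * ((u11 - u01) * sens + u01)
            + (1 - prev) * ((u00 - u10) * spec + u10))

# Example (illustrative values): penalize false negatives (u01) more
# heavily than false positives (u10) at 2% prevalence.
eu = expected_utility(sens=0.8, spec=0.9, prev=0.02,
                      u11=1.0, u01=-5.0, u00=0.0, u10=-1.0)
```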
To provide researchers easy access to running the code, we have created a demonstration notebook in which the participant is invited to record their own sentence, single cough, three-cough and exhalation sounds, and to evaluate our COVID-19 detection machine learning models on them. The model outputs a COVID-19 prediction, along with some explainable AI analysis, for example enabling the user to listen back to the parts of the signal to which the model allocated the most attention. In the demonstration, we make clear that this is not a clinical diagnostic test for COVID-19; it is instead for research purposes, does not provide any medical recommendation, and no action should be taken following its use. The demonstration file is detailed on the main repository page and can be accessed at https://colab.research.google.com/drive/1Hdy2H6lrfEocUBfz3LoC5EDJrJr2GXpu?usp=sharing.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Excerpt from: Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers (https://www.nature.com/articles/s42256-023-00773-8)