{"id":1067871,"date":"2024-06-12T02:51:07","date_gmt":"2024-06-12T06:51:07","guid":{"rendered":"https:\/\/www.immortalitymedicine.tv\/assessing-calibration-and-bias-of-a-deployed-machine-learning-malnutrition-prediction-model-within-a-large-healthcare-nature-com\/"},"modified":"2024-08-18T11:40:19","modified_gmt":"2024-08-18T15:40:19","slug":"assessing-calibration-and-bias-of-a-deployed-machine-learning-malnutrition-prediction-model-within-a-large-healthcare-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/assessing-calibration-and-bias-of-a-deployed-machine-learning-malnutrition-prediction-model-within-a-large-healthcare-nature-com.php","title":{"rendered":"Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare &#8230; &#8211; Nature.com"},"content":{"rendered":"<p><p>    The primary training cohort used to recalibrate the model    included 49,652 patients (median [IQR] age = 66.0 [26.0]), of    which 49.9% self-identified as female, 29.6% self-identified as    Black or African American, 54.8% were on Medicare and 27.8% on    Medicaid. 11,664 (24%) malnutrition cases were identified.    Baseline characteristics are summarized in Table 1    and malnutrition event rates are summarized in Supplementary    Table 2. The validation    cohort used to test the model included 17,278 patients (median    [IQR] age = 66.0 [27.0]), of which 49.8% self-identified as    female, 27.1% self-identified as Black or African American,    52.9% were on Medicare, and 28.2% on Medicaid. 4,005 (23%)    malnutrition cases were identified.  <\/p>\n<p>    Although the model overall had a c-index of 0.81 (95% CI: 0.80,    0.81), it was miscalibrated according to both weak and moderate    calibration metrics, with a Brier score of 0.26 (95% CI: 0.25,    0.26) (Table 2), indicating that the    model is relatively inaccurate17. 
It also overfitted the risk estimate distribution, as evidenced by the calibration curve (Supplementary Fig. 1). Logistic recalibration of the model successfully improved calibration, bringing the calibration intercept to -0.07 (95% CI: -0.11, -0.03), calibration slope to 0.88 (95% CI: 0.86, 0.91), and significantly decreasing Brier score (0.21, 95% CI: 0.20, 0.22), Emax (0.03, 95% CI: 0.01, 0.05), and Eavg (0.01, 95% CI: 0.01, 0.02). Recalibrating the model improved specificity (0.74 to 0.93), PPV (0.47 to 0.60), and accuracy (0.74 to 0.80) while decreasing sensitivity (0.75 to 0.35) and NPV (0.91 to 0.83) (Supplementary Tables 2 and 3). <\/p>\n<p> Weak and moderate calibration metrics between Black and White patients significantly differed prior to recalibration (Table 3, Supplementary Fig. 2A, B), with the model having a more negative calibration intercept for White patients on average compared to Black patients (-1.17 vs. -1.07), and Black patients having a higher calibration slope compared to White patients (1.43 vs. 1.29). Black patients had a higher Brier score of 0.30 (95% CI: 0.29, 0.31) compared to White patients with 0.24 (95% CI: 0.23, 0.24). Logistic recalibration significantly improved calibration for both Black and White patients (Table 4, Fig. 1a&#8211;c). For Black patients within the hold-out set, the recalibrated calibration intercept was 0 (95% CI: -0.07, 0.05), calibration slope was 0.91 (95% CI: 0.87, 0.95), and Brier score improved from 0.30 to 0.23 (95% CI: 0.21, 0.25). For White patients within the hold-out set, the recalibrated calibration intercept was -0.15 (95% CI: -0.20, -0.10), calibration slope was 0.82 (95% CI: 0.78, 0.85), and Brier score improved from 0.24 to 0.19 (95% CI: 0.18, 0.21). 
Post-recalibration, calibration for Black and White patients still differed significantly according to weak calibration metrics, but not according to moderate calibration metrics or the strong calibration curves (Table 4, Fig. 1). Calibration curves of the recalibrated model showed good concordance between actual and predicted event probabilities, although the predicted risks for Black and White patients differed between the 30th and 60th risk percentiles. Logistic recalibration also improved the specificity, PPV, and accuracy, but decreased the sensitivity and NPV of the model across both White and Black patients (Supplementary Tables 2 and 3). Discriminative ability was not significantly different for White and Black patients before and after recalibration. We also found calibration statistics to be relatively similar in Asian patients (Supplementary Table 4). <\/p>\n<p> Columns from left to right are curves for a, No Recalibration; b, Recalibration-in-the-Large; and c, Logistic Recalibration for Black vs. White patients; and d, No Recalibration; e, Recalibration-in-the-Large; and f, Logistic Recalibration for male vs. female patients. <\/p>\n<p> Calibration metrics between male and female patients also significantly differed prior to recalibration (Table 3, Supplementary Fig. 2C, D). The model had a more negative calibration intercept for female patients on average compared to male patients (-1.49 vs. -0.88). Logistic recalibration significantly improved calibration for both male and female patients (Table 4, Fig. 1d&#8211;f). In male patients within the hold-out set, the recalibrated calibration intercept was 0 (95% CI: -0.05, 0.03), calibration slope was 0.88 (95% CI: 0.85, 0.90), and Brier score improved from 0.29 to 0.23 (95% CI: 0.22, 0.24). 
In female patients within the hold-out set, the recalibrated calibration intercept was -0.11 (95% CI: -0.16, -0.06), calibration slope was 0.91 (95% CI: 0.87, 0.94), but the Brier score did not significantly improve. After logistic recalibration, only calibration intercepts differed between male and female patients. Calibration curves of the recalibrated model showed good concordance, although the predicted risks for males and females differed between the 10th and 30th risk percentiles. Discrimination metrics for male and female patients were significantly different before recalibration. The model had a higher sensitivity and NPV for females than males, but a lower specificity, PPV, and accuracy (Supplementary Table 2). The recalibrated model had the highest specificity (0.95, 95% CI: 0.94, 0.96), NPV (0.84, 95% CI: 0.83, 0.85) and accuracy (0.82, 95% CI: 0.81, 0.83) for female patients, at the cost of substantially decreasing sensitivity (0.27, 95% CI: 0.25, 0.30) (Supplementary Table 3). <\/p>\n<p> We also assessed calibration by payor type and hospital type as sensitivity analyses. In the payor type analysis, we found that malnutrition predicted risk was more miscalibrated in patients with commercial insurance, with more extreme calibration intercepts, Emax, and Eavg suggesting overestimation of risk (Supplementary Tables 5 and 6, Supplementary Fig. 3A, B). We did not observe substantial differences in weak or moderate calibration across hospital type (community, tertiary, quaternary) except that tertiary acute care centers had a more extreme calibration intercept, suggesting an overestimation of risk (Supplementary Tables 7 and 8, Supplementary Fig. 3C, D). Across both subgroups, logistic recalibration significantly improved calibration across weak, moderate, and strong hierarchy tiers (Supplementary Table 5, Supplementary Table 7, Supplementary Figs. 4 and 5). 
 <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Read this article:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/s41746-024-01141-5\" title=\"Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare ... - Nature.com\" rel=\"noopener\">Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare ... - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> The primary training cohort used to recalibrate the model included 49,652 patients (median [IQR] age = 66.0 [26.0]), of which 49.9% self-identified as female, 29.6% self-identified as Black or African American, 54.8% were on Medicare and 27.8% on Medicaid. 11,664 (24%) malnutrition cases were identified. Baseline characteristics are summarized in Table 1 and malnutrition event rates are summarized in Supplementary Table 2 <a href=\"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/assessing-calibration-and-bias-of-a-deployed-machine-learning-malnutrition-prediction-model-within-a-large-healthcare-nature-com.php\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"limit_modified_date":"","last_modified_date":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[1231415],"tags":[],"class_list":["post-1067871","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"modified_by":null,"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1067871"}],"collection":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/comments?post=1067871"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/posts\/1067871\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/media?parent=1067871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/categories?post=1067871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/wp-json\/wp\/v2\/tags?post=1067871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}