{"id":1067874,"date":"2024-06-12T02:51:10","date_gmt":"2024-06-12T06:51:10","guid":{"rendered":"https:\/\/www.immortalitymedicine.tv\/a-machine-learning-based-approach-for-constructing-remote-photoplethysmogram-signals-from-video-cameras-nature-com\/"},"modified":"2024-08-18T11:40:21","modified_gmt":"2024-08-18T15:40:21","slug":"a-machine-learning-based-approach-for-constructing-remote-photoplethysmogram-signals-from-video-cameras-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/futurist-transhuman-news-blog\/machine-learning\/a-machine-learning-based-approach-for-constructing-remote-photoplethysmogram-signals-from-video-cameras-nature-com.php","title":{"rendered":"A machine learning-based approach for constructing remote photoplethysmogram signals from video cameras &#8230; &#8211; Nature.com"},"content":{"rendered":"<p><p>    In this section, the methodology used in this study is    presented, from the data processing techniques to the models    used to construct the rPPG. A general visualization of the    pipeline is presented in Fig.1.  <\/p>\n<p>            From data processing to comparison of the reference            photoplethysmogram (PPG) with the remote            photoplethysmogram (rPPG) constructed by the model. CV            cross-validation, RGB red, green, and blue channels, ML            machine learning. Colors: the green signal refers to            the rPPG reconstructed by the model, and the black            signal refers to the fingertip PPG.          <\/p>\n<p>    For this study, three public datasets were utilized:  <\/p>\n<p>    LGI-PPGI: This dataset is published under the CC-BY-4.0    license. The study was supported by the German Federal Ministry    of Education and Research (BMBF) under the grant agreement    VIVID 01S15024 and by CanControls    GmbH Aachen21. The LGI-PPGI    dataset is a collection of videos featuring six participants,    the sex of five is male and one is female. The participants    were recorded while performing four activities: Rest, Talk, Gym    (exercise on a bicycle ergometer), and Rotation (rotation of    the head of the subject at different speeds). The videos were    captured using a Logitech HD C270 webcam with a frame rate of    25 fps, and cPPG signals were collected using a CMS50E PPG    device at a sampling rate of 60 Hz. The videos were shot in    varying lighting conditions, with talking scenes recorded    outdoors and other activities taking place indoors.  <\/p>\n<p>    PURE: Access to this dataset is granted upon request. It    received support from the Ilmenau University of Technology, the    Federal State of Thuringia, and the European Social Fund (OP    2007-2013) under grant agreement N501\/2009 for the project    SERROGA (project number 2011FGR0107)26. The PURE dataset    contains videos of 10 participants, of which eight have the sex    male and two female, engaged in various activities classified    as Steady, Talk, Slow Translation (average speed is 7% of the    face height per second), Fast Translation (average speed is 14%    of the face height per second), Small Rotation (average head    angle of 20), and Medium    Rotation (average head angle of 35). The videos were captured using a    640480 pixel eco274CVGE camera by SVS-Vistek GmbH, with a 30    fps frame rate and a 4.8 mm lens. The cPPG signals were    collected using a CMS50E PPG device at a sampling rate of 60    Hz. 
MR-NIRP indoor: This dataset is openly accessible without any restrictions. It received funding under NIH grant 5R01DK113269-02 [27]. The MR-NIRP indoor video dataset comprises videos of eight participants, six male and two female, with different skin tones: 1 Asian, 4 Indian, and 3 Caucasian. The participants were recorded while performing Still and Motion activities, with talking and head movements being part of the latter. The videos were captured using a FLIR Blackfly BFLY-U3-23S6C-C camera with a resolution of 640×640 and a frame rate of 30 fps. The cPPG signals were collected using a CMS 50D+ finger pulse oximeter at a sampling rate of 60 Hz.

Each dataset includes video recordings of participants engaged in various activities, alongside a reference cPPG signal recorded using a pulse oximeter. Table 1 provides detailed characteristics of each dataset.

The datasets used in our research are not only publicly available but are also extensively utilized within the scientific community for various secondary analyses. All datasets received the requisite ethical approvals and informed consents, in accordance with the regulations of their respective academic institutions. This compliance facilitated the publication of the data in academic papers and its availability online. The responsibility for managing ethical compliance was handled by the original data providers, who ensured that these datasets were made available under terms that permit their use and redistribution with appropriate acknowledgment.

Given the extensive use of these datasets across multiple studies, additional IRB approval for secondary analyses of de-identified and publicly accessible data is typically not required. This practice aligns with the policies at ETH Zurich, which do not mandate further IRB approval for the use of publicly available, anonymized data.

A comprehensive description of each dataset, including its source, funding agency, and licensing terms, has been provided in the manuscript. This ensures full transparency and adherence to both ethical and legal standards.

Several steps were necessary to extract the rPPG signal from a single video. First, the regions of interest (RoI) were extracted from the face. We extracted information from the forehead and cheeks using the pyVHR framework [28], which includes the software MediaPipe for the extraction of RoI from a human face [29]. The RoI extracted from every individual were composed of a total of 30 landmarks. Each landmark is a specific region of the face, represented by a number that indicates the location of that region. The landmarks 107, 66, 69, 109, 10, 338, 299, 296, 336, and 9 were extracted from the forehead; the landmarks 118, 119, 100, 126, 209, 49, 129, 203, 205, and 50 from the left cheek; and the landmarks 347, 348, 329, 355, 429, 279, 358, 423, 425, and 280 from the right cheek. Every landmark was composed of 30×30 pixels, and the average across the red, green, and blue (RGB) channels was computed for every landmark. The landmark numbers of each area represent approximately evenly spaced regions of that area.
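The landmark-based extraction can be illustrated with a short sketch. This is not the pyVHR pipeline itself but a minimal equivalent built directly on MediaPipe FaceMesh; the landmark indices and the 30×30-pixel patch size come from the text above, while the function name and structure are our own.

```python
# Minimal sketch: per-landmark mean RGB traces from a face video,
# using MediaPipe FaceMesh directly (the authors used the pyVHR wrapper).
import cv2
import mediapipe as mp
import numpy as np

FOREHEAD = [107, 66, 69, 109, 10, 338, 299, 296, 336, 9]
LEFT_CHEEK = [118, 119, 100, 126, 209, 49, 129, 203, 205, 50]
RIGHT_CHEEK = [347, 348, 329, 355, 429, 279, 358, 423, 425, 280]
LANDMARKS = FOREHEAD + LEFT_CHEEK + RIGHT_CHEEK
HALF = 15  # half-size of the 30x30-pixel patch around each landmark

def rgb_traces(video_path):
    """Return an array of shape (n_frames, 30, 3): mean R, G, B per landmark."""
    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                                max_num_faces=1)
    cap = cv2.VideoCapture(video_path)
    traces = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        res = face_mesh.process(rgb)
        if not res.multi_face_landmarks:
            continue  # no face detected in this frame
        lms = res.multi_face_landmarks[0].landmark
        h, w, _ = rgb.shape
        means = []
        for idx in LANDMARKS:
            cx, cy = int(lms[idx].x * w), int(lms[idx].y * h)  # pixel coords
            patch = rgb[max(cy - HALF, 0):cy + HALF, max(cx - HALF, 0):cx + HALF]
            means.append(patch.reshape(-1, 3).mean(axis=0))    # mean R, G, B
        traces.append(means)
    cap.release()
    return np.asarray(traces)
```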
After all the landmarks were extracted, the RGB signals of each landmark were used as input for the algorithms CHROM, LGI, POS, and ICA. These algorithms were chosen for their effectiveness in separating the color information related to blood flow from the color information unrelated to it, as well as their ability to extract PPG signals from facial videos. CHROM separates the color information by projecting it onto a set of basis vectors, while LGI uses local gradient information to extract PPG signals. POS projects the RGB signals onto a plane orthogonal to the skin tone to isolate the pulsatile component, and ICA applies blind source separation to isolate the PPG signal from the other sources of variation in the video. These methods were chosen based on their performance in previous studies and their ability to extract high-quality PPG signals from facial videos [20,23]. A textbook-style sketch of two of these projections is given below.

For the data processing, the signals used as rPPG are the outputs of the algorithms ICA, CHROM, LGI, and POS, and the cPPG signals were resampled to the same fps as the rPPG. First, detrending and bandpass filters were applied to both the rPPG and cPPG signals. The bandpass filter is a sixth-order Butterworth with cutoff frequencies of 0.65 and 4 Hz; this frequency range was chosen to filter out noise at both low and high frequencies. Next, the rPPG signals were filtered by removing low-variance signals and were segmented into non-overlapping windows of 10 seconds, followed by min-max normalization. We applied histogram equalization to the obtained spatiotemporal maps, which yielded a general improvement in the performance of the methods.

Spectral analysis was performed on both the rPPG and cPPG signals by applying Welch's method to each window of the constructed rPPG and cPPG signals. The highest peak in the frequency domain was selected as the estimated HR. Alternative methods such as autocorrelation were also tested, but they showed minimal differences in the beats-per-minute absolute difference (ΔHR). Welch's method was deemed useful as it allowed for heart rate evaluation in the frequency domain and demonstrated the predictive capability of each channel's rPPG signal.
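For concreteness, here is the compact sketch of two of the four projections referenced above, CHROM (de Haan and Jeanne, 2013) and POS (Wang et al., 2017), written from their published definitions rather than taken from the pyVHR implementation the authors used. The input is one window of per-frame mean RGB values.

```python
import numpy as np

def chrom(rgb):
    """CHROM projection for one window.
    rgb: array of shape (T, 3) holding the mean R, G, B trace."""
    c = rgb / rgb.mean(axis=0)                 # temporal normalization
    x = 3 * c[:, 0] - 2 * c[:, 1]              # chrominance signal X
    y = 1.5 * c[:, 0] + c[:, 1] - 1.5 * c[:, 2]  # chrominance signal Y
    return x - (x.std() / y.std()) * y         # alpha-tuned combination

def pos(rgb):
    """POS projection, applied to a whole window for simplicity
    (the original formulation uses short sliding windows)."""
    c = rgb / rgb.mean(axis=0)
    s1 = c[:, 1] - c[:, 2]                     # projection axis [0, 1, -1]
    s2 = -2 * c[:, 0] + c[:, 1] + c[:, 2]      # projection axis [-2, 1, 1]
    return s1 + (s1.std() / s2.std()) * s2
```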
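The resampling, filtering, windowing, normalization, and Welch-based HR steps could then be sketched as follows with SciPy. The cutoffs, window length, and BPM search range come from the text; the function names and filter-design details are our assumptions (note that a SciPy bandpass design of order 3 yields a sixth-order filter overall).

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt, welch
# The 60 Hz contact PPG can first be brought to the video frame rate,
# e.g. with scipy.signal.resample, before the steps below.

def preprocess(sig, fs, win_s=10):
    """Detrend, bandpass to 0.65-4 Hz, cut into non-overlapping 10-s
    windows, and min-max normalize each window."""
    sig = detrend(sig)
    b, a = butter(3, [0.65, 4.0], btype="bandpass", fs=fs)  # 6th-order overall
    sig = filtfilt(b, a, sig)
    n = int(win_s * fs)
    wins = [sig[i:i + n] for i in range(0, len(sig) - n + 1, n)]
    return [(w - w.min()) / (w.max() - w.min()) for w in wins]

def hr_welch(window, fs, lo_bpm=39, hi_bpm=240):
    """Estimate HR as the highest Welch PSD peak in the 39-240 BPM band."""
    f, pxx = welch(window, fs=fs, nperseg=len(window))
    band = (f >= lo_bpm / 60) & (f <= hi_bpm / 60)
    return 60 * f[band][np.argmax(pxx[band])]
```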
The model was trained using data sourced from the PURE dataset. The input data contains information from 10 participants, each captured across six distinct videos while engaging in activities categorized as Steady, Talk, Slow Translation, Fast Translation, Small Rotation, and Medium Rotation. This amounts to a total of 60 videos, with an approximate average duration of 1 min. Each video was transformed to RGB signals, and every set of RGB signals representing a video was subdivided into 10-s fragments, with each fragment serving as a unit of training data. The dataset used to train the model contains a total of 339 such samples.

Because each segment lasts 10 seconds and the frame rate is 30 fps, each sample is represented by three RGB signals composed of 300 time-steps. The RGB signals, serving as training inputs, were transformed into four distinct signals through the application of the POS, CHROM, LGI, and ICA methods. Consequently, each 10-s segment yielded four transformed signals, which were used as input for the model. Before being fed to the model, the signals underwent the data preprocessing described above. Then, a 5-fold cross-validation (CV) procedure was conducted: the dataset was partitioned into five subsets, with a distribution of 80% training data and 20% testing data within each fold.

The model's architecture was composed of four blocks of LSTM and dropout, followed by a dense layer, as shown in Fig. 2. To reduce the number of features in each layer, the number of cells in each block decreases from 90 to 1. The learning rate scheduler was ReduceLROnPlateau and the optimizer was Adam [30]. Finally, the metrics root mean squared error (RMSE) and Pearson correlation coefficient (r) were set as the loss function.

Fig. 2: The model architecture generates a remote photoplethysmogram (rPPG) signal from three regions of interest: the forehead (R1), left cheek (R2), and right cheek (R3). The average value from each region is calculated, and these averages are then combined to produce the overall rPPG signal. The model is composed of four blocks of LSTM and dropout, followed by a dense layer. The methods ICA, LGI, CHROM, and POS were used as input to the model. rPPG remote photoplethysmogram, RGB red, green, and blue channels, LSTM long short-term memory.
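A minimal Keras sketch of the described architecture follows. The two intermediate LSTM sizes between 90 and 1, the dropout rate, and how RMSE and Pearson r are combined into the loss are not specified in the text, so those values are placeholders; only the overall shape (four LSTM-plus-dropout blocks, a dense layer, Adam, ReduceLROnPlateau, and 300-step windows with four method channels) comes from this section.

```python
import tensorflow as tf
from tensorflow.keras import callbacks, layers

def neg_pearson(y_true, y_pred):
    """1 - Pearson r per sequence, usable as a correlation-maximizing loss."""
    y_true = tf.squeeze(y_true, axis=-1)  # (batch, T)
    y_pred = tf.squeeze(y_pred, axis=-1)
    xm = y_true - tf.reduce_mean(y_true, axis=1, keepdims=True)
    ym = y_pred - tf.reduce_mean(y_pred, axis=1, keepdims=True)
    num = tf.reduce_sum(xm * ym, axis=1)
    den = tf.norm(xm, axis=1) * tf.norm(ym, axis=1) + 1e-8
    return 1.0 - num / den  # an RMSE term could be added, per the text

model = tf.keras.Sequential([
    tf.keras.Input(shape=(300, 4)),  # 10 s at 30 fps; channels: ICA, LGI, CHROM, POS
    layers.LSTM(90, return_sequences=True), layers.Dropout(0.2),
    layers.LSTM(45, return_sequences=True), layers.Dropout(0.2),  # guessed size
    layers.LSTM(10, return_sequences=True), layers.Dropout(0.2),  # guessed size
    layers.LSTM(1, return_sequences=True), layers.Dropout(0.2),
    layers.Dense(1),                 # one rPPG amplitude per time step
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=neg_pearson)
reduce_lr = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[reduce_lr])
```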
To evaluate the signals, we applied four criteria: Dynamic Time Warping (DTW), Pearson's r correlation coefficient, RMSE, and ΔHR. We computed each criterion for every window in each video and then took the average over all windows to obtain the final results. This allowed us to analyze the results of every model from different points of view.

DTW [31] is a useful algorithm for measuring the similarity between two time series, especially when they have varying speeds and lengths. DTW is also relevant for this case because the rPPG and its ground truth may not always be aligned, so metrics that rely on matching timestamps are less appropriate. The metric was implemented using the Python package DTAIDistance [32].

The equation below shows how the r coefficient calculates the strength of the relationship between rPPG and cPPG.

$$r = \frac{\sum_{i=1}^{N} (x_i - \hat{x})(y_i - \hat{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \hat{x})^2} \sqrt{\sum_{i=1}^{N} (y_i - \hat{y})^2}}$$

(1)

In this equation, $x_i$ and $y_i$ are the values of the rPPG and PPG signals at lag $i$, respectively, $\hat{x}$ and $\hat{y}$ are their mean values, and $N$ is the number of values in the discrete signals.

The equation below shows how RMSE calculates the prediction error, which is the difference between the ground-truth values and the extracted rPPG signals.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - y_i)^2}{N}}$$

(2)

In this equation, $N$ is the number of values and $x_i$, $y_i$ are the values of the rPPG and contact PPG signals at lag $i$, respectively.

HR was estimated using Welch's method, which computes the power spectral density of a signal and finds the highest peak in the frequency domain. The peak was searched within a range of 39-240 beats-per-minute (BPM), which is the expected range of human BPM values. ΔHR is obtained as the absolute difference between the HR estimated from rPPG and the HR estimated from cPPG.

To evaluate the model's performance, we applied non-parametric statistical tests, which make fewer assumptions about the data distribution than parametric ones. Some comparisons involved small sample sizes, such as those with a limited number of subjects.

The Friedman test [33] is appropriate for this study because it evaluates the means of three or more groups, where every group is represented by a model. If the p-value is significant, the means of the groups are not equal. The Nemenyi test [34] was used to calculate the difference in the average ranking values and then to compare the difference with a critical distance (CD). The general procedure is to apply the Friedman test to each group and, if the p-value is significant, perform the Nemenyi test to compare the methods pairwise. The Nemenyi test helps to identify which methods are similar or different in terms of their average ranks. The Bonferroni correction was applied for multiple-comparison correction.
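Computed per 10-s window and averaged per video, the four evaluation criteria described above could be implemented as follows. dtw.distance and pearsonr are the documented entry points of DTAIDistance and SciPy; hr_welch is the helper sketched earlier in this section.

```python
import numpy as np
from dtaidistance import dtw
from scipy.stats import pearsonr

def window_metrics(rppg, cppg, fs):
    """Per-window evaluation: DTW distance, Pearson r, RMSE, and the
    absolute BPM difference (delta-HR) between rPPG and contact PPG."""
    return {
        "DTW": dtw.distance(rppg, cppg),
        "r": pearsonr(rppg, cppg)[0],
        "RMSE": float(np.sqrt(np.mean((rppg - cppg) ** 2))),
        "dHR": abs(hr_welch(rppg, fs) - hr_welch(cppg, fs)),
    }
```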
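The statistical procedure could be sketched as below, assuming scipy.stats.friedmanchisquare for the omnibus test and the third-party scikit-posthocs package for the Nemenyi step; the text does not say which implementations the authors used, and the score matrix here is placeholder data for illustration only.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # assumed package for the Nemenyi post-hoc test

# scores: one row per subject/video, one column per method being compared
scores = np.random.rand(12, 4)  # placeholder data, for illustration only

stat, p = friedmanchisquare(*scores.T)  # omnibus test across the methods
if p < 0.05:  # group means differ: locate the pairwise differences
    pvals = sp.posthoc_nemenyi_friedman(scores)  # pairwise p-value matrix
    print(pvals)
```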
Read the original: A machine learning-based approach for constructing remote photoplethysmogram signals from video cameras – Nature.com (https://www.nature.com/articles/s43856-024-00519-6)