Genomic researchers gain access to CSIRO’s AI-powered data … – Microsoft

Posted: May 4, 2023 at 12:15 pm

Genome-wide association studies (GWAS) play a crucial role in medical research. By examining millions of genetic variants across the entire genome in large populations, these studies can identify genetic variations that contribute to a particular disease or trait. GWAS have already led to breakthroughs in disease prevention, personalised medicine and drug development.

However, Dr Denis Bauer, who leads the Transformational Bioinformatics Group at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), notes that traditional GWAS evaluate disease association for each genomic location individually, which can be limiting for complex diseases.

“These diseases, such as dementia, represent the largest burden on the healthcare system and involve genes that interact with each other to create disease risk,” she explains.

Statistical models struggle to evaluate the joint contribution of variants across the genome, so other common approaches compromise by investigating interactions only between locations that have already shown association with the disease. Unfortunately, this approach risks excluding the real drivers of disease, which may have no effect individually but jointly contribute to disease development.
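To make the point about variants with no individual effect concrete, here is a minimal sketch in Python. It is a hypothetical toy cohort, not CSIRO data and not VariantSpark's method: two binary variants occur in every combination equally often, and disease status is assumed to follow an exclusive-or rule (disease when exactly one variant is present). All names in the code are illustrative.

```python
from itertools import product

# Hypothetical toy cohort: two binary variants (0 = absent, 1 = present),
# 250 individuals per genotype combination, disease status set by XOR.
cohort = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(250)]


def case_rate(rows):
    """Fraction of individuals in rows who have the disease."""
    return sum(r[2] for r in rows) / len(rows)


def marginal_effect(rows, idx):
    """Difference in case rate between carriers and non-carriers of one variant."""
    carriers = [r for r in rows if r[idx] == 1]
    non_carriers = [r for r in rows if r[idx] == 0]
    return case_rate(carriers) - case_rate(non_carriers)


def joint_case_rates(rows):
    """Case rate for each combination of the two variants."""
    return {
        (a, b): case_rate([r for r in rows if r[0] == a and r[1] == b])
        for a, b in product((0, 1), repeat=2)
    }


effect_a = marginal_effect(cohort, 0)
effect_b = marginal_effect(cohort, 1)
joint = joint_case_rates(cohort)

print("Marginal effect of variant A:", effect_a)
print("Marginal effect of variant B:", effect_b)
print("Case rate by (A, B) combination:", joint)
```

Tested one at a time, each variant shows no difference in case rate between carriers and non-carriers, so a single-location scan would discard both. Only the joint view shows that the combination determines disease status in this artificial example.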

This limitation in traditional GWAS is one of the main reasons CSIRO created VariantSpark, a scalable machine learning framework that recently became available on Microsoft’s Azure Marketplace. VariantSpark enables researchers to quickly and accurately analyse high-dimensional genomic data (data sets with a large number of variables) to find novel disease genes or predictive biomarkers.

“In complex diseases, we are hunting very subtle signals, which means we need very large data sets to make robust statements,” says Dr Bauer. VariantSpark can scale to mega-biobanks with millions of samples and is 90 per cent faster than traditional compute frameworks.

This puts researchers on the right track for finding evidence of epistasis, the non-additive gene-gene interactions that are postulated to drive complex diseases. It also boosts their ability to find predictive biomarkers that allow disease to be diagnosed early to potentially stop or delay progression.

Another reason CSIRO created VariantSpark was to help its research collaborators analyse their increasingly large and complex genomic data sets.

“We were involved in analysing a cohort of several thousand individuals, and all the other tools failed on the size. So we either needed to compromise by analysing only a subset of the data, or innovate,” says Dr Bauer.

“We wanted to make VariantSpark publicly available because if we have problems processing large volumes of data or deeply interrogating complex cohorts, a lot of other researchers probably have that problem too.”

While VariantSpark can scale to handle large and complex data sets, Dr Bauer notes that the solution also caters to researchers with smaller volumes of data.
