AI discovers mystery DNA code which could offer medical breakthroughs

Scientists have used artificial intelligence to discover a mystery DNA sequence that could lead to medical breakthroughs.

Researchers at the University of California San Diego in the US were analysing DNA and its impact on gene activity.

Our genetic instructions are encoded in the order of the four DNA “bases”, identified as A, C, G and T.

Nearly 25% of genes are switched on with the help of the TATAAA sequence, otherwise known as the TATA box.

The activation of the other 75% had remained unclear, but researchers have now identified an activation code that is at least as frequent as the TATA box.

They have called it the downstream core promoter region (DPR), the UC San Diego News Center reports.

James T. Kadonaga, a professor in UC San Diego’s Division of Biological Sciences and the paper’s senior author, said: “The identification of the DPR reveals a key step in the activation of about a quarter to a third of our genes.

“The DPR has been an enigma – it’s been controversial whether or not it even exists in humans.

“Fortunately, we’ve been able to solve this puzzle by using machine learning.”

In 1996, Prof Kadonaga and his colleagues discovered the DPE, a gene activation sequence that forms part of the DPR and allows genes to be switched on without a TATA box.

However, they were unable to study it in detail until now.

Prof Kadonaga, lead author and post-doctoral scholar Long Vo ngoc, Cassidy Yunjing Huang and Jack Cassidy studied 500,000 random DNA sequences.

They also built a machine learning model from 200,000 of these sequences that could analyse DPR activity in human DNA.

The AI then identified the DPR code in human genes.

Prof Kadonaga added: “In the same manner that machine learning enabled us to identify the DPR, it is likely that related artificial intelligence approaches will be useful for studying other important DNA sequence motifs.

“A lot of things that are unexplained could now be explainable.”

The study, supported by the National Institute of General Medical Sciences (NIGMS) at the National Institutes of Health, was published in the journal Nature on September 9.