How data science is driving genomics and pharma


Ramya Sriram, digital content manager at Kolabtree, explains how advances in big data are enabling genomics to thrive in pharma.

Of the three billion base pairs that make up the human genome, only about 0.1% differ from one person to the next. To find patterns useful to medical research in such a vast sea of data, researchers need tools that cut through the noise. Fortunately, these are the same problems that big data has been trying to solve for almost a decade.

Biotechnology, broadly encompassing technological applications using biological systems, finds itself in similar conditions to those that allowed big data to emerge. As the volume and variety of medical data grow, it becomes more advantageous to refine that data into information that can be used to make decisions. In 2019, Nature published a paper exploring the current state of machine learning (ML) in drug discovery and development. Because of the vast quantity of patient data now readily available, ML has become a valuable tool for modelling the behaviour of drugs and finding potential applications for existing and new molecules. Big data is not just in the future of the pharmaceutical industry; it is shaping its present.

Genomics

Think of big data in the context of biotechnology, and your first thought probably relates to genome sequencing. The Human Genome Project, which ran from 1990 to 2003, was a pioneering effort that gave us access to three billion bases of data, opening the door to information on mutations, genes and more. Genome data is now at our fingertips: a genome can be sequenced in a few hours and for under £1,000. Think carefully about how much data that is. How are we going to make the best use of it?
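To put rough numbers on that question, here is a back-of-the-envelope sketch in Python. The three billion bases and the 0.1% figure come from this article; the two-bits-per-base encoding is an assumption made purely for illustration, and real sequencing output (overlapping reads, quality scores, metadata) is far larger than this idealised minimum.

```python
BASES_IN_GENOME = 3_000_000_000  # figure quoted in the article
BITS_PER_BASE = 2                # assumption: A, C, G and T packed into 2 bits each
VARIABLE_FRACTION = 0.001        # the 0.1% quoted in the article


def raw_genome_megabytes(bases=BASES_IN_GENOME, bits_per_base=BITS_PER_BASE):
    """Idealised size of one genome in megabytes (1 MB = 1,000,000 bytes)."""
    return bases * bits_per_base / 8 / 1_000_000


def variable_positions(bases=BASES_IN_GENOME, fraction=VARIABLE_FRACTION):
    """Number of positions expected to differ between two people."""
    return round(bases * fraction)


print(f"Idealised size of one genome: {raw_genome_megabytes():.0f} MB")
print(f"Positions that vary between people: {variable_positions():,}")
```

Even under these simplifying assumptions, a single genome runs to hundreds of megabytes, and the 0.1% that varies still covers millions of positions per person.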

To work with it effectively, data scientists use frameworks and tools to store, track, receive, analyse and interpret their data. Tools are now being built to automatically annotate specific genes, and software companies like DNAnexus, Knome and NextBio have sprung up to tackle genome interpretation. Interestingly, NextBio has even worked with Intel to improve its Hadoop platform for genomic big data analysis. The pharmaceutical and healthcare industries can use this insight to improve diagnostics, aid drug discovery or develop personalised medicine strategies.

Drug discovery and development

Bringing a new pharmaceutical product to market is a long process with many bottlenecks. Trials regularly fail to meet their objectives, which can add further delay and increase the costs of an already expensive process. From finding a drug candidate to recruiting patients for a clinical trial, there are numerous data points, experiments and risk/benefit analyses to conduct, making the pharmaceutical industry a logical fit for big data analytics.

We can now use automated software to screen millions of compounds to identify drug candidates for a clinical trial. Pharmaceutical professionals can let artificial intelligence (AI) do the hard work of sifting through a huge library of potential drugs, assessing what is likely to work against the trial’s specific criteria.

Biotechnology company Numerate, for example, builds predictive models to help with small molecule drug design, making predictions on toxicity, metabolism, absorption, distribution and more. AI can also be used to come up with new combinations of compounds. Pharmaceutical companies can therefore screen drug candidates and pick the most likely ones to take to clinical trials.
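As a minimal illustration of the screening idea described above, the Python sketch below filters a tiny compound library by predicted toxicity and ranks what remains by predicted activity. Everything in it is hypothetical: the compound names, the scores, the `max_toxicity` threshold and the `shortlist` helper are invented for the example and do not represent Numerate's models or any real screening pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    predicted_activity: float  # hypothetical model output, 0 to 1, higher is better
    predicted_toxicity: float  # hypothetical model output, 0 to 1, lower is better


def shortlist(compounds, max_toxicity=0.3, top_n=2):
    """Drop compounds above the toxicity limit, then rank the rest by activity."""
    acceptable = [c for c in compounds if c.predicted_toxicity <= max_toxicity]
    ranked = sorted(acceptable, key=lambda c: c.predicted_activity, reverse=True)
    return ranked[:top_n]


# Made-up library standing in for millions of real compounds.
library = [
    Compound("cmpd-A", predicted_activity=0.91, predicted_toxicity=0.55),
    Compound("cmpd-B", predicted_activity=0.78, predicted_toxicity=0.10),
    Compound("cmpd-C", predicted_activity=0.40, predicted_toxicity=0.05),
    Compound("cmpd-D", predicted_activity=0.85, predicted_toxicity=0.25),
]

candidates = shortlist(library)
print([c.name for c in candidates])
```

In practice the scores would come from trained predictive models rather than hand-typed numbers, and the selection criteria would reflect the specific trial, but the filter-then-rank pattern is the same.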

Big data in biotechnology is not only about genomics; the data may also be collected by sensors. Wearable, ingestible or implantable sensors can provide a continuous data stream for clinical trials. This data can fill the gaps between measurements taken at appointments, reduce human error and help identify reasons for dropout, and it may allow patients to go about their normal lives more easily.

Any improvement in the drug discovery or clinical trial process can save millions of dollars in development costs and speed up the time it takes to bring a potentially life-saving drug to market.

Healthcare and disease management

One big data challenge facing the healthcare sector is the storage and management of electronic medical records. In fact, the US government is investing $19 billion into boosting the uptake of electronic records. With patient information stored in this way, the industry has a pool of data to work with to help improve diagnoses and treatments.

As the number of commercial wearable devices capable of collecting medical data grows, so does the information that doctors could use to assist patients. Apple has demonstrated the possible benefits this could bring by collaborating with the American Heart Association to produce a cardiology study on how heart rate relates to heart-related hospitalisations. Using wearable technology, researchers studying chronic conditions can acquire a constant, high volume of data to analyse. If systems are in place to store enough valuable data, medical institutions could make more informed diagnoses and treatment decisions.

In New York, the Partnership to Advance Clinical Electronic Research is working on a system to enable investigators and sponsors to use electronic patient records to help find patients for clinical trials. Oracle has also unveiled cloud-based applications that support the sharing of anonymised patient data between the health system and pharmaceutical companies.

An interdisciplinary solution

According to an analysis published in the Journal of Health Economics, the cost of developing a new drug was over $2 billion in 2014 and has been growing steadily in recent decades. Tapping the potential of new data, from genomics to wearable devices, could drastically increase our understanding of a substance or medical device before any trials are conducted. Finding innovative ways to use this new abundance of medical information could play a role in reducing the growing cost of R&D.

The line between data science and medical biotechnology is fading, leaving a grey area of research in the future of the pharmaceutical industry. The skills needed to carry out data-aided medical research might seem out of reach, but they can be brought in by working with specialists. While a 0.1% difference might not seem like much, it could make a difference worth billions of dollars in research.
