Using Graphs in Drug Discovery to Probe Relationships

For all the power of modern Artificial Intelligence (AI) in computer problem-solving, you simply can’t explore every option, and you have to avoid dead ends.

The common perception of AI is that it is so smart it skims every alternative and effortlessly arrives at the ideal path in a millisecond. In classical AI, researchers call the exhaustive version of this idea the British Museum approach: to find the shortest path, you find all possible paths and select the best of them.

The likelihood of that happening is as remote as expecting a group of monkeys in front of typewriters to reproduce all the books in a vast library. The reality is that some search paths are good, but most are poor and do not return anything useful.
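To make the contrast with the British Museum approach concrete, here is a minimal Python sketch on a small, hypothetical weighted graph (the node names and edge costs are invented for illustration). `british_museum` enumerates every simple path from start to goal and keeps the cheapest, while `dijkstra` uses a priority queue to expand only the cheapest frontier node at each step and stops as soon as it reaches the goal.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical weighted, directed graph: node -> list of (neighbour, step cost).
Graph = Dict[str, List[Tuple[str, int]]]
GRAPH: Graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}


def british_museum(graph: Graph, start: str, goal: str) -> Tuple[float, List[str], int]:
    """Enumerate every simple path from start to goal and keep the cheapest.

    Returns (best cost, best path, number of complete paths examined)."""
    best_cost: float = float("inf")
    best_path: List[str] = []
    paths_seen = 0
    stack = [(start, [start], 0)]  # (current node, path so far, cost so far)
    while stack:
        node, path, cost = stack.pop()
        if node == goal:
            paths_seen += 1
            if cost < best_cost:
                best_cost, best_path = cost, path
            continue
        for nxt, step in graph[node]:
            if nxt not in path:  # simple paths only: never revisit a node
                stack.append((nxt, path + [nxt], cost + step))
    return best_cost, best_path, paths_seen


def dijkstra(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Always expand the cheapest frontier node; stop once the goal is reached."""
    frontier = [(0, start, [start])]  # min-heap ordered by cost so far
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, step in graph[node]:
            if nxt not in settled:
                heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
    return float("inf"), []


if __name__ == "__main__":
    print(british_museum(GRAPH, "A", "D"))  # (4, ['A', 'B', 'C', 'D'], 3)
    print(dijkstra(GRAPH, "A", "D"))        # (4, ['A', 'B', 'C', 'D'])
```

Dijkstra’s algorithm is just one example of a search that prunes as it goes. On this four-node graph the saving is trivial, but the number of simple paths the exhaustive version must examine grows combinatorially as the graph gets larger.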

So, when we examine a complex problem space, we need to be efficient at choosing what to explore and what to ignore. This is true for all kinds of data analysis, but it is especially true in drug discovery. Notoriously, 90% of clinical drug development fails, and only 6% of the drugs that make it all the way to clinical trials make it onto the market.

Given how time- and money-consuming this research is, finding ways to concentrate on what is worthwhile at the early drug discovery stage is clearly an attractive notion. But, again, it’s hard to establish what is worth looking at: almost all drugs are 'small' molecules, and candidate small molecules can number in the millions.

In chemistry, small molecules are defined as compounds with a molecular weight below 1,000 atomic mass units, which makes them much smaller than proteins and nucleic acids. They can bind to proteins and nucleic acids and alter their functions. Pharmaceutical researchers are well aware that identifying a small molecule capable of binding to a crucial protein target is a challenging and laborious enterprise.

Compounding the problem, early drug discovery involves a lot of unstructured data from disparate sources of information, ranging from your own research team to big public third-party medical datasets.

However, a promising methodology has arrived: loading data into a graph database and codifying it as a knowledge graph. Knowledge graphs are a way of structuring data that highlights the connections between various concepts, objects, and ideas. Rather than storing information in isolated tables in a conventional database, knowledge graphs reveal an interconnected tapestry of entities and the links between them.
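As a rough illustration of the difference, the Python sketch below stores facts as (subject, relation, object) triples rather than in separate tables, so one pattern-matching function can follow links between any kinds of entity. The entity and relation names (`compound_A`, `binds`, `protein_X` and so on) are hypothetical placeholders, and a production knowledge graph would live in a graph database rather than a Python set.

```python
from typing import NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical facts; in a relational design these might sit in separate tables.
FACTS: Set[Triple] = {
    Triple("compound_A", "binds", "protein_X"),
    Triple("compound_B", "binds", "protein_X"),
    Triple("compound_B", "binds", "protein_Y"),
    Triple("protein_X", "interacts_with", "protein_Y"),
    Triple("protein_Y", "part_of", "pathway_P"),
}


def match(facts: Set[Triple], subject: Optional[str] = None,
          relation: Optional[str] = None, obj: Optional[str] = None) -> Set[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return {
        t for t in facts
        if (subject is None or t.subject == subject)
        and (relation is None or t.relation == relation)
        and (obj is None or t.obj == obj)
    }


# Which compounds bind protein_X?
binders = sorted(t.subject for t in match(FACTS, relation="binds", obj="protein_X"))
print(binders)  # ['compound_A', 'compound_B']

# Everything directly connected to protein_Y, in either direction.
linked = sorted({t.subject for t in match(FACTS, obj="protein_Y")}
                | {t.obj for t in match(FACTS, subject="protein_Y")})
print(linked)  # ['compound_B', 'pathway_P', 'protein_X']
```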

1% Promising to 15% Promising

A pharma organisation putting this idea to the test is the R&D arm of French-headquartered international pharmaceutical company Servier. Specialising in oncology, neuroscience and immuno-inflammation, cardiometabolic and venous diseases, and generics, Servier is a multi-billion-dollar company dedicated to being an agile, digital performer.

The company’s R&D function, Institut de Recherches Servier, had long experience working with small molecules. Historically, it would look for promising candidates for drug applications (defined as candidate molecules relative to a predefined Servier target) in a vast database containing millions of such compounds.

Thierry Dorval is the Institute’s Head of Data Sciences and Data Management. “Here, we are trying all together to bring maximum data and generate knowledge to support our decision-making,” he says. “That means that when a decision is taken regarding the selection of a specific compound, or a specific antisense oligonucleotide, or specific mechanism of infection, we are bringing it to the experts with any useful insights we have generated.”

The data, he says, comes from internal knowledge as well as external collaborations, consortia, and third parties. Before adopting graph technologies, the team had to comb painstakingly through the data and, despite their efforts, uncovered promising candidates at a success rate of less than 1%.

With its new graph and knowledge graph approach, it is now achieving a consistent success rate of 15%, an increase of at least 1,400%. And that’s across a focused dataset of around 1,000 small molecules instead of the unwieldy 1 million. That’s due to the implementation of its new graph-based decision support tool, Pegasus. By efficiently sifting therapeutic targets to identify the most relevant screening modalities, it helps Servier’s lab team design more appropriate experiments.
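The headline figures can be checked with simple arithmetic. The Python sketch below assumes that “success rate” means the share of evaluated molecules that turn out to be promising candidates, and it uses exactly 1% as the old rate; both are assumptions. Because the article puts the old rate below 1%, the computed 1,400% is a lower bound on the true increase.

```python
def relative_increase_pct(old_pct: float, new_pct: float) -> float:
    """Percentage increase when a rate moves from old_pct% to new_pct%."""
    return (new_pct - old_pct) / old_pct * 100


def expected_hits(n_molecules: int, rate_pct: float) -> float:
    """Expected promising candidates among n_molecules at a rate of rate_pct%."""
    return n_molecules * rate_pct / 100


# Assumed baseline of 1% is an upper bound: the old rate was below 1%.
print(relative_increase_pct(1, 15))   # 1400.0
print(expected_hits(1_000, 15))       # 150.0 from a focused set of ~1,000
print(expected_hits(1_000_000, 1))    # 10000.0 at most, from the 1 million
```

Under that reading, the focused approach finds roughly one promising candidate for every seven molecules examined, against one in more than a hundred before.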

This follows the creation, using graph-based techniques and tools, of a library of small molecules and their relationships, built from a wide range of heterogeneous, pre-existing data, including open-source medical knowledge banks.
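One common way to build such a library is to load records from each source into a shared graph, using a stable identifier to decide when two records describe the same entity. The Python sketch below is a hypothetical illustration, not Servier’s pipeline: the source names, identifiers, and fields are invented, the first source to supply a field wins any conflict, and every node and edge records which sources contributed to it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical records from two heterogeneous sources describing overlapping entities.
INTERNAL = [
    {"id": "CMPD-001", "type": "compound", "mol_weight": 342.4},
    {"id": "CMPD-002", "type": "compound", "mol_weight": 512.9},
]
PUBLIC = [
    {"id": "CMPD-001", "type": "compound", "assay": "kinase panel"},
    {"id": "PROT-77", "type": "protein", "name": "target protein"},
]
# Hypothetical relationships reported by each source: (source, from_id, relation, to_id).
LINKS = [
    ("internal", "CMPD-001", "binds", "PROT-77"),
    ("public", "CMPD-001", "binds", "PROT-77"),
    ("public", "CMPD-002", "similar_to", "CMPD-001"),
]


def build_graph(sources: Dict[str, List[dict]], links: List[Tuple[str, str, str, str]]):
    """Merge records into nodes keyed by id and deduplicate edges, tracking provenance."""
    nodes: Dict[str, dict] = {}
    provenance: Dict[str, Set[str]] = defaultdict(set)
    for source_name, records in sources.items():
        for record in records:
            node = nodes.setdefault(record["id"], {})
            for key, value in record.items():
                node.setdefault(key, value)  # earlier sources win on conflicting fields
            provenance[record["id"]].add(source_name)
    edges: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for source_name, src, rel, dst in links:
        edges[(src, rel, dst)].add(source_name)  # one edge, many supporting sources
    return nodes, provenance, edges


nodes, provenance, edges = build_graph({"internal": INTERNAL, "public": PUBLIC}, LINKS)
print(sorted(provenance["CMPD-001"]))  # ['internal', 'public']
print(len(edges))                      # 2
```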

Graphs: The Way to Generate Knowledge and Information Out of Basic Data

Graphs are key to this approach. They are a powerful way to capture even the most complex reality as a set of ‘nodes’ (entities) that hold useful information as attributes and that also sit in a network of connections, or ‘relationships’, with other nodes.
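A minimal sketch of that idea, assuming a simple in-memory property graph in Python: each node carries attributes (here a hypothetical `kind` and molecular weight), and each relationship has a type. The query uses both, picking out compounds under the 1,000 atomic-mass-unit small-molecule threshold that have a `BINDS` relationship to a given protein. All names and values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    kind: str  # hypothetical node label, e.g. "compound" or "protein"
    attrs: Dict[str, float] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    rel_type: str  # relationship type, e.g. "BINDS"
    target: str


# Hypothetical nodes and relationships.
NODES = {n.node_id: n for n in [
    Node("mol_1", "compound", {"mol_weight": 420.5}),
    Node("mol_2", "compound", {"mol_weight": 1350.0}),
    Node("mol_3", "compound", {"mol_weight": 298.1}),
    Node("prot_X", "protein"),
]}
EDGES: List[Edge] = [
    Edge("mol_1", "BINDS", "prot_X"),
    Edge("mol_2", "BINDS", "prot_X"),
    Edge("mol_3", "SIMILAR_TO", "mol_1"),
]


def small_binders(protein_id: str, max_weight: float = 1000.0) -> List[str]:
    """Compounds lighter than max_weight with a BINDS relationship to protein_id."""
    hits = []
    for e in EDGES:
        if e.rel_type == "BINDS" and e.target == protein_id:
            node = NODES[e.source]
            weight = node.attrs.get("mol_weight", float("inf"))
            if node.kind == "compound" and weight < max_weight:
                hits.append(node.node_id)
    return sorted(hits)


print(small_binders("prot_X"))  # ['mol_1']
```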

Dorval comments: “Graph is about relationships. It really focuses on how entities are linked together rather than the entities themselves. Typically, when we are using graphs for research we are talking about entities like drugs, small molecules, proteins, and pathways. What really matters to us is less the protein itself than the way these proteins are interacting together, the way the small molecules are interacting with those proteins, and the ways the small molecules are similar to each other. Really, graphs are the way to choose data to generate knowledge and information out of our basic data.”
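Similarity between small molecules is often represented in practice by adding similarity relationships between compounds. The Python sketch below assumes each molecule has already been reduced to a set of structural-feature identifiers, a toy stand-in for the chemical fingerprints that cheminformatics toolkits normally compute. It scores pairs with the Tanimoto (Jaccard) coefficient and links those at or above a threshold. The molecule names, feature sets, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical feature sets standing in for binary fingerprints.
FINGERPRINTS: Dict[str, FrozenSet[int]] = {
    "mol_1": frozenset({1, 2, 3, 4}),
    "mol_2": frozenset({1, 2, 3, 5}),
    "mol_3": frozenset({6, 7, 8}),
}


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two bit sets: shared bits divided by total distinct bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fps: Dict[str, FrozenSet[int]],
                     threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (mol, mol, score) for every pair whose similarity meets the threshold."""
    edges = []
    for (m1, f1), (m2, f2) in combinations(sorted(fps.items()), 2):
        score = tanimoto(f1, f2)
        if score >= threshold:
            edges.append((m1, m2, round(score, 3)))
    return edges


print(similarity_edges(FINGERPRINTS))  # [('mol_1', 'mol_2', 0.6)]
```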

A key figure in architecting all this is Dorval’s colleague Jeremy Grignard, a Data and Research Scientist at the organisation. “It represents the different data we are handling, as well as the different algorithms. We can query the graph to keep finding answers about all the biological questions we want to ask,” he has said publicly. “Because the data model underlying the graph can evolve depending on the questions we can add from Servier project teams, graph representation is very flexible and efficient for this.”
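To illustrate why an evolving data model is easy to accommodate in a graph, the hypothetical Python sketch below starts with one relationship type and later adds another, the kind of change a new project question might prompt, without any schema migration. The same traversal function then answers a two-hop question spanning both types. Entity and relationship names are invented; in a graph database this would normally be written in that database’s own query language.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# Adjacency index: (node, relationship type) -> set of neighbouring nodes.
graph: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)


def add(src: str, rel: str, dst: str) -> None:
    """Record a directed, typed relationship; new types need no schema change."""
    graph[(src, rel)].add(dst)


def follow(start: Set[str], path: List[str]) -> Set[str]:
    """Follow a sequence of relationship types outward from the start nodes."""
    current = set(start)
    for rel in path:
        current = {dst for node in current for dst in graph.get((node, rel), set())}
    return current


# Original (hypothetical) model: compounds bind proteins.
add("compound_A", "BINDS", "protein_X")
add("compound_B", "BINDS", "protein_Y")

# A new question arrives: which pathways might each compound affect?
# The new relationship type is simply added alongside the existing one.
add("protein_X", "PART_OF", "pathway_P")
add("protein_Y", "PART_OF", "pathway_Q")

print(sorted(follow({"compound_A", "compound_B"}, ["BINDS", "PART_OF"])))
# ['pathway_P', 'pathway_Q']
```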

Ultimately, this is the value that graphs offer the researcher: a way to navigate often sparse data and pick out the valuable links within it. As a result, Servier is convinced that graph technologies have the potential to revolutionise drug discovery and development by organising complex data and generating knowledge to support decision-making.

This links back to the warning about not wasting time in drug discovery on non-viable search paths: it can be hugely valuable to test hypotheses to see why they are ‘wrong’. After all, a fundamental tenet of the scientific method is learning from failures and refutations.
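In graph terms, one way to see why a hypothesis holds or fails is to look for the chain of relationships connecting a compound to a target: a chain explains a selection, and the absence of one is itself informative. The Python sketch below is a hypothetical illustration rather than a description of how Pegasus works. It uses breadth-first search over invented, labelled edges to return the shortest such chain, or `None` when no chain exists.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical labelled edges: node -> list of (relationship, neighbour).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "compound_A": [("SIMILAR_TO", "compound_B")],
    "compound_B": [("BINDS", "protein_X")],
    "protein_X": [("INTERACTS_WITH", "target_T")],
    "compound_C": [("BINDS", "protein_Z")],
    "protein_Z": [],
    "target_T": [],
}


def explain(start: str, goal: str) -> Optional[List[str]]:
    """Shortest chain of relationships from start to goal (BFS), or None if unreachable."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, trail = queue.popleft()
        if node == goal:
            return trail
        for rel, nxt in EDGES.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, trail + [f"-[{rel}]->", nxt]))
    return None


print(" ".join(explain("compound_A", "target_T")))
# compound_A -[SIMILAR_TO]-> compound_B -[BINDS]-> protein_X -[INTERACTS_WITH]-> target_T
print(explain("compound_C", "target_T"))  # None
```

In the second case, the missing path is the answer to the “why not?” question: nothing in the recorded data connects `compound_C` to the target.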

Let’s leave the last word to Grignard: “Graphs mean that the compound has been selected rationally, so you know why your hit has been selected, but just as importantly you can ask why it didn’t. And that gives us really useful information and knowledge to the project about what worked and what didn't work.”

Sounds like Servier may be on to something truly pioneering.

Dr Alexander Jarasch is Technical Consultant for Pharma and Life Sciences at graph database and analytics leader Neo4j.
