How effective is natural language processing in clinical trials?

by ,

Igor Kruglyak, senior advisor at the global IT service provider Avenga and Michael DePalma, founder and president of digital specialists Pensare, LLC, examine the use of natural language processing (NLP) for investigator recruitment acceleration. 

NLP is one of the fastest adopted business technologies in the world, only two years after Google first released its pre-trained Bidirectional Encoder Representations from Transformers (BERT). BERT provides a state-of-the-art output on 11 NLP tasks, and has a deeper sense of language context than any other language model developed before. In the last three years, NLP has made more progress than any other subfield of AI and estimates predict that the worldwide NLP market size will reach $43 billion by 2025, compared to $11 billion in 2019. 

Simply put, NLP enables computers to analyse written or spoken human language, to extract its meaning and to obtain insights from these data. The pharmaceutical industry has started utilising the technology, for example, to analyse medical data, to ensure pharmacovigilance and to enhance medical care with health assistants.

Patient Recruitment in Clinical Trials

Historically, one of the most acute issues that hamper the success of clinical trials, is inefficient recruitment. Globally, 86% of clinical trials fail to recruit patients on time. Although the reasons for such high failure rates are diverse and complex, insufficient resources and the time-consuming nature of the process are considered among the most significant negative-impact factors.

According to Tufts Center for the Study of Drug Development (Tufts CSDD), a company's ability to quickly identify clinical investigators, often among doctors and healthcare influencers, is tightly connected with successful patient recruitment. One study concluded that 1 in 10 investigative sites failed to enrol a single patient in a given clinical trial, and less than 60% met or exceeded their target enrolment levels. Therefore, finding reputable investigators who can source eligible patients to participate is crucial for the success of clinical research as a whole. But how can this process be improved?

Practical Steps of the NLP-Featured Approach to Investigator Recruitment

The healthcare sector has always been of particular interest to data scientists. Many consider it a near-perfect domain to showcase NLP's value.  By various estimates, 80% of medical data (i.e., from medical records, imaging devices, sensors, wearables, health documents, and articles) remains unlabelled and untapped after it was created. However, all this unstructured data when sorted, labeled, and cleared has an enormous potential to disrupt clinical research. 

Modern NLP techniques help to process and analyse clinical documentation, extract the required information, and automate much of the work that researchers previously had to do themselves. Some of the techniques that have proved to be especially effective and time-saving are:

Following topic modelling and relationship extraction, impact factor algorithms can be utilised to measure the relative importance of authors that have researched specific topics. After analysing links between articles, a numerical weighting to every article in a set of articles, and on a specific topic, can be assigned. In this way, it is possible to measure a publications' relevance. Moreover, this technique defines the importance of every scientific article and every doctor who has published an article by measuring the publication's quotations from other articles.

When designing clinical trials, these NLP techniques can be used by researchers to screen articles published by investigators and find those authors/investigators with substantial experience in specific disease states. This can be achieved by placing the connections between authors and their relative weight within a specific dataset. For instance, taking into account the correlation between a number of held trials and enrolment rates, it makes sense to filter out and then include the investigators that have prior experience in participating in clinical research within a particular data set.

Combining NLP and Social Graphs

Taking a similar social graph and combining it with NLP allows the visualisations of interconnections between article authors. The most well-known social graph was built by Facebook. It connects over 2.7 billion monthly active users and is used for real-life tracking and micro-targeting.

In the life science industry, this opens up the opportunity to connect with investigators-influencers that have conducted a considerable amount of research on a topic and who can then provide a valuable contribution to a study. Visualised in a heat map, it enables employees of clinical research organisations to understand with just one glance an author’s authority on a certain topic. It can also be used to see the connections between investigators and invite previously not invited ones (for example, if they have conducted research on a corresponding topic) to participate in a clinical study. This knowledge can help sponsor-companies to increase their international and domestic market penetration as well as to spend less money on marketing because they can allocate their resources more effectively.

NLP in practice 

In order to concentrate on core competencies and utilise the full power of NLP and social graphs, it is often advisable for clinical organisations to make use of experienced product development outsourcers. One company that was able to speed up patient recruitment by implementing custom-built social graphs is Avenga customer QPharma. Custom social graphs were used to create a database of relevant key opinion leaders. 

However, the ultimate value of NLP in clinical trials is not limited to effective investigator recruitment. Applied to medical data, NLP can help with automating manual work, reducing the number of errors and time spent, facilitate billing processes by extracting information from unstructured notes and enrich clinical decision support systems. This can be helpful at basically every stage of the process of drug development as it tremendously speeds up many time-consuming tasks, creates the basis for insight-driven decisions and allows researchers to focus on their actual work instead of looking for information in count- and endless bodies of text. 

Back to topbutton