Measuring relevance

Measuring relevance in literature review materials.

Here is a summary of my conversation with James Reilly, CEO of Latent Knowledge, about literature relevancy and why it is a critical topic in academic research today.

Rob: Hello James, thank you for joining me today to discuss literature relevancy as it pertains to academic research. Just as background for readers, you are currently involved in a startup in this field, right?

James: Yes, thanks Rob. I’m the founder of Latent Knowledge, so I’ve definitely been thinking about this topic for some time, specifically the challenges that come with putting together a literature review when developing a research hypothesis.

Rob: Just to get some background together first, what exactly is relevancy in the context of a literature review?

James: Well, relevancy can change as you investigate various sub-topics in your review scope. A sub-topic concept found within a larger publication, such as a measurement technique, instrument, or analysis method, can be defined across all intended project variables, or only within certain predisposing dependencies. When relevant concepts carry specific predisposing dependencies, such as use case requirements (principles of chemistry within materials, animals, or humans), these “relevancies found or discovered in a literature search” can be hidden at first glance.

Rob: So what do you think a major driving factor could be behind that?

James: I think intention is the driving factor, most notably the environmental conditions, the application, and the type of project: theoretical versus experimental, clinical trials versus patents versus academic, professional research versus undergraduate research. They all require different types of literature review and varying degrees of rigor. With these different types comes a need to approach each using a different strategy. PRISMA introduces a collective review of literature review search strategies, for example. Relevancy in this context is determined by a group effort that iteratively introduces new keywords or key phrases, which a group member proposes and a supervisor, group lead, or group consensus validates. This proposal and validation activity is iterated systematically until all potentially relevant concepts are considered. Concepts are deemed irrelevant because they are not connected to the primary subject matter or research thesis being considered, or because they are fundamentally outside the scope of the systematic literature review. This PRISMA group curation method is often used to determine what should be considered within or outside the scope of the literature review.

Rob: So then with each discrete strategy for each intention, do you think there might be some cross-pollination or bleed-through between them?

James: Well, actually the various concepts in that approach should be seen more as a spectrum. Within a literature review there is a primary subject matter or research thesis, which is broken into sub-components and additional organizational structure for concepts. Knowledge ontologies and subject matter taxonomies attempt to define the structure of all knowledge in terms of connectedness and hierarchical nesting. These result in a rigid structure that requires maintenance as new knowledge is created. Therefore, relevancy in a literature review could be guided by a pre-existing knowledge structure. A novel project that is multi-, inter-, or transdisciplinary by nature requires connecting concepts found in disparate branches of ontologies and taxonomies, which is challenging to do with current computational techniques.

Rob: I can see how this would be really challenging for a researcher to juggle, on top of exploratory data analysis and methodology development!

James: Yes, mastery of disparate and specific knowledge differs from person to person, which is why a collaborative approach is important. Group approaches to literature search strategy help overcome this human condition.

Rob: If a collaborative approach is critical, how do you see everyone staying within scope and focus? Being “on the same page” so to speak?

James: I think the key here is to maintain common metrics, and to measure relevancy using those metrics as a means to gauge improvement. I have a background in the science of expertise and competition, and have a lot of experience doing this type of thing in sports. I can tell you that athletic performance has a lot of common metrics used to gauge improvement, and applying a similar approach could really help produce better results in collaborative background literature reviews.

Rob: What do you think the definition of relevancy would be in the context of collaborative literature review?

James: Relevancy is important for delivering the most helpful results from a database to a user in response to their search query. Semantic matching of database content is not new. Relevancy has traditionally been represented as a binary judgment on analog inclusion factors that are indexed and searchable, such as Medical Subject Headings, which are indexed alongside the title and abstract and are defined by the authors and editors. Relevancy drives user experience in a search engine or literature search context. It determines the order in which information is presented to the end user of software that facilitates internet search and content discovery.

Rob: What do you think some of those metrics would be?

James: I think PageRank, the popularity of a webpage by web traffic through the site, coupled with keyword syntax matching, has been the primary metric in recent history. Co-citation analysis, co-word analysis, and heterogeneous bibliographic networks, which are differentials between references, are also very important. Also, keyword analysis, which could use analog, syntax-driven matching, could benefit from fuzzy logic to handle grammatical and taxonomical differences when quantifying relevancy in article abstracts or paragraphs of text.
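
As an illustration of one of the metrics James mentions, co-citation analysis can be sketched in a few lines: two references are co-cited each time a single paper lists both of them, and a higher count suggests the two are more closely related. The function name `cocitation_counts` and the toy data are hypothetical, chosen only for this sketch; nothing here is taken from Latent Knowledge's software.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists maps a citing paper's ID to the list of reference IDs
    it cites (hypothetical toy structure for this sketch).
    """
    counts = Counter()
    for refs in reference_lists.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Toy example: three citing papers and the references each one lists.
papers = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
counts = cocitation_counts(papers)
for pair, n in counts.most_common():
    print(pair, n)
```

In this toy data, references A and B appear together in two reference lists, as do B and C, while A and C appear together only once.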

Rob: This is really interesting. How can we do this leveraging a fuzzy logic system?

James: Well, we start by analyzing the relationship between keywords and the research topic. We could also assess the frequency and distribution of keywords in the text. Fuzzy logic offers a powerful methodology for assessing the frequency and distribution of keywords in an abstract, surpassing traditional binary approaches. By considering the concept of partial matches and degrees of relevance, it allows for a more nuanced analysis of keyword occurrence. This enables a more comprehensive understanding of the keyword distribution, accounting for variations in language, context, and terminology usage. With fuzzy logic, researchers and analysts can gain deeper insights into the keyword landscape within an abstract, enhancing their ability to extract meaningful information and make informed decisions.
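
To make the idea of partial matches concrete, here is a minimal sketch of graded keyword matching against an abstract. It is an assumption-laden illustration, not the method James or LitView uses: it treats the string similarity ratio from Python's standard `difflib` module as a degree of membership between 0 and 1, takes the best match over all words in the abstract (a fuzzy OR), and discards degrees below a cut-off. The names `membership` and `fuzzy_keyword_scores`, the 0.8 threshold, and the sample abstract are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def membership(keyword, token):
    """Degree in [0, 1] to which a token matches a keyword (1.0 = identical)."""
    return SequenceMatcher(None, keyword.lower(), token.lower()).ratio()


def fuzzy_keyword_scores(keywords, abstract, threshold=0.8):
    """Return the best partial-match degree for each keyword.

    Degrees below the (assumed) threshold are treated as no match, so
    unrelated words do not contribute noise.
    """
    tokens = re.findall(r"[a-z]+", abstract.lower())
    scores = {}
    for kw in keywords:
        # Fuzzy OR: the keyword matches as well as its best-matching token.
        best = max((membership(kw, t) for t in tokens), default=0.0)
        scores[kw] = best if best >= threshold else 0.0
    return scores


abstract = "We apply spectroscopic methods to measure protein folding."
scores = fuzzy_keyword_scores(["spectroscopy", "protein", "polymer"], abstract)
relevance = sum(scores.values()) / len(scores)  # mean degree over keywords
print(scores, relevance)
```

A binary, exact-match approach would score "spectroscopy" as absent from this abstract; the graded version gives it a high partial match against "spectroscopic", while "polymer" still scores zero.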

Rob: Do you implement all of these concepts into your current release of Latent Knowledge’s search engine, LitView?

James: Currently we are vectorising relevant language in each abstract as a way to measure similarity between papers returned in a literature search. But we definitely see this transforming into a more collaborative process where relevancy is better defined and literature abstracts are compared from different perspectives. As I explained earlier, a collaborative approach is critical to optimising improvement in the literature review process, and so we are really pushing hard to implement novel ways to improve this moving forward.
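
For readers unfamiliar with vectorising text, the sketch below shows the general idea with the simplest possible representation: each abstract becomes a vector of word counts, and two abstracts are compared by the cosine of the angle between their vectors. This is only an assumed stand-in to illustrate the concept; the interview does not say which vectorisation LitView uses, and the function names and sample abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Turn text into a sparse word-count vector (a bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors, in [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty abstract is similar to nothing
    return dot / (norm_a * norm_b)


query = vectorise("protein folding measured with spectroscopy")
related = vectorise("spectroscopy of protein folding pathways")
unrelated = vectorise("traffic flow in urban road networks")

print(cosine_similarity(query, related))
print(cosine_similarity(query, unrelated))
```

Abstracts that share vocabulary score closer to 1 and abstracts with no words in common score 0, which gives a simple way to rank the papers returned by a search against one another.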

Rob: Awesome James, looking forward to hearing more about this in the future!

If you have comments or questions for James or Rob, please send your enquiries to jamie@latentknowledge.co or rob@weburban.com.

Latent Knowledge
Building powerful research tools with artificial intelligence and natural language processing.
https://latentknowledge.co/