em360tech image

Scientists recently developed a new algorithm to detect fraudulent and suspicious activities in computational biology.

The algorithm could be particularly beneficial for individuals or organisations working with sensitive data such as a pharmaceutical company developing a drug or involved in gene editing.

Called the Triangle-Densest-k-Subgraph (TDkS) algorithm, it is designed to improve graph mining capabilities in tightly connected clusters within large networks such as social media networks, financial networks, biological systems, transportation or communication networks. 

Often such networks involve large data sets which are highly susceptible to manipulation or misrepresentation. 

This is why Professor Nikolaos Sidiropoulos from the University of Virginia School of Engineering and Applied Science developed this new cluster-based algorithm to enhance graph mining capabilities.

It’s a new relaxation algorithm that essentially improves the way networks are analysed like social media connections to identify fraudulent activities more accurately.

Identifying Suspicious Activities

The new approach to spotting fraudulent activities looks at triangles of connections instead of observing individual pairs of points, such as two people who often communicate on social media.

The Triangle-Densest-k-Subgraph algorithm scrutinizes groups of three points where each pair is linked, capturing more tightly knit relationships.

For instance, it can identify suspicious activities within tightly connected clusters of elements, such as small groups of friends or clusters of genes.

Overall, the new algorithm improves the graph mining capabilities by solving the problem of locating triangle-dense subgraphs in large networks by using a technique called submodular relaxation.

The submodular relaxation technique simplifies the problem without sacrificing important details, making it more efficient to solve. 

As a result, the Triangle-Densest-k-Subgraph algorithm (TDkS) can help organisations uncover hidden patterns and insights within complex networks, such as those found in social media, biology, and fraud detection.

They can derive meaningful and reliable insights from large datasets without neglecting minor, often overlooked details or compromising accuracy. 

What is Graph Mining?

Graph mining is a toolkit that comprehends and extracts relevant information from complex networks or graphs. These could include real-world applications like social networks, biological systems, transportation networks, and communication networks.

what is graph mining
TAMER YILMAZ | Adobe Stock

Essentially, graph mining is a specialised type of data mining and machine learning that centres around interpreting data represented as graphs or networks. 

While traditional data mining and machine learning (ML) techniques often deal with structured or tabular data, graph mining is specifically designed to handle data with complex relationships and interdependencies.

In graph mining, the data is represented as a graph, where nodes represent elements (like people, genes, or objects) and edges represent the relationships between them. 

Researchers can discover valuable insights and knowledge by analysing the structure and patterns within these graphs. 

Deepayan Chakrabarti, a researcher defines graph mining in Springer Nature journal as a set of tools and techniques used to analyse the properties of real-world graphs and predict how the structure and properties of a given graph might affect some applications. 

Chakrabarti added that it’s also used to develop models that can generate realistic graphs that match the patterns found in real-world graphs of interest.

The new algorithm improves the graph mining capabilities by solving the problem of locating triangle-dense subgraphs in large networks by using the submodular relaxation technique.

The TDkS algorithm can help organizations gain valuable insights from their data, improve their products and services, and mitigate risks.

The research behind the Triangle-Densest-k-Subgraph algorithm (TDkS) was published in the journal – IEEE Transactions on Knowledge and Data Engineering in September 2024. 

The study was a collaboration between Sidiropoulos and Aritra Konar, an assistant professor of electrical engineering at KU Leuven in Belgium.

The researchers said that the research proves TDkS is NP-hard and is not amenable to efficient approximation, in the worst-case, implying that the TDkS problem is computationally quite challenging to . 

“By judiciously exploiting the structure of the problem, we propose a relaxation algorithm for the purpose of obtaining high-quality, sub-optimal solutions,” the researchers said.