Application of SAGA for Matching Parsed Biological Literature

SAGA stands for Subgraph Index for Approximate Graph Alignment. It is an efficient tool for approximate subgraph matching that lets users match a query graph against a large database of graphs. At the core of SAGA is a flexible graph distance model that incorporates approximate node matching as well as approximate structure matching. An indexing method is used to speed up the matching process. Applications of SAGA include querying and comparing pathways, and querying parsed biomedical literature databases to find similar documents.
 
In this application, we use SAGA to query gene network graphs generated by parsing biological literature datasets. One graph is generated for each document: the nodes are genes, and an edge between two genes represents a sentence that mentions both of them. Comparing these graphs provides a method for detecting similar documents.
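The graph construction described above can be illustrated with a minimal sketch. It assumes each document has already been parsed into a list of sentences, each given as the list of gene names it mentions. The function name build_gene_graph and the input layout are hypothetical, and counting one edge per co-mentioning sentence is an assumption rather than a confirmed detail of the actual parsing pipeline.

```python
from collections import Counter
from itertools import combinations


def build_gene_graph(sentences):
    """Build a gene co-mention graph for one document.

    sentences: iterable of iterables of gene names, one per sentence
               (hypothetical input format, not the real parser output).
    Returns (nodes, edges), where edges maps a sorted gene pair to the
    number of sentences mentioning both genes (assumed edge semantics).
    """
    nodes = set()
    edges = Counter()
    for genes in sentences:
        unique = sorted(set(genes))  # ignore repeated mentions within a sentence
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1  # one edge per sentence mentioning both genes
    return nodes, edges


# Toy document with three sentences
nodes, edges = build_gene_graph([
    ["BRCA1", "TP53"],
    ["TP53", "BRCA1", "MDM2"],
    ["MDM2"],
])
```

Keeping a count per gene pair lets several sentences contribute several edges between the same two genes, which is one way a small graph could have more edges than node pairs.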
 
The database currently has 48,445 documents. On average, there are 5.0 nodes and 18.76 edges per graph. Click here for a list of documents. The entire database can be downloaded by clicking on this link.
 
For this application, we employ a new scoring model on top of SAGA. The details of the scoring model can be found via this link.
 

Please Enter the Query Document's ID (the query document must come from the list of documents):

Document ID:

Enter the Cutoff for Percentage of Matched Nodes: 

%
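As a rough illustration of how a cutoff on the percentage of matched nodes might be applied, the sketch below keeps a candidate match only if enough of the query graph's nodes were matched. The function name passes_cutoff is hypothetical, and measuring the percentage relative to the query graph's node count is an assumption; the definition SAGA actually uses may differ.

```python
def passes_cutoff(query_nodes, matched_nodes, cutoff_percent):
    """Return True if the share of query nodes that were matched
    is at least cutoff_percent (assumed definition of the cutoff)."""
    query = set(query_nodes)
    if not query:
        return False  # an empty query graph cannot satisfy any cutoff
    matched = len(query & set(matched_nodes))
    return 100.0 * matched / len(query) >= cutoff_percent


# Three of four query genes matched: 75 percent
keep = passes_cutoff({"A", "B", "C", "D"}, {"A", "B", "C"}, 75)
drop = passes_cutoff({"A", "B", "C", "D"}, {"A", "B", "C"}, 80)
```

A higher cutoff returns fewer, closer matches; a lower cutoff admits documents that share only part of the query's gene set.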


If you have any questions or suggestions, please contact ytian [at] umich [dot] edu.

Last updated July 17, 2006.