SAGA stands for Subgraph Index for
Approximate Graph Alignment. It is an efficient tool for
approximate subgraph matching. SAGA allows users to match a query graph against
a large database of graphs. At the core of SAGA is a flexible graph distance
model that incorporates approximate node matching as well as approximate
structure matching. A powerful indexing method is implemented to speed up the
matching process. Some applications of SAGA include querying/comparing pathways
and querying parsed biomedical literature databases to find similar documents.
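
To make the idea of a distance that combines node mismatches with structural mismatches concrete, here is a minimal sketch. It is not SAGA's actual distance model: the function name `toy_distance`, the weights `w_node` and `w_struct`, and both penalty definitions are assumptions chosen only for illustration.

```python
def toy_distance(query_edges, target_edges, mapping,
                 label_q, label_t, w_node=1.0, w_struct=1.0):
    """Toy distance for a given node mapping (hypothetical, not SAGA's model).

    mapping maps every query node to a target node.
    """
    # Node penalty: count mapped pairs whose labels differ.
    node_cost = sum(1 for q, t in mapping.items() if label_q[q] != label_t[t])

    # Structure penalty: count query edges with no matching target edge.
    target = {frozenset(e) for e in target_edges}
    struct_cost = sum(
        1 for u, v in query_edges
        if frozenset((mapping[u], mapping[v])) not in target
    )
    return w_node * node_cost + w_struct * struct_cost


# Illustrative data only.
query_edges = [(1, 2), (2, 3)]
label_q = {1: "A", 2: "B", 3: "C"}
target_edges = [("x", "y")]
label_t = {"x": "A", "y": "B", "z": "D"}
mapping = {1: "x", 2: "y", 3: "z"}

distance = toy_distance(query_edges, target_edges, mapping, label_q, label_t)
print(distance)
```

A smaller value means a closer match; a real matcher would also have to search over candidate mappings, which this sketch leaves out.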
In this application, we use SAGA to query gene network graphs generated by
parsing biological literature datasets. A graph is generated for each document:
the nodes are genes, and each edge represents a sentence that mentions both of
the genes it connects. Comparing these graphs provides a method for detecting
similar documents.
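
The graph construction described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the database: the function name `build_gene_graph`, the input format (one set of gene names per sentence), and the choice to store sentence indices on each gene pair are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_gene_graph(sentences):
    """Build a gene co-mention graph for one document (hypothetical helper).

    sentences: list of sets, each holding the gene names found in one sentence.
    Returns (nodes, edges), where edges maps a sorted gene pair to the indices
    of the sentences that mention both genes.
    """
    nodes = set()
    edges = defaultdict(list)
    for idx, genes in enumerate(sentences):
        nodes.update(genes)
        # Every pair of genes in the same sentence gets an edge entry.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)].append(idx)
    return nodes, dict(edges)


# Illustrative document with three parsed sentences.
doc = [{"BRCA1", "TP53"}, {"TP53", "MDM2", "BRCA1"}, {"EGFR"}]
nodes, edges = build_gene_graph(doc)
print(sorted(nodes))
print(edges)
```

A gene mentioned alone in a sentence, such as `EGFR` here, becomes a node with no edges.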
The database currently has 48,445 documents. On average, there are 5.0 nodes
and 18.76 edges per graph. Click
here for a list of documents. The entire database can be downloaded by
clicking on this link.
For this application, we employ a new scoring model on top of SAGA. The details
of the scoring model can be found via
this link.
If you have any questions or suggestions, please contact ytian [at] umich [dot] edu.
Last updated July 17, 2006.