Implementation of a Graph-Based API to Expose Knowledge Graphs

Intern partners Emily Zhang and Omar Aziz worked with Katie Christensen and Dr. Guangrong Qin of the Shmulevich Lab this summer on a project focused on developing an API that exposes knowledge graphs. Read about their research below!

Abstract

During this project, we built an API with the FastAPI framework that clients can use to submit queries. The API is connected to a Neo4j graph database hosted on a Google Cloud server. The database uses nodes and relationships to store drug, disease, and gene data.

Background/Aim

This project extends the Translator project, a collaboration involving the Shmulevich Lab, the Scripps Institute, and others. Previous iterations used a relational database. In this project, we built a graph-based API that places greater emphasis on relationships. We hope this will help clients discover new connections between drugs, diseases, and genes.

Knowledge Graphs

Data was gathered from online resources such as PubMed, SIGNOR2, cancer.gov, and DrugBank. The data then went through a standardization process that converted it into a consistent, usable format.

Our current database contains three knowledge graphs, each representing one type of relationship: Drug-Target, Gene-Gene, and Drug-Disease. In each knowledge graph, a subject entity acts on or affects an object entity. For example, in the Drug-Target knowledge graph, drugs are the subjects because they act on their targets. Subjects and objects are represented as nodes in Neo4j. The relationship between a subject and an object is defined by the predicate, which becomes the edge connecting the two nodes.
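The following minimal sketch makes the subject-predicate-object mapping concrete. It assumes each knowledge-graph row has already been reduced to a subject ID, an object ID, their Neo4j node labels, and a plain predicate name such as `affects`. The `Triple` class, the `triple_to_cypher` helper, the placeholder IDs, and the upper-case relationship type are hypothetical choices for illustration, not the project's actual loading code. The helper produces a parameterized Cypher `MERGE` statement that creates the two nodes (if missing) and the edge between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement (hypothetical row format)."""
    subject_id: str
    subject_label: str   # Neo4j node label for the subject, e.g. "Drug"
    predicate: str       # plain predicate name; becomes the relationship type
    object_id: str
    object_label: str    # Neo4j node label for the object, e.g. "Gene"


def _check_identifier(name: str) -> str:
    # Cypher cannot take labels or relationship types as parameters, so they
    # are interpolated into the string; allow only letters, digits, and "_".
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def triple_to_cypher(t: Triple) -> tuple[str, dict]:
    """Return a parameterized MERGE that creates both nodes and the edge."""
    subject_label = _check_identifier(t.subject_label)
    object_label = _check_identifier(t.object_label)
    rel_type = _check_identifier(t.predicate).upper()  # e.g. affects -> AFFECTS
    query = (
        f"MERGE (s:{subject_label} {{id: $subject_id}}) "
        f"MERGE (o:{object_label} {{id: $object_id}}) "
        f"MERGE (s)-[:{rel_type}]->(o)"
    )
    # IDs travel as ordinary query parameters rather than string interpolation.
    return query, {"subject_id": t.subject_id, "object_id": t.object_id}
```

For example, `triple_to_cypher(Triple("DRUG:001", "Drug", "affects", "GENE:001", "Gene"))` returns a query ending in `MERGE (s)-[:AFFECTS]->(o)` together with the parameters `{"subject_id": "DRUG:001", "object_id": "GENE:001"}`. Using `MERGE` instead of `CREATE` means a drug that appears in many rows maps to a single node.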

All three knowledge graphs share a harmonized format. Subjects and objects carry properties such as ID, name, synonyms, category, and ID prefix. Drug IDs were gathered using PubChem's PUG REST and the ChEMBL REST API; gene IDs were gathered from the NCBI database; disease IDs were gathered using the MONDO Ontology REST API. The predicates were standardized using the Biolink Model.
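A harmonized node record and predicate standardization step might look like the following minimal sketch. It assumes IDs are stored as CURIEs (a prefix, a colon, then a local ID). It derives the ID prefix from the ID rather than storing it as a separate field. It also uses a small, hypothetical `PREDICATE_MAP` to translate raw source predicates into Biolink Model predicates. The project's real mapping tables and field names are not described in this write-up, so the entries below are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from raw source predicates to Biolink Model predicates.
# These entries are illustrative; the project's actual mapping is not shown.
PREDICATE_MAP = {
    "inhibits": "biolink:affects",
    "treats": "biolink:treats",
    "interacts with": "biolink:interacts_with",
}


@dataclass
class Node:
    """Harmonized subject/object record with the properties listed above."""
    id: str          # CURIE, e.g. a MONDO, NCBIGene, or CHEMBL identifier
    name: str
    category: str    # e.g. "biolink:Disease"
    synonyms: list[str] = field(default_factory=list)

    @property
    def id_prefix(self) -> str:
        # The prefix is everything before the first colon of the CURIE.
        return self.id.split(":", 1)[0]


def standardize_predicate(raw: str) -> str:
    """Map a raw predicate string to its Biolink predicate, or fail loudly."""
    key = raw.strip().lower()
    if key not in PREDICATE_MAP:
        raise KeyError(f"no Biolink mapping for predicate {raw!r}")
    return PREDICATE_MAP[key]
```

Deriving `id_prefix` from `id` keeps the two values from drifting apart. Raising on unmapped predicates, instead of passing them through, makes gaps in the mapping visible during loading.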

API Development

Next, our team developed the API. An API (application programming interface) lets clients retrieve information from the database without modifying it directly, which helps keep the database secure. We built the API with the FastAPI Python framework and created a general JSON endpoint. The JSON format is standardized, so a client can fill in whatever they want to query. A Python script then parses the request, extracts the relevant fields, and inserts them into the corresponding database query. The query results are converted back into the original JSON format and returned to the client through the API endpoint.
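The parse, query, and reformat flow described above can be sketched in plain Python. This sketch assumes a hypothetical JSON request format with `subject`, `predicate`, and `object` keys, where each node gives a `category` and optionally an `id`. The project's actual schema is not shown. `run_query` is a stand-in for whatever executes Cypher against Neo4j, so the sketch runs without a database. In a FastAPI app, `handle_query` would be called from a POST route.

```python
from typing import Callable, Iterable

# Stand-in for the database call: takes (cypher, params), yields row dicts.
RunQuery = Callable[[str, dict], Iterable[dict]]


def _safe(name: str) -> str:
    # Labels and relationship types are interpolated, so allow simple names only.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_cypher(request: dict) -> tuple[str, dict]:
    """Translate the hypothetical JSON query format into parameterized Cypher."""
    subj, obj = request["subject"], request["object"]
    query = (
        f"MATCH (s:{_safe(subj['category'])})"
        f"-[r:{_safe(request['predicate']).upper()}]->"
        f"(o:{_safe(obj['category'])})"
    )
    params, filters = {}, []
    for var, node in (("s", subj), ("o", obj)):
        if "id" in node:  # an ID is optional; without one, match any node
            filters.append(f"{var}.id = ${var}_id")
            params[f"{var}_id"] = node["id"]
    if filters:
        query += " WHERE " + " AND ".join(filters)
    query += " RETURN s.id AS subject, type(r) AS predicate, o.id AS object"
    return query, params


def handle_query(request: dict, run_query: RunQuery) -> dict:
    """Parse the request, run it, and return results in the same JSON shape."""
    query, params = build_cypher(request)
    results = [
        {
            "subject": {"category": request["subject"]["category"], "id": row["subject"]},
            "predicate": row["predicate"],
            "object": {"category": request["object"]["category"], "id": row["object"]},
        }
        for row in run_query(query, params)
    ]
    # Echo the original request and attach the matching edges.
    return {**request, "results": results}

# With FastAPI, this could be exposed roughly as (hypothetical route path):
#   @app.post("/query")
#   def query(body: dict):
#       return handle_query(body, run_query)
```

Passing `run_query` in as a function keeps the JSON handling testable without a live Neo4j instance. For example, a request of `{"subject": {"category": "Drug", "id": "DRUG:001"}, "predicate": "affects", "object": {"category": "Gene"}}` becomes `MATCH (s:Drug)-[r:AFFECTS]->(o:Gene) WHERE s.id = $s_id RETURN ...` with the parameters `{"s_id": "DRUG:001"}`.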

Knowledge graph representation: