This summer, Lea Repovic and Sarah Feng interned with Dr. Andrew Magis in the Health Data Science Lab. They began their internship by learning about genome-wide association studies and polygenic risk scores as the foundation for their research. After settling on allergies and allergic diseases as their topic of study, they embarked on phase one of the project: mining the literature for genes and single-nucleotide polymorphisms (SNPs) associated with those conditions.
Phase two of Lea and Sarah’s project was analyzing data with Python. The metabolomic, proteomic, clinical, and diet data came from the Arivale database. They ran three correlation studies on this data: with single-nucleotide polymorphisms, with self-reported patient disease states, and with sex stratification. After these tests, the pair added another layer to their research: polygenic risk scores, which estimate an individual’s genetic predisposition to a certain disease.
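For readers unfamiliar with the idea, a polygenic risk score is commonly computed as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by that variant's effect size, and the products are added up. The short Python sketch below illustrates that general formula only. It is not Lea and Sarah's code; the function name `polygenic_risk_score` is hypothetical, and the effect sizes and allele counts are made-up numbers, not Arivale data or published results.

```python
import numpy as np


def polygenic_risk_score(dosages, effect_sizes):
    """Return one score per person: the sum over variants of dosage * effect size.

    dosages: array of shape (n_people, n_variants) holding risk-allele counts (0, 1, or 2).
    effect_sizes: array of shape (n_variants,) holding per-variant weights.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.shape[-1] != effect_sizes.shape[0]:
        raise ValueError("each person needs one dosage per variant")
    # Matrix-vector product: one weighted sum per row (person).
    return dosages @ effect_sizes


# Illustrative, made-up inputs: three variants, two people.
effect_sizes = [0.5, 0.25, -0.125]
dosages = [
    [0, 1, 2],  # person A
    [2, 2, 0],  # person B
]
scores = polygenic_risk_score(dosages, effect_sizes)
print(scores)
```

In practice the weights usually come from a genome-wide association study, and a raw score is interpreted relative to the distribution of scores in a reference population rather than on its own.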
Lea and Sarah found interesting results in their analyses, which they corroborated with previously published literature. Over the two months, the pair honed their skills in computational biology, using tools that supported a more comprehensive, data-driven approach to studying allergic diseases.