Genetic variation plays a substantial role in Alzheimer's disease (AD), with twin studies suggesting 58-79% heritability for AD¹. Decreased genomics costs have resulted in deeply phenotyped patient data sets, such as the one collected by the wellness startup Arivale. The >2000 analytes collected from >4800 Arivale patients offer molecular insights into complex diseases such as AD: clinically significant variants from recent meta-analyses of genome-wide association studies (GWAS) can be tested for association with each analyte. Laura Heath did this with a phenome-wide association study (PheWAS), but that method is underpowered for rare variants. I therefore used the Sequence Kernel Association Test-Optimal (SKAT-O), an optimized unified test that evaluates the variants in a gene jointly, on the same multi-omic, deep-phenotyping data to shed light on genetic pathways to the development of AD. Using SKAT-O, I too was able to associate genes with analytes, better understand possible causal pathways for the disease, and potentially provide targets for intervention. The results located variants found in Laura's study and revealed new ones, owing to the different method.


  1. Read relevant literature, including ISB-related papers (Laura's paper⁵ and the P100 paper²) and sources on PubMed and NCBI, to understand the processes behind SKAT-O and PheWAS.
  2. Used ISB GitHub repositories to understand the SKAT-O pipeline for genes.
  3. Developed Python code in Jupyter Notebook, relying primarily on Stack Overflow along with assignments, to manipulate Arivale data.
  4. Constructed a demographic table that includes measures of cardiovascular health along with APOE allele types.
  5. Identified genes and ran them through the SKAT-O pipeline.
  6. Created Manhattan plots to visualize gene-analyte associations.
  7. Located the variants with high predicted impact, using genes from the existing PheWAS research.
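
To illustrate what step 5 computes, here is a minimal sketch of the SKAT-O test statistic for one gene and one analyte. It is not the ISB pipeline itself. The function names, the toy data, the intercept-only null model (no covariates), and the Beta(1, 25) variant weights are all assumptions made for this example. SKAT-O combines a burden statistic (which sums variant scores before squaring) with a SKAT statistic (which squares each score before summing) as Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, evaluated over a grid of rho values. The p-value calculation is omitted here.

```python
import math


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at the minor allele frequency (assumed weighting).

    With a=1, b=25 the weight is largest for the rarest variants.
    """
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1) * (1 - maf) ** (b - 1)


def skat_o_statistics(genotypes, analyte, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} for one gene and one analyte.

    genotypes: one row per person, one column per variant in the gene,
               each entry the number of minor-allele copies (0, 1, or 2).
    analyte:   one measurement per person.
    """
    n = len(analyte)
    n_variants = len(genotypes[0])

    # Null model with an intercept only: residual = value minus the mean.
    mean_y = sum(analyte) / n
    residuals = [y - mean_y for y in analyte]

    # Weighted score for each variant: w_j * sum_i g_ij * residual_i.
    weighted_scores = []
    for j in range(n_variants):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2 * n)
        score = sum(g * r for g, r in zip(column, residuals))
        weighted_scores.append(beta_weight(maf) * score)

    q_burden = sum(weighted_scores) ** 2            # sum first, then square
    q_skat = sum(s * s for s in weighted_scores)    # square first, then sum
    return {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}


# Toy data (hypothetical): 6 people, 3 rare variants in one gene.
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
analyte = [1.0, 2.0, 2.5, 0.9, 1.8, 1.1]

q = skat_o_statistics(genotypes, analyte)
for rho, value in q.items():
    print(f"rho={rho:.2f}  Q={value:.3f}")
```

At rho = 0 the statistic reduces to SKAT and at rho = 1 to the burden test. In this toy example all three carriers have above-average analyte values, so the burden end of the grid gives the larger statistic. When variants in a gene push the analyte in opposite directions, their scores cancel in the burden sum but not in the SKAT sum.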



Strong associations were observed between multiple lipid analytes and APOE, whose protein plays a key role in lipid transport, including shuttling cholesterol to neurons in healthy brains³. ABCA7 also showed strong associations with the genus Waddlia and the proteins IL17D and CEACAM1. CEACAM1 is known to be involved in intercellular homophilic and heterophilic binding interactions that affect a wide array of cellular processes related to cellular activation, proliferation and death. Missense variants in the ABCA7 gene have been associated with increased risk of AD in multiple studies. ABCA7 is involved in lipid efflux from cells into lipoprotein particles, plays a role in lipid homeostasis, and has also been implicated in amyloid processing and deposition in the brain. The results support ABCA7's lipid-related function. INPP5D showed a strong association with 2-hydroxystearate (the second-highest blue dot in the plot), a long-chain fatty acid known to be involved in lipid regulation. This supports previous studies of INPP5D's role in lipid regulation.


Many thanks to:

  • Dr. Evans, Dr. Hood and Dr. Price for allowing me to work in the Hood-Price lab where I gained invaluable experience in using big data and systems approaches to treat complex diseases.
  • Dr. Baloni, Dr. Funk, Dr. Magis and especially Dr. Rappaport for mentoring me on the project.
  • The SEE Program, Claudia Ludwig, Rachel Calder and Becky Howsmon for their support throughout the internship.

This internship was made possible by contributions from ISB.

ISB High School Interns 2020