Population Analytics Platform

  • Software Architect

  • Researched & designed distributed data model

  • Developed research APIs for data investigation

As Principal Engineer, I led a software engineering team that built a population analytics platform for cancer research. We designed a specialized system to interrogate high-dimensional genomic and clinical data. The platform runs as a cluster of web apps that scales dynamically to support fast-growing data sets. It currently holds more than 400 million unique data points for more than 1.6 million unique patients.

The primary tool in the platform is a single-page web app built with Vue.js and D3.js. Using custom interactive data visualizations, users quickly traverse clinical and molecular features and compare more than 100,000 distinct variables across entire populations in near real time. As they navigate, they can make selections on the graphics to filter and segment the population into cohorts.
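
As a rough illustration of the filter-and-segment idea (a minimal sketch, not the platform's actual code), each selection on a chart can be treated as a predicate on one feature, and a cohort as the set of patients that satisfy every active selection. The names Selection, in_range, one_of, filter_cohort, and segment are hypothetical, and the patient records are made-up sample data.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical patient record: a flat mapping of feature name to value.
Patient = dict[str, object]


@dataclass(frozen=True)
class Selection:
    """One selection made on a chart, e.g. a brushed range or a clicked bar."""
    feature: str
    predicate: Callable[[object], bool]


def in_range(low, high):
    """Predicate for a brushed numeric range (inclusive on both ends)."""
    return lambda v: v is not None and low <= v <= high


def one_of(*values):
    """Predicate for one or more clicked categories."""
    allowed = set(values)
    return lambda v: v in allowed


def filter_cohort(patients, selections):
    """Keep patients that satisfy every active selection."""
    return [
        p for p in patients
        if all(s.predicate(p.get(s.feature)) for s in selections)
    ]


def segment(patients, feature):
    """Split a cohort into groups keyed by the value of one feature."""
    groups = {}
    for p in patients:
        groups.setdefault(p.get(feature), []).append(p)
    return groups


# Made-up sample data, for illustration only.
patients = [
    {"id": 1, "age": 54, "diagnosis": "breast", "tp53_mutated": True},
    {"id": 2, "age": 67, "diagnosis": "lung", "tp53_mutated": False},
    {"id": 3, "age": 61, "diagnosis": "breast", "tp53_mutated": False},
    {"id": 4, "age": 45, "diagnosis": "breast", "tp53_mutated": True},
]

selections = [
    Selection("diagnosis", one_of("breast")),
    Selection("age", in_range(50, 70)),
]

cohort = filter_cohort(patients, selections)
by_mutation = segment(cohort, "tp53_mutated")
```

Modeling selections as independent predicates means adding or removing a filter in the UI only changes the list of selections. At the scale described above, the same logic would presumably be pushed down into the data store rather than run over in-memory lists.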

Once a researcher identifies a cohort of interest, they can load its patient features into a Jupyter Notebook or Shiny app. These tools let data scientists write custom analyses in familiar languages such as Python and R. The notebooks and Shiny apps remain hosted on our cloud infrastructure, which ensures secure access to data through encryption, authorization, and audit logging. It also gives researchers the computing power needed for large-scale statistical modeling. When finished, researchers share their findings with others through reports generated in the system.
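
To make the authorization and audit-logging point concrete, here is a minimal sketch of how a hosted notebook might load a cohort through a guarded function. Everything in it is an assumption for illustration: load_cohort, the "audit" logger name, and the in-memory COHORT_ACCESS and COHORT_DATA tables stand in for whatever access-control service and data store the real platform uses.

```python
import logging
from datetime import datetime, timezone

# Hypothetical audit logger; a real deployment would ship these records
# to a tamper-resistant store rather than the default handler.
audit_log = logging.getLogger("audit")

# Hypothetical in-memory stand-ins for an access-control list and a data store.
COHORT_ACCESS = {"cohort-42": {"alice"}}
COHORT_DATA = {"cohort-42": [{"id": 1, "age": 54}, {"id": 3, "age": 61}]}


def load_cohort(user, cohort_id):
    """Return a cohort's patient features if the user is authorized.

    Every attempt, allowed or denied, is written to the audit log
    before any data is returned.
    """
    allowed = user in COHORT_ACCESS.get(cohort_id, set())
    audit_log.info(
        "time=%s user=%s cohort=%s allowed=%s",
        datetime.now(timezone.utc).isoformat(),
        user,
        cohort_id,
        allowed,
    )
    if not allowed:
        raise PermissionError(f"{user} may not read {cohort_id}")
    # Return a copy so callers cannot mutate the stored cohort.
    return list(COHORT_DATA[cohort_id])
```

Because the notebook runs on the hosted infrastructure, a function like this can be the only path to the data, so the authorization check and the audit record cannot be skipped by the analysis code.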