About Me

I am a lab rat turned motivated machine learning (ML) scientist with a deep interest in how data-driven approaches can support healthcare innovation. Over the past three years in the biotech industry, I have focused on developing cutting-edge ML models for cancer therapeutics, with applications ranging from optimizing cell therapy to enhancing drug discovery. My ongoing research interest is developing ML tools that can better leverage the rich information in biological data, which is too often constrained by size, sparsity, and heterogeneity.

Machine Learning for Early Drug Discovery

I see a strong future in applying machine learning to small-molecule drug discovery with DNA-encoded libraries (DELs). My hands-on laboratory work in PCR assay development, combined with my experience computing on large genomic datasets, has given me unique expertise in cleaning DEL datasets for ML pipelines. The high-throughput data generated from DELs has the potential to narrow the structural search for synthesizable small molecules with improved affinity for their targets. The next question is: how can we simultaneously optimize for other properties? For function? Canonical ML inference methods (and even computationally intensive LLMs) may perform well on DEL data for binding prediction. However, I believe there is a real need to develop new models that offer interpretability and that optimize for cross-reactivity and function from sparse, often size-limited assay data. I am particularly excited to apply my experience with Hierarchical Machine Learning to these questions.

Specific Skills: Python, RDKit, AWS, Jira, Confluence, Git

Data-driven Insights into Disease and Cancer Immunotherapy

I have also contributed significantly to cancer cell therapy product development for ovarian and endometrial cancer by studying robust models that integrate cell phenotypic, transcriptomic, and epigenetic data with T cell receptor (TCR) immune repertoires. This work relied heavily on NanoString data, bulk and single-cell RNA-seq, and ATAC-seq. I am currently developing a natural language processing model that analyzes peripheral blood immune repertoires as biomarkers for immunotherapy in advanced renal cancer.

Specific Skills: NGS data formats (e.g., FASTA), Bioconductor, R, bulk and single-cell RNA-seq, ATAC-seq, UMAP, volcano plot analysis, TCR CDR3 motif analysis and featurization, Markov modeling of DNA and primary protein structure

Machine Learning for Smarter Laboratory Automation

You are never better than your data. In my early lab-rat years, I generated every piece of data that I would later ask ML algorithms to learn from. I believe this experience is crucial for every ML scientist. That lab work sparked my interest in adapting canonical ML methods to biological data that is size-limited, sparse, feature-heavy, and heterogeneous (in short, difficult). I received a fellowship from the Center for Machine Learning and Health and UPMC Enterprises to develop Hierarchical Machine Learning (HML) for optimizing the 3D printing of biopolymers, a crucial step in the industrial manufacturing of implants and in drug testing.

Specific Skills: 3D printing (PLA plastic and alginate biopolymer), MATLAB, Python, HML, LASSO for feature reduction

My additional work in smarter automation for healthcare includes designing a low-cost, disposable point-of-care device for infectious disease diagnosis. I also designed and coded the main frame of an automated chemical vapor deposition system for fabricating graphene field-effect transistors, which are used for implantable monitoring of excitable cell activity.

Specific Skills: PCR, PCR primer design, LAMP, soft lithography, microfluidic chip design, LabVIEW