Machine Learning for Early Drug Discovery

At Nurix Therapeutics, I worked on augmenting and diversifying the list of small-molecule ligand candidates that could bind to targets of interest. I used an ensemble of Machine Learning (ML) models to predict binding affinity from DNA-Encoded Library (DEL) training data covering both targets and counter-targets. During this time, I developed an interest in exploring new ML techniques for downstream property prediction. To this end, I authored a Python package that wrangles and aggregates assay data using common heuristics, for use in downstream models and tracking analytics.

Future Work:

I see a strong future for embedding machine learning models in standard small-molecule drug discovery pipelines. My previous hands-on laboratory work in PCR assay development, combined with my experience computing on large genomic datasets, has afforded me unique expertise in scrubbing DEL datasets for ML pipelines. The high-throughput data generated from DEL has the potential to narrow the structural search for synthesizable small molecules with improved affinity for their targets. The next question becomes: how can we simultaneously optimize for properties? For function? While canonical ML inference methods, and even computationally intensive LLMs, may perform well on DEL data for binding predictions, I believe there is a very real need to develop new models that offer interpretability and can optimize for cross-reactivity and function from sparse and often size-limited assay data. I am particularly excited to apply my experience with Hierarchical Machine Learning to these questions.

Specific Skills: Python, RDKit, AWS, Jira, Confluence, Git