AI Safety Researcher
Genomic Foundation Models are anticipated to move toward deployment in areas such as genomic regulatory element identification, functional assessment of genomic variants, anticancer therapies (e.g. CAR T-cell therapy), and synthetic DNA biosecurity screening, creating an attack surface where fragile internal representations can be exploited by minimal, biologically plausible changes. In our Apart Fellowship project we set out to expand our foundational work and 1) probe and improve Genomic Foundation Model robustness by extending promoter classification benchmarks with adversarial attacks; 2) validate experimentally in vitro whether these adversarial perturbations disrupt genuine biological function or instead expose model fragility; and 3) restore model reliability by implementing Iterative Black-Box Adversarial Training as a defense.
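To make the attack side concrete, here is a minimal sketch of the kind of black-box, single-nucleotide substitution attack described above. The `score_promoter` function is a toy stand-in for a genomic foundation model's promoter probability, and the greedy search with a fixed substitution budget is an illustrative choice, not the project's exact method.

```python
# Minimal sketch of a black-box single-nucleotide substitution attack on a
# promoter classifier. `score_promoter` is a hypothetical stand-in for a
# genomic foundation model's P(promoter); any model exposing a
# sequence -> probability interface could be plugged in.
import random

NUCLEOTIDES = "ACGT"

def score_promoter(seq: str) -> float:
    """Placeholder scorer: real use would query a genomic foundation model."""
    # Toy heuristic: GC fraction as a mock promoter probability.
    return (seq.count("G") + seq.count("C")) / len(seq)

def greedy_substitution_attack(seq: str, budget: int = 3) -> str:
    """Greedily apply up to `budget` single-nucleotide substitutions that
    most reduce the model's promoter score (black-box: scores only)."""
    current = list(seq)
    for _ in range(budget):
        base_score = score_promoter("".join(current))
        best = None  # (score_drop, position, nucleotide)
        for i, orig in enumerate(current):
            for nt in NUCLEOTIDES:
                if nt == orig:
                    continue
                current[i] = nt
                drop = base_score - score_promoter("".join(current))
                current[i] = orig  # restore before trying the next candidate
                if best is None or drop > best[0]:
                    best = (drop, i, nt)
        if best is None or best[0] <= 0:
            break  # no single substitution lowers the score further
        _, i, nt = best
        current[i] = nt
    return "".join(current)

random.seed(0)
wild_type = "".join(random.choice(NUCLEOTIDES) for _ in range(60))
adversarial = greedy_substitution_attack(wild_type)
print(score_promoter(wild_type), score_promoter(adversarial))
```

In this framing, Iterative Black-Box Adversarial Training alternates such an attack step with fine-tuning on the adversarial sequences it finds, repeating until the attack stops degrading the model.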
I am collaborating with the Meaningstack Foundation on open infrastructure for portable, participatory AI governance: the Agentic Collaboration Governance Protocol. It enables communities to author machine-readable governance blueprints that coordinate multi-agent systems in real time, ensuring AI exploration aligns with human values and collective flourishing.
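As a purely illustrative sketch of what a machine-readable governance blueprint might look like, the snippet below defines a toy blueprint and a runtime check for proposed agent actions. All field names and the default-deny logic are hypothetical and do not reflect the protocol's actual schema.

```python
# Illustrative sketch of a machine-readable governance blueprint plus a
# runtime check for a multi-agent system. Field names and `is_permitted`
# logic are hypothetical, not the Agentic Collaboration Governance Protocol.
import json

blueprint_json = """
{
  "community": "example-collective",
  "values": ["transparency", "consent", "collective flourishing"],
  "agent_rules": {
    "allowed_actions": ["summarize", "draft", "notify"],
    "requires_human_review": ["publish", "spend_funds"]
  }
}
"""

def is_permitted(blueprint: dict, action: str) -> str:
    """Return how the blueprint disposes of a proposed agent action."""
    rules = blueprint["agent_rules"]
    if action in rules["requires_human_review"]:
        return "escalate"   # hand off to a human before proceeding
    if action in rules["allowed_actions"]:
        return "allow"
    return "deny"           # default-deny anything the blueprint omits

blueprint = json.loads(blueprint_json)
for action in ["draft", "publish", "delete_records"]:
    print(action, "->", is_permitted(blueprint, action))
```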
Here, we explore limitations in current large language model (LLM) safety evaluation frameworks and examine how prompt style affects the safety classification of LLM outputs. We use SALAD-Bench and its MD-Judge evaluator to classify GPT-3.5-turbo responses to over 21,000 harmful prompts across six major harm categories as safe or unsafe, comparing a simple directive prompt with a Chain-of-Thought (CoT) prompt. The simple directive and CoT prompts resulted in 7% and 16% unsafe responses, respectively. Analyzing individual responses, we identified large-scale patterns of false "safe" classifications. Such misclassification gives a false sense of security and can reinforce unsafe LLM behavior when future models are trained to meet benchmarking goals.
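For readers who want the mechanics, below is a minimal sketch of the two-prompt evaluation loop. The `query_model` and `md_judge` functions are placeholders for where the study would call GPT-3.5-turbo and SALAD-Bench's MD-Judge evaluator, and the prompt templates are illustrative, not the exact wording used.

```python
# Minimal sketch of the two-prompt safety evaluation loop. `query_model` and
# `md_judge` are placeholders: the actual study would call GPT-3.5-turbo and
# MD-Judge, whose real APIs and prompt templates are not reproduced here.
from collections import Counter

SIMPLE_DIRECTIVE = "Answer the following request:\n{prompt}"
COT_TEMPLATE = "Think step by step, then answer the following request:\n{prompt}"

def query_model(full_prompt: str) -> str:
    """Placeholder for a GPT-3.5-turbo API call."""
    return "I can't help with that."

def md_judge(prompt: str, response: str) -> str:
    """Placeholder for MD-Judge; returns 'safe' or 'unsafe'."""
    return "safe"

def evaluate(prompts, template) -> float:
    """Return the fraction of responses the judge labels unsafe."""
    tally = Counter()
    for p in prompts:
        response = query_model(template.format(prompt=p))
        tally[md_judge(p, response)] += 1
    total = sum(tally.values())
    return tally["unsafe"] / total if total else 0.0

harmful_prompts = ["<harmful prompt 1>", "<harmful prompt 2>"]  # SALAD-Bench items
print("simple directive unsafe rate:", evaluate(harmful_prompts, SIMPLE_DIRECTIVE))
print("CoT unsafe rate:", evaluate(harmful_prompts, COT_TEMPLATE))
```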
Course: Technical AI Safety by BlueDot Impact (Jan. 2026)
Course: AGI Strategy by BlueDot Impact (Oct. 2025)
Course: AI Safety, Ethics and Society by the Center for AI Safety (May 2025)
Certificate: ChatGPT Prompt Engineering and Advanced Data Analysis (Coursera, 2023)
I am a physicist by training, with two decades of experience spanning research, leadership, data analysis, and software engineering in two fields: astrophysics and oncology. As I have been learning more about AI safety, I have developed a passionate sense of urgency to contribute to the safe and responsible development of AI.
I am committed to contributing pro bono hours of focused, productive work to an AI safety project with an organization that moves the needle in the field.
I am open to new opportunities and collaborations. Feel free to reach out!