Andrea Loehr

AI Safety Projects

Building Consensus on AI Evaluation Practices

A cross-sector initiative to establish community-endorsed guidance for conducting and reporting high-quality AI evaluations, including a Systematic Review of evaluation frameworks, reporting checklists, measurement standards, and methodological critiques across ML, medicine, metrology, software testing, and experimental sciences.

Apart Fellowship

Genomic Foundation Models are anticipated to move towards deployment in areas such as genomic regulatory element identification, functional assessment of genomic variants, anticancer therapies (e.g. CAR T-cell therapy), and synthetic DNA biosecurity screening, creating an attack surface where fragile internal representations can be exploited by minimal, biologically plausible changes. In our Apart Fellowship project we set out to expand our foundational work and 1) probe and improve Genomic Foundation Model robustness by extending promoter classification benchmarks with adversarial attacks; 2) experimentally validate in vitro whether these adversarial perturbations disrupt genuine biological function or instead expose model fragility; 3) restore model reliability by implementing Iterative Black-Box Adversarial Training as a defense.

Exploring Gaps in Model Safety Evaluation: Findings from Red-Teaming the SALAD-Bench Benchmark for Large Language Models

Here, we explore limitations in current large language model (LLM) safety evaluation frameworks and examine how prompt style can affect the safety classification of LLM outputs. We use the SALAD-Bench and its MD-Judge evaluator to classify ChatGPT3.5-turbo responses to over 21,000 harmful prompts across 6 major harm categories into safe or unsafe responses using one simple directive and one Chain-of-Thought prompt. The simple directive and CoT prompts resulted in 7% and 16% unsafe responses, respectively. Analyzing individual responses, we identified large-scale patterns of false safe classification. This misclassification gives a false sense of security and can potentially further unsafe LLM behavior when future models are trained to meet benchmarking goals.

View on GitHub

About Me

I am a physicist by training, with two decades of experience spanning research, leadership, data analysis, and software engineering in two fields: astrophysics and oncology. As I have been learning more about AI safety, I have developed a passionate sense of urgency to contribute to the safe and responsible development of AI.

I am committed to donating pro bono hours of my focused, productive work time to an AI project with an organization that moves the needle in the field.

I am open to new opportunities and collaborations. Feel free to reach out!

My Journey in AI Safety

Course: Technical AI Safety by BlueDot Impact (Jan. 2026)

Course: AGI Strategy by BlueDot Impact (Oct. 2025)

Course: AI Safety, Ethics and Society by the Center for AI Safety (May 2025)

Certificate: ChatGPT Prompt Engineering and Advanced Data Analysis (Coursera, 2023)