David Glukhov
PhD Student at University of Toronto & Vector Institute.
I’m David, a first-year PhD student in Computer Science at the University of Toronto and the Vector Institute, working with Prof. Nicolas Papernot and Prof. Vardan Papyan.
I am interested in formalizing desiderata of secure and reliable generative AI. In this pursuit, I have used an information-theoretic lens to formalize the commonly described goal of preventing adversaries from learning problematic things, demonstrated empirical and theoretical limitations of current safety evaluations and defense methods, and proved a safety-utility tradeoff. To illustrate the challenge, I have proposed mosaic prompts, an attack method that decomposes an impermissible task into dual-use, permissible sub-tasks posed to a victim model, enabling jailbreak-free attacks that bypass extant defense methods. I am now looking into “hallucinations” in generative models, with the aim of understanding why, when, and how they occur.