Academic Research

Research & Publications

Advancing the frontiers of AI safety, neuroscience-inspired computing, and interpretable machine learning through rigorous academic research and collaboration.

47 Publications

1842 Citations

18 h-index

$3.2M Research Grants

12 PhD Students

35 Collaborators

JOURNAL

2024

Adversarial Robustness in Large Language Models: A Comprehensive Survey

Roberts, G., Smith, J., Chen, L.

Journal of AI Safety Research

This paper presents a comprehensive survey of adversarial robustness techniques in large language models, examining both attack vectors and defense mechanisms.

AI Safety

LLMs

Adversarial ML

Survey

42 citations

CONFERENCE

2024

Neuroscience-Inspired Architectures for Interpretable AI

Roberts, G., Johnson, M.

NeurIPS 2024

We propose novel neural architectures inspired by biological neural circuits that provide inherent interpretability while maintaining competitive performance.

Neuroscience

Interpretable AI

Neural Architecture

28 citations

CONFERENCE

2023

Formal Verification of Neural Network Safety Properties

Roberts, G., Anderson, K., Liu, W.

ICML 2023

We present a framework for formally verifying safety properties of deep neural networks using abstract interpretation and SMT solvers.

Formal Methods

AI Safety

Verification

67 citations

PREPRINT

2024

Emergent Communication in Multi-Agent Reinforcement Learning

Roberts, G., Park, S., Zhang, Y.

arXiv preprint

We investigate emergent communication protocols in multi-agent systems trained with reinforcement learning and find that they exhibit surprising linguistic structures.

Multi-Agent RL

Emergent Communication

Language

Collaborate With Me

I'm always interested in collaborating on research projects at the intersection of AI safety, neuroscience, and interpretable machine learning.
