We conduct research to ensure human oversight and control over future AIs.
- The motivation for our work is explained in this talk.
- The best way to get in touch is by using our contact form.
- We invite all researchers to collaborate on open problems through Sprints.
Deception
Language Models Learn to Mislead Humans via RLHF
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Collusion
LLM Evaluators Recognize and Favor Their Own Generations
Spontaneous Reward Hacking in Iterative Self-Refinement
Coherence
Unsupervised Elicitation of Language Models