Persona elicitation

Phase 2 project: Persona Elicitation

  • Submit through this form. Don’t edit the prefilled fields.
  • Email shi.feng@gwu.edu with “Praxis Sprint UE2 questions” as the subject line if you have any questions or issues.

Overview

In this lab, we explore how Internal Coherence Maximization (ICM) can be applied operationally to support outer alignment. Specifically, we use ICM to improve value specification in a setting that requires personalization and pluralistic alignment. Intuitively, coherence measures the consistency of a model’s responses across related prompts. Here, we use ICM to search for coherent labels that describe personas (e.g., country-level opinion patterns) and then use those labels to give a user’s query meaningful persona context. Concretely, you will:

  • Use ICM to elicit coherent labels over the GlobalOpinionQA (GOQA) dataset,
  • Treat each country as a distinct persona, and
  • Use these persona-specific labels to guide model behavior via in-context learning.

This project asks you to apply an existing ICM implementation to a new setting and evaluate its usefulness for value specification and, by extension, outer alignment.
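To make the coherence idea concrete, here is a toy sketch of an ICM-style label search. This is not the paper's implementation: the real objective scores a labeling by how well the model predicts each label given the others (via log-probabilities) and penalizes logical inconsistencies, whereas here a hypothetical `toy_fit` function stands in for the model call.

```python
import math
import random

# Toy sketch of ICM-style coherent label search. A toy `fit` function
# stands in for "how well the model predicts example i's label given
# the other labeled examples" (the real algorithm uses log-probs).

def mutual_predictability(labels, fit):
    """Total 'predictability' of each label given all the others."""
    return sum(fit(i, labels[i], labels) for i in range(len(labels)))

def icm_search(n, fit, steps=1000, temp=2.0, cooling=0.995, seed=0):
    """Annealed search over binary labelings, then greedy refinement."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    score = mutual_predictability(labels, fit)
    for _ in range(steps):
        cand = labels[:]
        cand[rng.randrange(n)] ^= 1          # flip one label
        cs = mutual_predictability(cand, fit)
        # accept improvements, or worse moves with annealed probability
        if cs > score or rng.random() < math.exp(min(0, cs - score) / temp):
            labels, score = cand, cs
        temp *= cooling
    improved = True                          # greedy sweeps to a local optimum
    while improved:
        improved = False
        for i in range(n):
            cand = labels[:]
            cand[i] ^= 1
            cs = mutual_predictability(cand, fit)
            if cs > score:
                labels, score, improved = cand, cs, True
    return labels, score

# Demo: examples 0-4 should share one label and 5-9 another; `toy_fit`
# rewards agreeing with same-group peers, so coherent labelings score highest.
groups = [0] * 5 + [1] * 5
def toy_fit(i, label, labels):
    return sum(1 for j in range(len(labels))
               if j != i and groups[j] == groups[i] and labels[j] == label)
```

In your implementation, the scoring function would query the base model rather than a heuristic, but the search loop has the same shape.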

Part 1: Implementation

Your first task is to apply ICM to GlobalOpinionQA (GOQA), and evaluate how well ICM-derived persona labels support in-context learning.

  • Tasks
    1. Download the GlobalOpinionQA dataset from here and keep only the questions with national survey data (you should also limit the number of countries you use)
    2. Prepare the GOQA data in a TruthfulQA-style format
      1. For ICM, you need to divide the data into two equivalence classes.
      2. For a quick implementation, focus on questions with binary survey options.
    3. Define personas
      1. Treat each country as a persona.
      2. For each country/persona, split the data into:
        1. A training set for ICM label search, and
        2. A hold-out test set of questions.
    4. Run ICM
      1. For each persona (country), run ICM over the training data to search for coherent labels.
      2. You may use all available training data per persona or subsample it for computational efficiency.
      3. Once ICM finishes searching for labels, use those labels for many-shot in-context learning.
      4. Generate a figure comparing test-set accuracy across four conditions: zero-shot, zero-shot chat, ICM in-context learning, and gold-label in-context learning
    5. Report test accuracy for each persona and in aggregate
      1. Figure 1: test accuracy aggregated over all personas. This should follow the format of Figure 1 in the ICM paper, but does not need to match the visual style exactly.
      2. Figure 2: test accuracy as a function of the number of in-context examples. Compare ICM-searched labels, random labels, and gold labels.
  • Model choice
    • Base: Llama-3.1-405B
    • Chat: Llama-3.1-405B-instruct
    • Access both through the Hyperbolic API
      • You are expected to look through and understand the docs on your own.
  • Deliverables
    • Submit your code via the submission form, as either a Google Drive link to a zip file or a link to a GitHub repo.
    • Your code repo should include the main results figure.
  • Evaluation criteria
    • Accuracy of your ICM-based setup (correct use of the algorithm on GOQA).
    • Many-shot accuracy using your ICM labels (based on your bar chart).
    • Code clarity (we don’t expect production-quality code).
    • You may use AI tools. If you do, tell us which tools you used and how, and include links to relevant chat logs in your submission.
  • Expected time allocation: 3-6 hours
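Steps 2–3 of the task list could be sketched as below. The row shape (`question`, `options`, `selections`) is an assumption about how you load GOQA into memory, not the dataset's exact on-disk schema, and the claim format is one possible TruthfulQA-style rendering.

```python
import random

# Sketch of steps 2-3: turn binary GOQA questions into TruthfulQA-style
# (claim, label) pairs for one country persona, then split train/test.
# The row shape ({question, options, selections}) is an assumption about
# your loading code, not the dataset's exact schema.

def make_persona_examples(rows, country):
    """One claim per answer option; label 1 iff it is the country's majority answer."""
    examples = []
    for row in rows:
        if len(row["options"]) != 2 or country not in row["selections"]:
            continue  # keep only binary questions this country answered
        probs = row["selections"][country]
        majority = max(range(2), key=lambda i: probs[i])
        for i, option in enumerate(row["options"]):
            claim = f"Question: {row['question']}\nAnswer: {option}"
            examples.append({"claim": claim, "label": int(i == majority)})
    return examples  # two equivalence classes: label 1 vs. label 0

def split_train_test(examples, test_frac=0.2, seed=0):
    """Shuffled split into an ICM training pool and a held-out test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]
```

Running `make_persona_examples` once per country gives you one dataset per persona, which you then split before any ICM search.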
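For steps 4–5, a minimal prompt-building and scoring sketch is below. `complete` is a hypothetical stand-in for a base-model call through the Hyperbolic API, and the True/False claim format is one possible choice, not prescribed.

```python
# Sketch of steps 4-5: many-shot in-context evaluation. `complete` is a
# hypothetical stand-in for a base-model completion call; the True/False
# format is an assumption, not part of the assignment spec.

def build_prompt(shots, query_claim):
    """Labeled claims (ICM, gold, or random labels) followed by the query."""
    parts = [f"{ex['claim']}\nTrue or False: {'True' if ex['label'] else 'False'}"
             for ex in shots]
    parts.append(f"{query_claim}\nTrue or False:")
    return "\n\n".join(parts)

def eval_accuracy(test_set, shots, complete):
    """Fraction of held-out claims where the model's True/False matches gold."""
    correct = 0
    for ex in test_set:
        reply = complete(build_prompt(shots, ex["claim"]))
        pred = reply.strip().startswith("True")
        correct += int(pred == bool(ex["label"]))
    return correct / len(test_set)
```

For the zero-shot conditions, pass `shots=[]`; for Figure 2, vary `len(shots)` and swap ICM-searched, random, and gold labels into the same harness.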