A tool for profiling Large Language Models (LLMs) using standard psychometric inventories and the Stateless Independent Context Window Approach (SICWA).
Built by Gordon Olson.
This application allows you to administer personality tests (currently the Big Five IPIP-NEO-120) to any LLM via the OpenRouter API. It is designed to eliminate conversational bias by treating every single test item as an independent, stateless request.
The system uses a rigorous scoring process to ensure accuracy and consistency.
- Sampling: For each of the 120 items in the inventory, the system queries the model 5 times.
- Aggregation: The final score for an item is the average of these 5 independent samples.
- Example: If the model responds with
4, 5, 4, 4, 5, the Item Score is4.4.
- Many items are "negatively keyed" to prevent agreement bias.
- Example: "Am not interested in other people's problems" (Agreeableness).
- Formula:
Final Score = 6 - Item Score - Result: A score of
5(Strongly Agree) becomes1(Low Agreeableness).
- Each of the 5 Domains (Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness) is composed of 6 Facets (sub-traits).
- Each Facet consists of 4 specific items.
- Formula:
Facet Score = Sum(4 Item Scores) - Range: Minimum 4, Maximum 20.
- The high-level Domain Score is the sum of its 6 constituent Facet Scores.
- Formula:
Domain Score = Sum(6 Facet Scores) - Range: Minimum 24, Maximum 120.
Scores are classified into three levels based on their position in the possible range (24-120):
- Low: Score < 56
- Medium: 56 ≤ Score ≤ 88
- High: Score > 88
The system administers a composite inventory to assess the three subclinical dark traits. It uses the Dirty Dozen (Jonason & Webster, 2010), a concise 12-item scale.
- Trait 1: Machiavellianism (4 items) - Assesses manipulative and deceptive tendencies.
- Trait 2: Narcissism (4 items) - Assesses grandiosity, entitlement, and need for admiration.
- Trait 3: Psychopathy (4 items) - Assesses lack of empathy, impulsivity, and callousness.
Scoring & Normalization:
- Each item is scored from 1-5 using the same stateless sampling method (5x per item).
- Raw Score: Sum of the 4 items (4-20).
- Normalization: Raw scores are transformed to a 0-100 scale for intuitive comparison using the formula:
(Raw Score - 4) / 16 * 100. - Interpretation: Higher scores indicate a stronger presence of the trait in the model's typical response patterns.
- Jungian Type (Derived MBTI): Calculated by mapping Big Five facet scores to the four MBTI dichotomies using regression-based heuristics (Extraversion → E/I; Openness → S/N; Agreeableness → T/F; Conscientiousness → J/P).
- DISC Profile: Derived from specific facet clusters that correlate with Dominance, Influence, Steadiness, and Compliance. Raw DISC scores are normalized to a 1-28 range.
-
Install dependencies:
npm install
-
Configure environment variables (local only):
- Copy
.env.exampleto.env.local. - Add your values in
.env.local. - Do not commit
.env.local.
- Copy
-
Run the application:
npm run dev
-
Open in browser: Navigate to http://localhost:3000.
To deploy this application to the web (using Vercel and Supabase), please see our Deployment Guide.
- Framework: Next.js (App Router)
- Styling: Tailwind CSS
- Charts: Chart.js / React-Chartjs-2
- API: OpenRouter
This project is designed so that:
- Vercel only needs Supabase env vars (storage/database).
- OpenRouter keys are user-provided at runtime via the UI (or via local
.env.localfor development).
Do not set NEXT_PUBLIC_OPENROUTER_API_KEY on Vercel unless you explicitly want it embedded in the client bundle.
MIT. See LICENSE.
See ATTRIBUTION.md.
See CONTRIBUTING.md.
See SECURITY.md.