Centaur: A Foundation Model for Human Cognition

2025-07-17

Tak@

Hello, I'm Tak@, a system integrator.

How would you feel if a "model" emerged that could predict and reproduce the human mind, much like a computer program? It might sound like something out of a sci-fi movie, but in the world of cognitive science, such a model is rapidly becoming a reality.

Its name is Centaur.

In this article, I'll unravel how this groundbreaking model helps us understand our own "mind" and what new possibilities it brings to the future of science, incorporating my perspective as a system integrator.

The Versatility of Centaur: Setting It Apart from Traditional AI Models

We make countless decisions daily, from simple acts like choosing breakfast cereal to complex problems like developing disease treatments or exploring space.

It's astounding how broad human intelligence is, as we acquire new skills from just a few examples, infer causal relationships, and let curiosity drive our actions. This diversity is what makes us human.

In contrast, most modern AI and cognitive science models are "specialized," built for specific purposes.

For instance, Google DeepMind's AlphaGo excels at Go but can do little else. In cognitive science, prospect theory offers deep insights into how people make decisions, but it doesn't explain how they learn, plan, or explore.

However, to understand the human mind as a whole, we must shift from these "domain-specific theories" to "unified theories" that cross various domains.

The importance of such a unified approach has long been recognized by pioneers in cognitive science. As Allen Newell argued in 1990, unified theories of cognition are the only way to bring our increasingly differentiated knowledge under intellectual control.

Centaur: Taking On the Challenge of a General Cognitive Model

As a step toward this grand goal, a challenge was posed: to create a computational model that can predict and reproduce human behavior in any situation. To meet this challenge, the Centaur foundation model for human cognition was developed.

As a system integrator, I typically develop systems tailored to client requests. I often find myself challenged by the low versatility of these systems, as each is specialized for a particular task. Therefore, I deeply resonate with the idea behind Centaur: a single model aiming to capture diverse human behaviors.

Centaur was built in a data-driven way. Specifically, it was created by fine-tuning the cutting-edge language model Llama 3.1 70B on Psych-101, a massive dataset of human behavior.

This approach allows Centaur to leverage the vast knowledge embedded in the language model and adapt it to human behavior.

Psych-101: A Massive Dataset of the Human Mind

Centaur's impressive versatility depends on Psych-101, the dataset used to train it. This dataset is unprecedented in both scale and diversity.

The Astonishing Scale and Diversity of the Dataset

Psych-101 encompasses data from 160 different psychological experiments. It involves over 60,000 participants and records more than 10 million human choice behaviors.

These experiments cover a wide range of cognitive domains, including multi-armed bandits, decision-making, memory, supervised learning, and Markov decision processes.

A notable feature of this dataset is that each trial's data from these experiments is described in natural language. This allows different experimental paradigms (frameworks) to be represented in a common format, laying the groundwork for the language model to understand human behavior more deeply.
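To make the idea of "trials described in natural language" concrete, here is a minimal sketch of how one participant's session in a two-armed bandit task might be turned into text. The BanditTrial class, the sentence wording, and the "<<" and ">>" markers around the human choice are my own assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class BanditTrial:
    """One trial of a hypothetical two-armed bandit experiment."""
    choice: str   # label of the option the participant pressed, e.g. "A"
    reward: int   # points received on this trial


def trial_to_text(trial: BanditTrial) -> str:
    # The "<<" and ">>" markers around the human choice are an assumed
    # convention for flagging which part of the text is the response.
    return f"You press <<{trial.choice}>> and receive {trial.reward} points."


def session_to_text(instructions: str, trials: list[BanditTrial]) -> str:
    # One participant's session becomes a single block of text:
    # task instructions first, then one sentence per trial.
    lines = [instructions] + [trial_to_text(t) for t in trials]
    return "\n".join(lines)


example = session_to_text(
    "You can choose between two slot machines labeled A and B.",
    [BanditTrial("A", 3), BanditTrial("B", 7)],
)
print(example)
```

Because every experiment ends up as plain text like this, a memory task and a bandit task can share one format, which is what lets a single language model be trained on all of them.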

An "Ongoing" Dataset

The development of Psych-101 is still underway. While it currently focuses on learning and decision-making, future plans include incorporating even more domains, such as psycholinguistics, social psychology, and economic games, as well as information on "individual differences" like age, personality, and socioeconomic status.

This aims to enable the model to capture more diverse human behaviors and even individual variations.

However, it's recognized that the current data is biased towards "Western, Educated, Industrialized, Rich, and Democratic (WEIRD)" populations, indicating room for future improvement.

Nevertheless, the existence of this large and diverse dataset undeniably forms the foundation for Centaur's astonishingly accurate capture of human behavior. If all your life choices were recorded as data, what patterns would emerge? Isn't it fascinating to imagine?

Centaur's Remarkable Predictive Power and "Mind" Reproduction

Centaur doesn't just predict human behavior; it seems to reproduce the human "mind" itself. Its capabilities have been demonstrated through various rigorous tests.

Highly Accurate Behavior Prediction in Unseen Situations

First, Centaur predicted the behavior of "unseen participants" (people not included in the training data) more accurately than existing cognitive models, showing a better fit to human behavior in almost all experiments.
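How do you compare "fit" between models? One common way, which I assume here for illustration, is to score each model by the average negative log-likelihood it assigns to the choices a held-out participant actually made. The sketch below uses made-up probabilities; model_x and chance are hypothetical stand-ins, not real model outputs.

```python
import math


def mean_negative_log_likelihood(predicted_probs, choices):
    """Average negative log-likelihood of the observed choices.

    predicted_probs: list of dicts mapping each option to its predicted probability
    choices: list of the options the participant actually chose
    """
    total = 0.0
    for probs, choice in zip(predicted_probs, choices):
        total -= math.log(probs[choice])
    return total / len(choices)


# Made-up data for one held-out participant (illustration only).
choices = ["A", "A", "B", "A"]
model_x = [{"A": 0.7, "B": 0.3}, {"A": 0.8, "B": 0.2},
           {"A": 0.4, "B": 0.6}, {"A": 0.9, "B": 0.1}]
chance = [{"A": 0.5, "B": 0.5}] * 4

nll_model = mean_negative_log_likelihood(model_x, choices)
nll_chance = mean_negative_log_likelihood(chance, choices)
print(nll_model < nll_chance)  # lower values mean a better fit
```

A model that puts more probability on what people really did gets a lower score, so the same function can rank any number of candidate models on the same participants.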

Even more surprisingly, Centaur extends its capabilities to "experiments not included in its training data."

Adapting to cover story changes: For example, Centaur accurately captured human behavior even in an experiment involving a "magic carpet" story, which was not in its training data. This suggests that it doesn't just memorize patterns but understands the essence of the task.
Adapting to task structure changes: While the two-armed bandit experiment (a task to earn rewards from two choices) was in the training data, Centaur accurately predicted human behavior in a new task called "Maggie's Farm," which increased the choices to three. This means it can apply the underlying cognitive mechanisms even when the basic task structure changes.
Generalizing to entirely new domains: Furthermore, Centaur successfully captured human behavior in new cognitive domains completely absent from its training data, such as those requiring logical reasoning. This demonstrates true versatility, unconstrained by specific domains.

Generating Human-like Behavior and Predicting Reaction Times

Centaur can not only predict behavior by considering past actions but also autonomously generate human-like behavior through "open-loop simulation."

This is a more stringent test: instead of being shown what a human did on earlier trials, the model generates its own responses and has them fed back as its history. Centaur performed comparably to humans in this test as well.

For instance, it reproduced human-specific behavior patterns, such as exploring based on uncertainty, which is rarely seen in many language models.

It was also found that Centaur can reproduce not just the behavior of specific individuals, but also the diverse behavioral patterns exhibited by an entire group (e.g., a mixture of model-free and model-based learning).
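The open-loop idea is easy to sketch in code. In the example below, a simple softmax agent stands in for the model (an assumption on my part; Centaur itself is a large language model), and the key point is only the loop structure: each choice the agent generates becomes part of the history that shapes its next choice.

```python
import math
import random


def softmax_policy(values, temperature=1.0):
    """Stand-in for the model: maps current value estimates to choice probabilities."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def open_loop_simulation(reward_probs, n_trials, seed=0):
    """Simulate an agent whose own choices (not a human's) form the history."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)   # running value estimate per arm
    counts = [0] * len(reward_probs)
    history = []                         # (choice, reward) pairs generated so far
    for _ in range(n_trials):
        probs = softmax_policy(values, temperature=0.2)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1 if rng.random() < reward_probs[choice] else 0
        # Feed the agent's own response back in before the next trial.
        counts[choice] += 1
        values[choice] += (reward - values[choice]) / counts[choice]
        history.append((choice, reward))
    return history


history = open_loop_simulation(reward_probs=[0.2, 0.8], n_trials=100)
print(len(history), sum(r for _, r in history))
```

Small prediction errors can compound in a loop like this, which is why it is a harder test than predicting one choice at a time from a human's recorded history.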

Interestingly, while Centaur accurately predicts human behavior, it struggles to predict the behavior of artificial agents. This suggests that the model has learned human-specific characteristics.

Beyond mere choices, Centaur can even predict human reaction times.

This implies that the model captures aspects of the cognitive process behind actions, such as "how long a person thinks before acting." Its accuracy, measured by R² (the coefficient of determination), surpassed that of existing language and cognitive models.
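For readers unfamiliar with R², here is a minimal sketch of how it is computed for a simple regression of reaction times on a single predictor. The predictor values and log reaction times are made up, and the single-predictor least-squares fit is my simplification, not the analysis used in the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Made-up numbers: a hypothetical model-derived predictor and log reaction times.
predictor = [0.1, 0.4, 0.5, 0.9, 1.0]
log_rt = [-0.60, -0.35, -0.30, 0.05, 0.10]

slope, intercept = fit_line(predictor, log_rt)
fitted = [slope * p + intercept for p in predictor]
print(round(r_squared(log_rt, fitted), 3))
```

An R² close to 1 means the predictor explains most of the variation in reaction times, while a value near 0 means it explains almost none.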

As a developer, I find it challenging to incorporate such complex human behavioral characteristics into the AI tools I build. Seeing a model capture them reminds me of the excitement I feel when I manage to "hear" a customer's unspoken needs and propose a system that exceeds their expectations.

Striking Alignment Between Internal Representation and Neural Activity

Centaur's remarkable capabilities extend beyond predicting behavior. Its internal representations, which can be thought of as its "thought processes," show a strong tendency to align with human brain activity. This means that as the model learns human behavior, it acquires an internal structure similar to how the human brain processes information.

Neural Alignment from Behavioral Data

Researchers conducted two analyses to verify how well Centaur's internal representations align with human neural activity.

fMRI Data Prediction in a Two-Step Task: First, they analyzed functional magnetic resonance imaging (fMRI) data from humans performing a "two-step task" decision-making experiment. Although the cover story for this task was not in Centaur's training data, Centaur's internal representations consistently predicted human neural activity more accurately compared to Llama's internal representations. This indicates that fine-tuning with large-scale behavioral data effectively aligns the model's internal representations with human neural activity.
fMRI Data Prediction in a Sentence Reading Task: Next, a similar analysis was performed using fMRI data from humans reading simple six-word sentences. The main purpose of this analysis was to see if neural alignment would be maintained in unrelated settings after fine-tuning on cognitive experiments. As a result, Centaur tended to show a higher correlation with human neural activity than Llama, suggesting that fine-tuning is beneficial for neural alignment.
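What does "predicting neural activity from internal representations" look like in practice? A minimal sketch, assuming a regularized linear regression (ridge) from representation vectors to a single voxel's signal, evaluated by correlation on held-out scans. Everything here is synthetic: the 3-dimensional "representations", the simulated voxel, and the choice of ridge with alpha=1.0 are my assumptions for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_fit(features, target, alpha):
    """Weights w minimizing ||Xw - y||^2 + alpha * ||w||^2."""
    d = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(features, target)) for i in range(d)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5


# Synthetic stand-ins: 3-dimensional "representations" and one simulated voxel signal.
rng = random.Random(0)
reps = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
voxel = [0.8 * r[0] - 0.5 * r[2] + rng.gauss(0, 0.3) for r in reps]

weights = ridge_fit(reps[:40], voxel[:40], alpha=1.0)   # fit on the first 40 scans
predicted = [sum(w * x for w, x in zip(weights, r)) for r in reps[40:]]
print(pearson(predicted, voxel[40:]))                   # correlation on held-out scans
```

Running the same procedure with representations from two different models and comparing the held-out correlations is one way to ask which model's internals line up better with the brain data.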

These results suggest that Centaur not only "behaves like a human" but that its "internal mechanisms" for generating that behavior may also be approaching the mechanisms of the human brain.

This could even imply that AI is not just mimicking the workings of our "mind" but, in a sense, "understanding" it. If AI could decipher your thought patterns, would you find it convenient or a little unsettling?

Centaur: Opening New Horizons in Cognitive Science

Centaur and Psych-101 are more than just research tools; they hold the potential to profoundly transform the future of cognitive science. Together, the model and the dataset can accelerate data-driven scientific discovery and serve as powerful aids for researchers developing cognitive models.

Accelerating Scientific Discovery Through Model-Driven Insights

The paper "A foundation model to predict and capture human cognition" provides specific examples of how Centaur and Psych-101 can lead to scientific discoveries.

Discovery of New Decision-Making Strategies

Because Psych-101 data is in natural language form, it can be easily processed by language-based reasoning models like DeepSeek-R1. Researchers asked DeepSeek-R1 to explain participant behavior in a multi-attribute decision-making experiment.

The model then derived a new decision-making strategy that combined two heuristics (rules of thumb) that had not been combined in previous research. When this model was implemented, it was found to explain human response behavior more accurately than any strategy previously considered in the original research.

Improving Models Through Scientific Regret Minimization

Even the model discovered by DeepSeek-R1 couldn't quite match Centaur's prediction accuracy. To address this, researchers used a technique called "scientific regret minimization."

This method uses a prediction model as a "benchmark" to identify human responses that are not captured by existing models but are, in principle, predictable.

Typically, this method requires large, experiment-specific datasets. Using Centaur as the benchmark removes the need to collect such data, which considerably widens the range of experiments the method can be applied to.
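The core of this technique can be sketched in a few lines: for every observed response, compare the probability the benchmark model assigned to it with the probability the interpretable candidate model assigned, and look closely at the responses with the largest gap. The probabilities below are made up, and the function names are hypothetical.

```python
import math


def regret_per_trial(benchmark_probs, candidate_probs):
    """Log-likelihood gap on each observed response.

    Both inputs hold the probability each model assigned to the response the
    participant actually gave. A large positive gap marks a response the
    benchmark predicted well but the candidate model did not.
    """
    return [math.log(b) - math.log(c) for b, c in zip(benchmark_probs, candidate_probs)]


def most_regrettable(benchmark_probs, candidate_probs, top_k=2):
    """Indices of the top_k responses with the largest gap, largest first."""
    gaps = regret_per_trial(benchmark_probs, candidate_probs)
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:top_k]


# Made-up probabilities for five trials (illustration only).
benchmark = [0.9, 0.6, 0.8, 0.55, 0.7]
candidate = [0.85, 0.6, 0.2, 0.5, 0.3]
print(most_regrettable(benchmark, candidate))  # → [2, 4]
```

The trials flagged this way are the ones worth inspecting by hand, because they are predictable in principle yet missed by the current interpretable model.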

Upon closer examination of the responses that Centaur could accurately predict but the DeepSeek-R1-discovered model could not, it was found that participants sometimes chose lower-valued options.

This pattern suggested that the switch between the two heuristics might be less rigid than initially thought.

Therefore, by replacing the strict rule with a "weighted average of the two heuristics," the resulting model achieved prediction accuracy comparable to Centaur while remaining interpretable.
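The difference between a strict rule and a weighted average is easiest to see side by side. In this sketch, score_a and score_b are signed evidence values from two hypothetical heuristics (positive favors option 1, negative favors option 2). The specific heuristics, the weight of 0.7, and the logistic link are my assumptions, not the fitted model from the paper.

```python
import math


def strict_switch(score_a, score_b):
    """Use heuristic A's verdict unless it is tied, then fall back to heuristic B."""
    return score_a if score_a != 0 else score_b


def weighted_blend(score_a, score_b, weight):
    """Soft combination: weight on heuristic A, (1 - weight) on heuristic B."""
    return weight * score_a + (1.0 - weight) * score_b


def choice_probability(evidence, sensitivity=2.0):
    """Logistic link: probability of choosing option 1 given signed evidence."""
    return 1.0 / (1.0 + math.exp(-sensitivity * evidence))


# Heuristic A mildly favors option 1; heuristic B strongly favors option 2.
score_a, score_b = 1.0, -3.0

p_strict = choice_probability(strict_switch(score_a, score_b))
p_blend = choice_probability(weighted_blend(score_a, score_b, weight=0.7))
print(round(p_strict, 3), round(p_blend, 3))
```

Under the strict rule, heuristic B is ignored whenever heuristic A has an opinion, so option 1 is strongly preferred. Under the blend, strong evidence from heuristic B can tip the choice the other way, which is the kind of behavior a rigid switch cannot produce.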

As someone who builds AI models in my daily work, I can hardly contain my excitement at the prospect that this approach could improve model accuracy with less data!

Automated Cognitive Science and Future Applications

Centaur is expected to have many more applications in the context of "automated cognitive science." For example, it could be used for "in silico prototyping" of experimental research.

This means the model could be used to simulate in advance which experimental designs would yield the largest effect sizes (the magnitude of an experimental effect), how to reduce the number of participants needed, or how to estimate statistical power (the probability that a study detects an effect that truly exists).
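Here is a minimal sketch of what estimating power by simulation looks like. In a real prototyping setup the simulated participants would come from the model; in this example, plain Gaussian draws stand in for them, and the two-group design, the effect of 0.5 standard deviations, and the normal-approximation test are all assumptions for illustration.

```python
import random
import statistics


def simulate_study(n_per_group, effect, rng):
    """One simulated experiment: returns True if the group difference is detected."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = ((statistics.variance(control) + statistics.variance(treated)) / n_per_group) ** 0.5
    # Normal approximation to a two-sided test at the 5% level (assumption).
    return abs(diff / se) > 1.96


def estimate_power(n_per_group, effect, n_simulations=2000, seed=0):
    """Fraction of simulated studies in which the effect is detected."""
    rng = random.Random(seed)
    hits = sum(simulate_study(n_per_group, effect, rng) for _ in range(n_simulations))
    return hits / n_simulations


for n in (20, 50, 100):
    print(n, estimate_power(n, effect=0.5))
```

Running a loop like this over candidate designs before recruiting anyone shows roughly how many participants a study needs to have a good chance of detecting the effect.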

In the future, analyzing Centaur's internal representations even more deeply may make it possible to generate hypotheses about how humans represent knowledge and process information, and then to test those hypotheses in actual experiments. That would be a significant step toward a new era for cognitive science.

Conclusion

We've explored how Centaur, a groundbreaking foundation model for human cognition, can predict human behavior with astonishing accuracy and even reproduce its thought processes.

Unlike traditional AI specialized for specific tasks, Centaur has successfully generalized its capabilities to unseen situations and entirely new domains by learning from the extensive psychological experiment dataset, Psych-101.

The fact that its internal mechanisms align with human brain activity truly hints at a future where human thought can be visualized and analyzed as digital data. This is a realm once only discussed in science fiction.

Establishing a unified theory of cognition has been a long-standing goal in psychology. Centaur has made significant progress toward this goal, consistently demonstrating superior performance in competition with many established models. This strongly indicates that the discovery of data-driven general cognitive models is a highly promising direction for research.

Research advances like Centaur, which help unravel the mysteries of human cognition, will be a key to understanding our own "mind" more deeply. What possibilities do you envision as AI delves deeper into the intricacies of the human "mind"?

Source: Nature paper
