Is Your Future Visible to AI? A System Integrator's Take on the Evolution of Predictive AI

Hi, I’m Tak@, a system integrator. I spend my days exploring the potential of generative AI and developing new web services.

The future is always uncertain. But what if AI could predict future events with the same or even greater accuracy than human "superforecasters"? How would our personal and societal decision-making change? Imagine this: your next career choice, a company's major investment decision, or a nation's pandemic response—in all these crucial moments, AI could show you concrete future possibilities.

Advancing Event Forecasting through Massive Training of Large Language Models: Challenges, Solutions, and Broader Impacts

The Dawn of Predictive AI

In the past, large language models (LLMs) used for event forecasting faced harsh criticism for being overhyped. I was skeptical myself when I first heard reports that they were approaching "superforecaster level." But subsequent research and technological advancements have turned my skepticism into genuine surprise.

In recent years, numerous studies using more rigorous evaluation methods have shown that the latest LLMs, such as GPT-4o and Claude-3.5-Sonnet, are steadily improving their predictive accuracy. For instance, in one dynamic benchmark, GPT-4o recorded a Brier score (a metric for forecasting error, where a lower value indicates better performance) of 0.133, and Claude-3.5-Sonnet scored 0.122. While these scores don't yet match the human superforecasters' 0.096, they are on par with or even surpass the collective wisdom of the general public (Brier score 0.121). Seeing AI, which I once viewed as a mere pattern recognizer, now delving deep into information and reasoning like a human expert is truly astonishing for a system developer like me.
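For readers who want to see exactly what these numbers measure, here is a minimal sketch of the Brier score. The forecasts and outcomes below are made up purely for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and binary
    outcomes (1 = event happened, 0 = it did not). Lower is better;
    always answering 0.5 scores 0.25, and a perfect forecaster scores 0."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three questions, predicted probabilities vs. results.
preds = [0.8, 0.3, 0.6]
results = [1, 0, 0]
print(round(brier_score(preds, results), 3))  # → 0.163
```

So a score of 0.122 versus 0.133 means Claude-3.5-Sonnet's probabilities sat measurably closer to what actually happened, averaged over the benchmark's questions.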

This remarkable progress is fueled by several technological breakthroughs:

  • The application of Reinforcement Learning (RL): AI now evaluates its own predictions and learns from rewards, entering a self-improvement cycle where it not only "predicts" but also "learns to predict better." For example, one study using Polymarket data showed that a model improved its Brier score significantly through RL, reaching a level similar to the OpenAI o1 model.
  • The emergence of advanced reasoning models like "Deep Research": These models go beyond relying on simple prompts. They can search and test various reasoning strategies on their own, just like a human researcher, to solve problems. This ability to integrate complex information and derive conclusions from multiple perspectives under uncertainty is exactly what's needed for future prediction.

Given these positive trends, researchers now argue that the time is ripe to train LLMs for event forecasting on a massive scale. In other words, AI could evolve beyond simple information retrieval and generation to provide a deeper, more accurate "predictive intelligence" about uncertain future events. It might fundamentally change how we "read the future" in our business and daily lives. Don't you think so?

The "Walls" Holding Back Predictive AI: The Challenges of Data and Learning

However, realizing this dream of a "superforecaster AI" requires overcoming some major hurdles. Training LLM-based event forecasting models presents unique challenges not found in other AI tasks.

1. The "Noise" and "Sparsity" of Prediction Outcomes

Forecasting future events is not like a typical Q&A task with a single, clear answer. For example, to the question, "Will SpaceX successfully achieve orbital flight by June 2024?" we don't have a certain answer today. Even if SpaceX did succeed on March 14, 2024, the probability back in December 2023 was not 100%; it contained uncertainty, or noise. If we were to train an AI to learn a "100% success rate," it might hinder its ability to develop proper search and reasoning skills.

Furthermore, even for significant events like a presidential election or a rocket launch, the number of "similar past examples" available for training is very limited. From a machine learning perspective, this leads to a serious problem of data sparsity. With limited data, a model risks overfitting to specific patterns or failing to generalize to new situations. This high level of uncertainty is like trying to find a goal in a dense fog. In traditional system development, clear requirements are key, but with predictive AI, that's not the case.

2. The "Knowledge Cut-off" Problem: Past Information as a Hindrance

The second challenge is the knowledge cut-off problem. LLMs only possess knowledge up to their training cut-off date. For event forecasting, however, a model often needs to make predictions based only on information available up to a specific past date (the question date). If the evaluation data includes past events that the LLM already knows the outcome of, the model can simply recall the memorized information instead of engaging in genuine search and reasoning. This prevents it from developing true predictive capabilities. To address this, a dynamic benchmark, not a static one, is crucial for evaluating the true abilities of modern LLMs.
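To make the leakage problem concrete, here is a minimal sketch of the filtering step a dynamic benchmark performs. The record fields and the cut-off date are my own illustrative assumptions, not a schema from the paper:

```python
from datetime import date

# Hypothetical question records; field names are illustrative only.
questions = [
    {"q": "Will the mission launch succeed by June 2024?",
     "question_date": date(2023, 12, 1), "resolution_date": date(2024, 6, 30)},
    {"q": "Did the 2021 referendum pass?",
     "question_date": date(2021, 1, 5), "resolution_date": date(2021, 12, 31)},
]

MODEL_CUTOFF = date(2023, 10, 1)  # assumed training cut-off

def leak_free(qs, cutoff):
    """Keep only questions whose outcome resolved after the model's
    knowledge cut-off, so the model cannot simply recall the answer
    from its training data and must actually search and reason."""
    return [q for q in qs if q["resolution_date"] > cutoff]

print([q["q"] for q in leak_free(questions, MODEL_CUTOFF)])
```

The second question is dropped: its answer was public before the cut-off, so scoring the model on it would reward memorization, not forecasting.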

3. The "Simple Reward Structure" Problem: Is AI "Cheating"?

Third is the simple reward structure problem. In reinforcement learning, AI learns to maximize the reward it's given. In event forecasting, the reward is typically based on the error between the model's predicted probability and the actual outcome. But this alone can allow the model to gain rewards relatively easily without developing proper reasoning or information retrieval skills. For instance, an AI might issue an extreme prediction (0% or 100%) even without confidence, hoping for a big reward if it happens to be right. This is unlike math or coding tasks, where a correct intermediate reasoning process is necessary to get the final answer. This kind of "cheating" prevents the AI's actual predictive ability from improving. It could even lead to confident hallucinations—AI generating information that is not based on facts—if this issue isn't resolved.
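A quick simulation shows the shape of this incentive: an all-or-nothing forecast is tempting because it wins big on any single question it happens to get right, even though it loses in expectation under a Brier-style penalty. The hidden probability and sample count below are arbitrary assumptions for illustration:

```python
import random

random.seed(0)
TRUE_P = 0.7      # assumed hidden probability of the event
N = 100_000       # number of simulated resolutions

def avg_brier(pred):
    """Average Brier loss of a fixed forecast over many simulated outcomes."""
    total = 0.0
    for _ in range(N):
        outcome = 1 if random.random() < TRUE_P else 0
        total += (pred - outcome) ** 2
    return total / N

# A calibrated forecast minimizes expected Brier loss (~0.21 here)...
print(f"calibrated 0.7: {avg_brier(0.7):.3f}")
# ...while the extreme forecast is right 70% of the time but averages ~0.30.
print(f"extreme    1.0: {avg_brier(1.0):.3f}")
```

Because the Brier score is a proper scoring rule, the "cheating" pays off only on individual sparse, noisy questions; with few training examples per event type, the model can still stumble into extreme answers before the averaging ever catches up.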

AI's "Learning Strategies" to Overcome the Walls

To overcome these complex challenges, researchers are employing various clever strategies. Just as a veteran system integrator uses a mix of technologies to build a system for complex customer requirements, the methods for training AI are also evolving.

1. Diverse Use of Reward Signals

What constitutes a "correct" answer in event forecasting is a constant debate. While using the final outcome is intuitive, it's a noisy signal. So, a new focus has been placed on using market forecasts from prediction platforms like Polymarket or Metaculus. These platforms reflect the collective wisdom of their participants, and the probabilities they produce are considered among the most reliable estimates of the "hidden probability" at that point in time.

By using both the final outcome and market forecasts as rewards, and even incorporating the market forecast from an intermediate point in time, AI can learn more accurately, especially when data is sparse. This feels much like project management, where evaluating not just the final result but also intermediate progress and customer feedback is crucial.
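One simple way to picture such a blended reward is a weighted squared error against both targets. The function below is my own minimal sketch, assuming a fixed blend weight; it is not the paper's actual reward formulation:

```python
def blended_reward(pred, market_prob, outcome, w=0.5):
    """Negative weighted squared error: partly against the market's
    probability estimate, partly against the realized 0/1 outcome.
    The blend weight `w` is an assumption for illustration."""
    market_term = (pred - market_prob) ** 2
    outcome_term = (pred - outcome) ** 2
    return -(w * market_term + (1 - w) * outcome_term)

# Hypothetical: model says 0.7, the market said 0.6, and the event occurred.
print(round(blended_reward(0.7, 0.6, 1), 3))  # → -0.05
```

The market term softens the noise of the binary outcome: a sensible 0.7 forecast is not punished as harshly just because the coin happened to land one way.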

2. Data Augmentation to Overcome "Knowledge Cut-off"

To solve the knowledge cut-off problem, we need to use data that forces the model to rely on genuine reasoning, not just memorization.

  • Events the LLM doesn't "know" well: Training the model on past events whose facts the LLM knows, but whose relationships or comparative results it hasn't memorized, can encourage it to use search and reasoning skills to develop general predictive abilities.
  • Counterfactual events: This method involves creating fictional events with outcomes opposite to what actually happened and generating a hypothetical news story about them for training. For example, for a successful SpaceX launch, a counterfactual scenario of "launch failure" would be created. This prompts the LLM to reason based on the provided information, even for events before its knowledge cut-off, fostering true reasoning skills. It's similar to conducting a system failure simulation—creating the worst-case scenario to learn from it.
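The counterfactual idea can be sketched as a tiny data-augmentation step: flip the recorded outcome and draft a fictional news snippet for the model to reason over. The record fields and the template are illustrative assumptions, not the paper's pipeline:

```python
# Hypothetical resolved event; field names are illustrative only.
event = {
    "question": "Will the orbital flight attempt succeed by June 2024?",
    "outcome": 1,  # 1 = yes, it happened in reality
}

def make_counterfactual(ev):
    """Flip the outcome and attach a clearly fictional news snippet, so the
    model must reason from the provided text instead of recalling the real
    result it may have memorized during pretraining."""
    flipped = dict(ev, outcome=1 - ev["outcome"])
    verb = "succeeded" if flipped["outcome"] else "failed"
    flipped["news"] = (
        f"(Fictional) Reports indicate the attempt has {verb}. "
        f"Question under forecast: {ev['question']}"
    )
    return flipped

cf = make_counterfactual(event)
print(cf["outcome"])  # → 0
```

Trained on both the real and the flipped version, the model gets no reward for memorization; only reasoning over the supplied context distinguishes the two.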

3. Auxiliary Signals to Foster Reasoning

To counter the simple reward structure problem and improve reasoning, researchers propose giving auxiliary reward signals that evaluate the reasoning process itself, not just the outcome.

  • Evaluating the process: One method is to use a separate "judge LLM" trained on human evaluations of the AI's reasoning process. This ensures the AI is rewarded not just for the answer, but for the quality of its thought process. It's like a programmer being evaluated not only on the code's functionality but also on its design philosophy.
  • Subquestions: Creating subquestions related to a main prediction task also helps. For example, when predicting a presidential election winner, the AI would also be asked to forecast sub-events like, "Will Candidate X lead in the October polls?" This encourages the AI to form a consistent, deeper reasoning process across multiple related events.

A "Predictive Web" Woven from Massive Data

To truly boost AI's predictive capabilities, vast and diverse datasets are essential. As a system integrator, I know that the quantity and quality of data, and how efficiently it's used, are what determine a system's performance. This data-driven approach is paramount in event forecasting.

1. Prediction Market Datasets

The data from prediction markets like Polymarket and Metaculus has been a primary resource. It provides structured information like the event's "question date," "resolution date," "final outcome," and the market's "forecast." The fact that Polymarket's forecasts for the 2024 U.S. presidential election were more accurate than those of traditional experts shows the reliability of this data.
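Put together, one such record might look like the sketch below. The field names mirror the terms above but are this sketch's assumption, not an official Polymarket or Metaculus schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MarketQuestion:
    """One prediction-market record with the fields named in the text."""
    question: str
    question_date: date           # when the question was posed
    resolution_date: date         # when the outcome became known
    market_forecast: float        # crowd probability around the question date
    outcome: Optional[int] = None  # 1/0 once resolved, None while still open

q = MarketQuestion(
    question="Will candidate X win the 2024 election?",
    question_date=date(2024, 1, 15),
    resolution_date=date(2024, 11, 6),
    market_forecast=0.57,
    outcome=1,
)
print(q.outcome is not None)  # → True
```

Having both the market's probability and the eventual outcome in one record is exactly what makes the blended training signals described earlier possible.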

2. Public Datasets

Beyond prediction market data, researchers are actively using publicly available datasets published on the web in structured, database-like formats. This includes quarterly GDP statistics, central bank economic indicators (FRED), conflict databases (ACLED), and climate data from NASA and NOAA. With massive amounts of time-series data, these datasets hold immense value for generalizing AI's predictive abilities.

3. Crawled Datasets

Building crawled datasets by automatically generating forecasting questions and answers from unstructured web data like Wikipedia, news articles, and arXiv is also advancing. This approach is like a new sensor for AI, allowing it to feel the "pulse" of the world and turn countless pieces of information into a treasure trove of predictive data.

These data collection methods enable the creation of a dynamic benchmark with many unresolved questions, which can significantly accelerate the development cycle of predictive AI. The AI learns, evaluates, and evolves on its own.

The Future Society Shaped by Predictive AI

As superforecaster AI evolves and becomes deeply integrated into society, the way we make decisions could change dramatically.

  • Scalable and personalized forecasting: AI could provide low-cost answers to a massive number of questions that current prediction markets or human forecasters can't handle. It could even address private questions based on personal values, like "Which charity will my donation have the biggest impact on?"
  • Tackling ambiguous questions: Predictive AI could handle subjective and multifaceted questions like, "Did ChatGPT have a positive effect on English education?" It could break down these ambiguous questions into more measurable subquestions, providing a reasonable "guesstimate" even in uncertain situations.
  • Human-AI collaboration: AI can serve as a tool to enhance human expertise. The most effective approach will likely be an ensemble method that combines AI predictions with human forecasts.
  • Integration with AI agents: The capabilities of predictive AI can be seamlessly integrated into general LLM agents. For example, an AI agent developing a new tech adoption strategy could use predictive AI to quantitatively assess a technology's market success probability, competitor reactions, and potential regulatory changes.

The Light and Shadow of Predictive AI: Challenges for Social Implementation

For predictive AI to be widely accepted and used, we still need to overcome some significant challenges. As a system integrator, I know that technology adoption always comes with pros, cons, and unforeseen risks.

1. Systemic Evaluation and Effective Communication of Trustworthiness

The most critical challenge is to systematically evaluate the trustworthiness of AI predictions in various real-world situations and effectively communicate that information to users. We need trustworthiness metrics that help users decide how much to rely on an AI's probabilistic forecasts. The "quality" of predictive AI is more complex and dynamic than that of traditional systems.

2. Addressing Potential Risks

As predictive AI becomes more common, new and unexpected risks will arise.

  • Self-fulfilling prophecies: An AI predicting an economic recession could influence market sentiment, potentially causing the recession itself.
  • Malicious attacks on the system: Bad actors could strategically publish misleading content online or create biased markets to distort the AI's training data.
  • Over-reliance on inaccurate predictions: Users might put too much trust in an inaccurate prediction and suffer negative consequences, like a failed business investment.
  • Model bias: Predictive models, trained on historical data from complex socioeconomic domains, could inherit and amplify existing biases. For example, systematically underestimating the economic potential of certain regions could perpetuate existing socioeconomic inequalities.

AI is a powerful tool, but if we misuse it, it could become a double-edged sword with a massive societal impact. We must bear full responsibility for its safe deployment.

The Future is Not Just Prediction, But Co-creation

We are now at the threshold of an era where AI can "see" future events more deeply and broadly than ever before. This journey has been full of uncertainty, moving from initial skepticism to rapid technological progress and the emergence of new challenges.

However, these challenges also offer an opportunity to redefine the relationship between humans and AI. An AI with enhanced predictive power is not here to replace us, but to be a powerful tool for us to "predict" the future together and then "co-create" a better future based on those predictions.

The impact of predictive AI on society is immeasurable. We must maximize its potential while carefully managing the associated risks. For AI to become a "thinking partner" that helps us co-create a better future, rather than just a "prediction machine," dialogue and understanding are essential not only among technologists but across society as a whole.

How would you like to use AI's predictions in your future choices? And what kind of future do you want to create with AI?

Follow me!

Photo by: Drew Beamer