The Wonder of AI Teaching: A New Approach Radically Transforming Our Learning
Hi there, I'm Tak@! I usually work in system development, and in my free time, I explore the possibilities of generative AI.
Today, I want to talk about Sakana AI's groundbreaking research where AI has learned not just to solve problems, but also to "teach."
AI Evolution and the "Learning Wall"
In recent years, AI, especially Large Language Models (LLMs), has made remarkable progress. We're constantly amazed by their ability to solve complex problems and generate human-like text. To enhance "reasoning abilities," such as solving math problems or writing code, a technique called reinforcement learning has often been used in the AI world.
The Challenges of Traditional Reinforcement Learning
In the traditional approach, AI learns to "solve problems." For example, when given a difficult problem, the AI is "praised" (given a reward) if it produces the correct answer; if it makes a mistake, it receives little or no feedback at all. Through this repetition, AI develops its problem-solving skills.
However, this method had several challenges. First, AI wouldn't get specific hints until it found the correct answer, making it extremely difficult to figure out how to get it right on its own. It's like searching for an exit in a dark room. Therefore, this method was only applicable to already quite intelligent, expensive, and large-scale AIs.
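To make this concrete, here is a minimal sketch of that sparse-reward setup in Python. This is my own illustration, not Sakana AI's actual training code:

```python
# Sketch of the traditional sparse-reward setup (illustrative only).
# The model learns nothing about *how* to reach the answer; it only
# learns whether its final answer matched.

def sparse_reward(model_answer: str, correct_answer: str) -> float:
    """Return 1.0 for an exact correct answer, otherwise 0.0."""
    return 1.0 if model_answer.strip() == correct_answer.strip() else 0.0

# On a hard problem, almost every attempt earns zero reward, so the
# model gets no learning signal until it stumbles on a full solution.
print(sparse_reward("42", "42"))  # 1.0
print(sparse_reward("41", "42"))  # 0.0
```

This all-or-nothing signal is exactly the "dark room" problem: until the model happens to produce a fully correct answer, every attempt looks equally bad.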
Additionally, AI trained with this method became strong in specific tasks but lacked adaptability. Furthermore, there was a "gap" between solving a problem and "explaining how to solve it clearly," as the objectives were slightly different. It's like a teacher who's great at solving problems but poor at explaining them.
The Birth of "Reinforcement Learning Teachers (RLT)"
Amidst these challenges, Sakana AI's new research, "Reinforcement Learning Teachers (RLT)," might change the game. Their idea was a complete shift in perspective: instead of AI learning to "solve problems," it would "learn to teach."
A New Learning Method Inspired by Human Teachers
Real teachers don't necessarily need to "discover every theorem themselves" to teach effectively, do they? They use already known answers and solutions to create explanations that students can easily understand. RLT works similarly.
RLT is given the "correct answer" along with the problem from the start. The AI's job then becomes generating a "clear explanation" of how to arrive at that correct answer. In other words, with the answer in hand, the AI learns how to explain it in a way that another AI model (which we call the "student model") can understand best.
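As a rough sketch of this idea, the teacher's input could be assembled like the prompt below. The exact template and wording are my own assumptions, not the format used in the paper:

```python
# Illustrative sketch of the RLT input format: unlike a problem-solving
# model, the teacher sees BOTH the problem and the correct answer, and
# its job is to produce a clear explanation. The prompt template here
# is a hypothetical example, not Sakana AI's actual one.

def build_teacher_prompt(problem: str, answer: str) -> str:
    return (
        "You are a teacher. Explain, step by step, how to reach the "
        "given answer so that a student can follow the reasoning.\n"
        f"Problem: {problem}\n"
        f"Correct answer: {answer}\n"
        "Explanation:"
    )

print(build_teacher_prompt("What is 12 * 9?", "108"))
```

Because the answer is already in the prompt, the teacher never has to "search the dark room" for it; all of its learning effort goes into the quality of the explanation.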
RLT's Clever Mechanism: Student Understanding as "Reward"
So, how does RLT learn to create "good explanations"? This is where the research gets interesting. RLT uses the explanations it creates to measure how well another AI model (the "student model") understands the problem and receives "rewards" based on that outcome.
Dense Feedback Promotes Growth
While traditional reinforcement learning provided only coarse feedback like "correct or incorrect" (sparse rewards), RLT receives much more detailed feedback (dense rewards). Specifically, the RLT's generated explanation is given to the student model, and the student model's confidence in the correct answer after reading the explanation is measured numerically.
Furthermore, each step in the explanation is evaluated for how natural and logically connected it is for the student model—in other words, how "sensible" it is. If the explanation is unclear, the student model's understanding will be low, and RLT won't receive a reward. Conversely, if the student model understands smoothly, RLT receives many rewards, further improving its teaching method.
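A hedged sketch of how such a dense reward could be computed is shown below. This is my simplification of the idea, with illustrative function and variable names: it combines the student's confidence in the answer tokens with how natural the explanation's own tokens look to the student, both expressed as log-probabilities:

```python
# Simplified sketch of a dense teaching reward (my illustration, not
# the paper's implementation). Two signals are combined:
#   1. the student's confidence in the correct answer after reading
#      the explanation (mean log-prob of the answer tokens), and
#   2. how "sensible" each explanation step is to the student
#      (mean log-prob of the explanation tokens themselves).

def dense_reward(answer_logprobs: list[float],
                 explanation_logprobs: list[float],
                 alpha: float = 0.5) -> float:
    solution_score = sum(answer_logprobs) / len(answer_logprobs)
    coherence_score = sum(explanation_logprobs) / len(explanation_logprobs)
    return solution_score + alpha * coherence_score

# A clear explanation leaves the student confident (log-probs near 0);
# a confusing one leaves it uncertain (very negative log-probs).
clear = dense_reward([-0.1, -0.2], [-0.3, -0.4])
confusing = dense_reward([-3.0, -4.0], [-2.5, -3.5])
print(clear > confusing)  # True
```

Because every explanation gets a graded score rather than a pass/fail verdict, the teacher receives useful gradient signal on every single attempt.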
Having been involved in system development for many years, I recall an experience long ago deciphering complex COBOL code. At that time, a senior colleague's clear explanations helped me immensely, and it was a project where I learned the importance of communicating effectively. This RLT mechanism reminds me of an experienced system architect who diagrams a complex system so beginners can understand it, meticulously confirming their comprehension as they teach.
Thanks to this dense feedback, RLT can efficiently grow as an "expert in teaching," even without the ability to solve problems on its own.
Amazing Results: Small Teachers Nurture Large Students
This "learning to teach" approach has yielded truly astonishing results.
Small Models Become High-Performing Teachers
In Sakana AI's experiments, an RLT with only 7 billion parameters (a small model by today's LLM standards) was found to be better at teaching reasoning skills than much larger AIs, such as DeepSeek R1, which has 671 billion parameters, nearly a hundred times larger. This shows that even small models can achieve very high capabilities by specializing in "teaching."
Furthermore, a 7-billion-parameter RLT was able to train a student model four times its size, with 32 billion parameters, and that student model also acquired excellent reasoning abilities. This is akin to a primary or secondary school teacher instructing students on advanced, university-level knowledge, which is truly groundbreaking in the AI world.
Significant Cost Reduction
Another major advantage of this technology is its ability to drastically reduce learning costs and time. It's reported that what would take months to train a 32-billion-parameter AI using traditional reinforcement learning was completed in less than a day with RLT. This significantly lowers the barrier to developing high-performance AI.
Why are RLT's Explanations So Clear?
What kind of "good explanations" does RLT produce to help student models learn?
Clear and Logical "Traces of Thought"
Explanations generated by traditional "problem-solving" focused AIs (like DeepSeek R1) sometimes suggested the use of external tools like calculators or even included out-of-place humorous comments. In contrast, RLT's explanations were found to include more specific and logical steps with less superfluous writing.
It's as if a seasoned teacher accurately understands where students might stumble and meticulously clarifies those points, showing each step of the thought process very clearly. This allows student models to learn reasoning skills more efficiently and deeply.
Ability to Handle Unseen Challenges
Even more surprisingly, RLT was able to create effective explanations in a zero-shot manner (without additional training) for completely new types of problems it had never encountered. This indicates that RLT didn't just learn "tricks for solving problems" but acquired the general ability to "teach."
AI's "Teaching Power" Resonates with Our Own Learning
Sakana AI's RLT research, which I've introduced today, points to a new direction in AI development. It's not just about pursuing raw AI performance, but also about the educational aspect: how AI conveys knowledge and promotes learning.
I'm developing an "AI Learning Planner" tool as a hobby. This tool proposes an optimal learning plan just by entering a goal and duration. At the root of this tool is the question of "how to learn most efficiently and effectively." The RLT research seems to answer that very question by having AI itself become a "good teacher."
When we learn something, the presence of a teacher who provides clear explanations or a guide who shows concrete steps is incredibly valuable, isn't it? AI, too, is greatly expanding its possibilities by learning not just to find the correct answer, but also how to clearly "teach" the "thought process" to arrive at that answer.
The Future of AI and Learning
What kind of future will "learning-to-teach" AI, like RLT, bring?
First, high-performance AI will become easier to develop, because small AIs can act as "teachers," transferring their knowledge to "student" models without requiring expensive large-scale models. This could lower the barrier to AI research and encourage more people to participate in AI development.
Ultimately, it also suggests the possibility of systems where AI teaches itself and continues to learn. It's like an AI creating its own practice problems, explaining them, grading them, and becoming smarter and smarter. This feels like a step closer to the dream of "self-growing AI."
Sakana AI's research today foreshadows AI increasingly becoming a "partner" in our learning and intellectual activities, rather than just a "tool."
What will AI "teach" us, and what will we "learn" from AI? The possibilities are sure to continue expanding.