Dwarkesh Patel
"One of the very confusing things about the models right now: how to reconcile the fact that they are doing so well on evals.
And you look at the evals and you go, 'Those are pretty hard evals.'
But the economic impact seems to be dramatically behind.
There is [a possible] explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. So you don't have to think if it's going to be this data or that data.
When people do RL training, they say, 'Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.'
You say, 'Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?'
If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing: this disconnect between eval performance and actual real-world performance."

Dwarkesh Patel · 21 hours ago
The @ilyasut episode
0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!
“There are more companies than ideas by quite a bit.
Compute is large enough such that it's not obvious that you need that much more compute to prove some idea.
AlexNet was built on 2 GPUs. The transformer was built on 8 to 64 GPUs. Which would be, what, 2 GPUs of today? You could argue that o1 reasoning was not the most compute heavy thing in the world.
For research, you definitely need some amount of compute, but it's far from obvious that you need the absolutely largest amount of compute.
If everyone is within the same paradigm, then compute becomes one of the big differentiators.”
@ilyasut

“From 2012 to 2020, it was the age of research. From 2020 to 2025, it was the age of scaling.
Is the belief that if you just 100x the scale, everything would be transformed?
I don't think that's true. It's back to the age of research again, just with big computers.”
@ilyasut
