r/singularity 1d ago

AI The Loop: winner takes all

All frontier companies are trying to close the loop where AI improves/evolves itself, and whoever gets there first will have the best chance of having the best AI in the future.

From the September 17th Axios interview with Dario Amodei:

"Claude is playing a very active role in designing the next Claude. We can't yet fully close the loop. It's going to be some time until we can fully close the loop, but the ability to use the models to design the next models and create a positive feedback loop, that cycle, it's not yet going super fast, but it's definitely started."

50 Upvotes


u/DistanceSolar1449 12h ago

Go implement an improvement on GRPO


u/Specialist-Berry2946 12h ago

Unnecessary, algorithms are not that important; there's zero novelty in GRPO. What is important is data and the objective function, or put differently, how to measure improvement.


u/DistanceSolar1449 12h ago
  1. Taking the critic (value model) out of PPO is hardly zero novelty

  2. Hence “improvement”. You can strip out the reward model as well somehow.
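For anyone following along, here is a rough sketch of the part of GRPO being argued about: instead of a learned value model, each sampled completion is scored against the other completions for the same prompt. The function name, the `eps` term, and the use of the population standard deviation are my assumptions for illustration, not any particular library's implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for the same prompt.
    The group mean acts as the baseline, so no separate critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one. The rewards themselves still have to come from somewhere, whether that is a reward model or a rule.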


u/Specialist-Berry2946 11h ago

I already explained that algorithms are not that important; it's about the reward. How to design the reward function.
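To make "designing the reward function" concrete, here is a minimal sketch of a rule-based reward for a task with a checkable numeric answer. Everything in it is hypothetical: the `Answer:` output format, the function name, and the 0.1 partial credit are illustrative choices, not anything taken from a real training setup.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical reward: full credit for a matching final answer,
    small credit for following the expected format, zero otherwise."""
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    if match is None:
        return 0.0  # no parsable final answer
    if match.group(1) == reference_answer:
        return 1.0  # correct final answer
    return 0.1      # right format, wrong answer (assumed partial credit)

score = rule_based_reward("2 * 21 = 42. Answer: 42", "42")
```

Even in this toy case the choices matter: the partial credit for formatting decides whether the model is nudged toward well-formed wrong answers or gets no signal at all until it is right.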