The TL has been all about RL environments for the last couple of weeks. @willccbb and @PrimeIntellect are doing incredible work building a platform for open environments (currently in beta access). I've written and talked about this a bit before, but I just want to reiterate that open-source collaboration on environments will be a major opportunity to produce SOTA open-source reasoning models. Good luck to the PI team 🫡


Aug 24, 2025
i'll confess i do have a very specific mission in mind with this project. the semi-vague private beta rollout is part of it. the set of tasks we're sourcing is part of it. the GPU bounties are part of it. the shitposts are part of it. the podcasts are part of it. mindshare is crucial here. let me explain.
currently, a lot of the discussion around RL environments is focused on this new wave of startups whose business model is building and selling environments to a very small number of big labs on an exclusive basis. mechanize is the loudest, but there are a number of them. instead of spending on instruction-tuning samples and annotations, labs are eager to buy private environments as their next big consumable resource for model training.
this phenomenon is both a serious risk to the prospect of open-source models remaining competitive and a major opportunity to tip the scales if we can shift the center of gravity. if good environments are all expensive and hidden, open-source models will fall even further behind. this is essentially what's happened with pretraining data. but if a sufficiently robust ecosystem of open-source tooling for environments and training can emerge, then the open-source option can also be the state-of-the-art. this is more-or-less what's happened with pytorch.
tipping the scales here is my goal. our goal. i joined prime intellect because everyone was insanely talented, was goddamn serious about the mission of open-source AGI for everyone and wasn't afraid to say it, and because the team had a singular structural advantage that meant we could actually take some real swings. we sell compute. we build infra to improve what you can do with that compute. we do research on how to make that compute interoperate in new ways. we're training bigger and better models. we have the right incentives to do the hard, necessary work. these pieces are all connected.
we can't do it alone. no one can. it'll take startups and enterprises and students and professors around the world. open research currently does not have the tools to study the questions that big labs have deemed most crucial to future progress. we have to find a way to build those tools. we're trying to make that easier. we all have to get better at working together, at not reinventing the wheel, at assembling individual pieces into bigger puzzles. let's take what we've collectively done so far, clean it up, make it work together, bring more people into the tent, and start playing more positive-sum games. if we can't find better ways to work together, we're heading towards an AI future where we collectively just *do not know what these models even are*, because the curtain is never lifted, and everything we can actually see is just a toy.
there is a different type of company you could build in this space; one which still lets you sell to the big labs, but not exclusively; one which still lets you have your trade secret moats and print sweet ARR, but doesn't make us collectively less informed about the future we're building.
browserbase. cursor. exa. modal. morph. and countless others. let's do more of these. you can build a great company by making powerful tools and harnesses for agents which reflect the high-value tasks people want models to actually do. have elements of it which are open to try freely, and elements which are hosted behind an API. charge by usage with some premium enterprise features. build the best LLM-shaped excel clone, or figma clone, or turbotax clone. change it just enough to avoid a lawsuit, and then let private customers see the more lawsuit-robust version. enjoy some healthy competition in the arena, and find ways to partner where it counts. find your angle and be so good that you can sell to everyone, whether for RL or for actual usage. hit critical mass and be so affordable that it's not worth it for anyone to try and rebuild what you've already made.
this is the timeline i hope we end up in. it's a world where the big labs can all still do great, and will likely offer the easiest ways to spend a bit more to get improved general performance. but it's also one where open-source models aren't far behind, and everyone who cares enough can basically see what's going on and understand how the models we use are actually trained. if you're thinking about starting or joining a company focused on RL environments, i urge you to think about which timeline you're implicitly betting on, and reflect on how you feel about that.