"The best video on AI that I have seen"
If you want to learn AI evals from scratch by following a hands-on example, you should watch my free, step-by-step tutorial with @_amankhan.
Link to the video in the post below.


August 24, 2025
"Everyone says AI evaluations are important, so let's actually build one live from scratch."
Here's my new episode with @_amankhan (Arize) where we build AI evals for a customer support agent live, including:
✅ Creating the eval criteria
✅ Labeling the golden dataset
✅ Aligning LLM judges with human scores
Some insights from Aman:
1. PMs must do manual labeling themselves. "I never found it useful to outsource human evals to contractors. The PM has to be in the spreadsheet to maintain good judgment."
2. Define what good/average/bad looks like on criteria like accuracy and tone upfront. This becomes your rubric for consistent evaluation across your team.
3. Make sure your LLM judges align with your human scores before you scale. Test the judges on a few dozen cases first and aim for a match rate of at least 80%.
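To make insights 2 and 3 concrete, here is a minimal Python sketch of a rubric and a judge-vs-human match-rate check. Everything in it is an assumption for illustration: the rubric wording, the label names, the sample labels, and the helper functions `match_rate` and `disagreements` are hypothetical and not taken from the episode. Only the 80% threshold comes from the post.

```python
from collections import Counter

# Hypothetical rubric: criteria and wording are illustrative assumptions.
RUBRIC = {
    "accuracy": {
        "good": "Answer is factually correct and complete.",
        "average": "Answer is mostly correct but misses a detail.",
        "bad": "Answer is wrong or misleading.",
    },
    "tone": {
        "good": "Polite, empathetic, on-brand.",
        "average": "Neutral but slightly robotic.",
        "bad": "Rude, dismissive, or off-brand.",
    },
}

MATCH_THRESHOLD = 0.80  # the "at least 80%" target mentioned in the post


def match_rate(human, judge):
    """Fraction of cases where the LLM judge's label equals the human label."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    if not human:
        raise ValueError("need at least one labeled case")
    matches = sum(h == j for h, j in zip(human, judge))
    return matches / len(human)


def disagreements(human, judge):
    """Count (human_label, judge_label) pairs where the two disagree."""
    return Counter((h, j) for h, j in zip(human, judge) if h != j)


# Made-up labels for one criterion (e.g. accuracy) on ten golden-dataset cases.
human_labels = ["good", "bad", "average", "good", "good",
                "bad", "average", "good", "bad", "good"]
judge_labels = ["good", "bad", "good", "good", "good",
                "bad", "average", "good", "average", "good"]

rate = match_rate(human_labels, judge_labels)
ready_to_scale = rate >= MATCH_THRESHOLD

print(f"match rate: {rate:.0%}, ready to scale: {ready_to_scale}")
print("disagreements:", dict(disagreements(human_labels, judge_labels)))
```

Looking at the disagreement pairs, not just the overall rate, shows where the judge prompt or the rubric wording needs tightening before you scale up.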
📌 Watch now:
Also available on:
Spotify:
Apple:
Newsletter: