ThursdAI - June 20th - Claude Sonnet 3.5 new LLM king, DeepSeek new OSS code king, Runway Gen-3 SORA competitor, Ilya's back & more AI news from t…

From ThursdAI - The top AI news from the past week

Length: 69 minutes

Released: Jun 20, 2024

Format: Podcast episode

Description

Hey, this is Alex. Don't you just love it when assumptions about LLMs hitting a wall get shattered left and right, and we get incredible new tools that leapfrog the previous state-of-the-art models we had barely gotten used to just a few months ago? I SURE DO! Today is one such day. This week was already busy enough, I had a whole 2-hour show packed with releases, and then Anthropic gave me a reason to hit the #breakingNews button (the one that plays a news-show-style sound on the live show; you should join next time!) by announcing Claude Sonnet 3.5, their best model yet, beating Opus while being 2x faster and 5x cheaper! (It also beats GPT-4o and GPT-4 Turbo, so... new king! For how long? ¯\_(ツ)_/¯)

Critics are already raving, and it's only been half a day!

Ok, let's get to the TL;DR, and then dive into Claude 3.5 and a few other incredible things that happened this week in AI!

TL;DR of all topics covered:

* Open Source LLMs
    * NVIDIA - Nemotron 340B - Base, Instruct and Reward model (X)
    * DeepSeek Coder V2 (236B MoE, 16B Lite) (X, HF)
    * Meta FAIR - Chameleon MMIO models (X)
    * HF + BigCodeProject are deprecating HumanEval with BigCodeBench (X, Bench)
    * NousResearch - Hermes 2 Theta Llama 3 70B - GPT-4 level OSS on MT-Bench (X, HF)
* Big Co LLMs + APIs
    * Gemini Context Caching is available
    * Anthropic releases Sonnet 3.5 - beating GPT-4o (X, Claude.ai)
    * Ilya Sutskever starts SSI.inc - Safe Superintelligence (X)
    * Nvidia is the biggest company in the world by market cap
* This Week's Buzz
    * Alex is in SF next week for AIQCon and AI Engineer. ThursdAI will be sporadic but will happen!
    * W&B Weave now has support for tokens and cost + the Anthropic SDK out of the box (Weave Docs)
* Vision & Video
    * Microsoft open sources Florence-2 230M & 800M Vision Models (X, HF)
    * Runway Gen-3 - (t2v, i2v, v2v) Video Model (X)
* Voice & Audio
    * Google DeepMind teases V2A video-to-audio model (Blog)
* AI Art & Diffusion & 3D
    * Flash Diffusion for SD3 is out - Stable Diffusion 3 in 4 steps! (X)

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

New king of LLMs in town - Claude 3.5 Sonnet

Ok, so first things first: Claude Sonnet, the previously forgotten middle child of the Claude 3 family, has now received a brain upgrade! It achieves incredible performance on many benchmarks, and it is 5 times cheaper than Opus, at $3 per 1M input tokens and $15 per 1M output tokens. It's also competitive with GPT-4o and GPT-4 Turbo on the standard benchmarks, with incredible scores on MMLU, HumanEval, etc., but we know those benchmarks are already behind us.

Sonnet 3.5, aka Claw'd (a great marketing push by the Anthropic folks, I love to see it), is beating all other models on the Aider.chat code editing leaderboard, winning on the new livebench.ai leaderboard, and getting top scores on MixEval Hard, which has a 96% correlation with the LMSys Arena.

While benchmarks are great and all, real folks are reporting real findings of their own. Here's what Friend of the Pod Pietro Skirano had to say after playing with it:

"there's like a lot of things that I saw that I had never seen before in terms of like creativity and like how much of the model, you know, actually put some of his own understanding into your request" - @Skirano

What's notable as a capability boost is this quote from the Anthropic release blog:

"In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%."
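To make the "5x cheaper" pricing concrete, here is a minimal cost sketch. The Sonnet 3.5 prices ($3 input / $15 output per 1M tokens) are from this post; the Opus prices ($15 / $75 per 1M tokens) are an assumption consistent with the "5x cheaper" claim. The dictionary keys are illustrative labels only, not official API model IDs, and the token counts are a hypothetical workload.

```python
# Sketch: estimate per-request API cost from per-million-token prices.
# Sonnet 3.5 prices are from the post; Opus prices are ASSUMED (5x Sonnet 3.5).
# Keys are hypothetical labels, not real API model identifiers.
PRICES_PER_MTOK = {
    "claude-3.5-sonnet": (3.00, 15.00),  # (input USD, output USD) per 1M tokens
    "claude-3-opus": (15.00, 75.00),     # assumed from the "5x cheaper" claim
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, given its input and output token counts."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10k input tokens and 2k output tokens per request
    for model in PRICES_PER_MTOK:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

With this workload, Sonnet 3.5 comes to $0.0600 per request versus $0.3000 for Opus. Because both input and output prices scale by the same factor, the 5x ratio holds for any mix of input and output tokens.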
One detail that Alex Albert from Anthropic pointed out from this release was that on the GPQA (Graduate-Level Google-Proof Q&A) benchmark, they achieved 67% with various prompting techniques, beating PhD experts in their respective fields, who average 65% on this benchmark. This... this is crazy.

Beyond just the benchmarks

This, to me, is a ridiculous jump, because Opus was just so, so good already, and Sonnet 3.5 is jumpin…

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, where they discuss everything major and important that happened in the world of AI in the past week. Topics include LLMs, open source, new capabilities, OpenAI and its competitors, new LLM models, AI art and diffusion, and much more. sub.thursdai.news