
Length: 64 minutes
Released: Jan 5, 2024
Format: Podcast episode

Description

Happy 2024! We appreciated all the feedback on the listener survey (still open, link here)! It was surprising to see that some people's favorite episodes were others' least favorite, but we'll always work on improving our audio quality and booking great guests. Help us out by leaving reviews on Twitter, YouTube, and Apple Podcasts! Big thanks to Chris Anderson for the latest review - be like Chris!

Note to the Audio-only Listener

Because of the nature of today's topic, it makes the most sense to follow along with the demo on video rather than audio. There are also about 30 minutes of demos and technical detail that we had to cut from the audio version because they didn't make sense without video.

Trailer here.

Full 90min chat:

(In other words, please jump over and watch on our YouTube if you can! Did you know we are now posting every episode to YouTube? We've been multimodal for a long time!)

Trend 1: GPT-4V Coding

You might remember Greg Brockman's hand-scribble-to-working-website demo from the GPT-4 launch in March. This was largely inaccessible to the rest of us until the GPT-4V API was released at Dev Day in November.

As mentioned in our November 2023 recap, one of the biggest viral trends was tldraw's open source "Make It Real" demo: starting from a simple wireframe and text annotations, you could create a real, functioning UI with the click of a button (a minimal sketch of the underlying API pattern appears at the end of these notes). It provoked another crisis of confidence in developer circles, demos using state charts, and responses from Excalidraw, a competitor.

You can see us creating a Replit clone in this silent video here:

Since our interview, the new GPT-4V coding metagame has been merging app UIs and SQL with Supabase (another AIE Summit speaker) and other backend tools:

* generating SQL
* converting ERDs to SQL (part 2, for MariaDB)
* seeding sample data
* doing migrations

Trend 2: Latent Consistency Models

As covered in the Latent Space Paper Club in November, three papers drove a roughly 100x acceleration in the speed of text-to-image generation over the past year:

* Consistency Models (with Ilya Sutskever)
* Latent Consistency Models (from Tsinghua)
* LCM-LoRA (also Tsinghua, same authors)

With the invaluable help of Fal.ai (friends of the show and the AI Engineer Summit, and progenitors of the viral GPU Rich/Poor hats mentioned on the Semianalysis episode), tldraw has also been at the forefront of putting this research into production, with two projects (a rough LCM-LoRA sketch also appears at the end of these notes):

* drawfast: add a prompt, start sketching on the canvas, and see each stroke affect the drawing. Overlap multiple sketches to extend and merge drawings.
* lens: a collaborative canvas where people can draw in real time and have their sketches turn into AI-generated art. Start drawing at the bottom and watch it scroll into the magic canvas.

For the nontechnical people in your life, we do recommend showing them lens.tldraw.com (and its predecessor, which we discuss on the show) on your and their mobile devices.

The Rise of Multimodal Prompting

At the first AI Engineer Summit in October, Logan (our first guest!) declared this the Year of Multimodality. Over the next two months we saw an explosion of activity in multimodal AI: GPT-4V's API release at OpenAI Dev Day (our coverage here), LLaVA (our chat with the author on Visual Instruction Tuning here), BakLLaVA, Qwen-VL, CogVLM, etc.

On today's episode we have Steve Ruiz, founder of tldraw. The project originally started as an open source whiteboard that Steve built for himself, and then "accidentally made a really, really good visual multimodal prompting application environment".
Turns out that an infinite canvas and generative models are a very good match:

* Design is iterative: DALL-E, Midjourney, etc. all work in a linear way: a prompt goes in, 1-4 images come back. As you generate more, the previous images scroll away from your view. In a canvas environment, you can see the progression of your generations and visually "branch" by putting new prompts in different spaces.
* UI has "layers": when designing interfaces there are different layers to it: the functionality,
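For readers who want to try the Trend 1 pattern themselves, here is a minimal sketch of the kind of GPT-4V API call behind wireframe-to-UI demos like Make It Real. This is not tldraw's implementation (theirs is open source TypeScript with a much more elaborate system prompt); the helper name, prompt wording, and input file below are illustrative assumptions.

```python
# A minimal sketch of the wireframe-to-UI pattern. Assumptions: the
# wireframe_to_html helper, prompt wording, and wireframe.png are ours,
# not tldraw's.
import base64
from openai import OpenAI  # pip install openai>=1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def wireframe_to_html(image_path: str) -> str:
    # GPT-4V accepts images as base64 data URLs alongside the text prompt.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # the vision model released at Dev Day
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Turn this low-fidelity wireframe into a single "
                         "self-contained HTML file. Treat any text "
                         "annotations as instructions. Reply with HTML only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# html = wireframe_to_html("wireframe.png")
```

The same call shape drives the SQL metagame above: swap the wireframe for an ERD screenshot and ask for CREATE TABLE statements instead of HTML.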
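And a rough sketch of the LCM-LoRA setup behind the Trend 2 speedup, using Hugging Face's diffusers library with the public checkpoints from the papers above; drawfast and lens themselves run a hosted real-time pipeline on Fal rather than this exact code.

```python
# A rough sketch of the LCM-LoRA recipe: the LoRA distills a standard
# Stable Diffusion checkpoint so that ~4 denoising steps, rather than the
# usual 25-50, yield a usable image. Model IDs are the public HF repos.
import torch
from diffusers import DiffusionPipeline, LCMScheduler  # pip install diffusers peft

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LoRA weights
# published alongside the LCM-LoRA paper.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Four steps and low guidance: the speed/quality trade-off LCM targets.
image = pipe(
    "a watercolor lighthouse at dusk",  # illustrative prompt
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_sample.png")
```

At these step counts a single GPU can re-render on every brushstroke, which is what makes canvas-speed feedback loops like drawfast possible.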


The podcast by and for AI Engineers! We are the first place over 50k developers hear news and interviews about Software 3.0 - Foundation Models changing every domain in Code Generation, Computer Vision, AI Agents, and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing down to your first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from tiny (George Hotz), Databricks, Glean, Replit, Roboflow, MosaicML, UC Berkeley, OpenAI, and more. www.latent.space