New TTS Model that Rivals ElevenLabs

PLUS: Complete V0+Claude+Cursor Coding Guide + Sand AI Magi-1

Daily AI Skills
April 23, 2025

Welcome back to Daily AI Skills.

Here’s what we are covering today:
1. Sand AI Magi-1: Autoregressive Video Gen Model
2. A Text-to-Speech Model made by 2 undergrads
3. Claude’s Value System released by Anthropic

+ Complete V0+Claude+Cursor Coding Guide

Sand AI Unveils Magi-1: An Autoregressive Video Generation Model

Chinese startup Sand AI has released Magi-1, a 24-billion-parameter autoregressive diffusion model for video generation that rivals top commercial systems. Built by a lean team with limited resources and launched on April 21, 2025, the open-source model gives creators fine-grained control over motion and timing.

The Details:

Innovative Design: Magi-1 generates videos chunk by chunk (24 frames each). A transformer-based variational autoencoder (VAE) compresses the video 8x spatially and 4x temporally, and an autoregressive denoising algorithm keeps output smooth and high-fidelity while processing up to four chunks concurrently.
Cutting-Edge Tech: Built on a Diffusion Transformer, Magi-1 incorporates Block-Causal Attention, SwiGLU, and a novel distillation algorithm for efficient inference. It supports text-driven image-to-video (I2V) tasks with fine-grained control over motion, timing, and scene transitions.
Strong Performance: On Sand AI's reported benchmarks, Magi-1 outperforms open-source models like Wan-2.1 and closed-source systems like Kling, excelling in motion quality, instruction adherence, and physical precision (Physics-IQ score: 56.02 for video-to-video, V2V). Human evaluations reportedly back up these results.
Accessible and Scalable: Magi-1 is released under Apache 2.0 with pre-trained weights (a 24B model now, a 4.5B model upcoming) and runs on H100s or RTX 4090s. Sand AI also offers 500 free monthly credits and plans a creator-focused platform for seamless content production.
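
To make the chunk and compression figures above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the 8x and 4x factors apply as plain per-axis division (real VAEs often treat the first frame or padding differently), and the 720x1280 resolution and 120-frame clip are illustrative values, not figures from Sand AI.

```python
import math

# Figures quoted in the article
FRAMES_PER_CHUNK = 24        # frames generated per chunk
SPATIAL_FACTOR = 8           # 8x spatial compression (assumed per side)
TEMPORAL_FACTOR = 4          # 4x temporal compression
MAX_CONCURRENT_CHUNKS = 4    # chunks that can be processed at once


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Approximate latent size, assuming simple division on each axis."""
    return (
        math.ceil(frames / TEMPORAL_FACTOR),
        math.ceil(height / SPATIAL_FACTOR),
        math.ceil(width / SPATIAL_FACTOR),
    )


def chunk_count(total_frames: int) -> int:
    """Number of 24-frame chunks needed to cover a clip."""
    return math.ceil(total_frames / FRAMES_PER_CHUNK)


def max_frames_in_flight(total_frames: int) -> int:
    """Upper bound on frames being denoised at the same time."""
    return min(chunk_count(total_frames), MAX_CONCURRENT_CHUNKS) * FRAMES_PER_CHUNK


# Illustrative input (assumption): one 720x1280 chunk and a 120-frame clip
chunk_latent = latent_shape(FRAMES_PER_CHUNK, 720, 1280)
clip_chunks = chunk_count(120)
in_flight = max_frames_in_flight(120)
print(chunk_latent, clip_chunks, in_flight)
```

Under these assumptions, a single 24-frame chunk shrinks to 6 latent frames, which is why chunk-level processing stays manageable even for longer clips.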

ok wow...
Sand AI just dropped Magi-1, a new groundbreaking open source video generation model.
Unmatched control over timing, motion & dynamics and so much more...
More examples below:
— Angry Tom (@AngryTomtweets)
2:02 PM • Apr 21, 2025

Try out the model here: https://sand.ai/?x=test1

Two Undergrads Build a TTS Model that Rivals ElevenLabs

Korean startup Nari Labs has unveiled Dia, an open-source text-to-speech model that it claims surpasses top commercial tools like ElevenLabs and Sesame—remarkably created by two undergraduates without any external funding.

Here’s what stands out:

Dia is a 1.6-billion-parameter model with advanced features such as emotional inflection, multiple speaker identities, and nonverbal sounds like laughter, coughing, and screams.
The project took inspiration from Google’s NotebookLM and used Google’s TPU Research Cloud for computing resources.
In direct comparisons, Dia reportedly outperforms ElevenLabs Studio and Sesame CSM‑1B in areas like timing accuracy, emotional expressiveness, and handling nonverbal audio cues.
According to founder Toby Kim, Nari Labs now aims to build a consumer-facing app centred on creative social content generation and remixing using Dia.
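
For a feel of what a multi-speaker script with nonverbal cues might look like, here is a minimal Python sketch that assembles one. The `[S1]`/`[S2]` speaker tags, the parenthesised cue, and the `format_script` helper are assumptions for illustration only; check the repository README for the input format Dia actually expects.

```python
def format_script(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, text) turns into one tagged script string.

    The "[S<n>]" tag style is a hypothetical convention for this sketch.
    """
    return " ".join(f"[S{speaker}] {text.strip()}" for speaker, text in turns)


# Two speakers, with a nonverbal cue written inline in parentheses
script = format_script([
    (1, "Did you hear about the new model?"),
    (2, "I did. (laughs) Two undergrads built it."),
])
print(script)
```

Keeping the script as plain tagged text means the same string can be reviewed by a person before it is handed to any speech model.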

Two undergrads. One still in the military. Zero funding.
One ridiculous goal: build a TTS model that rivals NotebookLM Podcast, ElevenLabs Studio, and Sesame CSM.
Somehow… we pulled it off. Here’s how 👇
— Toby Kim (@_doyeob_)
11:43 PM • Apr 21, 2025

Try out Dia here: https://github.com/nari-labs/dia/

Claude’s Values Revealed Through 300,000 Real Conversations

Anthropic has released a new study examining how AI models like Claude make moral decisions by analysing hundreds of thousands of real user interactions. This marks the first large-scale effort to chart the values that guide these models in everyday conversations.

Here’s what they found:

The study looked at over 300,000 anonymised conversations and identified 3,307 distinct values reflected in Claude’s responses.
These values were grouped into five categories: Practical, Knowledge-related, Social, Protective, and Personal. Practical and Knowledge-related values were the most commonly expressed.
The most frequent values included helpfulness and professionalism, while ethical values tended to surface when the AI refused to carry out harmful tasks.
Claude’s value expression varied depending on the context—for instance, it emphasised “healthy boundaries” when giving relationship advice, and “human agency” when discussing AI ethics.

Read the full paper by Anthropic here

Complete V0+Claude+Cursor Coding Guide

by Aryan Singh (ex-SWE @ Google)

📩Forward it to people you know who are keeping pace with the changing AI world, and stay tuned for the next edition to stay ahead of the curve!
