
Why try to understand Gen Z slang when it might be easier to communicate with animals?
Today, Google unveiled DolphinGemma, an open-source AI model designed to decode dolphin communication by analyzing their clicks, whistles, and burst pulses. The announcement coincided with National Dolphin Day.
The model, created in partnership with Georgia Tech and the Wild Dolphin Project (WDP), learns the structure of dolphins’ vocalizations and can generate dolphin-like sound sequences.
The breakthrough could help determine whether dolphin communication rises to the level of true language.
Trained on data from the world’s longest-running underwater dolphin research project, DolphinGemma leverages decades of meticulously labeled audio and video that WDP has collected since 1985.
The project has studied Atlantic spotted dolphins in the Bahamas across generations using a non-invasive approach it calls “In Their World, on Their Terms.”
“By identifying recurring sound patterns, clusters and reliable sequences, the model can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication—a task previously requiring immense human effort,” Google said in its announcement.
The AI model, which contains roughly 400 million parameters, is small enough to run on Pixel phones that researchers use in the field. It processes dolphin sounds using Google’s SoundStream tokenizer and predicts subsequent sounds in a sequence, much like how human language models predict the next word in a sentence.
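For readers curious about what “predicting the next sound” means mechanically, here’s a minimal, self-contained sketch of the idea. The token names and sequences below are invented for illustration; in the real system, SoundStream converts raw audio into discrete tokens and a 400-million-parameter model, not a simple bigram counter, does the predicting.

```python
from collections import Counter, defaultdict

# Toy corpus of discretized vocalizations. These symbols are invented;
# DolphinGemma gets its tokens from Google's SoundStream audio tokenizer.
sequences = [
    ["click", "click", "whistle_a", "buzz"],
    ["click", "whistle_a", "buzz", "buzz"],
    ["whistle_b", "click", "click", "whistle_a"],
]

# Count bigram transitions: which token tends to follow which.
transitions = defaultdict(Counter)
for seq in sequences:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def predict_next(token):
    """Return the most frequent follower of `token` in the training data."""
    followers = transitions.get(token)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("click"))      # 'whistle_a' (follows 'click' 3 times vs. 2 for 'click')
print(predict_next("whistle_a"))  # 'buzz'
```

A real language model replaces the lookup table with a neural network that scores every possible next token given the whole preceding sequence, but the prediction task itself is the same.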
DolphinGemma doesn’t operate in isolation. It works alongside the CHAT (Cetacean Hearing Augmentation Telemetry) system, which associates synthetic whistles with specific objects dolphins enjoy, such as sargassum, seagrass, or scarves, potentially establishing a shared vocabulary for interaction.
“Eventually, these patterns, augmented with synthetic sounds created by the researchers to refer to objects with which the dolphins like to play, may establish a shared vocabulary with the dolphins for interactive communication,” according to Google.
Field researchers currently use Pixel 6 phones for real-time analysis of dolphin sounds.
The team plans to upgrade to Pixel 9 devices for the summer 2025 research season; the new phones will integrate speaker and microphone functions while running deep learning models and template-matching algorithms simultaneously.
The shift to smartphone technology dramatically reduces the need for custom hardware, a crucial advantage for marine fieldwork. DolphinGemma’s predictive capabilities can help researchers anticipate and identify potential mimics earlier in vocalization sequences, making interactions more fluid.
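The template-matching side of that pipeline can be illustrated with normalized cross-correlation, a standard signal-processing technique. Everything below is a hypothetical sketch: the templates, the whistle-to-object mapping echoing CHAT’s vocabulary, and the threshold are all invented, since the project’s actual matching code isn’t public.

```python
import numpy as np

# Hypothetical synthetic-whistle templates, each tied to a play object,
# mirroring CHAT's whistle-to-object vocabulary. Real templates would be
# recorded waveforms or spectrogram snippets, not random noise.
rng = np.random.default_rng(0)
templates = {
    "sargassum": rng.standard_normal(256),
    "seagrass": rng.standard_normal(256),
    "scarf": rng.standard_normal(256),
}

def normalized_xcorr(a, b):
    """Peak normalized cross-correlation between two 1-D signals (max ~1.0)."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.max(np.correlate(a, b, mode="full")))

def identify_mimic(signal, threshold=0.5):
    """Return the object whose whistle template best matches the signal."""
    scores = {obj: normalized_xcorr(signal, tpl) for obj, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# A dolphin mimic would arrive as a noisy version of one template.
mimic = templates["scarf"] + 0.3 * rng.standard_normal(256)
print(identify_mimic(mimic))  # 'scarf'
```

Where template matching needs to hear most of a whistle before it can score it, a predictive model like DolphinGemma can rank likely continuations from just the opening of a sequence, which is what makes earlier mimic identification plausible.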
Understanding what cannot be understood
DolphinGemma joins several other AI initiatives aimed at cracking the code of animal communication.
The Earth Species Project (ESP), a nonprofit organization, recently developed NatureLM, an audio language model capable of identifying animal species, approximate age, and whether sounds indicate distress or play. That’s not language decoding as such, but it’s still a way of establishing some primitive communication.
The model, trained on a mix of human language, environmental sounds, and animal vocalizations, has shown promising results even with species it hasn’t encountered before.
Project CETI represents another significant effort in this space.
Led by researchers including Michael Bronstein from Imperial College London, it focuses specifically on sperm whale communication, analyzing their complex patterns of clicks used over long distances.
The team has identified 143 click combinations that might form a kind of phonetic alphabet, which they’re now studying using deep neural networks and natural language processing techniques.
While these projects focus on decoding animal sounds, researchers at New York University have taken inspiration from baby development for AI learning.
Their Child’s View for Contrastive Learning model (CVCL) learned language by viewing the world through a baby’s perspective, using footage from a head-mounted camera worn by an infant from 6 months to 2 years old.
The NYU team found that their AI could learn efficiently from naturalistic data similar to how human infants do, contrasting sharply with traditional AI models that require trillions of words for training.
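The “contrastive” in CVCL’s name refers to a now-common training recipe: pull the embeddings of matching image-word pairs together and push mismatched pairs apart. Here’s a toy InfoNCE-style version of that loss; the random vectors stand in for real image and word embeddings, and the details below are our illustration, not NYU’s code.

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.1):
    """Contrastive loss: matched image/text pairs (same row index) should
    score higher than every mismatched pair in the batch."""
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # diagonal = correct pairings
    # Cross-entropy over rows: each image should "pick" its own caption.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[labels, labels].mean())

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 32))                        # toy embeddings
print(info_nce_loss(batch, batch + 0.01))                   # aligned pairs: loss near 0
print(info_nce_loss(batch, rng.standard_normal((4, 32))))   # random pairs: higher loss
```

Trained this way on a baby’s-eye view of the world, the model learns which words go with which sights from remarkably little data.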
Google plans to share an updated version of DolphinGemma this summer, potentially extending its utility beyond Atlantic spotted dolphins. Still, the model may require fine-tuning for different species’ vocalizations.
WDP has focused extensively on correlating dolphin sounds with specific behaviors, including signature whistles used by mothers and calves to reunite, burst-pulse “squawks” during conflicts, and click “buzzes” used during courtship or when chasing sharks.
“We’re not just listening anymore,” Google noted. “We’re beginning to understand the patterns within the sounds, paving the way for a future where the gap between human and dolphin communication might just get a little smaller.”
Edited by Sebastian Sinclair and Josh Quittner