Directed Imitation During Vocal Learning

August 1, 2023
In Progress


Vocal learning is one of the great achievements of evolution, underpinning the human capacity for cumulative culture. How did vocal learning evolve, and what are its underlying biological mechanisms? These are among the pressing questions in the language sciences and ethology. The developmental learning strategies of humans and songbirds share many similarities: in both, vocal learning depends on interaction with experienced vocalizers. During the first year of life, interactions with adults enable infants to acquire vocal units such as words; similarly, in juvenile songbirds, interactions with adult singers shape song acquisition. However, important knowledge gaps remain in both species about these interactions and about the extent to which they constitute attempts at immediate vocal imitation, in which either the tutor or the tutee tries to imitate the other. We seek to understand whether imitation attempts during tutor–tutee interactions shape the developmental learning of vocal units. Our interaction-imitation-learning (IIL) hypothesis states that the learning of a vocal unit is driven by directed speech/song and by tutee imitations, through a mechanism shared between humans and songbirds. Longitudinal data have been collected during developmental song learning in zebra finches and in humans. For the songbirds, we first use the video data to determine whether two birds are interacting with each other and what behavior they are exhibiting; with the audio data, we then learn embedded features to identify imitation events and to understand how the songs develop over time by analyzing the learned embeddings. The approach can later be adapted and extended to the human data.



SDSC Team:
Xiaoran Chen
Guillaume Obozinski

PI | Partners:

ETH Zurich, Birdsong and Natural Language Group:

  • Prof. Richard Hahnloser
  • Dr. Sabine Stoll




The project aims to understand how young individuals learn vocal communication by imitating the sounds made by adults. The interaction-imitation-learning (IIL) hypothesis posits that young individuals learn to compose sentences or songs by acquiring vocal units through imitation. Using a multi-modal dataset of sounds, videos, and transcripts, the research first analyzes bird behavior and birdsong, then moves toward understanding the more complex learning processes of young children.

Figure 1: Illustration of the interaction-imitation-learning hypothesis, which states that a vocal unit is learned through imitations by tutees in both humans and songbirds.

Proposed Approach / Solution

SDSC leads the work packages on bird directedness estimation and feature-based imitation prediction. By determining the direction in which a bird is facing, one can estimate whether two birds are facing each other to interact. This is critical for identifying whether a juvenile is interacting with an adult bird, and with which one. We trained a keypoint prediction model on videos captured from three different viewpoints to obtain a set of 2D coordinates for each individual. Using the geometric relations between views, measured reference points, and camera calibration videos, we estimated 3D coordinates from the 2D coordinates of the multiple views. Directedness can then be computed from the distance and angles between the head/beak positions of a pair of individuals.
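The two geometric steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the project's implementation: it assumes calibrated projection matrices are available for two of the views, uses standard DLT triangulation, and defines directedness via the angle between a bird's head-to-beak vector and the line to the other bird's head. All function names and the keypoint layout are hypothetical.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its 2D projections in two calibrated views.

    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image coordinates.
    Standard linear (DLT) triangulation via the SVD null space.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                     # homogeneous 3D point
    return X[:3] / X[3]

def directedness(head_a, beak_a, head_b, beak_b):
    """Distance and facing angle of bird A relative to bird B.

    Inputs are 3D keypoints. Returns the head-to-head distance and the
    angle (radians) between A's head->beak "gaze" vector and the line
    from A's head to B's head; a small angle means A faces B.
    """
    gaze = beak_a - head_a         # A's facing direction
    to_b = head_b - head_a         # line of sight toward B
    dist = np.linalg.norm(to_b)
    cos_angle = np.dot(gaze, to_b) / (np.linalg.norm(gaze) * dist)
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return dist, angle

# Two birds on the x-axis, beaks pointing at each other:
d, a = directedness(np.array([0., 0., 0.]), np.array([1., 0., 0.]),
                    np.array([5., 0., 0.]), np.array([4., 0., 0.]))
# d = 5.0, a = 0.0 (A faces B directly)
```

In practice one would threshold both the distance and the angle (for each bird of the pair) to decide whether an interaction is taking place, and aggregate over the three available viewpoints.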

To predict imitation with a feature-based approach, we will learn embeddings of the sounds produced by both juvenile and adult individuals, incorporating prior knowledge about which sounds are generated during interaction. These embeddings will then be used to detect imitation events by computing their similarity under a metric also developed in this project.
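As a simple illustration of this step, the sketch below flags candidate imitation events by comparing tutor and tutee vocalization embeddings with cosine similarity. Cosine similarity here is only a stand-in for the learned metric described above, and the function name, array layout, and threshold are all illustrative assumptions.

```python
import numpy as np

def imitation_candidates(tutor_emb, tutee_emb, threshold=0.9):
    """Flag tutee vocalizations that resemble a tutor vocalization.

    tutor_emb: (n, d) array of tutor vocalization embeddings.
    tutee_emb: (m, d) array of tutee vocalization embeddings.
    Returns index pairs (i, j) whose cosine similarity exceeds the
    threshold; these are candidate imitation events.
    """
    # Normalize rows so the dot product equals cosine similarity.
    t = tutor_emb / np.linalg.norm(tutor_emb, axis=1, keepdims=True)
    u = tutee_emb / np.linalg.norm(tutee_emb, axis=1, keepdims=True)
    sim = t @ u.T                  # (n, m) similarity matrix
    return [(int(i), int(j)) for i, j in zip(*np.where(sim > threshold))]
```

A learned metric would replace the cosine step, and in the actual pipeline the candidate pairs would additionally be constrained by timing, i.e. the tutee sound following the tutor sound within an interaction window.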

Figure 2: Illustration of directedness estimation (the gender symbols serve only to distinguish the individuals).


The project will provide scientific insights into how young individuals learn language through imitation, as well as empirical evidence for the role imitation plays in this process.



Additional resources


  1. Nath T, Mathis A, Chen AC, Patel A, Bethge M, Mathis MW. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. bioRxiv. 2018 Nov 24.
  2. Rychen J, Rodrigues DI, Tomka T, Rüttimann L, Yamahachi H, Hahnloser RHR. A system for controlling vocal communication networks. Sci Rep. 2021 May 27;11(1):11099.
  3. Oord A van den, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, et al. WaveNet: a generative model for raw audio. arXiv. 2016.
  4. Lipkind D, Zai AT, Hanuschkin A, Marcus GF, Tchernichovski O, Hahnloser RHR. Songbirds work around computational complexity by learning song vocabulary independently of sequence. Nat Commun. 2017 Nov 1;8(1):1247.

