Sound Extraction

This project is part of my work at the School of Computer Science, University of Birmingham, showcasing my expertise in Time Series Analysis, Data Visualisation and Signal Processing to turn concepts into practical solutions. Let’s explore how I’ve applied my skills to tackle challenges and create value in sound extraction.

Project Demonstration

Sound plays a vital role in our daily lives, serving as a rich source of information across various domains. From identifying melodies in music recognition systems to monitoring environmental sounds like animal calls or machinery noise, the ability to analyze audio signals opens up numerous possibilities. Sound extraction is the process of isolating meaningful signals from complex, mixed, or noisy environments, making it a cornerstone of modern audio analysis and processing.

1. Project Overview

This project focuses on extracting and analyzing animal sounds from noisy or mixed audio environments, demonstrating how data science techniques can uncover meaningful patterns in audio signals. By combining signal processing, feature extraction, and visualization, it presents a robust approach to sound analysis with applications in speech recognition, wildlife monitoring, and bioacoustic research.

  1. Dataset: a natural recording of mixed, noisy wildlife sounds.
  2. Aim: to isolate individual animal sounds from the noisy mixture.
  3. Steps: analyse and visualise the time series, spectrum, and spectrogram.
  4. Applications: speech recognition, wildlife monitoring, and bioacoustic research.

2. Input Sound

Analyzing the input sound is a crucial first step in sound extraction and analysis. This phase involves working with a complex audio mixture containing various animal sounds. Advanced signal processing techniques are employed to decompose the audio into its components, isolating meaningful sounds from the noise. These foundational steps are critical for ensuring the clarity and accuracy of the subsequent analyses, enabling the identification of patterns and features within the sound data.

2.1. Input Audio Player

To understand the complexity of the mixed audio, it is essential to listen to the original recording. The mixed audio contains overlapping animal sounds, which present a challenging yet realistic simulation of real-world audio environments. Listening to the input provides context and helps assess the effectiveness of the extraction techniques used later in the process.
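
A minimal sketch of this step, assuming the mixture is stored in a WAV file named `mixed_wildlife.wav` (a hypothetical filename), loads the recording and embeds a playable widget in a Jupyter notebook:

```python
import numpy as np
from scipy.io import wavfile
from IPython.display import Audio

# Load the mixed recording (hypothetical filename; replace with the actual file).
sample_rate, samples = wavfile.read("mixed_wildlife.wav")

# If the file is stereo, average the channels to obtain a single mono signal.
if samples.ndim > 1:
    samples = samples.mean(axis=1)

# Normalise to floats in [-1, 1] so later processing is independent of bit depth.
samples = samples.astype(np.float64) / np.max(np.abs(samples))

# Embed an interactive audio player in the notebook.
Audio(samples, rate=sample_rate)
```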

2.2. Time Series

The time series analysis reveals how the signal’s amplitude varies over time, providing a temporal representation of the sound. This visualization highlights patterns such as recurring peaks or silences, which may correspond to specific sounds or events in the audio. By examining the time series, it becomes possible to segment the audio into meaningful portions, laying the groundwork for feature extraction and classification.

Time Series Analysis

A Python-generated time series chart shows the amplitude changes across the duration of the audio. Peaks and valleys in the graph represent the intensity of different sound components over time.
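
One way to produce such a chart, sketched here with matplotlib and reusing the `samples` and `sample_rate` variables from the loading sketch above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Time axis in seconds, one value per sample.
time = np.arange(len(samples)) / sample_rate

plt.figure(figsize=(12, 4))
plt.plot(time, samples, linewidth=0.5)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Time series of the mixed recording")
plt.tight_layout()
plt.show()
```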

2.3. Spectral Analysis

Spectral analysis offers a detailed view of the frequency distribution within the sound at a given moment. By breaking the signal into its constituent frequencies, it becomes easier to identify specific ranges of interest, such as low-frequency hums or high-pitched chirps. This step is particularly useful for isolating the dominant frequencies associated with individual animal sounds.

Spectral Analysis

A Python-generated spectrum chart illustrates the energy distribution across frequencies, highlighting prominent components that play a significant role in the audio.
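
A sketch of how such a spectrum can be computed with a real FFT, again reusing `samples` and `sample_rate`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Real FFT of the whole signal; rfftfreq gives the matching frequency axis in Hz.
spectrum = np.fft.rfft(samples)
freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

plt.figure(figsize=(12, 4))
plt.plot(freqs, np.abs(spectrum), linewidth=0.5)
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.title("Magnitude spectrum of the mixed recording")
plt.tight_layout()
plt.show()
```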

2.4. Spectrogram

The spectrogram provides a dynamic visualization of how the frequency content of the audio evolves over time. This representation combines temporal and spectral information, creating a powerful tool for identifying patterns such as recurring sounds or transient events. For example, the spectrogram can reveal periodic calls of a bird or the intermittent croaks of a frog.

Spectrogram Generation

The spectrogram is depicted as a heatmap, where brighter colors represent higher energy levels at specific frequencies and times. This detailed view aids in distinguishing overlapping sounds and understanding their temporal characteristics.
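
A sketch of one way to generate such a heatmap with scipy; the window length and overlap below are illustrative choices, not the project's actual settings:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

# Short-time Fourier analysis of the mixed signal.
f, t, Sxx = spectrogram(samples, fs=sample_rate, nperseg=1024, noverlap=512)

plt.figure(figsize=(12, 4))
# Plot power in decibels; the small offset avoids log(0) in silent frames.
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of the mixed recording")
plt.colorbar(label="Power (dB)")
plt.tight_layout()
plt.show()
```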

3. Output Sound

The output sound represents the culmination of a series of processes designed to isolate and refine meaningful audio signals from the mixed input. This stage transforms raw, complex audio data into clearly identifiable components, such as individual animal sounds. The steps involved are filter design, sound isolation through spectral analysis, time series reconstruction, and spectrogram generation. Each step plays a critical role in ensuring the quality and accuracy of the extracted sounds, which are then evaluated by listening and comparing them with the original mixture.

3.1. Filter Design

Filter design is the first critical step in isolating specific sound components. Filters, both with and without smoothing, are manually crafted using Python to target precise frequency ranges. These filters work by suppressing unwanted frequencies while amplifying those of interest, ensuring that the extracted sounds are as clear and noise-free as possible. Gaussian smoothing, in particular, is applied to improve the sound quality by minimizing abrupt transitions in the frequency domain.

Filter Design

The visualization of the filters highlights their effectiveness in isolating distinct ranges of interest. This step serves as the foundation for subsequent analyses, ensuring that only the relevant portions of the audio are processed further.
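
A minimal sketch of one way to build such filters as frequency-domain band-pass masks with optional Gaussian smoothing; the 200–900 Hz band edges and the smoothing width are illustrative values chosen for this example, not the project's actual parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def band_mask(freqs, low_hz, high_hz, smooth_sigma=None):
    """Build a band-pass mask over the rfft frequency bins.

    The hard mask is 1 inside [low_hz, high_hz] and 0 elsewhere; optional
    Gaussian smoothing softens the edges to reduce ringing artefacts.
    """
    mask = ((freqs >= low_hz) & (freqs <= high_hz)).astype(np.float64)
    if smooth_sigma is not None:
        mask = gaussian_filter1d(mask, sigma=smooth_sigma)
    return mask

# Frequency axis matching the rfft of the mixed signal.
freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

# Hypothetical band for the frog calls; the real cut-offs come from
# inspecting the spectrum and spectrogram above.
frog_mask_hard = band_mask(freqs, 200, 900)
frog_mask_smooth = band_mask(freqs, 200, 900, smooth_sigma=50)
```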

3.2. Sound Isolation

Sound isolation applies the designed filters in the frequency domain, offering a focused examination of the components of interest. Using Python-generated spectrum charts, the energy distribution of the filtered sounds across various frequencies is visualized. Frequencies highlighted in orange represent the targeted components used for sound isolation.

Sound Isolation, Spectral Analysis, Spectrum

This analysis ensures that the extracted signals retain the key characteristics of their original counterparts while eliminating extraneous noise. It also provides valuable insights into the frequency behavior of the isolated sounds, aiding in classification and further refinement.
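
A sketch of this step, reusing `samples`, `freqs`, and the smoothed mask from the filter-design sketch, with the hypothetical frog band as the running example:

```python
import numpy as np
import matplotlib.pyplot as plt

# Apply the smoothed band-pass mask in the frequency domain.
spectrum = np.fft.rfft(samples)
frog_spectrum = spectrum * frog_mask_smooth

# Plot the mixed spectrum with the isolated band highlighted in orange.
plt.figure(figsize=(12, 4))
plt.plot(freqs, np.abs(spectrum), linewidth=0.5, label="Mixed signal")
plt.plot(freqs, np.abs(frog_spectrum), color="orange", linewidth=0.8,
         label="Isolated band")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.legend()
plt.tight_layout()
plt.show()
```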

3.3. Time Series Reconstruction

Time series reconstruction reverses the filtering process, converting the frequency-domain data back into the time domain. This step is crucial for generating audio that is both intelligible and true to its original characteristics. Using Python, both smoothed and unsmoothed time series are visualized to compare the effectiveness of the applied techniques.

Time Series Reconstruction, Sound Reconstruction

The reconstructed time series ensures that the extracted audio can be seamlessly listened to and analyzed, making it a key milestone in the overall process. This step bridges the gap between raw data processing and human interpretation.
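
A minimal sketch of the reconstruction, assuming the filtered spectrum `frog_spectrum` from the previous sketch, uses the inverse real FFT:

```python
import numpy as np

# Invert the filtered spectrum back to the time domain. Passing n=len(samples)
# keeps the reconstructed signal the same length as the original recording.
frog_signal = np.fft.irfft(frog_spectrum, n=len(samples))

# Rescale to [-1, 1] so the reconstruction can be plotted, played, or saved.
frog_signal = frog_signal / np.max(np.abs(frog_signal))
```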

3.4. Spectrogram

The spectrogram generation step creates a dynamic, time-frequency representation of the isolated sounds. The spectrograms are visualized as heatmaps, where brighter colors correspond to higher energy levels in specific frequency ranges.

Filtered Spectrogram

Spectrograms are analyzed to assess the clarity and precision of the isolation process. The frequencies are categorized into low, medium, and high ranges, representing different types of animal sounds. This categorization simplifies the task of distinguishing between species and identifying patterns within the audio data.

3.5. Listen to Output Sounds

The ultimate test of sound extraction is the ability to listen to and evaluate the isolated outputs. The final audio files represent distinct animal sounds extracted from the original mixture.

  • Output 1: Frogs – Low-frequency croaks and rhythmic sounds typical of amphibians.
  • Output 2: Birds – A clear representation of bird calls, showcasing the high-frequency chirps and tweets characteristic of avian sounds.
  • Output 3: Insects – High-pitched, repetitive sounds such as chirps or buzzing.
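
A minimal sketch of how the isolated signals might be saved and played back; the output filenames and the `frog_signal` variable follow the hypothetical running example from the earlier sketches:

```python
import numpy as np
from scipy.io import wavfile
from IPython.display import Audio

# Hypothetical mapping of output files to reconstructed signals; the bird and
# insect signals would be produced with their own band-pass masks.
outputs = {
    "output_frogs.wav": frog_signal,
    # "output_birds.wav": bird_signal,
    # "output_insects.wav": insect_signal,
}

for filename, signal in outputs.items():
    # Convert the normalised float signal to 16-bit PCM before writing.
    wavfile.write(filename, sample_rate, (signal * 32767).astype(np.int16))

# Embed a player for one of the isolated sounds in the notebook.
Audio(frog_signal, rate=sample_rate)
```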

4. Conclusion

This project exemplifies the transformative power of sound extraction techniques in isolating meaningful information from complex audio environments. By leveraging signal processing, visualization, and advanced data analysis, it demonstrates an innovative end-to-end approach to audio data exploration. The extracted sounds not only showcase the clarity and precision achievable through these methods but also highlight their potential impact in various fields.

From speech recognition to wildlife monitoring and bioacoustic research, the applications of these techniques are vast and impactful. This journey into sound extraction underscores the value of combining data science with signal processing to uncover hidden insights within audio data. Thank you for exploring this fascinating realm of sound analysis! Explore similar projects in my portfolio and feel free to contact me for collaboration opportunities in data science and signal processing.