SANE 2015: Tuomas Virtanen (TUT) on Sound Event Detection in Realistic Environments.
Speech and Audio in the Northeast. Oct 22, 2015.

Sound event detection in realistic environments using multilabel deep neural networks
Prof. Tuomas Virtanen, Tampere University of Technology

Auditory scenes in our everyday environments, such as the office, car, street, grocery store, and home, consist of a large variety of sound events, such as a phone ringing, a car passing by, or footsteps. Computational analysis of sound events has many applications, e.g. in context-aware devices, acoustic monitoring, analysis of audio databases, and assistive technologies. This talk describes methods for the automatic detection and classification of sound events, which means estimating the start and end times of each event and assigning it a class label. We focus on realistic everyday environments, where multiple sound sources are often present simultaneously, and therefore polyphonic detection methods are needed. We present a multilabel deep neural network based system that can directly recognize temporally overlapping sounds. Event detection results on highly realistic acoustic material will be presented, and audio and video demonstrations will be given. We will also introduce the upcoming IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
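To make the polyphonic, multilabel idea concrete, here is a minimal sketch (not the talk's actual system) of a feedforward network whose output layer uses an independent sigmoid per event class, so several sound events can be active in the same frame. The class names, layer sizes, and threshold are illustrative assumptions.

```python
import numpy as np

# Illustrative event classes from the abstract; the real class set is an assumption.
EVENT_CLASSES = ["phone ringing", "car passing by", "footsteps"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(frames, w1, b1, w2, b2):
    """Map acoustic feature frames (n_frames, n_features) to per-class
    activity probabilities (n_frames, n_classes)."""
    hidden = np.maximum(0.0, frames @ w1 + b1)  # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)            # independent sigmoid per class

def detect_events(probs, threshold=0.5):
    """Binarize probabilities; each frame may carry multiple active labels,
    which is what makes the detection polyphonic."""
    return probs >= threshold

# Random weights stand in for a trained network; this only shows the data flow.
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 40, 16, len(EVENT_CLASSES)
w1 = rng.standard_normal((n_features, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
w2 = rng.standard_normal((n_hidden, n_classes)) * 0.1
b2 = np.zeros(n_classes)

frames = rng.standard_normal((100, n_features))  # stand-in for e.g. mel-band features
probs = forward(frames, w1, b1, w2, b2)
active = detect_events(probs)
```

Event onsets and offsets would then be read off as the frames where a class's binary activity switches on and off; unlike a softmax classifier, nothing forces the per-frame labels to be mutually exclusive.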