Analyzing Liquid Pouring Sequences via Audio-Visual Neural Networks

Justin Wilson1, Auston Sterling1, and Ming C. Lin1,2

1University of North Carolina at Chapel Hill, 2University of Maryland, College Park




A microphone records the audio of the target object filling up with liquid while a camera captures video images. Our multimodal CNN, the Pouring Sequence Neural Network (PSNN), is composed of 2D convolutional, max pooling, fully connected, and softmax layers, similar to the Impact Sound Neural Network (ISNN). Multi-class classification is used for discrete weight estimation (classes of 0.2 oz increments) and for liquid and container prediction, while binary classification is used for overflow detection.
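The discrete weight estimation described above can be illustrated with a minimal Python sketch. It assumes that class 0 corresponds to 0 oz and that a measured weight is assigned to the nearest 0.2 oz class; the helper names (weight_to_class, class_to_weight, predict_weight) are hypothetical and are not taken from the PSNN code.

```python
BIN_OZ = 0.2  # class width in ounces, as stated in the caption


def weight_to_class(weight_oz: float) -> int:
    """Map a weight in ounces to the nearest 0.2 oz class index.

    Assumption: class 0 is 0 oz and labels use nearest-bin rounding.
    Exact ties follow Python's round-half-to-even behaviour.
    """
    if weight_oz < 0:
        raise ValueError("weight must be non-negative")
    return int(round(weight_oz / BIN_OZ))


def class_to_weight(index: int) -> float:
    """Map a class index back to its nominal weight in ounces."""
    return round(index * BIN_OZ, 1)


def predict_weight(probabilities) -> float:
    """Pick the most likely class from softmax outputs and return its weight."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_to_weight(best)
```

For example, a softmax output that peaks at index 2 would be read as an estimated poured weight of 0.4 oz under these assumptions.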

ABSTRACT

Existing work to estimate the weight of a liquid poured into a target container often requires predefined source weights or visual data. We present novel audio-based and audio-augmented techniques, in the form of multimodal convolutional neural networks (CNNs), to estimate poured weight, perform overflow detection, and classify the liquid and target container. Our audio-based neural network uses the sound from a pouring sequence, that is, a liquid being poured into a target container. The raw audio is converted into mel-scaled spectrograms, which serve as the audio inputs. Our audio-augmented network fuses this audio with the corresponding visual data from video images. Only a microphone and a camera are required, both of which can be found in any modern smartphone or Microsoft Kinect. Our approach improves classification accuracy for different environments, containers, and contents of the robot pouring task. Our Pouring Sequence Neural Networks (PSNN) are trained and tested using the Rethink Robotics Baxter Research Robot. To the best of our knowledge, this is the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container.
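As a rough illustration of the mel scaling behind the spectrogram inputs, the Python sketch below converts between hertz and mels and places band centres evenly on the mel axis. It assumes the widely used 2595 * log10(1 + f / 700) variant of the mel formula; the band count and frequency range are placeholder values, and the function names are hypothetical. The spectrogram parameters actually used for PSNN are not given on this page.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert hertz to mels (assumed 2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> list:
    """Return centre frequencies (Hz) of n_bands bands spaced evenly in mels.

    n_bands + 2 points are placed uniformly on the mel axis between f_min
    and f_max; the interior points are the band centres.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Placeholder settings, not values from the paper.
centers = mel_band_centers(n_bands=4, f_min=0.0, f_max=8000.0)
```

Because the spacing is uniform in mels, the centres are packed more densely at low frequencies than at high ones, which is the property a mel-scaled spectrogram relies on.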


PUBLICATION


Analyzing Liquid Pouring Sequences via Audio-Visual Neural Networks

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019
Justin Wilson, Auston Sterling, and Ming C. Lin
@inproceedings{Wilson2019AnalyzingLP,
author = {Wilson, Justin and Sterling, Auston and Lin, Ming C.},
year = {2019},
month = {11},
pages = {7702-7709},
title = {Analyzing Liquid Pouring Sequences via Audio-Visual Neural Networks},
doi = {10.1109/IROS40897.2019.8968118},
booktitle = {2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}
}
     

DEMO VIDEO

Video download


DATASETS

Dataset download