Podcast 58: Jerry Mahabub

Jerry Mahabub, founder of GenAudio and inventor of AstoundSound 3D-audio technology, talks about his research in brain imaging and perception as a teenager and how that research led to AstoundSound; how the algorithm works with 2-channel and surround systems; his new recording studio, where the soundtrack for the benefit short film We Are the World 25 for Haiti was mixed and encoded with AstoundSound; answers to chat-room questions; and more.

Run Time: 1:02:46


Jerry Mahabub is the Chairman, President, and CEO of GenAudio, which he founded in 2003. GenAudio has developed AstoundSound, a 4D sound-localization technology for integration within professional, prosumer, and consumer software and hardware applications.

Mahabub has over 22 years' experience in research and development and 15 years in business development and technology-license negotiations. He has developed and commercialized some of the world's most sophisticated technologies for over 270 companies worldwide, including Boeing, the Jet Propulsion Laboratory, the US Naval Warfare Center, and the US Army.

Here's the YouTube video of the podcast:

Scott Wilkinson's picture
I just heard of this research a couple of weeks ago, and I'm certainly going to investigate it and write about what I learn, so stay tuned! Maybe I'll get Professor Choueiri as a guest on the podcast.
jerryphysicist's picture

Many years of extensive R&D using binaural dummy heads have been conducted by many people: Klaus Genuit at Head Acoustics, Bill Gardner at the MIT Media Lab, and others. They all produce a 3D spatial-audio experience. However, to truly understand how we as humans localize a sound source, we must go to the root of our processing, the brain, and that is the one thing a dummy head does not have (hence the name "dummy" head). The unique R&D I did at a superconducting magnetic resonance imaging (MRI) lab is a very different approach from dummy-head measurements, recordings, and analysis.

For academic work, dummy-head measurements and recordings are fantastic: they enlighten students about psychoacoustics in general and enable them to do research projects. For real-world applications such as professional audio mixing and embedded solutions for consumer electronics, they do not work very well, because the filters derived from dummy heads create unwanted phase issues, noise-floor issues, and other problems. Dummy heads have their place for certain types of spatial recordings. For real-time processing of any input sound source recorded with any microphone, however, the filters measured from dummy heads are very problematic; intensive compensating processes such as crosstalk cancellation become necessary, and even that additional processing does not resolve many of the other issues dummy heads create.

In a nutshell, all audio technologies have their place. I am a big fan of dummy heads from way back in the day and have built over 100 of them throughout my life to compare their results against the filters derived from MRI/EEG/MEG brain-scan measurements, and the dummy-head filters are not even remotely close to the level of accuracy (with no phase issues) of filters determined from actual human brain responses.
Hope this helps to answer your question from above.

Scott Wilkinson's picture
Thanks, Jerry! Just to be perfectly clear, from what I've been able to learn so far, the Princeton research is based on dummy-head measurements and crosstalk cancellation, so it has no commonality with AstoundSound.
bearcatsandor's picture

I'm considering building a 360-degree Ambisonics system with at least 12 speakers for recording and playback. It can also be carried on a 2-channel medium (see B-format to UHJ encoding).
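The B-format-to-UHJ fold-down mentioned above can be sketched with the published UHJ encoding equations. This is a minimal illustration only: the function name is mine, and the 90-degree phase shift (the "j" term in the UHJ equations) is approximated with a Hilbert transform.

```python
import numpy as np
from scipy.signal import hilbert


def uhj_encode(W, X, Y):
    """Fold first-order horizontal B-format (W, X, Y) down to 2-channel UHJ.

    Coefficients are the published UHJ encoding values; j() denotes a
    +90-degree phase shift, approximated here via the Hilbert transform.
    """
    def shift90(s):
        # +90-degree phase shift = negated Hilbert transform of s
        return -np.imag(hilbert(s))

    S = 0.9397 * W + 0.1856 * X                          # sum signal
    D = shift90(-0.3420 * W + 0.5099 * X) + 0.6555 * Y   # difference signal
    left = 0.5 * (S + D)
    right = 0.5 * (S - D)
    return left, right


# Example: a 440 Hz source panned 45 degrees to the left of front
t = np.linspace(0, 1, 48000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
az = np.deg2rad(45)                                      # positive Y = left
W, X, Y = sig / np.sqrt(2), sig * np.cos(az), sig * np.sin(az)
L, R = uhj_encode(W, X, Y)
```

By construction the sum L + R recovers the S signal, which is what makes UHJ mono- and stereo-compatible while still carrying a decodable surround component.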

Are there any comments you can make to compare/contrast Ambisonics and AstoundSound?

Regarding AstoundSound

How many speakers are required for a 360-degree field, and what is the preferred setup shape?

I assume that all speakers should be identical (i.e., this is not a matrix-based process).

How should room treatments be applied in a listening environment? How does room interaction affect it?

How wide or narrow is the sweet spot? Can a listener sit off-axis and still get the effect?

As a Linux user, I'm wondering when you'll release your products for the Linux market. It's commercial software, so I'm not asking for the source code to be opened (though that would be awesome, of course), but could we at least get binary LADSPA plugins for encoding/decoding?


jerryphysicist's picture

Wow! 12-speaker discrete. I am interested to see how that compares to the 10.2 system Tomlinson Holman worked on (not sure if he is still doing this or not).

1) Not much to say comparing Ambisonics with our software-based technology. Their approach is very different from ours, and as I stated above, every audio technology has its place; perhaps you have found a good niche for Ambisonics. I'm curious to learn more about what you are trying to accomplish.

2) Speaker placement for surround mixing should be in accordance with the ITU spec. For stereo listening, AstoundSound will work just fine as long as the speakers are not so close to each other that, from your listening distance, they sound mono. As far as optimal speaker placement for 2-channel playback in AstoundSound goes, the two speakers and the listener should form an equilateral triangle.
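The ITU layout referred to here (commonly ITU-R BS.775 for 5-channel surround) places all speakers on a circle around the listener at fixed azimuths. A quick sketch of those positions follows; the azimuths are the nominal values from the recommendation (surrounds are specified as a 100-120 degree range, with 110 the usual choice), while the radius and coordinate convention are my own assumptions.

```python
import math

# Nominal ITU-R BS.775 azimuths for a 5-channel layout, in degrees
# (0 = straight ahead; negative = listener's left; surrounds use the
# commonly chosen 110-degree value from the 100-120 degree range).
ITU_AZIMUTHS = {"C": 0, "L": -30, "R": 30, "LS": -110, "RS": 110}


def speaker_positions(radius_m=2.0):
    """Return (x, y) floor coordinates per speaker, listener at origin.

    x points to the listener's right, y straight ahead; every speaker
    sits on a circle of the given radius (equidistant, per the spec).
    """
    pos = {}
    for name, az in ITU_AZIMUTHS.items():
        a = math.radians(az)
        pos[name] = (radius_m * math.sin(a), radius_m * math.cos(a))
    return pos


for name, (x, y) in speaker_positions().items():
    print(f"{name:>2}: x={x:+.2f} m, y={y:+.2f} m")
```

For the 2-channel equilateral-triangle case Jerry describes, the same geometry applies with just the L and R entries at plus/minus 30 degrees: the two speakers and the listener are then all separated by the same distance.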

3) Room treatment depends on how much money you want to spend. For a critical listening environment, a semi-anechoic room would be great; however, building something of that caliber is unrealistic for most people. Realistically, you need not worry so much about the room itself (though I would not recommend listening in an echo chamber made of tile) as about tuning the speakers/monitors in the room with some form of room EQ. This should be done in any home theater or large auditorium environment. I use the Meyer Galileo in my studio and have multiple settings that can be used depending on how the room is being used (e.g., screen up or down, X-curve, etc.).

4) Room interaction affects the way we hear and listen to anything and everything, so this question is a bit ambiguous. Could you be more specific? In particular, what kind of room have you already built, or are you looking to build, in which you want to listen to a sound-localization-cue technology?

5) Very wide sweet spot (bigger than a conventional surround sound field). Of course, for optimal listening, sitting equidistant between the two speakers is best; that goes for any audio presentation or delivery format, including standard stereo. You can be at the edge of a theater and still experience Astound very accurately, because the horns (such as the JBLs or the Meyers) are very directional and are aimed straight ahead. If you are outside of the speaker field, the audio is most likely not going to sound the way the mix engineer intended in the first place, so expect significantly reduced results when listening in Astound.

6) Linux: not on our radar screen, sorry. The world of professional audio mixing and engineering is most definitely not Linux-based.

Hope this sufficiently answered your questions! :)

bearcatsandor's picture

Thanks for the reply. That does answer my questions nicely.

What I'm looking for is to be able to record something and put the listener back at the event. I'm more interested in a realistic re-creation of the original sound field, not a guitar suddenly flying at me. If the recording was made in a church, I want the listener's walls to disappear and to have them hear as though they were in that space. This is what Ambisonics was designed for.

I'm assuming that with 3D sound, things will start out the way they did for stereo: "Hey! Look what we can do," with hard panning and 'fantastic' effects flying around. Can the 3D sound be used to recreate the natural soundstage (read: ambience and acoustics) of an event, or is it more suited to special-effects-type surround?

Mayasound's picture

I am also involved with a 3D audio technology, developed in collaboration with Weiss Electronics.

I can play any CD (including mix CDs made on your PC) and present it in a 3D format on relatively inexpensive equipment (though it does sound better on better speakers). Our objective is to minimize the need for new source material and make existing content sound as good as possible.

Actually, it can also make an iPod/Zune sound great, but the source needs to be upsampled to at least 44.1kHz to be processed by Maya. We can show you demos in Kuala Lumpur, Malaysia; Melbourne, Australia; or Santa Monica, CA.


We are in the early stages of commercialization, and it compares favorably to the samples I have heard here (http://www.studio360.org/2011/apr/29/adventures-3d-sound/) and in AstoundSound demos.

Several mastering engineers have also taken a good look at the technology and approved it, as it creates the 3D quality without any kind of spectral manipulation and does not require binaural-head recordings or any upfront manipulation.

Of course, higher-quality recordings do sound better, but one of the qualities of this technology is that it can make all the down-and-dirty MP3 files everybody has lying around sound like audiophile recordings.

I would be happy to demo this in Santa Monica, CA for "serious parties".
