Friday, June 25, 2010

Audio: Streams and Triggers

Same project, same problems, different sense. On the surface, audio distribution seems like a simple issue, or at least simpler than the lights. The technology for distributing light cues over ethernet is clearly a bit arcane and, sadly, not terribly well documented. Sound data, however, is thrown across ethernet every day. Computers have access to not one but many well-maintained, well-documented, high-quality audio streaming protocols. In fact, with enough work, many pre-existing data-transfer libraries can even be shoehorned into doing duty as an audio stream. With a central server and a (cheap, outdated, obsolete) computer in every room that needs sound cues, a streaming network could provide all the audio needed, directly over ethernet in tidy TCP packets. So far, everything looks pretty good.

Until we look into how those protocols work. Try opening an internet radio station, or sharing music with an Apple TV or Airport Express. In fact, try viewing a video on YouTube. There are palpable seconds between the moment the data is requested and the moment it reaches your ears. Of course, this is perfectly fine when your only goal is to hear that music or watch that video. It will get to you, no problem; that's what those systems are meant to do: get the data from a central location out to speakers. These protocols all buffer, and therefore they all introduce significant latency. Getting your YouTube video three seconds after you click is pretty good. Playing a sound cue three seconds late isn't.
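To put a rough number on that delay: for uncompressed CD-quality PCM, the pre-fill buffer size alone fixes a minimum latency, no matter how fast the network is. A back-of-the-envelope sketch (the 512 KiB buffer size is purely illustrative, not taken from any particular player):

```python
def buffer_latency_seconds(buffer_bytes, sample_rate=44100, bits=16, channels=2):
    """Seconds of audio a pre-fill buffer holds (uncompressed PCM)."""
    # CD-quality stereo: 44100 * 2 bytes * 2 channels = 176,400 bytes/sec
    bytes_per_second = sample_rate * (bits // 8) * channels
    return buffer_bytes / bytes_per_second

# A modest 512 KiB pre-roll buffer already costs about three seconds,
# because the player waits for it to fill before making a sound.
print(round(buffer_latency_seconds(512 * 1024), 2))  # ~2.97
```

Compressed streams hold more audio per byte, but the principle is the same: whatever the player insists on buffering before it starts, you wait for.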

So where to go from here? Well, it's pretty clear that those streaming protocols aren't very helpful in this situation, but only because they were designed to solve a different problem. Those protocols solve the issue of how to get the data to the user; they have nothing to do with time and triggers and everything to do with pushing data around. Most of the time they are used when the device playing back the media either cannot or should not have its own copy of it, e.g. an Airport Express, which has no memory of its own to store data on, or YouTube, where copyrighted content can stay safely on the server. The issue here is getting the data to play on time, not getting the data to the location.

So what if the client machines, the low-end boxes distributed throughout the building, already had the audio files loaded onto them, waiting to be triggered at any moment? No waiting for data to buffer, no delay, near-zero latency. And not just triggers to play and stop, but for gain, speed, and duration modification too. This way the bandwidth isn't being taken up by audio pouring through it, choking off the light cues and increasing delay. So far as I can tell, this doesn't exist yet. Oh well, time to make it.
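A minimal sketch of what such a trigger scheme might look like. The command names, message fields, and JSON-over-UDP framing here are all hypothetical choices for illustration, not an existing protocol: the point is only that when the audio already lives on the client, each cue is a few dozen bytes instead of a stream.

```python
import json
import socket

def encode_trigger(cmd, cue, **params):
    """Pack a cue command ('play', 'stop', 'gain', ...) into a tiny datagram."""
    return json.dumps({"cmd": cmd, "cue": cue, **params}).encode("utf-8")

def decode_trigger(datagram):
    """Unpack a datagram back into a command dict on the client side."""
    return json.loads(datagram.decode("utf-8"))

def send_trigger(sock, addr, cmd, cue, **params):
    """Fire a trigger at one client box; the audio file is already there."""
    sock.sendto(encode_trigger(cmd, cue, **params), addr)

# A 'play' trigger with a gain tweak fits in well under 100 bytes:
msg = encode_trigger("play", "thunder", gain=0.8)
print(len(msg), decode_trigger(msg)["cmd"])
```

A client would sit in a `recvfrom` loop, look up the named cue in its local files, and act on the command immediately; the only latency left is one network round trip and the local playback start.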

Edit: Austin just made an interesting point: this is pretty much what Apple did with their iTunes remote protocol. I have to say I'm a little worried by how damn slow that is, and I'm hoping this will be better. Right now I'm still laying down some groundwork code/systems.
