By David Meyer, Kordz.
The headline act of HDMI 2.0 is clearly 4K and its HFR (High Frame Rate) iterations, but of the 245 pages of the official HDMI 2.0 specification, approximately one third is all about audio.
New HDMI 2.0 audio features include:
• Multi-stream audio to accompany the new 'dual-view' video capabilities.
• 10.2-, 22.2- and 30.2-channel '3D Audio' formats.
• One-bit audio versions of the above.
• Unprecedented-quality two-channel audio with 1.536MHz sampling rate.
• Dynamic Audio Lip Sync (DALS).
We will take a look at each of these, along with where and how it all fits in the HDMI signalling. It is also prudent to explore the hardware and connectivity requirements and challenges that lie ahead. 
Multi-stream Audio
Two years ago, I first wrote about theoretical alternate applications for 3D technology. A leading example was to use each 'eye' of the 3D signal to give different viewers independent full-screen images on the same display. Some CE vendors showed working concepts of this at CES 2013. The missing link, however, was how to handle the separate audio requirements, and this has now been answered by HDMI 2.0.
Figure: Two different pictures can be displayed simultaneously on the same TV, as demonstrated above in the reflected image in the polarised mirror.
While 'dual-view' permits two simultaneous video streams, 'multi-stream' audio supports up to four separate audio streams, and these can be used in conjunction with single- or dual-view video. For example, for gamers, it could mean two players each combating with full-screen 4K imagery and discrete audio delivery. Multilingual households could benefit from multi-streaming of up to four independent stereo tracks, or when combined with dual-view, could be used to allow viewing of different programmes simultaneously on the same TV - the 'marriage saver' as it is being dubbed. Other possible applications could extend to gaming music soundtrack on/off, or audio commentary in movies, etc. 
Samsung developed a full video and audio prototype of this technology, which was demonstrated at the HDMI Techzone at CES 2014 in January. The dual-view glasses contained integrated ear buds for audio, with a small over-ear switch to toggle between viewer 1 and 2. Its flawlessness in application was stunning.
3D Audio
Three new multi-channel audio formats have been defined, which will be able to deliver three-dimensional steering of audio, the likes of which we have not experienced before. No longer limiting sound to just circle the room on a single plane, 3D audio enables height steering, even right over the top of the room. The three versions are 10.2, 22.2 and 30.2 channels. 
Figure: 3D Audio 30.2 channel speaker placement (Source: HDMI 2.0 Specification (Appendix B), HDMI Forum, LLC).
You will notice that in this flagship array there are thirteen speakers on the front wall alone, in three rows for audio height control. There are then five speakers across three planes along each side wall, complemented by six speakers across mid and high rows at the rear. A top-centre speaker in the ceiling provides dome steering for what is quite literally over-the-top sound. This is all supported by two discrete LFE channels, and is all in up to 24-bit 192kHz uncompressed quality.
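To put some numbers on that array, here is a minimal sketch that tallies the speaker positions described above and estimates the raw PCM payload at 24-bit/192kHz. The dictionary keys and function names are my own for illustration, and the figure is payload only; it assumes no packet or framing overhead, which a real HDMI link would add.

```python
# Speaker counts per position, taken from the 30.2 description in the text.
FULL_RANGE_SPEAKERS = {
    "front wall": 13,
    "side walls (5 each)": 2 * 5,
    "rear wall": 6,
    "top centre": 1,
}
LFE_CHANNELS = 2

BIT_DEPTH = 24            # bits per sample
SAMPLE_RATE_HZ = 192_000  # samples per second, per channel


def total_channels() -> int:
    """Full-range speakers plus the discrete LFE channels."""
    return sum(FULL_RANGE_SPEAKERS.values()) + LFE_CHANNELS


def raw_pcm_rate_bps(channels: int,
                     bit_depth: int = BIT_DEPTH,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Uncompressed payload in bits per second (no packet overhead)."""
    return channels * bit_depth * sample_rate_hz


full_range = sum(FULL_RANGE_SPEAKERS.values())
channels = total_channels()
print(f"{full_range}.{LFE_CHANNELS} layout = {channels} channels")
print(f"Raw payload: {raw_pcm_rate_bps(channels) / 1e6:.3f} Mbps")
```

The positions add up to 30 full-range speakers plus 2 LFE, and the raw payload works out to roughly 147Mbps of audio alone.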
'What's the point of all that?' you may ask. Granted, it is not for everyone; it is aimed at the top-end, multi-million-dollar installation level. Imagine an 'IMAX Private Theater' installation (see www.imaxprivatetheater.com) with a huge perforated screen. A 30.2 speaker array could enable precise placement of audio to specific parts of the screen, width or height, partnered with unprecedented immersion around the room. 
One-bit Audio
Also known as the Direct Stream Digital (DSD) format in SACD, one-bit audio uses delta-sigma modulation rather than pulse code modulation (PCM). This reduces each sample from the usual 16 or 24 bits down to a single bit, but at a sampling rate of 2.8224MHz. The result is a large dynamic range (120dB) and a frequency response up to 100kHz. Most notably, it is claimed to alleviate the stepped curve and artefacts that may otherwise result from fewer, larger samples. I won't go into any further detail here, but suffice to say that HDMI 2.0 enables this format to be used in conjunction with all multi-stream and 3D audio subsets.
Figure: Analogue-to-digital conversion of an audio signal using 16 bits (left) and one bit (right).
1.536MHz Two-channel Audio
Some say that one-bit audio may still suffer from quantisation errors due to the mismatch between mastering and analogue-to-digital conversion (which are usually 32 or 64-bit operations). What makes one-bit audio so good is its very high sampling rate, which produces a smoother curve, akin to the analogue master. The new high-end two-channel format defined in HDMI 2.0 gives us the best of both worlds - an uncompressed high bit depth (16 or 24 bits) combined with a very high sampling rate of 1.536MHz. The resulting resolution is unprecedented; theoretically up to 13 times that of one-bit/SACD, and 8 times that of 192kHz/24-bit linear PCM. 
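The 13x and 8x figures can be reproduced with simple arithmetic if 'resolution' is read as raw bits per second per channel. That reading is my assumption for this sketch, and the function name is illustrative only.

```python
def bits_per_second(bit_depth: int, sample_rate_hz: int) -> int:
    """Raw data per channel: bits per sample times samples per second."""
    return bit_depth * sample_rate_hz


new_format = bits_per_second(24, 1_536_000)  # HDMI 2.0 two-channel format
dsd_sacd = bits_per_second(1, 2_822_400)     # one-bit audio at 2.8224MHz
pcm_192 = bits_per_second(24, 192_000)       # 192kHz/24-bit linear PCM

print(f"New format: {new_format / 1e6:.3f} Mbps per channel")
print(f"vs one-bit/SACD: {new_format / dsd_sacd:.2f}x")
print(f"vs 192kHz/24-bit PCM: {new_format / pcm_192:.2f}x")
```

On that basis the new format carries about 13.06 times the data of one-bit/SACD and exactly 8 times that of 192kHz/24-bit PCM, per channel.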
Dynamic Audio Lip Sync
Audio Lip Sync (ALS) was first introduced in the HDMI 1.3 specification of 2006, resolving many issues of the day. Processing of audio within an AVR (Audio Video Receiver) often resulted in audio lagging slightly behind the picture, so ALS introduced a downstream packet into HDMI to remedy this. Since then, however, the tables have turned; advances in video processing power and expectations in modern displays have led to latency between the TV's input socket and the actual display on screen, flipping the lip sync challenge and making the video lag behind the audio. 'Dynamic Audio Lip Sync' (DALS) in HDMI 2.0 addresses this by requiring future supporting displays to provide data upstream through HDMI as to their inherent video latency characteristics, so an AV receiver can hold back its audio output to match the latency of the video downstream. Voila! Back in sync - or so we hope.
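Conceptually, the receiver's job is a simple subtraction. The sketch below is only an illustration of that idea, not the DALS data format: the function name, the millisecond units and the example latencies are all hypothetical.

```python
def audio_delay_ms(display_video_latency_ms: float,
                   avr_audio_latency_ms: float) -> float:
    """Extra delay the AV receiver should add to its audio output.

    display_video_latency_ms: latency reported upstream by the display
        (hypothetical value; the real field layout is not covered here).
    avr_audio_latency_ms: the receiver's own audio processing latency.
    Audio cannot be advanced, so the result is never negative.
    """
    return max(0.0, display_video_latency_ms - avr_audio_latency_ms)


# The 'dynamic' part: recompute whenever the display reports a new latency,
# for example after it switches picture processing modes.
for reported_latency in (80.0, 35.0, 10.0):
    delay = audio_delay_ms(reported_latency, avr_audio_latency_ms=20.0)
    print(f"display reports {reported_latency:.0f} ms -> hold audio {delay:.0f} ms")
```

When the display is faster than the receiver's own audio path, as in the last case, there is nothing left for the receiver to correct and the delay falls to zero.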
Where Does all the Extra Audio Fit?
In the course '4K Compatibility & HDMI System Design' which I have presented internationally for CEDIA over the past two years, I explore the breakdown of active video and blanking in each video frame. The latter is where we find the HDMI 'Data Island Periods' and 'Control Periods'. It is in the Data Islands that audio resides, along with other smaller packets such as assorted metadata and 3D video descriptors etc.
Figure: Depiction of HDMI periods in a 720x480 video frame, showing 138 pixels of Hblank (Source: HDMI 1.4b Specification, HDMI Licensing, LLC).
As video resolution increases, so too does the size of the blanking region in absolute terms. Logically, the greater the blanking region, the more audio can be crammed in. Conversely, the lower the resolution, the smaller this region will be, limiting its capacity. Formats such as VGA and 480p support only a few of the new audio features, but that is okay. After all, why would you want full resolution 3D audio with 480p anyway? Section 9.3.1 of the HDMI 2.0 specification lists all of the video/audio combinations, but in summary, up to 30.2-channel 96kHz audio can be supported along with 720p video, and up to 192kHz audio with 2160p video. Very impressive.
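The blanking budget per frame can be estimated from the video timings. The sketch below assumes the standard CEA-861 60Hz timings (only the 138-pixel Hblank of 720x480 is quoted in this article), and it gives an upper bound only, since Control Periods, preambles and guard bands also occupy part of the blanking region.

```python
# Assumed CEA-861 60Hz timings: (h_active, v_active, h_total, v_total)
TIMINGS = {
    "480p": (720, 480, 858, 525),
    "720p": (1280, 720, 1650, 750),
    "1080p": (1920, 1080, 2200, 1125),
    "2160p": (3840, 2160, 4400, 2250),
}


def h_blank(name: str) -> int:
    """Horizontal blanking width in pixels for one line."""
    h_active, _, h_total, _ = TIMINGS[name]
    return h_total - h_active


def blanking_pixels_per_frame(name: str) -> int:
    """Pixel clocks per frame that fall outside active video."""
    h_active, v_active, h_total, v_total = TIMINGS[name]
    return h_total * v_total - h_active * v_active


for name in TIMINGS:
    print(f"{name:>6}: Hblank {h_blank(name):4d} px, "
          f"{blanking_pixels_per_frame(name):,} blanking pixel clocks per frame")
```

With these timings, 480p has roughly 105,000 blanking pixel clocks per frame against about 1.6 million for 2160p, which is why the richer audio formats need the higher video resolutions.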
HDMI System Support and Connectivity Challenges
Needless to say, support for all of this new capability will require new hardware, details of which will emerge in time. As is already the case, it will be non-compliant practice for any device or cable vendor to refer to any of this as 'HDMI 2.0'. They should rather focus on actual features to ensure it remains informative and relevant for the industry and buying public.
A major challenge will arise, however, with HDMI cables, as is already being recognised by many leaders in our industry. To date, an HDMI cable is really only expected to perform to 1080p/60 (4.455Gbps aggregate), and even then it has actually been the slow-speed signalling, namely EDID (within DDC), that has been the predominant cause of interoperability failures in the field. Raising the bar to 9Gbps with 4K video, then doubling it again to 18Gbps for 4K/60 under HDMI 2.0 methodology, will likely see high-speed TMDS signalling rise to the number one cause of failures as cable bandwidth limits are seriously pushed for the first time, regardless of the EQ curve behind it. Either way, CATx cable is ultimately out of the window as an option. But it doesn't stop there.
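Those aggregate figures follow directly from the pixel clock. As a rough sketch, assuming 8-bit colour, three TMDS data channels and 10 bits on the wire per 8-bit value, with pixel clocks taken from the standard CEA-861 timings rather than from this article:

```python
TMDS_DATA_CHANNELS = 3   # three TMDS data channels carry the video
BITS_PER_CHARACTER = 10  # each 8-bit value is encoded as 10 bits on the wire

# Assumed pixel clocks in MHz for 8-bit colour at the stated formats.
PIXEL_CLOCK_MHZ = {
    "1080p/60": 148.5,
    "4K/30": 297.0,
    "4K/60": 594.0,
}


def aggregate_gbps(pixel_clock_mhz: float) -> float:
    """Aggregate TMDS bit rate across all data channels, in Gbps."""
    return pixel_clock_mhz * TMDS_DATA_CHANNELS * BITS_PER_CHARACTER / 1000


for fmt, clock in PIXEL_CLOCK_MHZ.items():
    print(f"{fmt:>8}: {aggregate_gbps(clock):.3f} Gbps aggregate")
```

This gives 4.455Gbps for 1080p/60, and 8.91Gbps and 17.82Gbps for the two 4K cases, which are the figures rounded to 9Gbps and 18Gbps above.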
I, along with many in the industry, have seen examples of short-length HDMI cables that currently do not even work with 3D - despite nothing having changed in the cable specification to enable 3D; they should all work. It is a combination of EDID upstream and 3D identifiers downstream that enables 3D operation, yet many cables manage to break something so seemingly simple. This occurs through things such as poor mechanical contact, high transition jitter, or excessive capacitance and skew, to name a few. It is certainly not all just 'ones and zeros'! 
Figure: Cables must be up to scratch if they are to carry HDMI 2.0 signals without breaking them.
The new features of HDMI 2.0 will contain more metadata and infoframe subsets than ever before; things such as HDMI 2.0 mode, scrambling headers, dynamic audio lip sync data, multi-stream audio packets, expanded CEC modes, etc. Cables that broke simple 3D flags in the past will break far more than this in future, tightening the noose on substandard products as deficiencies will become more obvious. Where the industry has trended away from native tip-to-tip HDMI signalling, particularly in long lengths, I predict a return to native basics to provide the best fighting chance for stability. At a whopping 3GHz of bandwidth (at 18Gbps), both mechanical and electrical quality and compliance in the cable will be ever more important to ensure successful implementation of HDMI 2.0.
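For anyone wondering where 3GHz comes from, it falls out of the 18Gbps figure. The sketch assumes the aggregate rate is split evenly over three data channels and that the worst case is an alternating 1-0-1-0 pattern, whose fundamental frequency is half the bit rate; the function name is mine.

```python
def fundamental_ghz(aggregate_gbps: float, data_channels: int = 3) -> float:
    """Fundamental frequency of an alternating bit pattern on one channel.

    One full cycle of a 1-0-1-0 pattern spans two bits, so the fundamental
    sits at half the per-channel bit rate.
    """
    per_channel_gbps = aggregate_gbps / data_channels
    return per_channel_gbps / 2


print(f"18 Gbps aggregate -> {fundamental_ghz(18.0):.1f} GHz per channel")
```

By the same sum, a 1080p/60 link at 4.455Gbps sits at roughly 0.74GHz, which shows how far the cable is being asked to stretch.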
Conclusion
HDMI 2.0 contains an impressive upgrade path for 4K video, but is really only delivering predicted High Frame Rate (HFR) capabilities over the 2009 HDMI 1.4 specification. What is really impressive about HDMI 2.0 is what it delivers for audio: 3D Audio up to a mind-boggling 30.2 channels of 192kHz/24-bit resolution, two-channel audio at a 1.536MHz sampling rate, and very versatile multi-stream audio capabilities. 
With these new capabilities will undoubtedly come new challenges. But what it boils down to is that new higher standards demand even higher quality elements. After all, great 1080p still looks better than bad 4K. Likewise, the choice of every component in a system, today or tomorrow, not to mention learned advice and education to support it, will combine to determine the ultimate performance and stability of that system. I look forward to the experience, and hope you do too.
David Meyer is the Founder and Managing Director of Kordz, specialist in reliable long-reach HDMI. Following the launch of HDMI 2.0 at IFA in Berlin in September 2013, Kordz became the first approved HDMI 2.0 Adopter in the world, outside of the HDMI Forum.
www.kordz.com