Ultra HD Premium: The Commercial Logo for HDR TVs

High dynamic range imaging (HDR) is a technique used in imaging and photography to reproduce a greater dynamic range of luminosity than is possible with standard digital imaging or photographic techniques. The aim is to present the human eye with a similar range of luminance to that which, through the visual system, is familiar in everyday life. The human eye, through adaptation of the iris and other methods, adjusts constantly to the broad dynamic changes ubiquitous in our environment. The brain continuously interprets this information so that a viewer can see in a wide range of light conditions.

For imaging, HDR, as its name implies, is a method that aims to add more “dynamic range” to photographs, where dynamic range is the ratio of light to dark in a photograph. In practice, when HDR is enabled, instead of taking one photo the camera takes three photos at different exposures. Then, either automatically in software (as in mobile phone cameras) or with sophisticated image-editing software, the differently exposed pictures are overlaid and the best parts of each photo are combined.
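The fusion step can be sketched in a few lines. Below is a minimal, illustrative exposure-fusion sketch, not any particular camera's algorithm: each pixel is weighted by how close it is to mid-gray, so the well-exposed regions of each shot dominate the result. The function name and the Gaussian weighting are assumptions made for illustration.

```python
import numpy as np

def fuse_exposures(exposures):
    """Naive exposure fusion: blend differently exposed images of the same
    scene, weighting each pixel by how close it is to mid-gray (i.e. how
    well exposed it is). A toy stand-in for real HDR merge algorithms.

    exposures: list of float arrays in [0, 1], all the same shape.
    """
    stack = np.stack(exposures)                     # (n, H, W) or (n, H, W, 3)
    # Well-exposed pixels (near 0.5) get high weight; clipped ones near zero.
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * 0.2 ** 2))
    weights /= weights.sum(axis=0, keepdims=True)   # normalize per pixel
    return (weights * stack).sum(axis=0)

# Toy example: under-, normally- and over-exposed versions of a gradient.
scene = np.linspace(0.0, 1.0, 5)
shots = [np.clip(scene * g, 0, 1) for g in (0.5, 1.0, 2.0)]
fused = fuse_exposures(shots)
```

Real merge algorithms additionally align the frames and recover radiance from the camera response curve; the weighting idea, however, is the same.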

For HDR TVs, the UHD Alliance (UHDA) [1], a consortium of TV manufacturers, broadcasters and film producers, has decided to create a new brand logo beyond UHD: Ultra HD Premium, which defines the technical specifications a TV must meet in order to deliver an HDR/premium 4K experience.


The UHDA’s new ULTRA HD PREMIUM specifications cover multiple display technologies and reference established industry standards and recommended practices from the Consumer Technology Association, the Society of Motion Picture and Television Engineers, the International Telecommunication Union and others. Moving forward, the UHDA has also launched “ULTRA HD PREMIUM” logo and certification licensing for Ultra HD Blu-ray disc players.

Summarizing the minimum requirements [2]:

Minimum resolution of 3,840 x 2,160 – the same as existing 4K/Ultra HD TVs.

10-bit color depth – In contrast to the 8-bit color signal that Blu-ray players use today, UHD Premium TVs must be able to receive and process a 10-bit color signal, often called ‘deep color’, supporting over a billion colors (2^30 ≈ 1.07 billion).

Minimum of 90% of P3 colors – To be certified as Ultra HD Premium, a TV must be able to display at least 90% of the colors defined by the P3 color space [3].

Signal input – BT.2020 color representation [4]

High dynamic range – SMPTE ST 2084 EOTF [5]

Minimum dynamic range – To qualify as UHD Premium, a TV should meet minimum standards for the maximum and minimum brightness it can achieve.

There are two different requirements in order to accommodate the pros and cons of different TV technologies.
  1. Aimed at LED TVs: more than 1,000 nits peak brightness and less than 0.05 nits black level
  2. Aimed at OLED TVs: more than 540 nits peak brightness and less than 0.0005 nits black level
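The SMPTE ST 2084 EOTF mentioned above (also known as PQ, the Perceptual Quantizer) maps a non-linear 0–1 signal value to absolute luminance up to 10,000 nits. Here is a minimal sketch using the constants published in ST 2084, together with the contrast ratios implied by the two brightness tiers:

```python
def pq_eotf(e):
    """SMPTE ST 2084 (PQ) EOTF: non-linear signal value e in [0, 1]
    -> absolute luminance in cd/m^2 (nits), up to 10,000 nits."""
    m1 = 2610 / 16384            # 0.1593017578125
    m2 = 2523 / 4096 * 128       # 78.84375
    c1 = 3424 / 4096             # 0.8359375
    c2 = 2413 / 4096 * 32        # 18.8515625
    c3 = 2392 / 4096 * 32        # 18.6875
    p = e ** (1 / m2)
    return 10000 * (max(p - c1, 0) / (c2 - c3 * p)) ** (1 / m1)

# Endpoints of the curve:
print(pq_eotf(0.0))    # 0.0 nits
print(pq_eotf(1.0))    # 10000.0 nits

# Contrast ratios implied by the two UHD Premium tiers:
print(1000 / 0.05)     # LED tier:  20,000:1
print(540 / 0.0005)    # OLED tier: 1,080,000:1
```

Note how the lower OLED peak-brightness requirement is offset by the far deeper black level, giving that tier a much larger contrast ratio.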

Please note that TVs could be certified Ultra HD Premium retroactively, but few TVs released in 2015 can meet the standard.


VMAF: A Netflix Video Quality Metric

A couple of weeks ago, a very interesting article was posted on the Netflix Tech Blog, giving us a view of a practical video quality metric as perceived by the world’s leading content provider, Netflix.

The proposed metric used by Netflix is called Video Multimethod Assessment Fusion (VMAF) and seeks to reflect the viewer’s perception of Netflix streaming quality. The plan is to provide this metric as an open-source tool, so that the research community can get involved in its evolution.

The main objective of Netflix is to deliver high-quality content, providing subscribers with a great viewing experience: smooth video playback, free of annoying picture artifacts, given the constraints of network bandwidth and viewing device.

Currently, Netflix utilizes the most contemporary codecs, such as H.264/AVC, HEVC and VP9, in order to stream at reasonable bitrates, at the cost of quality degradation and the appearance of coding-specific artifacts. Netflix encodes its video streams in a distributed cloud-based media pipeline, which allows the system to scale to meet the needs of the service. To minimize the impact of bad source deliveries, software bugs and the unpredictability of cloud instances (transient errors), automated quality monitoring is performed at various points of the pipeline.

For its video quality research, Netflix started by building a data set appropriate for experimentation, one that meets the standards of the specific service in terms of content variety and sources of artifacts. Since streaming is TCP-based, the quality degradation observed at Netflix is caused by two types of artifacts:

  1. compression artifacts (due to lossy compression) and
  2. scaling artifacts (for lower bitrates, video is downsampled before compression, and later upsampled on the viewer’s device)
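The second kind of artifact is easy to reproduce: detail lost in downsampling cannot be recovered by upsampling on the device. Here is a toy sketch using naive decimation and nearest-neighbor upsampling (real pipelines use much better resampling filters; the function name is illustrative):

```python
import numpy as np

def downscale_upscale(frame, factor):
    """Simulate a scaling artifact: naive decimation before encoding,
    nearest-neighbor upsampling on the viewer's device."""
    small = frame[::factor, ::factor]                    # downsample
    return np.repeat(np.repeat(small, factor, axis=0),   # upsample
                     factor, axis=1)

# A frame with fine vertical stripes (high-frequency detail).
frame = np.tile(np.array([0.0, 1.0]), (4, 4))            # shape (4, 8)
restored = downscale_upscale(frame, 2)
# The stripes are destroyed: decimation kept only the even columns,
# so the restored frame is flat even though it has the original size.
```

Compression artifacts (blocking, ringing, banding) arise differently, from lossy quantization of transform coefficients, which is why the two artifact types are modeled separately.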

So the objective of the Netflix research is to build a special-purpose metric, based on the two aforementioned artifact types, that will outperform general-purpose video quality metrics.

For the Netflix dataset, a sample of 34 source clips (also called reference videos) was selected, each 6 seconds long, from popular TV shows and movies in the Netflix catalog, combined with a selection of publicly available clips. The source clips cover a wide range of high-level features (animation, indoor/outdoor, camera motion, face close-up, people, water, obvious salience, number of objects) and low-level characteristics (film grain noise, brightness, contrast, texture, motion, color variance, color richness, sharpness). Using the source clips, Netflix researchers encoded H.264/AVC video streams at resolutions ranging from 384×288 to 1920×1080 and bitrates from 375 kbps to 20,000 kbps, resulting in about 300 distorted videos. This sweeps a broad range of video bitrates and resolutions to reflect the widely varying network conditions of Netflix members.

Using this data set, subjective DSIS assessment tests were performed as specified in Recommendation ITU-R BT.500-13. The results of this process were mapped to corresponding DMOS values. The scatter plots below show the observers’ DMOS on the x-axis and the predicted score from different quality metrics on the y-axis, namely: PSNR, SSIM, Multiscale FastSSIM, and PSNR-HVS.


It can be seen from the graphs that these metrics fail to provide scores that consistently predict the DMOS ratings from observers. Above each plot, the article reports the Spearman rank correlation coefficient (SRCC), the Pearson product-moment correlation coefficient (PCC) and the root-mean-squared error (RMSE) for each of the metrics, calculated after a non-linear logistic fitting, as outlined in Annex 3.1 of ITU-R BT.500-13. SRCC and PCC values closer to 1.0 and RMSE values closer to zero are desirable. Among the four metrics, PSNR-HVS demonstrates the best SRCC, PCC and RMSE values, but it is still lacking in prediction accuracy. To address this issue, Netflix adopts a machine-learning based model to design a metric that seeks to reflect human perception of video quality.
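The evaluation procedure can be sketched as follows. This is an illustrative reconstruction, not Netflix’s code: a logistic curve (one common choice for the BT.500-style fitting step; the exact functional form used in the article is an assumption here) maps objective scores to DMOS, after which SRCC, PCC and RMSE are computed, shown on synthetic data:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr, pearsonr

def logistic(x, b1, b2, b3):
    # Monotonic logistic mapping, in the spirit of the BT.500 fitting step.
    return b1 / (1 + np.exp(-b2 * (x - b3)))

def evaluate_metric(scores, dmos):
    """Fit a logistic curve from objective scores to DMOS, then report
    SRCC, PCC and RMSE of the fitted predictions."""
    p0 = [np.max(dmos), 0.1, float(np.mean(scores))]   # rough initial guess
    params, _ = curve_fit(logistic, scores, dmos, p0=p0, maxfev=10000)
    pred = logistic(scores, *params)
    srcc = spearmanr(scores, dmos)[0]        # rank correlation (fit-invariant)
    pcc = pearsonr(pred, dmos)[0]            # linear correlation after fitting
    rmse = np.sqrt(np.mean((pred - dmos) ** 2))
    return srcc, pcc, rmse

# Synthetic example: DMOS roughly follows a noisy saturating function of PSNR.
rng = np.random.default_rng(0)
psnr = rng.uniform(25, 45, 100)
dmos = 80 / (1 + np.exp(-0.3 * (psnr - 35))) + rng.normal(0, 3, 100)
srcc, pcc, rmse = evaluate_metric(psnr, dmos)
```

SRCC is computed on the raw scores because rank correlation is invariant to any monotonic fit; PCC and RMSE are computed after the fitting step, as in the article.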

Netflix researchers, in collaboration with Prof. C.-C. J. Kuo and his group at the University of Southern California, developed Video Multimethod Assessment Fusion, or VMAF, which predicts subjective quality by combining multiple elementary quality metrics. By ‘fusing’ elementary metrics into a final metric using a machine-learning algorithm – in Netflix’s case, a Support Vector Machine (SVM) regressor – which assigns weights to each elementary metric, the final metric can preserve the strengths of the individual metrics and deliver a more accurate final score. The machine-learning model is trained and tested using the opinion scores obtained through the aforementioned subjective experiment on the Netflix dataset.

The current version of the VMAF algorithm uses the following elementary metrics fused by Support Vector Machine (SVM) regression:

  1. Visual Information Fidelity (VIF) [1]. VIF is a well-adopted image quality metric based on the premise that quality is complementary to the measure of information fidelity loss. In VMAF, a modified version of VIF is adopted, where the loss of fidelity is included as an elementary metric.
  2. Detail Loss Metric (DLM) [2]. DLM is an image quality metric based on the rationale of separately measuring the loss of details which affects the content visibility, and the redundant impairment which distracts viewer attention. The original metric combines both DLM and additive impairment measure (AIM) to yield a final score. In VMAF, only the DLM is adopted as an elementary metric.

VIF and DLM are both image quality metrics. The following simple feature is further introduced to account for the temporal characteristics of video:

  1. Motion. This is a simple measure of the temporal difference between adjacent frames. This is accomplished by calculating the average absolute pixel difference for the luminance component.
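As described, the motion feature reduces to a one-liner on the luma plane. A small sketch (the function name and array layout are assumptions made for illustration):

```python
import numpy as np

def motion_feature(frames):
    """Per-frame motion feature as described: the mean absolute pixel
    difference of the luminance plane between adjacent frames.

    frames: array of shape (n_frames, H, W), luma only.
    Returns n_frames - 1 values, one per adjacent-frame pair.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.mean(axis=(1, 2))

# Toy example: a static frame repeated, then a sudden full change (a cut).
f0 = np.zeros((4, 4))
f1 = np.zeros((4, 4))
f2 = np.ones((4, 4))
clip = np.stack([f0, f1, f2])
feat = motion_feature(clip)
print(feat)    # [0. 1.]
```

The toy clip also illustrates the shot-boundary issue discussed below: the cut between the last two frames produces a large value that reflects editing, not motion.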

These elementary metrics and features were chosen from among other candidates through iterations of testing and validation. From the posted article it is not sufficiently clear how the Motion feature is handled at shot boundaries, where it produces high values that are most probably discarded.
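The fusion idea itself can be sketched with an off-the-shelf SVM regressor. The feature values and scores below are synthetic stand-ins for the real elementary metrics and DMOS data, and scikit-learn’s SVR is used as a generic substitute for Netflix’s actual trained model:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: elementary metric values for one distorted clip
# (illustrative stand-ins for VIF, DLM and the motion feature).
rng = np.random.default_rng(1)
features = rng.uniform(0, 1, size=(300, 3))
# Synthetic "subjective" scores: a nonlinear mix of the features plus noise.
dmos = (60 * features[:, 0] + 25 * features[:, 1] ** 2
        + 15 * features[:, 2] + rng.normal(0, 2, 300))

# Train on most clips, hold out the rest, as in any supervised setup.
train, holdout = slice(0, 240), slice(240, 300)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10))
model.fit(features[train], dmos[train])
pred = model.predict(features[holdout])
rmse = np.sqrt(np.mean((pred - dmos[holdout]) ** 2))
```

The point of the fusion step is exactly this: the regressor learns how much each elementary metric should contribute, instead of hand-tuning a weighted sum.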

Netflix researchers then compare the accuracy of VMAF to PSNR-HVS, the best-performing metric from the earlier section, and it is clear that VMAF performs appreciably better.


The article also reports on a comparison of VMAF to the Video Quality Model with Variable Frame Delay (VQM-VFD) [3], considered by many to be the state of the art in the field. VQM-VFD is an algorithm that uses a neural network model to fuse low-level features into a final metric. It is similar in spirit to VMAF, except that it extracts features at lower levels, such as spatial and temporal gradients.


It is clear that VQM-VFD performs close to VMAF on the NFLX-TEST dataset. Since the VMAF approach allows for incorporation of new elementary metrics into its framework, VQM-VFD could serve as an elementary metric for VMAF as well.

Summarizing, the article provides the SRCC, PCC and RMSE of the different metrics discussed earlier, on the Netflix dataset and three popular public datasets: the VQEG HD (vqeghd3 collection only), the LIVE Video Database and the LIVE Mobile Video Database. The results show that VMAF outperforms the other metrics on all but the LIVE dataset, where it still offers competitive performance compared to the best-performing VQM-VFD.

LIVE dataset*

Metric       SRCC    PCC     RMSE
PSNR         0.416   0.394   16.934
SSIM         0.658   0.618   12.340
FastSSIM     0.566   0.561   13.691
PSNR-HVS     0.589   0.595   13.213
VQM-VFD      0.763   0.767    9.897
VMAF 0.3.1   0.690   0.655   12.180

*For compression-only impairments (H.264/AVC and MPEG-2 Video)

Finally, the article concludes on the current open research issues:

  1. Viewing conditions. Netflix supports thousands of active devices, covering smart TVs, game consoles, set-top boxes, computers, tablets and smartphones, resulting in widely varying viewing conditions for its members. With more subjective data, Netflix researchers plan to generalize the algorithm so that viewing conditions (display size, distance from screen, etc.) can be inputs to the regressor.
  2. Temporal pooling. The current VMAF implementation calculates quality scores on a per-frame basis. In many use cases, it is desirable to temporally pool these scores into a single value summarizing a longer period of time: for example, a score over a scene, a score over regular time segments, or a score for an entire movie. A perceptually accurate temporal pooling mechanism for VMAF and other quality metrics remains an open and challenging problem.
  3. A consistent metric. Since VMAF incorporates full-reference elementary metrics, it is highly dependent on the quality of the reference. Unfortunately, the quality of video sources may not be consistent across all titles in the Netflix catalog: sources come into the system at resolutions ranging from SD to 4K. Because of this, it can be inaccurate to compare (or summarize) VMAF scores across different titles. For quality monitoring, it is highly desirable to calculate absolute quality scores that are consistent across sources. So the future work includes developing an automated method to predict what opinion viewers form about the quality of the video delivered to them, taking into account all factors that contributed to the final video presented on their screen.
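To make the temporal pooling question concrete, here is a sketch of two naive pooling rules applied to hypothetical per-frame scores. The harmonic mean is one common heuristic that weights bad frames more heavily; both functions are illustrative and not part of VMAF:

```python
import numpy as np

def mean_pool(scores):
    """Arithmetic mean over per-frame scores: treats all frames equally."""
    return float(np.mean(scores))

def harmonic_pool(scores, eps=1.0):
    """Harmonic mean: pulls the summary toward the worst frames, on the
    intuition that brief quality drops dominate perceived quality."""
    s = np.asarray(scores, dtype=np.float64) + eps   # eps guards against zeros
    return float(len(s) / np.sum(1.0 / s)) - eps

frames = [95, 94, 96, 20, 95]       # one badly degraded frame
print(mean_pool(frames))            # 80.0
print(harmonic_pool(frames))        # noticeably lower than the mean
```

Neither rule is perceptually validated, which is precisely the open problem the article identifies.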

Original Post: http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html


[1] H. Sheikh and A. Bovik, “Image Information and Visual Quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430–444, Feb. 2006.

[2] S. Li, F. Zhang, L. Ma, and K. Ngan, “Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 935–949, Oct. 2011.

[3] S. Wolf and M. H. Pinson, “Video Quality Model for Variable Frame Delay (VQM_VFD),” U.S. Dept. Commer., Nat. Telecommun. Inf. Admin., Boulder, CO, USA, Tech. Memo TM-11-482, Sep. 2011.

Key Network Delivery Metrics

A few days ago, the Streaming Video Alliance (specifically its QoE working group) released a document describing key network delivery metrics for streaming Internet video. Although many more metrics could have been documented, these particular metrics represent the most commonly used.

QoE metrics

The report is available online at the Streaming Video Alliance web site.