Video Multimethod Assessment Fusion

Video Multimethod Assessment Fusion (VMAF) is an objective full-reference video quality metric developed by Netflix in cooperation with the University of Southern California and the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. It predicts subjective video quality from a reference video sequence and a distorted version of it. The metric can be used to compare the quality of different video codecs, encoders, encoding settings, or transmission variants.


The metric is based on initial work from the group of Professor C.-C. Jay Kuo at the University of Southern California.[1][2][3] That work investigated the fusion of different video quality metrics using support vector machines (SVM), leading to an "FVQA (Fusion-based Video Quality Assessment) Index" that was shown to outperform existing image quality metrics on a subjective video quality database.

The method was further developed in cooperation with Netflix, using different subjective video datasets, including a Netflix-owned dataset ("NFLX"). Subsequently renamed "Video Multimethod Assessment Fusion", it was announced on the Netflix TechBlog in June 2016,[4] and version 0.3.1 of the reference implementation was made available under a permissive open-source license.[5]

In 2017, the metric was updated to support a custom model that includes an adaptation for cellular phone screen viewing, generating higher quality scores for the same input material. In 2018, a model that predicts the quality of up to 4K resolution content was released. The datasets on which these models were trained have not been made available to the public.


VMAF uses existing image quality metrics and other features to predict video quality:

  • Visual Information Fidelity (VIF): considers information fidelity loss at four different spatial scales
  • Detail Loss Metric (DLM):[6] measures loss of details, and impairments which distract viewer attention
  • Mean Co-Located Pixel Difference (MCPD): measures temporal difference between frames on the luminance component
  • Anti-noise signal-to-noise ratio (AN-SNR)
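
As an illustration of the temporal feature in the list above, the following is a minimal sketch of a mean co-located pixel difference. It assumes the feature is the mean absolute difference between luminance samples at the same position in two consecutive frames; the function name and the nested-list frame representation are hypothetical, and the reference implementation may filter or normalize the frames differently.

```python
def mean_colocated_pixel_difference(prev_luma, curr_luma):
    """Mean absolute difference between co-located luminance samples.

    Both frames are lists of rows of luminance values (an assumed,
    simplified representation chosen for this sketch).
    """
    if len(prev_luma) != len(curr_luma) or any(
        len(row_prev) != len(row_curr)
        for row_prev, row_curr in zip(prev_luma, curr_luma)
    ):
        raise ValueError("frames must have identical dimensions")

    total = 0
    count = 0
    for row_prev, row_curr in zip(prev_luma, curr_luma):
        for prev_sample, curr_sample in zip(row_prev, row_curr):
            total += abs(curr_sample - prev_sample)
            count += 1

    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


# Two tiny 2x2 luminance frames with made-up values.
frame_a = [[10, 10], [10, 10]]
frame_b = [[10, 12], [14, 10]]
mcpd = mean_colocated_pixel_difference(frame_a, frame_b)
```

A static scene yields a value of zero, and the value grows as more pixels change between frames.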

The above features are fused using an SVM-based regression to provide a single output score in the range of 0–100 per video frame, with 100 being quality identical to the reference video. These scores are then temporally pooled over the entire video sequence using the arithmetic mean to provide an overall differential mean opinion score (DMOS).
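
The fusion and pooling steps described above can be sketched as follows. This is only an illustration under stated assumptions: the RBF kernel, the support vectors, the coefficients, the intercept and all function names are hypothetical placeholders, not values or APIs from the reference implementation. The sketch shows the general form of a support vector regression prediction, followed by clipping to the 0–100 range and pooling with the arithmetic mean.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel (an assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def svr_predict(features, support_vectors, dual_coefs, intercept, gamma):
    """Generic SVR decision function: intercept + sum_i coef_i * K(sv_i, x)."""
    return intercept + sum(
        coef * rbf_kernel(sv, features, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


def clip_score(score, low=0.0, high=100.0):
    """Keep a per-frame score inside the 0-100 output range."""
    return max(low, min(high, score))


def pool_mean(frame_scores):
    """Temporal pooling: arithmetic mean of the per-frame scores."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    return sum(frame_scores) / len(frame_scores)


# Hypothetical trained model with a single support vector (illustration only).
support_vectors = [[0.0, 0.0]]
dual_coefs = [10.0]
intercept = 50.0
gamma = 1.0

# Hypothetical per-frame feature vectors for a three-frame sequence.
per_frame_features = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
frame_scores = [
    clip_score(svr_predict(f, support_vectors, dual_coefs, intercept, gamma))
    for f in per_frame_features
]
sequence_score = pool_mean(frame_scores)
```

Because the arithmetic mean weights every frame equally, a short passage of very low quality has limited influence on the pooled score of a long sequence.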

Due to the public availability of the training source code ("VMAF Development Kit", VDK), the fusion method can be re-trained and evaluated based on different video datasets and features.


An early version of VMAF was shown to outperform other image and video quality metrics such as SSIM, PSNR-HVS and VQM-VFD on three of four datasets in terms of prediction accuracy, when compared to subjective ratings.[4] Its performance was also analyzed in another paper, which found that VMAF did not perform better than SSIM and MS-SSIM on a video dataset.[7] In 2017, engineers from RealNetworks reported good reproducibility of Netflix's performance findings.[8]
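
Prediction accuracy against subjective ratings is commonly summarized with correlation coefficients. The sketch below computes the Pearson and Spearman correlations between metric scores and subjective scores. The text above does not state which measures the cited studies used, so the choice of measures is an assumption, and the score values are made up for illustration. The rank helper assumes there are no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up metric predictions and subjective scores for four clips.
predicted = [35.0, 52.0, 71.0, 93.0]
subjective = [30.0, 58.0, 66.0, 90.0]
plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
```

The Pearson coefficient reflects how linear the relationship is, while the Spearman coefficient only checks whether the metric orders the clips the same way as the viewers did.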


A reference implementation written in C and Python ("VMAF Development Kit", VDK) is published as free software under the terms of the Apache License, version 2.0. Its source code and additional material are available on GitHub.[5]


  1. Liu, Tsung-Jung; Lin, Joe Yuchieh; Lin, Weisi; Kuo, C.-C. Jay (2013). "Visual quality assessment: recent developments, coding applications and future trends". APSIPA Transactions on Signal and Information Processing. 2. doi:10.1017/atsip.2013.5. ISSN 2048-7703.
  2. Lin, Joe Yuchieh; Liu, T. J.; Wu, E. C. H.; Kuo, C. C. J. (December 2014). "A fusion-based video quality assessment (FVQA) index". Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific: 1–5. doi:10.1109/apsipa.2014.7041705. ISBN 978-6-1636-1823-8.
  3. Lin, Joe Yuchieh; Wu, Chi-Hao; Katsavounidis, Ioannis; Li, Zhi; Aaron, Anne; Kuo, C.-C. Jay (June 2015). "EVQA: An ensemble-learning-based video quality assessment index". Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on: 1–5. doi:10.1109/ICMEW.2015.7169760. ISBN 978-1-4799-7079-7.
  4. Netflix Technology Blog (2016-06-06). "Toward A Practical Perceptual Video Quality Metric". Netflix TechBlog. Retrieved 2017-07-15.
  5. vmaf: Perceptual video quality assessment based on multi-method fusion, Netflix, Inc., 2017-07-14, retrieved 2017-07-15
  6. Li, S.; Zhang, F.; Ma, L.; Ngan, K. N. (October 2011). "Image Quality Assessment by Separately Evaluating Detail Losses and Additive Impairments". IEEE Transactions on Multimedia. 13 (5): 935–949. doi:10.1109/tmm.2011.2152382. ISSN 1520-9210.
  7. Bampis, Christos G.; Bovik, Alan C. (2017-03-02). "Learning to Predict Streaming Video QoE: Distortions, Rebuffering and Memory". arXiv:1703.00633 [cs.MM].
  8. Rassool, Reza (2017). "VMAF reproducibility: Validating a perceptual practical video quality metric" (PDF). 2017 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). Retrieved 2017-11-30.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.