Advanced Video Quality Tool (AVQT)

AVQT works great; I'm doing some comparisons with VMAF and SSIMPLUS. Some questions.

  1. Why are you reporting scores for segments and not a total score?
  2. Why are scores at the bottom of the CSV file, where I have to scroll to see them, rather than at the top?

A suggestion - I'm finding it challenging to compute VMAF with FFmpeg on the Mac. Probably just me, as I'm not that proficient on the Mac. But if you added VMAF scoring to AVQT, users would have an easy way to compute VMAF as well.
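For context, this is the sort of invocation I've been wrestling with; a minimal sketch, assuming an FFmpeg build that includes libvmaf (the file names here are placeholders):

import subprocess

# Sketch only: shells out to FFmpeg's libvmaf filter. Requires an
# FFmpeg build compiled with --enable-libvmaf; file names are placeholders.
distorted = "encoded.mp4"   # the encode under test
reference = "source.mp4"    # the pristine source

subprocess.run([
    "ffmpeg",
    "-i", distorted,        # libvmaf takes the distorted clip first
    "-i", reference,        # and the reference second
    "-lavfi", "libvmaf=log_path=vmaf.json:log_fmt=json",
    "-f", "null", "-",      # decode and score only; write no output file
], check=True)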

Just a thought.

Thanks.

Jan Ozer

Replies

Thank you, Jan, for your questions and suggestions. Let me answer them in turn:

  1. We agree there is value in reporting an overall score for the whole video. However, this is quite challenging, as it requires subjective data on long-duration videos to design and evaluate an aggregation model. That model would need to mathematically capture several memory-related aspects of the human visual system, such as first and last impressions, sudden quality drops, and the length of low-quality periods.
  2. For convenience, the segment scores are printed to the console at the end of the process. The generated CSV and JSON files are mainly meant to be used by scripts for automation purposes.
  • Hello!

    I saw the talk on AVQT at last year's Video@Scale event. There, it was mentioned that we could get in touch for a Linux version of AVQT.

    Is it possible for us to get hold of the Linux version of AVQT?

    Thanks.


Dear ME:

Thanks for your response.

  1. There are multiple aggregation models already in use and accepted - see here: https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=139140

AFAIK, there are no metrics that don't produce an aggregated score; VMAF does, SSIMPLUS does, all the legacy metrics do. You're already aggregating by segment; how much harder would it be to do it for the whole file? Even a naive pooling would be a start; see the sketch after this list.

  2. For convenience? A program can find a score anywhere in the file; it's humans who have to scroll. So you're putting the score last, where a human has to work the absolute hardest to see it, even though its position makes no difference to the machine. That's less convenient for the human and no more convenient for the machine.
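On the aggregation point above: even a naive pooling of the per-segment scores AVQT already emits would give users a headline number. A sketch, with made-up segment values:

# Illustrative only: naive pooling of per-segment AVQT scores into a
# single file-level number. Real aggregation models would weight
# memory effects; this just takes the mean and flags the worst segment.
segment_scores = [4.8, 4.7, 3.1, 4.9]  # made-up values

overall = sum(segment_scores) / len(segment_scores)
worst = min(segment_scores)

print(f"overall: {overall:.2f} (worst segment: {worst:.2f})")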

Thanks again for your answers.

I have to agree with Jan here. In particular, regarding the CSV output: you said that "the generated CSV and JSON files are mainly meant to be used by scripts for automation purposes", but the generated CSV is anything but machine-readable.

The file starts with a number of lines that contain no comma at all: first a header saying "Advanced Video Quality Tool (AVQT) - CLI", then a few colon-separated (:) lines of metadata about the processing.

Only then does it start with a CSV-conformant set of lines like:

Frame Index,AVQT
1,1.00
2,1.00
3,1.00
…

But later, it suddenly changes the semantics of the data again:

Segment Index,AVQT
1,1.00
…

This makes it really hard to parse. A proper CSV file should have a fixed number of columns with a known meaning, and the semantics should not change in the middle of the file. Nor should a CSV file have free-form metadata prepended to it. To parse this output, I would have to strip the preamble away manually and then account for the change of semantics partway through.

This means I cannot simply feed the file to a CSV parser such as Python's csv module, pandas, statistical software like R, or Excel and get meaningful output without first cleaning it up by hand.
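To make this concrete, here is roughly the cleanup a script has to do today. A sketch, assuming the two header rows read exactly "Frame Index,AVQT" and "Segment Index,AVQT" as shown above:

import csv

# Read the raw output and locate the two header rows; everything
# above the first one is the free-form preamble described above.
with open("avqt_output.csv") as f:
    lines = f.read().splitlines()

frame_start = lines.index("Frame Index,AVQT")
segment_start = lines.index("Segment Index,AVQT")

# Parse each section separately, since the row semantics differ.
frame_rows = list(csv.DictReader(lines[frame_start:segment_start]))
segment_rows = list(csv.DictReader(lines[segment_start:]))

print(len(frame_rows), "frame scores,", len(segment_rows), "segment scores")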

If you are trying to convey different types of records, you should have three CSV outputs:

  1. Metadata, with headers version,test_file,reference_file,segment_duration,temporal_pooling,display_width,display_height,viewing_distance
  2. Per-frame scores
  3. Per-segment scores

This way, all the data can be parsed cleanly.
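With that layout, each file is a single-schema table that loads in one call. A minimal sketch, with illustrative file names:

import pandas as pd

# Each proposed file holds exactly one record type, so any standard
# CSV reader handles it directly. File names are illustrative.
metadata = pd.read_csv("avqt_metadata.csv")   # one row of run parameters
frames = pd.read_csv("avqt_frames.csv")       # per-frame scores
segments = pd.read_csv("avqt_segments.csv")   # per-segment scores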

For more info on how CSV data can be laid out cleanly, please see Hadley Wickham's paper "Tidy Data" (Journal of Statistical Software, 2014).