COMPRESSION IN DIGITAL VTRS

By John Watkinson - April 1999 edition, Broadcast Hardware International.

Reprinted with permission

The methods needed to make a meaningful assessment of the quality of digital VTRs using compression are quite different from the tests applied to uncompressed DVTRs or analog VTRs. The recent tests carried out by the EBU/SMPTE Task Force took a realistic view of the subject. Here is an explanation of the design of the tests and why the results align closely with what theory predicts.

The core of the digital VTR is a data recorder and this is provided with a powerful error correction system to allow operation on a variable medium such as tape. When the error correction system is working within its capability, which is the normal case, the replayed data are identical to those provided during recording and as a result the core recorder causes no quality loss at all.
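To make that concrete, here is a minimal sketch of erasure correction using a single XOR parity block. It is far simpler than the Reed-Solomon product codes real DVTRs employ, and every name in it is illustrative, but it shows the key property: while the damage stays within the code's capability, the recovered data are bit-for-bit identical to the original.

```python
# Minimal erasure-correction sketch: one XOR parity block protects a
# group of data blocks. Real DVTRs use far stronger Reed-Solomon
# product codes, but the principle is the same: within its capability,
# correction returns the *exact* original data.
import os
from functools import reduce

def xor_blocks(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_blocks = [os.urandom(16) for _ in range(4)]   # simulated recorded data
parity = xor_blocks(data_blocks)                   # redundancy written to tape

# Replay with one block lost (a dropout within correction capability):
lost = 2
survivors = [b for i, b in enumerate(data_blocks) if i != lost]
recovered = xor_blocks(survivors + [parity])       # XOR of survivors + parity

print("recovered block matches original:", recovered == data_blocks[lost])
```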

In an uncompressed DVTR the picture quality is determined by the accuracy of the ADC used to convert the original video to data. In an SDI digital production environment, each generation of digital processing, such as a DVE pass or a mix, causes a small increase in the noise floor due to round-off errors. This is the only generation loss in an uncompressed system.
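To illustrate the scale of that loss, the sketch below pushes a 10-bit signal through repeated gain changes, re-quantising after each one; the gain range and sample count are assumptions chosen simply to make the error visible.

```python
# Sketch: round-off error accumulation across digital generations.
# Each "generation" applies a small gain change (as a mixer or DVE
# might) and re-quantises to 10-bit integers, as SDI carriage requires.
import numpy as np

rng = np.random.default_rng(0)
original = rng.integers(64, 940, size=100_000).astype(np.float64)  # legal 10-bit luma range
signal = original.copy()

for generation in range(1, 11):
    gain = rng.uniform(0.95, 1.05)                     # some processing step
    signal = np.clip(np.rint(signal * gain), 0, 1023)  # process, then re-quantise
    signal = np.clip(np.rint(signal / gain), 0, 1023)  # nominally undo it, re-quantise again
    print(f"gen {generation:2d}: rms error vs original = {(signal - original).std():.2f} LSBs")
```

The error floor creeps upward generation by generation, which is exactly the behaviour described above.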

In a DVTR using lossy compression, which includes all of the currently available compressed DVTRs, the transparent core digital recorder is surrounded by an encoder on the record side and a decoder on the replay side. Assuming that good ADCs are used, the picture quality is now determined by the encoder. With a lossy compression system, the output from the decoder is not identical to what went into the encoder. There is a small generation loss, bringing back some of the attributes of analog systems.

For dubbing purposes, compressed DVTRs can escape generation loss entirely by transferring the compressed or native data between them. When the data on the two tapes are identical, the quality when decoding the second tape will be identical to that obtained when decoding the first. This is the reason behind the moves to extend SDI into SDTI so that it can transport native compressed data.

The attraction of compression is that it allows the bit rate of digital video to be reduced. This has a number of advantages for recording. The cost can be reduced, or the equipment can be miniaturised, or the recording density can be reduced to give an increase in robustness. In transmission, compression reduces the bandwidth required, and DVB certainly wouldn't be feasible without compression. Compression also allows video to be sent faster than real time within a given bandwidth. Clearly the higher the compression factor, the greater the extent to which these advantages are obtained.
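To put rough numbers on the faster-than-real-time point, the sketch below works out how long one minute of programme takes to move over an assumed 270 Mbit/s link (the standard-definition SDI rate) at the bit rates of the formats discussed here. The figures are indicative only.

```python
# Rough arithmetic: time to transfer one minute of video at various
# bit rates over an assumed 270 Mbit/s link (the SD-SDI rate).
LINK_MBPS = 270.0
MINUTE = 60.0  # seconds of programme material

for name, rate_mbps in [("uncompressed SDI", 270.0),
                        ("Digital Betacam (approx.)", 88.0),
                        ("D-9 / 50 Mbit MPEG", 50.0),
                        ("DVCPRO", 25.0),
                        ("SX video data", 18.0)]:
    seconds = MINUTE * rate_mbps / LINK_MBPS
    print(f"{name:26s}: {seconds:5.1f} s  ({MINUTE / seconds:.1f}x real time)")
```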

However, there is a limit to the compression factor which can be used before quality loss sets in. Unfortunately it is very difficult to say what that compression factor is, because there is no one correct answer. The allowable compression factor is fundamentally limited by:

* The type of compression technique used

* The number of generations of encoding and decoding expected

* The nature of the production steps taking place at each generation

* The latency of the encoder, i.e. the time delay in compressing the video

* The nature of the program material

* The editing freedom required

The above factors are based on physics and are manufacturer independent. Practical equipment may approach these limits as a function of economics or of the experience and ingenuity of the manufacturer.

In broadcasting, a universal standard is an essential requirement so that receiver manufacturers know what to make. The DVB/MPEG standard lays down exactly what the transmitted bitstream looks like so that the broadcaster knows what to encode. The television viewer is not going to perform any production steps, so a high latency compression scheme can be used to obtain a high compression factor. It is advantageous for broadcasting if the decoders are simple and cheap even if this means the encoders are complex and expensive, simply because there will be thousands of decoders for every encoder. This is known as asymmetric compression.

In a DVTR format, the need for a universal standard is not important because every DVTR contains, and must contain, a compatible encoder-decoder pair, whose operation essentially becomes part of the format. As users will expect to edit their video recordings, long latency compression simply isn't appropriate. As the purchase price of the DVTR must include one encoder and one decoder, the economics of asymmetry just don't apply.

Because of these differences, the MPEG standards have "optimised for delivery to the consumer" written through them like seaside rock. In its intended application, MPEG works stunningly well, as anyone who has seen a well mastered Digital Video Disc can testify. In the wrong application, it's not so impressive.

Figure 1

Fig.1 shows the constant quality curve for MPEG. In its broadcast form using bidirectional long latency (IBBP) coding, the bit rate is 40 percent of that needed if only intra-coding (I only) is used.
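A toy calculation, taking the 40 percent figure from Fig.1 at face value (the true ratio varies with the material):

```python
# Illustrative arithmetic from the constant-quality curve: IBBP coding
# needs roughly 40% of the intra-only bit rate for the same quality.
IBBP_FRACTION = 0.40

for i_only_mbps in (25, 50):
    print(f"I-only at {i_only_mbps} Mbit/s ~ IBBP at {i_only_mbps * IBBP_FRACTION:.0f} Mbit/s")
```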

Unfortunately, if editing is going to be performed, long latency coding should be avoided because it causes generation loss. The MPEG standard allows many combinations of picture types in the GOP for different applications, but changing between one MPEG GOP structure and another is inevitably lossy. Quite apart from any extra generations needed for production steps, material recorded with a 4:2:2 IB structure may need to be converted to I-only for editing and then converted to 4:2:0 IBBP for broadcasting. Thus the arguments in favour of an entirely MPEG signal chain are misguided.

MPEG is an asymmetrical long latency system optimised for single-generation broadcasting, whereas production recorders require symmetrical intra-frame systems optimised for editing and multi-generation production. ENG recorders work best with symmetrical intra-frame systems having a lower bit rate, which allows miniaturisation at the expense of multi-generation performance. As the bit rate goes down, better results are obtained by downsampling the colour to reduce the source data rate.

When considering concatenation of compression codecs, it is important to keep in mind what will happen between each generation. If no production step is performed and the decoded video is simply re-encoded, the generation loss can be quite small, and some manufacturers make a big play of this, but it raises the question of why, in a production system, anyone would decode only to encode again. This might be the scenario if a dub were required between two compressed machines and the only interface available was SDI, but that requirement disappears when native compressed data can be passed between machines using SDTI, giving complete freedom from generation loss.
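The effect is easy to demonstrate with a toy intra-frame codec. The sketch below is not any real DV or MPEG coder, and the block size and quantiser step are arbitrary assumptions, but it shows why a straight decode/re-encode cycle loses almost nothing while a production step between generations forces fresh quantising error every time.

```python
# Toy intra-frame codec: 8x8 block DCT plus uniform quantisation.
# Not a real DV/MPEG coder -- just enough to compare straight
# re-encoding with re-encoding after a spatial production step.
import numpy as np

N, STEP = 8, 24                          # block size and quantiser step (assumed)
k = np.arange(N)
C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] /= np.sqrt(2)
C *= np.sqrt(2.0 / N)                    # orthonormal DCT-II matrix

def codec(img):
    """One encode/decode generation: DCT, quantise, inverse DCT."""
    out = np.empty_like(img)
    for y in range(0, img.shape[0], N):
        for x in range(0, img.shape[1], N):
            coef = C @ img[y:y+N, x:x+N] @ C.T
            coef = STEP * np.rint(coef / STEP)         # the lossy step
            out[y:y+N, x:x+N] = C.T @ coef @ C
    return out

rng = np.random.default_rng(1)
src = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
plain = shifted = codec(src)                           # first generation

for gen in range(2, 5):
    new_plain = codec(plain)                           # dub with no production step
    moved = np.roll(shifted, 2, axis=1)                # 2-pixel shift between generations
    new_shifted = codec(moved)
    print(f"gen {gen}: plain loss {np.std(new_plain - plain):.2f} LSB, "
          f"shifted loss {np.std(new_shifted - moved):.2f} LSB")
    plain, shifted = new_plain, new_shifted
```

The plain column settles to zero because already-quantised coefficients land back on the same grid; the shift breaks the block alignment, so every generation quantises afresh.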

In practice, the only meaningful test for quality in a compression environment is to make tests at various numbers of generations and to perform some representative production step between each generation.

This was the approach taken by the EBU/SMPTE Task Force when they tested DVCPRO, SX and D-9 recorders and a 50 Megabit/sec 4:2:2 Profile intra-frame MPEG codec, using analog Betacam SP and Digital Betacam as controls.

As many variables as possible were eliminated by using the same sequences as source material for each test. These were a mixture of EBU test sequences and some actual broadcast material. The Double Stimulus Continuous Quality Scale (DSCQS) testing system was used to avoid unintentional bias, and tests were made at four times picture height (4H) for critical assessment and at 6H for normal assessment.

The results were presented in bargraph form, where the length of the bar represents the degree of impairment. On each result the threshold of 12.5% is shown for reference. Below this amount of impairment human viewers tend to have difficulty deciding whether there is any impairment at all, and results depend on the expertise of the viewer.
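For readers unfamiliar with DSCQS, here is a hedged sketch of how such impairment figures are derived: viewers score the reference and the impaired sequence on a 0-100 continuous scale without knowing which is which, and the mean difference, expressed as a percentage of the scale, is the impairment. The scores below are invented purely to show the arithmetic.

```python
# Hedged sketch of DSCQS-style scoring; the trial data are fabricated
# for illustration only.
THRESHOLD = 12.5   # below this, viewers struggle to see any impairment

# (reference score, test score) per viewing, on the 0-100 scale
trials = [(82, 74), (78, 71), (85, 70), (80, 69), (76, 70)]

impairment = sum(ref - test for ref, test in trials) / len(trials)
print(f"mean impairment: {impairment:.1f}%  "
      f"({'above' if impairment > THRESHOLD else 'below'} the {THRESHOLD}% threshold)")
```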

The SX and DVCPRO tests were divided into three levels of generation loss using Betacam SP as a control. Fig.2 shows that these were:

a) first generation quality, which corresponds to the use of a compressed DVTR for acquisition, the decoded video being assumed to be passed into an uncompressed digital system for the remainder of the production process.

b) fourth generation quality with two moderate simulated production steps between generations and one decode/encode dub, corresponding to an application which the EBU calls "hard news". The production steps here consisted of a temporal shift of one frame, which might result from putting the signal through any of a wide variety of units such as frame synchronisers. This was followed by a spatial shift of two pixels horizontally and one vertically, considered a likely scenario when real picture manipulation such as a DVE move is employed. (Both steps are sketched in code after this list.)

c) seventh generation quality with three simulated production steps between generations and three decode/encode dubs, an application called "magazine" by the EBU and corresponding to heavily post produced material. The three production steps here were the two steps of the "hard news" test with an additional spatial shift.
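A minimal sketch of those simulated production steps, assuming frames held as numpy arrays; the shift amounts come from the text, everything else is illustrative:

```python
# Sketch of the simulated production steps applied between generations,
# assuming video frames held as a list of numpy arrays (illustrative).
import numpy as np

def temporal_shift(frames, n=1):
    """Crude n-frame delay: repeat the first frame, drop the last."""
    return [frames[0]] * n + frames[:-n]

def spatial_shift(frame, dx=2, dy=1):
    """Shift dx pixels horizontally and dy vertically, as a DVE move
    might (np.roll wraps at the edges; a real DVE would not)."""
    return np.roll(np.roll(frame, dx, axis=1), dy, axis=0)

clip = [np.zeros((576, 720), dtype=np.uint8) for _ in range(5)]
clip = temporal_shift(clip)                   # "hard news" step 1
clip = [spatial_shift(f) for f in clip]       # "hard news" step 2
```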

Figure 2

When SX and DVCPRO were compared with analog Betacam SP, all were acceptable at the first generation at both picture heights. The analog system was preferred by a small margin.

On fourth generation the same trend was evident, with the analog format still holding up well and the digital formats slightly exceeding the acceptability threshold, with DVCPRO ahead of SX in the worst cases.

SX has an IB structure, and a temporal shift of one frame interchanges the picture types, stressing the codec, whereas the intra-frame coding of DVCPRO doesn't care. On the other hand, the horizontal shift of two pixels stresses a 4:1:1 format recorder because the colour pixels are interpolated, whereas the 4:2:2 sampling of SX doesn't care. So the tests are fair.

I suspect, however, that the reason SX gives such poor worst-case performance in comparison with DVCPRO here is that SX uses 4:2:2 coding in the belief that it gives better pictures than 4:1:1. Whilst this is true in general, it isn't true in low bit rate compression systems. Using 4:2:2 requires 33% more input data than 4:1:1, and so for a given bit rate the compression factor has to be 33% higher. Whilst a 50 megabit/second system can handle it, lower bit rates can't.

At the low rate of 18 megabits/second used in SX, the compression factor and the noise floor are simply too high for multi-generation performance. It would work better if it used 4:1:1.
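Rough arithmetic makes the point. The sample rates below are the standard ITU-R 601 figures, with 8-bit words assumed; real formats trim blanking and add overheads, so the compression factors are indicative only.

```python
# Rough source-data-rate arithmetic for 625-line ITU-R 601 video,
# assuming 8-bit samples (illustrative; treat the factors as indicative).
LUMA_MHZ = 13.5                                  # 601 luma sample rate
CHROMA_MHZ = {"4:2:2": 6.75, "4:1:1": 3.375}     # per colour-difference signal
BITS = 8

for fmt, chroma in CHROMA_MHZ.items():
    source_mbps = (LUMA_MHZ + 2 * chroma) * BITS
    print(f"{fmt}: source {source_mbps:.0f} Mbit/s, "
          f"compression {source_mbps/50:.1f}:1 at 50 Mbit/s, "
          f"{source_mbps/18:.1f}:1 at 18 Mbit/s")
```

The 4:2:2 source is 216 Mbit/s against 162 Mbit/s for 4:1:1, the 33 percent difference mentioned above; at SX's 18 Mbit/s that means roughly 12:1 compression where 4:1:1 would need only 9:1.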

The seventh generation test basically killed all of the candidates, with the Betacam SP showing the limits of analog cassette technology, DVCPRO coming next with a bad case of concatenation and SX coming in last due to carrying all that surplus chroma.

Massimo Visca of the RAI (part of the EBU test group), describing these tests, concluded that:

"Although first generation is considered acceptable, by the fourth generation the user is forced to accept a quality which is lower than an analog component recording. At 7th generation the quality is so poor that it cannot be considered for broadcast applications."

The tests on D-9 and 50 Megabit 4:2:2 MPEG were similar to the above, except that the production steps were more severe owing to the higher bit rates being used. Digital Betacam was used as a control, and the analog Betacam SP was retained. Fig.3 shows the test configuration. Note that there is no 50 Megabit MPEG tape format; the unit tested was an encoder/decoder pair.

Figure 3

As the bit rates used were higher, the first generation quality loss was minute. As the EBU report states:

"For the first generation, the performance of D-9 compression and the compression used in Digital Betacam was rated to be identical."

The fourth generation quality loss was similar in magnitude between D-9 and Digital Betacam. However, the report stated:

"Differences in system performance are detectable on closer scrutiny and can be described as softness in areas of high picture detail for D-9 and increased coding noise for Digital Betacam."

Ordinarily this might be a matter of preference, but where the production system is the source for a digital television broadcast, the MPEG encoding system is intolerant of noise and so D-9 is preferable to Digital Betacam.

In the most critical test, D-9 outperformed 50 megabit MPEG as shown in Fig.4. Both use intra-frame coding and have the same bit rate and the same 4:2:2 sampling, so this is as fair as tests get. The result can only be interpreted to mean that D-9 has a better intra-frame codec than that of the MPEG coder tested. MPEG has an enormous toolbox of coding tricks and works because of the synergy between them. They are all used in IBBP coding, but in I-only coding most of them aren't used at all, yet the means to select them remains as an overhead.

The conclusion of the Task Force regarding D-9 was "No remarkable decrease of picture quality was observed up to the 7th generation." Thus D-9 was judged suitable for mainstream broadcast applications.

At seventh generation, Digital Betacam shows a very slight advantage in direct comparison with D-9. Bearing in mind that the bit rate of Digital Betacam is 70 percent higher, it would be surprising if there were no difference, but what is significant is how small the difference is. This further points to the D-9 format having a very good compression algorithm, able nearly to match Digital Betacam's 88 megabits/sec and to outperform MPEG at the same bit rate.

Another consideration is that, according to the Task Force report, Sony will not provide a native (compressed data) interface for Digital Betacam. This means that a simple dub through SDI suffers generation loss. JVC have indicated that D-9 machines will support SDTI so that dubs will be lossless.

Thus in practice, a real D-9 SDTI installation will have fewer generations than a Digital Betacam SDI installation.

These tests effectively back up the predictions of communications theory. My conclusions are as follows:

1. MPEG is optimised to be a delivery technology where asymmetrical coding is an advantage.

2. Production recording works best with symmetrical coding. Thus at low bit rates DVCPRO outperforms the MPEG-based SX format, and at 50 megabits/sec D-9 outperforms MPEG.

3. The requirement for an entirely MPEG production chain is a myth.

4. DVCPRO is a cost effective choice for acquisition and low-complexity production.

5. D-9 essentially delivers the same performance as Digital Betacam and represents a cost effective choice for mainstream television production.

* * *