JVC DIGITAL S - COMPRESSED DIGITAL VIDEO RECORDING AND INTERFACES

Yoshimichi Nagaoka, Victor Co. of Japan
Kiyoshi Honma, Victor Co. of Japan
David Gifford, Victor Co. of Japan
Neil Neubert, JVC Professional Products Company

INTRODUCTION

JVC Professional Products developed Digital S to provide a high-quality digital video tape recording system at an affordable cost. JVC applied new digital video technologies to Digital S, a digital video tape recording system for professional users and a wide variety of high-quality program production applications. The Digital-S system achieves three major goals: the large improvement in picture quality and multi-generation performance that digital video recording brings; a very high-quality yet economical digital video recording solution for professional video producers; and playback compatibility with existing libraries of analog S-VHS video tapes. The result is Digital S, a professional video tape recording format that combines the advantages of digital audio/video recording with an outstanding cost/performance ratio.

DIGITAL-S DEVELOPMENT CONCEPT

Numerous performance issues and customer requirements must be considered when choosing the technologies for a digital video tape recorder. Although digital video tape has a very large recording capacity, bit rate reduction (compression) can further increase the recording time of a given length of tape. Digital S employs bit rate reduction, and its compression ratio was determined after establishing the following performance and feature requirements:

These requirements led to the choice of a 3.3:1 compression ratio for bit rate reduction in the Digital-S video recording format. In addition, the need to support all editing methods, operations, and functions, frame-accurate editing, and visible picture search at up to 32 times normal playback speed in forward and reverse led to the selection of intra-frame bit rate reduction for Digital-S. Intra-frame compression also enables excellent error handling, because each frame can be decoded without reference to any other frame.
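To put the 3.3:1 ratio in perspective, this short Python sketch estimates the compressed video rate it implies. The active-picture dimensions (720 samples by 480 or 576 lines), 8-bit samples, and frame rates are assumptions based on common ITU-R BT.601 practice, not figures taken from the Digital S specification; blanking intervals are ignored.

```python
from fractions import Fraction

def active_video_rate(lines, frame_rate, width=720, bits=8):
    """Uncompressed active-picture bit rate (bit/s) for 4:2:2 sampling."""
    # 4:2:2: R-Y and B-Y each carry half as many samples per line as Y,
    # so there are 2 samples per luminance pixel in total.
    samples_per_frame = width * lines * 2
    return float(samples_per_frame * bits * Fraction(frame_rate))

def compressed_rate(raw_rate, ratio=3.3):
    """Bit rate after bit rate reduction by the given ratio."""
    return raw_rate / ratio

for name, lines, fps in [("525/60", 480, Fraction(30000, 1001)),
                         ("625/50", 576, Fraction(25))]:
    raw = active_video_rate(lines, fps)
    print(f"{name}: {raw / 1e6:.1f} Mbit/s -> {compressed_rate(raw) / 1e6:.1f} Mbit/s")
```

Under these assumptions both systems land at roughly 50 Mbit/s of compressed video.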

Digital S uses the 4:2:2 image sampling structure specified by Recommendation ITU-R BT.601. Television distribution is rapidly migrating to digital delivery to the home receiver using MPEG-2 compressed video. Choosing 4:2:2 sampling for Digital S ensures satisfactory concatenation with the 4:2:0 sampling structure of MPEG-2 Main Profile at Main Level (MP@ML) used for digital television distribution. The following figure shows three common image sampling structures used in professional television applications.
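The difference between the sampling structures is easiest to see as sample counts per frame. This sketch counts luminance samples and samples per color-difference component for 4:4:4, 4:2:2, and 4:2:0, and includes a deliberately naive 4:2:2 to 4:2:0 conversion that averages vertical pairs of chroma lines. The 720 × 576 frame size and the two-tap filter are illustrative assumptions; practical converters use longer filters and handle interlaced fields separately.

```python
def samples_per_frame(width, height, structure):
    """Luminance samples and samples per color-difference component."""
    divisors = {  # (horizontal, vertical) chroma subsampling factors
        "4:4:4": (1, 1),
        "4:2:2": (2, 1),
        "4:2:0": (2, 2),
    }
    h, v = divisors[structure]
    return {"Y": width * height, "C": (width // h) * (height // v)}

def chroma_422_to_420(chroma):
    """Halve vertical chroma resolution by averaging each pair of lines
    (naive 2-tap filter, rounding half up)."""
    return [[(a + b + 1) // 2 for a, b in zip(top, bottom)]
            for top, bottom in zip(chroma[0::2], chroma[1::2])]

for s in ("4:4:4", "4:2:2", "4:2:0"):
    print(s, samples_per_frame(720, 576, s))
```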

DIGITAL-S VIDEO TAPE RECORDING FORMAT

Recording System

Digital S employs azimuth recording (±15°) to suppress crosstalk from adjacent tracks. A flying (rotary) erase head is incorporated so that each segment and helical track can be individually erased and re-recorded during editing, ensuring reliable, frame-accurate insert and assemble mode editing.

The modulation system uses S-INRZI/24-25 modulation with PR4 (partial response class 4) detection. The required recording density is achieved with a track width of 20 µm, a 2-bit length of 0.587 µm, and a recording clock frequency of 49.5 MHz. The error correction system uses a double-encoded (inner and outer) correction method based on Reed-Solomon codes. The format also allows playback errors to be concealed even when one of the heads malfunctions, because the missing data can be interpolated from the previous frame's playback data from the other head.

Digital S uses ½-inch metal particle video tape housed in a robust cassette similar to those of S-VHS and W-VHS. Details of the Digital-S video tape track pattern are shown in the four figures that follow. One video frame is recorded on ten tracks in the 525/60 television system and on twelve tracks in the 625/50 system. The digital video signal, digital audio signals, and subcode (including system data) are written in the helical tracks by the rotating heads. Sectors are recorded in the following order: Video 0, Subcode, Audio 1/3, Audio 4/2, Video 1. Edit gaps between sectors allow each sector to be edited independently. The last figure shows the detailed sector arrangement on each helical track.

 

DIGITAL S - VIDEO TAPE PATH

 

VIDEO TAPE TRACKS - 525/60 AND 625/50

 

SUBCODE, AUDIO, VIDEO SECTORS - 625/50

 

DIGITAL S - HELICAL TRACK SECTOR DETAILS

Tape Track Details

The record/playback area on the helical track is the section where the recording head scanning area and the playback head scanning area overlap, that is, the area from the playback head scanning starting point to the record head scanning end point. The record/playback area is positioned at the center of the video tape width.

The ITI sector houses the insert position control for insert editing and information about the track to be formed. The ITI sector is recorded at three locations along the track so that it can be read and played back at all speeds.

The audio sector is located at the center of the track, where the influence of track non-linearity remains small enough to allow sufficient RF playback S/N at the in and out points of insert edits.

Information recorded in the subcode sector is used for high speed search. It consists mainly of index related information including absolute track number and time code. Consequently, the subcode sector is also located at the center of the track so that it can be read and played back during high speed playback in the half-loading mode.

Digital VCRs that employ compression impose processing delays for both encoding (compression) and decoding (decompression). In Digital S, encoding and decoding each take the time of two video frames, so the process is symmetrical. Digital S uses independent heads for recording and playback. The pre-read function requires these heads to be positioned on the drum so that the playback head traces any tape track approximately four frames ahead of the record head, which is the time required for decode/encode processing. For pre-read operation, the playback head is mounted about one millimeter higher on the head drum than the record head.
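The four-frame lead follows from simple arithmetic: two frames of decoding plus two frames of encoding. This sketch converts that lead into time, assuming frame rates of 30000/1001 Hz for 525/60 and 25 Hz for 625/50.

```python
from fractions import Fraction

DECODE_FRAMES = 2  # decoding takes two frames (from the text)
ENCODE_FRAMES = 2  # encoding takes two frames (from the text)

def pre_read_delay_ms(frame_rate):
    """Playback-to-record delay needed for pre-read, in milliseconds."""
    frames = DECODE_FRAMES + ENCODE_FRAMES
    return float(Fraction(frames) / Fraction(frame_rate) * 1000)

print(round(pre_read_delay_ms(Fraction(30000, 1001)), 1))  # 525/60: 133.5
print(round(pre_read_delay_ms(25), 1))                     # 625/50: 160.0
```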

Video / Audio Sector Elements

Sync block elements are the same for both video and audio, and consist of :

Sync Area - 2 Bytes
ID Code - 3 Bytes
Data - 77 Bytes
Inner Parity - 8 Bytes
TOTAL - 90 Bytes

The Reed-Solomon code RS(85, 77) is used as the inner code for both video and audio; it protects the 77 data bytes with 8 parity bytes. The outer code is the Reed-Solomon code RS(149, 138) for video and RS(14, 9) for audio.

The audio outer code carries proportionally more parity than the video outer code (5 of 14 bytes versus 11 of 149 bytes) and therefore has greater correction capability.
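The sync block layout and the Reed-Solomon parameters can be checked with a few lines of Python. The sketch uses the standard properties of an RS(n, k) code: it can correct up to floor((n - k) / 2) byte errors at unknown positions, or up to n - k erasures at known positions. Treating errors flagged by the inner code as erasures for the outer code is a common way to decode such double-encoded (product) codes and is assumed here for illustration, not taken from the Digital S specification.

```python
from dataclasses import dataclass

# Sync block layout in bytes (from the text).
SYNC_BLOCK = {"sync": 2, "id": 3, "data": 77, "inner_parity": 8}

@dataclass(frozen=True)
class RSCode:
    n: int  # codeword length in bytes
    k: int  # message length in bytes

    @property
    def parity(self):
        return self.n - self.k

    @property
    def max_errors(self):
        # Byte errors at unknown positions: floor((n - k) / 2).
        return self.parity // 2

    @property
    def max_erasures(self):
        # Byte errors at known (flagged) positions: n - k.
        return self.parity

    @property
    def redundancy(self):
        # Fraction of each codeword spent on parity.
        return self.parity / self.n

INNER = RSCode(85, 77)          # video and audio
VIDEO_OUTER = RSCode(149, 138)
AUDIO_OUTER = RSCode(14, 9)

for name, code in [("inner", INNER), ("video outer", VIDEO_OUTER),
                   ("audio outer", AUDIO_OUTER)]:
    print(f"{name}: t={code.max_errors}, erasures={code.max_erasures}, "
          f"redundancy={code.redundancy:.1%}")
```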

Video Signal Processing

Digital S uses the DCT (Discrete Cosine Transform) to convert the video signal into 8 × 8 matrices of DCT coefficients prior to quantization. These coefficients represent the spatial frequencies, and their amplitudes, contained in 8 horizontal × 8 vertical pixel areas of the picture. Like many video compression systems, Digital S uses 8 × 8 DCT blocks. Luminance (Y) and each color-difference component (R–Y and B–Y) are divided into 8 × 8 DCT blocks.
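For reference, a textbook orthonormal 2-D DCT-II of one 8 × 8 block is sketched here in Python. It is written for clarity (four nested loops) rather than speed, and the orthonormal scaling is a common convention; the exact transform arithmetic used in Digital S hardware is not specified in this document.

```python
import math

N = 8

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8 x 8 block (direct O(N^4) form)."""
    def c(u):
        return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency
        for u in range(N):      # horizontal frequency
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = c(u) * c(v) * s
    return out

flat = [[100.0] * N for _ in range(N)]
print(round(dct_8x8(flat)[0][0], 6))  # a flat block has only a DC term: 800.0
```

A flat block produces a single DC coefficient and no AC energy, which is why simple picture areas compress well.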

Digital S DCT blocks are combined to form macro blocks. One Digital S macro block is made up of two luminance (Y) DCT blocks and one DCT block each for the R–Y and B–Y color components, for a total of four DCT blocks. The macro block is the basic unit of the error concealment process in Digital S.

Macro blocks are combined to form super blocks. Twenty-seven adjacent macro blocks, 9 horizontal × 3 vertical, make up one super block.
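The block hierarchy implies fixed counts per frame. This sketch assumes active pictures of 720 × 480 (525/60) and 720 × 576 (625/50) luminance samples and a macro block covering 16 × 8 luminance pixels, which is the area two horizontally adjacent Y blocks plus one 8 × 8 block each of R–Y and B–Y cover with 4:2:2 sampling. The frame sizes and macro block area are assumptions; the groupings of 4 DCT blocks, 27 macro blocks, and 5 macro blocks per video segment (described under Shuffling) come from the text.

```python
MB_W, MB_H = 16, 8        # assumed luminance area of one 4:2:2 macro block
SB_COLS, SB_ROWS = 9, 3   # macro blocks per super block (from the text)
MBS_PER_SEGMENT = 5       # macro blocks per video segment (from the text)

def block_counts(width, height):
    """DCT block, macro block, super block and segment counts per frame."""
    mb_cols, mb_rows = width // MB_W, height // MB_H
    macro_blocks = mb_cols * mb_rows
    return {
        "dct_blocks": macro_blocks * 4,  # 2 Y + 1 R-Y + 1 B-Y per macro block
        "macro_blocks": macro_blocks,
        "super_blocks": (mb_cols // SB_COLS) * (mb_rows // SB_ROWS),
        "segments": macro_blocks // MBS_PER_SEGMENT,
    }

print("525/60", block_counts(720, 480))
print("625/50", block_counts(720, 576))
```

Under these assumptions a 525/60 frame holds 2700 macro blocks in 100 super blocks and 540 segments, and a 625/50 frame holds 3240 macro blocks in 120 super blocks and 648 segments. Multiplying the segment count by the 385-byte segment limit gives roughly 50 Mbit/s of quantized video data in both systems.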

Super blocks are arranged in the video frame so that linear sequences of them, representing coherent, continuous horizontal rows of the picture, are recorded on the video tape tracks. This allows good picture display in shuttle, search, and slow-motion playback modes. Digital S provides good picture display at search speeds up to 32 times normal playback speed in forward and reverse, and high-quality digital slow-motion playback within ±1/3 times normal playback speed.

SUPER BLOCK POSITIONS WITHIN PICTURE

Digital S writes six track pairs to record one 625/50 video frame on tape. Track pairs are written by two head pairs located 180 degrees apart on the head drum. The heads in each pair have opposite azimuth angles: the first track of a pair has a positive azimuth angle and the second a negative azimuth angle. Because there are two head pairs and an even number (six) of track pairs per frame, super blocks representing the same portion of every video frame would be recorded on, and played back from, the same tracks by the same head pairs in every frame. To avoid this, odd and even rows of super blocks are alternately recorded on the first (positive) and second (negative) azimuth tracks of succeeding frames, so that the played-back picture is still renewed every two frames if a single record or playback head fails.

DIGITAL S - SIX TRACK PAIRS PER FRAME - 625 / 50

For 525/60 systems, five track pairs (an odd number) are written by the same two head pairs (an even number). Each track is therefore recorded and played back by the other head pair in each succeeding video frame, and alternating odd/even super block recording is not required.

 

DIGITAL S - FIVE TRACK PAIRS PER FRAME - 525 / 60
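The parity argument behind the two track layouts can be checked mechanically. The sketch assumes that the two head pairs write consecutive track pairs alternately with no break between frames, and uses a hypothetical row-to-azimuth mapping to show how alternating super block rows lets every row reach both azimuth tracks within two frames.

```python
HEAD_PAIRS = 2  # two head pairs, 180 degrees apart (from the text)

def head_pair(frame, track_pair, pairs_per_frame):
    """Head pair (0 or 1) that writes a track pair, assuming the two head
    pairs alternate on consecutive track pairs with no break between frames."""
    return (frame * pairs_per_frame + track_pair) % HEAD_PAIRS

def same_head_every_frame(pairs_per_frame):
    """True if each track pair position is always written by the same head pair."""
    return all(head_pair(0, i, pairs_per_frame) == head_pair(1, i, pairs_per_frame)
               for i in range(pairs_per_frame))

def azimuth_for_row(row, frame):
    """Hypothetical 625/50 mapping: swap the azimuth track used for each
    super block row on every frame ('+' = first track, '-' = second)."""
    return "+" if (row + frame) % 2 == 0 else "-"

print(same_head_every_frame(6))  # 625/50: True, so rows must alternate
print(same_head_every_frame(5))  # 525/60: False, heads alternate naturally
```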

Shuffling

Prior to encoding (compression), Digital S "shuffles" the digital data representing a video frame for the purpose of distributing complex picture data over the entire video frame. Complex picture information characteristically accumulates in just a few distinct areas of a typical television picture. Shuffling disassembles each video frame in units of one macro block, thereby redistributing adjacent macro blocks evenly over the entire video frame. Areas of concentrated picture complexity are, thus, distributed in small macro block units throughout the stream of digital video data. The compression process does not receive excessive data during very short periods and, consequently, compression performance is significantly improved.

Digital S creates a sequential stream of digital video data "segments" as input to the compression process. A segment is made up of five macro blocks that have been selected from all over the video frame by the shuffling process. Shuffling strives to assure that the amount of data contained in all of the video segments is as equal as possible.
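The benefit of shuffling can be illustrated with a simple model. The stride pattern here is hypothetical and is not the actual Digital S shuffling table: each segment takes one macro block from each fifth of the frame (in raster order). The per-macro-block costs are made-up numbers that concentrate detail in one area of the picture, and the count of 2700 macro blocks assumes a 720 × 480 picture.

```python
def shuffle_segments(num_macro_blocks, per_segment=5):
    """Hypothetical stride shuffle: group raster-ordered macro block indices
    into segments, taking one member from each fifth of the frame."""
    if num_macro_blocks % per_segment:
        raise ValueError("macro block count must be a multiple of the segment size")
    stride = num_macro_blocks // per_segment
    return [[s + k * stride for k in range(per_segment)] for s in range(stride)]

def consecutive_segments(num_macro_blocks, per_segment=5):
    """Unshuffled grouping: neighbouring macro blocks share a segment."""
    return [list(range(i, i + per_segment))
            for i in range(0, num_macro_blocks, per_segment)]

def segment_loads(costs, segments):
    """Total coding cost (e.g. estimated bytes) carried by each segment."""
    return [sum(costs[m] for m in seg) for seg in segments]

# Made-up costs: detail concentrated in the first 100 macro blocks.
costs = [10] * 100 + [1] * 2600
print(max(segment_loads(costs, consecutive_segments(2700))))  # 50
print(max(segment_loads(costs, shuffle_segments(2700))))      # 14
```

With these made-up costs, the busiest unshuffled segment carries 50 cost units while the busiest shuffled segment carries 14, so the compression process sees a far more even load.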

Quantization and Variable Length Coding

Digital S applies a conventional DCT to transform spatial-domain pixel data into a stream of frequency-domain DCT coefficients. Its quantization process, however, differs considerably from those of other systems such as M-JPEG and MPEG-2: before the actual quantization, classification, initial scaling, and estimation are applied to the data stream.

1. Classification: each DCT block is first assigned to one of four classes according to the maximum absolute value of the AC coefficients it contains. A simple 8 × 8 pixel DCT block, containing little information that is difficult to compress, might have only a few AC coefficients of low value and would therefore likely be a Class 0 DCT block. A DCT block containing complex picture information might have many AC coefficients of moderate value, or a few of high value, and would therefore likely be a Class 3 DCT block.

2. Initial scaling reduces the AC coefficients from 10 bits to 9 bits by rounding. Class 3 is rounded more precisely than Classes 0, 1, and 2.

3. Estimation selects the quantization factor to be applied to each Digital S video segment (a sequence of five macro blocks). Quantized data cannot exceed 385 bytes per video segment, so the quantization factor is chosen by estimating the quantizer output data from the DCT AC coefficient data present at the quantizer input for each video segment. Factors are chosen to give the least compression, and therefore the best possible picture quality, within the 385-byte output limit.

4. Quantization: the applied quantization factor, that is, the actual number by which the DCT AC coefficient data is divided in the quantizer, is determined by the estimation process. Four frequency-dependent areas of each 8 × 8 DCT block can be quantized by different divisors, and there are nine possible divisor combinations for these four areas. The divisors are determined by the class number of the DCT data in each video segment and by the QNO (quantization number, 0 to 15) that keeps the output data at 385 bytes or less.

5. Variable length coding uses a two-dimensional Huffman code. Code word length is determined by the combination of the "run", the number of consecutive zeros, and the "amp", the amplitude of the nonzero value that immediately follows a run of zeros in each quantized DCT block.
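The steps above can be sketched in Python to show how classification, QNO selection, and run/amp pairing fit together. Several parts are assumptions rather than Digital S parameters: the class thresholds are hypothetical, the conventional zigzag scan order is assumed, a higher QNO is assumed to mean finer quantization, and the per-segment size estimator is supplied by the caller. The Huffman code tables are not given in this document, so the sketch stops at the (run, amp) pairs that would be fed to the two-dimensional Huffman coder.

```python
N = 8
SEGMENT_LIMIT = 385               # bytes per video segment (from the text)
CLASS_THRESHOLDS = (11, 23, 35)   # hypothetical max-|AC| limits for classes 0-2

def zigzag_order(n=N):
    """Conventional zigzag scan of an n x n block as (row, col) pairs."""
    order = []
    for s in range(2 * n - 1):
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order

ZIGZAG = zigzag_order()

def classify(block):
    """Class 0-3 from the largest absolute AC coefficient (DC excluded)."""
    max_ac = max(abs(block[r][c]) for r, c in ZIGZAG[1:])
    for cls, limit in enumerate(CLASS_THRESHOLDS):
        if max_ac <= limit:
            return cls
    return 3

def choose_qno(estimate_bytes):
    """Pick the finest QNO (15 down to 0) whose estimated segment size fits.

    estimate_bytes(qno) is a hypothetical caller-supplied estimator for one
    five-macro-block segment. Falls back to the coarsest QNO (0).
    """
    for qno in range(15, -1, -1):
        if estimate_bytes(qno) <= SEGMENT_LIMIT:
            return qno
    return 0

def run_amp_pairs(block):
    """(run, amp) pairs for the AC coefficients of a quantized block.

    'run' counts zeros before each nonzero 'amp'; trailing zeros are left
    to an end-of-block code, as in typical run-length schemes.
    """
    pairs, run = [], 0
    for r, c in ZIGZAG[1:]:
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

For example, `choose_qno(lambda q: 100 + 30 * q)` returns 9, the largest QNO whose estimate stays within 385 bytes.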
Modulation

The Digital S modulation scheme combines randomized bit streams with interleaved NRZI coding. This combination has less influence on low frequencies than NRZI methods that use PR4 coding. The modulation process is completed by applying the 24-25 transform. The result is a high-performance modulation scheme that requires less circuitry than other block modulation methods, such as 8-14 modulation.

Data is randomized using the M-sequence polynomial x^7 + x^3 + 1. 24-25 modulation inserts an extra bit at the beginning of each group of three consecutive randomized bytes (24 bits), producing a 25-bit code word. The extra bit can be "1" or "0" and is chosen to prevent long runs of identical bits, that is, to maintain the AC characteristic of the modulation signal.
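The randomizer and the 24-25 word construction can be sketched as follows. The M-sequence generator implements the recurrence a[n+7] = a[n+3] XOR a[n] for x^7 + x^3 + 1, which repeats every 127 bits from any nonzero seed; the seed value is an assumption. The rule for choosing the extra bit is hypothetical: each candidate word is NRZI-encoded (1 = transition) and the one that keeps the running digital sum of the write signal closer to zero is kept, one simple way of limiting low-frequency content. The interleaved NRZI precoding and PR4 detection used in Digital S are not modelled.

```python
def m_sequence(seed=0x7F):
    """Yield bits of the maximal-length sequence for x^7 + x^3 + 1."""
    state = seed & 0x7F          # bit i holds a[n + i]
    if state == 0:
        raise ValueError("seed must be nonzero")
    while True:
        yield state & 1
        new = (state ^ (state >> 3)) & 1      # a[n+7] = a[n] XOR a[n+3]
        state = (state >> 1) | (new << 6)

def randomize(bits, seed=0x7F):
    """XOR data bits with the M-sequence; applying it twice restores the data."""
    gen = m_sequence(seed)
    return [b ^ next(gen) for b in bits]

def nrzi_rds(bits, level, rds):
    """NRZI-encode bits from a +1/-1 level; return final level and running sum."""
    for b in bits:
        if b:
            level = -level
        rds += level
    return level, rds

def encode_24_25(word24, level=1, rds=0):
    """Prefix one extra bit (hypothetical rule: minimise |running digital sum|)."""
    if len(word24) != 24:
        raise ValueError("expected 24 bits")
    best = None
    for extra in (0, 1):
        cand = [extra] + list(word24)
        new_level, new_rds = nrzi_rds(cand, level, rds)
        if best is None or abs(new_rds) < abs(best[2]):
            best = (cand, new_level, new_rds)
    return best  # (25-bit code word, level, running digital sum)

def decode_24_25(word25):
    """Drop the extra bit to recover the 24 data bits."""
    return word25[1:]
```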

     

EXAMPLE OF AN IMPLEMENTATION OF DIGITAL S

INTERFACE

Digital S provides a compressed digital data interface bitstream known as the DIF stream. The DIF stream can connect compressed Digital S signals between Digital S VCRs without decoding to, and re-encoding from, ITU-R BT.601 video. The DIF bitstream can be mapped to the SMPTE 305M Serial Data Transport Interface (SDTI) and then carried between devices in the data payload of the common SMPTE 259M / ITU-R BT.656 Serial Digital Interface (SDI). The compressed Digital S bitstream thus leaves the VCR as a conventional SDI signal and can be carried throughout television facilities over their existing infrastructure and standard SDI routing systems.

     

The data in one video frame is divided in half to form two "video channels". One video channel contains ten DIF sequences for 525/60 systems and twelve DIF sequences for 625/50 systems. Each DIF sequence is made up of 150 DIF blocks, each of which contains header, subcode, VAUX, audio, or video data. Each DIF block carries 80 bytes of digital data.
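The DIF stream size per frame follows directly from these figures. The sketch multiplies them out and converts the result to a bit rate, assuming frame rates of 30000/1001 Hz for 525/60 and 25 Hz for 625/50. The total covers all DIF block types (header, subcode, VAUX, audio, and video), not just compressed video.

```python
from fractions import Fraction

CHANNELS = 2                   # each frame is split into two video channels
DIF_BLOCKS_PER_SEQUENCE = 150
DIF_BLOCK_BYTES = 80

def dif_bytes_per_frame(sequences_per_channel):
    """Total DIF stream bytes for one video frame."""
    return CHANNELS * sequences_per_channel * DIF_BLOCKS_PER_SEQUENCE * DIF_BLOCK_BYTES

def dif_bit_rate(sequences_per_channel, frame_rate):
    """DIF stream bit rate in bit/s."""
    return float(dif_bytes_per_frame(sequences_per_channel) * 8 * Fraction(frame_rate))

print(dif_bytes_per_frame(10), dif_bit_rate(10, Fraction(30000, 1001)))  # 525/60
print(dif_bytes_per_frame(12), dif_bit_rate(12, 25))                     # 625/50
```

This gives 240,000 bytes per frame (about 57.5 Mbit/s) for 525/60 and 288,000 bytes per frame (57.6 Mbit/s) for 625/50.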

CONCLUSION

The Digital-S video tape recording format was designed to satisfy the requirements of a great variety of video recording applications. It complements the professional analog recording products and systems currently in service and offered by JVC Professional Products Company and others. Digital-S also offers exceptional features, technology, performance, and specifications that meet the video recording needs of tele-producers, broadcasters, and video professionals alike, and it does so at a level of economy previously unknown in digital video tape recording.
