Multimedia image compression technology

Multimedia data compression technology is one of the key technologies of modern network development. Because image and sound signals contain various kinds of redundancy, the data can be compressed. Data compression techniques fall into two categories, lossless compression and lossy compression, and different standards have been built on each.

First, multimedia data compression technology

When C. E. Shannon founded information theory, he proposed that data be regarded as a combination of information and redundancy. Early data compression became a part of information theory because it deals with redundancy. Data can be compressed precisely because of the various kinds of redundancy, among them temporal redundancy, spatial redundancy, information entropy redundancy, prior-knowledge redundancy and others. Temporal redundancy is common in speech and image sequences: there is a strong correlation between adjacent frames of a moving image, and with inter-frame motion compensation the image data rate can be compressed greatly. The same holds for speech. Especially in voiced segments, the speech signal shows strong periodicity over stretches of several to tens of milliseconds, and a high compression ratio can be obtained through linear prediction. Spatial redundancy refers to spatial regularity in image data; a large uniform background, for example, carries large spatial redundancy. Information entropy redundancy is the redundancy caused by representing source symbols with a code that is not optimal in the information-theoretic sense; it can be removed by entropy coding, such as Huffman coding. Prior-knowledge redundancy arises because the interpretation of data depends heavily on prior knowledge. For example, when the receiver knows that the first letters of a word are "administrato", it can immediately guess that the last letter is "r"; in this case the last letter carries no information. Other redundancies refer to redundancy in information that is not perceived subjectively.
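To make information entropy redundancy concrete, the following minimal Python sketch (our illustration, not part of the original text) measures the Shannon entropy of a byte sequence and compares it with the 8 bits per symbol actually used to store it; the gap is the redundancy an entropy coder such as Huffman coding can remove.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly repetitive message: stored at 8 bits/byte, but its entropy
# is far lower; the difference is information entropy redundancy.
msg = b"AAAAABBBC" * 100
h = entropy_bits_per_symbol(msg)
print(f"entropy: {h:.3f} bits/symbol vs. 8 bits/symbol stored")
print(f"theoretical lossless limit: {h * len(msg) / 8:.0f} bytes "
      f"(original {len(msg)} bytes)")
```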

Generally, data compression techniques can be divided into lossless compression (also known as redundancy compression) and lossy compression (also known as entropy compression). Lossless compression removes or reduces the redundancy in data, but this redundancy can be reinserted, so there is no distortion. It is generally used for text data and guarantees complete recovery of the original; its disadvantage is a low compression ratio, generally 2:1 to 5:1. Lossy compression also compresses the entropy, so a certain degree of distortion results; it is mainly used for data such as sound, images and video, and the compression ratio is much higher (ratios of 20:1 and above are common). A newer technique known as "eigen-ID" is reported to compress genetic data by far larger factors still. For multimedia images there are a still-image compression standard (the JPEG standard, from the Joint Photographic Experts Group) and moving-image compression standards (the MPEG standards, from the Moving Picture Experts Group).
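As a rough illustration of lossless compression, the sketch below uses Python's standard zlib module (a Lempel-Ziv plus Huffman scheme; the choice of library is ours, not the text's) to compress redundant text and verify exact recovery. Highly repetitive input can far exceed the typical 2:1 to 5:1 range quoted above for ordinary text.

```python
import zlib

# Redundant text compresses losslessly; decompression recovers it exactly,
# which is the defining property of redundancy compression.
text = b"Multimedia data compression removes redundancy. " * 200
packed = zlib.compress(text, level=9)
print(f"ratio {len(text) / len(packed):.1f}:1")
assert zlib.decompress(packed) == text  # lossless: recovered bit for bit
```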

JPEG exploits the psychological and physiological characteristics of the human eye, and its limitations, to compress color, monochrome and multi-gray continuous-tone still digital images, so it is well suited to images that are not overly complex, in particular images of real scenes. It defines two basic compression algorithms: a lossy one based on the DCT and a lossless one based on spatial linear prediction (DPCM). To meet various requirements it specifies four working modes: lossless compression, the sequential DCT-based mode, the progressive mode and the hierarchical mode.
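The spatial linear prediction (DPCM) idea behind the lossless algorithm can be sketched as follows. This simplified version is our own illustration: it predicts each pixel from its left neighbor only (the standard defines several predictors), leaving small residuals that an entropy coder can represent compactly.

```python
def dpcm_residuals(row):
    """Simplified spatial linear prediction (DPCM): predict each pixel
    from its left neighbor and keep only the prediction error."""
    prev = 0
    out = []
    for px in row:
        out.append(px - prev)  # near-zero residuals dominate in smooth areas
        prev = px
    return out

def dpcm_reconstruct(residuals):
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

row = [52, 53, 53, 55, 59, 60, 60, 61]
res = dpcm_residuals(row)
assert dpcm_reconstruct(res) == row  # lossless: exact inverse
print(res)  # [52, 1, 0, 2, 4, 1, 0, 1]
```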

MPEG is used to compress moving images. The MPEG standard includes three parts: (1) MPEG video, (2) MPEG audio and (3) the MPEG system (synchronization of video and audio). MPEG video is the core of the standard. It combines intra-frame and inter-frame compression and is based on the discrete cosine transform (DCT) and motion compensation; with the image quality essentially unchanged, MPEG can compress images by a factor of 100 or more. The MPEG audio compression algorithm is based on the masking properties of the human ear. Using this basic principle of psychoacoustics, namely that while a sound at one frequency is being played back, weaker sounds at nearby frequencies cannot be heard, the redundant audio components that people cannot hear, or can barely hear, are compressed away. The audio compression ratio reaches 8:1 or higher with realistic sound quality, comparable to CD recordings. According to the MPEG standard, an MPEG data stream contains system-layer and compression-layer data. The system layer includes timing signals, synchronization of image and sound, multiplexing and distribution information. The compression layer contains the actual compressed image and sound data. After the video and audio signals are merged and synchronized, the data rate is 1.5 Mbit/s, of which the compressed image data occupy about 1.2 Mbit/s and the compressed sound about 0.2 Mbit/s.
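Inter-frame motion compensation can be illustrated with a minimal exhaustive block-matching search. The sketch below is our illustration with toy frames, not MPEG's actual encoder: for one 8×8 block it finds the displacement in the reference frame that minimizes the sum of absolute differences (SAD), so that only a motion vector and a small residual need be coded.

```python
import numpy as np

def best_motion_vector(ref, cur, by, bx, bs=8, search=4):
    """Exhaustive block matching: find the offset (dy, dx) into the
    reference frame that minimizes the SAD with the current block."""
    block = cur[by:by+bs, bx:bx+bs].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(ref[y:y+bs, x:x+bs].astype(np.int32) - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy frames: the current frame is the reference shifted by (1, 2),
# so the search should find the vector pointing back to the match.
ref = np.arange(32 * 32, dtype=np.uint8).reshape(32, 32)
cur = np.roll(ref, (1, 2), axis=(0, 1))
print(best_motion_vector(ref, cur, 8, 8))  # ((-1, -2), 0)
```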

The development of the MPEG standards has gone through several generations: MPEG-1, MPEG-2, MPEG-4, MPEG-7 and MPEG-21. Each standard builds on the previous one and is backward compatible with it. At present the MPEG-4 standard is widely used for image compression. MPEG-4 is a large extension of MPEG-2, and its main target is multimedia applications. In MPEG-2 the working concept is a single image containing all the elements of a scene. Under MPEG-4 the concept becomes multiple image elements, each encoded independently. The standard includes a description sent to the receiver, telling the receiver how to compose the final image.

In a typical MPEG-4 decoder, each component has a clearly defined purpose. Instead of a single video or audio decoder, several decoders are used, each receiving and decoding only one particular image (or sound) element. Each decoding buffer accepts only the data stream intended for it and forwards that stream to its decoder. The composition memory stores the image elements and sends them to the appropriate positions of the display. The same holds for audio, with the obvious difference that all elements must be presented at the same time. Time stamps on the data ensure that the elements are synchronized correctly in time.

The MPEG-4 standard distinguishes between natural elements (physical images) and synthetic elements, computer-generated animation being an example of the latter. A complete image can, for example, contain an actual background image with an animation or another natural image in front of it. Such elements can be compressed optimally and transmitted to the receiver independently of one another, and the receiver knows how to combine them. In MPEG-2, images are compressed as a whole; under MPEG-4, every element in the image is compressed optimally on its own. A static background need not be retransmitted all the way to the next I-frame, which would otherwise strain the available bandwidth. If the background stays still for 10 seconds, it need only be transmitted once (assuming we need not worry about someone switching to the channel during that time), while only the comparatively small foreground image elements are transmitted continuously. For some program types this saves a great deal of bandwidth.

The MPEG-4 standard handles audio in the same way. Consider a soloist accompanied by an electronic synthesizer. Under MPEG-2, the soloist and synthesizer would be mixed first and the combined audio signal compressed and transmitted. Under MPEG-4, the solo can be compressed separately and the MIDI (musical instrument digital interface) channel data transmitted alongside it, so that the receiver can reconstruct the sound; the receiver must, of course, support MIDI playback. Compared with transmitting the mixed signal, transmitting the solo signal and the MIDI data separately saves considerable bandwidth. Similar arrangements can be made for other program types. The MPEG-7 standard, also called the multimedia content description interface, describes images by parameters such as color, texture, shape and motion; it relies on such parameters to classify images and sounds and to query databases of them.

Second, implementation methods of multimedia data compression technology

At present there are nearly a hundred ways to implement multimedia compression, of which methods based on source coding theory, the discrete cosine transform and wavelet decomposition are the most representative. Wavelet technology breaks through the limitations of traditional compression methods and introduces the new idea of removing both local and global correlation redundancy; it has great potential and has therefore attracted many researchers in recent years. In wavelet compression an image can be decomposed into several regions, called "blocks"; within each block the image is filtered into a number of low-frequency and high-frequency components. The components can be quantized with different resolutions: the low-frequency part of the image needs many binary bits to raise the signal-to-noise ratio of the reconstructed image, so low-frequency components are quantized finely, while high-frequency components can be quantized coarsely, because noise and error are not easily seen in rapidly changing areas. In addition, segmentation techniques have been proposed as a compression method that relies on the repetitive character of real images. Compressing images with such block techniques consumes considerable computer resources but can achieve good results. With the help of pattern recognition techniques developed in DNA sequence research, traffic over WAN links can be reduced, with compression ratios of up to 90 percent, providing a larger compression ratio for the network transmission of images and sounds, reducing network load, and allowing network information to be disseminated more effectively.
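A one-level Haar decomposition, the simplest wavelet, shows how a block splits into one low-frequency and three high-frequency subbands. The sketch below is our illustration (real codecs use longer filters and several decomposition levels), but it shows why the low-frequency LL subband deserves fine quantization while the high-frequency subbands tolerate coarse quantization.

```python
import numpy as np

def haar2d_level(img):
    """One level of 2D Haar wavelet decomposition. Returns the
    low-frequency subband LL and the high-frequency subbands
    LH, HL, HH; LL would be quantized finely, the others coarsely."""
    a = img.astype(float)
    # rows: average / difference of adjacent pixel pairs
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # columns: repeat on both intermediate results
    ll = (lo[0::2, :] + lo[1::2, :]) / 2
    lh = (lo[0::2, :] - lo[1::2, :]) / 2
    hl = (hi[0::2, :] + hi[1::2, :]) / 2
    hh = (hi[0::2, :] - hi[1::2, :]) / 2
    return ll, lh, hl, hh

img = np.tile(np.arange(8), (8, 1))        # smooth horizontal ramp
ll, lh, hl, hh = haar2d_level(img)
print(np.abs(hh).max(), np.abs(ll).max())  # energy concentrates in LL
```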

Third, the principle of compression

Because there is redundancy among image data, the data can be compressed. Shannon, the founder of information theory, proposed that data be regarded as a combination of information and redundancy. Redundancy exists because there is strong correlation between the pixels of an image; it can be removed by suitable coding methods, thereby reducing the amount of data. To remove the redundancy in data, it is often necessary to consider the statistical characteristics of the signal source or to establish a statistical model of it. Image redundancy includes the following:

(1) Spatial redundancy: correlation between neighboring pixels.

(2) Temporal redundancy: redundancy between consecutive frames of a moving image.

(3) Information entropy redundancy: the number of bits used per symbol exceeds the entropy of the source.

(4) Structural redundancy: image regions exhibit strong, regular texture structure.

(5) Knowledge redundancy: many objects have a fixed, previously known structure, such as a human head.

(6) Visual redundancy: some image distortions are imperceptible to the human eye.

Digital image compression usually uses two basic principles:

(1) Correlation of digital images. There is usually a strong correlation between adjacent pixels in the same line of an image and between corresponding pixels in adjacent frames of a moving image. Removing or reducing these correlations removes or reduces the redundancy in the image information, that is, it compresses the digital image.

(2) Human visual psychology. Human vision is insensitive to abrupt edge changes (the visual masking effect) and has weak color resolution. Exploiting these characteristics, the coding precision can be reduced appropriately in the corresponding parts of the image without any visually perceptible loss of quality, thus achieving digital image compression.

There are many coding and compression methods, and there are also different classification methods from different angles. For example, from the perspective of information theory, they can be divided into two categories:

(1) Redundancy compression, also known as lossless compression, information-preserving coding or entropy coding. The decoded image is exactly the same as the image before compression, without distortion; mathematically, the operation is reversible.

(2) Information compression, also known as lossy compression, distortion coding or entropy compression. The decoded image differs from the original image; a certain amount of distortion is allowed.

Image compression and coding methods applied in multimedia can be divided into:

(1) Lossless compression coding:
• Huffman coding
• arithmetic coding
• run-length coding (see the sketch after this list)
• Lempel-Ziv coding

(2) Lossy compression coding:
• predictive coding: DPCM, motion compensation
• frequency-domain methods: orthogonal transform coding (such as the DCT), subband coding
• spatial-domain methods: statistical block coding
• model-based methods: fractal coding, model-based coding
• importance-based methods: filtering, subsampling, bit allocation, vector quantization

(3) Hybrid coding: the JBIG, H.261, JPEG, MPEG and other technical standards.
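As a concrete example of the simplest entry in the lossless list, run-length coding, here is a minimal sketch (our illustration) that replaces runs of identical symbols with (symbol, count) pairs and verifies the lossless round trip.

```python
def rle_encode(data):
    """Run-length coding: replace runs of identical symbols with
    (symbol, count) pairs."""
    out = []
    for sym in data:
        if out and out[-1][0] == sym:
            out[-1][1] += 1   # extend the current run
        else:
            out.append([sym, 1])
    return [(s, c) for s, c in out]

def rle_decode(pairs):
    return [s for s, c in pairs for _ in range(c)]

data = list("AAAABBBCCD")
enc = rle_encode(data)
print(enc)                      # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(enc) == data  # lossless round trip
```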

Important indices for measuring the merits of a compression coding method are:

(1) The compression ratio should be high: several times, tens of times, hundreds of times or even thousands of times;

(2) Compression and decompression should be fast, the algorithm simple, and the hardware implementation easy;

(3) The quality of the decompressed image should be good.

Fourth, JPEG image compression algorithm

1. JPEG compression process

JPEG compression is implemented in four steps:

1. Color mode conversion and sampling;

2. DCT transform;

3. Quantization;

4. Coding.

2.1 Color mode conversion and sampling

The RGB color system is the most common way to represent colors, but JPEG uses the YCbCr color system. To process a full-color image with the JPEG baseline method, the RGB image data must first be converted into YCbCr data. Y stands for brightness; Cb and Cr stand for chroma and saturation. The conversion is done with the following formulas.

Y = 0.2990 R + 0.5870 G + 0.1140 B

Cb = -0.1687 R - 0.3313 G + 0.5000 B + 128

Cr = 0.5000 R - 0.4187 G - 0.0813 B + 128

The human eye is more sensitive to low-frequency data than to high-frequency data, and it is also much more sensitive to changes in brightness than to changes in color. This means the Y component data are the most important; since the Cb and Cr components matter less, only part of their data need be kept, which increases the compression ratio. JPEG usually uses one of two sampling schemes, YUV411 and YUV422, which specify the relative sampling rates of the Y, Cb and Cr components.
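The conversion formulas above translate directly into code. The following sketch (our illustration) converts a single RGB pixel to YCbCr and checks that a neutral gray maps to Y = 128 with both chroma components at their 128 offset.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (components 0-255) to YCbCr using the
    JPEG formulas given above; Cb and Cr are offset by 128 so that
    they stay non-negative."""
    y  =  0.2990 * r + 0.5870 * g + 0.1140 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5000 * b + 128
    cr =  0.5000 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))      # pure red: Cr well above 128
print(rgb_to_ycbcr(128, 128, 128))  # gray: Y = 128, Cb = Cr = 128
```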

2.2 DCT transform

The full name of the DCT is the discrete cosine transform. It converts a block of light-intensity data into frequency data, so that changes in light intensity can be analyzed. If the high-frequency data are modified and the data then transformed back, the result obviously differs from the original data, but the difference is hard for the human eye to recognize.

For compression, the original image data are divided into 8×8 data unit matrices, for example a matrix of 8×8 luminance values.

JPEG takes a group of luminance matrices together with the corresponding chroma (Cb) and saturation (Cr) matrices as a basic unit, called the MCU. Each MCU contains no more than 10 matrices. For example, if the row and column sampling ratio is 4:2:2, each MCU contains four luminance matrices, one chroma matrix and one saturation matrix.

After the image data are divided into 8×8 matrices, 128 must be subtracted from every value before it is substituted into the DCT transformation formula; this is because the DCT formula accepts numbers in the range -128 to +127.

The DCT transformation formula is:

F(u,v) = (1/4) C(u) C(v) Σ(x=0..7) Σ(y=0..7) f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]

where:

x and y are the coordinates of a value in the image data matrix;

f(x,y) is the value at that position in the image data matrix;

u and v are the coordinates of a value in the matrix after the DCT;

F(u,v) is the value at that position in the transformed matrix;

C(u) = 1/√2 ≈ 1/1.414 when u = 0, and C(u) = 1 when u > 0 (and likewise for C(v)).
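The formula can be implemented literally. The sketch below (our illustration, deliberately unoptimized) applies it to a level-shifted flat 8×8 block, for which all the energy should land in the DC coefficient F(0,0) and the AC coefficients should vanish.

```python
import numpy as np

def dct2_8x8(block):
    """8x8 forward DCT implemented directly from the formula above.
    `block` holds level-shifted samples (each value minus 128)."""
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            c = (1 / np.sqrt(2) if u == 0 else 1.0) * \
                (1 / np.sqrt(2) if v == 0 else 1.0)
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += block[x, y] * \
                         np.cos((2 * x + 1) * u * np.pi / 16) * \
                         np.cos((2 * y + 1) * v * np.pi / 16)
            F[u, v] = 0.25 * c * s
    return F

block = np.full((8, 8), 80) - 128        # flat block, level-shifted
F = dct2_8x8(block)
print(round(F[0, 0], 1))                 # DC carries all the energy: -384.0
print(np.abs(F[1:, 1:]).max().round(6))  # AC coefficients are ~0
```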

The numbers in the matrix after the DCT are frequency coefficients. The coefficient with the largest value is F(0,0), called the DC coefficient; the remaining 63 frequency coefficients are mostly positive and negative floating-point numbers close to 0, collectively called the AC coefficients.

2.3 Quantization

After the image data have been converted into frequency coefficients, they must pass through a quantization stage before entering the coding stage. Quantization uses two 8×8 matrices, one for the luminance frequency coefficients and one for the chrominance frequency coefficients: each frequency coefficient is divided by the corresponding value of the quantization matrix and the quotient rounded to the nearest integer, which completes the quantization. Converting the frequency coefficients from floating-point numbers to integers makes the subsequent coding much easier to carry out; however, after the quantization stage all data retain only an integer approximation, so information is lost once more.

The JPEG standard provides reference quantization tables for this purpose.
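As a stand-in for a concrete table, the sketch below uses the example luminance quantization table published in Annex K of the JPEG standard and performs the divide-and-round step described above; the rounding is exactly where JPEG's irreversible loss occurs.

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F):
    """Divide each frequency coefficient by the matching table entry and
    round to the nearest integer; this step is irreversible."""
    return np.rint(F / Q_LUMA).astype(int)

def dequantize(Fq):
    return Fq * Q_LUMA  # decoder side recovers only an approximation

# e.g. continuing from the DCT sketch above:
# Fq = quantize(dct2_8x8(block)); approx = dequantize(Fq)
```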

2.4 Coding

Huffman coding is free of patent rights and has become the most commonly used coding method in JPEG; it is usually implemented with complete lookup tables.

In coding, the DC value and the 63 AC values of each matrix use different Huffman code tables, and luminance and chrominance also require different tables, so a total of four code tables are needed to complete the JPEG coding.

DC coding

DC coefficients are coded differentially (differential pulse code modulation): within one image component, the difference between each DC value and the preceding DC value is coded. The main reason for using differential coding for DC is that in continuous-tone images the difference is usually smaller than the original value, so coding the difference requires far fewer bits than coding the value itself. For example, if the difference is 5, its binary representation is 101. If the difference is -5, it is first treated as the positive integer 5, whose binary form is then converted to its one's complement; the one's complement simply changes every bit that is 0 to 1 and every bit that is 1 to 0. The number of bits reserved for a difference of 5 is 3; in general, a difference whose magnitude lies between 2^(n-1) and 2^n - 1 is assigned n bits.

A Huffman code value for the size of the difference is prefixed to the difference itself. For example, a luminance difference of 5 (binary 101) has 3 bits; the Huffman code value for size 3 in the luminance DC table is 100, and the two together give 100101. The luminance and chrominance DC difference code tables defined by the standard supply the Huffman code value to be prefixed to each DC difference, completing the DC coding.
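The differential coding and size-category steps can be sketched as follows (our illustration; the Huffman code tables themselves are omitted, as in the text above).

```python
def dc_size_category(diff):
    """Number of bits needed for a DC difference, i.e. the size category
    that indexes the (omitted) Huffman code tables. Negative differences
    are stored as the one's complement of |diff| in that many bits."""
    return 0 if diff == 0 else abs(diff).bit_length()

def dc_differences(dc_values):
    """Differential coding of DC coefficients: each DC value is coded as
    the difference from the previous one in the same component."""
    prev = 0
    out = []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

diffs = dc_differences([48, 53, 51, 51])
print(diffs)                                 # [48, 5, -2, 0]
print([dc_size_category(d) for d in diffs])  # [6, 3, 2, 0]
```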

AC coding

The AC coding method differs slightly from the DC coding. Before AC coding, the 63 AC values must be arranged in zigzag order, that is, serialized diagonally from the upper-left corner of the matrix toward the lower-right corner.

Once the 63 AC values have been arranged, each AC coefficient is converted into an intermediate symbol written RRRR/SSSS, where RRRR is the number of zero-valued ACs preceding a non-zero AC and SSSS is the number of bits required for the AC value. The correspondence between the range of an AC coefficient and SSSS is the same as in the table of DC difference bit counts described above.

If a run of consecutive zero-valued ACs is longer than 15, it is split off as the special symbol 15/0, called ZRL (zero run length); the symbol 0/0, called EOB (end of block), indicates that all remaining AC coefficients are zero. Using the intermediate symbol as an index, the appropriate Huffman code value is found in the corresponding AC code table and then concatenated with the coded AC value.

For example, suppose a luminance intermediate symbol is 5/3 and the AC value is 4. First, 5/3 is used as the index to look up the corresponding Huffman code value in the luminance AC code table; the 3 bits representing the AC value (binary 100) are then appended to that code value.
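Putting the AC rules together, the following sketch (our illustration) converts a zigzag-ordered AC coefficient list into RRRR/SSSS intermediate symbols, emitting ZRL for runs of 16 zeros and EOB once the remaining coefficients are all zero.

```python
def ac_symbols(ac):
    """Turn 63 zigzag-ordered AC coefficients into RRRR/SSSS intermediate
    symbols, with ZRL (15/0) for each run of 16 zeros and EOB (0/0) when
    all remaining coefficients are zero."""
    out, run = [], 0
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    for v in ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:              # a run of 16 zeros becomes ZRL
                out.append((15, 0))
                run = 0
        else:
            size = abs(v).bit_length()
            out.append(((run, size), v))
            run = 0
    if last_nonzero < len(ac) - 1:
        out.append((0, 0))             # EOB: the rest are all zero
    return out

ac = [4] + [0] * 5 + [-3] + [0] * 56
print(ac_symbols(ac))  # [((0, 3), 4), ((5, 2), -3), (0, 0)]
```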

Because the Huffman code tables for luminance AC and chrominance AC are rather long, they are omitted here; interested readers can consult the related references.

Carrying out the above four steps completes the JPEG compression of an image.