JPEG Encoding Algorithm Implementation Details

 

The process by which the software will execute the algorithms is described by the following block diagram:

 

 

Figure 4: Software Flow

 

A.      PPM File Decoding:

 

There are two file formats in which a PPM (Portable Pixel Map) file is stored; ASCII and binary. A PPM file is read, string by string if it is an ASCII file, and character by character if it is a binary file. The first three lines of the PPM file describe the file. Any line that begins with‘#’ is assumed to be a comment and ignored. The first line specifies whether the file is an ASCII file, denoted by the string ‘P3’, or a binary file, denoted by the string ‘P6’.  The next line of the file contains the dimensions of the file, or the number or pixels in the file. These two values are in the order of horizontal pixels first, then vertical pixels second. The third line species the maximum allowable RGB pixel value that is found in the file, this value is almost always 256. All other lines after this line contain RGB pixel values. They are read from left to right, then top to bottom. A single pixel data value contains values for each color plane. The RGB data values are stored in the PPM file in plain text, with values ranging from 0 to the maximum RGB value.

 

B.      YIQ Color Space Conversion:

 

After the RGB values are read, they are converted to the YIQ color space. ‘Y’ represents luminance, ‘I’ represents in-phase and ‘Q’ represents the quadrature. YIQ is used due to the fact that by taking some known facts about human vision, a much higher compression rate can be attained since many RGB values are very close to zero once the color space conversion is done. One of these is the tendency for the human eye to notice variations of brightness intensity much more than variations of color in an image. The JPEG algorithm can take advantage of this by applying different rules for brightness and color variations.

 

The YIQ conversion equations are shown below:

 

                Y =          0.299*R                  +   0.587*G             +   0.114*B - 128

                 I =          -0.15*R                    -    0.275*G            +   0.45*B

                Q =          0.4375*R                -    0.375*G             -   0.0625*B

 

C.      MCU (Minimum Coded Unit) Division

 

The next step is to subdivide the YIQ values into their respective MCU’s (Minimum Coded Units). For this application, the size of the MCU is 8 values by 8 values, for a total of 64 values. Each MCU is taken from right to left and then top to bottom. If the number of pixel values in the last row or last column of the file are not divisible by 8, rather than truncate these values, zeros are appended to the missing rows or columns.

 

D.      DCT (Discrete Cosign Transform)

 

A two dimensional discrete cosine transform (DCT2) is then applied to each MCU separately. The matrices are now in the frequency domain. Since the transformation matrix is band-limited, only 8 x 8 = 64 frequency components are included in the result. The human visual system is very dependent on spatial frequencies within an image. The DCT allows one to approximately break down an image into a set of waveforms, each with a particular spatial frequency. In doing this one can discard information that is not perceptible to the human visual system and keep the information that it is important to it.  One of the advantages of using a DCT process as JPEG does is that real world images often have very gradual changes in brightness and color. This leads, generally, to very low (often zero) values in the high frequency coefficients of the DCT output. At this point, the values in the transformed MCU may range from 0 to 4096; therefore each value must be stored in 3 bytes.

 

E.       Quantization

 

Each MCU is then scaled by an 8 x 8 matrix Q (known as the quantization matrix), which is determined by empirical studies of human visual perception. Since the human eye is more perceptive to lower frequencies than to higher frequencies, the idea is to scale the amplitude of each frequency by a quantization factor to eliminate the higher frequencies. The act of quantizing the output of the DCT is what introduces the "lossy" aspect of the JPEG algorithm. It also enables one of the main compression features that JPEG is able to perform. As mentioned previously, images tend have low spatial frequency changes in a localized area. Combined with the tendency for humans to not notice high spatial frequency changes, this allows a large amount of the high order spatial frequency coefficients to be discarded. This is implemented by choosing the quantization factors appropriately for the coefficient to be quantized. It is common to see a normal 8x8 pixel section of an image represented by two or three numbers in the DC and low frequency AC coefficients and the rest of the quantized coefficients being zero.

The ‘Y’ color space is scaled by the luminance table, and the ‘I’ and ‘Q’ color spaces are scaled by the chrominance table. Rather than performing matrix division on these two matrices, each value of the MCU is simply divided by the matching value in the table. These matrices are shown below:

 

                                                                

16

16

16

16

24

40

51

61

16

16

16

19

26

58

60

55

16

16

16

24

40

57

69

56

16

17

22

29

51

87

80

62

18

22

37

56

68

109

103

77

24

35

55

64

81

104

113

92

49

64

78

87

103

121

120

101

72

92

95

98

112

100

103

99

 

Table 1: Luminance Table

 

 

17

18

24

47

99

99

99

99

18

21

26

66

99

99

99

99

24

26

56

99

99

99

99

99

47

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

99

 

Table 2: Chrominance Table

 

As can be noticed from these tables, the lowest value is 16, which will ensure that the resulting data values will not be greater than 256, and they will therefore be able to be stored in just 2 bytes.

 

F.       RLE (Run-Length Encoding)

 

Once the aforementioned steps are done, it is not unusual for more than half of the frequency amplitudes to be zero. To take advantage of this fact, the JPEG algorithm records only non-zero amplitudes; their position within the matrices are captured by recording the number zeros preceding them. This traversal produces a sequence of pairs p, where p = (# of zeros preceding value, value). A special pair (0, 0) is used to indicate the end of the current matrix. Another special case exists where the number of zeros preceding a non-zero co-efficient is greater than or equal to 16. In this case, a “ZRL” symbol is written, and the zero counting starts again at zero. Also important to note, is that rather than reading the MCU from left to right and top to bottom, the MCU is read using a zigzag technique. This is done using a re-order array which is specified below:

                               

0

1

5

6

14

15

27

28

2

4

7

13

16

26

29

42

3

8

12

17

25

30

41

43

9

11

18

24

31

40

44

53

10

19

23

32

39

45

52

54

20

22

33

38

46

51

55

60

21

34

37

47

50

56

59

61

35

36

48

49

57

58

62

63

 

            Table 3: Re-Order Array

                                                    (Note: the values are read in the order that the numbers specify)

 

G.      Huffman Entropy Encoding

 

The next and final step of this algorithm is the entropy, implemented using Huffman encoding. Fortunately, the JPEG standard provides a number of tables of Huffman codes to describe the different ways that the input quantized data is to be encrypted. The quantized DCT coefficients tend to have two different styles of values within them. The first is the DC coefficient or the very first value in the zigzag ordered array. This value is always the average value and as such is likely to be any number within the range permitted. The second is all of the AC values (the second to the last coefficient in the zigzag ordered array). The values of these coefficients are likely to be close to zero, with the likelihood of being zero increasing as the coefficients near the end of the array. These two types of information are different enough to warrant separating them and applying different methods of coding to get the optimal compression efficiency. Additionally, coding in JPEG also distinguishes between whether the data is Luminance or Chrominance information. To do this it allows for different code tables to be specified for each of the components, much the same as how the quantization tables are defined. This means that a total of four Huffman code tables are generally defined in color images, two for DC information and 2 for AC information.

 

                                             i.            Entropy Coding

 

Entropy Coding is a form of compression that relies on developing "code" words in a library for the input values or "symbols". Since each of the symbols has different probabilities of occurring, "variable length" codes are used. If an event is most likely to happen more often than the other events, then it makes sense to use a code word that is actually shorter, or requires less information, than the other event codes. Another requirement for entropy encoding is that since “variable length” codes are used, it is necessary to mandate that no code begins with a smaller, already defined code. This ensures that all codes are decoded correctly.

 

                                           ii.            Coding the DC Coefficients

While the DC coefficient of the input array is fairly unpredictable by itself, the property of real world images to not have a great deal of change in a localized area can be used to predict the current DC coefficient value. By using a differential prediction model (DPCM) one can increase the probability that the value that is encoded will be small. To obtain this value one simply subtracts the DC coefficient from the previously processed MCU from the DC coefficient of the current MCU. This value is called the "DPCM difference" or the "DIFF". Once this DIFF value is calculated, it is compared to the following table to identify the symbol group that it belongs to (SSSS in the following diagram). The Huffman code that this symbol is described by is defined in the code tables that the JPEG standard provides, which are attached to this report.

SSSS

DPCM Difference

 

 

 

 

0

 

0

 

1

-1

,

1

2

-3,-2

,

2,3

3

-7,..,-4

,

4,..,7

4

-15,..,-8

,

8,..,15

5

-31,..,-16

,

16,..,31

6

-63,..,-32

,

32,..,63

7

-127,..,-64

,

64,..,127

8

-255,..,-128

,

128,..,255

9

-511,..,-256

,

256,..,511

10

-1023,..,-512

,

512,..,1023

11

-2047,..,-1024

,

1024,..,2047

12

-4095,..,-2048

,

2048,..,4095

13

-8191,..,-4096

,

4096,..,8191

14

-16383,..,-8192

,

8192,..,16383

15

-32767,..,-16384

,

16384,..,32767

16

 

 

32768

             Table 4: Huffman Coding of DPCM Differences

However this does not completely describe the DIFF coefficient as the SSSS code is only for a range of absolute values, except of course if the DIFF is zero. The full code is created when a number of additional bits are appended to the SSSS code stream to identify the sign and fully specify the magnitude of the DIFF. These additional bits are defined in the following diagram.

SSSS DPCM Difference Additional Bits

 

 

 

 

(binary)
0

 

0

 

 

-

 

1 -1 , 1 0 , 1
2 -3,-2 , 2,3 00,01 , 10,11
3 -7,..,-4 , 4,..,7 000,..,011 , 100,..,111
4 -15,..,-8 , 8,..,15 0000,..,0111 , 1000,..,1111
:

 

:

 

 

:

 

:

 

:

 

 

:

 

16

 

 

32768

 

-

 

Table 5: Additional bits for Huffman DPCM coding

As can be seen the code to be appended is found by taking the SSSS low order bits of the DIFF if the DIFF is positive. If the DIFF is negative then one is subtracted from the DIFF (the twos complement representation of the negative DIFF) and taking the SSSS low order bits is the code to be appended. Note that the JPEG baseline standard only allows for 12 bit DPCM values (the DCT output was 12 bit, following from an 8 bit input sample), so a value of SSSS of 11 is the maximum supported. In the special case of the first 8x8 pixel section of the component being processed, the DIFF is assigned the value of the current DC coefficient value (ie the previous section DC coefficient is zero). Note that the Huffman code tables for SSSS can be different depending on which component is currently being processed. If the JPEG standard libraries are being used then there is one each for luminance and chrominance components. The additional bits that get appended to the SSSS codes are always the same, irrespective of the component being processed.

                                         iii.            Coding the AC Coefficients

The AC coefficients are coded using a substantially different procedure then the DC coefficient version. Because the value of the AC coefficients tends to be close to zero after the quantization step of the JPEG algorithm, the coding of these coefficients manipulates long runs of zeroes. The selection of a symbol is based on a combination of the number of zeroes (RRRR) that preceded the current coefficient and the range of the current coefficient. The range is calculated by the same method as it was in the DC coefficient case, selecting the SSSS symbol from the following figure.

SSSS AC Coefficients

 

 

 

 

0

 

0

 

1 -1 , 1
2 -3,-2 , 2,3
3 -7,..,-4 , 4,..,7
4 -15,..,-8 , 8,..,15
5 -31,..,-16 , 16,..,31
6 -63,..,-32 , 32,..,63
7 -127,..,-64 , 64,..,127
8 -255,..,-128 , 128,..,255
9 -511,..,-256 , 256,..,511
10 -1023,..,-512 , 512,..,1023
11 -2047,..,-1024 , 1024,..,2047
12 -4095,..,-2048 , 2048,..,4095
13 -8191,..,-4096 , 4096,..,8191
14 -16383,..,-8192 , 8192,..,16383
15 -32767,..,-16384 , 16384,..,32767
16

 

 

32768

Table 6: Huffman Coding of AC Coefficients

The "RUN-SIZE" symbol is then selected by cross referencing the RRRR and SSSS values in the following figure, once found the code for this symbol can then be found by searching through the tables that the JPEG standard provides.

 

 

SSSS

 

 

0 1 2 3 4 5 6 7 8 9 10

 

0 EOB 01 02 03 04 05 06 07 08 09 0A

 

1 N/A 11 12 13 14 15 16 17 18 19 1A

 

2 N/A 21 22 23 24 25 26 27 28 29 2A

 

3 N/A 31 32 33 34 35 36 37 38 39 3A

 

4 N/A 41 42 43 44 45 46 47 48 49 4A
R 5 N/A 51 52 53 54 55 56 57 58 59 5A
R 6 N/A 61 62 63 64 65 66 67 68 69 6A
R 7 N/A 71 72 73 74 75 76 77 78 79 7A
R 8 N/A 81 82 83 84 85 86 87 88 89 8A

 

9 N/A 91 92 93 94 95 96 97 98 99 9A

 

10 N/A A1 A2 A3 A4 A5 A6 A7 A8 A9 AA

 

11 N/A B1 B2 B3 B4 B5 B6 B7 B8 B9 BA

 

12 N/A C1 C2 C3 C4 C5 C6 C7 C8 C9 CA

 

13 N/A D1 D2 D3 D4 D5 D6 D7 D8 D9 DA

 

14 N/A E1 E2 E3 E4 E5 E6 E7 E8 E9 EA

 

15 ZRL F1 F2 F3 F4 F5 F6 F7 F8 F9 FA

        Table 7: Huffman AC Statistical Model Run-Length/Amplitude Combinations

As can be seen from this figure there are two special codes defined, ZRL and EOB. EOB     is the special case of their being no non-zero coefficient values in the current array. ZRL is the special case of there being 16 zeroes in sequence and there are more non-zero coefficient values in the remainder of the array. Again, as per the DC case, additional bits are required to complete the code for the RUN-SIZE symbol with the magnitude and sign information of the non-zero component of the symbol. Referring to the following figure gives the bits required.

SSSS

AC Coefficients

Additional Bits

 

 

 

 

(binary)

0

 

0

 

 

-

 

1

-1

,

1

0

,

1

2

-3,-2

,

2,3

00,01

,

10,11

3

-7,..,-4

,

4,..,7

000,..,011

,

100,..,111

4

-15,..,-8

,

8,..,15

0000,..,0111

,

1000,..,1111

:

 

:

 

 

:

 

:

 

:

 

 

:

 

16

 

 

32768

 

-

 

                                    Table 8: Additional bits for Huffman AC Coefficients coding

                                The AC coefficients are processed from the first (or lowest order) AC coefficient in the zigzag array through to the last coefficient (or 64th) value in the  array. The array is scanned through until a non-zero value is found, or 16 zeroes in sequence are found. A  typical run of codes being output from this procedure is one to three RUN-SIZE symbol codes and one EOB symbol code. Note that the Huffman code tables for RUN-SIZE can be different depending on which component is currently being processed. If the JPEG standard libraries are being used then there is one each for luminance and chrominance components. The additional bits that get appended to the RUN-SIZE codes are always the  same, irrespective of the component being processed.

                                          iv.            Decoding

                                When decoding the JPEG encoded data stream, the decoder simply searches through the relevant Huffman code library (ie DC or AC, Luminance or Chrominance) that is stored within the JPEG file for a code that matches the current bit stream. Because each code is  guaranteed to be unique (even codes of different lengths) there is no problem in finding  the correct symbol that the code is representing.

H.      JPEG File Header Specifications:

The file format used for this implementation was JFIF (JPEG File Interchange Format). This is not strictly a part of the JPEG standard, but was developed to attempt to standardize the JPEG formats so that greater compatibility between platforms and applications could be obtained. Currently JFIF is the main file format that is used, especially for JPEG Baseline compressed images. The JPEG standard defines a number of header fields and sections to allow the JPEG encoded image to be decoded. There are quite a few of header sections within the header of the file. Each section then contains data used to decode the image. These header symbols are shown in the table below:

Description Symbol Marker(HEX)
Baseline DCT, Huffman SOF0 FFC0
Extended sequential DCT, Huffman SOF1 FFC1
Progressive DCT, Huffman SOF2 FFC2
Spatial (sequential) lossless, Huffman SOF3 FFC3
Differential Sequential DCT, Huffman SOF5 FFC5
Differential progressive DCT, Huffman SOF6 FFC6
Differential spatial, Huffman SOF7 FFC7
Reserved for JPEG extensions, Arithmetic JPG FFC8
Extended sequential DCT, Arithmetic SOF9 FFC9
Progressive DCT, Arithmetic SOF10 FFCA
Spatial (sequential) lossless, Arithmetic SOF11 FFCB
Differential sequential DCT, Arithmetic SOF13 FFCD
Differential progressive DCT, Arithmetic SOF14 FFCE
Differential spatial, Arithmetic SOF15 FFCF
Define Huffman Table(s) DHT FFC4
Define Arithmetic coding conditioning DAC FFCC
Restart RSTm FFD0-FFD7
Start of Image SOI FFD8
End of Image EOI FFD9
Start of Scan SOS FFDA
Define Quantization Table DQT FFDB
Define number of Lines DNL FFDC
Define Restart Interval DRI FFDD
Define Hierarchical progression DHP FFDE
Expand reference components EXP FFDF
Reserved for Application Segments APPn FFE0-FFEF
Reserved for JPEG extensions JPGn FFF0-FFFD
Comment COM FFFE
Temporary arithmetic Coding TEM FF01
Reserved RES FF02-FFBF

Table 9: JPEG Header Symbols and Codes


                Complete List of Available Segments in JPEG

         i.            Start of Image (SOI) Segment

This segment is always 2 bytes long and always contains the marker FFD8.

       ii.            JFIF Application (APP0) Segment

This segment contains the following information in order. The purpose of this segment is to define higher level details such as image size and miscellaneous version control information. The number in brackets indicates the number of bytes the information is stored in.

·         Segment Marker (2). It is always FFE0.

·         Section length (2). Does not include the segment marker

·         Application code (4). Is always "JFIF"

·         JFIF major version number (2)

·         JFIF minor version number (2)

·         Horizontal pixel density of the image (2)

·         Vertical pixel density of the image (2)

·         Width of Thumbnail Image (1)

·         Height of thumbnail Image (1)

     iii.            Define Quantization Table (DQT) Segment

This segment contains information detailing the Quantization Tables for the encoded image. The number in brackets indicates the number of bytes the information is stored in.

·         Segment Marker (2). Is always FFD8

·         Section length (2). Does not include the segment marker

·         Quantization Table Header (1). Top nibble denotes the precision of the table, low nibble denotes the Table ID

·         Data for relevant Table (64). Each entry is one byte each and is ordered as per the zig-zag method used in quantization.

These steps are repeated for each Quantization table defined.

      iv.            Start of Frame (SOF) Segment

This segment details information at a lower level such as which component uses what Quantization table, sampling factors etc. The number in brackets indicates the number of bytes the information is stored in.

·         Segment Marker (2). Is always FFC0

·         Section length (2). Does not include the segment marker

·         Precision of samples (1). This is always 08

·         Horizontal resolution of image (2).

·         Vertical resolution of image (2)

·         Number of Components in image (1).

·         Component ID (1).

·         Sampling factor for Component (1). High nibble is vertical sampling factor; low nibble is horizontal sampling factor.

·         Quantization table ID for component (1)

These steps are repeated for each component used in the image. JFIF also specifies that if the YIQ color format is used, then the component ID for Y should be 01H, the ID for I should be 02 and Q should be 03.

        v.            Define Huffman Table (DHT) Segment

This segment defines the contents of the Huffman Code Libraries. This entire segment is repeated once for each Huffman Code Library used. The number in brackets indicates the number of bytes the information is stored in.

·         Segment Marker (2). Is always FFC4

·         Section length (2). Does not include the segment marker

·         Table ID (1).

·         Number of table entries with a certain code length (n). I.e. first byte is for the number of codes with 1 bit code length, the second byte is for the number of codes with 2 bit code length and so on until the biggest code length is reached.

·         Symbols for either the SSSS or the RUN-SIZE values calculated in the Huffman coding procedure (m). The symbols are each one byte and are entered in order of size. I.e. the symbol of code length 1 (if any) is entered first, followed by any symbols of code length 2 and so on.

Note that if the JPEG Standard Huffman code libraries are being used, then the entries for steps 4 and 5 are defined in the standard (also they are attached to this report as an appendix).

      vi.            Start of Scan (SOS) Segment

This segment contains header information as well as actual scan data. Multiple SOS Segments can be found if there is more than one scan. The number in brackets indicates the number of bytes the information is stored in.

·         Segment Marker (2). Is always FFDA

·         Section length (2). Does not include the segment marker. This also only includes header information; it does not include compressed image data.

·         Number of components in scan (1).

·         Component ID (1). This was defined is the SOF segment, step 7.

·         Huffman Table used for component in step 4 (1). High nibble is DC Table ID and Low nibble is AC Table ID.

These steps are repeated for each component defined in the SOF segment.

    vii.            MCU Data

Each consecutive MCU data is added bit by bit with no padding, which means there are no byte boundaries from one MCU’s data to the next. The only padding that occurs is at the end of the scan when the remaining bits in the last byte to be filled with zeros if the byte is incomplete.

If ever, throughout the scan, a situation occurs where a byte contains an FF then a 00 must be inserted after it to ensure the decoder does not mistake it for a new segment.

  viii.            End of Image (EOI) Segment

This segment is always 2 bytes long and always contains the marker FFD9.