CoCalc -- rfc1950.txt

GitHub Repository: freebsd/freebsd-src
Path: blob/main/sys/contrib/zlib/doc/rfc1950.txt
⁴⁸²⁶² views
1

2

3

4

5

6

7
Network Working Group                                         P. Deutsch
8
Request for Comments: 1950                           Aladdin Enterprises
9
Category: Informational                                      J-L. Gailly
10
                                                                Info-ZIP
11
                                                                May 1996
12

13

14
         ZLIB Compressed Data Format Specification version 3.3
15

16
Status of This Memo
17

18
   This memo provides information for the Internet community.  This memo
19
   does not specify an Internet standard of any kind.  Distribution of
20
   this memo is unlimited.
21

22
IESG Note:
23

24
   The IESG takes no position on the validity of any Intellectual
25
   Property Rights statements contained in this document.
26

27
Notices
28

29
   Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
30

31
   Permission is granted to copy and distribute this document for any
32
   purpose and without charge, including translations into other
33
   languages and incorporation into compilations, provided that the
34
   copyright notice and this notice are preserved, and that any
35
   substantive changes or deletions from the original are clearly
36
   marked.
37

38
   A pointer to the latest version of this and related documentation in
39
   HTML format can be found at the URL
40
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
41

42
Abstract
43

44
   This specification defines a lossless compressed data format.  The
45
   data can be produced or consumed, even for an arbitrarily long
46
   sequentially presented input data stream, using only an a priori
47
   bounded amount of intermediate storage.  The format presently uses
48
   the DEFLATE compression method but can be easily extended to use
49
   other compression methods.  It can be implemented readily in a manner
50
   not covered by patents.  This specification also defines the ADLER-32
51
   checksum (an extension and improvement of the Fletcher checksum),
52
   used for detection of data corruption, and provides an algorithm for
53
   computing it.
54

55

56

57

58
Deutsch & Gailly             Informational                      [Page 1]
59

60
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
61

62

63
Table of Contents
64

65
   1. Introduction ................................................... 2
66
      1.1. Purpose ................................................... 2
67
      1.2. Intended audience ......................................... 3
68
      1.3. Scope ..................................................... 3
69
      1.4. Compliance ................................................ 3
70
      1.5.  Definitions of terms and conventions used ................ 3
71
      1.6. Changes from previous versions ............................ 3
72
   2. Detailed specification ......................................... 3
73
      2.1. Overall conventions ....................................... 3
74
      2.2. Data format ............................................... 4
75
      2.3. Compliance ................................................ 7
76
   3. References ..................................................... 7
77
   4. Source code .................................................... 8
78
   5. Security Considerations ........................................ 8
79
   6. Acknowledgements ............................................... 8
80
   7. Authors' Addresses ............................................. 8
81
   8. Appendix: Rationale ............................................ 9
82
   9. Appendix: Sample code ..........................................10
83

84
1. Introduction
85

86
   1.1. Purpose
87

88
      The purpose of this specification is to define a lossless
89
      compressed data format that:
90

91
          * Is independent of CPU type, operating system, file system,
92
            and character set, and hence can be used for interchange;
93

94
          * Can be produced or consumed, even for an arbitrarily long
95
            sequentially presented input data stream, using only an a
96
            priori bounded amount of intermediate storage, and hence can
97
            be used in data communications or similar structures such as
98
            Unix filters;
99

100
          * Can use a number of different compression methods;
101

102
          * Can be implemented readily in a manner not covered by
103
            patents, and hence can be practiced freely.
104

105
      The data format defined by this specification does not attempt to
106
      allow random access to compressed data.
107

108

109

110

111

112

113

114
Deutsch & Gailly             Informational                      [Page 2]
115

116
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
117

118

119
   1.2. Intended audience
120

121
      This specification is intended for use by implementors of software
122
      to compress data into zlib format and/or decompress data from zlib
123
      format.
124

125
      The text of the specification assumes a basic background in
126
      programming at the level of bits and other primitive data
127
      representations.
128

129
   1.3. Scope
130

131
      The specification specifies a compressed data format that can be
132
      used for in-memory compression of a sequence of arbitrary bytes.
133

134
   1.4. Compliance
135

136
      Unless otherwise indicated below, a compliant decompressor must be
137
      able to accept and decompress any data set that conforms to all
138
      the specifications presented here; a compliant compressor must
139
      produce data sets that conform to all the specifications presented
140
      here.
141

142
   1.5.  Definitions of terms and conventions used
143

144
      byte: 8 bits stored or transmitted as a unit (same as an octet).
145
      (For this specification, a byte is exactly 8 bits, even on
146
      machines which store a character on a number of bits different
147
      from 8.) See below, for the numbering of bits within a byte.
148

149
   1.6. Changes from previous versions
150

151
      Version 3.1 was the first public release of this specification.
152
      In version 3.2, some terminology was changed and the Adler-32
153
      sample code was rewritten for clarity.  In version 3.3, the
154
      support for a preset dictionary was introduced, and the
155
      specification was converted to RFC style.
156

157
2. Detailed specification
158

159
   2.1. Overall conventions
160

161
      In the diagrams below, a box like this:
162

163
         +---+
164
         |   | <-- the vertical bars might be missing
165
         +---+
166

167

168

169

170
Deutsch & Gailly             Informational                      [Page 3]
171

172
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
173

174

175
      represents one byte; a box like this:
176

177
         +==============+
178
         |              |
179
         +==============+
180

181
      represents a variable number of bytes.
182

183
      Bytes stored within a computer do not have a "bit order", since
184
      they are always treated as a unit.  However, a byte considered as
185
      an integer between 0 and 255 does have a most- and least-
186
      significant bit, and since we write numbers with the most-
187
      significant digit on the left, we also write bytes with the most-
188
      significant bit on the left.  In the diagrams below, we number the
189
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
190
      the bits are numbered:
191

192
         +--------+
193
         |76543210|
194
         +--------+
195

196
      Within a computer, a number may occupy multiple bytes.  All
197
      multi-byte numbers in the format described here are stored with
198
      the MOST-significant byte first (at the lower memory address).
199
      For example, the decimal number 520 is stored as:
200

201
             0     1
202
         +--------+--------+
203
         |00000010|00001000|
204
         +--------+--------+
205
          ^        ^
206
          |        |
207
          |        + less significant byte = 8
208
          + more significant byte = 2 x 256
209

210
   2.2. Data format
211

212
      A zlib stream has the following structure:
213

214
           0   1
215
         +---+---+
216
         |CMF|FLG|   (more-->)
217
         +---+---+
218

219

220

221

222

223

224

225

226
Deutsch & Gailly             Informational                      [Page 4]
227

228
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
229

230

231
      (if FLG.FDICT set)
232

233
           0   1   2   3
234
         +---+---+---+---+
235
         |     DICTID    |   (more-->)
236
         +---+---+---+---+
237

238
         +=====================+---+---+---+---+
239
         |...compressed data...|    ADLER32    |
240
         +=====================+---+---+---+---+
241

242
      Any data which may appear after ADLER32 are not part of the zlib
243
      stream.
244

245
      CMF (Compression Method and flags)
246
         This byte is divided into a 4-bit compression method and a 4-
247
         bit information field depending on the compression method.
248

249
            bits 0 to 3  CM     Compression method
250
            bits 4 to 7  CINFO  Compression info
251

252
      CM (Compression method)
253
         This identifies the compression method used in the file. CM = 8
254
         denotes the "deflate" compression method with a window size up
255
         to 32K.  This is the method used by gzip and PNG (see
256
         references [1] and [2] in Chapter 3, below, for the reference
257
         documents).  CM = 15 is reserved.  It might be used in a future
258
         version of this specification to indicate the presence of an
259
         extra field before the compressed data.
260

261
      CINFO (Compression info)
262
         For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
263
         size, minus eight (CINFO=7 indicates a 32K window size). Values
264
         of CINFO above 7 are not allowed in this version of the
265
         specification.  CINFO is not defined in this specification for
266
         CM not equal to 8.
267

268
      FLG (FLaGs)
269
         This flag byte is divided as follows:
270

271
            bits 0 to 4  FCHECK  (check bits for CMF and FLG)
272
            bit  5       FDICT   (preset dictionary)
273
            bits 6 to 7  FLEVEL  (compression level)
274

275
         The FCHECK value must be such that CMF and FLG, when viewed as
276
         a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
277
         is a multiple of 31.
278

279

280

281

282
Deutsch & Gailly             Informational                      [Page 5]
283

284
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
285

286

287
      FDICT (Preset dictionary)
288
         If FDICT is set, a DICT dictionary identifier is present
289
         immediately after the FLG byte. The dictionary is a sequence of
290
         bytes which are initially fed to the compressor without
291
         producing any compressed output. DICT is the Adler-32 checksum
292
         of this sequence of bytes (see the definition of ADLER32
293
         below).  The decompressor can use this identifier to determine
294
         which dictionary has been used by the compressor.
295

296
      FLEVEL (Compression level)
297
         These flags are available for use by specific compression
298
         methods.  The "deflate" method (CM = 8) sets these flags as
299
         follows:
300

301
            0 - compressor used fastest algorithm
302
            1 - compressor used fast algorithm
303
            2 - compressor used default algorithm
304
            3 - compressor used maximum compression, slowest algorithm
305

306
         The information in FLEVEL is not needed for decompression; it
307
         is there to indicate if recompression might be worthwhile.
308

309
      compressed data
310
         For compression method 8, the compressed data is stored in the
311
         deflate compressed data format as described in the document
312
         "DEFLATE Compressed Data Format Specification" by L. Peter
313
         Deutsch. (See reference [3] in Chapter 3, below)
314

315
         Other compressed data formats are not specified in this version
316
         of the zlib specification.
317

318
      ADLER32 (Adler-32 checksum)
319
         This contains a checksum value of the uncompressed data
320
         (excluding any dictionary data) computed according to Adler-32
321
         algorithm. This algorithm is a 32-bit extension and improvement
322
         of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
323
         standard. See references [4] and [5] in Chapter 3, below)
324

325
         Adler-32 is composed of two sums accumulated per byte: s1 is
326
         the sum of all bytes, s2 is the sum of all s1 values. Both sums
327
         are done modulo 65521. s1 is initialized to 1, s2 to zero.  The
328
         Adler-32 checksum is stored as s2*65536 + s1 in most-
329
         significant-byte first (network) order.
330

331

332

333

334

335

336

337

338
Deutsch & Gailly             Informational                      [Page 6]
339

340
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
341

342

343
   2.3. Compliance
344

345
      A compliant compressor must produce streams with correct CMF, FLG
346
      and ADLER32, but need not support preset dictionaries.  When the
347
      zlib data format is used as part of another standard data format,
348
      the compressor may use only preset dictionaries that are specified
349
      by this other data format.  If this other format does not use the
350
      preset dictionary feature, the compressor must not set the FDICT
351
      flag.
352

353
      A compliant decompressor must check CMF, FLG, and ADLER32, and
354
      provide an error indication if any of these have incorrect values.
355
      A compliant decompressor must give an error indication if CM is
356
      not one of the values defined in this specification (only the
357
      value 8 is permitted in this version), since another value could
358
      indicate the presence of new features that would cause subsequent
359
      data to be interpreted incorrectly.  A compliant decompressor must
360
      give an error indication if FDICT is set and DICTID is not the
361
      identifier of a known preset dictionary.  A decompressor may
362
      ignore FLEVEL and still be compliant.  When the zlib data format
363
      is being used as a part of another standard format, a compliant
364
      decompressor must support all the preset dictionaries specified by
365
      the other format. When the other format does not use the preset
366
      dictionary feature, a compliant decompressor must reject any
367
      stream in which the FDICT flag is set.
368

369
3. References
370

371
   [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
372
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/
373

374
   [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
375
       available in ftp://ftp.uu.net/graphics/png/documents/
376

377
   [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
378
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/
379

380
   [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
381
       Transmissions," IEEE Transactions on Communications, Vol. COM-30,
382
       No. 1, January 1982, pp. 247-252.
383

384
   [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
385
       November, 1993, pp. 144, 145. (Available from
386
       gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
387

388

389

390

391

392

393

394
Deutsch & Gailly             Informational                      [Page 7]
395

396
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
397

398

399
4. Source code
400

401
   Source code for a C language implementation of a "zlib" compliant
402
   library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
403

404
5. Security Considerations
405

406
   A decoder that fails to check the ADLER32 checksum value may be
407
   subject to undetected data corruption.
408

409
6. Acknowledgements
410

411
   Trademarks cited in this document are the property of their
412
   respective owners.
413

414
   Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
415
   the related software described in this specification.  Glenn
416
   Randers-Pehrson converted this document to RFC and HTML format.
417

418
7. Authors' Addresses
419

420
   L. Peter Deutsch
421
   Aladdin Enterprises
422
   203 Santa Margarita Ave.
423
   Menlo Park, CA 94025
424

425
   Phone: (415) 322-0103 (AM only)
426
   FAX:   (415) 322-1734
427
   EMail: <[email protected]>
428

429

430
   Jean-Loup Gailly
431

432
   EMail: <[email protected]>
433

434
   Questions about the technical content of this specification can be
435
   sent by email to
436

437
   Jean-Loup Gailly <[email protected]> and
438
   Mark Adler <[email protected]>
439

440
   Editorial comments on this specification can be sent by email to
441

442
   L. Peter Deutsch <[email protected]> and
443
   Glenn Randers-Pehrson <[email protected]>
444

445

446

447

448

449

450
Deutsch & Gailly             Informational                      [Page 8]
451

452
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
453

454

455
8. Appendix: Rationale
456

457
   8.1. Preset dictionaries
458

459
      A preset dictionary is specially useful to compress short input
460
      sequences. The compressor can take advantage of the dictionary
461
      context to encode the input in a more compact manner. The
462
      decompressor can be initialized with the appropriate context by
463
      virtually decompressing a compressed version of the dictionary
464
      without producing any output. However for certain compression
465
      algorithms such as the deflate algorithm this operation can be
466
      achieved without actually performing any decompression.
467

468
      The compressor and the decompressor must use exactly the same
469
      dictionary. The dictionary may be fixed or may be chosen among a
470
      certain number of predefined dictionaries, according to the kind
471
      of input data. The decompressor can determine which dictionary has
472
      been chosen by the compressor by checking the dictionary
473
      identifier. This document does not specify the contents of
474
      predefined dictionaries, since the optimal dictionaries are
475
      application specific. Standard data formats using this feature of
476
      the zlib specification must precisely define the allowed
477
      dictionaries.
478

479
   8.2. The Adler-32 algorithm
480

481
      The Adler-32 algorithm is much faster than the CRC32 algorithm yet
482
      still provides an extremely low probability of undetected errors.
483

484
      The modulo on unsigned long accumulators can be delayed for 5552
485
      bytes, so the modulo operation time is negligible.  If the bytes
486
      are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
487
      and order sensitive, unlike the first sum, which is just a
488
      checksum.  That 65521 is prime is important to avoid a possible
489
      large class of two-byte errors that leave the check unchanged.
490
      (The Fletcher checksum uses 255, which is not prime and which also
491
      makes the Fletcher check insensitive to single byte changes 0 <->
492
      255.)
493

494
      The sum s1 is initialized to 1 instead of zero to make the length
495
      of the sequence part of s2, so that the length does not have to be
496
      checked separately. (Any sequence of zeroes has a Fletcher
497
      checksum of zero.)
498

499

500

501

502

503

504

505

506
Deutsch & Gailly             Informational                      [Page 9]
507

508
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
509

510

511
9. Appendix: Sample code
512

513
   The following C code computes the Adler-32 checksum of a data buffer.
514
   It is written for clarity, not for speed.  The sample code is in the
515
   ANSI C programming language. Non C users may find it easier to read
516
   with these hints:
517

518
      &      Bitwise AND operator.
519
      >>     Bitwise right shift operator. When applied to an
520
             unsigned quantity, as here, right shift inserts zero bit(s)
521
             at the left.
522
      <<     Bitwise left shift operator. Left shift inserts zero
523
             bit(s) at the right.
524
      ++     "n++" increments the variable n.
525
      %      modulo operator: a % b is the remainder of a divided by b.
526

527
      #define BASE 65521 /* largest prime smaller than 65536 */
528

529
      /*
530
         Update a running Adler-32 checksum with the bytes buf[0..len-1]
531
       and return the updated checksum. The Adler-32 checksum should be
532
       initialized to 1.
533

534
       Usage example:
535

536
         unsigned long adler = 1L;
537

538
         while (read_buffer(buffer, length) != EOF) {
539
           adler = update_adler32(adler, buffer, length);
540
         }
541
         if (adler != original_adler) error();
542
      */
543
      unsigned long update_adler32(unsigned long adler,
544
         unsigned char *buf, int len)
545
      {
546
        unsigned long s1 = adler & 0xffff;
547
        unsigned long s2 = (adler >> 16) & 0xffff;
548
        int n;
549

550
        for (n = 0; n < len; n++) {
551
          s1 = (s1 + buf[n]) % BASE;
552
          s2 = (s2 + s1)     % BASE;
553
        }
554
        return (s2 << 16) + s1;
555
      }
556

557
      /* Return the adler32 of the bytes buf[0..len-1] */
558

559

560

561

562
Deutsch & Gailly             Informational                     [Page 10]
563

564
RFC 1950       ZLIB Compressed Data Format Specification        May 1996
565

566

567
      unsigned long adler32(unsigned char *buf, int len)
568
      {
569
        return update_adler32(1L, buf, len);
570
      }
571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618
Deutsch & Gailly             Informational                     [Page 11]
619

620

621
Product

Resources

Company