Variable-length quantity

A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes. VLQ is identical to LEB128 except in endianness. See the example below.

Applications and history

Base-128 compression is known by many names – VB (Variable Byte), VByte, Varint, VInt, EncInt etc.[1]

A variable-length quantity (VLQ) was defined for use in the standard MIDI file format[2] to save additional space for a resource-constrained system, and is also used in the later Extensible Music Format (XMF).

Base-128 is also used in ASN.1 BER encoding to encode tag numbers and object identifiers.[3] It is also used in the WAP environment, where it is called variable length unsigned integer or uintvar. The DWARF debugging format[4] defines a variant called LEB128 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits is encoded in the first byte, and the most significant bits are in the last byte (so effectively it is the little-endian analog of a VLQ). Google Protocol Buffers use a similar format to have compact representation of integer values,[5] as does Oracle Portable Object Format (POF)[6] and the Microsoft .NET Framework "7-bit encoded int" in the BinaryReader and BinaryWriter classes.[7]

It is also used extensively in web browsers for source mapping – which contain a lot of integer line and column number mappings – to keep the size of the map to a minimum.[8]

Variable-width integers in LLVM use a similar principle. The encoding chunks are little-endian and need not be 8 bits in size. The LLVM documentation describes a field that uses 4-bit chunk, with each chunk consisting of 1 bit continuation and 3 bits payload.[9]

Benefits

  1. Compactness: One of the primary benefits of VLQ encoding is its compactness. Since it uses a variable number of bytes to encode an integer, smaller integers can be represented using fewer bytes, resulting in a smaller overall file size. This is particularly useful in scenarios where storage space is at a premium, such as in embedded systems or mobile devices.
  2. Efficiency: VLQ encoding is an efficient way to store and transmit data. Since smaller integers are represented using fewer bytes, this reduces the amount of data that needs to be transmitted, which in turn reduces the time and bandwidth required to transmit the data.
  3. Flexibility: Another advantage of VLQ encoding is its flexibility. Since the number of bytes used to represent an integer is based on its magnitude, VLQ encoding can handle integers of different sizes. This means that VLQ encoding can be used to represent integers of any size, from small 8-bit integers to large 64-bit integers.

General structure

The encoding assumes an octet (an 8-bit byte) where the most significant bit (MSB), also commonly known as the sign bit, is reserved to indicate whether another VLQ octet follows.

VLQ octet
7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20
A Bn

If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

B is a 7-bit number [0x00, 0x7F], and n is the position of the VLQ octet where B0 is the least significant. The VLQ octets are arranged most significant first in a stream.

Variants

The general VLQ encoding is simple, but in basic form is only defined for unsigned integers (nonnegative, positive or zero), and is somewhat redundant, since prepending 0x80 octets corresponds to zero padding. There are various signed number representations to handle negative numbers, and techniques to remove the redundancy.

Group Varint Encoding

Google developed Group Varint Encoding (GVE) after observing that traditional VLQ encoding incurs many CPU branches during decompression. GVE uses a single byte as a header for 4 variable-length uint32 values. The header byte has 4 2-bit numbers representing the storage length of each of the following 4 uint32s. Such a layout eliminates the need to check and remove VLQ continuation bits. Data bytes can be copied directly to their destination. This layout reduces CPU branches, making GVE faster than VLQ on modern pipelined CPUs.[10]

PrefixVarint is a similar design but with a uint64 maximum. It is said to have "been invented multiple times independently".[11] It is possible to be changed into a chained version with infinitely many continuations.

Signed numbers

Sign bit

Negative numbers can be handled using a sign bit, which only needs to be present in the first octet.

In the data format for Unreal Packages used by the Unreal Engine, a variable-length quantity scheme called Compact Indices[12] is used. The only difference in this encoding is that the first VLQ octet has the sixth bit reserved to indicate whether the encoded integer is positive or negative. Any consecutive VLQ octet follows the general structure.

Unreal Signed VLQ
First VLQ octet Other VLQ octets
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
27 26 25 24 23 22 21 20 27 26 25 24 23 22 21 20
A B C0 A Cn (n > 0)

If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.

If B is 0, then the VLQ represents a positive integer. If B is 1, then the VLQ represents a negative number.

C is the number chunk being encoded, and n is the position of the VLQ octet where C0 is the least significant. The VLQ octets are arranged least significant first in a stream.

Zigzag encoding

An alternative way to encode negative numbers is to use the least significant bit for sign. This is notably done for Google Protocol Buffers, and is known as a zigzag encoding for signed integers.[13] One can encode the numbers so that encoded 0 corresponds to 0, 1 to −1, 10 to 1, 11 to −2, 100 to 2, etc.: counting up alternates between nonnegative (starting at 0) and negative (since each step changes the least-significant bit, hence the sign), whence the name "zigzag encoding". Concretely, transform the integer as (n << 1) ^ (n >> k - 1) for fixed k-bit integers.

Two's complement

LEB128 uses two's complement to represent signed numbers. In this scheme of representation, n bits encode a range from −2n to 2n − 1, and all negative numbers start with a 1 in the most significant bit. In Signed LEB128, the input is sign-extended so that its length is a multiple of 7 bits. From there the encoding proceeds as usual.[14]

In LEB128, the stream is arranged least significant first.[14]

Removing redundancy

With the VLQ encoding described above, any number that can be encoded with N octets can also be encoded with more than N octets simply by prepending additional 0x80 octets as zero-padding. For example, the decimal number 358 can be encoded as the 2-octet VLQ 0x8266, or the number 0358 can be encoded as 3-octet VLQ 0x808266, or 00358 as the 4-octet VLQ 0x80808266 and so forth.

However, the VLQ format used in Git[15] removes this prepending redundancy and extends the representable range of shorter VLQs by adding an offset to VLQs of 2 or more octets in such a way that the lowest possible value for such an (N + 1)-octet VLQ becomes exactly one more than the maximum possible value for an N-octet VLQ. In particular, since a 1-octet VLQ can store a maximum value of 127, the minimum 2-octet VLQ (0x8000) is assigned the value 128 instead of 0. Conversely, the maximum value of such a 2-octet VLQ (0xFF7F) is 16511 instead of just 16383. Similarly, the minimum 3-octet VLQ (0x808000) has a value of 16512 instead of zero, which means that the maximum 3-octet VLQ (0xFFFF7F) is 2113663 instead of just 2097151.

In this way, there is one and only one encoding of each integer, making this a base-128 bijective numeration.

Examples

Diagram showing how to convert 106903 from decimal to uintvar representation

Here is a worked-out example for the decimal number 137:

  • Represent the value in binary notation (e.g. 137 as 10001001)
  • Break it up in groups of 7 bits starting from the lowest significant bit (e.g. 137 as 0000001 0001001). This is equivalent to representing the number in base 128.
  • Take the lowest 7 bits, and that gives you the least significant byte (0000 1001). This byte comes last.
  • For all the other groups of 7 bits (in the example, this is 000 0001), set the MSb to 1 (which gives 1000 0001 in our example). Thus 137 becomes 1000 0001 0000 1001, where the bits in boldface are something we added. These added bits denote whether there is another byte to follow or not. Thus, by definition, the very last byte of a variable-length integer will have 0 as its MSb.

Another way to look at this is to represent the value in base-128 and then set the MSB of all but the last base-128 digit to 1.

The Standard MIDI File format specification gives more examples:[2][16]

Integer
(decimal)
Integer (binary) Variable-length quantity (binary) Integer
(hexadecimal)
Variable-length
quantity
(hexadecimal)
0
00000000 00000000 00000000 00000000
                           00000000
00000000 00
127
00000000 00000000 00000000 01111111
                           01111111
0000007F 7F
128
00000000 00000000 00000000 10000000
                  10000001 00000000
00000080 81 00
8192
00000000 00000000 00100000 00000000
                  11000000 00000000
00002000 C0 00
16383
00000000 00000000 00111111 11111111
                  11111111 01111111
00003FFF FF 7F
16384
00000000 00000000 01000000 00000000
         10000001 10000000 00000000
00004000 81 80 00
2097151
00000000 00011111 11111111 11111111
         11111111 11111111 01111111
001FFFFF FF FF 7F
2097152
00000000 00100000 00000000 00000000
10000001 10000000 10000000 00000000
00200000 81 80 80 00
134217728
00001000 00000000 00000000 00000000
11000000 10000000 10000000 00000000
08000000 C0 80 80 00
268435455
00001111 11111111 11111111 11111111
11111111 11111111 11111111 01111111
0FFFFFFF FF FF FF 7F

References

  1. ^ Jianguo Wang; Chunbin Lin; Yannis Papakonstantinou; Steven Swanson. "An Experimental Study of Bitmap Compression vs. Inverted List Compression" Archived 2019-12-07 at the Wayback Machine. 2017. doi:10.1145/3035918.3064007.
  2. ^ a b MIDI File Format: Variable Quantities.
  3. ^ "ITU-T Recommendation X.690" (PDF). International Telecommunication Union. 2002.
  4. ^ DWARF Standard.
  5. ^ Google Protocol Buffers.
  6. ^ Oracle Portable Object Format (POF).
  7. ^ System.IO.BinaryWriter.Write7BitEncodedInt(int) method and System.IO.BinaryReader.Read7BitEncodedInt() method.
  8. ^ Introduction to javascript source maps.
  9. ^ "LLVM Bitcode File Format", section "Variable Width Integers". Accessed 2019-10-01.
  10. ^ Jeff Dean. "Challenges in Building Large-Scale Information Retrieval Systems" (PDF). p. 58. Retrieved 2020-05-30.
  11. ^ Olesen, Jakob Stoklund (31 May 2020). "stoklund/varint". Archived from the original on 19 November 2020. Retrieved 9 July 2020.
  12. ^ "Unreal Packages". 1999-07-21. Archived from the original on 2010-08-20. Retrieved 2021-08-29.
  13. ^ Protocol Buffers: Encoding: Signed Integers.
  14. ^ a b Free Standards Group (December 2005). "DWARF Debugging Information Format Specification Version 3.0" (PDF). p. 70. Retrieved 2009-07-19.
  15. ^ "Git – fast, scalable, distributed revision control system". 28 October 2021.
  16. ^ Standard MIDI-File Format Spec. 1.1.

Read other articles:

Dam in Central Tablelands, New South WalesOberon DamOberon Dam wall and reservoir, with fuse plug spillway in top rightLocation of the Oberon Dam inNew South WalesCountryAustraliaLocationCentral Tablelands, New South WalesCoordinates33°43′S 149°52′E / 33.717°S 149.867°E / -33.717; 149.867StatusOperationalConstruction began1943Opening date1959Owner(s)State Water CorporationDam and spillwaysType of damEmbankment damImpoundsFish RiverHeight34 m...

 

Toy Tinkers Serial Donald DuckBerkas:Toy Tinkers.jpgKartu judulSutradaraJack HannahProduserWalt DisneyMusikPaul J. SmithAnimatorBob CarlsonVolus JonesBill JusticeJack Boyd (efek)StudioWalt Disney ProductionsDidistribusikan olehRKO Radio PicturesLama waktu7 menitNegaraAmerika SerikatBahasaInggris Toy Tinkers adalah sebuah film pendek animasi yang diproduksi dalam bentuk Technicolor oleh Walt Disney Productions dan dirilis ke layar lebar pada 16 Desember 1949 oleh RKO Radio Pictures. Berlatar b...

 

Election 1991 Boston mayoral election ← 1987 November 5, 1991 1993 →   Candidate Raymond Flynn Edward J. Doherty Party Nonpartisan Nonpartisan Popular vote 63,582 21,659 Percentage 74.58% 25.41% Results by wardFlynn:      60–70%      70–80%      80–90% Mayor before election Raymond Flynn Elected Mayor Raymond Flynn Elections in Massachusetts General 1942 1944 1946 1948 1950 1952 19...

Scottish footballer Greg Stewart Stewart with Birmingham City in 2016Personal informationFull name Greg Alexander James Stewart[1]Date of birth (1990-03-17) 17 March 1990 (age 33)Place of birth Stirling, Scotland[2]Height 5 ft 10 in (1.78 m)[3]Position(s) Forward, attacking midfielderTeam informationCurrent team Mumbai CityNumber 24Youth career Rangers2003–2006 HeartsSenior career*Years Team Apps (Gls)2007–2010 Syngenta Amateurs 2010–2014 Cowd...

 

Чотири набори даних мають ідентичні статистичні характеристики, але їх графіки істотно різняться. Квартет Анскомбе складається з чотирьох послідовностей з ідентичними значеннями простих статистичних властивостей, але їхні графіки істотно відрізняються. Кожен набір ск

 

Jewish scribe For other uses, see Sofer (disambiguation). For the talmudic tractate, see Soferim (Talmud). A sofer at work, Ein Bokek, Israel A Middle Eastern sofer sews together the pieces of parchment A sofer, sopher, sofer SeTaM, or sofer STM (Hebrew: סופר סת״ם, scribe; plural of sofer is soferim, סופרים) is a Jewish scribe who can transcribe Sifrei Kodesh (holy scrolls), tefillin (phylacteries), mezuzot (STM, סת״ם, is an abbreviation of these three terms) and other relig...

Toivo Mikael KivimäkiPerdana Menteri FinlandiaMasa jabatan14 Desember 1932 – 7 Oktober 1936PresidenPehr Evind SvinhufvudPendahuluJuho SunilaPenggantiKyösti Kallio Informasi pribadiLahir5 Juni 1886Tarvasjoki, FinlandiaMeninggal6 Mei 1968(1968-05-06) (umur 81)Partai politikPartai Progresif NasionalSunting kotak info • L • B Toivo Mikael Kivimäki (5 Juni 1886 – 6 Mei 1968), J.D., adalah kepala departemen hukum sipil di Universitas Helsinki 1931–1956, Perdana ...

 

Margaret Mead 1948 zum Anlass ihres Vor­trages Some Anthro­pologi­cal Consi­dera­tions Con­cer­ning Guilt auf dem „Second Inter­natio­nal Sym­posium on Fee­lings and Emo­tions“ in den USA Margaret Mead (* 16. Dezember 1901 in Philadelphia, Pennsylvania; † 15. November 1978 in New York) war eine US-amerikanische Ethnologin (cultural anthropologist). Sie gilt als eine der entschiedensten Vertreterinnen des Kulturrelativismus im 2...

 

Село Біндуґапол. Binduga Координати 52°21′ пн. ш. 22°52′ сх. д. / 52.350° пн. ш. 22.867° сх. д. / 52.350; 22.867Координати: 52°21′ пн. ш. 22°52′ сх. д. / 52.350° пн. ш. 22.867° сх. д. / 52.350; 22.867 Країна ПольщаПольщаВоєводство Мазовецьке воєводс...

1925 film by Alfred Santell ClassifiedTheatrical release posterDirected byAlfred SantellScott Beal (assistant)Written byJune MathisBased onClassifiedby Edna FerberProduced byCorinne GriffithStarringCorinne GriffithCinematographyHarold RossonEdited byCyril GardnerDistributed byFirst National PicturesRelease date October 11, 1925 (1925-10-11) Running time7 reelsCountryUnited StatesLanguageSilent (English intertitles) Classified is a 1925 American silent drama film directed by Alf...

 

Railway station in Iyo, Ehime Prefecture, Japan S10 Kushi Station串駅Kushi Station in 2015General informationLocationFutamichokushi, Iyo City, Ehime Prefecture 799-3312JapanCoordinates33°38′50″N 132°33′46″E / 33.6472°N 132.5628°E / 33.6472; 132.5628Operated by JR ShikokuLine(s)     Yosan LineDistance225.0 km (139.8 mi) from TakamatsuPlatforms1 side platformTracks1ConstructionStructure typeAt grade (sidehill cut)Bicycle f...

 

R.Soedarsono[[Bupati Jombang]] 7Masa jabatan1958 – 1962PresidenSoekarnoGubernurR. T. A. MilonoR. Soewondo RanoewidjojoPendahuluM. SoebijaktoPenggantiHassan Wirjokoesoemo Informasi pribadiLahir(1921-09-24)24 September 1921Sambirembe, Karangrejo, Magetan, Jawa Timur , Hindia BelandaMeninggal6 Mei 1997(1997-05-06) (umur 75)RS dr. Soetomo, Surabaya, Jawa TimurMakamTPU Pulo Sampurno, JombangKebangsaanIndonesiaPartai politikGolongan KaryaSuami/istriMy. Roro OentariAnak1. Endang ...

Species of bat Western red bat Scientific classification Domain: Eukaryota Kingdom: Animalia Phylum: Chordata Class: Mammalia Order: Chiroptera Family: Vespertilionidae Genus: Lasiurus Species: L. frantzii Binomial name Lasiurus frantzii(Peters, 1871) Western red bat range in green and yellow The western red bat or desert red bat (Lasiurus frantzii) is a species[1] of microbat in the family Vespertilionidae. It is found in western North America and Central America. Taxonomy Previ...

 

This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Dragon Data – news · newspapers · books · scholar · JSTOR (October 2019) (Learn how and when to remove this template message) Dragon Data Ltd.IndustryComputer hardwareFounded1982Defunct1984FateNo longer in existence (final owner bankrupt)HeadquartersWales, Unit...

 

Suburb in Southern District, Hong KongWong Chuk Hang 黃竹坑涌尾SuburbStaunton CreekA view of commercial and industrial Staunton Creek from the Wong Chuk Hang RoadCountry Hong KongDistrictSouthern DistrictSeatWong Chuk HangGovernment • BodyDistrict Council • District CouncillorTsui Yuen-wa (Democratic Party) Staunton CreekTraditional Chinese黃竹坑涌尾Cantonese YaleWòhngjūkhāangcūngmèi Literal meaningCreek's end of Yellow Bamboo GullyTranscriptio...

Secondary academy in Crawley, West Sussex, EnglandHazelwick School AcademyAddressHazelwick Mill LaneCrawley, West Sussex, RH10 1SXEnglandCoordinates51°07′31″N 0°09′55″W / 51.1252°N 0.1652°W / 51.1252; -0.1652InformationTypeSecondary academyMottoEffort Achieves.Established1952Local authorityWest Sussex County CouncilSpecialistsTechnology, humanities and ITDepartment for Education URN137263 TablesOfstedReportsChair of GovernorsRachel BowronHead teacherAnn Fea...

 

Historic church in North Carolina, United States United States historic placeSt. Frances Methodist ChurchU.S. National Register of Historic Places Show map of North CarolinaShow map of the United StatesLocationOff NC 308, Lewiston, North CarolinaCoordinates36°7′24″N 77°10′40″W / 36.12333°N 77.17778°W / 36.12333; -77.17778Arealess than one acreBuilt1845ArchitectBragg, ThomasArchitectural styleGreek RevivalNRHP reference No.82003426[1]A...

 

Review of the election 1986 United States Senate election in Arizona ← 1980 November 4, 1986 1992 →   Nominee John McCain Richard Kimball Party Republican Democratic Popular vote 521,850 340,965 Percentage 60.48% 39.51% County results McCain:      50–60%      60-70% Kimball:      50–60% U.S. senator before election Barry Goldwater Republican Elected U.S. Senator John McCain Republica...

1965 Indian filmNaanalTheatrical release posterDirected byK. BalachanderWritten byK. BalachanderBased onNaanalby K. BalachanderProduced byG. V. SaravananStarringR. MuthuramanK. R. VijayaMajor SundarrajanSowcar JanakiSrikanthNageshCinematographyP. N. SundaramEdited byN. R. Kittu S. MuthuMusic byV. KumarProductioncompanySaravana PicturesRelease date 24 December 1965 (1965-12-24) Running time135 minutesCountryIndiaLanguageTamil Naanal (transl. Reed) is a 1965 Indian Tamil-la...

 

ألعاب فيديو مستقلة (بالإنجليزية: Independent video games)‏ في العادة تسمى «إندي جيمز» (indie games) هي ألعاب فيديو يتم إنشاؤها من قبل أفراد أو فرق صغيرة من دون وجود دعم مالي من ناشر ألعاب فيديو .[1][2][3] الألعاب المستقلة غالبا ما تركز على الابتكار والاعتماد على التوزيع الرقمي. شهد ...

 

Strategi Solo vs Squad di Free Fire: Cara Menang Mudah!