Compression in general
The problem
Although modern PCs tend to come with large hard drives, it is still all too easy to run out of disk space. A further problem arises when you send or receive files via the Internet: a big file can take a long time to transfer, especially on a slow connection. So what can be done to help? The answer is to compress the files so that they take up less room and take less time to send.
What is Compression?
Compression is the reversible or non-reversible process of reducing the size of a file by encoding its data, performed so that the data can be stored or transmitted more efficiently. It can be applied to ordinary data files, but also to a special kind of data: binary files such as executables and DLLs. Either way, the result is a reduction in the number of bits and bytes, leading to a smaller file size. The size of the data in compressed form relative to its original size is known as the compression ratio. Ratios can differ widely depending on the algorithm used and on the nature of the file being compressed.
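As a small illustration of the compression ratio, the sketch below compresses some deliberately repetitive sample text with Python's standard zlib module and divides the compressed size by the original size. The sample text and the helper name compression_ratio are made up for this example; real files will give different figures.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data relative to the original (smaller is better)."""
    return len(compressed) / len(original)


# Illustrative input: highly repetitive text, so it compresses very well.
text = b"compression makes files smaller. " * 200
packed = zlib.compress(text)

ratio = compression_ratio(text, packed)
print(f"original: {len(text)} bytes, compressed: {len(packed)} bytes, ratio: {ratio:.3f}")

# The conversion is reversible here: decompressing gives back the exact input.
assert zlib.decompress(packed) == text
```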
How to use compression?
One way is to use programs that are designed to compress and uncompress files. Once compressed, files cannot generally be used until they are decompressed again, so this kind of compression is good for archiving or for emailing. A well-known example of a compression technology is ZIP, a common standard for compressing data files. For binaries, this approach does not work, because a compressed executable could no longer be started directly: to run, it has to be self-contained (see below how this is solved for binaries).
Compression is also used in many cases without the user realizing it. A modem uses a form of compression when it sends and receives data. You may have noticed that even when connected at 32K, which ought to limit download speeds to around 3.5K a second, you often see double that speed when downloading text or other highly compressible files. Another place where it happens transparently is with graphics files.
How does compression work?
When you have a file containing text, there can be repeated words, word combinations and phrases that use up storage space unproductively. Or the file can contain media, such as detailed graphical images, whose data occupies a great deal of space. To reduce this inefficiency, you can compress the document.
Compression is done with compression algorithms that rearrange and reorganize data so that it can be stored more economically. By encoding information, data can be stored using fewer bits. This is done by a compression/decompression program that temporarily alters the structure of the data for transport, archiving, storage, etc.
Compression reduces the size of information by representing it in different, more efficient ways. Methods may include simply removing space characters, using a single character to identify a string of repeated characters, or substituting shorter bit sequences for frequently recurring characters. Some compression algorithms delete information altogether to achieve a smaller file size. Depending on the algorithm used, files can be reduced moderately or greatly from their original size.
If the inverse of the process, decompression, produces an exact replica of the original data then the compression is lossless. Lossy compression, usually applied to image data, does not allow reproduction of an exact replica of the original image, but has a higher compression ratio. Thus lossy compression allows only an approximation of the original to be generated.
Lossy compression
Lossy compression shrinks files by throwing away bits of data that hopefully won't be noticed. MP3 is such a system. It relies on the psycho-acoustic way the brain interprets audio and uses various tricks to produce something which sounds almost the same but is actually missing as much as 90% of the data. Another lossy system is JPEG (file extension JPG), which is designed to provide high compression on photographic images.
For instance, in an image containing a green landscape with a blue sky, the many slightly different shades of blue and green are merged into a few during compression. The essential nature of the data isn’t lost, as the essential colours are still there. Large portions of the picture will then be the same colour, perhaps whole lines, and such runs can be stored very compactly. Rather than storing an entire row of perhaps 800 identical pixels, each needing two bytes for the colour (allowing a maximum of 65,536 possible colours), which would take 2 x 800 or 1600 bytes, you could store two bytes for the colour, a code byte that means 'repeat this many times' and another two bytes to store the 800. That ends up as 5 bytes to store what used to be 1600, a huge saving.
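The 'repeat this many times' idea can be sketched as a simple run-length encoder. This is only an illustration under assumed conventions: each run is stored as a two-byte colour, a one-byte code and a two-byte count, and the REPEAT code value is invented for this example rather than taken from any real image format. Note that this run-length step is itself lossless; the loss in the example above comes from merging similar shades beforehand.

```python
import struct

REPEAT = 0x01  # hypothetical code byte meaning "repeat this colour N times"
RUN = "<HBH"   # 2-byte colour, 1-byte code, 2-byte count = 5 bytes per run


def rle_encode(pixels: list) -> bytes:
    """Store each run of identical 16-bit pixel values as one 5-byte record."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        colour = pixels[i]
        run = 1
        # Extend the run while the colour repeats (count must fit in 2 bytes).
        while i + run < len(pixels) and pixels[i + run] == colour and run < 0xFFFF:
            run += 1
        out += struct.pack(RUN, colour, REPEAT, run)
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> list:
    """Expand every (colour, code, count) record back into pixels."""
    pixels = []
    for colour, code, run in struct.iter_unpack(RUN, data):
        assert code == REPEAT
        pixels.extend([colour] * run)
    return pixels


row = [0xFFFF] * 800          # one row of 800 identical pixels
encoded = rle_encode(row)
print(len(row) * 2, "bytes raw,", len(encoded), "bytes encoded")
assert rle_decode(encoded) == row
```

A row with many different colours gains nothing from this scheme, since every short run still costs 5 bytes.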
Lossless Compression
Lossless Compression is a type of compression that can reduce files without a loss of information in the process. The original file can be recreated exactly when uncompressed. To achieve this, algorithms create reference points (substitution characters) for things such as textual patterns, store them in a catalogue and send them along with the smaller encoded file. When uncompressed, the file is “re-generated” by using those documented reference points to re-substitute the original information.
Lossless compression is ideal for documents containing text and numerical data where any loss of textual information can’t be tolerated. ZIP compression, for instance, is a lossless compression that detects repeated patterns and replaces them with shorter codes. LZMA compression is another example of lossless compression. This is also the kind of compression used in lARP64Free and lARP64Pro (see more below).
This seemingly impossible task relies on the fact that most files contain large amounts of blank space or repetitive data. As an example, note that in the text you are reading right now, the word 'compression' appears over and over again, each occurrence taking 11 bytes of storage. A compression system could note this and, after the first occurrence, rather than store the actual word, store a one-byte indicator to say it is a repeated word plus a byte to indicate which word it is. The result is that each occurrence of 'compression' now needs 2 bytes, not 11, a saving of 9 bytes or over 80% for that word. If you now repeat that process for the 256 most common words, you can make quite a difference to the size of the file. When you decompress the file, the decompression program finds these codes for repeated words and restores the full words in their place, thus restoring the document to its original size and content.
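The word-substitution scheme described above can be sketched as follows. This is a toy illustration, not how ZIP or LZMA actually work: the marker byte, the function names and the sample sentence are all assumptions made for the example, and the sketch assumes the marker byte never occurs in the input.

```python
import re
from collections import Counter

MARKER = b"\x00"  # assumed never to occur in the input text
WORD = re.compile(rb"[A-Za-z]+")


def build_table(text: bytes, size: int = 256) -> list:
    """Pick up to 256 frequent words; only words longer than the 2-byte code pay off."""
    counts = Counter(w for w in WORD.findall(text) if len(w) > 2)
    return [w for w, _ in counts.most_common(size)]


def encode(text: bytes, table: list) -> bytes:
    """Replace each table word with MARKER plus a one-byte table index."""
    index = {w: i for i, w in enumerate(table)}

    def substitute(match):
        word = match.group(0)
        return MARKER + bytes([index[word]]) if word in index else word

    return WORD.sub(substitute, text)


def decode(data: bytes, table: list) -> bytes:
    """Restore the full words wherever a MARKER + index pair is found."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER[0]:
            out += table[data[i + 1]]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


text = b"Lossless compression is compression without loss. Good compression exploits repetition."
assert MARKER not in text
table = build_table(text)
encoded = encode(text, table)
print(len(text), "bytes before,", len(encoded), "bytes after (table not counted)")
assert decode(encoded, table) == text
```

The word table has to be stored or sent along with the encoded text, so on a very short input like this one the overall saving is smaller than the printed figures suggest.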
Results
The success of data compression depends largely on the data itself and some data types are inherently more compressible than others. Generally some elements within the data are more common than others and most compression algorithms exploit this property, known as redundancy. The greater the redundancy within the data, the more successful the compression of the data is likely to be. Digital video contains a great deal of redundancy and thus is very suitable for compression.
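To see how strongly the result depends on redundancy, the sketch below compresses two inputs of the same size with Python's standard zlib module: one made of a repeated phrase and one made of random bytes. The inputs are invented for the illustration; the exact sizes vary from run to run for the random data.

```python
import os
import zlib

# Highly redundant input: the same short phrase repeated.
redundant = b"blue sky " * 1000            # 9000 bytes
# Input with essentially no redundancy: random bytes of the same length.
random_data = os.urandom(len(redundant))

for name, data in (("redundant", redundant), ("random", random_data)):
    packed = zlib.compress(data)
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
    # Lossless in both cases, however well or badly the data compresses.
    assert zlib.decompress(packed) == data
```

The redundant input shrinks to a small fraction of its size, while the random input stays at roughly its original size or even grows slightly.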
A device (software or hardware) that compresses data is often known as an encoder or coder, whereas a device that decompresses data is known as a decoder. A device that acts as both a coder and decoder is known as a codec.
A great number of compression techniques have been developed and some lossless techniques can be applied to any type of data. Development, in recent years, of lossy techniques specifically for image data has contributed a great deal to the realisation of digital video applications.
So much for compression in general. What about compression of binaries?
Software or executable and DLL compression
Software compression mainly means executable and DLL compression: any means of compressing an executable (or DLL) file and combining the compressed data with the decompression code it needs into a single executable (or DLL). This decompression code that is added to the compressed data is often called the decompression stub. Running a compressed executable essentially means that the decompression stub unpacks the original executable code before passing control to the restored original binary. The effect is the same as if the original uncompressed executable had been run. To the casual user, compressed and uncompressed executables are indistinguishable.
The act of compressing an executable file is often referred to as "packing"; a program that compresses executables is accordingly called a "packer".
A compressed executable is a kind of self-extracting archive, where compressed data is packaged together with the decompression stub into an executable file. So no separate program is needed to run a compressed executable file. Most packed executables decompress directly into memory and need no free file system space to start. However, some decompression stubs are known to write the uncompressed executable to the file system in order to start it.
Software distributors use executable compression for a variety of reasons, primarily to reduce the secondary storage requirements of software. Executable compressors are specifically designed to compress executable code, which is why they often achieve a better compression ratio than standard data compression facilities. This allows software distributors to stay within the constraints of their chosen distribution media, or to reduce the time and bandwidth customers require to access software distributed via the Internet.
Executable compression is also frequently used to deter reverse engineering or to obfuscate the contents of the executable by proprietary methods of compression and/or added encryption. Malware is often compressed to hide its presence from antivirus scanners. Executable compression can be used to prevent direct disassembly, mask string literals and modify signatures. However, executable compression does not eliminate the chance of reverse engineering; it can only make the process more costly. In general, compression alone is certainly insufficient to prevent cracking.
Compressed software requires less storage space in the file system, and thus less time to transfer data from the file system into memory. On the other hand, it requires some time to decompress the data before execution begins. However, the speed of various storage media has not kept up with average processor speeds, so storage is very often the bottleneck. Thus the compressed executable will load faster on most common systems. On modern desktop computers, this is rarely noticeable unless the executable is unusually big, so loading speed is not a primary reason for or against compressing an executable.
Software compression makes it possible to store more software in the same amount of space, without the hassle of having to manually unpack an archive file every time the user wants to use the software.
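The stub idea can be illustrated with a loose analogy in Python, using a script in place of a native executable. This is only a sketch of the principle and not how a real executable packer works: the pack function and the sample program are invented for the example. The packed script carries its own compressed payload plus a few lines of stub code that unpack the original in memory and then hand control to it.

```python
import base64
import zlib


def pack(source: str) -> str:
    """Return a self-contained script: a tiny stub plus the compressed original."""
    payload = base64.b64encode(zlib.compress(source.encode("utf-8"), 9)).decode("ascii")
    return (
        "import base64, zlib\n"
        f"_payload = {payload!r}\n"
        "# Stub: unpack the original program in memory, then pass control to it.\n"
        "exec(zlib.decompress(base64.b64decode(_payload)).decode('utf-8'))\n"
    )


# A stand-in for the "original executable".
original = "greeting = 'hello from the original program'\nprint(greeting)\n"
packed = pack(original)

# Running the packed script has the same effect as running the original.
namespace = {}
exec(packed, namespace)
assert namespace["greeting"] == "hello from the original program"
```

For a program this small the stub and encoding overhead outweigh the saving, which is why a small stub size matters for a packer.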
lARP64Free and lArp64Pro
lARP64Free, a 64-bit software compression program, and lArp64Pro, a 64-bit software protection program, both have built-in compression based on the Lempel-Ziv-Markov chain algorithm (LZMA), a compression algorithm that uses a dictionary compression scheme. It combines a high compression ratio with a small stub size and good compression and decompression speed, which makes it well suited to executable compression.
Why choose LZMA?
LZMA is an excellent compression algorithm when size matters. It does a very good job of compressing executable files, usually providing better compression than other algorithms, especially on larger files. LZMA is an adaptation of LZ77 designed for a high compression ratio and fast decompression. It uses range encoding (a form of entropy coding related to arithmetic coding, not Huffman coding) and a variable dictionary size.
lARP64Tech chose LZMA because of these advantages.
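For readers who want to try LZMA themselves, Python's standard lzma module exposes the algorithm, including the dictionary size, through its filter options. The sketch below is not related to how lARP64Free or lArp64Pro are implemented; the sample data and the 1 MiB dictionary size are arbitrary choices for the illustration.

```python
import lzma
import zlib

# Illustrative input only; real executables behave differently.
data = b"".join(b"record %06d: compression test line\n" % i for i in range(20000))

# LZMA2 filter with an explicitly chosen dictionary size (1 MiB here).
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 1 << 20}]
packed_lzma = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# zlib (the Deflate family used by ZIP) for comparison.
packed_zlib = zlib.compress(data, 9)

print(f"original: {len(data)} bytes")
print(f"lzma:     {len(packed_lzma)} bytes")
print(f"zlib:     {len(packed_zlib)} bytes")

# Both are lossless: the original comes back byte for byte.
assert lzma.decompress(packed_lzma) == data
assert zlib.decompress(packed_zlib) == data
```

Which of the two comes out smaller, and by how much, depends on the input, so it is worth measuring on the files you actually care about.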