2017  Kodetalk | Feedback | Privacy Policy | Terms | About
userimage

Compressed String of Java 9 will reduce String memory consumption for a better performance.

Before Java 9, while creating any string object it's internally creates array of char (char[]) and stores the value. Size of char is 16 byte in java while to store a char requires 1 byte[8 bit] of space. So from Java 9 updating the String storage logic to byte[] using ISO-8859-1/Latin-1 as they contain no special characters. From now strings could be represented with just one byte per character, that would mean just half of the memory would be used.


String instances usually stored on the heap which takes a big portion of the heap memory as internally it stores as char[]. So making String twice as small would mean not only a significant memory consumption reduction, but also a significant reduction of Garbage Collection overhead.


UseCompressedStrings feature was rather conservative: while distinguishing between char[] and byte[] case, and trying to compress the char[] into byte[] on String construction, it done most String operations on char[], which required to unpack the String. Therefore, it benefited only a special type of workloads, where most strings are compressible (so compression does not go to waste), and only a limited amount of known String operations are performed on them (so no unpacking is needed). In great many workloads, enabling -XX:+UseCompressedStrings was a pessimization.


UseCompressedStrings implementation was basically an optional feature that maintained a completely distinct String implementation in alt-rt.jar, which was loaded once the VM option is supplied. Optional features are harder to test, since they double the number of option combinations to try.


Java 9 Compact Strings


While the implementation of Compressed Strings was flawed in many ways, the main idea was still valid. The implementation was just not solid enough. In Java 9, a new feature was introduced as a replacement of Compressed Strings - Compact Strings.


Instead of having char[] array, String is now represented as byte[] array. Depending on which characters it contains, it will either use UTF-16 or Latin-1, that is either one or two bytes per character. There is a new field inside the String class coder, which indicates which variant is used. Unlike Compressed Strings, this feature is enabled by default. If necessary (in a case where there are mainly UTF-16 Strings used), it can still be disabled by -XX:-CompactStrings.


The change does not affect any public interfaces of String or any other related classes. Many of the classes were reworked to support the new String representation, such as StringBuffer or StringBuilder.


Performance Impact


Unlike Compressed Strings, the new solution does not contain any String repacking, thus should be much more performant. In addition to significant memory footprint reduction, it should provide a performance gain when processing 1-byte Strings as there is much less data to be processed. Garbage Collection overhead will be reduced as well. Processing of 2-byte string does mean a minor performance hit, because there is some additional logic for handling both cases for Strings. But overall, performance should be improved as 2-byte Strings should represent just a minority of all String instances. In case there are performance issues in situations, where the majority of Strings are 2-byte, the feature can always be disabled.


Conclusion


Java 9 introduces a new feature, which does reduce the memory footprint of String to half in most of the cases. It is backward compatible, no public interfaces were changed. If required, it can be disabled by an -XX flag. It is a successor to the Compressed Strings from Java 6.