Friday, April 17, 2009

UseCompressedOops

I learned of a nifty Hotspot option recently,
-XX:+UseCompressedOops
It sounds like this largely solves the bloat problem that hits when you move from 32-bit to 64-bit Java. The option tells Hotspot to allow heap sizes up to 32 gigabytes, to use the eight extra registers available with AMD64, and to use 32-bit pointers/references whenever possible to limit bloat. From what I've heard, tests indicate that the bloat is much smaller (~10%, compared to the 50%+ you get with vanilla 64-bit) and that performance is comparable. Ismael Juma has a longer post that fills in the details. Note that, as of recently, this option was only available in the Hotspot "performance" release, but it is slated to be added to the main release.
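Enabling it is just a matter of adding the flag to the java command line. For example (the heap size and jar name here are placeholders):

java -Xmx8g -XX:+UseCompressedOops -jar myapp.jar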

Tuesday, August 12, 2008

Garbage Collection, or an OutOfMemoryError Guide

I've had my share of struggling with the Java Hotspot GC and I'm sure I'm not alone. I see semi-regular postings about OutOfMemoryErrors (OOMEs) on the solr-user list. Many of these are as simple as someone not realizing that they need to pass a max heap parameter (-Xmx) to the jvm to run any kind of serious application, but occasionally the issues are more subtle. Here are some tips and notes that I've put together while building applications using Sun's Hotspot JVM.

  1. If you haven't already, turn on GC logging by passing the following options to the jvm: "-XX:+PrintGC -XX:+PrintGCDetails"
    • It is tremendously easier to debug GC/OOME problems if you can see what the GC is doing.
    • For each "Full GC", you'll see three sizes for each of the three generations (young, old, perm): (1) space occupied before the GC, (2) space occupied after the GC, and (3) size of the generation. If #2 is close to #3 for old or perm, you likely need to increase the maximum size for that generation.
  2. The Permanent Generation stores class information and interned strings. If you need to change the max size, use the option -XX:MaxPermSize (e.g. -XX:MaxPermSize=150m).
  3. The New and Old generations share the heap space. If you get a heap space OOME or your old gen space is getting low, you need to do one of two things: (1) increase the max heap size using -Xmx (e.g. -Xmx1g), or (2) increase -XX:NewRatio (e.g. -XX:NewRatio=8), which determines how the heap is split between the new and old generations. If you're on 32-bit, try #1; if you're on 64-bit, try both.
  4. 64-bit JVM objects take up more space than 32-bit JVM objects. The size difference varies by object, but don't be surprised if you need 50% more heap when moving from a 32-bit JVM to a 64-bit one (see the size comparisons in the March 14 post below).
The need to tune NewRatio was a recent discovery for us and seems to be somewhat specific to the 64-bit PC platform. Hotspot uses a default of 8 for x86 "-server", which means that, for a 1 gig heap, the new generation is limited to 114m and the old generation is limited to 910m. However, for amd64, this value defaults to 2, or 341m for NewGen and 683m for OldGen. Any long-lived object (which isn't an interned string) will make it to the old generation before long, so you can easily get an OOME with 700m worth of non-transient data.
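Putting these options together, an invocation might look like the following (the sizes are illustrative, not recommendations, and the jar name is a placeholder):

java -Xmx1g -XX:NewRatio=8 -XX:MaxPermSize=150m -XX:+PrintGC -XX:+PrintGCDetails -jar myapp.jar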

Friday, March 14, 2008

It gets worse

Yesterday, I described my surprise at learning the true sizes of various Java Objects. What I didn't mention is that the sizes I was looking at were for a 32-bit JVM. Later yesterday, I tested out the same code on a 64-bit JVM. Egad. It gets worse. Sizes for various Java Objects:
                       32-bit   64-bit
Object                      8       16
Integer                    16       24
Long                       16       24
Float                      16       24
Double                     16       24
String                     40       64
Date                       24       32
Calendar                  432      544
byte[0]                    16       24
byte[32]                   48       56
byte[128][0]             2576     4120
ArrayList<Integer>(1)      56       96
ArrayList<Integer>(2)      80      128
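
For the curious, here's a rough sketch of how such numbers can be measured, in the spirit of the Java Tip 130 SizeOf code (not the exact code I ran; results are approximate and vary by JVM and settings):

public class SizeOf {
    private static final int COUNT = 100000;

    public static void main(String[] args) throws Exception {
        Object[] objects = new Object[COUNT]; // allocate the slots up front
        long before = usedMemory();
        for (int i = 0; i < COUNT; i++) {
            objects[i] = new java.util.GregorianCalendar();
        }
        long after = usedMemory();
        System.out.println("approx bytes/object: " + (after - before) / COUNT);
        // Reference the array afterward so the objects stay reachable
        // during the second measurement.
        System.out.println("sampled " + objects.length + " objects");
    }

    private static long usedMemory() throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        // Several GC passes to settle the heap before measuring.
        for (int i = 0; i < 4; i++) {
            rt.gc();
            Thread.sleep(100);
        }
        return rt.totalMemory() - rt.freeMemory();
    }
}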

Thursday, March 13, 2008

The size of a java.util.Calendar is 432 bytes

Calendars are cool. They're one of the few objects (in any language) that I consider to have solved the datetime problem. So many datetime objects require extra work from the programmer for (seemingly) simple queries such as determining whether x occurred before or after a month ago. You typically can't simply add a year/month/date/hour/etc., and you always have to be careful about whether an index value is 0-, 1-, or -1900-based. Calendar lets you focus on the date operations you are performing rather than devising tricks to get around other people's laziness. You can add just about any date/time quantity, there are static fields for month values, and you can compare two Calendars via the before and after methods.

But, recently, my love of Calendars took a big hit. This was shortly after I stumbled upon Java Tip 130: Do you know your data size? I was tempted to discount this article due to its age (5.5 years); so much has changed since then in the Java world. But, after running the SizeOf code and discovering that memory consumption for the various basic objects Vladimir Roubtsov tested hadn't changed, I re-read his article carefully and took every word to heart. That an empty String consumes 40 bytes took me by surprise. But it wasn't until I tested Calendar that I was truly shocked. In the current system I'm working on, I am storing hundreds of thousands of objects with Calendars as fields. No wonder they consumed so much memory! Now I understand whence that outlandish consumption came.

Going forward, I won't stop using Calendars, but I'll be much more careful about how I use them. No more throwing a Calendar into an object that I'll allocate thousands of times...
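One mitigation I plan on: store a primitive long of epoch millis in frequently-allocated objects and only build a Calendar when date arithmetic is actually needed. A quick sketch (the class and field names are made up for illustration):

import java.util.Calendar;
import java.util.GregorianCalendar;

public class LogRecord {
    // 8 bytes per record instead of a ~432-byte (32-bit JVM) Calendar field.
    private final long timestampMillis;

    public LogRecord(long timestampMillis) {
        this.timestampMillis = timestampMillis;
    }

    // Materialize a Calendar only when date arithmetic is needed.
    public Calendar toCalendar() {
        Calendar cal = new GregorianCalendar();
        cal.setTimeInMillis(timestampMillis);
        return cal;
    }
}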

Monday, January 28, 2008

The Subtle NullPointerException

Most NullPointerExceptions are blatantly obvious, like the attempted dereferencing of a null object. For someone like me who is well-rooted in C/C++, this is a completely natural thing to look for. Recently I discovered a type of NPE in Java that I wasn't ready for. Before I describe it, I'll ask a question: what happens when you pass a null Integer as an argument to a function that accepts int? Yep, NPE. What's a bit subtle about this NPE is that it happens in the context of the calling function. That is, the stack trace does not reference the int-argument function at all: Java tries to auto-unbox the null Integer to an int and fails, throwing an NPE before the int-argument function is ever called. It took me a bit of brain-thrashing before I realized what I had stumbled upon. 'course, the good thing is that once you've seen something like this once, you're unlikely to be tripped up again. One change I've decided to make in my coding is to reduce/eliminate int function arguments. There's enough overhead in a function call to make the Integer/int choice a moot point. And using Integer yields a clearer stack trace and makes it explicit that nulls are a concern.
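Here's a minimal example that reproduces it (the names are mine, just for illustration):

public class UnboxingNpe {
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        Integer value = null;
        // The NPE is thrown here, in main, while auto-unboxing value;
        // square never appears in the stack trace.
        System.out.println(square(value));
    }
}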

Wednesday, January 23, 2008

MySQL Streaming Result Set, Part II

When I get streaming result sets from mysql, I sometimes have to perform operations that take minutes to complete. Occasionally, I get java.io.EOFException: Can not read response from server. As I've learned, this is due to the mysql net_write_timeout setting. If you wait more than this many seconds between reads of the streaming result set, mysql may close the connection. However, setting net_write_timeout to a large value globally (such as in the my.cnf file) may be dangerous. Fortunately, there is a Connector/J property which will set net_write_timeout to a specified value only for streaming result sets. That property is netTimeoutForStreamingResults. By default, it is set to 600 seconds. To change it, you will need to be using Connector/J 5.1.0 or later. We set this value in our connection url.
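For example (host, database, and timeout value are placeholders):

jdbc:mysql://dbhost/mydb?netTimeoutForStreamingResults=3600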

Wednesday, December 5, 2007

MySQL Streaming Result Set

WTF? I thought this was a Java blog. Well, yes, it is. And, if you're just mucking around with Java, or using it for a small application or research project, you probably won't have to deal with java.sql. But, if you're building almost any decent-sized Java application in the business world, it's hard to avoid using a database (or ten). We've generally been happy using JDBC and Hibernate to communicate with MySQL. That is, until one of my grab-all-the-data-from-a-largish-table queries died with an OutOfMemoryError before returning a single result. I was confused. How could that be? Could the ResultSet be consuming gigs worth of memory? Why, yes, it can, I quickly learned. A bit more searching and I happened upon the MySQL JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
Ouch! But, that last sentence provides hope. On my plate was a project to write a tool to analyze site activity which was stored in a table with tens of millions of rows. Would be a bit clunky if I couldn't stream/iterate through the data. Here's the magic incantation:
To enable this functionality, you need to create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.
In fact, you don't need to specify forward-only and read-only since they are the defaults, but the Integer.MIN_VALUE fetch size is essential: it signals the driver to stream the data rather than send it all at once. There are caveats to this approach. Be sure to read the ResultSet section of the MySQL JDBC API Implementation Notes before using this for serious work. Also, if there are delays in processing the streaming data, make sure your network timeout values are sufficiently large; the mysqld net_write_timeout variable should be set at least as large as the longest possible delay. Finally, there appear to be undocumented issues with streaming from one table while using a separate connection to read data from that same table. I've observed odd behavior in this case, such as the ResultSet being set to null. Best to ask for all the required data in the initial streaming query.
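To tie the pieces together, here's a minimal sketch of a streaming query (the connection URL, credentials, and table are placeholders; the netTimeoutForStreamingResults property is covered in the Part II post above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingExample {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://dbhost/mydb?netTimeoutForStreamingResults=3600",
                "user", "password");
        try {
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE); // signal row-by-row streaming
            ResultSet rs = stmt.executeQuery("SELECT id, url FROM activity");
            while (rs.next()) {
                // Each row is fetched as needed; the full result set is
                // never held in memory at once.
                System.out.println(rs.getLong(1) + " " + rs.getString(2));
            }
            rs.close();
            stmt.close();
        } finally {
            conn.close();
        }
    }
}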