Wednesday, December 5, 2007

MySQL Streaming Result Set

WTF? I thought this was a Java blog. Well, yes, it is. And, if you're just mucking around with java, or using it for a small application or research project, you probably won't have to deal with java.sql. But, if you're building most any kind of decent sized java application in the business world, it's hard to avoid using a database (or ten). We've been generally happily using JDBC and Hibernate to communicate with MySQL. That is, until one of my grab-all-the-data-from-a-largish-table queries died with an OutOfMemoryError before returning a single result. I was confused. How could that be? Could the ResultSet be consuming gigs worth of memory? Why, yes, it can, I quickly learned. A bit more searching and I happened upon the MySQL JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
Ouch! But, that last sentence provides hope. On my plate was a project to write a tool to analyze site activity which was stored in a table with tens of millions of rows. Would be a bit clunky if I couldn't stream/iterate through the data. Here's the magic incantation:
To enable this functionality, you need to create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.
In fact, you don't need to specify forward-only and read-only as they are the defaults, but the Integer.MIN_VALUE fetch size is essential. This signals the driver to stream the data rather than send it all-at-once. There are caveats to this approach. Be sure to read the ResultSet section of the MySQL JDBC API Implementation Notes before using this for serious work. Also, note that if there are delays in processing the streaming data that your network timeout values are sufficiently large. The mysqld net_write_timeout variable should be set at least as large as the longest possible delay. Also, there appear to be undocumented issues with streaming from one table and using a separate connection to read data from that same table. I've observed odd behavior in this case, such as the ResultSet being set to null. Best to ask for all the required data in the initial streaming query.

Wednesday, November 14, 2007

Map.containsKey(Object key)

The fact that Map methods don't consistently take typed (K and/or V) or untyped (Object) argument(s) is baffling to me. Didn't Java 5 introduce generics for a reason? The put() method only accepts arguments of type K and V, yet containsKey() accepts any Object. Why the inconsistency? If the containsKey argument isn't of type K, it's an error (as indicated by the fact that containsKey throws ClassCastException). Wasn't one of the reasons for introducing generics to catch such errors at compile time?

What's worse (for me, at least) is that this odd mish-mash of strong and weak typing introduces an opportunity for bugs that couldn't exist before. A dilemma I recently found myself in was a HashMap which didn't seem to be able to determine when two keys were equal to each other. I was using a key type of my own construction, call it CustomKey. The problem was that I hadn't defined an equals(Object) method. It wasn't that I didn't think I needed to override the Object equals() method. Rather, I thought that the Map would be calling equals(CustomKey). The keys were typed as such, so I couldn't imagine that HashMap would be casting them to Object before calling equals. But, that's exactly what Map does. Well, it's a bit more subtle than that. HashMap doesn't explicitly cast the key to an Object. Rather, put(K,V) calls containsKey(Object) to determine whether the key already exists. Calling containsKey(Object) on the key effectively casts it to Object so that when equals is called within containsKey, the equals(CustomKey) is ignored in favor of equals(Object).

A careful read of the API docs will make it obvious that you should override equals(Object) for any custom key. But, I wonder why they had to make it so non-intuitive? Actually, nevermind, I can make a well-educated guess: backwards compatibility. Though, if that's the case, I must wonder: why update the put() method for genetics, but not containsKey(), containsValue(), get(), and remove()? And, I further wonder---why not explain the rationale in the API documentation? Even in the fifth edition of Java In A Nutshell by David Flanagan I haven't been able to find a relevant discussion...

Monday, June 4, 2007

java.util.Calendar.clone() hasn't been updated for 5.0

java.util.Calendar.clone() returns Object. What?!?!?!? Are you crazy? This function overrides Object.clone(), so there's no need to use an Object return type. The return type will *always* be Calendar. And, Java 5.0 specifically allows for this (return type to be a subclass of overridden method's return type). In "Java in a Nutshell", David Flanagan notes (pg. 120, 5th ed.):
In Java 5.0 and later, the return type of the overriding method may be a subclass of return type of the overridden method instead of being exactly the same return type. This is known as a covariant return and is described in "Covariant Return Types" in Chapter 2.
Hopefully the Object return type is an oversight that will be corrected in Java 6.

Tuesday, May 29, 2007

java.util.concurrent.ConcurrentHashtable

The J2SE 1.5.0 java.util.concurrent.ConcurrentHashMap API states:
This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. [...] Like Hashtable but unlike HashMap, this class does NOT allow null to be used as a key or value.
My question for whomever chose the name of this class: Why the bloody hell did you call it ConcurrentHashMap and not ConcurrentHashtable?!?!

Tuesday, May 15, 2007

Overriding an Instance Field Should be a Compile Error

Consider this scenario:
class A {
String s = "A";
void bar() {
  System.out.println(s);
}
}

class B extends A {
String s = "B";
}

public static void main(String[] args) {
B foo = new B();
foo.bar();
}
What gets printed, A or B? The answer is A. I expected B. Since B doesn't define bar(), B inherits the definition from A. My OOP intuition says that B should do just that---inherit the definition of bar(), and call it as if it were defined in B. Clearly, this isn't what Java does. Creation of constructors for A, "A(){System.out.println("A()");}", and B, "B(){}", provide insight into what Java is doing. Running main() with these constructors yields "A()" on standard out. I.e. even though the constructor for A is never explicitly called, it is still being called---Java is creating a phantom instance of A which handles inherited function calls. The problem here is that class B is hiding the s field of class A. I.e. two s fields are instantiated, one for A and one for B. The foo.bar() call executes the class A function, so it uses the class A field. What I find a bit disturbing is that Java allows field hiding. What's the point? The new field can't affect invocations of superclass methods. So, the new field is sufficiently distinct from the superclass field so as to deserve a different name. Seems to me that field hiding only creates confusion. Update 12/18/07: Field hiding ("Hiding Superclass Fields") is discussed by Flanagan. Also, see "Overriding Superclass Methods".