Wednesday, December 5, 2007

MySQL Streaming Result Set

WTF? I thought this was a Java blog. Well, yes, it is. And, if you're just mucking around with java, or using it for a small application or research project, you probably won't have to deal with java.sql. But, if you're building most any kind of decent sized java application in the business world, it's hard to avoid using a database (or ten). We've been generally happily using JDBC and Hibernate to communicate with MySQL. That is, until one of my grab-all-the-data-from-a-largish-table queries died with an OutOfMemoryError before returning a single result. I was confused. How could that be? Could the ResultSet be consuming gigs worth of memory? Why, yes, it can, I quickly learned. A bit more searching and I happened upon the MySQL JDBC API Implementation Notes:
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
Ouch! But, that last sentence provides hope. On my plate was a project to write a tool to analyze site activity which was stored in a table with tens of millions of rows. Would be a bit clunky if I couldn't stream/iterate through the data. Here's the magic incantation:
To enable this functionality, you need to create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.
In fact, you don't need to specify forward-only and read-only as they are the defaults, but the Integer.MIN_VALUE fetch size is essential. This signals the driver to stream the data rather than send it all-at-once. There are caveats to this approach. Be sure to read the ResultSet section of the MySQL JDBC API Implementation Notes before using this for serious work. Also, note that if there are delays in processing the streaming data that your network timeout values are sufficiently large. The mysqld net_write_timeout variable should be set at least as large as the longest possible delay. Also, there appear to be undocumented issues with streaming from one table and using a separate connection to read data from that same table. I've observed odd behavior in this case, such as the ResultSet being set to null. Best to ask for all the required data in the initial streaming query.

5 comments:

Unknown said...

If I stream result sets row-by-row and meanwhile I want to issue any other queries on the connection. How can I do it?

I see someone uses the following statements:

stmt = m_conn.createStatement(ResultSet.FETCH_FORWARD, ResultSet.CONCUR_UPDATABLE);
stmt.setFetchSize(Integer.MIN_VALUE);

But I am not sure whether it is right.

Jason said...

I recall having problems with trying to perform other queries while reading from the streaming result set. Have you tried using a separate connection for the 2nd query?

Shashikant Kore said...

This was helpful. Thank you.

Minor caveat. This doesn't seem to work with PreparedStatement.

Ketan said...

@Pussinboots: That's exactly the problem I encountered in a legacy app that I am maintaining. The database access component was using a shared Connection & ResultSet for executing queries. Turned out that although the logic was explicitly closing the ResultSet before executing the next query, it broke down when multiple threads were calling it and it would lead to a "Streaming result set still active" Exception

Carlos Gagliardi said...

Hi guys, this isn't working for me. The resultset is having 15000 rows and for the next rows I can't get it because the connection is closed. Do you know if what you suggested could not work in newer versions?
Thx & regards