Wednesday, November 14, 2007

Map.containsKey(Object key)

The fact that Map methods don't consistently take typed (K and/or V) or untyped (Object) arguments is baffling to me. Didn't Java 5 introduce generics for a reason? The put() method only accepts arguments of type K and V, yet containsKey() accepts any Object. Why the inconsistency? If the containsKey argument isn't of type K, it's an error (as indicated by the fact that containsKey is documented to throw ClassCastException for a key of an inappropriate type, though the Map docs mark that exception as optional). Wasn't one of the reasons for introducing generics to catch such errors at compile time?
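
A minimal sketch of the asymmetry, using only the standard HashMap and TreeMap (the class and method names are made up for illustration). A wrong-typed put() is rejected by the compiler, but a wrong-typed containsKey() compiles; what happens next depends on the implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ContainsKeyDemo {
    // HashMap simply reports that a key of the wrong type is absent.
    static boolean hashMapLookup() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        // map.put(42, 1);           // does not compile: 42 is not a String
        return map.containsKey(42);  // compiles: containsKey takes Object
    }

    // TreeMap has to compare the argument with the stored String keys,
    // so a key of the wrong type fails at run time instead.
    static boolean treeMapThrows() {
        Map<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        try {
            map.containsKey(42);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap.containsKey(42): " + hashMapLookup());
        System.out.println("TreeMap threw ClassCastException: " + treeMapThrows());
    }
}
```

Because the ClassCastException is optional, the same mistake can be a silent false with one Map implementation and an exception with another.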

What's worse (for me, at least) is that this odd mish-mash of strong and weak typing introduces an opportunity for bugs that couldn't exist before. I recently ran into a HashMap that didn't seem to be able to tell when two keys were equal. I was using a key type of my own construction; call it CustomKey. The problem was that I hadn't defined an equals(Object) method. It wasn't that I didn't think I needed to override Object's equals() method. Rather, I thought that the Map would be calling equals(CustomKey). The keys were typed as such, so I couldn't imagine that HashMap would be treating them as plain Objects before calling equals. But that's exactly what HashMap does. Well, it's a bit more subtle than that. HashMap doesn't explicitly cast the key to Object. Rather, Java picks which overloaded method to call at compile time, based on the declared (static) types of the arguments. Inside HashMap's generic code, in put(K,V) as well as in containsKey(Object), the keys are only known as Objects (the type parameter K is erased), so the equals call there resolves to equals(Object), and my equals(CustomKey) is never considered.
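
A minimal way to reproduce this, with a hypothetical CustomKey that overloads equals(CustomKey) instead of overriding equals(Object). It does override hashCode(), so two equal keys land in the same bucket and only equals decides. The behavior relies only on Java's compile-time overload resolution and the standard HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class OverloadPitfall {
    // Hypothetical key class: equals(CustomKey) OVERLOADS Object.equals
    // instead of overriding it, so code that only holds an Object
    // reference (like HashMap's internals) never calls it.
    static final class CustomKey {
        private final int id;

        CustomKey(int id) { this.id = id; }

        public boolean equals(CustomKey other) {   // @Override would be a compile error here
            return other != null && id == other.id;
        }

        @Override
        public int hashCode() {                    // consistent hash: equal ids share a bucket
            return Integer.hashCode(id);
        }
    }

    static boolean directCall() {
        CustomKey a = new CustomKey(1);
        CustomKey b = new CustomKey(1);
        return a.equals(b);        // static type CustomKey: picks equals(CustomKey)
    }

    static boolean viaObject() {
        CustomKey a = new CustomKey(1);
        Object b = new CustomKey(1);
        return a.equals(b);        // static type Object: picks Object.equals (identity)
    }

    static boolean mapLookup() {
        Map<CustomKey, String> map = new HashMap<>();
        map.put(new CustomKey(1), "one");
        return map.containsKey(new CustomKey(1)); // HashMap ends up in equals(Object)
    }

    public static void main(String[] args) {
        System.out.println(directCall()); // true
        System.out.println(viaObject());  // false
        System.out.println(mapLookup());  // false
    }
}
```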

A careful read of the API docs makes it clear that you should override equals(Object) for any custom key. But I wonder why it had to be so non-intuitive. Actually, never mind, I can make a well-educated guess: backward compatibility. Though, if that's the case, I must wonder: why update the put() method for generics, but not containsKey(), containsValue(), get(), and remove()? And, further, why not explain the rationale in the API documentation? Even in the fifth edition of Java in a Nutshell by David Flanagan, I haven't been able to find a relevant discussion...

6 comments:

Shalin Shekhar Mangar said...

Hi Jason,

I remember Joshua Bloch commenting on this issue in one of his tech talks. Basically, he said that the reason for using Object instead of typed parameters in the generic Collection classes was to let contains and set intersections work across differently typed collections. For example, you can create a set of Integer values and intersect it with a set of Long values. If contains took a typed parameter, this would have been impossible to implement.

If I run into the video of this tech-talk, I will post it here.

Regards,
Shalin

Pussinboots said...

I don't mind the existence of the Object-parameter functions. What I do mind is that there aren't (also) typed functions.
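
One way to get typed lookups anyway, sketched under my own assumptions: a small hypothetical helper class (TypedMaps is not part of the JDK) whose generic static methods only accept a key of the map's declared key type. Because Map<K, ?> is invariant in K, passing a Map<String, Integer> fixes K to String, and a mismatched key is rejected by the compiler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of the JDK.
public final class TypedMaps {
    private TypedMaps() {}

    // Only accepts a key of the map's declared key type K.
    public static <K> boolean containsKeyTyped(Map<K, ?> map, K key) {
        return map.containsKey(key);
    }

    // Typed counterpart of Map.get(Object).
    public static <K, V> V getTyped(Map<K, V> map, K key) {
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        System.out.println(containsKeyTyped(ages, "alice")); // true
        System.out.println(getTyped(ages, "alice"));         // 30
        // containsKeyTyped(ages, 42);  // does not compile: 42 is not a String
    }
}
```

The untyped methods remain available for the cross-type cases Shalin describes; the helpers just add a compile-time check for the common case.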

Btw, do you realize that Integer.valueOf(54).equals(Long.valueOf(54)) returns false? How do you (or Joshua) expect to be able to do an intersection if equals doesn't work? If being able to intersect number sets were so important, the Number class would have an equals method that worked as expected. If that were the case, you could do the Integer/Long intersection by intersecting two Set<Number>s, one filled with Integers and one filled with Longs.
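
Both points can be checked directly. This sketch (class name made up) shows that Integer.valueOf(54).equals(Long.valueOf(54)) is false, and that retainAll, whose parameter is Collection<?>, compiles for a Set<Integer> and a Set<Long> but leaves nothing behind, because the membership test goes through equals.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IntegerLongIntersection {
    static boolean integerEqualsLong() {
        return Integer.valueOf(54).equals(Long.valueOf(54));
    }

    // retainAll(Collection<?>) accepts a Set<Long>, so this compiles,
    // but Integer.equals(Long) is always false, so nothing survives.
    static Set<Integer> intersect() {
        Set<Integer> ints = new HashSet<>(Arrays.asList(1, 2, 54));
        Set<Long> longs = new HashSet<>(Arrays.asList(2L, 54L));
        ints.retainAll(longs);
        return ints;
    }

    public static void main(String[] args) {
        System.out.println(integerEqualsLong()); // false
        System.out.println(intersect());         // []
    }
}
```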

Timothy said...

Jason,

Do you know about Josh Bloch's book, Effective Java? One of the first things I remember reading there was that you should always override equals() and hashCode() together (if hashCode() isn't overridden correctly, hash-based data structures will fail in subtle ways).
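
A sketch of the conventional fix for the CustomKey problem, with a hypothetical two-field key: override equals(Object) and hashCode() together, computing the hash from exactly the fields that equals compares. Marking equals with @Override also turns an accidental equals(CustomKey) overload into a compile-time error, since the annotation is rejected on a method that doesn't override anything.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CustomKeyFixed {
    // Hypothetical key type with equals(Object) and hashCode() overridden together.
    static final class CustomKey {
        private final String name;
        private final int version;

        CustomKey(String name, int version) {
            this.name = name;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {          // parameter must be Object to override
            if (this == o) return true;
            if (!(o instanceof CustomKey)) return false;
            CustomKey other = (CustomKey) o;
            return version == other.version && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, version);    // same fields that equals() compares
        }
    }

    static boolean lookupWithEqualKey() {
        Map<CustomKey, String> map = new HashMap<>();
        map.put(new CustomKey("a", 1), "first");
        return map.containsKey(new CustomKey("a", 1)); // a distinct but equal key
    }

    static int sizeAfterDuplicatePut() {
        Map<CustomKey, String> map = new HashMap<>();
        map.put(new CustomKey("a", 1), "first");
        map.put(new CustomKey("a", 1), "second");  // replaces the value, no new entry
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(lookupWithEqualKey());    // true
        System.out.println(sizeAfterDuplicatePut()); // 1
    }
}
```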

- Tim

Pomax said...

Actually, it gets worse. Even with your custom equals() method, the HashMap will be oblivious to the fact that two distinct objects that return true from an equals comparison should, in fact, be treated as the same key. Rather than just calling equals as you might expect, it, quite annoyingly, relies on the hashCode() method first.
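
My understanding is that HashMap uses hashCode() first, to choose a bucket and to skip entries whose stored hash differs, and only calls equals() on candidates with a matching hash. The hypothetical BrokenKey below makes that visible: its equals ignores the label field but its hashCode includes it, so two keys that are equal to each other are still not found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class HashBeforeEquals {
    // Hypothetical, deliberately broken key: equals() ignores `label`,
    // but hashCode() includes it, violating the equals/hashCode contract.
    static final class BrokenKey {
        final int id;
        final String label;

        BrokenKey(int id, String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenKey && id == ((BrokenKey) o).id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, label); // BUG: equal keys can hash differently
        }
    }

    static boolean keysAreEqual() {
        return new BrokenKey(1, "a").equals(new BrokenKey(1, "b"));
    }

    static boolean mapFindsEqualKey() {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey(1, "a"), "value");
        // Different hash, so HashMap never gets as far as calling equals().
        return map.containsKey(new BrokenKey(1, "b"));
    }

    public static void main(String[] args) {
        System.out.println(keysAreEqual());     // true
        System.out.println(mapFindsEqualKey()); // false
    }
}
```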

Pussinboots said...

Hi Timothy,

I definitely agree with Bloch: you should always override those two methods. But what I think is confusing about equals is that you can't simply define an equals(MyCustomClass) method. I realize that doing so doesn't technically override Object.equals, but it's only due to what I think are poor choices of argument types that this doesn't work.

Jason

lingpipe-blog.com said...

Number, the abstract superclass of Integer and Long, doesn't specify how equality or hash codes should be computed. The definitions in Integer and Long say they're only equal to other objects of the same class, and both classes are final.

In contrast, the Set interface in the collections framework specifies how equality and hash codes have to work for implementations. It's even conveniently defined for you in the abstract class AbstractSet. Thus a TreeSet and HashSet with equal members will be equal.
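
A quick check of that Set contract (class name made up): a TreeSet and a HashSet holding the same Integers compare equal in both directions and report the same hashCode, which the Set interface defines as the sum of the elements' hash codes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetEqualityDemo {
    static boolean treeEqualsHash() {
        Set<Integer> tree = new TreeSet<>(Arrays.asList(3, 1, 2));
        Set<Integer> hash = new HashSet<>(Arrays.asList(1, 2, 3));
        // AbstractSet.equals checks size and membership, not the concrete class.
        return tree.equals(hash) && hash.equals(tree)
                && tree.hashCode() == hash.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(treeEqualsHash()); // true
    }
}
```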

The reason you can't just define Long.equals(Long) is that the generics framework lets you create a Set<Object> and put whatever kind of things you want in it.

On the other hand, Comparable<E> took that route: you only define compareTo(E), so the explicit assumption is that you only compare to other instances of the same class (or its subclasses). But that's just the natural order. You can define your own Comparator<Object> that compares everything.
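
As a sketch of that last suggestion, here is a Comparator<Number> (narrower than Comparator<Object>, and an assumption of mine rather than anything in the JDK) that orders integral numbers by longValue(). A TreeSet uses its comparator instead of equals for membership, so the Integer/Long intersection from the earlier comments works.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class CrossTypeComparator {
    // Assumption: only integral Number types (Integer, Long, ...) are used;
    // longValue() would truncate Double or Float values.
    static final Comparator<Number> BY_LONG_VALUE =
            Comparator.comparingLong(Number::longValue);

    static Set<Number> intersect() {
        Set<Number> ints = new TreeSet<>(BY_LONG_VALUE);
        ints.addAll(Arrays.asList(1, 2, 54));     // Integers
        Set<Number> longs = new TreeSet<>(BY_LONG_VALUE);
        longs.addAll(Arrays.asList(2L, 54L));     // Longs
        // retainAll asks longs.contains(...), which uses the comparator, not equals().
        ints.retainAll(longs);
        return ints;
    }

    public static void main(String[] args) {
        System.out.println(intersect()); // [2, 54]
    }
}
```

The catch is that this ordering is inconsistent with equals; the TreeSet documentation says such a set is still well-defined but no longer obeys the general Set contract, so comparing it with a HashSet can give surprising results.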

Sadly, autoboxing (the cause of that null pointer in the other post) leans on Number instances, so you can't really just define your own version of these things.

The other problem with Number is allocation; valueOf() is the way to go over the constructors or parsers, which always generate new instances.
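
To illustrate the allocation point: Integer.valueOf is documented to always cache values in the range -128 to 127, while the Integer constructor (deprecated since Java 9) always allocates a new object. This sketch, with a made-up class name, compares references with == only to show identity; value comparisons should still use equals.

```java
public class ValueOfCaching {
    // Integer.valueOf always caches -128..127, so both calls return the same object.
    static boolean valueOfReusesSmallValues() {
        return Integer.valueOf(127) == Integer.valueOf(127);
    }

    // The constructor always allocates a fresh object (deprecated since Java 9).
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorAllocates() {
        return new Integer(127) != new Integer(127);
    }

    // Regardless of caching, equals() compares values.
    static boolean equalsComparesValues() {
        return Integer.valueOf(1000).equals(Integer.valueOf(1000));
    }

    public static void main(String[] args) {
        System.out.println(valueOfReusesSmallValues()); // true
        System.out.println(constructorAllocates());     // true
        System.out.println(equalsComparesValues());     // true
    }
}
```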