Java Generics and Collections
andrew cooke writes "Java 6 was recently
released, but many programmers are still exploring the features
introduced in Java 5 — probably the most significant changes in
the language's twelve year history. Amongst those changes (enumerations,
auto-boxing, foreach, varargs) generics was the most far-reaching,
introducing generic programming in a simpler, safer way than C++
templates and, unlike generics in C#, maintaining backwards (and
forwards) compatibility with existing Java code." Read on for the rest of Andrew's review.
Java Generics and Collections | |
author | Maurice Naftalin, Philip Wadler |
pages | 273 |
publisher | O'Reilly Media, Inc. |
rating | 9/10 |
reviewer | Andrew Cooke |
ISBN | 978-0-596-52775-4 |
summary | Guide to Java generics; also includes interesting discussion of collection classes. |
Given the history of Generic Java, Naftalin and Wadler's Java Generics and Collections has a distinguished pedigree. In this review I'll argue that this is a new classic.
If you're a Java programmer you've probably heard of generics, an extension to the type system that was introduced in Java 5. They give you, as a programmer, a way to write code even when you don't know exactly what classes will be used.
The obvious example is collections — the author of a List class has no idea what type of objects will be stored when the code is used.
Before generics, if you wanted to write code that handled unknown classes you had to make use of inheritance: write the code as if it would get Objects, and then let the caller cast the result as necessary. Since casts happen at runtime, any mistakes may cause a runtime error (a ClassCastException).
Generics fix this. They let you write code in which the classes are named (parameters) and the compiler can then check that the use of these class parameters is consistent in your program. So if you have a List of Foo instances you write List<Foo> and the compiler knows that when you read that list you will receive a Foo, not an Object.
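A minimal sketch of that difference. The class and method names here (RawVersusGeneric and so on) are made up for illustration and do not come from the book:

```java
import java.util.ArrayList;
import java.util.List;

class RawVersusGeneric {

    // Pre-generics style: the list holds Objects and the caller casts the result.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List names = new ArrayList();
        names.add(Integer.valueOf(42));            // compiles: nothing stops the wrong type going in
        try {
            String first = (String) names.get(0);  // the mistake only surfaces here, at runtime
            return first == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: the element type is part of the list's type.
    static String firstFromGenericList() {
        List<String> names = new ArrayList<String>();
        names.add("Alice");
        // names.add(Integer.valueOf(42));         // rejected by the compiler
        return names.get(0);                       // no cast needed
    }

    public static void main(String[] args) {
        System.out.println(rawListFailsAtRuntime());
        System.out.println(firstFromGenericList());
    }
}
```

The raw version compiles and fails only when run. The generic version moves the same mistake to compile time.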
I'll get to the book in a moment, but first a little history. If you know any type theory — particularly as used in functional languages like ML and Haskell — then you'll recognize my quick description above as parametric polymorphism. You'll also know that it is incredibly useful, and wonder how Java programmers could ever have managed without it.
Which explains why Philip Wadler, one of the people responsible for Haskell, was part of a team that wrote GJ (Generic Java), one of the experimental Java mutations (others included PolyJ and Pizza) that, back in the day (late 90s) helped explore how parametric polymorphism could be added to Java, and which formed the basis for the generics introduced in Java 5.
So if you want to understand generics, Wadler is your man. Which, in turn, explains why I jumped at the chance to review O'Reilly's Java Generics and Collections, by Maurice Naftalin and Philip Wadler.
This is a moderately slim book (just under 300 pages). It looks like any other O'Reilly work — the animal is an Alligator this time. It's well organized, easy to read, and has a decent index.
There's an odd discrepancy, though: Wadler is the generics Guru; this is going to be `the generics reference'; generics are sexy (in relative terms — we're talking Java here) and collections are not; the title has "Java Generics" in great big letters with "and Collections" in little tiny ones down in a corner. Yet very nearly half this book is dedicated to collections.
The Generics part is a great, practical read. It starts simply, introducing a range of new features in Java 5, and then builds rapidly.
If you are completely new to generics, you'll want to read slowly. Everything is here, and it's very clear and friendly, but there are not the chapters of simple, repeated examples you might find in a fatter book. Within just 30 pages you meet pretty much all of generics, including wildcards and constraints.
If that makes your head spin, don't worry. Read on. The next hundred or so pages don't introduce any new syntax, but instead discuss a wide range of related issues. The chapters on Comparisons and Bounds and Declarations contain more examples that will help clarify what generics do. And the following chapters on Evolution, Reification, and Reflection will explain exactly why.
So the first seven chapters introduce generics and then justify the implementation — any programmer that takes the time to understand this will have a very solid base in generics.
There are even some interesting ideas on how Java could have evolved differently — section 6.9 Arrays as a Deprecated Type presents a strong case for removing arrays from the language. It's a tribute to the clarity and depth of this book that the reader is able to follow detailed arguments about language design. Fascinating stuff.
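One commonly cited weakness of arrays is that they are covariant, so some type errors are only caught at runtime. This is offered as an illustration, not as a reproduction of the book's argument. A small sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class ArrayCovariance {

    // An Integer[] may be used as a Number[], so a bad store compiles
    // and is only caught when the program runs.
    static boolean arrayStoreFailsAtRuntime() {
        Number[] numbers = new Integer[1];
        try {
            numbers[0] = Double.valueOf(3.14);   // compiles, throws ArrayStoreException
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Generic lists are not covariant: the equivalent assignment is a compile error.
    static double sumOfList() {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(1);
        ints.add(2);
        // List<Number> numbers = ints;          // does not compile
        List<? extends Number> numbers = ints;   // a wildcard allows reading, not adding
        double sum = 0;
        for (Number n : numbers) {
            sum += n.doubleValue();
        }
        return sum;
    }
}
```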
The next two chapters, however, were my favorites. Effective Generics and Design Patterns give sensible, practical advice on using generics in your work, including the best explanation of <X extends Foo<X>> I've seen yet (so if you don't know what I am talking about here, read the book).
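For readers who have not met that pattern, here is one way it tends to show up. This is a sketch with invented classes (Shape and Circle), not the book's example. The bound lets a base class refer to the concrete subclass type, so inherited methods can return the subclass rather than the base:

```java
class RecursiveBound {

    // Hypothetical base class: X stands for the concrete subclass itself.
    static abstract class Shape<X extends Shape<X>> {
        int scale = 1;

        abstract X self();

        X scaledBy(int factor) {
            scale *= factor;
            return self();       // typed as the subclass, not as Shape
        }
    }

    static class Circle extends Shape<Circle> {
        int radius = 1;

        Circle self() {
            return this;
        }

        Circle withRadius(int r) {
            radius = r;
            return this;
        }

        int size() {
            return radius * scale;
        }
    }

    static int demo() {
        // scaledBy returns Circle, so the Circle-only methods can follow it.
        return new Circle().scaledBy(3).withRadius(2).size();
    }
}
```

Without the recursive bound, scaledBy would have to return Shape and the call chain would need a cast.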
(A practical word of advice — if at all possible, use Java 6 with generics. Java 5 has a sneaky bug).
The Collections part of the book was more along O'Reilly's `Nutshell' lines: the different chapters explore different collection types in detail. I must admit that at first I skipped this — it looked like API docs re-hashed to extend the size of the book.
Then I felt bad, because I was supposed to be reviewing this book (full disclosure: if you review a book for Slashdot you get to keep it). And you know what? It turned out to be pretty interesting. I've programmed in Java for (too many) years, and I guess I've not been quite as dedicated to tracking how the library has changed as I should have been — I learned a lot.
Again, a wide range of readers are welcome. This is more than a summary of the Javadocs, ranging from thumbnail sketches of trees and hashtables to a discussion of containers intended for multi-threaded programming.
The way I see it now, this part is a bonus: the first half, on generics, makes this book one of the standards; the second half is an extra treat I'm glad I stumbled across (I guess if you're some kind of weird collection-fetishist maybe it's even worth buying the book for).
I've used generics since the first beta release of Java 5 and had experience with parametric polymorphism in functional languages before that (in other words, I can tell my co- from my contra-variance). So I guess I'm heading towards the more expert end of the spectrum and I was worried I'd find the book boring. It wasn't. After claiming to be expert I don't want to spoil things with evidence that I'm actually stupid, but reading this book cleared up a few `misunderstandings' I'd had. I wish I had read it earlier.
If you're new to generics, and you don't mind thinking, I recommend this book. If you're a Java programmer who's a bit confused by <? super Foo> then this is the book for you.
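As a taste of what that wildcard is for, here is a small sketch. The method name copyAll is made up, though the signature mirrors the shape of java.util.Collections.copy. A list you only read from can be declared with ? extends, and a list you only write to can be declared with ? super:

```java
import java.util.ArrayList;
import java.util.List;

class WildcardCopy {

    // src only produces values of type T, dst only consumes them.
    static <T> void copyAll(List<? extends T> src, List<? super T> dst) {
        for (T item : src) {
            dst.add(item);
        }
    }

    static int demo() {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(1);
        ints.add(2);

        List<Number> numbers = new ArrayList<Number>();
        List<Object> objects = new ArrayList<Object>();

        copyAll(ints, numbers);   // a List<Number> can accept Integers
        copyAll(ints, objects);   // so can a List<Object>
        return numbers.size() + objects.size();
    }
}
```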
The only people who shouldn't read this are people new to Java. You need to go elsewhere first. This is not a book for complete beginners. This is a great book in the classic — practical, concise and intelligent — O'Reilly mould.
Generics are basically good. (Score:5, Interesting)
One thing that I found in Java 5 was that it lacked generics for several cases, e.g. AWT/Swing objects that were able to contain Object themselves. Not that it was a big problem, but it wouldn't have been bad to have that support there too...
Anyway - Generics is one of the best features added to Java lately. It really helps. How I miss it when I'm programming for J2ME...
All I really wanted... (Score:4, Interesting)
That said, this sounds like a good resource on Java Collections in general (though Sun's javadocs are pretty nice themselves), as well as the other features introduced in Java 5. There also seems to be some discussion of more complex generic structures.
I'm still a bit lukewarm about buying it, but if I were getting back into a lot of Java stuff, I probably would.
Reading Generified Code Makes My Brain Hurt (Score:2, Interesting)
Then there is the Collections API itself, which at first glance seems like it was written by amateurs who have never had to write any performance-critical code in their lives. For this reason as well, I generally try to avoid using anything in java.util.
And now they are talking about adding closures (more bloat) to Java which, as I understand the proposal, will be implemented under the hood in basically the same way as inner classes (another feature that is a maintenance nightmare and gets abused by novice developers ad infinitum).
Is Java not bloated enough? Do the guys at SUN have such feature envy of C# (the bastard child of Java), that they can't just say enough is enough?
I feel like this is all coming full circle with C++ in the sense that Java now has so many language features that it is becoming too complicated for entry-level developers to be truly productive with and now a new language is needed that has the best features of Java, minus all the bloat that totally overwhelms the initiates.
With more features, generally comes more power, but with more power there is more room for abuse for those who don't have the wisdom to use it (i.e. newbies). Everyone in programming starts off as a newbie and needs to get their feet wet, but once you make a programming language where everyone has a light saber, but does not have the Jedi training or wisdom to use it, well then you are going to have a lot of people causing a whole lot of trouble.
One of the main reasons why Windows software development has slowed to a crawl (besides, of course, the cannibalizing nature of MS on the Windows platform) is that it takes a good 4 years or more of full-time experience with the Windows APIs just to become adept at programming on that platform, on top of being decent at C/C++ itself. I know Microsoft has tried to reduce that learning curve with C# and
I guess it is time for a new application programming language.
Generics, jeez (Score:2, Interesting)
Java has come a long way but there's still a reason Java programmers cost about 60% as much as actual C++ programmers (curse them).
Re:C# compatibility? duh... (Score:1, Interesting)
So, for example, if you have some old code that adds Strings to a List, you can pass it a List of Integers from new code, and it'll happily add Strings into your generic List of Integers.
A more useful example might be something where old code promises it'll only add Strings, so you pass it a generic list of Strings, only to discover at some random point later that the legacy code accidentally added a StringBuffer.
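The scenario can be sketched like this. The legacyAdd method is a hypothetical stand-in for pre-Java-5 library code, not anything from a real library:

```java
import java.util.ArrayList;
import java.util.List;

class LegacyPollution {

    // Stands in for old code: it takes a raw List and adds a String to it.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void legacyAdd(List list) {
        list.add("not a number");
    }

    static String demo() {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(1);
        legacyAdd(ints);                    // accepted by the compiler; the String goes in

        try {
            Integer second = ints.get(1);   // the compiler-inserted cast fails here
            return "no error: " + second;
        } catch (ClassCastException e) {
            return "ClassCastException on read";
        }
    }
}
```

The failure appears where the list is read, not where the wrong element was added, which is what makes this kind of bug awkward to track down.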
Seriously, though, you can't use generics in pre-1.5 code. Any code that uses generics generates binaries that only work in 1.5 or later JVMs. (Despite the fact that generics are compile-time only.) The only "backwards compatible" part is that pre-1.5 code will still compile and pre-1.5 binaries will still run. As I've suggested above, though, that backwards compatibility completely defeats the purpose of generics.
Which isn't to say they're completely useless, but they're essentially no more useful than a simple comment to indicate what the programmer wants the collection to contain rather than what it actually contains. Especially when legacy code is involved.
Re:C# compatibility? duh... (Score:3, Interesting)
It's impossible to do in C#/C++.
Re:Java 'generics' are not real generics (Score:3, Interesting)
harvey@clownfish:~$ cat test.java
import java.util.*;

public class test
{
    public test()
    {
        List s = new ArrayList<Integer>();
        List l = s;
        l.add("foo");
    }
}
harvey@clownfish:~$ javac test.java
Note: test.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
harvey@clownfish:~$ javac -Xlint:unchecked test.java
test.java:9: warning: [unchecked] unchecked call to add(E) as a member of the raw type java.util.List
l.add("foo");
^
1 warning
Re:Java 'generics' are not real generics (Score:5, Interesting)
With "real" generics the system has two choices: either generate lots of bloated specific instances of the code, or add type-checking at runtime. CLR designers thought they were going to do the former and it was going to be 'uber leet' and fast, but found out it's not practical (most of the optimizations that C++ uses to limit bloat do not apply well in a dynamic language) so they got stuck with the latter, for objects.
In Java, the code goes to add something to a generic list for example and it does one cast to the generic parameter type. Many times it can completely remove this check since it already knows from flow that the type is compatible. CLR can do this too, but only if it *also* knows the specific instance of the list (what the generic parameter types are), so it can remove fewer checks. This makes optimization harder as well since each use of a generic parameter can potentially block inlining and/or hoisting.
On top of that, the tests CLR has to do are *much* slower since they have to check many parallel type hierarchies (one per generic type references). For example, when passing a LinkedList of Integers to a parameter of type List of Numbers CLR has to in effect check both List assignable from LinkedList and Number assignable from Integer.
So in the vast majority of code not only do you end up with more checks but slower ones, and CLR has to maintain a complicated hierarchy of instantiated types to optimize this. All so primitives can be used faster in some cases, which is pretty ironic since in my experience these cases are usually easy to optimize by hand to use an array or patch out to inline C++ or JNI'd code.
In other words they messed up their runtime for bullet points without considering the implications. Not even to mention that in Java if you don't like generics, you just don't use them.
Re:C# compatibility? duh... (Score:3, Interesting)
If you're maintaining an existing Java application, it's not likely that you're going to rewrite it in C# anyway. If you are starting from scratch without any commitment, I think C#'s approach to generics is better (I'm just talking about this particular issue, I'm not saying that C# is better than Java for all new projects).
Re:Generics are basically good. (Score:3, Interesting)
I'm sorry, but I don't know what you are smoking. Vector is entirely retrofitted to use generics, and it does it fine. And a comment like "ArrayList is a much better choice nowadays" is complete BS. They are virtually identical, with one important difference, spelled out in the documentation [sun.com]: "This class is roughly equivalent to Vector, except that it is unsynchronized". Meaning that if you ever plan to do anything with threads, stick with Vector. In fact, always stick with Vector. It's a better class, and there is no good reason not to do it. Vector is in no way deprecated, and it never will be.
Second, it's not hard at all to get a Foo[] array from a Vector. Just pass a Foo[] array to the toArray() function and it will fill it. It's dead easy. If you're really lazy you don't even have to size it correctly, the class will do that for you. Just tell it what type it should be in and it will do all the work. This is literally one line of code.
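A sketch of that one-liner, using String in place of the hypothetical Foo class:

```java
import java.util.Vector;

class VectorToArray {

    static String[] demo() {
        Vector<String> names = new Vector<String>();
        names.add("a");
        names.add("b");

        // The zero-length array only tells toArray the element type;
        // because it is too small, a new String[] of the right size is allocated.
        return names.toArray(new String[0]);
    }
}
```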
Re:Reading Generified Code Makes My Brain Hurt (Score:3, Interesting)
Seriously, I've been using the Collections framework since it was beta, and have never had a situation where its performance wasn't "good enough". Is it really "that bad" for your business need? Or are you committing the cardinal sin of premature optimization???
Re:Java 'generics' are not real generics (Score:2, Interesting)
CLR can do this too, but only if it *also* knows the specific instance of the list (what the generic parameter types are), so it can remove fewer checks. This makes optimization harder as well since each use of a generic parameter can potentially block inlining and/or hoisting.
There is no "only if" here, as you seem to imply. If a generic is compatible with passed parameters the CLR would remove the check in JIT in all cases just like JVM. CLR is also capable of removing the check on upcast calls (i.e. list.get()) as
On top of that, the tests CLR has to do are *much* slower since they have to check many parallel type hierarchies (one per generic type references). For example, when passing a LinkedList of Integers to a parameter of type List of Numbers CLR has to in effect check both List assignable from LinkedList and Number assignable from Integer.
I'm not sure I understand. Implementing IEnumerable in generics doesn't imply any "parallel type hierarchies". Are you sure you are not confusing C++ templates and CLR generics?
So in the vast majority of code not only do you end up with more checks but slower ones, and CLR has to maintain a complicated hierarchy of instantiated types to optimize this.
Are you talking about the speed of the JIT phase? This is an O(1) step, who the hell cares. In run-time performance the CLR generics are a lot better than the JVM ones even on reference-based collections (due to the aforementioned upcast elimination). On value-based ones it's an order of magnitude difference.
All so primitives can be used faster in some cases, which is pretty ironic since in my experience these cases are usually easy to optimize by hand to use an array or patch out to inline C++ or JNI'd code.
This is only because Java is limited in the number of available primitives. The CLR has built-in support for extensible value-based types, so supporting them in generics without boxing is quite useful.
Re:Java 'generics' are not real generics (Score:3, Interesting)
It appears that you're confusing CLI generics and C++ templates. I must admit that I have little knowledge of C++ templates, but a comparison of Java's Generics by Type Erasure and C#/CLI's true generics definitely favours the latter.
The following set of slides by Peter Sestoft sums up the differences pretty well: http://www.itu.dk/courses/PFOO/F2006/diku-javacsharpgenerics.pdf
Slide no. 23 sums up the major advantages of the C#/CLI implementation:
Java simply does not allow this (slide no. 32):
Then of course there's the fact that C# type arguments can be value types, not only reference types. No boxing or unboxing is needed for value type arguments; hence better performance and less memory usage.
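On the Java side, the two points can be seen in a few lines. This is a sketch with a made-up class name, and it does not reproduce anything from the slides:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureAndBoxing {

    // After erasure, both lists have the same runtime class.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        return ints.getClass() == strings.getClass();
    }

    // List<int> is not legal Java, so the int is boxed on the way in
    // and unboxed on the way out.
    static int boxedRoundTrip() {
        List<Integer> ints = new ArrayList<Integer>();
        ints.add(7);                    // autoboxing: int to Integer
        Object stored = ints.get(0);    // what the list actually holds
        int value = ints.get(0);        // unboxing: Integer to int
        return (stored instanceof Integer) ? value : -1;
    }
}
```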