Why There Is Interface Pollution in Java 8

I was reading this interesting post about The Dark Side of Java 8. In it, Lukas Edger, the author, mentions how bad it is that in the JDK 8 the types are not simply called functions. For instance, in a language like C#, there is a set of predefined function types accepting any number of arguments with an optional return type (Func and Action each one going up to 16 parameters of different types T1, T2, T3, …, T16), but in the JDK 8 what we have is a set of different functional interfaces, with different names and different method names, and whose abstract methods represent a subset of well know function signatures (i.e. nullary, unary, binary, ternary, etc).

The Type Erasure Issue

So, in a way, both languages suffer from some form of interface pollution (or delegate pollution in C#). The only difference is that in C# they all have the same name. In Java, unfortunately, due to type erasure, there is no difference between Function<T1,T2> and Function<T1,T2,T3> or Function<T1,T2,T3,...Tn>, so evidently, we couldn’t simply name them all the same way and we had to come up with creative names for all possible types of function combinations.

Don’t think the expert group did not struggle with this problem. In the words of Brian Goetz in the lambda mailing list:

[...] As a single example, let’s take function types. The lambda strawman offered at devoxx had function types. I insisted we remove them, and this made me unpopular. But my objection to function types was not that I don’t like function types — I love function types — but that function types fought badly with an existing aspect of the Java type system, erasure. Erased function types are the worst of both worlds. So we removed this from the design.

But I am unwilling to say “Java never will have function types” (though I recognize that Java may never have function types.) I believe that in order to get to function types, we have to first deal with erasure. That may, or may not be possible. But in a world of reified structural types, function types start to make a lot more sense [...]

So, how does this affect us as developers? The following is a categorization of some of the most important new functional interfaces (and some old ones) in the JDK 8 organized by function return type and the number of expected arguments in the interface method.

Functions with Void Return Type

In the realm of functions with a void return type, we have the following:

Type of Function Lambda Expression Known Functional Interfaces
Nullary
				() -> doSomething() 
			
Runnable
Unary
				foo  -> System.out.println(foo)
			
Consumer
IntConsumer
LongConsumer
DoubleConsumer
Binary
				(console,text) -> console.print(text)
			
BiConsumer
ObjIntConsumer
ObjLongConsumer
ObjDoubleConsumer
n-ary
				(sender,host,text) -> sender.send(host, text)
			
Define your own

Functions with Some Return Type T

In the realm of functions with a return type T, we have the following:

Type of Function Lambda Expression Known Functional Interfaces
Nullary
				() -> "Hello World" 
			
Callable
Supplier
BooleanSupplier
IntSupplier
LongSupplier
DoubleSupplier
Unary
				n -> n + 1   
				n -> n >= 0  
			
Function
IntFunction
LongFunction
DoubleFunction

IntToLongFunction
IntToDoubleFunction
LongToIntFunction
LongToDoubleFunction
DoubleToIntFunction
DoubleToLongFunction

UnaryOperator
IntUnaryOperator
LongUnaryOperator
DoubleUnaryOperator

Predicate
IntPredicate
LongPredicate
DoublePredicate

Binary
				(a,b) -> a > b ? 1 : 0
				(x,y) -> x + y 
				(x,y) -> x % y == 0
			
Comparator
BiFunction
ToIntBiFunction
ToLongBiFunction
ToDoubleBiFunction

BinaryOperator
IntBinaryOperator
LongBinaryOperator
DoubleBinaryOperator

BiPredicate

n-ary
				(x,y,z) -> 2 * x + Math.sqrt(y) - z
			
Define your own

An advantage of this approach is that we can define our own interface types with methods accepting as many arguments as we would like, and we could use them to create lambda expressions and method references as we see fit. In other words, we have the power to pollute the world with yet even more new functional interfaces. Also we can create lambda expressions even for interfaces in earlier versions of the JDK or for earlier versions of our own APIs that defined SAM types like these. And so now we have the power to use Runnable and Callable as functional interfaces.

However, these interfaces become more difficult to memorize since they all have different names and methods.

Still, I am one of those wondering why they didn’t solve the problem as in Scala, defining interfaces like Function0, Function1, Function2, …, FunctionN. Perhaps, the only argument I can come up with against that is that they wanted to maximize the possibilities of defining lambda expressions for interfaces in earlier versions of the APIs as mentioned before.

Lack of Value Types

So, evidently type erasure is one driving force here. But if you are one of those wondering why we also need all these additional functional interfaces with similar names and method signatures and whose only difference is the use of a primitive type, then let me remind you that in Java we also lack of value types like those in a language like C#. This means that the generic types used in our generic classes can only be reference types, and not primitive types.

In other words, we can’t do this:

List<int> numbers = asList(1,2,3,4,5);

But we can indeed do this:

List<Integer> numbers = asList(1,2,3,4,5);

The second example, though, incurs in the cost of boxing and unboxing of the wrapped objects back and forth from/to primitive types. This can become really expensive in operations dealing with collections of primitive values. So, the expert group decided to create this explosion of interfaces to deal with the different scenarios. To make things “less worse” they decided to only deal with three basic types: int, long and double.

Quoting the words of Brian Goetz in the lambda mailing list:

More generally: the philosophy behind having specialized primitive streams (e.g., IntStream) is fraught with nasty tradeoffs. On the one hand, it’s lots of ugly code duplication, interface pollution, etc. On the other hand, any kind of arithmetic on boxed ops sucks, and having no story for reducing over ints would be terrible. So we’re in a tough corner, and we’re trying to not make it worse.

Trick #1 for not making it worse is: we’re not doing all eight primitive types. We’re doing int, long, and double; all the others could be simulated by these. Arguably we could get rid of int too, but we don’t think most Java developers are ready for that. Yes, there will be calls for Character, and the answer is “stick it in an int.” (Each specialization is projected to ~100K to the JRE footprint.)

Trick #2 is: we’re using primitive streams to expose things that are best done in the primitive domain (sorting, reduction) but not trying to duplicate everything you can do in the boxed domain. For example, there’s no IntStream.into(), as Aleksey points out. (If there were, the next question(s) would be “Where is IntCollection? IntArrayList? IntConcurrentSkipListMap?) The intention is many streams may start as reference streams and end up as primitive streams, but not vice versa. That’s OK, and that reduces the number of conversions needed (e.g., no overload of map for int -> T, no specialization of Function for int -> T, etc.)

We can see that this was a difficult decision for the expert group. I think few would agree that this is cool, and most of us would most likely agree it was necessary.

The Checked Exceptions Issue

There was a third driving force that could have made things even worse, and it is the fact that Java supports two type of exceptions: checked and unchecked. The compiler requires that we handle or explicitly declare checked exceptions, but it requires nothing for unchecked ones. So, this creates an interesting problem, because the method signatures of most of the functional interfaces do not declare to throw any exceptions. So, for instance, this is not possible:

Writer out = new StringWriter();
Consumer<String> printer = s -> out.write(s); //oops! compiler error

It cannot be done because the write operation throws a checked exception (i.e. IOException) but the signature of the Consumer method does not declare it throws any exception at all. So, the only solution to this problem would have been to create even more interfaces, some declaring exceptions and some not (or come up with yet another mechanism at the language level for exception transparency). Again, to make things “less worse” the expert group decided to do nothing in this case.

In the words of Brian Goetz in the lambda mailing list:

Yes, you’d have to provide your own exceptional SAMs. But then lambda conversion would work fine with them.

The EG discussed additional language and library support for this problem, and in the end felt that this was a bad cost/benefit tradeoff.

Library-based solutions cause a 2x explosion in SAM types (exceptional vs not), which interact badly with existing combinatorial explosions for primitive specialization.

The available language-based solutions were losers from a complexity/value tradeoff. Though there are some alternative solutions we are going to continue to explore — though clearly not for 8 and probably not for 9 either.

In the meantime, you have the tools to do what you want. I get that you prefer we provide that last mile for you (and, secondarily, your request is really a thinly-veiled request for “why don’t you just give up on checked exceptions already”), but I think the current state lets you get your job done.

So, it’s up to us, the developers, to craft yet even more interface explosions to deal with these in a case-by-case basis:

interface IOConsumer<T> {
   void accept(T t) throws IOException;
}

static<T> Consumer<T> exceptionWrappingBlock(IOConsumer<T> b) {
   return e -> {
	try { b.accept(e); }
	catch (Exception ex) { throw new RuntimeException(ex); }
   };
}

In order to do:

Writer out = new StringWriter();
Consumer<String> printer = exceptionWrappingBlock(s -> out.write(s));

Probably, in the future (maybe JDK 9) when we get Support for Value Types in Java and Reification, we will be able to get rid of (or at least no longer need to use anymore) these multiple interfaces.

In summary, we can see that the expert group struggled with several design issues. The need, requirement or constraint to keep backwards compatibility made things difficult, then we have other important conditions like the lack of value types, type erasure and checked exceptions. If Java had the first and lacked of the other two the design of JDK 8 would probably have been different. So, we all must understand that these were difficult problems with lots of tradeoffs and the EG had to draw a line somewhere and make a decisions.

So, when we find ourselves in the dark side of Java 8, probably we need to remind ourselves that there is a reason why things are dark in that side of the JDK :-)

Memoized Fibonacci Numbers with Java 8

Since today is Fibonacci Day, I decided that it would be interesting to publish something related to it.

I believe one of the first algorithms we all see when learning non-linear recursion is that of calculating a Fibonacci number. I found a great explanation on the subject in the book Structure and Interpretation of Computer Programs [SIC] and I dedicated some time to playing with the Fibonacci algorithm just for fun. While doing so I found an interesting way to improve the classical recursive algorithm by using one of the new methods (added in Java 8) in the Map interface and which I used here to implement a form of memoization.

Classical Recursive Fibonacci

In the classical definition of Fibonacci we learn that:

fib(n) = \left\{ \begin{array}{ll} 0 & \mbox{if n=0}\\1 & \mbox{if n=1}\\fibn(n-1)+fib(n-2) & \mbox{otherwise} \end{array} \right.

We program this very easily in Java:

public static long fibonacci(int x) {
   if(x==0 || x==1)
      return x;
   return fibonacci(x-1) + fibonacci(x-2);
}

Now the problem with this algorithm is that, with the exception of the base case, we recursively invoke our function twice and interestingly one of the branches recalculates part of other branch every time we invoke the function. Consider the following image (taken from SIC) that represents an invocation to fibonacci(5).

Clearly the branch to the right is redoing all the work already done during the recursive process carried out by the left branch. Can you see how many times fibonacci(2) was calculated? The problem gets worse as the function argument gets bigger. In fact this problem is so serious that the calculation of a small argument like fibonacci(50) might take quite a long time.

Memoized Recursive Fibonacci

However, there is a way to improve the performance of the original recursive algorithm (I mean without having to resort to a linear-time algorithm using, for instance, Binet’s formula).

The serious problem we have in the original algorithm is that we do too much rework. So, we could alleviate the problem by using memoization, in other words by providing a mechanism to avoid repeated calculations by caching results in a lookup table that can later be used to retrieve the values of already processed arguments.

In Java we could try to store the Fibonacci numbers in a hast table or map. In the case of the left branch we’ll have to run the entire recursive process to obtain the corresponding Fibonacci sequence values, but as we find them, we update the hash table with the results obtained. This way, the right branches will only perform a table lookup and the corresponding value will be retrieved  from the hash table and not through a recursive calculation again.

Some of the new methods in the class Map , in Java 8, simplify a lot the writing of such algorithm, particularly the method computeIfAbsent(key, function). Where the key would be the number for which we would like to look up the corresponding Fibonacci number and the function would be a lambda expression capable of triggering the recursive calculation if the corresponding value is not already present in the map.

So, we can start by defining a map and putting the values in it for the base cases, namely, fibonnaci(0) and fibonacci(1):

private static Map<Integer,Long> memo = new HashMap<>();
static {
   memo.put(0,0L); //fibonacci(0)
   memo.put(1,1L); //fibonacci(1)
}

And for the inductive step all we have to do is redefine our Fibonacci function as follows:

public static long fibonacci(int x) {
   return memo.computeIfAbsent(x, n -> fibonacci(n-1) + fibonacci(n-2));
}

As you can see, the method computeIfAbsent will use the provided lambda expression to calculate the Fibonacci number when the number is not present in the map, this recursive process will be triggered entirely for the left branch, but the right branch will use the momoized values. This represents a significant improvement.

Based on my subjective observation, this improved recursive version was outstandingly faster for a discrete number like fibonacci(70). With this algorithm we can safely calculate up to fibonacci(92) without running into long overflow. Even better, to be sure that our algorithm would never cause overflows without letting the user know we could also use one of the new methods in Java 8 added to the Math class and which throws an ArithmeticException when overflow occurs. So we could change our code as follows:

public static long fibonacci(int x) {
   return memo.computeIfAbsent(x, n -> Math.addExact(fibonacci(n-1),
                                                     fibonacci(n-2)));
}

This method would start failing for fibonacci(93). If we need to go over 92 we would have to use BigInteger in our algorithm, instead of just long.

Notice that the memozied example uses mutations, therefore, in order to use this code in a multithreaded environment we would first need to add some form of synchronization to the proposed code, or use a different map implementation, perhaps a ConcurrentHashMap, which evidently, may impact performance as well. Arguably, this would still be better than the original recursive algorithm.

Java 8 Optional Objects

In this post I present several examples of the new Optional objects in Java 8 and I make comparisons with similar approaches in other programming languages, particularly the functional programming language SML and  the JVM-based programming language Ceylon, this latter currently under development by Red Hat.

I think it is important to highlight that the introduction of optional objects has been a matter of debate. In this article I try to present my perspective of the problem and I do an effort to show arguments in favor and against the use of optional objects. It is my contention that in certain scenarios the use of optional objects is valuable, but ultimately everyone is entitled to an opinion and I just hope this article helps the readers to make an informed one just as writing it helped me understand this problem much better.

About the Type of Null

In Java we use a reference type to gain access to an object, and when we don’t have a specific object to make our reference point to, then we set such reference to null to imply the absence of a value.

In Java null is actually a type, a special one: it has no name, we cannot declare variables of its type, or cast any variables to it, in fact there is a single value that can be associated with it (i.e. the literal null), and unlike any other types in Java, a null reference can be safely assigned to any other reference types (See JLS  3.10.7 and 4.1).

The use of null is so common that we rarely meditate on it: field members of objects are automatically initialized to null and programmers typically initialize reference types to null when they don’t have an initial value to give them and, in general, null is used everywhere to imply that, at certain point, we don’t know or we don’t have a value to give to a reference.

About the Null Pointer Reference Problem

Now, the major problem with the null reference is that if we try to dereference it then we get the ominous and well known NullPointerException.

When we work with a reference obtained from a different context than our code (i.e. as the result of a method invocation or when we receive a reference as an argument in a method we are working on),  we all would like to avoid this error that has the potential to make our application crash, but often the problem is not noticed early enough and it finds its way into production code where it waits for the right moment to fail (which is typically a Friday at the end of the month, around 5 p.m. and just when you are about to leave the office to go to the movies with your family or drink some beers with your friends). To make things worse, the place where your code fails is rarely the place where the problem originated, since your reference could have been set to null far away from the place in your code where you intended to dereference it. So, you better cancel those plans for the Friday night…

It’s worth mentioning that this concept of null references was first introduced by Tony Hoare, the creator of ALGOL, back in 1965. The consequences were not so evident in those days, but he later regretted his design and he called it “a billion dollars mistake“, precisely referring to the uncountable amount of hours that many of us have spent, since then, fixing this kind null dereferencing problems.

Wouldn’t it be great if the type system could tell the difference between a reference that, in a specific context, could be potentially null from one that couldn’t? This would help a lot in terms of type safety because the compiler could then enforce that the programmer do some verification for references that could be null at the same time that it allows a direct use of the others. We see here an opportunity for improvement in the type system. This could be particularly useful when writing the public interface of APIs because it would increase the expressive power of the language, giving us a tool, besides documentation, to tell our users that a given method may or may not return a value.

Now, before we delve any further, I must clarify that this is an ideal that modern languages will probably pursue (we’ll talk about Ceylon and Kotlin later), but it is not an easy task to try to fix this hole in a programming language like Java when we intend to do it as an afterthought. So, in the coming paragraphs I present some scenarios in which I believe the use of optional objects could arguably alleviate some of this burden. Even so, the evil is done, and nothing will get rid of null references any time soon, so we better learn to deal with them. Understanding the problem is one step and it is my opinion that these new optional objects are just another way to deal with it, particularly in certain specific scenarios in which we would like to express the absence of a value.

Finding Elements

There is a set of idioms in which the use of null references is potentially problematic. One of those common cases is when we look for something that we cannot ultimately find. Consider now the following simple piece of code used to find the first fruit in a list of fruits that has a certain name:

public static Fruit find(String name, List<Fruit> fruits) {
   for(Fruit fruit : fruits) {
      if(fruit.getName().equals(name)) {
         return fruit;
      }
   }
   return null;
}

As we can see, the creator of this code is using a null reference to indicate the absence of a value that satisfies the search criteria (7). It is unfortunate, though, that it is not evident in the method signature that this method may not return a value, but a null reference..

Now consider the following code snippet, written by a programmer expecting to use the result of the method shown above:

List<Fruit> fruits = asList(new Fruit("apple"),
                            new Fruit("grape"),
                            new Fruit("orange"));

Fruit found = find("lemon", fruits);
//some code in between and much later on (or possibly somewhere else)...
String name = found.getName(); //uh oh!

Such simple piece of code has an error that cannot be detected by the compiler, not even by simple observation by the programmer (who may not have access to the source code of the find method). The programmer,  in this case, has naively failed to recognize the scenario in which the find method above could return a null reference to indicate the absence of a value that satisfies his predicate. This code is waiting to be executed to simply fail and no amount of documentation is going to prevent this mistake from happening and the compiler will not even notice that there is a potential problem here.

Also notice that the line where the reference is set to null (5) is different from the problematic line (7). In this case they were close enough, in other cases this may not be so evident.

In order to avoid the problem what we typically do is that we check if a given reference is null before we try to dereference it. In fact, this verification is quite common and in certain cases this check could be repeated so many times on a given reference that Martin Fowler (renown for hist book on refactoring principles) suggested that for these particular scenarios such verification could  be avoided with the use of what he called a Null Object. In our example above, instead of returning null, we could have returned a NullFruit object reference which is an object of type Fruit that is hollowed inside and which, unlike a null reference, is capable of properly responding to the same public interface of a Fruit.

Minimum and Maximum

Another place where this could be potentially problematic is when reducing a collection to a value, for instance to a maximum or minimum value. Consider the following piece of code that can be used to determine which is the longest string in a collection.

public static String longest(Collection<String> items) {
   if(items.isEmpty()){
      return null;
   }
   Iterator<String> iter = items.iterator();
   String result = iter.next();
   while(iter.hasNext()) {
       String item = iter.next();
       if(item.length() > result.length()){
          result = item;
       }
   }
   return result;
}

In this case the question is what should be returned when the list provided is empty? In this particular case a null value is returned, once again, opening the door for a potential null dereferencing problem.

The Functional World Strategy

It’s interesting that in the functional programming paradigm, the statically-typed programming languages evolved in a different direction. In languages like SML or Haskell there is no such thing as a null value that causes exceptions when dereferenced. These languages provide a special data type capable of holding an optional value and so it can be conveniently used to also express the possible absence of a value.  The following piece of code shows the definition of the SML option type:

datatype 'a option = NONE | SOME of 'a

As you can see, option is a data type with two constructors, one of them stores nothing (i.e. NONE) whereas the other is capable of storing a polymorphic value of some value type 'a (where 'a is just a placeholder for the actual type).

Under this model, the piece of code we wrote before in Java, to find a fruit by its name, could be rewritten in SML as follows:

fun find(name, fruits) =
   case fruits of
        [] => NONE
      | (Fruit s)::fs => if s = name
                         then SOME (Fruit s)
                         else find(name,fs)

There are several ways to achieve this in SML, this example just shows one way to do it. The important point here is that there is no such thing as null, instead a value NONE is returned when nothing is found (3), and a value SOME fruit is returned otherwise (5).

When a programmer uses this find method, he knows that it returns an option type value and therefore the programmer is forced to check the nature of the value obtained to see if it is either NONE (6) or SOME fruit (7), somewhat like this:

let
   val fruits = [Fruit "apple", Fruit "grape", Fruit "orange"]
   val found = find("grape", fruits)
in
   case found of
       NONE => print("Nothing found")
     | SOME(Fruit f) => print("Found fruit: " ^ f)
end

Having to check for the true nature of the returned option makes it impossible to misinterpret the result.

Java Optional Types

It’s a joy that finally in Java 8 we’ll have a new class called Optional that allows us to implement a similar idiom as that from the functional world. As in the case of of SML, the Optional type is polymorphic and may contain a value or be empty. So, we could rewrite our previous code snippet as follows:

public static Optional<Fruit> find(String name, List<Fruit> fruits) {
   for(Fruit fruit : fruits) {
      if(fruit.getName().equals(name)) {
         return Optional.of(fruit);
      }
   }
   return Optional.empty();
}

As you can see, the method now returns an Optional reference (1), if something is found, the Optional object is constructed with a value (4), otherwise is constructed empty (7).

And the programmer using this code would do something as follows:

List<Fruit> fruits = asList(new Fruit("apple"),
                            new Fruit("grape"),
                            new Fruit("orange"));

Optional<Fruit> found = find("lemon", fruits);
if(found.isPresent()) {
   Fruit fruit = found.get();
   String name = fruit.getName();
}

Now it is made evident in the type of the find method that it returns an optional value (5), and the user of this method has to program his code accordingly (6-7).

So we see that  the adoption of this functional idiom is likely to make our code safer, less prompt to null dereferencing problems and as a result more robust and less error prone. Of course, it is not a perfect solution because, after all, Optional references can also be erroneously set to null references, but  I would expect that programmers stick to the convention of not passing null references where an optional object is expected, pretty much as we today consider a good practice not to pass a null reference where a collection or an array is expected, in these cases the correct is to pass an empty array or collection. The point here is that now we have a mechanism in the API that we can use to make explicit that for a given reference we may not have a value to assign it and the user is forced, by the API, to verify that.

Quoting an article I reference later about the use of optional objects in the Guava Collections framework: “Besides the increase in readability that comes from giving null a name, the biggest advantage of Optional is its idiot-proof-ness. It forces you to actively think about the absent case if you want your program to compile at all, since you have to actively unwrap the Optional and address that case”.

Other Convenient Methods

As of the today, besides the static methods of and empty explained above, the Optional class contains the following convenient instance methods:

ifPresent() Which returns true if a value is present in the optional.
get() Which returns a reference to the item contained in the optional object, if present, otherwise throws a NoSuchElementException.
ifPresent(Consumer<T> consumer) Which passess the optional value, if present, to the provided Consumer (which could be implemented through a lambda expression or method reference).
orElse(T other) Which returns the value, if present, otherwise returns the value in other.
orElseGet(Supplier<T> other) Which returns the value if present, otherwise returns the value provided by the Supplier (which could be implemented with a lambda expression or method reference).
orElseThrow(Supplier<T> exceptionSupplier) Which returns the value if present, otherwise throws the exception provided by the Supplier (which could be implemented with a lambda expression or method reference).

Avoiding Boilerplate Presence Checks

We can use some of the convenient methods mentioned above to avoid the need of having to check if a value is present in the optional object. For instance, we may want to use a default fruit value if nothing is found, let’s say that we would like to use a “Kiwi”. So we could rewrite our previous code like this:

Optional<Fruit> found = find("lemon", fruits);
String name = found.orElse(new Fruit("Kiwi")).getName();

In this other example, the code prints the fruit name to the main output, if the fruit is present. In this case, we implement the Consumer with a lambda expression.

Optional<Fruit> found = find("lemon", fruits);
found.ifPresent(f -> { System.out.println(f.getName()); });

This other piece of code uses a lambda expression to provide a Supplier which can ultimately provide a default answer if the optional object is empty:

Optional<Fruit> found = find("lemon", fruits);
Fruit fruit = found.orElseGet(() -> new Fruit("Lemon"));

Clearly, we can see that these convenient methods simplify a lot having to work with the optional objects.

So What’s Wrong with Optional?

The question we face is: will Optional get rid of null references? And the answer is an emphatic no! So, detractors immediately question its value asking: then what is it good for that we couldn’t do by other means already?

Unlike functional languages like SML o Haskell which never had the concept of null references, in Java we cannot simply get rid of the null references that have historically existed. This will continue to exist, and they arguably have their proper uses (just to mention an example: three-valued logic).

I doubt that the intention with the Optional class is to replace every single nullable reference, but to help in the creation of more robust APIs in which just by reading the signature of a method we could tell if we can expect an optional value or not  and force the programmer to use this value accordingly. But ultimately, Optional will be just another reference and subject to same weaknesses of every other reference in the language. It is quite evident that Optional is not going to save the day.

How these optional objects are supposed to be used or whether they are valuable or not in Java has been the matter of a heated debate in the project lambda mailing list. From the detractors we hear interesting arguments like:

  • The fact that other alternatives exist ( i.e. the Eclipse IDE supports a set of proprietary annotations for static analysis of nullability, the JSR-305 with annotations like @Nullable and @NonNull).
  • Some would like it to be usable as in the functional world, which is not entirely possible in Java since the language lacks many features existing in functional programming languages like SML or Haskell (i.e. pattern matching).
  • Others argue about how it is impossible to retrofit preexisting code to use this idiom (i.e. List.get(Object)which will continue to return null).
  • And some complain about the fact that the lack of language support for optional values creates a potential scenario in which Optional could be used inconsistently in the APIs, by this creating incompatibilities, pretty much like the ones we will have with the rest of the Java API which cannot be retrofitted to use the new Optional class.
  • A compelling argument is that if the programmer invokes the get method in an optional object, if it is empty, it will raise a NoSuchElementException, which is pretty much the same problem that we have with nulls, just with a different exception.

So, it would appear that the benefits of Optional are really questionable and are probably constrained to improving readability and enforcing public interface contracts.

Optional Objects in the Stream API

Irrespective of the debate, the optional objects are here to stay and they are already being used in the new Stream API in methods like findFirstfindAnymax and min. It could be worth mentioning that  a very similar class has been in used in the successful Guava Collections Framework.

For instance, consider the following example where we extract from a stream the last fruit name in alphabetical order:

Stream<Fruit> fruits = asList(new Fruit("apple"),
                              new Fruit("grape")).stream();
Optional<Fruit> max = fruits.max(comparing(Fruit::getName));
if(max.isPresent()) {
   String fruitName = max.get().getName(); //grape
}

Or this another one in which we obtain the first fruit in a stream

Stream<Fruit> fruits = asList(new Fruit("apple"),
                              new Fruit("grape")).stream();
Optional<Fruit> first = fruits.findFirst();
if(first.isPresent()) {
   String fruitName = first.get().getName(); //apple
}

Ceylon Programming Language and Optional Types

Recently I started to play a bit with the Ceylon programming language since I was doing a research for another post that I am planning to publish soon in this blog. I must say I am not a big fan of Ceylon, but still I found particularly interesting that in Ceylon this concept of optional values is taken a bit further, and the language itself offers some syntactic sugar for this idiom. In this language we can mark any type with a ? (question mark) in order to indicate that its type is an optional type.

For instance, this find function would be very similar to our original Java version, but this time returning an optional Fruit? reference (1). Also notice that a null value is compatible with the optional Fruit? reference (7).

Fruit? find(String name, List<Fruit> fruits){
   for(Fruit fruit in fruits) {
      if(fruit.name == name) {
         return fruit;
      }
   }
   return null;
}

And we could use it with this Ceylon code, similar to our last Java snippet in which we used an optional value:

List<Fruit> fruits = [Fruit("apple"),Fruit("grape"),Fruit("orange")];
Fruit? fruit = find("lemon", fruits);
print((fruit else Fruit("Kiwi")).name);

Notice the use of the else keyword here is pretty similar to the method orElse in the Java 8 Optional class. Also notice that the syntax is similar to the declaration of C# nullable types, but it means something totally different in Ceylon. It may be worth mentioning that Kotlin, the programming language under development by Jetbrains, has a similar feature related to null safety (so maybe we are before a trend in programming languages).

An alternative way of doing this would have been like this:

List<Fruit> fruits = [Fruit("apple"),Fruit("grape"),Fruit("orange")];
Fruit? fruit = find("apple", fruits);
if(exists fruit){
   String fruitName = fruit.name;
   print("The found fruit is: " + fruitName);
} //else...

Notice the use of the exists keyword here (3) serves the same purpose as the isPresent method invocation in the Java Optional class.

The great advantage of Ceylon over Java is that they can use this optional type in the APIs since the beginning, within the realm of their language they won’t have to deal with incompatibilities, and it can be fully supported everywhere (perhaps their problem will be in their integration with the rest of the Java APIs, but I have not studied this yet).

Hopefully, in future releases of Java, this same syntactic sugar from Ceylon and Kotlin will also be made available in the Java programming language, perhaps using, under the hood, this new Optional class introduced in Java 8.

Further Reading

Review of Programming Languages Course from Coursera

I just finished taking the course Programming Languages by Dan Grossman from the University of Washington in Coursera and this post is a review of the course from my perspective.

Programming Languages is a course intended to teach many important principles in programming with a strong emphasis in functional programming.

Among the most interesting concepts are the contrasts between static and dynamic typing (and type inference), and the contrasts between functional programming and object-oriented programming. But The course covers other fundamental concepts like mutability / immutability, algebraic data types and pattern matching, recursion and tail recursion, first-class functions and closures, high-order programming, currying, modules, parametric polymorphism, thunks and lazy evaluation, streams, memoization, macros, object-oriented inheritance, mixins, and many other interesting topics.

Every week a set of topics is covered in great detail in a series of videos that may have a length between 10 to 20 minutes. The material is released every week as course progresses. The materials covered in the videos are also provided in written format for further analysis and easier reviewing. Every week a homework is made available to the students. The homework consists of a series of exercises of increasing difficulty. Sometimes the homework contains challenge exercises that can be solved for some extra points. In my case, solving every homework took an average time between 4 to 8 hours, and the challenge exercises sometimes took me almost a similar amount time (mostly due to the lack of specification and not necessarily due to their complexity, although they were more complex than the rest of the exercises in the homework).

Students submit their homework and an automatic grading system reviews it by applying a series of tests. The failing tests are reported to the students so that they can fix the problems and resubmit again. Only the first two submissions count for the final grade (an average is used for final grading purposes), although students are allowed to submit as many times as they want. The grading system works pretty well, but there were occasional problems with it that were timely corrected. Perhaps my biggest complaint is that the autograder, many times, does not provided enough information to determine what was wrong with the homework submission and this lead to certain amount of frustration when trying to figure out what to do to solve the problems, above all because only the first two submissions count for the final grade. Particularly for the cases of challenge exercises the information provided by the autograder upon failure of any tests was really scarce.

Also, students are required to submit their homework for peer-reviews. And they are required to peer-review other students’ homework. This exercise is intended to give a broader perspective to the students by means of reading someone else’s code. This way, the student can find better solutions created by others or spot problems in their own solutions, etc. Also, during the peer-review process the right/best solutions for the exercises were shared with all students participating in the reviews. Therefore, doing the reviews was the best way to find out the answers to all exercises in the homework.

On week 4 and week 8 the students take an online exam consisting in questions with multiple selection answers. The questions are based on the topics covered in previous weeks. Some of the questions could be based on understanding of the theoretical concepts, and some question consist in practical applications of those concepts. Once the student starts the exam there is a time limit to complete it. I think something around 90 minutes.

The following is detailed outline of the topics covered every week.

Week 1: SML

  • ML Expressions and Variable Bindings
  • Variables are Immutable
  • Function Bindings
  • Pairs and Other Tuples
  • Lists
  • Let Expressions
  • Options
  • Some Other Expressions and Operators
  • Lack of Mutation and Benefits Thereof
  • The Pieces of a Programming Language

The homework consisted in the development of series of functions to deal with dates.

Week 2: SML

  • Conceptual Ways to Build New Types
  • Records
  • By Name vs By Position, Syntactic Sugar and the Truth about Tuples
  • Datatype Bindings
  • How ML Does Not Provide Access to Data Type Values
  • How ML Provides Access to Datatype Values: Case Expressions
  • Type Synonyms
  • Lists and Options Are Datatypes
  • Polymorphic Datatypes
  • Pattern Matching
  • Type Inference
  • Polymorphic Types and Equality Types
  • Nested Patterns
  • Exceptions
  • Tail Recursion and Accumulators

The homework consisted in a series of exercises related to a card game, a variation of solitaire.

Week 3: SML

  • Functions as Arguments
  • Polymorphic Types and Functions as Arguments
  • Anonymous Functions
  • Unnecessary Function Wrapping
  • Maps and Filters
  • Returning Functions
  • Lexical Scope
  • Environments and Closures
  • Fold
  • Combining Functions
  • Currying and Partial Application
  • The Value Restriction
  • Mutation via ML References
  • Callbacks
  • Abstract Datatypes
  • Closures in Other Languages

The homework consisted in a series of exercises related to the implementation of pattern matching and type inference.

Week 4: SML

  • What is Type Inference
  • Overview of ML Type Inference
  • Mutual Recursion
  • Modules for Namespace Management
  • Signatures
  • Hiding Things
  • Equivalent Implementations
  • Benefits of Side-Effect-Free Programming
  • Standard Equivalences

There was not homework because this week was the week of the midterm exam.

Week 5: Racket

  • Racket vs Scheme
  • Functions, Lists
  • Syntax and Parentheses
  • Dynamic Typing
  • Local Bindings
  • Top-Level Definitions
  • Mutability/Immutability
  • Delayed Evaluation with Thunks
  • Lazy Evaluation
  • Streams
  • Memoization
  • Macros

The homework consisted in the implementation of exercises related to streams and the challenge was about defining macros.

Week 6: Racket

  • Datatype Programming without Datatypes
  • Recursive Datatypes via Racket Lists
  • Recursive Datatypes via Racket’s struc
  • Implementing Programming Languages
  • Interpreters and Compilers
  • Implementing Closures
  • Defining Macros via Functions in the Metalanguage
  • ML versus Racket
  • What Is Static Checking?
  • Correctness, Soundness, Completeness, Undecidability
  • Weak Typing
  • Advantages and Disadvantages of Static Checking

The homework consisted in the implementation of an interpreter for small programming language called MUPL (Made Up Programming Language).

Week 7: Ruby

  • Objects, Classes, Methods, Variables
  • Visibility and Getters/Setters
  • Everything is an Object
  • Duck Typing
  • Arrays
  • Blocks
  • Hashes and Ranges
  • Subclassing and Inheritance
  • Overriding and Dynamic Dispatch
  • Method Lookup Definition
  • Dynamic Dispatch versus Closures

The homework consisted in the implementation of a tetris game.

Week 8: Ruby

  • OOP Versus Functional Decomposition
  • Multimethods
  • Mutiple Inheritance
  • Mixins
  • Abstract Methods
  • The Subtyping Relation
  • Function Subtyping
  • Subtyping for OOP
  • Covariance
  • Generics Versus Subtyping
  • Bounded Polymorphism

The homework consisted in the implementation of an interpreter for a small language of two-dimensional geometric objects.

Final Thoughts

In my opinion this is one of the best courses I have ever taken. It was fascinating, the topics were covered in great detail. The professor Grossman explained every topic with plenty of examples and almost all the homework was really interesting and the problems were increasingly challenging. Having the opportunity to work with different programming languages of different paradigms and typing styles really enriched my understanding of how programming languages work, and how to make proper comparisons between their features. It was particularly interesting the effort of converting some implementations of solutions to some problems from functional code to object-oriented code. I think that the course prepares the students to learn and assimilate other programming languages more rapidly by having covered concepts that are typically seen in other languages just with different syntax.

From the entire course I found the sections covering Ruby a little bit more uninteresting, and particularly the homework for the tetris game in week 7 was too simple and I found little could be learn from solving the exercise, but the rest of the material and homework were incredibly well thought and prepared.

The entire experience was really enriching and I feel that it has made me a better developer and has broadened my understand of programming languages and has rekindled my enthusiasm to learn even more about programming and other programming languages. It is definitely a course that I would highly recommend to any other developers.

Java Streams API Preview

For today’s post I want to share a series of examples that I have developed while trying the latest features in the Java 8 Stream API. In my last post I did a comparison of the features with those available in LINQ, and for today’s post I have decided to go a bit further and try to use the API to work with a small domain model. The examples developed here are based on the same examples presented in the LambdaJ Project.

The Data Model

For the examples I will use the following domain model:

lambda-model

You can see a full implementation of all the examples developed here by downloading the following Gist.

For the examples presented below assume that  in the context of the code there are three streams always available:

  • Stream<Person> persons.
  • Stream<Car> cars.
  • Stream<Sale> sales.

Challenge 1: Print All Car Brands

From a collection of cars print all car brands.

StringJoiner brands = cars.map(Car::getBrand)
                          .collect(toStringJoiner(","));
String allBrands = brands.toString();

For this example I have also use the new StringJoiner class.

Challenge 2: Select all Sales on Toyota

From a collection of sales, select all those that are related to Toyota cars.

List<Sale> toyotaSales;

toyotaSales = sales.filter(s -> s.getCar().getBrand().equals("Toyota"))
                   .collect(toList());

toyotaSales.forEach(System.out::println);

For this example I could have also used the forEach method in the stream to get all the sales printed. I did it this way just to illustrate that it is possible to collect all items in the stream into a list and from there we can process them. But ideally, I should have processed the items directly in the stream.

Challenge 3: Find Buys of the Youngest Person

From a collection of sales, find all those that are from the youngest buyer.

ToIntFunction<Entry<Person, List<Sale>>> byAge;
byAge = e -> e.getKey().getAge();
byYoungest = sales.collect(groupingBy(Sale::getBuyer))
                  .entrySet()
                  .stream()
                  .sorted(comparing(byAge))
                  .map(Entry::getValue)
                  .findFirst();
if(byYoungest.isPresent()) {
 System.out.println(byYoungest.get());
}

Challenge 4: Find Most Costly Sale

From a collection of sales, find the most costly of all of them.

Optional<Sale> mostCostlySale;
Comparator<Sale> byCost = comparing((ToDoubleFunction<Sale>)Sale::getCost)
                          .reverseOrder();

mostCostlySale = sales.sorted( byCost )
                      .findFirst();

if(mostCostlySale.isPresent()) {
	System.out.println(mostCostlySale.get());
}

Challenge 5: Sum of Sales from Male Buyers & Sellers

From a collection of sales find the sum of all buys/sells made by men.

double sum = sales.filter(s -> s.getBuyer().isMale()
                               && s.getSeller().isMale())
                  .mapToDouble(Sale::getCost)
                  .sum();

Challenge 6: Find the Age of the Youngest Buyer

From a collection of sales, find the age of the youngest buyer who bought for more than 40,000.

OptionalInt ageOfYoungest;

ageOfYoungest = sales.filter(sale -> sale.getCost() > 40000)
                     .map(Sale::getBuyer)
                     .mapToInt(Person::getAge)
                     .sorted()
                     .findFirst();

if(ageOfYoungest.isPresent()) {
	System.out.println(ageOfYoungest.getAsInt());
}

Challenge 7: Sort Sales by Cost

Sort a collection of sales by cost.

Comparator<Sale> byCost= comparing((ToDoubleFunction<Sale>) Sale::getCost);
List<Sale> sortedByCost;

sortedByCost = sales.sorted( byCost )
                    .collect(toList());

Challenge 8: Index Cars by Brand

From a collection of cars, index cars by the their brand.

Map<String,List<Car>> byBrand;
byBrand = cars.collect( groupingBy(Car::getBrand ));

Challenge 9: Find Most Bought Car

From a collection of sales find the most bought car.

ToIntFunction<Entry<Car,List<Sale>>> toSize = (e -> e.getValue().size());

Optional<Car> mostBought;

mostBought = sales.collect( groupingBy(Sale::getCar) )
                  .entrySet()
                  .stream()
                  .sorted( comparing(toSize).reverseOrder() )
                  .map(Entry::getKey)
                  .findFirst();

if(mostBought.isPresent()) {
   System.out.println(mostBought.get());
}

Related Posts

Futher Reading

Java Streams Preview vs .Net High-Order Programming with LINQ

In today’s post I want to share with you a comparison that I did this weekend between the high-order programming features from LINQ and those being incorporated into the new Java 8 streams API.  It is important to highlight that the Java Stream API is still work in progress. I have been downloading and building the source code of Project Lambda for almost a year now and I have seen how the API evolves over time, but the improvements are more frequent as we approach the general availability date (planed for September 2013).

Before I delve into the comparisons, I must say that as of today, the current implementation of the Java Stream API is far away from all the high-order programming functionality offered by LINQ. It has already been announced in the mailing list that there was not going to be enough time for all improvements originally planned and the JEP 108: Collection Enhancements for Third-party Libraries has been deferred to future releases of Java. I am not sure how much of the Stream API is going to be affected by this decision but I hope that in the remaining months, prior to the release, they do much more improvements in the current API, at least, bring it closer to LINQ.

At any rate, as of today, it appears that some basic functionality is already covered (filter, map, reduce, sort, etc.), but much more work is still needed for this API to be decently compared to LINQ in terms of functionality and my comparison here only focuses particularly in the high-order programming features. So, I really hope that it is only too early to write this article, and in the coming months things are going to improve dramatically.

Also, there are some inherent problems that have been evidently difficult to overcome. One of those affecting the design of the API is the lack of value types in Java. It appears that the expert group have been struggling with this, and their solution has been to make copies of some interfaces, defining some to deal with primitive types (i.e. IntStream, LongStream, DoubleStream) and others with reference types (i.e. Stream). This has also caused a diaspora of copies of other functional interfaces (aka interface polution), particularly in the case of the interface Function the problem is much more evident (i.e. ToIntFunction, ToLongFunction, ToDoubleFunction and Function).

Even when all this repetition is done to alleviate the performance problems inherent to boxing and unboxing of primitive types, this workaround has undesirable side effects: first, the need to overload methods for everyone of these interfaces. This overloading is undesirable because it makes it impossible to use a clean lambda expression for these cases, since the compiler is incapable to determine which of all the argument functional interfaces is the one being implemented by the lambda. So, to avoid compiler errors we are  forced to use awful type castings in the code, which make it more verbose and look pretty bad. And a secondary side effect is the incompatibility between primitive type and reference type collections (i.e. an Stream<Integer> is not the same as an IntStream).

About the Comparison

For this exercise I took the code examples from the book .Net 4.0 Generics a Beginners Guide by Sudipta Mukherjee, particularly his examples from Chapter 4: LINQ to Objects. I did not do all the examples simply because there are many features still lacking in the Java Stream API which make it impossible to replicate the same functionality.

The author of the book makes all these examples to demonstrate LINQ high-order programming functionality. It is not  my intention to write an article on LINQ and all it can do, but a comparison of what I could do with the high-order programming features in the Stream API today if I wanted to implement similar idioms as those I can so easily implement using LINQ functionality. Perhaps I may have misinterpreted the author in his use of all these examples to illustrate LINQ, or perhaps, the author only intended to illustrate just one aspect of this broad technology. Therefore, this article is  based on my understanding of his view of this part of LINQ in order to make a comparison, but I can understand that LINQ is more than just a bunch of high-order functions.

Now some may argue that there is no intention in making the stream API something comparable to LINQ, which may well be true, but still LINQ offers a good example of an API that makes extensive use of high-order programming to query collections, and even if the stream API is not yet intended to query things like XML or SQL (at least initially/yet), I feel that it is still a good benchmark to compare the power of expressiveness in this new Java API. For me, this gives me an idea of how evolved this API is today when compared to the power of expressiveness of this another state-of-art technology. As it is customary in the internet world, others may freely disagree with me and present their personal perspectives as I did here.

For the cases that I could implement, the examples are based on the use of Stream and IntStream classes, and the static methods available on the Streams and Collectors helper classes. In order to keep it simple and readable I am not showing in the code examples below all the imports and static imports that  I used. For that matter you may want to take a look at the following Gist where I have published most of these examples and where you will be able to see full imports needed to make the code run properly.

Restriction Operators

Challenge 1: Filtering

Say we have a List of names and we would like to find all those names where "am" occurs:


LINQ

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
List<string> filteredNames = names.Where(c => c.Contains("am"))
                                  .ToList();

Java Streams

String[] names = {"Sam","Pamela", "Dave", "Pascal", "Erik"};
List<String> filteredNames = stream(names)
			     .filter(c -> c.contains("am"))
			     .collect(toList());

Challenge 2: Indexed Filtering

Find all the names in the array "names" where the length of the name is less than or equal to the index of the element + 1.


LINQ

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
var nameList = names.Where((c, index) => c.Length <= index + 1).ToList();

Java Streams

Now this one was particularly tricky in Java because, as of today, the API for streams does not contain any methods that indicate the index of an element within the stream. So, I was forced to generate an indexed stream out of the original array:

String[] names = {"Sam","Pamela", "Dave", "Pascal", "Erik"};

List<String> nameList;
Stream<Integer> indices = intRange(1, names.length).boxed();
nameList = zip(indices, stream(names), SimpleEntry::new)
			.filter(e -> e.getValue().length() <= e.getKey())
			.map(Entry::getValue)
			.collect(toList());

Now, the lack of indices in the stream made the algorithm more verbose, but it was also interesting to notice the incompatibilities between primitive-type streams, like IntStream and reference type streams like Stream<Integer>. In this case, I was forced to transform the IntStream returned by intRange into a Stream<Integer> in order to make the argument compatible with the types expected by zip as you can see in the line #4.


Projection Operators

Challenge 3: Selecting/Mapping

Say we have a list of names and we would like to print “Hello” in front of all the names:


 LINQ

List<string> nameList1 = new List(){ "Anders", "David", "James",
                                     "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));

Java Streams

List<String> nameList1 = asList("Anders", "David", "James",
                                "Jeff", "Joe", "Erik");
nameList1.stream()
	 .map(c -> "Hello! " + c)
	 .forEach(System.out::println);

Challenge 4: Selecting Many/Flattening

Suppose, we have a dictionary such that each key has a list of values attached to them. Now, we want to project all the elements in a single collection:


LINQ

Dictionary<string, List<string>> map = new Dictionary<string,List<string>>();
map.Add("UK", new List<string>() {"Bermingham", "Bradford", "Liverpool"});
map.Add("USA", new List<string>() {"NYC", "New Jersey", "Boston", "Buffalo"});
var cities = map.SelectMany(c => c.Value).ToList();

Java Streams

Once more, the Java syntax, as of today, is a bit more verbose. First, we must transform the map to an entry set, and from there, we can generate a stream that we can later flatten to a list of cities, as follows:

Map<String, List<String>> map = new LinkedHashMap<>();
map.put("UK", asList("Bermingham","Bradford","Liverpool"));
map.put("USA", asList("NYC","New Jersey","Boston","Buffalo"));

FlatMapper<Entry<String, List<String>>,String> flattener;
flattener = (entry,consumer) -> { entry.getValue().forEach(consumer); };

List<String> cities = map.entrySet()
			 .stream()
			 .flatMap( flattener )
			 .collect(toList());

Ideally the lines 5, 6 should be defined as inline arguments of flatMap, but the compiler cannot properly infer the types of the expression precisely for the overloading problems that I described above. Given the amount of type information that must be provided, I felt that it was best to define it in another line as I did in #5 and #6. It looks awful,  I know. Clearly, we are in desperate need of an API that offers operations to deal with maps or tuples.


Partitioning Operators

Challenge 5: Taking an Arbitrary Number of Items

In this case we are interested in evaluating only the first n elements of a collection. For instance, using a finite list of numbers, obtain the first 4 of them.


LINQ

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
var first4 = numbers.Take(4).ToList();

Java Streams

Once more, in Java, the absence of value types forced me to make a conversion from an IntStream into a Stream<Integer> to make this work (see line #5), basically because there is not value type collector. An alternative would have been to consume the stream itself (i.e. forEach) or to collect items into an array using Stream.toArray() method.

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,13 };

List<Integer> firstFour;
firstFour = stream(numbers).limit(4)
                           .boxed()
                           .collect(toList());

Challenge 6: Taking Items Based on Predicate

In this case we are interested in taking items out of a collection as long as they satisfy a predicate. Once we find an item that does not satisfy the predicate we stop there.


LINQ

string[] moreNames = { "Sam", "Samuel", "Dave", "Pascal", "Erik",  "Sid" };
var sNames = moreNames.TakeWhile(c => c.StartsWith("S"));

Java Streams

As of today, there is no way to implement this idiom in Java using the streams API. There is an alternative way to do this, but it is not the same thing. The beauty of takeWhile is that is should be short-circuited, that is, it should stop the evaluation in the moment that one item does not satisfies the predicate. The following version in Java does not have this property:

String[] names  = { "Sam","Samuel","Dave","Pascal","Erik","Sid" };

List<String> found;
found = stream(names).collect(partitioningBy( c -> c.startsWith("S")))
                     .get(true);

The collector created by partitioningBy (line #4) forces the evaluation of the whole stream, placing items into a boolean map, where all items that satisfy the predicate are bound to true and all others to false. So clearly, it is not the same thing. I hope that as the Oracle expert group works on the API they fix this omission.


Challenge 7: Skipping an Arbitrary Number of Items

In this case we are interested in skipping items in a collection up to certain arbitrary number, then we keep the rest of the items.


LINQ

string[] vipNames = { "Sam", "Samuel", "Samu", "Remo", "Arnold","Terry" };
var skippedList = vipNames.Skip(3).ToList();//Leaving the first 3.

Java Streams

In Java, the solution consists in creating a new stream where the first n elements have been discarded. As follows:

String[] vipNames = { "Sam", "Samuel", "Samu", "Remo", "Arnold","Terry" };

List<String> skippedList;
skippedList = stream(vipNames).substream(3).collect(toList());

Challenge 8: Skipping Items Based on Predicate

In this case we are interested in skipping items out of a collection as long as they satisfy a predicate. Once we find an item that does not satisfy the predicate we take the rest of the items from there.


LINQ

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20 };
var skippedList = numbers.SkipWhile(c => c < 10);

Java Streams

With current streams API I found no way to implement this idiom.

Ordering Operators

Challenge 9: Ordering/Sorting Elements

Order the elements of a collection alphabetically:


LINQ

string[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = friends.OrderBy(c => c).ToArray();

Java Streams

In the case of Java, we use the sorted method to produce the same result. The sorted method can also accept a Comparator to determine the sorting criteria.

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = stream(friends).sorted().toArray(String[]::new);

Challenge 10: Ordering/Sorting Elements by Specific Criterium

Order the elements of a collection strings according to the length of the string:


LINQ

string[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = friends.OrderBy(c => c.Length).ToArray();

Java Streams

In this case we pass a Comparator to the sort method. And once again, we hit against the problem of overloaded methods defined in the API to deal with the lack of value types in Java. Here we are forced to provide an explicit casting (line #3) to help the compiler:

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = stream(friends)
           .sorted(comparing((ToIntFunction<String>)String::length))
           .toArray(String[]::new);

An alternative way could be to provide an implementation of the Comparator (line #3), in this case, by means of a binary lambda expression. This is a little cleaner, but more verbose.

String[] friends = { "Sam", "Pamela", "Dave", "Anders", "Erik" };
friends = stream(friends)
            .sorted( (s1,s2) -> Integer.compare(s1.length(), s2.length()))
            .toArray(String[]::new);

Challenge 11: Ordering/Sorting Elements by Multiple Criteria

Order the elements of a collection of strings according to several sorting criteria:


LINQ

string[] fruits = {"grape", "passionfruit", "banana",
                   "apple", "orange", "raspberry",
                   "mango", "blueberry" };

//Sort the strings first by their length and then alphabetically.
//preserving the first order.
var sortedFruits = fruits.OrderBy(fruit =>fruit.Length)
                         .ThenBy(fruit => fruit);

Java Streams

Originally I had thought it was not possible to implement this idiom with the latest release of the API, but one of our readers had a good suggestion. Even so I was not able to get rid of the castings in lines #5 and #7. Once again the interface pollution causes the need for castings to clarify which of the overloaded methods are the ones being implemented in Comparators and Comparator here.

String[] fruits = {"grape", "passionfruit", "banana","apple",
                   "orange", "raspberry","mango", "blueberry" };

Comparator<String> comparator;
comparator = comparing((Function<String,Integer>)String::length,
                       Integer::compare)
            .thenComparing((Comparator<String>)String::compareTo);

fruits = stream(fruits) .sorted(comparator)
                        .toArray(String[]::new);

Grouping Operators

Challenge 12: Grouping by a Criterium

Group the elements of a collection of strings by their length.


LINQ

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
var groups = names.GroupBy(c => c.Length);

Java Streams

String[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};

Map<Integer,List<String>> groups;
groups = stream(names).collect(groupingBy(String::length));

Set Operators

The current implementation of streams is way behind LINQ in this area. From all possible set operations, the only ones currently implemented are “distinct” and “concat”, although “concat” is not a set operation because it would accept duplicates, the correct would be to have a “union” operation, but this does not exist in the stream API yet.

Challenge 13: Filter Distinct Elements

Obtain all the distinct elements from a collection.


LINQ

string[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"};
//This will work as strings implement IComparable
var uniqueSongIds = songIds.Distinct();

Java Streams

String[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2", "Song#3", "Song#1"};
//according to Object.equals
stream(songIds).distinct();

Challenge 14: Union of Two Sets

Join together two sets of items.


LINQ

List<string> friends1 = new List<string>() {"Anders", "David","James",
                                            "Jeff", "Joe", "Erik"};
List<string> friends2 = new List<string>() { "Erik", "David", "Derik" };
var allMyFriends = friends1.Union(friends2);

Java Streams

In Java we have to concatenate the two streams and then obtain the distinct elements.

List<String> friends1 = asList("Anders","David","James","Jeff","Joe","Erik");
List<String> friends2 = asList("Erik","David","Derik");
Stream<String> allMyFriends = concat(friends1.stream(),
                                     friends2.stream()).distinct();

Element Operatos

Challenge 15: First Element

Obtain the first element of a collection.


LINQ

string[] otherFriends = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
string firstName = otherFriends.First();
string firstNameConditional = otherFriends.First(c => c.Length == 5);

Java Streams

In Java we use the findFirst method which returns an Option object. The object may contain something or nothing, to validate that one must invoke the isPresent method on the returned object.

String[] otherFriends = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
Optional<String> found = stream(otherFriends).findFirst();

Optional<String> maybe = stream(otherFriends).filter(c -> c.length() == 5)
                                             .findFirst();
if(maybe.isPresent()) {
   //do something with found data
}

Range Operators

Challenge 16: Generate a Range of Numbers

Generate a range of numbers that are multiples of 11.


LINQ

var multiplesOfEleven = Enumerable.Range(1, 100).Where(c => c % 11 == 0);

Java Streams

IntStream multiplesOfEleven = intRange(1,100).filter(n -> n % 11 == 0);

Quantifier Operators

Challenge 17: All

Do all elements in a collection satisfy a predicate?


LINQ

string[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
bool x = persons.All(c => c.Length == 5);

Java Streams

String[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
boolean x = stream(persons).allMatch(c -> c.length() == 5);

Challenge 18: Any

Do any elements in a collection satisfy a predicate?


LINQ

string[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
bool x = persons.Any(c => c.Length == 5);

Java Streams

String[] persons = {"Sam", "Danny", "Jeff", "Erik", "Anders","Derik"};
boolean x = stream(persons).anyMatch(c -> c.length() == 5);

Merging Operators

Challenge 19: Zip

Combine two collections into a single collection.


LINQ

string[] salutations = {"Mr.", "Mrs.", "Ms", "Master"};
string[] firstNames = {"Samuel", "Jenny", "Joyace", "Sam"};
string lastName = "McEnzie";

salutations.Zip(firstNames, (sal, first) => sal + " " + first)
           .ToList()
           .ForEach(c => Console.WriteLine(c + " " + lastName));

Java Streams

String[] salutations = {"Mr.", "Mrs.", "Ms", "Master"};
String[] firstNames = {"Samuel", "Jenny", "Joyace", "Sam"};
String lastName = "McEnzie";

zip(
    stream(salutations),
    stream(firstNames),
    (sal,first) -> sal + " " +first)
.forEach(c -> { System.out.println(c + " " + lastName); });

In general, things are still not looking good, but I really hope that will change in the coming months. There are challenges that will be difficult to overcome, like the issues with value types, but in my opinion we could  live with those problems as long as we are provided with an API sufficiently expressive.

I am currently working on another set of examples in which I work on a small data model and see what I can do with it to run all kinds of queries using the Stream API, but I will leave that for my next post.

Related Posts

Further Reading

Programmer Jokes

Heck is Friday! Today I have gathered a funny set of programmer jokes. I hope you like them :-)

A group of 4 Microsoft .NET programmers and a group of 4 Java programmers are going on a train to an expo. The MS programmers buy a ticket each, and then watch the Java programmers proceed to buy one ticket between them.

The MS programmers are intrigued and when they get on the train, they watch the Java programmers to see what they do when the guard comes to check the tickets. It turns out that, before the guard comes, they all cram into the toilet. The guard knocks on the door, and asks for the ticket. The guard takes it from under the door, and slides it back.

The MS programmers are all impressed, so on the way back, they buy only one ticket. Only to watch the Java folks get on the train without buying a ticket at all.

When they get on the train, the MS people cram into the toilet, as they saw the Java folks on the earlier journey. The Java programmers then knock on the door, and say “Ticket please”. The MS programmers slide the ticket under the door, as they saw the Java programmers do earlier.

“Thank you”, they say. “You steal our methods, but you don’t understand them.”


A young Java developer and his project manager board a train headed through the mountains on its way to Wichita. They can find no place to sit except for two seats right across the aisle from a young woman and her grandmother. After a while, it is obvious that the young woman and the young programmer are interested in each other, because they are giving each other looks. Soon the train passes into a tunnel and it is pitch black. There is a sound of a kiss followed by the sound of a slap.

When the train emerges from the tunnel, the four sit there without saying a word. The grandmother is thinking to herself, “It was very brash for that young man to kiss my granddaughter, but I’m glad she slapped him.”

The project manager is sitting there thinking, “I didn’t know the young tech was brave enough to kiss the girl, but I sure wish she hadn’t missed him when she slapped me!”

The young woman was sitting and thinking, “I’m glad the guy kissed me, but I wish my grandmother had not slapped him!”

The young programmer sat there with a satisfied smile on his face. He thought to himself, “Life is good. How often does a guy have the chance to kiss a beautiful girl and slap his Ppoject manager all at the same time!”


Two ints and a Float walk into a bar. They spot an attractive Double on her own.

The first int walks up to her and say: “Hey baby, my VM or yours?”. She slaps him and he walks back dejected.

The second int walks over. “Hey cute-stuff, can I cook you ‘Beans’ for breakfast?” After a quick slapping, he too walks back.

Then the Float ambles over casually: “Where those two primitive types bothering you?”, he remarks.

“Yes, I’m so glad you’re here”, she says. “They just had no Class!”


A bunch of 17 year olds – ClassCast, IllegalArgument and ArrayIndexOutOfBounds – decide to take their chances, and try to get served at the bar.

The Bartender takes one look at them, and asks them for ID.

ClassCast hands over his fake ID, IllegalArgument hands over his brother Throwable’s ID, but ArrayIndexOutOfBounds doesn’t have any fake ID.

The Bartender says “Sorry guys, you’ll have to leave unless I can see some ID”.

ClassCast pleads with the barman “Can’t you just bend the rules for us?” and the barman says “Sorry, no Exceptions“.


Two session beans in love are sitting cuddled close together:

“Oh Jarling, my Singleton!”, the female session bean exclaims.
“Let’s go Home and Make love.”, the male session bean replies.
“But we can’t”, the female session bean says. “I don’t want to create() new() instances.”
“Don’t worry” the male session bean replies with a smile. “My constructor is protected.”


Why did the Integer drown?
‘Coz he couldn’t Float!


Yo mama’s so fat… she get an ArrayIndexOutOfBoundException!
Yo mama’s so po… she does garbage collection for a living!
Yo mama’s so ugly… her java.lang.reflect took down the mirror site!

A bonus one
Your mama’s so fat… that not even Dijkstra is able to find a shortest path around her.


When your hammer is C++, everything begins to look like a thumb.


A Cobol programmer made so much money doing Y2K remediation that he was able to have himself cryogenically frozen when he died. One day in the future, he was unexpectedly resurrected.

When he asked why he was unfrozen, he was told:

“It’s the year 9999 – and you know Cobol”


A geologist and a Java programmer are sitting next to each other on a long flight from LA to NY. The geologist leans over to the programmer and asks if he would like to play a fun game. The programmer just wants to take a nap, so he politely declines and rolls over to the window to catch a few winks. The geologist persists and explains that the game is real easy and a lotta fun. He explains, “I ask you a question, and if you don’t know the answer, you pay me $5. Then you ask me a question, and if I don’t know the answer, I’ll pay you $5.” Again, the programmer politely declines and tries to get to sleep. The geologist now somewhat agitated, says, “OK, if you don’t know the answer you pay me $5, and if I don’t know the answer, I’ll pay you $50!”

This catches the programmer’s attention, and he sees no end to this torment unless he plays, so he agrees to the game. The geologist asks the first question. “What’s the distance from the Earth to the moon?”

The programmer doesn’t say a word, but reaches into his wallet, pulls out a five dollar bill and hands it to the geologist.

Now, it’s the programmer’s turn. He asks the Geologist, “What goes up a hill with three legs, and comes down on four?” The geologist looks up at him with a puzzled look. He takes out his laptop computer and searches all of his references. He taps into the Airphone with his modem and searches the net and the Library of Congress. Frustrated, he sends e-mail to his co-workers — all to no avail.

After about an hour, he wakes the programmer and hands him $50. The programmer politely takes the $50 and turns away to try to get back to sleep.

The Geologist is more than a little miffed, shakes the programmer and asks, “Well, so what’s the answer?”

Without a word, the programmer reaches into his wallet, hands the geologist $5, and turns away to get back to sleep.


Once upon a time there was a shepherd looking after his sheep on the side of a deserted road. Suddenly a brand new Porsche screeches to a halt. The driver, a man dressed in an Armani suit, Cerutti shoes, Ray-Ban sunglasses, TAG-Heuer wrist-watch, and a Versace tie, gets out and asks the Shepherd:

Man: “If I can tell you how many sheep you have, will you give me one of them?”
The shepherd looks at the young man, and then looks at the large flock of grazing sheep and replies:

Shepherd: “Okay.”

The young man parks the car, connects his laptop to the mobile-fax, enters a NASA Webster, scans the ground using his GPS, opens a database and 60 Excel tables filled with logarithms and pivot tables, then prints out a 150 page report on his high-tech mini-printer. He turns to the shepherd and says,

Man: “You have exactly 1,586 sheep here.”
The shepherd cheers,

Shepherd: “That’s correct, you can have your sheep.”

The young man makes his pick and puts it in the back of his Porsche. The shepherd looks at him and asks,

Shepherd: “If I guess your profession, will you return my animal to me?”

The young man answers;
Man: “Yes, why not?”

Shepherd: “You are an IT consultant.”

Man: “How did you know?”

Shepherd: “Very simple. First, you came here without being called. Second, you charged me a fee to tell me something I already knew, and third, you don’t understand anything about my business. Now can I have my DOG back?”


Richard Stallman, Linus Torvalds, and Donald Knuth engage in a discussion on whose impact on computer science was the greatest.
Stallman: “God told me I have programmed the best editor in the world!”
Torvalds: “Well, God told me that I have programmed the best operating system in the world!”
Knuth: “Wait, wait, I never said that.”


Joke: A novice programmer was explained the meaning of RTFM. He showed up the next day saying: “So I went out and bought the Kama Sutra. Now what?”

Meta-joke: If you tell the joke above to a non-programmer, he will ask: “What’s RTFM?” A programmer will ask: “What’s Kama Sutra?”

Meta-meta-joke: If instead of laughing in response in the meta-joke above you have asked “I knew both, now who am I”, then you are probably a programmer over the age of 30, who has realized the value of social skills, and who may even be married, but who is still an uber-geek who takes things way too literally.

If you have asked “I googled both, now who am I”, then you are probably a high-school kid who reads stackoverflow and takes things way too literally, but who had not yet known about RTFM or Kama Sutra. Congratulations, you are well on your way to becoming an uber-geek. Please try to acquire some social skills along the way. You may not think so now, but they do come in handy.


Don’t anthropomorphize computers. They hate that!


I was in the airport VIP lounge in route to Seattle a couple of weeks ago. While in there, I noticed Bill Gates sitting comfortably in the corner, enjoying a drink. I was meeting a very important client who was also flying to Seattle, but she was running a little bit late.

Well, being a straightforward kind of guy, I approached the Microsoft chairman, introduced myself, and said, “Mr. Gates, I wonder if you would do me a favor.”

“Yes?”

“I’m sitting right over there,” pointing to my seat at the bar, “and I’m waiting on a very important client. Would you be so kind when she arrives as to come walk by and just say, ‘Hi, Ray,’?”

“Sure.”

I shook his hand and thanked him and went back to my seat.

About ten minutes later, my client showed up. We ordered a drink and started to talk business.

A couple of minutes later, I felt a tap on my shoulder. It was Bill Gates.

“Hi, Ray,” he said.

I replied, “Get lost Gates, I’m in a meeting.”


Why Encapsulation Matters

Encapsulation is more than just defining accessor and mutator methods for a class. It is a broader concept of programming, not necessarily object-oriented programming, that consists in minimizing the interdependence between modules and it’s typically implemented through information hiding. Paramount to understand encapsulation is the realization that  it has two main objectives: (1) hiding complexity and (2) hiding the sources of change.

About Hiding Complexity

Encapsulation is inherently related to the concepts of modularity and abstraction. So, in my opinion, to really understand encapsulation, one must first understand these two concepts.

Let’s consider, for example, the level of abstraction in the concept of a car. A car is complex in its internal working. They have several subsystem, like the transmission system, the break system, the fuel system, etc.

However, we have simplified its abstraction, and we interact with all cars in the world through the public interface of their abstraction: we know that all cars have a steering wheel through which we control direction, they have a pedal that when we press it we accelerate the car and control speed, and another one that when we press it we make it stop, and we have a gear stick that let us control if we go forward or backwards. These features constitute the public interface of the car abstraction. In the morning we can drive a sedan and then get out of it and drive an SUV in the afternoon as if it was the same thing.

However, few of us know the details of how all these features are implemented under the hood. So, this simple analogy shows that human beings deal with complexity by defining abstractions with public interfaces that we use to interact with them and all the unnecessary details get hidden under the hood of these abstractions. And I want to emphasize that word “unnecessary” here, because the beauty of an abstraction is not having to understand all those details in order to be able to use it, we just need to understand a broader abstract concept and how it works and how we interact with it. That’s why most of us don’t know or don’t care how a car works under the hood, but that doesn’t prevents us from driving one.

In his book Code Complete, Steve McConnell uses the analogy of an iceberg: only a small portion of an iceberg is visible on the surface, most of its true size is hidden underwater. Similarly, in our software designs the visible parts of our modules/classes constitute their public interface, and this is exposed to the outside world, the rest of it should be hidden to the naked eye. In the words of McConell “the interface to a class should reveal as little as possible about its inner workings”.

Clearly, based on our car analogy, we can see that this encapsulation is good, since it hides unnecessary/complex details from the users. It makes objects simpler to use and understand.

About Hiding the Sources of Change

Now, continuing with the analogy; think of the time when cars did not have a hydraulics directional system. One day, the car manufactures invented it, and they decide it to put it in cars from there on. Still, this did not change the way in which drivers were interacting with them. At most, users experienced an improvement in the use of the directional system. A change like this was possible because the internal implementation of a car is encapsulated, that is, is hidden from its user. In other words changes can be safely done without affecting its public interface.

In a similar way, if we achieve proper levels of encapsulation in our software design we can safely foster change and evolution of our APIs without breaking its users, by this minimizing the impact of changes and the interdependence of modules. Therefore, encapsulation is a way to achieve another important attribute of a good software design known as loose coupling.

In his book Effective Java, Joshua Block highlights the power of information hiding and loose coupling when he says: “Information hiding is important for many reasons, most of which stem from the fact that it decouples the modules that compromise a system, allowing them to be developed, tested, optimized, used, understood, and modified in isolation. This speeds up system development because modules can be developed in parallel. It eases the burden of maintenance because modules can be understood more quickly and debugged with little fear of harming other modules [...] it enables effective performance tuning [since] those modules can be optimized without affecting the correctness of other modules [and] increases software reuse because modules that aren’t tightly coupled often prove useful in other contexts besides the ones for which they were developed”.

So, once more, we can clearly see that encapsulation is a desirable attribute that eases the introduction of change and foster the evolution of our APIs. As long as we respect the public interface of our abstractions we are free to change whatever we want of its encapsulated inner workings.

About Breaking the Public Interface

So what happens when we do not achieve the proper levels of encapsulation in our designs?

Now, think that car manufactures decided to put the fuel cap below the car, and not in one of its sides. Let’s say we go and buy one of these new cars, and when we run out of gas we go to the nearest gas station, and then we do not find the fuel cap. Suddenly we realize is below the car, but we cannot reach it with the gas pump hose. Now, we have broken the public interface contract, and therefore, the entire world breaks, it falls apart because things are not working the way it was expected. A change like this would cost millions. We would need to change all gas pumps, not to mention mechanical shops and auto parts. When we break encapsulation we have to pay a price.

This last part of our analogy, clearly reveals that failing to define proper abstractions with  proper levels of encapsulation will end up causing difficulties when change finally happens. So, as we can see, the goal of encapsulation is reduce the complexity of the abstractions by providing a way to hide implementation details and it also help us to minimize interdependence and facilitate change. We maximize encapsulation by minimizing the exposure of implementation details.

However encapsulation will not help us if we do not define proper abstractions. Simply put, there is no way to change the public interface of an abstraction without breaking its users. So, the design of good abstractions is of paramount importance to facilitate the evolution of the APIs, encapsulation is just one of the tools that help us create this good abstractions, but no level of encapsulation is going to make a bad abstraction work.

Encapsulation in Java

One of those things that we always want to encapsulate is the state of a class. The state of a class should only be accessed through its public interface.

In a object-oriented programming language like Java, we achieve encapsulation by hiding details using the accessibility modifiers (i.e. public, protected, private, plus no modifier which implies package private). With these levels of accessibility we control the level of encapsulation, the less restrictive the level, the more expensive change is when it happens and the more coupled the class is with other dependent classes (i.e. user classes, subclasses, etc.).

In object-oriented languages a class has two public interfaces: the public interface shared with all users of the class, and the protected interface shared with subclasses. It is of paramount importance that we design the proper levels of encapsulation for every one of these public interfaces so that we can facilitate change and foster evolution of our APIs.

Why Getters and Setters?

Many people wonder why we need accessor and mutator methods in Java (a.k.a. getters and setters), why can’t we just access the data directly? But the purpose of encapsulation here is  is not to hide the data itself, but the implementation details on how this data is manipulated. So, once more what we want is a way to provide a public interface through which we can gain access to this data. We can later change the internal representation of the data without compromising the public interface of the class. On the contrary, by exposing the data itself, we compromise encapsulation, and therefore, the capacity of changing the ways to manipulate this data in the future without affecting its users. We would create a dependency with the data itself, and not with the public interface of the class. We would be creating a perfect cocktail for trouble when “change” finally finds us.

There are several compelling reasons why we might want to encapsulate access to our fields. The best compendium of these reasons I have ever found is described in Joshua Bloch’s book Effective Java. There in Item 14: Minimize the accessibility of classes and members, he mentions several  reasons, which I mention here:

  • You can limit the values that can be stored in a field (i.e. gender must be F or M).
  • You can take actions when the field is modified (trigger event, validate, etc).
  • You can provide thread safety by synchronizing the method.
  • You can switch to a new data representation (i.e. calculated fields, different data type)

However, it is very important to understand that encapsulation is more than hiding fields. In Java we can hide entire classes, by this, hiding the implementation details of an entire API.

My understanding of this important concept was broaden and enriched by my reading of a great article  by Alan Snyder called Encapsulation and Inheritance in Object-Oriented Programming Languages which I recommend to all readers of this blog. I found a version of it available on the Web and I shared a link to it a the end of this article.

Further Reading

Covariance and Contravariance In Java

I have found that in order to understand covariance and contravariance a few examples with Java arrays are always a good start.

Arrays Are Covariant

Arrays are said to be covariant which basically means that, given the subtyping rules of Java, an array of type T[] may contain elements of type T or any subtype of T. For instance:

Number[] numbers = new Number[3];
numbers[0] = new Integer(10);
numbers[1] = new Double(3.14);
numbers[2] = new Byte(0);

But not only that, the subtyping rules of Java also state that an array S[] is a subtype of the array T[] if S is a subtype of T, therefore, something like this is also valid:

Integer[] myInts = {1,2,3,4};
Number[] myNumber = myInts;

Because according to the subtyping rules in Java, an array Integer[] is a subtype of an array Number[] because Integer is a subtype of Number.

But this subtyping rule can lead to an interesting question: what would happen if we try to do this?

myNumber[0] = 3.14; //attempt of heap pollution

This last line would compile just fine, but if we run this code, we would get an ArrayStoreException because we’re trying to put a double into an integer array. The fact that we are accessing the array through a Number reference is irrelevant here, what matters is that the array is an array of integers.

This means that we can fool the compiler, but we cannot fool the run-time type system. And this is so because arrays are what we call a reifiable type. This means that at run-time Java knows that this array was actually instantiated as an array of integers which simply happens to be accessed through a reference of type Number[].

So, as we can see, one thing is the actual type of the object, an another thing is the type of the reference that we use to access it, right?

The Problem with Java Generics

Now, the problem with generic types in Java is that the type information for type parameters is discarded by the compiler after the compilation of code is done; therefore this type information is not available at run time. This process is called type erasure. There are good reasons for implementing generics like this in Java, but that’s a long story, and it has to do with binary compatibility with pre-existing code.

The important point here is that since at run-time there is no type information, there is no way to ensure that we are not committing heap pollution.

Let’s consider now the following unsafe code:

List<Integer> myInts = new ArrayList<Integer>();
myInts.add(1);
myInts.add(2);
List<Number> myNums = myInts; //compiler error
myNums.add(3.14); //heap polution

If the Java compiler does not stop us from doing this, the run-time type system cannot stop us either, because there is no way, at run time, to determine that this list was supposed to be a list of integers only. The Java run-time would let us put whatever we want into this list, when it should only contain integers, because when it was created, it was declared as a list of integers. That’s why the compiler rejects line number 4 because it is unsafe and if allowed could break the assumptions of the type system.

As such, the designers of Java made sure that we cannot fool the compiler. If we cannot fool the compiler (as we can do with arrays) then we cannot fool the run-time type system either.

As such, we say that generic types are non-reifiable, since at run time we cannot determine the true nature of the generic type.

Evidently this property of generic types in Java would have a negative impact on polymorphism. Let’s consider now the following example:

static long sum(Number[] numbers) {
   long summation = 0;
   for(Number number : numbers) {
      summation += number.longValue();
   }
   return summation;
}

Now we could use this code as follows:

Integer[] myInts = {1,2,3,4,5};
Long[] myLongs = {1L, 2L, 3L, 4L, 5L};
Double[] myDoubles = {1.0, 2.0, 3.0, 4.0, 5.0};
System.out.println(sum(myInts));
System.out.println(sum(myLongs));
System.out.println(sum(myDoubles));

But if we attempt to implement the same code with generic collections, we would not succeed:

static long sum(List<Number> numbers) {
   long summation = 0;
   for(Number number : numbers) {
      summation += number.longValue();
   }
   return summation;
}

Because we we would get compiler errors if you try to do the following:

List<Integer> myInts = asList(1,2,3,4,5);
List<Long> myLongs = asList(1L, 2L, 3L, 4L, 5L);
List<Double> myDoubles = asList(1.0, 2.0, 3.0, 4.0, 5.0);
System.out.println(sum(myInts)); //compiler error
System.out.println(sum(myLongs)); //compiler error
System.out.println(sum(myDoubles)); //compiler error

The problem is that now we cannot consider a list of integers to be subtype of a list of numbers, as we saw above, that would be considered unsafe for the type system and compiler rejects it immediately.

Evidently, this is affecting the power of polymorphism and it needs to be fixed. The solution consists in learning how to use two powerful features of Java generics known as covariance and contravariance.

Covariance

For this case, instead of using a type T as the type argument of a given generic type, we use a wildcard declared as ? extends T, where T is a known base type.

With covariance we can read items from a structure, but we cannot write anything into it. All these are valid covariant declarations.

List<? extends Number> myNums = new ArrayList<Integer>();
List<? extends Number> myNums = new ArrayList<Float>();
List<? extends Number> myNums = new ArrayList<Double>();

And we can read from our generic structure myNums by doing:

Number n = myNums.get(0);

Because we can be sure that whatever the actual list contains, it can be upcasted to a Number (after all anything that extends Number is a Number, right?)

However, we are not allowed to put anything into a covariant structure.

myNumst.add(45L); //compiler error

This would not be allowed because the compiler cannot determine what is the actual type of the object in the generic structure. It can be anything that extends Number (like Integer, Double, Long), but the compiler cannot be sure what, and therefore any attempt to retrieve a generic value is considered an unsafe operation and it is immediately rejected  by the compiler. So we can read, but not write.

Contravariance

For contravariance we use a different wildcard called ? super T, where T is our base type. With contravariance we can do the opposite. We can put things into a generic structure, but we cannot read anything out of it.

List<Object> myObjs = new List<Object();
myObjs.add("Luke");
myObjs.add("Obi-wan");
List<? super Number> myNums = myObjs;
myNums.add(10);
myNums.add(3.14);

In this case, the actual nature of the object is  List of Object, and through contravariance, we can put a Number in it, basically because a Number has Object as its common ancestor. As such, all numbers are also objects, and therefore this is valid.

However, we cannot safely read anything from this contravariant structure assuming that we will get a number.

Number myNum = myNums.get(0); //compiler-error

As we can see, if the compiler allowed us to write this line, we would get a ClassCastException at run time. So, once again, the compiler does not run the risk of allowing this unsafe operation and rejects it immediately.

Get/Put Principle

In summary, we use covariance when we only intend to take generic values out of a structure. We use contravariance when we only intend to put generic values into a structure and we use an invariant when we intend to do both.

The best example I have is the following that copies any kind of numbers from one list into another list. It only gets items from the source, and it only puts items in the destiny.

public static void copy(List<? extends Number> source,
                        List<? super Number> destiny) {
   for(Number number : source) {
      destiny.add(number);
   }
}

Thanks to the powers of covariance and contravariance this works for a case like this:

List<Integer> myInts = asList(1,2,3,4);
List<Integer> myDoubles = asList(3.14, 6.28);
List<Object> myObjs = new ArrayList<Object>();
copy(myInts, myObjs);
copy(myDoubles, myObjs);

Further Reading

Most of my insights in this topic actually come from my reading of an excellent book:

Java Lambda Expressions vs Method References

Now we can use lambda expressions to implement functional interfaces as we have seen in previous posts, but the lambda expressions are not the only mechanism we can use for this purpose. Particularly where there are preexisting code that we would like to reuse we can simplify the implementation of the functional interface by using method references.

Static Method References

First, consider the existence of a functional interface Predicate as follows:

public interface Predicate<T> {
   public void test(T t);
}

And let’s say that we had a method to filter elements out of a list using this predicate, as follows:

static <T> List<T> filter(Predicate<T> predicate, List<T> source) {
   List<T> destiny = new ArrayList<>();
   for (T item : source) {
      if(predicate.test(item)){
         destiny.add(item);
      }
   }
   return destiny;
}

Finally, let’s say we had a class containing a set of static method predicates which we had defined in the past, prior to the existence of the Java 8. Something as follows:

static class IntPredicates {
   public static boolean isOdd(Integer n) { return n % 2 != 0; }
   public static boolean isEven(Integer n) { return n % 2 == 0; }
   public static boolean isPositive(Integer n) { return n >= 0; }
}

Now, one way to implement a predicate that could reuse our static methods would be through the use of lambda expressions, like this:

Predicate<Integer> isOdd = n -> IntPredicates.isOdd(n);
Predicate<Integer> isEven = n -> IntPredicates.isEven(n);

However, we can clearly see that the signature of the static predicate methods corresponds perfectly with the signature of the test method for integer predicates. So, an alternative way to implement the functional interface in this case is through a static method reference, as follows:

Predicate<Integer> isOdd = IntPredicates::isOdd;
Predicate<Integer> isEven = IntPredicate::isEven;

Notice the use of double colon :: here. We are not invoking the method, we are just referencing its name.

We could now use this technique to filter a list of numbers that satisfy any of these predicates, something like this:

List<Integer> numbers = asList(1,2,3,4,5,6,7,8,9);
List<Integer> odds = filter(IntPredicates::isOdd, numbers);
List<Integer> evens = filter(IntPredicates::isEven, numbers);

So, as we can see, we could implement the functional interfaces in this case using both: lambda expressions and method references, but the syntax with the static method references was more succinct.

Constructor Method References

Let’s consider now the existence of a functional interface named Function, as follows:

public interface Function<T,R> {
   public R apply(T t);
}

Based on it, we could define a method map, that converts the elements from a source list from certain value T to certain value R, as follows:

static <T,R> List<R> map(Function<T,R> function, List<T> source) {
   List<R> destiny = new ArrayList<>();
   for (T item : source) {
      R value = function.apply(item);
      destiny.add(value);
   }
   return destiny;
}

Now imagine that we had a list of strings containing numbers that we would like to transform to integer values. We could do it using a lambda expression to provide an implementation for the Function interface, more or less like this:

List<String> digits = asList("1","2","3","4","5");
List<Integer> numbers = map(s -> new Integer(s), digits);

However, we can clearly infer that the constructor Integer(String) has the same signature as the  apply method in the Function reference required here, namely, it receives a string as argument and returns an integer.

So, in this case we simplify the implementation of the functional interface by means of using a constructor reference, as follows:

List<String> digits = asList("1","2","3","4","5");
List<Integer> numbers = map(Integer::new, digits);

This conveys the same meaning: take a string and make me an integer out of it. It is the perfect task for our Integer(String) constructor.

Instance Method Reference to Arbitrary Objects

Consider now the existence of a class named  Jedi, defined as follows:

public class Jedi  {
    private String name;
    private int power;

    public Jedi(String name, int power){
        this.name = name;
        this.power = power;
    }

    public String getName() {
        return name;
    }

    public int getPower() {
        return power;
    }

    //equals,hashCode,toString
}

Now, consider that we had a list of jedis, and we would like to use our previous function map to extract the name from every jedi and generate a list of names out of the list of jedis. Somewhat like this, using lambda expressions:

List<Jedi> jedis = asList(new Jedi("Obi-wan", 80),
                          new Jedi("Anakin", 25),
                          new Jedi("Yoda", 500));
List<String> names = map(jedi -> jedi.getName() , jedis);

The interesting observation here is that the parameter jedi is the argument for the apply method in the Function reference. And we use that reference to a jedi to invoke on it the method getName. In other words, we invoke a method on the reference we receive as argument.

So, we could simplify this implementation by using an instance method reference as follows:

List<Jedi> jedi = asList(new Jedi("Obi-wan", 80),
                         new Jedi("Anakin", 25),
                         new Jedi("Yoda", 500));
List<String> names = map(Jedi::getName, jedi);

Again, the interesting aspect of this type of method reference is that the method getName is an instance method. Therefore, the target of its invocation must be an instance, which in this case is an arbitrary object being provided as the argument for the method apply in the Function interface definition.

Instance Method Reference to a Specific Object

Let’s consider the existence of functional interface named Consumer, as follows:

public interface Consumer<T> {
   public void accept(T t);
}

And let’s define a method capable of using a consumer to consume all the elements of a given list, like this:

static  void forEach(Consumer<T> consumer, List<T> source){
   for (T item : source) {
      consumer.accept(item);
   }
}

Imagine that now we would like to print all the elements contained in a list, and for that purpose we could define a consumer using a lambda expressions:

List<Integer> numbers = asList(1,2,3,4,5,6,7,8,9);
forEach(n -> { System.out.println(n); }, numbers);

However, we could also make the observation that the method println has the same signature that our Consumer has, it receives an integer and does something with it, in this case  it prints it to the main output.

However, we cannot specify that this is an arbitrary instance method reference by saying PrintStream::println, because in this case the Consumer interface method accept does not receive as one of its arguments the PrintStream object on which we may want to invoke the method println. Conversely, we already know which is the target object on which we would like to invoke the method: we can see that every time we would like to invoke it on a specific reference, in this case the object System.out.

So, we could implement our functional interface using an instance method reference to a specific object as follows:

List<Integer> numbers = asList(1,2,3,4,5,6,7,8,9);
forEach(System.out::println, numbers);

In summary, there are circumstances in which we would like to use some preexisting code as the implementation for a functional interface, in those case we could use one of several variants of method references instead of a more verbose lambda expression.