Collections, Demystified

I am writing this with a view to making headway in wrapping my head around Oracle's Java docs. Simply reading these dry documents is for me a daunting task, to say nothing of comprehending them! But I have found that the information becomes more interesting, and the process of reading more tolerable, when I put in the effort to conceptualize the info in a way that makes sense to me—which makes sense, since info can hardly interest you if you don't understand it. Fortunately I have some idea as to how I can break this task down into subtasks. At present I am primarily interested in the core Java interfaces. These are the interfaces in the core Java SE APIs, defined in Oracle's API specification for Java SE. Let us begin with what Oracle identifies as "[t]he root interface in the collection hierarchy": the Collection interface. There's a lot to learn about Collection. In this post we'll focus on what collections, or objects of type Collection, are. Collections According to Oracle, collections are groups of objects called elements. Some collections can contain duplicates of their elements. We call these bags. Other collections cannot. Call them boxes. Boxes are like sets in set theory. Just as, in set theory, you can't have more than one of a single object in a set, you can't put more than one of the same object in a box. Every once in a while you might see multiple instances of the same numeral in notation representing a set, such as {1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace .{1,2,2,3}. But this is just a notational variant of {1,2,3}.\lbrace 1, 2, 3 \rbrace .{1,2,3}. So is {1,2,3,2},\lbrace 1, 2, 3, 2 \rbrace ,{1,2,3,2}, or {1,3,2}.\lbrace 1, 3, 2 \rbrace .{1,3,2}. These are all the same set. Why? Because they have the same elements. There are not actually two 2s2\textrm{s}2s in the set {1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace .{1,2,2,3}. There's only one, because the number 222 is just a single object. Only one such number exists. No matter how many times you write the numeral ‘2’,\lq 2 \text{\textquoteright}, ‘2’, it always stands for the same number. Here's a way to see my point. Let xx x be 2.2. 2. Then {1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace {1,2,2,3} is {1,x,x,3}.\lbrace 1, x, x, 3 \rbrace. {1,x,x,3}. But obviously ‘x’\lq x \text{\textquoteright} ‘x’ denotes the same thing wherever it appears. So {1,x,x,3}\lbrace 1, x, x, 3 \rbrace {1,x,x,3} can only have three things in it: 1,x,1, x, 1,x, and 3.3.3. So by substituting ‘2’\lq 2 \text{\textquoteright} ‘2’ for ‘x’,\lq x \text{\textquoteright}, ‘x’, we conclude that {1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace {1,2,2,3} can only have three things in it: 1,2,1, 2, 1,2, and 3.3. 3. All this to say, sets never have duplicate elements. Boxes are the same way. Consider the paradigmatic box: an object of type Set (a subinterface of Collection). (From here on out, unless I say otherwise, "set" means a Set object.) For any two elements e1 and e2 of a set, !e1.equals(e2). It follows, without loss of generality, that e1 and e2 are not duplicates. You might doubt these things are true of all sets. Some sets have elements, the thought might go, that lack an equals method. If e1 were one of those elements, the method call e1.equals(e2) would be impossible. So it could not return false as needed for the negation of its return value to be true. But you would be wrong! Any element of a collection is an object, i.e., an instance of a class. And all classes are subclasses of Object. So they all inherit the equals method from Object. For this reason, no matter the type of e1, e1 has an equals method. How can we be sure though that e2 is a valid argument to pass? We can because it's an instance of Object, as is e1, and the lone parameter of Object's equals method is of type Object. Null Elements So it looks like no set can contain multiple copies of an element. But hang on... Can't you put null in a collection? That's not an object. It's a value indicating the absence of an object reference. I'm the one who's wrong then! Not just any element of any collection is an object. So, what if e1 happens to be null? Then e1 is not an instance of any class, and does not inherit an equals method from Object. In that case it's not strictly true that !e1.equals(e2). But it is true that e1 and e2 are not such that e1.equals(e2). Why? For those elements to be such that e1.equals(e2), that method call would have to return true. But the invocation is not possible, so it cannot return true. So the generalization in the documentation for Set still holds true. That is, every set contains no pair of elements e1 and e2 such that e1.equals(e2). Now we see why Oracle chose to deny any element pair is such that e1.equals(e2), instead of affirming, like I did, that all pairs are such that !e1.equals(e2). Wording matters. But in my defense, what I said was such

Feb 9, 2025 - 06:24

I am writing this with a view to making headway in wrapping my head around Oracle's Java docs. Simply reading these dry documents is for me a daunting task, to say nothing of comprehending them! But I have found that the information becomes more interesting, and the process of reading more tolerable, when I put in the effort to conceptualize the info in a way that makes sense to me—which makes sense, since info can hardly interest you if you don't understand it.

Fortunately I have some idea as to how I can break this task down into subtasks. At present I am primarily interested in the core Java interfaces. These are the interfaces in the core Java SE APIs, defined in Oracle's API specification for Java SE.

Let us begin with what Oracle identifies as "[t]he root interface in the collection hierarchy": the Collection interface. There's a lot to learn about Collection. In this post we'll focus on what collections, or objects of type Collection, are.

Collections

According to Oracle, collections are groups of objects called elements. Some collections can contain duplicates of their elements. We call these bags. Other collections cannot. Call them boxes.

Boxes are like sets in set theory. Just as, in set theory, you can't have more than one of a single object in a set, you can't put more than one of the same object in a box.

Every once in a while you might see multiple instances of the same numeral in notation representing a set, such as ${1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace .$ But this is just a notational variant of ${1,2,3}.\lbrace 1, 2, 3 \rbrace .$ So is ${1,2,3,2},\lbrace 1, 2, 3, 2 \rbrace ,$ or ${1,3,2}.\lbrace 1, 3, 2 \rbrace .$ These are all the same set. Why? Because they have the same elements. There are not actually two $2s2\textrm{s}$ in the set ${1,2,2,3}.\lbrace 1, 2, 2, 3 \rbrace .$ There's only one, because the number $2$ is just a single object. Only one such number exists. No matter how many times you write the numeral $‘2’,\lq 2 \text{\textquoteright},$ it always stands for the same number.

Here's a way to see my point. Let $x$ be $2.$ Then ${1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace$ is ${1,x,x,3}.\lbrace 1, x, x, 3 \rbrace.$ But obviously $‘x’\lq x \text{\textquoteright}$ denotes the same thing wherever it appears. So ${1,x,x,3}\lbrace 1, x, x, 3 \rbrace$ can only have three things in it: $1, x,$ and $3.$ So by substituting $‘2’\lq 2 \text{\textquoteright}$ for $‘x’,\lq x \text{\textquoteright},$ we conclude that ${1,2,2,3}\lbrace 1, 2, 2, 3 \rbrace$ can only have three things in it: $1, 2,$ and $3.$

All this to say, sets never have duplicate elements. Boxes are the same way. Consider the paradigmatic box: an object of type Set (a subinterface of Collection). (From here on out, unless I say otherwise, "set" means a Set object.) For any two elements e1 and e2 of a set, !e1.equals(e2). It follows, without loss of generality, that e1 and e2 are not duplicates.

You might doubt these things are true of all sets. Some sets have elements, the thought might go, that lack an equals method. If e1 were one of those elements, the method call e1.equals(e2) would be impossible. So it could not return false as needed for the negation of its return value to be true.

But you would be wrong! Any element of a collection is an object, i.e., an instance of a class. And all classes are subclasses of Object. So they all inherit the equals method from Object. For this reason, no matter the type of e1, e1 has an equals method. How can we be sure though that e2 is a valid argument to pass? We can because it's an instance of Object, as is e1, and the lone parameter of Object's equals method is of type Object.

Null Elements

So it looks like no set can contain multiple copies of an element. But hang on... Can't you put null in a collection? That's not an object. It's a value indicating the absence of an object reference. I'm the one who's wrong then! Not just any element of any collection is an object.

So, what if e1 happens to be null? Then e1 is not an instance of any class, and does not inherit an equals method from Object. In that case it's not strictly true that !e1.equals(e2). But it is true that e1 and e2 are not such that e1.equals(e2). Why? For those elements to be such that e1.equals(e2), that method call would have to return true. But the invocation is not possible, so it cannot return true. So the generalization in the documentation for Set still holds true. That is, every set contains no pair of elements e1 and e2 such that e1.equals(e2).

Now we see why Oracle chose to deny any element pair is such that e1.equals(e2), instead of affirming, like I did, that all pairs are such that !e1.equals(e2). Wording matters. But in my defense, what I said was such truthlike, much truthy. Not all pairs of elements satisfy the negation, but all pairs of objects in any collection do.

There being collections with null elements doesn't square with defining "collections" as groups of objects and "elements" as objects in a collection. Some collections have none but null elements. No null element is an object, though each is in a collection. So, what are collections and elements? To answer this question, we have to consider two things.

First, what is it that all elements of Collection objects have in common, be those elements objects or not? If there is something they all have in common, we may keep calling all Collection objects collections, if we please. If that's what we want, we just have to revise our definition of "collection."

Second, is it true that all Collection objects are collections? Perhaps objects of that type only truly qualify as collections when they contain objects. Then, if a Collection object has only null elements, it's not a collection (group of objects). Assuming that's right, there is no need to revise our definition of "collection," just our notion of which Collection objects that less formal term applies to. (That's if you have the same notion I had—that all Collection objects are collections.)

I'll take these questions in turn. There is a common feature of all elements of Collection objects. And it's not just that they're elements of Collection objects, of course. Pointing that out doesn't help. We want to know what makes them elements!

The pertinent common feature is their being values. That is, all elements are pieces of data in the form of variables or literals.

Null elements are values known as reference literals. These differ from primitive literals, not with respect to whether they're values, but with respect to whether they are directly assignable—so, assignable without autoboxing—to variables of reference types. These are types like Object and HashSet. We declare variables of these types if we want variables capable of storing objects' memory addresses.

Though primitive literals are values, they cannot be elements. Why not? Because they're neither references nor reference literals. Collection objects can only hold either variables of reference types or values directly assignable to variables of those types. This is because Collection is a blueprint for data structures that are designed to group objects together. Primitives are not objects, nor are they object references. So there is no need for Collection to accommodate the inclusion of primitives as elements.

But in describing collections as groups of objects known as its elements, Oracle seemed to be saying a collection's elements are the objects themselves. Not values serving as references to the objects. But then Oracle must mean something different by "element" when it refers to null elements in the documentation for Set.

Suppose we want our usage of "element" to be maximally consistent. Then we're at a crossroads, semantics-wise. Either we stop calling null values elements, or we stop calling objects elements, opting to call values elements instead.

Say we stop calling objects elements. Then we can alternatively define a collection as a group of values, and define those values as its elements. This way we can keep calling all Collection objects collections, and keep calling occurrences of null in a collection elements.

If on the other hand we reserve the term "element" for objects, then we have to say a Collection object $C$ has no elements if all $C$ "contains" is null. And we must deny that variables and literals are themselves elements of $C$ , since no variable or literal is itself an object. And if $C$ not only "contains" null, but also contains objects, we must say that those objects $C$ contains are $C′sC'\textrm{s}$ only elements—that null is not one of $C′sC'\textrm{s}$ elements.

As observed previously, if there are such things as null elements, a Collection object with only null elements is not a group of objects. So, by Oracle's definition, such an object is not a collection. Neither is a Collection object that "contains" null rather than elements. It's not a group of objects.

The only way a Collection object $C$ containing no other objects can be a group of objects is if it's an element of itself. $C$ would then be like a singleton set in that $C$ has only one object as an element. But $C$ would differ in that (i) if we count null as an element, $C$ has at least two elements if it "contains" null, and (ii) $\in C$ . (No set is a member of itself according to Zermelo-Fraenkel set theory.) It's an idea, but I imagine it would make matters more confusing rather than less.

I say we classify values, not objects, as elements. What do you think?