Notations for Collections

While the following is not central to the design of Grace, we do expect that users will frequently be working with collection classes like lists, sets, etc.

The first question, is how do you create one of these objects. In Java you would create a linked list of three integers as follows:

List lst = new LinkedList()
lst.add(1)
lst.add(2)
lst.add(3)

Note that Java does introduce a compact notation for declaring arrays:
Integer[] arr = {1,2,3}
but there is nothing equivalent to that for other collection classes.

Scala makes it easy:
var lst = List(1,2,3)

With Grace’s vararg feature (allowing a variable number of parameters at the end of the actual argument list), we could write:

var lst: List[Integer] := LinkedList[Integer].new(1,2,3)

or, with a small amount of type inference

var lst := LinkedList[Integer].new(1,2,3)

Python, as well as functional languages like ML and Haskell, allow programmers to create lists with a more compact notation: [1,2,3]

We expect to encourage a more listful style of programming with the use of higher-order functions like map, foreach, etc. Do we need to introduce a more compact notation for lists, like that in Python, or is the above fine?  What is the right answer given our target audience of first and second year students in CS.

As well as creating these collection classes, we also need to access elements. For linear sequences, we would like to access and update the ith element.

For Java arrays, we would write arr[i], while for lists we would write lst.get(i). The notation arr[i] can also be used on the left side of an assignment, while lst.set(i,newVal) results in replacing the ith element by newVal for a list.

In Scala, the access to the ith element is defined in an apply method, whose name can be omitted in a call. Thus, lst.apply(i) is the same as lst(i). An update method is used to provide a new value for the ith element (in a mutable collection).

Python accesses the ith element of a list with square brackets: lst[i], which can also be used on the left side of an assignment.

In Grace our current plans are to access elements using regular message sends where the message has a name like “at” or “get”. Another alternative might be to use an operator like “@” or use the [] notation like Python.

What do you think of these ideas? What choices are the simplest, most consistent, and/or most acceptable to our target oudience?

12 thoughts on “Notations for Collections

  1. I think using [] for indexing makes more sense than named method calls. Those are unpleasant enough to use for Java collections, let alone everywhere, and they’re a clearly-understood notation. There is also a consistent assignment operation to go with [], of the form “collection[key] := value”, which fits in with assignments to variables and members, rather than invoking an unrelated method name. That is how object vars work, so collections should behave the same way, with the same parallel between access and assignment.

    A postcircumfix [] operator may be useful outside of pure collections as well, and since there are already user-defined operators it isn’t a large burden to add to the language.

    Collection literals are nice to have, but there are only a few characters available (really only combinations of [ ] with {} and () here) so it’s important that the collections with literals really are the most important and commonly-used ones. Some collections are more useful as literals in their immutable forms (sets) while some may be better mutable (lists).

    The other useful point to consider with collection literals is whether and how comprehensions can be built into them. Although the same result can usually come from a constructor, map, and an appropriate block, set and list comprehensions in the style of Python and Haskell are clearer for many cases and understandable without higher-order functions. The existence of comprehensions might change which collections are most useful to have as literals as well.

  2. Personally, given the target audience is 1st & 2nd years, I would favour the compact [1,2,3] notation, along with {1,2,3} for sets. Obviously, I’m biased since that’s the notation used in Whiley. But, I think it makes a big difference since you spend most of your time working with list or set types.

    In Java, putting ArrayList everywhere just gets tedious. Even worse, having to initialise by first constructing then calling “add()” is a pain. You can shorten this to e.g. new ArrayList(){{add(“Hello”);}} but that’s still very awkward.

  3. I use the literal initialisers in Python all the time, and I think they would be useful in the kind of problems we often set first and second years, I don’t see any downside either. I’d be keen to see [] used for access and update because it’s so widely used (C, Java, Python, and so on), it is one of the really common conventions. Getter and setter methods are such a pain, but of course you then start to give up some of the uniformity, in particular lists start to become distinguished objects. You can live with this (Python), make everything as convenient (C++) or make everything as inconvenient (Java), none is ideal, but for a teaching language I would probably go down the Python road.

    Talking of sticking to conventions, using square brackets for type parameters seems silly, why not use angle brackets, like everyone else? The less you confuse people, the better!

  4. Remember, the goal is to make a language that is good for learning and teaching programming (in a strong computer science context), not to make a programming language that we would want to use.

    In java, there is a big disconnect between arrays and lists and other collections – they work completely differently in almost every regard.

    I think that is unfortunate and is bad for teaching. Conceptually, an array is a particular implementation of a list of a fixed size, which is particularly efficient. Java makes that connection hard to see. But I do want to see arrays in the language since I think there are important concepts to learn about how to implement more abstract data structures in terms of more concrete, more efficient data structures.

    Stand alone literals for lists, arrays, sets, etc are very useful and convenient in programming. All professional languages should have them. But, such literals are shorthands for constructors with intialisers. Shorthands confuse students because the shorthand does not say what it is and what is happening, and the students will be even more likely to just copy boilerplate code without understanding what is actually happening. We should only introduce shorthands when there is a strong argument for it, beyond making the programs faster to type.

    I think that the small extra cost of requiring an explicit constructor for intialised collections will help their understanding far more than we lose in extra characters.

    btw, the java shorthand for arrays:

    is a horrible hack, since you can only use this in an array declaration. I never use it in my teaching – I only use the longer (but more explanatory) form:

    since the longer form can be used anywhere:

    Also, every bit of shorthand adds an extra special case rule to the syntax, and students are not good at learning lots of special case syntax rules. If the collection construction and initialisation can use no additional new syntax, that is an advantage.

    The shorthand [] for indexing/accessing elements of a list or array could also be gotten rid of under the same argument. However, in addition to Nick’s point
    that it is a very widely used convention, I think there are three special advantages for this shorthand. First, it makes complex expressions using array/list elements much simpler and easier to read. Secondly, it makes a closer connection to the standard mathematical notation for indexing sequences/matrices etc and I think making connections to familiar mathematics is a good thing. Thirdly, it allows the place in the collection to be specified in a way that makes it clear that it is a location that can hold a value (ie, it is a variable or a lhs). It lets us write

    as well as

    and I think the first helps support a better understanding of collections (especially arrays).

    Therefore, I would want to
    – not use the Java [] notation for the type of an array (it should use the word array
    – allow the [] notation for indexing both arrays and lists (and maps, see below)
    – not build-in a standalone literal for lists or arrays
    – have easy and parallel structures to construct initialised lists and arrays that combine the initialisation into the construction.

    I also don’t want to put lists on a pedestal above other collection types (yes, I’m a lisp programmer, but lists aren’t everything!). I would want sets and bags and maps, and … to be just as straightforward to construct and use, and in a very similar way to lists (and arrays). I would want to allow the same [] notation for accessing elements of a map, to emphasise the relationship between lists (maps from subranges of the integers, or other finite ordinal types) and maps from other types.

    I like James’ suggestion for initialising collections:

    There is an undesirable conflict between [] for list/array indexing and [ ] for type parameters. This is almost enough to make me retract my preference for using [] for indexing.

    Is there a strong argument for using [] rather than < > for type parameters?

  5. Some good arguments here and I take your points – although I’ll leave to the others to debate the merits of [] vs <>…

    One thing with the syntax, note that local type inference means we should be able to write:

    rather than the longer form giving an explicit type to the variable. The main problem with this syntax, though, is what Array[Integer] means. If classes (like Array are just objects) what does it mean to parameterise an object? Java avoids this by having a distinguished syntax for new, but in Grace we’re hoping creation can be just messages to objects – like all other computation. Array.new sends “new” to the “Array” class / factory object; perhaps Array.[Integer]new or Array.new[Integer] sends an explicitly-bound type generic “new” method to the Array class. But what’s Array<Integer> when it’s on its own?

    (and of course all of these examples & problems would work equally well with <> for generics…)

  6. I agree with most of what Peter Andreae says, and with most of his reasoning. Arrays and Lists should not be special: all library-defined collections should have equal standing with respect to their initialization.

    Using square brackets for indexing has a venerable history that goes back to Algol 60. We can preserve this convention, at the same time giving all collections equal standing, by making [ ] the name of a method, like +. It’s another syntax rule (or several), but perhaps worth it. Then [ ] could be defined on Dictionaries to do lookup and so on fort other objects where that makes sense. (This directs us away from using [ ] for type parameters.)

    However, I think that it’s a bad idea to allow [ ] on the left hand side of an assignment. If I recall correctly, Dijkstra railed against this in “A Discipline of Programming“, and for once I think that he was right. One of the big ideas that I strive to get across in teaching OO is that a collection object is a single object, not a collection of variables. Such an object has its own properties; changing any of its internal state changes the whole object. Allowing b[3] on the left of an assignment encourages students to treat b as a collection of variables, rather than as a collection of values.

    So, although we could enable [ ] on the left of := by creating the [ ]:= method (just as we made name:= a method), I don’t think that we should. We would then have to explain that b[3] := 7 changes the object b, which is not at all clear from the syntax. A Smalltalk-like syntax

    b.at 3 put 7

    makes it clearer that we are changing the whole of b.

  7. I agree with Andrew, but would go even further. Why not just use
    b.at 3
    to get the third element, and
    b.at 3 put 7
    to change it. I don’t see the point of introducing a special syntax just for accessing collections.

    Recall that from a novices point of view, all of these are weird. If anything, they are used to using subscripts, not square brackets. I’d rather use syntax that is similar to other parts of the language than have one weird thing (that is at least as hard to type as the “at” method).

  8. I would get rid of varargs and use concise collection literal syntax […], which is interpreted depending on context (either array, list, set, map, …). In cases where there is no context, you could create the required context eg:

    Set [1,2,3,4]

    Or it can always be an array and each collection constructor would accept array as initializer.

    (As you see from my example, I would prefer that application syntax didn’t require params around arguments. If you need to pass multiple arguments, you just pass a tuple, which happens to have parens around it, this way each function takes always 1 argument — another simplification).

    For getting an element I prefer Scala style. Not sure about setting (doesn’t matter so much)

  9. Just stumbled on this Grace thing and reading through these blogs. I like what I’m reading so far. I too agree with Andrew Black. Uniformity is a good thing. Avoid special cases. The idea, if I understand, is to make a language that is good for teaching programming. Not a language that needs to be taught so that you can program.

  10. I would prefer that application syntax didn’t require params around arguments.

    This was one of the things we talked about at Splash. The current rule is that “arguments must be delimited” – so single arguments that are strings, lists, or blocks don’t need parentheses, but a single number or a method will need parentheses.

  11. The idea, if I understand, is to make a language that is good for teaching programming. Not a language that needs to be taught so that you can program

    Sure – or rather, because I think some teaching of the language will always be required – to clear away as much of the “accidental complexity” from the language that we can.

Leave a Reply

Your email address will not be published. Required fields are marked *