Sequential Control Structures

Control structures are important for imperative programming languages, and crucial for educational languages. The most important topics in a first imperative programming course are often sequence, selection, and iteration. Grace needs to support this pedagogy.

Grace’s control structures also need to help students move on to other languages – particularly “curly brace” languages (C, C++, Java, etc.) because, for most students, the majority of software they will have to deal with will be written in these languages. This means that Grace must facilitate reading and interpreting classically imperative syntax. On the other hand, Grace doesn’t need to replicate every nook and cranny of these languages. As C++ abstracted and simplified C, and Java abstracted and simplified C++, Grace can abstract and simplify Java (and Scala and Python and other languages too).

Selection and Iteration

So, as well as a pattern matching construct, Grace will support “if” statements:
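
A sketch of what such an if statement might look like, assuming the keyword-message and block syntax discussed later in this section (the exact spelling is illustrative, not a committed design):

```
if (x > 0) then {
    print "positive"
} else {
    print "not positive"
}
```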

and while loops:
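
A while loop might be written along these lines; note that the condition is itself a block, so that it can be re-evaluated before each iteration (spelling illustrative):

```
var i := 0
while { i < 10 } do {
    print(i)
    i := i + 1
}
```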

Grace’s “for” loop will follow the “foreach” style.
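
A hypothetical foreach-style loop, with the iteration variable declared as a block parameter (names illustrative):

```
for (books) do { book ->
    print(book.title)
}
```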

The difference here is that Grace’s blocks are true lambdas: rather than the for statement introducing a variable binding that is used “later on”, we declare the iteration variable at the start of the block. The key principle here is economy – variables are only introduced as parameter (or variable) declarations. As with any other formal argument in Grace, the block parameters are bound in a (conceptually) new instance of the block’s execution context for each iteration – so Grace is not susceptible to the kinds of problems seen in C# (which defines the foreach loop as repeatedly assigning to the same mutable variable).
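
To illustrate the point about fresh bindings, consider storing the loop blocks for later execution (list.new and the range syntax are assumptions):

```
var blocks := list.new
for (1..3) do { i ->
    blocks.add { print(i) }
}
// each stored block prints its own i (1, then 2, then 3), because every
// iteration binds a fresh i rather than reassigning one shared variable
```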

As in Python, for can be used with a range to iterate through integers:
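
For instance, iterating over an integer range might look like this (the 1..10 range syntax is an assumption):

```
for (1..10) do { i ->
    print(i * i)
}
```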

Control Structures Defined in Libraries

Grace’s syntax for control structures is precisely Grace’s syntax for message sends — indeed, the Smalltalk-like multiple-keyword messages and the layout syntax should work together to that end. The key reasons for this are that we want Grace to support different language levels, like Racket, and different pedagogies, particularly regarding concurrency. Instructors need to be able to define “language extensions” in libraries, ideally without needing any overly specialised constructs. Having a single general syntax should also mean Grace is easier to parse, and eventually to compile.

Using Grace’s keyword message syntax for control structures also means that the control structures can (at least conceptually) be defined in Grace, for example:
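
One plausible definition treats while()do() as an ordinary method that takes two blocks and recurses (a sketch, not a committed design; apply as the block-invocation message is an assumption):

```
method while (condition) do (body) {
    if (condition.apply) then {
        body.apply
        while (condition) do (body)    // tail-recursive repetition
    }
}
```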

for while loops, and for if statements:
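
If Grace booleans are objects that respond to keyword messages, Smalltalk-style, then if()then()else() might simply delegate to the receiver (a sketch; the ifTrue()ifFalse() message is an assumption):

```
method if (condition) then (thenBlock) else (elseBlock) {
    // true evaluates thenBlock; false evaluates elseBlock
    condition.ifTrue (thenBlock) ifFalse (elseBlock)
}
```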

We don’t want to commit all Grace implementations to implementing control flow structures in this way — Grace implementations may forbid redefinition of control flow messages, intercept calls to them, and implement them directly. But (for more advanced students) we think being able to explain even basic control structures within Grace is an advantage.

Loop with exit

In an earlier design, we proposed a single “loop-with-exit” construct:
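
The proposal was along these lines (entirely illustrative; what “exit” denotes is precisely the question raised below):

```
loop {
    def line = input.nextLine
    if (line.isEmpty) then { exit }
    process(line)
}
```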

James is currently leaning against this design because it’s not clear what all the semantics are – what the “exit” is, what its scope is, how it is resolved. As Andrew says, in Smalltalk, programmers just refactor and use “return” – and that will work in Grace.

Advanced programmers should be able to define loop-with-exit in Grace:
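
One way to define it is as an ordinary method that hands the body a block whose execution performs a non-local return (a sketch, assuming Smalltalk-style non-local return from blocks; all names are illustrative):

```
method loopWithExit (body) {
    def exit = { return }          // executing this block returns from loopWithExit
    while { true } do {
        body.apply(exit)
    }
}
```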

although using it is a little ugly as the “exit” becomes an explicit continuation block:
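
For example, the caller must name the continuation block and invoke it explicitly (names illustrative):

```
loopWithExit { exit ->
    def line = input.nextLine
    if (line.isEmpty) then { exit.apply }
    process(line)
}
```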

This does indicate a weakness of Grace’s approach to defining syntax or language extensions: some things will be uglier than they could be if Grace had truly extensible syntax, e.g. Lisp/Scheme/Racket-style macros. In this, Grace also follows Smalltalk. We hope the advantages of a smaller language with more regular syntax will outweigh the contortions that may occasionally be required.

Exceptions

Grace needs an exception mechanism since exceptions are another common flow control construct of many programming languages, and students using Grace will need to be taught to use exceptions. We plan to support basic unchecked exceptions along the lines of ML, Java, C#, etc.

Exceptions will be generated by the “raise” keyword (syntactically another message call), which takes an argument of some subtype of Exception:
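
For instance (the exception class and its constructor are illustrative):

```
raise (BadSize.new "size must be non-negative")
```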

and will be caught by a “catch” statement that syntactically parallels
the “match” statement, e.g.:
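
Such a catch statement might dispatch on the type of the raised exception, case by case (spelling and class names illustrative):

```
catch {
    file := open(path)
} case { e : FileNotFound ->
    print "no such file"
} case { e : Exception ->
    print "unexpected error"
}
```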

Exceptions can’t be restarted; however, the stack frames that are terminated when an exception is raised should be pickled so that they can be used in the error-reporting machinery (debugger, stack trace).

The Case for Case

We will need a case statement in Grace to cover both the kind of simple cases handled by switch statements in C-family languages (though with less error-prone syntax) and typecase statements that provide introspection on types and internal structure, as do Java’s instanceof and Smalltalk’s understands: and isKindOf:. We could also consider a richer style of pattern matching that takes advantage of ML and Haskell-style destructuring, similar to that used by Scala and Newspeak.

The simplest use of a case statement would be the following (lifted from a Scala example):
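
A sketch, with the scrutinee first, match second, as the discussion below assumes (spelling illustrative):

```
n match
    case { 0 -> print "zero" }
    case { 1 -> print "one" }
    otherwise { print "many" }
```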

The keyword “otherwise” could be replaced by “else” to keep down the number of keywords, but it is arguably a good idea for it to be distinct from the conditional’s alternative, to aid (human) parsing (though we could be argued out of this).

Putting “match” second follows Scala, and makes it look as if we are sending the match message to Number (in fact, match could be defined in Object, or whatever we end up calling Top) with a parameter which is a lexical closure, or something like it. On the other hand, putting match first follows most other Grace statements.

Going a step further

An interesting issue is whether or not to go farther with pattern matching. This would have power similar to that in Scala, and approach Newspeak, but it would work somewhat differently because our constructors work differently from theirs.

The idea is that classes we would like to pattern-match against would provide a special method that extracts the values to be used in the match. For the example with Some<T>, we might add:
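
A sketch of such an extractor (the method name extract and the tuple result are assumptions):

```
method extract { (contents) }
// answers a one-element tuple holding the wrapped value,
// for the matcher to bind against a pattern like Some(v)
```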

Then we could match against an Option type like this:
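
For example (spellings illustrative):

```
match (anOption)
    case { Some(v) -> print(v) }
    case { None -> print "empty" }
```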

A list data type might have an extractor:
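
For instance, an extractor answering the first element and the rest of the list as a pair (a sketch):

```
method extract { (head, tail) }
```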

where head and tail are instance variables. A match could then look like:
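
For example (the Cons and Empty case names are illustrative):

```
match (aList)
    case { Cons(first, rest) -> print(first) }
    case { Empty -> print "empty list" }
```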

The real question is, first, whether the benefits are worth the extra features here; and second, whether to stop there or head for a full Newspeak-style generalised match protocol. One advantage of a full protocol is that other useful matches – notably regexp string matching – could be incorporated into the same construct.

An alternative: pattern matching lambdas

Rather than introducing a special match statement with syntax to handle multiple cases, we have an alternative proposal where any block (with parameters) acts as a single pattern-match against those parameters. It is an error if the block is called with parameters that don’t match.

Then, a match exp case block case block “method” combines several pattern-matching lambda-blocks into a full match. For example:
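
For example, two pattern-matching blocks glued together to compute an area (class names illustrative):

```
def area = match (shape)
    case { c : Circle -> c.radius * c.radius * 3.14159 }
    case { s : Square -> s.side * s.side }
// each case block is a lambda whose parameter pattern may fail to match;
// match tries them in order until one succeeds
```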

This pattern-matching-lambda proposal was inspired by the realisation that, in Grace, every block (lambda expression) is potentially a type-match. This is because any lambda could be called from a dynamically-typed context, and so Grace will need to implement that type-match anyway. It turns out that Haskell, at least, also supports destructuring in lambda expressions.

Of course, both the lambda design and the multiple-case syntax give Grace anonymous partial function definitions — that’s a good thing. The choice here is between a special syntactic construct for handling a series of cases that is used only within a “match” statement, or generalising lambda blocks to support matching against a single case and having a match construct that just glues a series of lambdas together. Pattern-matching lambdas are potentially more flexible, but it’s not clear which is easier to implement.

Questions

  • How powerful should matches be:
    • separate value match and type match statements?
    • single match statement that handles both value and type matches?
    • …also including destructuring matches?
    • …also including a full OO matching protocol allowing first-class patterns and user-level matching?
  • Do we stay with the Scala/Newspeak design – simple method definitions which can have a match statement inside it?
    • or should we support multiple cases and matching for method definitions?
    • or head out towards multimethods and predicate dispatch?
  • Which alternative – a match statement or pattern-matching-lambdas?

Some References to Scala and Newspeak

Values, Equals, and Hashcodes

The issues of hash codes, equals methods, and what object identity really means are quite fundamental, and quite fraught. Certainly the Java solution of programmers defining equals and hash methods – according to some rather strict guidelines – has proven to have many problems in practice, and James has found it an annoying two lectures to teach for Java or C++.

William Cook (OOPSLA 2009) has argued that if objects are truly abstractions, equals and hash-code should be left up to each object to implement: there should be “no built-in notion of equality” because this would break the abstractions offered by objects. On the other hand, it’s not clear a language would be useful without some kind of equals for at least some kinds of objects. Haskell offers another option: make an explicit interface (like the Eq type class), so that only objects that provide equality conform to that interface. Hashcodes could be treated in the same way. While this would mean not every object had to participate in equality, we would still have to teach students about the necessarily difficult contracts of those methods. We could be more extreme: following Derek Rayside & Daniel Jackson, equality methods could be generated automatically from explicit class invariants. But that seems a little far for a practical language today.

We propose that Grace will adopt a much more straightforward approach. Grace’s equals will be based on Henry Baker’s Egal predicate – structural equality of immutable state – thus making it similar to Clojure. Any mutable object (an object with at least one writeable field) will only equal itself. Immutable objects (comprising only constants and methods, but no writeable fields) will be equal to other objects with the same structure (constants and methods) and the same values for all of those constants. Grace will provide just one equality predicate, written = (or !=), using this definition, and programmers will not be able to override it. Compatible hashcodes will also be provided.
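
A sketch of how this plays out (the object-literal syntax is illustrative):

```
def a = object { def x = 1 }
def b = object { def x = 1 }
// a = b: true – both are immutable, with the same structure and values

def c = object { var x := 1 }
def d = object { var x := 1 }
// c = d: false – mutable objects are only ever equal to themselves
```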

Compared with Java or Smalltalk, this is hopefully much simpler, “does the right thing” for many common cases, and encourages immutable objects. There are two main difficulties. The first is that it cannot cope with objects whose abstract state may be equal while their “concrete” state is not: a linked list and an array list with identical contents will not be =, even if they are immutable. The second is that two mutable objects – two array lists, say – will never be = even if they have identical contents and are two instances of the same class. = in Grace is stable: if e1 = e2 is ever true, it will always be true. This cannot hold for mutable objects (unless e1 and e2 are actually the same object). To compare or index using collections of objects, programmers will need to use some kind of immutable objects: Grace will provide immutable tuple literals and methods to convert collections into tuples.

A key advantage of this definition of = is that immutable objects can act as pure values. Especially for (so-called) value objects — immutable objects containing only other immutable objects — Grace programs cannot distinguish between different instances with the same value, just as in most languages integers are compared by value, not by reference (you can’t tell apart two copies of the integer “1”, even if they live at different memory addresses). A Grace implementation can support value objects using whatever implementation is most efficient: always passing by reference, passing some types by value, or even inlining fields into their containing objects and doing a field update when the containing object assigns in a new value.

In this way, Grace hopes to support a simple and stable definition of object equality, hashcodes, and value types – all without requiring Grace programmers to write any code.

Some references:

Comments

In most current programming languages, comments are treated as spaces; they separate lexemes, just as spaces do, but otherwise are ignored.
This is in spite of the fact that many people feel that comments are one of the most significant parts of a program. Correctly used, comments can make a program much more readable; badly used comments are an apology for bad code.

How did we get to a place where one of the most significant aspects of our languages is treated as—well—as insignificant?

In Fortran, a comment was any line with a “C” in column 1. This had the great advantage that the whole comment line could be thrown away without being processed, thus saving valuable core memory. Algol 60 followed a similar strategy, declaring that the sequence of symbols

; comment < any sequence not containing ; > ;

was equivalent to “;”. Both of these conventions have the problem that it is not clear whether a comment refers to the statement preceding it or the statement following it; this has to be inferred from the contents of the comment, which the computer is unlikely to understand. But they save memory and simplify the parser — obviously more important properties than clarity of intent!

So here’s an idea: suppose that we make the placement of comments syntactically meaningful? Might this help the next human reader to understand their intent?

If you think that I’m overestimating this difficulty, take a look at a paper by Michael Van De Vanter called “Preserving The Documentary Structure of Source Code” [1]. Van De Vanter concluded that in general, it’s just not possible to start with the “comments as spaces” convention, and do anything intelligent about inferring what the comment refers to.

With the advent of refactoring tools, which automatically re-arrange our code, it has become important that the computer does know which statement the comment refers to, for otherwise it can’t correctly position the comment in the refactored code. So, in Grace, comments are regarded as annotations that are attached to expressions and to statements. We note that Newspeak also attaches comments to known places in the program, so that they can be preserved during refactoring.

In an IDE, it’s a simple matter to attach a comment to a syntactic element: select that element, and use a context menu selection or a button to add a comment. We can imagine the comment appearing in a balloon, and popping up on mouse hover; we can also imagine some subtle shading or discrete tagging that will reveal the presence of latent comments. However, we also want Grace to have a representation in pure text, both as an interchange format and to permit editing in legacy text editors. So, we need some rules that will tell us exactly what syntactic element a comment is attached to when Grace code is turned into text. (I’m tempted to call this process ugly-printing, since it takes a Grace program that’s as pretty as an IDE can make it, and produces a plain-text version.)

After comments, layout is the other non-syntactically-meaningful element that is used in conventional programming languages. This implies that whatever comment convention we adopt, it should not require a particular layout. In other words, a “comment from here to the end of the line” convention won’t do, because it might require the addition of extra line breaks just to correctly terminate comments.

Thus, we are left with the need to invent a pair of comment delimiters, and a rule that tells the ugly-printer where to put the comments. The simplest rule that I can imagine is:

a comment is delimited by /* and */, and immediately follows the syntactic element to which it applies. When several nested syntactic elements end with the same character, the comment applies to the largest. Parentheses can be used to avoid this problem.

For example:
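
Under this rule (code illustrative):

```
x := a + b * c /* attaches to the whole statement, the largest element ending here */
y := a + (b * c /* the parentheses make this attach to b * c alone */)
```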

For this to work, it’s important that the heading of a method be a comment-able syntactic element. For example,
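
a method whose header carries its descriptive comment might be written (the method and its return-type spelling are illustrative):

```
method double(n : Number) -> Number /* answers twice its argument */ {
    n * 2
}
```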

Here a comment that describes the behavior of the method as a whole is attached to the method header. Note that requires clauses and types are other forms of annotation; these too must be unambiguously attached to a specific syntactic element if they are to reach their full potential.

/* and */ as comment delimiters have the advantage that they are really ugly. Whatever symbol is used to represent a comment in the textual representation of a program won’t be available as an operator symbol, so it makes sense to choose an ugly one.

For software engineering reasons, Grace will also support comments to end-of-line, probably using the // characters familiar from C++ and Java.
Many (but not all) of the above examples also work with // comments:
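
For instance (illustrative):

```
x := a + b * c   // attaches to the whole statement
```

A comment attached to an inner sub-expression, however, has no // equivalent without inserting a line break, which is exactly the layout problem noted earlier.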

Metadata

Postmodern programs need more than just human-readable comments – in fact, human-readable comments can be seen as a subset of a wider universe of arbitrary metadata that can be attached to program elements. Experience with Java, C# (and Scala) has shown the advantage of an extensible and flexible metadata facility in a programming language. Given that a key design goal of Grace is to support extensions via libraries, many library writers will need to interpret metadata attached to their client programs.

Grace will have a metadata facility based on a generalisation of its support for comments. We’re not yet sure how this will work. Most likely it will follow C# in having multiple delimiters, rather than Java’s @ symbol – say /: :/ and //:. The problem is that these might look too horrible for general use.

Perhaps we should permit more direct metadata in particular places in programs:

  • wherever we’re writing a type
  • at the “top level??”

So, for example, rather than writing an annotation in comment-style delimiters, one could write the metadata directly where a type would appear.
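
The contrast might look something like this: first the comment-style delimiters, then metadata written directly in type position (both spellings entirely speculative):

```
/: deprecated :/
method oldEntryPoint { ... }

method oldEntryPoint : Deprecated { ... }
```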

Other issues

For both comments and (especially) metadata, we need to consider the difference between:

  • annotating a method
  • annotating a return type of a method
  • annotating the receiver of the method.

The catch is that, with our Pascal-style declaration syntax (types written after a colon, as in x : Number), how can we distinguish these cases?


References

[1] Michael L. Van de Vanter, “Preserving the Documentary Structure of Source Code in Language-Based Transformation Tools,” in Proceedings of the First IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2001), 2001. http://research.sun.com/jackpot/COM.sun.mlvdv.doc.scam_nov01.paper_pdf.pdf