Principles of Language Design

Late last year, Sophia Drossopoulou and Éric Tanter asked us about the principles behind the design of Grace. About the values, principles, and practices, in fact, with reference to Dave Ungar’s great Dahl-Nygaard ECOOP Keynote.

I’m pretty sure we’ll be able to do a good job of describing Grace’s principles retrospectively, in around 2025 (doing as good a job as Dave will be much harder of course). To be fair, we have made a start, in that the Grace Manifesto and SPLASH sides contain a potpourri of principles — perhaps values and practices too — that we seek to follow in the design of Grace.

So as well as posting notes on language design, in 2011, we plan to post notes on the principles underlying our design of Grace – unpacking the principles in the manifesto and panel slides (they’re not quite the same, it turns out) and no doubt introducing some new principles we discovered along the way.

For now, I’ll start by mentioning one principle that doesn’t get an explicit notice in either the manifesto or the panel slides: openness. We genuinely hope that the design of Grace can be as open as possible: while there is a core group of “committers” to the specification, as yet, nothing’s fixed or final: and we have already made or confirmed significant decisions about Grace based on the feedback from supporters and the blog. We hope to continue as openly and as transparently as we can — that’s what this blog is about — and we hope to augment this with some Grace workshops around the world in 2011.

Wishing you all a happy and graceful New Year

Normal Service Should Now Be Resumed

We’ve had some problems with this blog over the last week, due mainly to issues with our commercial hosting provider.

Gracelang.org – and this blog – are now hosted at Victoria University of Wellington, in New Zealand, thanks to the ECS programming team & friends. So while it make take a little longer to load, it should always be here.

What’s in a Name: Variable and Constant bindings

As in most languages, Grace lets you bind names to values. In Grace’s case, the values are always objects. Some bindings are constant, that is, they don’t change until the names goes out of scope; others are variable, which means that the name can be re-bound to a new value. This re-binding is accomplished using assignment.

Constant Bindings

Constant bindings are used to bind method parameters to the arguments of the message that started the method execution. Grace supports constant bindings with the keyword const.

Const requires a name to define, and either or both an initialiser and a type. Const can be used to give a name to a literal, or to any other object.

If there is no initialiser, the identifier must be assigned (using =) before it is used.

The point of definitions like “const x : Rational” or const x — without an initialiser — is that in class definitions, they will be used to define per-instance constants that will be initialised when the object is constucted. For example, a point in Grace would declare two constants “x” and “y” — which would differ for each instance.

Uninitialised variables (of any type) are given a special “uninitialised” value; accessing this value is an error (caught either at run time or at compile time, depending on the cleverness of your implementor).

Variables

Naturally enough, Grace supports variable declarations via the “var” keyword.

Unlike constant name bindings, variables can be re-bound using assignment “:=”

Note that const uses = initialisation, while var uses := for both the initial and subsequent bindings.

As in Java, both const and var declarations may occur anywhere within a method or block: their scope is to the end of their defining block or method.

Fields

Within object or class constructors, fields are declared by the field keyword rather than var.

The idea here is that classes will contain methods defined by the method keyword, and fields defined by the field keyword. We are hoping that we can choose Grace’s keywords to guide the terminology used in discussing the language.

Const declarations an also appear in object and class constructors, to define per-instance constants.

Single Namespace

Grace has a single namespace for methods and variables. This means that methods should be able to override fields, and fields override methods.

A declaration of field or a constant named “x” creates a reader method called “x”. A declaration of a (writable) field creates a setter method called “set_x”. All code of the form “e1.x := e2” is executed by sending the “set_x” setter method to the object e1, passing e2 as an argument.

The upshot of this is, if a Grace program declares a pair of methods in an object:

clients (and subclasses) of that object are unable to distinguish this from a field declaration

This ensures that fields and constants are evaluated in Grace using the same message send evaluation rule as method sends.

Types

We haven’t mentioned types. That’s because we haven’t worked out the even the initial details – but that’s what we’ll look at soon.

Decisions

This note outlines our current design of Grace. We have a few questions or concerns about this design:

  1. Should Grace distinguish between var to declare temporary variables and field to declare instance fields – or if not, which one should we pick?
  2. Should Grace distinguish between := for mutable fields and variables, and = for constants?
  3. How should Grace name setter methods?  “set_x” or “s:=” or something else.

BNF


Operators are Messages Too

As well as named messages, Grace also allows operator symbols for binary messages. This allow us to use conventional symbols like + and -, and also allows the programmer to define her own operator symbols from single characters or multiple character sequences. Since the advent of Unicode, there are lots of characters to choose from.

Note that we are using the term binary message in the same sense as Smalltalk: a message with a receiver and one argument. The evaluation order is the obvious one: parenthesized sub-expressions first; then messages are sent left to right.

In the absence of parenthesis, it would be nice to leave the order of evaluation of a # b # d # e to the implementation: maybe the machine can do a # b and d # e in parallel? In that case, wouldn’t it be nice to allow it to do so? The problem is that this makes sense only for associative operators, and we have no way of knowing what # actually does. What about a – b – c? We can’t very well leave it up to the implementation to choose between (a – b) – c and a – (b – c)!

Grace’s message evaluation rules for operators also follow Smalltalk: apart from the special syntax, they follow exactly the same evaluation rules as any other message. We hope that this single evaluation rule, straightfoward syntax, and lack of implicit calls or coercions, will ensure that Grace programs that use operators will remain comprehensible.

1 + 2
1 + (2 * 3)
0 – 1
“Hello” ++ ” ” ++ “World”
bezerk !@#$%^&* istan

But this is where things get more tricky. As mentioned above, Grace operators are evaluated left-to-right. In our current design, unlike Smalltalk, (but like Self) it is a syntax error for two different operator symbols to appear in an expression without parenthesis to indicate order or evaluation — all Grace operators have the same precedence. The same operator symbol can be sent more than once without parenthesis.

1 + 2 + 3 // evaluates to 6
1 + (2 * 3) // evaluates to 7
(1 + 2) * 3 // evaluates to 9
1 + 2 * 3 // syntax error in Grace; would evaluate to 9 in Smalltalk

Named message sends bind more tightly than operator message sends: The following examples show first the Grace expressions as they would be written, followed by the parse

1 + 2.i                                     1 + (2.i)
(a * a) + (b * b).sqrt                 (a * a) + ((b *b).sqrt)
((a * a) + (b * b)).sqrt               ((a * a) + (b *b)).sqrt
a * a + b * b                           // syntax error
a + b + c                                 (a + b) + c
a – b – c                                   (a – b) – c

The One True Message Send has implications that carry over to operator messages also: an object can have only one method to respond to any given message, whether that message’s name is an operator symbol or a textual name. For example, Number can have only one definition for the “+” operator.

A second detail of this current design is that Grace has no unary operators. There can be negative numeric literals, like -2 or -345.34, but to negate an expression, programmers have to write either “e.negative” (using a named message) or “0 – e” using the binary “-” message.


We have more questions about the design for operators than some other parts of Grace’s design. The trade-offs seem more acute here than elsewhere for some reasons. This design has the key advantage that arbitrary operators can be defined, and the disadvantage that we do not support operator precedence.

Eric asks: whether it wouldn’t make more sense to require all operator sends to be parenthesized, rather than permitting multiple left associative calls. James thinks this worked OK in Self, and e.g. “a + b + c” isn’t generally a problem.

Kim says: the no-precedence design would be very confusing – we have to let people write normal arithmetic statements with the intuitive meanings.

James says: that there are advantages to requiring parentheses for novice programmers: they can be easily inserted by an IDE, many style guides recommend parens for complex expressions, and it’s worth it to permit arbitrary operators.

James thinks: if we do – after all – need operator precedence, the obvious design to fall back to is like Blue or C# – there are a fixed set of operators, with fixed precedence. The One True Message Send Rule will still apply; so operator methods would still be defined just like named methods, and operator message sends would behave just like named message sends.


We even have some draft BNF for expressions, ripped shamlessly from the Self manual in the first instance:

{ … } means zero or more occurrences of the symbols inside the braces.
[ … ] means zero or one occurrences of the symbols inside the braces.

expression ::= constant | named-send | operator-send | ‘(‘ expression ‘)’
constant ::= “self” | numberLiteral | stringLiteral | blockLiteral | objectLiteral
named-send ::= named-message | receiver “.” named-message | “super” “.” named-message
named-message ::= identifier arglist {identifier arglist}
arglist ::= “(” [{expression”,” } expression] “)”] | blockLiteral
operator-send ::= receiver operator-message | “super” operator-message
operator-message ::= operator expression
operator ::= op-char {op-char}
receiver ::= expression
blockLiteral ::= “{” [ param-decl-list => ] code “}”
code ::= { statement “;” } [“return”] expression

Built-in Objects: Booleans, Strings, and Numbers

Grace has three types of built-in objects: Booleans, Strings, and Rationals, the latter being exact. Grace will also support approximate IEEE 64-bit floating point numbers: particular implementations may support many other types of numbers. All numeric types will be sub-types of an abstract type Number.

The need for exact computation should be obvious to anyone who has tried to explain to a novice why 1/3 * 3 isn’t 1. Once ordinary arithmetic operations are exact, we don’t see a need for separate Natural or Integer types — Javascript and (early) Basic get by quite fine with only one numeric type – floating point numbers. And we don’t see how to eliminate inexact numbers, since operations like square root and sin are going to introduce them again.

Our concern is to minimize the basic types people need to learn: we shouldn’t have to teach the difference between integers & rationals & reals & inexact numbers on day 1. At some point it is important to teach these differences: in the design of Grace we want to support different pedagogical approaches to when this must be taught.

These names illustrate another (minor) design decision: Grace type names are spelled-out in full, rather than being abbreviated. So we have class Number rather than Num, and Boolean rather than Bool. So, what should we call the floating point type? FloatingPoint is ambiguous, because over time the library is likely to contain many floating point formats. DoublePrecision is rather wordy as well as being archaic; Binary64 is the official IEEE name, is shorter and more descriptive – and horrible for novices!

What does it mean for these objects to be “built in”? It means that there are denotations for them in the language syntax. (This is not true for Binary64s; there are, however, methods to generate Binary64s from other Numbers.) So Booleans in Grace are represented by the global names “true” and “false”, and programmers won’t be able to re-define what those names mean.

Grace will provide the usual operations on Booleans — which will be represented as messages. Binary operations will generally use single symbols, such as “&” for and, and “|” for or. We don’t think that the syntax will allow single-symbol unary operators, so Boolean negation will be the named message “not”.

Unlike C, Python, or Groovy, Grace supports no implicit conversions, so Numbers, Strings, Objects, or any other type cannot be used in contexts where Booleans are expected. Rather, Grace programs must explicitly test for empty Strings, Lists, and so on.

String literals in Grace are written between double quotes, as in C, Java, or Python. Strings literals support a range of escape characters such as “\t\b\n\f\r\v\\\””, and also escapes for Unicode.

Like Python, there is no Character type in Grace — rather, characters can be represented by Strings of length 1 where necessary. Like Java, Strings are immutable, and literals may be interned. Grace’s standard library will include mechanisms to support efficient incremental string construction. Strings will also conform to the protocol of an immutable IndexableCollection.

“Hello World!”
“\t”
“The End of the Line\n”
“A”

The Grace type Number is the supertype of all numeric types: we encourage programmers to use this type when writing most programs. Although Grace has no implicit conversions, subtyping means that any numeric type can be stored in a variable or passed as a parameter or return value of type Number.

Grace has three forms of Number literal: ordinary strings of digits, assumed to represent decimal (radix-10) numbers; literals with an explicit radix, indicated by a (decimal) number between 2 and 35 and a leading x; and base-exponent numerals using e as the exponent indicator, also always in decimal. All numeric literals evaluate to exact Rational numbers, so 0.2 will be exactly 1/5. So, there are no literals for inexact numbers.

1
-1
42
3.14159265
13.343e-12
-414.45e3
16xF00F00
2×10110100
0xDEADBEEF // Radix zero treated as 16

Grace will support all the usual binary arithmetic operators. However, as mentioned above, we are currently planning on not allowing unary operator symbols, so negation will be the named method “negative”. (Similarly, the Boolean “not” operator is another message send).

Messages sent to numbers support explicit conversions between number types; for example, sending the message “b64” to any Number will convert it to the 64 bit binary floating point.

Grace’s libraries may support a range of additional numeric types, such as machine integers, bytes, longer and shorter floating point numbers, and complex numbers. These types don’t need to be built-in: one of our design goals is to make library classes as convenient to use as built-in classes, so not being built-in does not mean that they are “second-class” in any way. These additional Number classes are likely to be required in some programs for efficiency, for calling external libraries, or for accessing external data formats. The aim of this design is to present beginners with well-behaved Numbers (and Booleans and Strings), while allowing their instructors to introduce more specialised subclasses as they are needed.

The plan is that most classes in the libraries will accept Numbers as arguments; for example, Indexed Collections, such as Arrays and Strings, will accept Number arguments describing string positions, but will raise an error if the index is not an integer, just as it will if the index is out of bounds.

Grace will have mechanisms (like Java’s final, Scala’s case classes or Lime’s extendedby) that will indicate that classes or interfaces may not be further extended – this should support special case code generation for efficiency.

For the hardcore, here is the BNF for numberLiterals:

numberLiteral ::= [-][radix x][digits][.digits] | [digits][.digits][e][-][digits]
radix ::= 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 23 | 33 | 34 | 35
digits :: = digit | digit digits
digit :: = 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z