Operators are Messages Too

As well as named messages, Grace also allows operator symbols for binary messages. This allows us to use conventional symbols like + and -, and also allows the programmer to define her own operator symbols from single characters or multiple character sequences. Since the advent of Unicode, there are lots of characters to choose from.

Note that we are using the term binary message in the same sense as Smalltalk: a message with a receiver and one argument. The evaluation order is the obvious one: parenthesized sub-expressions first; then messages are sent left to right.

In the absence of parentheses, it would be nice to leave the order of evaluation of a # b # d # e to the implementation: maybe the machine can do a # b and d # e in parallel? The problem is that this makes sense only for associative operators, and we have no way of knowing what # actually does. What about a - b - c? We can’t very well leave it up to the implementation to choose between (a - b) - c and a - (b - c)!

Grace’s message evaluation rules for operators also follow Smalltalk: apart from the special syntax, they follow exactly the same evaluation rules as any other message. We hope that this single evaluation rule, straightforward syntax, and lack of implicit calls or coercions will ensure that Grace programs that use operators remain comprehensible.

1 + 2
1 + (2 * 3)
0 - 1
"Hello" ++ " " ++ "World"
bezerk !@#$%^&* istan

But this is where things get trickier. As mentioned above, Grace operators are evaluated left-to-right. In our current design, unlike Smalltalk (but like Self), it is a syntax error for two different operator symbols to appear in an expression without parentheses to indicate the order of evaluation: all Grace operators have the same precedence. The same operator symbol can be sent more than once without parentheses.

1 + 2 + 3 // evaluates to 6
1 + (2 * 3) // evaluates to 7
(1 + 2) * 3 // evaluates to 9
1 + 2 * 3 // syntax error in Grace; would evaluate to 9 in Smalltalk
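
The no-precedence rule is simple enough to sketch as a tiny parser. The Python below is only an illustration of the rule as described here, not Grace's actual parser: the names tokenize, parse_operand, parse_expression and parse, and the tuple-based parse tree, are all assumptions made for this sketch, and it ignores negative literals and named sends.

```python
import re

# An atom is a number or identifier; an operator is any run of other symbols.
TOKEN = re.compile(r"\s*(?:(\d+|[A-Za-z_]\w*)|([()])|([^\s\w()]+))")

def tokenize(text):
    """Split text into (kind, value) pairs: 'atom', 'paren' or 'op'."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        atom, paren, op = match.groups()
        if atom is not None:
            tokens.append(("atom", atom))
        elif paren is not None:
            tokens.append(("paren", paren))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens

def parse_operand(tokens, pos):
    """Parse an atom or a parenthesized sub-expression."""
    if pos >= len(tokens):
        raise SyntaxError("expected an operand")
    kind, value = tokens[pos]
    if kind == "atom":
        return value, pos + 1
    if (kind, value) == ("paren", "("):
        tree, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ("paren", ")"):
            raise SyntaxError("missing closing parenthesis")
        return tree, pos + 1
    raise SyntaxError(f"expected an operand, found {value!r}")

def parse_expression(tokens, pos):
    """Left-associative chain of ONE operator symbol; mixing is an error."""
    left, pos = parse_operand(tokens, pos)
    seen = None
    while pos < len(tokens) and tokens[pos][0] == "op":
        op = tokens[pos][1]
        if seen is not None and op != seen:
            raise SyntaxError(f"{seen!r} and {op!r} mixed without parentheses")
        seen = op
        right, pos = parse_operand(tokens, pos + 1)
        left = (op, left, right)  # the receiver is everything parsed so far
    return left, pos

def parse(text):
    """Parse a whole expression into nested (operator, receiver, argument) tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree
```

With this sketch, parse("1 + 2 + 3") nests to the left, parse("1 + (2 * 3)") is accepted because the parentheses start a fresh sub-expression, and parse("1 + 2 * 3") raises SyntaxError.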

Named message sends bind more tightly than operator message sends. The following examples show the Grace expressions as they would be written, followed by how they parse:

1 + 2.i                       1 + (2.i)
(a * a) + (b * b).sqrt        (a * a) + ((b * b).sqrt)
((a * a) + (b * b)).sqrt      ((a * a) + (b * b)).sqrt
a * a + b * b                 // syntax error
a + b + c                     (a + b) + c
a - b - c                     (a - b) - c

The One True Message Send has implications that carry over to operator messages also: an object can have only one method to respond to any given message, whether that message’s name is an operator symbol or a textual name. For example, Number can have only one definition for the “+” operator.

A second detail of this current design is that Grace has no unary operators. There can be negative numeric literals, like -2 or -345.34, but to negate an expression, programmers have to write either “e.negative” (using a named message) or “0 - e” (using the binary “-” message).


We have more questions about the design for operators than about some other parts of Grace’s design. The trade-offs seem more acute here than elsewhere. This design has the key advantage that arbitrary operators can be defined, and the disadvantage that we do not support operator precedence.

Eric asks: wouldn’t it make more sense to require all operator sends to be parenthesized, rather than permitting multiple left-associative calls? James thinks this worked OK in Self, and e.g. “a + b + c” isn’t generally a problem.

Kim says: the no-precedence design would be very confusing – we have to let people write normal arithmetic statements with the intuitive meanings.

James says: there are advantages to requiring parentheses for novice programmers: they can be easily inserted by an IDE, many style guides recommend parens for complex expressions, and it’s worth it to permit arbitrary operators.

James thinks: if we do, after all, need operator precedence, the obvious design to fall back to is like Blue or C#: there is a fixed set of operators, with fixed precedence. The One True Message Send Rule will still apply, so operator methods would still be defined just like named methods, and operator message sends would behave just like named message sends.


We even have some draft BNF for expressions, ripped shamelessly from the Self manual in the first instance:

{ … } means zero or more occurrences of the symbols inside the braces.
[ … ] means zero or one occurrences of the symbols inside the brackets.

expression ::= constant | named-send | operator-send | "(" expression ")"
constant ::= "self" | numberLiteral | stringLiteral | blockLiteral | objectLiteral
named-send ::= named-message | receiver "." named-message | "super" "." named-message
named-message ::= identifier arglist {identifier arglist}
arglist ::= "(" [ {expression ","} expression ] ")" | blockLiteral
operator-send ::= receiver operator-message | "super" operator-message
operator-message ::= operator expression
operator ::= op-char {op-char}
receiver ::= expression
blockLiteral ::= "{" [ param-decl-list "=>" ] code "}"
code ::= { statement ";" } ["return"] expression

Built-in Objects: Booleans, Strings, and Numbers

Grace has three types of built-in objects: Booleans, Strings, and Rationals, the last being exact. Grace will also support approximate IEEE 64-bit floating point numbers; particular implementations may support many other types of numbers. All numeric types will be sub-types of an abstract type Number.

The need for exact computation should be obvious to anyone who has tried to explain to a novice why 1/3 * 3 isn’t 1. Once ordinary arithmetic operations are exact, we don’t see a need for separate Natural or Integer types: JavaScript and (early) Basic get by quite fine with only one numeric type, floating point numbers. And we don’t see how to eliminate inexact numbers, since operations like square root and sin are going to introduce them again.
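
To make the contrast concrete, here is a small sketch using Python's fractions module as a stand-in for exact Rationals. This is an analogy only (Python is not Grace), and the variable names are made up for the illustration. It shows exact arithmetic behaving as a novice expects, a familiar binary floating point surprise, and square root bringing inexact numbers back in.

```python
from fractions import Fraction
import math

# Exact rational arithmetic: a third, three times, is exactly one.
third = Fraction(1, 3)
three_thirds = third * 3

# Exact decimal fractions: one tenth plus two tenths is exactly three tenths.
exact_sum = Fraction(1, 10) + Fraction(2, 10)

# The same sum in binary floating point is only approximately 0.3.
float_sum = 0.1 + 0.2

# Square root leaves the rationals: the result here is an inexact float.
root = math.sqrt(Fraction(2))
```

Here three_thirds == 1 and exact_sum == Fraction(3, 10) both hold, float_sum == 0.3 does not, and root is an ordinary float.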

Our concern is to minimize the basic types people need to learn: we shouldn’t have to teach the difference between integers & rationals & reals & inexact numbers on day 1. At some point it is important to teach these differences: in the design of Grace we want to support different pedagogical approaches to when this must be taught.

These names illustrate another (minor) design decision: Grace type names are spelled-out in full, rather than being abbreviated. So we have class Number rather than Num, and Boolean rather than Bool. So, what should we call the floating point type? FloatingPoint is ambiguous, because over time the library is likely to contain many floating point formats. DoublePrecision is rather wordy as well as being archaic; Binary64 is the official IEEE name, is shorter and more descriptive – and horrible for novices!

What does it mean for these objects to be “built in”? It means that there are denotations for them in the language syntax. (This is not true for Binary64s; there are, however, methods to generate Binary64s from other Numbers.) So Booleans in Grace are represented by the global names “true” and “false”, and programmers won’t be able to re-define what those names mean.

Grace will provide the usual operations on Booleans — which will be represented as messages. Binary operations will generally use single symbols, such as “&” for and, and “|” for or. We don’t think that the syntax will allow single-symbol unary operators, so Boolean negation will be the named message “not”.

Unlike C, Python, or Groovy, Grace supports no implicit conversions, so Numbers, Strings, Objects, or any other type cannot be used in contexts where Booleans are expected. Rather, Grace programs must explicitly test for empty Strings, Lists, and so on.

String literals in Grace are written between double quotes, as in C, Java, or Python. String literals support a range of escape characters such as “\t\b\n\f\r\v\\\””, and also escapes for Unicode.

Like Python, there is no Character type in Grace — rather, characters can be represented by Strings of length 1 where necessary. Like Java, Strings are immutable, and literals may be interned. Grace’s standard library will include mechanisms to support efficient incremental string construction. Strings will also conform to the protocol of an immutable IndexableCollection.

"Hello World!"
"\t"
"The End of the Line\n"
"A"

The Grace type Number is the supertype of all numeric types: we encourage programmers to use this type when writing most programs. Although Grace has no implicit conversions, subtyping means that any numeric type can be stored in a variable or passed as a parameter or return value of type Number.

Grace has three forms of Number literal: ordinary strings of digits, assumed to represent decimal (radix-10) numbers; literals with an explicit radix, indicated by a (decimal) number between 2 and 35 followed by an x; and base-exponent numerals using e as the exponent indicator, also always in decimal. All numeric literals evaluate to exact Rational numbers, so 0.2 will be exactly 1/5. There are therefore no literals for inexact numbers.

1
-1
42
3.14159265
13.343e-12
-414.45e3
16xF00F00
2x10110100
0xDEADBEEF // Radix zero treated as 16
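
As a rough sketch of how such literals could map to exact values, the Python below converts the three literal forms to Fraction. It is not Grace's lexer: parse_number_literal is a hypothetical name, the treatment of fractional digits after a radix prefix is an assumption, and Python's Fraction constructor accepts some spellings (such as "1/3") that the design above does not mention.

```python
from fractions import Fraction

def parse_number_literal(text):
    """Return the exact Fraction denoted by a Grace-style number literal."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if sign == -1 else text

    if "x" in body:
        # Explicit radix form, e.g. 16xF00F00. The radix itself is decimal,
        # so the first "x" always separates the radix from the digits.
        radix_text, digits = body.split("x", 1)
        radix = int(radix_text) or 16          # radix zero is treated as 16
        if not 2 <= radix <= 35:
            raise ValueError(f"unsupported radix {radix}")
        whole, _, frac = digits.partition(".")
        value = Fraction(int(whole, radix)) if whole else Fraction(0)
        if frac:
            # Assumption: digits after the point are scaled by radix**-n.
            value += Fraction(int(frac, radix), radix ** len(frac))
        return sign * value

    # Plain decimal or base-exponent form, e.g. 3.14159265 or 13.343e-12.
    return sign * Fraction(body)
```

For example, parse_number_literal("0.2") is exactly Fraction(1, 5), and parse_number_literal("0xDEADBEEF") gives the same value as parse_number_literal("16xDEADBEEF").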

Grace will support all the usual binary arithmetic operators. However, as mentioned above, we are currently planning on not allowing unary operator symbols, so negation will be the named method “negative”. (Similarly, the Boolean “not” operator is another message send.)

Messages sent to numbers support explicit conversions between number types; for example, sending the message “b64” to any Number will convert it to the 64 bit binary floating point.

Grace’s libraries may support a range of additional numeric types, such as machine integers, bytes, longer and shorter floating point numbers, and complex numbers. These types don’t need to be built-in: one of our design goals is to make library classes as convenient to use as built-in classes, so not being built-in does not mean that they are “second-class” in any way. These additional Number classes are likely to be required in some programs for efficiency, for calling external libraries, or for accessing external data formats. The aim of this design is to present beginners with well-behaved Numbers (and Booleans and Strings), while allowing their instructors to introduce more specialised subclasses as they are needed.

The plan is that most classes in the libraries will accept Numbers as arguments; for example, Indexed Collections, such as Arrays and Strings, will accept Number arguments describing string positions, but will raise an error if the index is not an integer, just as it will if the index is out of bounds.
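
A minimal sketch of that indexing behaviour, again using Python's Fraction as a stand-in for an exact Number. The function name char_at, the choice of 0-based positions, and the particular exception types are assumptions for the illustration; the design above does not fix any of them.

```python
from fractions import Fraction

def char_at(text, index):
    """Return the length-1 string at an exact Number index (0-based here)."""
    position = Fraction(index)
    if position.denominator != 1:
        # A Number that is not a whole number is rejected, like a bad bound.
        raise TypeError(f"index {position} is not an integer")
    if not 0 <= position < len(text):
        raise IndexError(f"index {position} is out of bounds")
    return text[int(position)]
```

So char_at("Grace", Fraction(4, 2)) succeeds because 4/2 is the whole number 2, while Fraction(1, 2) is rejected in the same way as an out-of-range index.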

Grace will have mechanisms (like Java’s final, Scala’s case classes or Lime’s extendedby) that will indicate that classes or interfaces may not be further extended – this should support special case code generation for efficiency.

For the hardcore, here is the BNF for numberLiterals:

numberLiteral ::= [-][radix x][digits][.digits] | [digits][.digits][e][-][digits]
radix ::= 0 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35
digits ::= digit | digit digits
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z

One True Message Send

Grace is a pure object-oriented language. Everything in the language is an object, and all computation proceeds by sending (dynamically dispatched) messages to objects. In this, Grace follows most strongly in the Smalltalk tradition. Like Smalltalk, Grace has a “One True Message Send” rule that explains how expressions are evaluated: a message that is sent to a receiving object is looked up in the receiving object’s definition, and if a matching method is found, that method is executed. (What happens when a message is not found will be the subject of another note.) This “One True Message Send Rule” is followed for all types of Grace objects, whether the built-in Strings, Booleans, and Numbers, or objects defined in libraries or by user code.
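
The rule can be modelled in a few lines. The Python below is a toy model, not how Grace is implemented: GraceObject, send and make_number are hypothetical names, and raising LookupError for an unknown message is a placeholder, since the not-found case is deliberately left open above. Note that the method table is keyed by name alone, so an operator such as "+" is looked up exactly like a textual name, and there can be only one method per name.

```python
class GraceObject:
    """A toy object: nothing but a table from message names to methods."""

    def __init__(self, methods):
        self.methods = dict(methods)  # one entry per message name

def send(receiver, name, *args):
    """Look the message up in the receiver's own definition and run it."""
    method = receiver.methods.get(name)
    if method is None:
        # Placeholder behaviour only; the real design is not specified here.
        raise LookupError(f"message {name!r} not understood")
    return method(receiver, *args)

def make_number(value):
    """Build a toy number whose operators are ordinary named methods."""
    return GraceObject({
        "value": lambda self: value,
        "+": lambda self, other: make_number(value + send(other, "value")),
        "negative": lambda self: make_number(-value),
    })

three = send(make_number(1), "+", make_number(2))
minus_five = send(make_number(5), "negative")
```

In this model send(three, "value") yields 3 and send(minus_five, "value") yields -5, with the operator and the named message going through the same lookup.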

Message-send is the principal operation in Grace; as far as possible, other operations in the language (especially control structures) are phrased as message-sends. This is important because, if classes introduced by libraries are going to be “first class”, it must be possible for them to define their own control structures, and have those control structures be the social equals of the built-in control structures. In effect, we can extend the syntax of Grace in the libraries.

The syntax of message-send is more like Java or C# than Smalltalk: a receiver (sometimes implicit), then a message name (an identifier), then any arguments in parentheses. As in Smalltalk, however, it is possible to break up a long argument list and distribute it throughout the message name, so one can define both of the following

It is up to the designer of an interface to decide on the best syntax for the language of messages that she is creating. Once the interface designer has made her decision, clients have to follow along; it’s important that two sends of the same message look alike.

Arguments are always passed by object reference, and no implicit conversions or copying ever takes place.

Grace uses the keyword self to refer to the current object – what Simula, C++, and Java refer to as this. As in Java, Eiffel, and Self, the receiver of a message can be left implicit if it is self. Why self rather than this? Because it’s impossible to lecture about a language without using the English pronoun “this” in its ordinary sense!

Messages that take no arguments don’t get any parentheses. This is because Grace has one shared namespace for method names and variables in objects. Thus, count could be either an access to a local variable or a send of the message count to self; to decide which, one need only look at the declaration of count. This is a deliberate choice: the syntax abstracts away the implementation detail of whether a particular attribute is implemented by a variable or a method, and changing from one to the other does not require code that accesses the attribute to be modified. This is in contrast to C++ and Java, which use separate namespaces for variables and methods, and separate syntax for variable access and message send.
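
Python's property mechanism gives a rough analogue of this uniform access, which may help readers who know that language. The sketch below is an analogy rather than Grace: the class names and the report function are invented for the example. The client code reads count in the same way whether it is stored in a variable or computed by a method.

```python
class CounterWithField:
    """count is a plain stored variable."""

    def __init__(self):
        self.count = 0

class CounterWithMethod:
    """count is computed by a method, but is read with the same syntax."""

    def __init__(self):
        self._events = []

    @property
    def count(self):
        return len(self._events)

def report(counter):
    # Client code: identical for both implementations of count.
    return f"count = {counter.count}"
```

Switching a class from the first style to the second requires no change to report, which is the property the shared namespace is meant to give Grace programs.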

The motivation for allowing the arguments to be distributed through the name of a message is primarily to allow libraries to implement readable control structures. So

which looks like a conventional while loop, is actually a send of the message while()do() to self. To promote this usage, we allow the parameter parentheses to be omitted if they would be adjacent to the pair of braces that create a closure. Moreover, because of Grace’s layout syntax, the above example can also be written as

where the braces are themselves replaced by layout.

This will allow libraries to implement constructs for things like atomicity, finalization and locking using messages. For example:

Grace’s One True Message Send rule has a number of consequences. One of the most important is that Grace does not allow overloading on argument type as in C++ or Java and their descendants (C#, Scala): static type information never affects the execution of correct programs in Grace. So, if you want two methods on the same object to behave differently, you will have to give them distinct names. We expect sales of Roget’s Thesaurus to rocket.

At present we are not sure whether to support variadic methods. Most languages that started without variadic methods have added them; not having them may force the programmer to create a tuple.

Let’s Start With Syntax

We all know that syntax is unimportant — in theory. However, it is quite important in practice, because we all have our pet loves and hates. Moreover, to even discuss competing ideas for the more substantive parts of the language, we need a syntax. So, let’s start by talking about it.

Brackets, Semicolons, and Blocks

We propose that Grace use curly brackets for grouping, and semicolons as statement terminators. However, we also plan to use indentation (layout) for grouping, and allow semicolons to be omitted at the end of lines.

The reason for this is that it is really important for students to learn to use indentation correctly, if their programs are going to be readable. Having the compiler ignore indentation makes this hard to reinforce: the eye believes the indentation, even though the compiler does not. Making indentation significant means that student programs mean what they appear to mean.

The following code uses both semicolons and curly brackets:

while this example uses only layout:

and this uses both

Over the last 20 years, a major innovation in programming language syntax has been the reduction in separator and grouping symbols; we are following that trend. The most visible change has been from Algol and Pascal’s begin-end to C’s “{-}”. Python and Haskell are the two main examples that Grace follows here, although CLU, Scala and Go allow statement separator semicolons to be omitted.

Why retain semicolons and brackets at all, even as options? There are several reasons. For programmers converting from languages that use semicolons and brackets, they will provide important familiarity. Some pedagogical approaches may prefer explicit grouping and separation; supporting both implicit layout and explicit syntax means that students can learn about the differences and programmers can choose whatever is most appropriate. Perhaps most importantly, having explicit constructs will also make formal description of Grace easier to manage. That is, if you are trying to write a formal semantics for semicolon, it’s handy to have a semicolon operator to write about.

Block Semantics

Brackets and layout are enmeshed in one potentially more controversial aspect of Grace’s design: Grace’s bracket or layout blocks are in fact the same as Smalltalk’s square-bracket block: they are parameterless lambda expressions.

Grace does not have macros or quotation (like Scheme or Lisp), nor implicit proceduring conversions (like Algol-68 or Scala). Rather, like Smalltalk, any expression whose evaluation needs to be deferred must be wrapped explicitly in a block. So, both the control expression and the block to be repeated in a while statement must be inside either explicit brackets or layout indentation.
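
The need to wrap both parts of a while in blocks can be seen in any language with first-class functions. The Python sketch below is an analogy, not Grace: while_do and count_up are hypothetical names, and zero-argument functions stand in for blocks. If the condition were passed as a plain expression it would be evaluated once, before the loop ever started; passing it as a block lets the loop re-evaluate it on every iteration.

```python
def while_do(condition, body):
    """A library-defined loop: both arguments are zero-parameter blocks."""
    while condition():   # re-evaluate the deferred condition each time round
        body()           # run the deferred body

def count_up(limit):
    """Collect 0, 1, ..., limit - 1 using the library-defined loop."""
    state = {"i": 0, "seen": []}

    def step():
        state["seen"].append(state["i"])
        state["i"] += 1

    # The condition is wrapped in a lambda so that it is deferred.
    while_do(lambda: state["i"] < limit, step)
    return state["seen"]
```

Calling count_up(3) runs the body three times and returns [0, 1, 2].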

Apart from the curly brackets, Grace’s blocks differ from Smalltalk’s in one other respect: a zero-argument block that is at the “top level” of a statement list, that is, one that is not passed as a parameter to a method or stored in a variable, will be evaluated as soon as it is created. This ensures that nested blocks in Grace that are used in the traditional Java, C, or Pascal way have the same semantics as the Java, C, or Pascal nested blocks. However, “first class” blocks (those that are passed as arguments to a method or assigned to a variable) will represent functions. That is, they will be like Smalltalk blocks or Lisp or Scheme lambdas. (For aficionados, this proposal for Grace design is the same as the deproceduring coercion in Algol-68.)

Comments

While we’re on the topic, in today’s design, Grace’s comments are introduced by the same characters as Java:

(this means comment characters cannot be used as operators: Andrew would like other comment characters, perhaps || |* *|).

Comments will be attached to syntactic elements of the program, either where they appear, or for definitions, before the definition to which they apply, following Newspeak and several other languages.

Tabs

According to Guido van Rossum (“Python’s Regrets”), Python’s support of both tab and space characters has caused many, many problems in practice. We propose to address this by treating all tab characters as syntax errors in Grace. String literals will support an escape sequence for tabs, so Grace programs should have no tab characters whatsoever. Grace editors or programming environments should transliterate all tabs to spaces. (Fortress does the same thing.)
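
Both halves of this proposal are easy to sketch. In the Python below, check_no_tabs plays the part of the compiler and tabs_to_spaces the part of an editor; both names, the error message, and the default tab width of 4 are assumptions made for the illustration.

```python
def check_no_tabs(source):
    """Reject any source text that contains a tab character."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        column = line.find("\t")
        if column != -1:
            raise SyntaxError(
                f"tab character at line {line_number}, column {column + 1}"
            )
    return source

def tabs_to_spaces(source, tab_width=4):
    """What an editor might do: replace tabs with spaces up to the next stop."""
    return source.expandtabs(tab_width)
```

An environment would run something like tabs_to_spaces before saving, so that check_no_tabs never fails on code written in that environment.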

Kim Comments: I’m slightly uncomfortable with having different semantics of blocks depending on whether or not they are at the top level. On the other hand, it gives you the right behavior