Let’s Start With Syntax

We all know that syntax is unimportant — in theory. However, it is quite important in practice, because we all have our pet loves and hates. Moreover, to even discuss competing ideas for the more substantive parts of the language, we need a syntax. So, let’s start by talking about it.

Brackets, Semicolons, and Blocks

We propose that Grace use curly brackets for grouping, and semicolons as statement terminators. However, we also plan to use indentation (layout) for grouping, and allow semicolons to be omitted at the end of lines.

The reason for this is that it is really important for students to learn to use indentation correctly, if their programs are going to be readable. Having the compiler ignore indentation makes this hard to reinforce: the eye believes the indentation, even though the compiler does not. Making indentation significant means that student programs mean what they appear to mean.

The following code uses both semicolons and curly brackets:

while this example uses only layout:

and this uses both:

Over the last 20 years, a major innovation in programming language syntax has been the reduction in separator and grouping symbols; we are following that trend. The most visible change has been from Algol and Pascal’s begin/end to C’s “{-}”. Python and Haskell are the two main examples that Grace follows here, although CLU, Scala and Go also allow statement separator semicolons to be omitted.

Why retain semicolons and brackets at all, even as options? There are several reasons. For programmers converting from languages that use semicolons and brackets, they will provide important familiarity. Some pedagogical approaches may prefer explicit grouping and separation; supporting both implicit layout and explicit syntax means that students can learn about the differences and programmers can choose whatever is most appropriate. Perhaps most importantly, having explicit constructs will also make formal description of Grace easier to manage. That is, if you are trying to write a formal semantics for semicolon, it’s handy to have a semicolon operator to write about.

Block Semantics

Brackets and layout are enmeshed in one potentially more controversial aspect of Grace’s design: Grace’s bracket or layout blocks are in fact the same as Smalltalk’s square-bracket block: they are parameterless lambda expressions.

Grace does not have macros or quotation (like Scheme or Lisp), nor implicit proceduring conversions (like Algol-68 or Scala). Rather, like Smalltalk, any expression whose evaluation needs to be deferred must be wrapped explicitly in a block. So, both the control expression and the block to be repeated in a while statement must be inside either explicit brackets or layout indentation.
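As a rough illustration of this rule, here is a sketch in Python rather than Grace (Grace's own syntax is still under discussion here). A while loop is written as an ordinary function that receives both its condition and its body as zero-argument callables, which stand in for Grace blocks. The name while_do is hypothetical and is not part of any Grace proposal.

```python
def while_do(condition, body):
    """Hypothetical helper: run body() for as long as condition() is true.

    Both arguments are zero-argument callables, standing in for Grace blocks.
    """
    while condition():
        body()


# Usage: count down from 3, recording each value that the body sees.
state = {"x": 3}
seen = []


def step():
    seen.append(state["x"])
    state["x"] -= 1


while_do(lambda: state["x"] > 0, step)
print(seen)  # [3, 2, 1]
```

Passing state["x"] > 0 directly, without wrapping it in a lambda, would evaluate the comparison once before the loop starts, which is why the deferral has to be written explicitly.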

Apart from the curly brackets, Grace’s blocks differ from Smalltalk’s in one other respect: a zero-argument block that is at the “top level” of a statement list, that is, one that is not passed as a parameter to a method or stored in a variable, will be evaluated as soon as it is created. This ensures that nested blocks in Grace that are used in the traditional Java, C, or Pascal way have the same semantics as the Java, C, or Pascal nested blocks. However, “first class” blocks (those that are passed as arguments to a method or assigned to a variable) will represent functions. That is, they will be like Smalltalk blocks or Lisp or Scheme lambdas. (For aficionados, this proposal for Grace’s design is the same as the deproceduring coercion in Algol-68.)
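The proposed distinction can be mimicked in Python, purely as a sketch. The function statement below is a hypothetical stand-in for "a value appearing at the top level of a statement list"; it is an assumption made for illustration and says nothing about how a Grace implementation would work.

```python
def statement(value):
    """Hypothetical top-level rule: a bare zero-argument block runs at once."""
    return value() if callable(value) else value


log = []

# A block at the top level of a statement list is evaluated as soon as it is created.
statement(lambda: log.append("nested block ran"))

# A block stored in a variable stays a function until it is explicitly applied.
deferred = lambda: log.append("first-class block ran")
log.append("between")
deferred()

print(log)  # ['nested block ran', 'between', 'first-class block ran']
```

The same lambda therefore behaves differently depending only on where it appears, which is the context sensitivity this proposal introduces.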

Comments

While we’re on the topic, in today’s design, Grace’s comments are introduced by the same characters as Java:

(this means comment characters cannot be used as operators: Andrew would like other comment characters, perhaps || |* *|).

Comments will be attached to syntactic elements of the program, either where they appear, or for definitions, before the definition to which they apply, following Newspeak and several other languages.

Tabs

According to Guido van Rossum (“Python’s Regrets”), Python’s support of both tab and space characters has caused many, many problems in practice. We propose to address this by treating all tab characters as syntax errors in Grace. String literals will support an escape sequence for tabs, so Grace programs should have no tab characters whatsoever. Grace editors or programming environments should transliterate all tabs to spaces. (Fortress does the same thing.)
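A minimal sketch of the two halves of this proposal, written in Python: a check that rejects any tab in the source text, and the transliteration an editor might apply beforehand. The function names and the tab width of 4 are assumptions made for the example.

```python
def check_no_tabs(source):
    """Raise SyntaxError at the first tab character; otherwise return the source."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        column = line.find("\t")
        if column != -1:
            raise SyntaxError(
                f"tab character at line {line_number}, column {column + 1}"
            )
    return source


def tabs_to_spaces(source, tab_width=4):
    """What an editor might do before compiling (tab_width is an assumption)."""
    return source.expandtabs(tab_width)


# Usage: the raw text is rejected, the transliterated text is accepted.
program = "while {x > 3} do\n\tprint(x)\n"
try:
    check_no_tabs(program)
except SyntaxError as error:
    print(error)  # tab character at line 2, column 1

cleaned = tabs_to_spaces(program)
check_no_tabs(cleaned)  # no exception
```

Because the check looks at the raw text, a literal tab inside a string is rejected too, which matches the intent that tabs in strings be written with an escape sequence.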

Kim Comments: I’m slightly uncomfortable with having different semantics of blocks depending on whether or not they are at the top level. On the other hand, it gives you the right behavior.

16 thoughts on “Let’s Start With Syntax”

  1. About semicolons:

    Will Grace be using semicolons to separate statements, and commas to separate formal parameters and arguments? I.e., will it have
    { stm1; stm2; stm3 }
    and also
    meth(exp1, exp2);

  2. One way to tackle the braces/indentation issues you raise is to make indentation required in any multi-line block, and to have the first indented line set the required depth of indentation steps within the block. The distinction between a multi-line block and a single line block is probably needed if you plan to support Smalltalk-like use of blocks to allow teaching of closures in Grace.

    (Or should that be grAce? After all, weird capitalization is required for modern computing.)

  3. There are plenty of good and reasonable ideas here, but I fear resurrection of Algol 68’s deproceduring conversion is /not/ one of them.

    First, pedagogically, whether something is evaluated eagerly or lazily is a Big Idea in computing. When teaching Big Ideas, you want them to be explicit and clear, not subject to sophisticated context-sensitive rules. A smaller syntax with a more complicated semantics seems like a negative-value proposition.

    Second, and more importantly, you are offering a temptation to treat { … } as a grouping mechanism, but it is not. If it’s a grouping mechanism, then s and {s} are always equivalent. But println(“2==3”) is correct and println({“2==3”}) is an error. So we have to teach that {…} is grouping but there are places it doesn’t work unless/until we’re ready to teach delayed evaluation.

    Overall, Smalltalk’s ability to define if and while as methods is certainly elegant, but I think it’s not worth it if you have to resort to implicit deproceduring conversions. What is the argument against different syntax for grouping and zero-parameter lambdas? At the beginning you just teach that students must use different syntax for the branches of an if and body of a while. Later you point out that this is really just the syntax for delayed evaluation. I don’t see any problem here.

  4. Good questions, Dan. This is why we want to start out on the blog, so our more crazy ideas can be caught as soon as possible. It’s a particularly good question since we already have other syntax for grouping – parentheses () – which is just grouping, while curly braces {} do both grouping and delayed evaluation.

    One question relates to layout: it seems layout can mean just one of these kinds of grouping? Assuming (as proposed above) it means delayed evaluation, does this still work if it also means delayed evaluation everywhere?

    The reverse question is how much do Java or Python use “top level” blocks for grouping rather than delayed evaluation. Certainly Java uses “{}” brackets to delimit method bodies, but generally method bodies should not be delayed. Perhaps we can address this with a slightly more sophisticated layout rule, which means we wouldn’t need the “top level” block rule. A related question is how often do Java programs use top level blocks?

  5. James,

    What you sketch in your comment sounds right: ignoring layout, use separate syntax for eager grouping vs. zero-argument lambdas. Now if layout can mean only one of these things, which should it be? Dunno.

    Or is it possible to have the meaning of /layout/ depend a bit on context, so, for example, indentation immediately under an if or while can mean delayed evaluation, but otherwise does not. Sure this is treating if and while specially in the parser. Smalltalk doesn’t do that, but, for example, Ruby does, along with almost every other language. If it’s only in the parser and only w.r.t. layout, this seems like a reasonable special-case — and one can simply be more explicit (i.e., not use layout) when teaching eager vs. lazy evaluation.

    I also don’t have a clear sense of when/why programmers would use top-level blocks.

  6. Dan,

    Michael Kölling (who I hope will join us here sometime) asked another good question that is probably relevant here: are we assuming an IDE? Certainly a good IDE could address many of these issues.

    I must admit, I’m not that happy with context-sensitive layout – although I guess it is easier to finesse when necessary – but perhaps if the context sensitive rules are really simple?

    The obvious use of top-level braces is delimiting a method definition,
    and we’ll want those to be definable with layout, but they do not
    delimit a delayed evaluation. I suppose one solution would be to
    write methods inside () parens for consistency?

    Then again: perhaps those outer level braces {} are worth keeping (and keeping special) for compatibility. Grace will use braces for other definitions, such as classes and objects – where they most certainly aren’t around delayed blocks: the braces around a function could be treated like those braces: it works with layout, but it’s not within an expression.

    (Of course, there is another option which is tweaking things so the {brackets} around a method body do mean delayed evaluation – if you somehow wrote a definition with (parens) it would run the code immediately and define the value. Shades of FORTH. I’m not suggesting this!)

  7. If both braces and indentation are alternatives for grouping (which I like), I presume it will be a syntax error when they are both used and are inconsistent:
    i.e.
    while {x > 3} do {
    print(x);
    x := x+1;
    }
    should be a syntax error.

    What about:
    while {x > 3} do {
        print(x);
        x := x+1;
    }
    I presume that gratuitous grouping is valid.

    Note that an IDE can support, check and even enforce these syntax rules. An IDE could also enforce good indentation even if it were not a syntax rule, but I think it is better for the language itself to require valid indentation.

    Note on IDE and syntax rules: In my experience, it is best if an IDE for novice programmers checks and highlights as many syntax constraints as possible, and requires the programmer to fix them early (e.g., whenever the programmer moves off a line). But it should not fix the code or make it impossible for the programmer to make the error in the first place, since then the programmer never learns the rule. However, this note is off topic.

  8. Note also, that Sophia Drossopoulou’s comment about semicolons isn’t quite the same as James’ proposal which said semicolons would be statement terminators, not statement separators. Commas, on the other hand, I presume would be expression separators.

    Pascal used ; separators; Java uses ; terminators. I prefer the terminators because the rule is more consistent – students get more confused when modifying code with separators (e.g., adding a second statement to a block with a single statement requires a new separator where there was none before). The problem is that separators “belong to” the block, whereas terminators “belong to” the statement. The latter is a simpler rule.

  9. We (Andrew, James & Kim) had a discussion today about layout vs. braces. Both Michael Kölling (offline) and Peter Andreae raise good questions about what happens when layout and braces are used together, and what to do when they conflict. If we let braces win (as they do in Haskell), then the text doesn’t mean what it appears to mean — which was the motivation for making layout significant in the first place.

    If we step back a little, and ask why we want to allow both braces and layout, we gain a little clarity. We want to allow braces so that very brief blocks can be written briefly, like {x => x} and {x, y => x + y} and especially {x := 0}. We want to allow layout so that long blocks can be written clearly, using multiple lines, without the visual clutter of braces and without fighting about where that closing } should go. So my current proposal is that braces be allowed only when the opening and closing brace appear on the same line. This means that the first example in the blog post

    while {file.hasNext} do {
        println(file.readLn);
    }

    would be illegal, although it would become legal simply by deleting the { after the do and the matching } at the end. The other two examples would be fine.

    The other place where we get into trouble is with the deproceduring coercion. This was a (probably misguided) attempt to allow nested blocks as found in Algol 60. The main use of such blocks is to introduce declarations part way through a computation, which Grace will allow without a new block. So, based on the feedback here, we will probably drop deproceduring as a bad idea, and make top-level blocks a syntax error. (Why a syntax error, rather than a no-op? So we can change our minds later, of course!)

    That leaves the use of indentation in definitions, most particularly in object constructors and class constructors. When we write an object constructor — the keyword object followed by an indented list of field and method declarations — we most certainly don’t mean deferred execution! On the contrary: we mean “make this object now!”

    We are currently thinking that the solution to this is to face reality, as Dan Grossman says, and just admit indentation after object is what the parser requires, while indentation after then is used to indicate delayed evaluation of a block. As James said, we can probably find a way to make method declarations sensible with either meaning (special syntax or delayed execution), but I don’t see how to do that for object constructors.

  10. I can see that when indentation has meaning, mixing tabs and spaces will complicate things. But I am not looking forward to having to explain to a beginner why one character that can’t be seen is ok but another character that can’t be seen is really bad. I’ve seen too many problems caused by things-that-can’t-be-seen in the past (default constructors in C++ being one extreme). Depending on the programming environment to do the right thing means….having to depend on a programming environment doing the right thing. Is there really no way to get tabs and spaces to play nice together?

  11. Hi Ewan

    Is there really no way to get tabs and spaces to play nice together?

    A good question. I’d hope there was – apparently Haskell manages it – but Python’s experience here (as reported by Guido) sounds pretty conclusive.

    Other Python comments on this:
    * PEP-8
    * Python Indentation

  12. I’m lukewarm on having multiple grouping constructs because novice programmers seem to have a lot of trouble dealing with alternatives with nuanced interactions. C-derived syntax allows bodies of if and loop statements to be either a single statement or a block of statements; this causes endless confusion over the difference between

    while (a)
    {
        f1();
        f2();
    }

    and

    while (a)
        f1();
        f2();

    and also between

    for (i = 0; i < 10; i++)
        cout << i;

    and

    for (i = 0; i < 10; i++);
        cout << i;

    I'd prefer there be one and only one right way of doing grouping, even if it leads to some visual clutter. IMO consistency is more important to novice programmers than aesthetics.

  13. Sure. Note that – at least in Grace – the relevant bits of for and while loops will only take blocks – not single statements – so the particular bugs you’re pointing out won’t happen in the same way in Grace.

    In some ways we have (at least) three grouping mechanisms:
    * (parentheses) – grouping without delayed evaluation
    * {curly braces} – grouping with delayed evaluation
    * layout – grouping with delayed evaluation
    * [square brackets] – most likely collection “literals”; perhaps indexing.
    * “strings” – do strings count?

  14. Can’t say I care for layout based semantics. But my main comment is about the Algol 68 based coercion of closures to closure evaluations. In real usage, blocks that are not passed or assigned are extremely rare. So there is no practical reason to have this coercion. If something is enclosed in braces, it is a lambda. End of story. There is nothing you want to emulate in Algol 68. Treat things uniformly.

  15. Gilad writes:

    In real usage, blocks that are not passed or assigned are extremely rare.

    If I was a bit cryptic above – we’ve heard this one loud & clear! Without that exception Grace is simpler, clearer, and more regular. So that “coercion” won’t make it any further!
