Comments

In most current programming languages, comments are treated as spaces; they separate lexemes, just as spaces do, but otherwise are ignored.
This is in spite of the fact that many people feel that comments are one of the most significant parts of a program. Correctly used, comments can make a program much more readable; badly used comments are an apology for bad code.

How did we get to a place where one of the most significant aspects of our languages is treated as, well, as insignificant?

In Fortran, a comment was any line with a “C” in column 1. This had the great advantage that the whole comment line could be thrown away without being processed, thus saving valuable core memory. Algol 60 followed a similar strategy, declaring that the sequence of symbols

; comment < any sequence not containing ; > ;

was equivalent to a single ; symbol. Both of these conventions have the problem that it is not clear whether a comment refers to the statement preceding it or the statement following it; this has to be inferred from the contents of the comment, which the computer is unlikely to understand. But they save memory and simplify the parser: obviously more important properties than clarity of intent!

So here’s an idea: suppose that we make the placement of comments syntactically meaningful? Might this help the next human reader to understand their intent?

If you think that I’m overestimating this difficulty, take a look at a paper by Michael Van De Vanter called “Preserving The Documentary Structure of Source Code” [1]. Van De Vanter concluded that in general, it’s just not possible to start with the “comments as spaces” convention, and do anything intelligent about inferring what the comment refers to.

With the advent of refactoring tools, which automatically re-arrange our code, it has become important that the computer does know which statement the comment refers to, for otherwise it can’t correctly position the comment in the refactored code. So, in Grace, comments are regarded as annotations that are attached to expressions and to statements. We note that Newspeak also attaches comments to known places in the program, so that they can be preserved during refactoring.

In an IDE, it’s a simple matter to attach a comment to a syntactic element: select that element, and use a context menu selection or a button to add a comment. We can imagine the comment appearing in a balloon, and popping up on mouse hover; we can also imagine some subtle shading or discreet tagging that will reveal the presence of latent comments. However, we also want Grace to have a representation in pure text, both as an interchange format and to permit editing in legacy text editors. So, we need some rules that will tell us exactly what syntactic element a comment is attached to when Grace code is turned into text. (I’m tempted to call this process ugly-printing, since it takes a Grace program that’s as pretty as an IDE can make it, and produces a plain-text version.)

After comments, layout is the other non-syntactically-meaningful element that is used in conventional programming languages. This implies that whatever comment convention we adopt, it should not require a particular layout. In other words, a “comment from here to the end of the line” convention won’t do, because it might require the addition of extra line breaks just to correctly terminate comments.
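To see why an end-of-line convention ties comments to layout, here is a minimal sketch in Python (not Grace). The assignment syntax in the sample strings and the helper names strip_line_comments, strip_block_comments and reflow are invented for illustration; reflow stands in for any tool that re-lays-out code by joining lines.

```python
import re

def strip_line_comments(src):
    """Remove '//' comments, each of which runs to the end of its line."""
    return re.sub(r"//[^\n]*", "", src)

def strip_block_comments(src):
    """Remove '/* ... */' comments (non-nesting)."""
    return re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)

def tokens(src):
    """Whitespace-separated tokens; layout is deliberately ignored."""
    return src.split()

def reflow(src):
    """Simulate a tool that changes layout by joining all lines with spaces."""
    return " ".join(src.split("\n"))

# Hypothetical two-statement program, written with each comment style.
line_style = "x := 1 // initial value\ny := 2"
block_style = "x := 1 /* initial value */\ny := 2"

# Delimited comments: the program's tokens are unaffected by the layout change.
same_after_reflow = (
    tokens(strip_block_comments(reflow(block_style)))
    == tokens(strip_block_comments(block_style))
)

# End-of-line comments: after reflowing, the comment swallows the second statement.
before = tokens(strip_line_comments(line_style))
after = tokens(strip_line_comments(reflow(line_style)))
```

In this sketch the delimited version keeps all six tokens after reflowing, while the end-of-line version keeps only the first three; the line break was carrying meaning.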

Thus, we are left with the need to invent a pair of comment delimiters, and a rule that tells the ugly-printer where to put the comments. The simplest rule that I can imagine is:

a comment is delimited by /* and */, and immediately follows the syntactic element to which it applies. When several nested syntactic elements end with the same character, the comment applies to the largest. Parentheses can be used to avoid this problem.

For example:
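As a rough illustration of the rule, written in Python rather than Grace, the sketch below parses a toy expression language and attaches each comment to the largest element that ends immediately before it. The toy grammar (identifiers, +, and parentheses) and the names parse_spans and attach_comments are assumptions made for this sketch; they are not part of Grace.

```python
import re

COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)
IDENT = re.compile(r"\w+")

def parse_spans(text):
    """Parse `term (+ term)*`, where a term is an identifier or a
    parenthesised expression, and return the (start, end) span of
    every syntactic element found."""
    spans = []
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def term():
        nonlocal pos
        skip_ws()
        start = pos
        if text[pos:pos + 1] == "(":
            pos += 1
            expr()
            skip_ws()
            if text[pos:pos + 1] != ")":
                raise SyntaxError("expected ')' at offset %d" % pos)
            pos += 1
        else:
            m = IDENT.match(text, pos)
            if m is None:
                raise SyntaxError("expected identifier at offset %d" % pos)
            pos = m.end()
        spans.append((start, pos))

    def expr():
        nonlocal pos
        skip_ws()
        start = pos
        term()
        while True:
            skip_ws()
            if text[pos:pos + 1] != "+":
                break
            pos += 1
            term()
            spans.append((start, pos))  # the sum built so far

    expr()
    skip_ws()
    if pos != len(text):
        raise SyntaxError("unexpected text at offset %d" % pos)
    return spans

def attach_comments(src):
    """Return (element_text, comment_text) pairs for each comment in src."""
    comments = [(m.start(), m.group(1).strip()) for m in COMMENT.finditer(src)]
    # Blank out comments with spaces so that element offsets are preserved.
    blanked = COMMENT.sub(lambda m: " " * len(m.group(0)), src)
    spans = parse_spans(blanked)
    attached = []
    for start, body in comments:
        end = len(blanked[:start].rstrip())  # end of the preceding element
        candidates = [s for s in spans if s[1] == end]
        if not candidates:
            raise ValueError("comment does not follow any element")
        first, last = min(candidates)  # earliest start means largest element
        attached.append((" ".join(blanked[first:last].split()), body))
    return attached
```

With this sketch, attach_comments("a + b /* sum */") attaches the comment to the whole of a + b, because b and a + b end at the same character and the larger one wins. Writing a + (b /* right operand */) instead attaches the comment to b alone, which is the role the rule gives to parentheses.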

For this to work, it’s important that the heading of a method be a comment-able syntactic element. For example,

Here a comment that describes the behavior of the method as a whole is attached to the method header. Note that requires clauses and types are other forms of annotation; these too must be unambiguously attached to a specific syntactic element if they are to reach their full potential.

/* and */ as comment delimiters have the advantage that they are really ugly. Whatever symbol is used to represent a comment in the textual representation of a program won’t be available as an operator symbol, so it makes sense to choose an ugly one.

For software engineering reasons, Grace will also support comments to end-of-line, probably introduced by the // sequence familiar from C++ and Java.
Many (but not all) of the above examples also work with // comments:

Metadata

Postmodern programs need more than just human-readable comments; in fact, human-readable comments can be seen as a subset of a wide universe of arbitrary metadata that can be attached to program elements. Experience with Java, C# (and Scala) has shown the advantage of an extensible and flexible metadata facility in a programming language. Given that a key design goal of Grace is to support extensions via libraries, many library writers will need to interpret metadata attached to their client programs.
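A minimal sketch of this idea, in Python rather than Grace: a comment is stored as one metadata entry among others, and a library reads the metadata attached to client code. The metadata decorator, the __metadata__ attribute, and the describe function are all hypothetical names chosen for this illustration; they say nothing about how Grace would do it.

```python
def metadata(**entries):
    """Attach arbitrary key/value metadata to a program element (here, a function)."""
    def attach(element):
        merged = dict(getattr(element, "__metadata__", {}))
        merged.update(entries)
        element.__metadata__ = merged
        return element
    return attach

def metadata_of(element):
    """Return a copy of the metadata attached to an element (empty if none)."""
    return dict(getattr(element, "__metadata__", {}))

# Client code: the human-readable comment is just one kind of metadata.
@metadata(comment="Returns the larger of two lengths.", unit="metres")
def longest(a, b):
    return a if a >= b else b

# Library code: interprets whatever metadata the client attached.
def describe(element):
    meta = metadata_of(element)
    extras = ", ".join(
        f"{key}={value!r}" for key, value in sorted(meta.items()) if key != "comment"
    )
    summary = f"{element.__name__}: {meta.get('comment', '')}"
    return summary + (f" [{extras}]" if extras else "")
```

Here describe(longest) yields "longest: Returns the larger of two lengths. [unit='metres']", and the decorated function still behaves exactly as before.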

Grace will have a metadata facility based on a generalisation of its support for comments. We’re not yet sure how this will work.
Most likely it will follow C# in having multiple delimiters rather than Java’s @ symbol: say /: :/ and //:. The problem is that this might look too horrible for general use.

Perhaps we could permit more direct metadata in particular places in programs:

  • wherever we’re writing a type
  • at the “top level??”

So e.g. rather than

or something, one could write

Other issues

For both comments and (especially) metadata, we need to consider the difference between:

  • annotating a method
  • annotating a return type of a method
  • annotating the receiver of the method.

The catch is that with our Pascal

syntax, how can we distinguish these cases?


References

[1] Michael L. Van De Vanter, “Preserving the Documentary Structure of Source Code in Language-Based Transformation Tools,” First IEEE International Workshop on Source Code Analysis and Manipulation (SCAM), p. 133, 2001. http://research.sun.com/jackpot/COM.sun.mlvdv.doc.scam_nov01.paper_pdf.pdf

3 thoughts on “Comments”

  1. In Thread, the query language I’m working on, comments are syntactically an operation, like projection/filtering/sorting/grouping. Every operation takes a set of nodes, and returns a set of nodes, so the comment operation takes a string argument, but just returns the same set of nodes passed to it. Like this:

    Album/Year|||This sets up groups of albums by year

    I.e., as if in Ruby comments were a “comment” method, so you might do:

    x = 3.comment(“this is a magic number arrived at through experimentation”)

  2. Don’t forget another common use of comments: to temporarily remove part of the program as a debugging triage mechanism. This has always been a completely different use of the same mechanism so there’s no reason a language couldn’t provide a distinct way to “comment out unparseable text”, but notice here you really don’t want to require the commented-out stuff to have to parse — that’s the whole point!

    In the modality of “commenting out” you really want comments to nest. But again, this doesn’t have to be “comments” — it can be something else. In 40-year-old terms, “#if 0 … #endif”.

  3. Perhaps Dan’s use case should be accommodated by a different, but related, syntax mechanism. Maybe, in addition to metadata-style comments, there could be “disabled regions” or somesuch.

    A long time ago I used a version of LabWindows that supported disabling blocks of code with the mouse. The disabled code was grayed out and ignored by the compiler. That way programmers could make an explicit distinction between these two kinds of comments. I think that feature would have merit in an educational context.
