The Avail Programming Language

string ≝ <character…|>

type lattice focused on strings

A string represents text without respect to style or encoding. Naturally, the content of a string consists of characters arrayed in a fixed order. Just as the words east and teas are different because the orderings of their letters are different, so the strings "east" and "teas" are different because the orderings of their characters are different.

The type string is synonymous with <character…|>. It is the tuple type whose leading types are the empty tuple (<>), whose default type is character, and whose cardinality restriction is whole number. So, a string is merely a tuple whose elements are characters exclusively.

Like integer and double literals, string literals are understood directly by the Avail compiler. A string literal begins with a quotation mark " (U+0022), continues with arbitrary Unicode characters optionally interleaved with escape sequences, and ends with a quotation mark. The string literal metacharacters are quotation mark and reverse solidus \ (U+005C) — also called backslash. A reverse solidus alters the meaning of the next character, called the escaped character:

  • When the escaped character is reverse solidus or quotation mark, then its special meaning as a metacharacter is disabled. So the string "\\" has a size of 1 and its only character is a reverse solidus, and the string "\"Hello, Alice, this is Bob.\"" has a size of 28 and its first and last characters are quotation marks.
  • When the escaped character is t (U+0074), then the intended character is character tabulation (U+0009), also called horizontal tab. So the string "\tIndented" has a size of 9 and its first character is character tabulation.
  • When the escaped character is n (U+006E), then the intended character is line feed (U+000A). So the string "Hello, world!\n" has a size of 14 and its last character is a line feed.
  • When the escaped character is r (U+0072), then the intended character is carriage return (U+000D). So the string "Hello, world!\r" has a size of 14 and its last character is a carriage return.
  • When the escaped character is left parenthesis ( (U+0028), then the compiler expects a sequence of hexadecimal numerals separated by commas , (U+002C) and finally a right parenthesis ) (U+0029). Each hexadecimal numeral must denote an integral value from 0 to 10FFFF16 (111411110); the case of the hexadecimal digits A through F does not matter. Each number represents the character associated with the Unicode code point of the same value. The construct as a whole then denotes a sequence of characters. So "\(66, 6E, 6F, 72, 64, 21)!" is the same string as "fnord!!" Because the Avail compiler is able to handle Unicode characters natively, there is scarcely need to use this facility outside of embedding invisible characters, such as control characters or special whitespace, into a string literal. In Avail, "はじめまして。「スミス・トッド」と申します。" is a perfectly reasonable string literal — at least for the author!
  • When the escaped character is line feed (U+000A), then the compiler elides the line feed altogether. This permits a string literal to span multiple lines without forcing its content to span multiple lines. Note that an occurrence of carriage return (U+000D) after the reverse solidus, or even the combination carriage return+line feed, is treated similarly; it is stripped from the resulting string.
    s ::= "Hello, \ world!\n"; Assert: s = "Hello, world!\n";
  • When the escaped character is vertical line | (U+007C), then the compiler discards all whitespace from the beginning of the same line up to the reverse solidus. This region is not permitted to contain characters other than whitespace, in fact. This feature permits nice formatting of multi-line string literals within source code, and is particularly powerful in conjunction with escaping of line feeds. Across several lines, the combination \| forms a visual wall that the string's content resides to the right of:
    /* Obviously some liberties have been taken here… */ If Abraham Lincoln is rallying the troops against vampires then [ gettysburgChangeOfAddress ::= "\ \|Four score and seven years ago our fathers brought forth on this \ \|continent a new nation, conceived in liberty, and dedicated to \ \|the proposition that all men are created equal. \| \|Now we are engaged in a great civil war, testing whether that \ \|nation, or any nation, so conceived and so dedicated, can long \ \|endure. We are met on a great battle-field of that war. We have \ \|come to dedicate a portion of that field, as a final resting \ \|place for those who here gave their lives that that nation might \ \|live. It is altogether fitting and proper that we should do this. \| \|But, in a larger sense, we can not dedicate, we can not \ \|consecrate, we can not hallow this ground. The brave men, living \ \|and dead, who struggled here, have consecrated it, far above our \ \|poor power to add or detract. The world will little note, nor \ \|long remember what we say here, but it can never forget what \ \|they did here. It is for us the living, rather, to be dedicated \ \|here to the unfinished work which they who fought here have thus \ \|far so nobly advanced. It is rather for us to be here dedicated \ \|to the great task remaining before us—that from these honored \ \|dead we take increased devotion to that cause for which they \ \|gave the last full measure of devotion—that we here highly \ \|resolve that these dead shall not have died in vain—that this \ \|nation, under God, shall have a new birth of freedom—and that \ \|government of the people, by the people, for the people, shall \ \|not perish from the earth. \| \|So let's destroy those friggin' vampires!\n"; Print: gettysburgChangeOfAddress; ];

The empty string is written as "". It is the same value as the empty tuple, <>.

tuple | Return to Type System | set