YML – Why a Markup Language?!

YML 2.7.6 of Thu 25 May 2023 – Copyleft 2007-2023, Volker Birk

YML 2 is a smart template language and DSL concept. This guide will give you a feeling for what it is worth.

Creating a Wiki-like language for writing documentation

Let's say we want to define a small Wiki system that translates a Wiki-like language into HTML. This guide is written in such a language using YML 2. I call it YHTML. You can view the source code of what you're reading now. It's about writing web pages like this:

page "Hello, world" {
    p   >>
        Hello, world! I can link here, say:
        ¬http://en.wikipedia.org to Wikipedia¬
        >>

    p   >>
        This is ƒemphasized. And this is «code».
        >>
}

Prerequisite: knowing how HTML works.

How does that work?

YML 2 is a template language. That means you can define recursive templates of what is to be generated. This is the code; each part is explained in the sections that follow:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

decl pageContent alias body {
    a name="top";
    include heading.en.yhtml2;
    div id="entries"
        content;
};

decl page(*title, lang="en", xml:lang="en", xmlns="http://www.w3.org/1999/xhtml")
    alias html {
    head {
        title *title;
        meta http-equiv="Content-Type", content="text/html;charset=UTF-8";
        link rel="stylesheet", type="text/css", href="format.css";
    }

    pageContent
        content;
};

define operator "¬\s*(.*?)\s+(.*?)\s*¬" as a href="%1" > %2
define operator "«(.*?)»" as code > %1
define operator "ƒ(\S+)" as em > %1

Details, please!

Starting with XHTML headers

Because HTML headers are boring and annoying, I copy them from document to document. And in the end, they landed here ;-) If you already have things in angle brackets, you can add them to your YML 2 document “as is”, because everything that starts with an opening angle bracket is passed through unchanged by the YML 2 toolchain. So our document starts with:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Defining the document structure

A web page usually has a structure: it has a specific title and content. Besides that, technical things have to be encoded. A web page in XHTML is XML text, with xmlns set to the right namespace. This is how we do that in YML 2:

decl page(*title, lang="en", xml:lang="en", xmlns="http://www.w3.org/1999/xhtml")
    alias html {

First we declare the page function. It's aliased to html, so it will generate an html tag, not a page tag.

The first parameter, *title, is a placeholder for the title of the document. Whatever we pass here later is repeated at every place where we put *title into our template. This technique is called Pointers.

The other three attributes (lang, xml:lang and xmlns) have Default Values, so they're generated each time the page function is called.
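To make the idea of a pointer plus default values more tangible, here is a rough Python analogy. It is only a sketch of the concept, not how YML 2 is implemented; the function name page and its keyword arguments are hypothetical stand-ins for the declaration above.

```python
from xml.sax.saxutils import escape, quoteattr


def page(title, content, lang="en", xml_lang="en",
         xmlns="http://www.w3.org/1999/xhtml"):
    """Hypothetical Python analogue of the page declaration (illustration only)."""
    # The attributes with default values are emitted on every call.
    attrs = (f"lang={quoteattr(lang)} xml:lang={quoteattr(xml_lang)} "
             f"xmlns={quoteattr(xmlns)}")
    # title plays the role of *title: it appears wherever the template uses it.
    head = f"<head><title>{escape(title)}</title></head>"
    return f"<html {attrs}>{head}{content}</html>"


html = page("Hello, world", "<body/>")
print(html)
```

Calling page("Hallo", "<body/>", lang="de") overrides one default and keeps the others, which is the behaviour the Default Values give us in the template.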

The Document content

The document content is what is in the { ... } block:

{
    head {
        title *title;
        meta http-equiv="Content-Type", content="text/html;charset=UTF-8";
        link rel="stylesheet", type="text/css", href="format.css";
    }

    pageContent
        content;
};

This reflects that each HTML document has a head and a body section. Of course, we insert the *title pointer value into the title tag. Then come some metadata and a link to a nice CSS ;-)

For the body section, we have a little helper function, pageContent. The function named content is a placeholder: it marks where the content of the page will be placed when our page function is called.

Generating the body with the pageContent function

The pageContent function is used for generating the body with standard elements; therefore, it's aliased to body:

decl pageContent alias body {
    a name="top";
    include heading.en.yhtml2;
    div id="entries"
        content;
};

It first sets an HTML anchor, so links can reference the top of the page:

a name="top";

Then a file with heading and navigation (the menu to the right on the page here) is included:

include heading.en.yhtml2;

Finally, the page content is put in, surrounded by a div named entries, so it can be referenced later, too:

div id="entries"
    content;

If you have a look at the included heading.en.yhtml2 file, you'll see the static head and navigation sections hard-coded. The CSS file brings everything to the right place.
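The nesting of pageContent and the content placeholder can be pictured as a function that wraps whatever it is given. The following Python sketch is an analogy only: page_content and the heading parameter are hypothetical names, the included file is replaced by a comment, and the exact markup YML 2 emits may differ.

```python
def page_content(content, heading="<!-- heading.en.yhtml2 goes here -->"):
    """Hypothetical Python analogue of the pageContent declaration."""
    return (
        "<body>"
        '<a name="top"/>'                      # anchor for links to the top
        f"{heading}"                           # stands in for the included file
        f'<div id="entries">{content}</div>'   # the content placeholder
        "</body>"
    )


body = page_content("<p>Hello, world!</p>")
print(body)
```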

Defining some operators for the Wiki-like language

The trick with a Wiki-like language is that one can write plain text and add structural things to it, e.g. links.

So we need language constructs that let us add structure. In YML 2 these are called User defined in-text Operators:

define operator "¬\s*(.*?)\s+(.*?)\s*¬" as a href="%1" > %2
define operator "«(.*?)»" as code > %1
define operator "ƒ(\S+)" as em > %1

They look somewhat disturbing if you're not familiar with Regex, so I will explain.

First we define a link:

define operator "¬\s*(.*?)\s+(.*?)\s*¬" as a href="%1" > %2

The keyword define operator starts the definition. Then there is the Regex:

"¬\s*(.*?)\s+(.*?)\s*¬"

I decided I want to have the special character ¬ surrounding each link like this: ¬http://en.wikipedia.org go to Wikipedia¬. This is just like what MediaWiki does with brackets; here the same would read: [http://en.wikipedia.org go to Wikipedia].

I like using such special characters. This is because I'm using a Mac and GNU/Linux. If you're using Windows, I really can recommend AutoHotkey. It's a great piece of software to expand the keyboard capabilities of Windows (and much more).

How does this Regex stuff work? It's a pattern matching language that consumes characters with each command. We want the following: the first thing between the ¬ markers shall be the link target URL. Everything else shall be the displayed name of the link. To get there, we first consume whitespace with \s*. The \s means “any whitespace character” (like blank, newline, etc.). The asterisk * means “some of them or none”, so this consumes all whitespace that is there (and gives no error if there is none).
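If you want to try this out, Python's re module is handy. The sketch below assumes that the Regex dialect behaves like Python's, and only demonstrates the \s* part on two made-up inputs.

```python
import re

ws = re.compile(r"\s*")

# match() starts at the beginning of the string and returns what was consumed.
print(repr(ws.match("  \n\tlink").group()))  # '  \n\t'
print(repr(ws.match("link").group()))        # '' (no whitespace, still a match)
```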

Second, we open a group with parentheses ( ). We can later reference this first group as %1 when substituting.

Inside this group, we say that we want anything, no matter what it is. For this we use a dot ., which means “any character”, followed by asterisk question mark *?, which is the code for “consume as little as possible, only as far as needed for the next code in the Regex to match”. The total (.*?) consumes the target URL (without checking it).
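The difference between *? and a plain * is easiest to see side by side. This is a small Python sketch (again assuming Python-style Regex); the sample text is invented. With *? each group stops at the first closing marker it can, while a plain * runs on to the last one.

```python
import re

text = "¬a b¬ and ¬c d¬"

lazy = re.findall(r"¬(.*?)¬", text)    # stops at the nearest closing ¬
greedy = re.findall(r"¬(.*)¬", text)   # runs on to the last closing ¬

print(lazy)    # ['a b', 'c d']
print(greedy)  # ['a b¬ and ¬c d']
```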

Then we consume some whitespace again, this time with \s+. Using a plus + instead of an asterisk * or asterisk question mark *? means: there has to be at least one whitespace character. And we want whitespace between the URL and the name, right? ;-)
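A quick Python check shows what the plus changes. The patterns and inputs here are made up for illustration; the words url and name are just literal text.

```python
import re

optional = re.compile(r"url\s*name")   # whitespace may be missing
required = re.compile(r"url\s+name")   # at least one whitespace character

print(bool(optional.fullmatch("urlname")))      # True
print(bool(required.fullmatch("urlname")))      # False
print(bool(required.fullmatch("url \n name")))  # True
```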

Now we're consuming the second group. We're consuming whatever is there – it's the name of the link. We're using another (.*?) group for it. It will be group 2, and we can reference it with this in the substitution: %2.

Finally, we consume redundant whitespace with \s*, and our Regex is closed by another ¬ character. That makes the total Regex:

"¬\s*(.*?)\s+(.*?)\s*¬"

So what can we do with it? What we want are <a href="...">...</a> tags. That means we want to call a function like this: a href="..." > ...

As href we want the result of group 1, because this is the link target. After the Quote operator > we want the name of the link, which is the result of group 2. We can write that literally:

a href="%1" > %2

Our first User defined in-text Operator is ready :-)
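To see the two groups and the substitution at work outside of YML 2, here is a Python sketch. It assumes Python-style Regex and uses \1 and \2 where the operator definition uses %1 and %2. The helper expand_links is hypothetical, and the HTML string is only my approximation of what the a function generates.

```python
import re

LINK = re.compile(r"¬\s*(.*?)\s+(.*?)\s*¬")


def expand_links(text):
    """Replace every ¬url name¬ with an HTML link (illustration only)."""
    return LINK.sub(r'<a href="\1">\2</a>', text)


m = LINK.search("say: ¬http://en.wikipedia.org to Wikipedia¬")
print(m.group(1))  # http://en.wikipedia.org
print(m.group(2))  # to Wikipedia

print(expand_links("see ¬http://en.wikipedia.org to Wikipedia¬ now"))
# see <a href="http://en.wikipedia.org">to Wikipedia</a> now

# Without whitespace between URL and name there is no match at all.
print(LINK.search("¬nospace¬"))  # None
```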

Maybe you would prefer using brackets. So just do it ;-) Change the Regex to this, and you can use brackets for links like in MediaWiki. We have to escape the brackets [ ] with a backslash \, because brackets are also codes in Regex, and we don't want the code, we really want brackets:

"\[\s*(.*?)\s+(.*?)\s*\]"

The other two operators should now be easy to understand:

define operator "«(.*?)»" as code > %1
define operator "ƒ(\S+)" as em > %1

A tip: the code \S, with an upper case letter S, means that only non-whitespace characters shall be consumed.
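The same kind of experiment works for these two operators. This Python sketch assumes Python-style Regex, and the expand helper and the HTML strings are illustrative only. One detail worth noticing: \S+ consumes every non-whitespace character, so punctuation directly attached to the word ends up inside the group as well.

```python
import re

CODE = re.compile(r"«(.*?)»")
EM = re.compile(r"ƒ(\S+)")


def expand(text):
    """Apply both operators to a piece of text (illustration only)."""
    text = CODE.sub(r"<code>\1</code>", text)
    return EM.sub(r"<em>\1</em>", text)


sample = "This is ƒemphasized. And this is «code»."
print(CODE.findall(sample))  # ['code']
print(EM.findall(sample))    # ['emphasized.']  (the period is not whitespace)
print(expand(sample))
# This is <em>emphasized.</em> And this is <code>code</code>.
```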

Using it

How do we write a new web page with our templates? Here's a hello world. We can use Block Quotes for entering text, together with our new self-defined operators:

include homepage.en.yhtml2

page "Hello, world" {
    p   >>
        Hello, world! I can link here, say:
        ¬http://en.wikipedia.org to Wikipedia¬
        >>

    p   >>
        This is ƒemphasized. And this is «code».
        >>
}

The result you can see here:
