#+title: Stem #+author: Preston Pan #+description: My own concatenative programming language #+html_head: #+html_head: #+language: en #+OPTIONS: broken-links:t * Introduction Stem is an interpreted concatenative programming language, which is general purpose and features a foreign language interface (FLI), as well as metaprogramming capabilities. Here, I document the syntax a general guide of programming in the language, as well as some of the process of making it. I will also cover adding new functions and objects from the foreign language interface by writing C libraries. #+CAPTION: The stem programming language, logo designed by Andrei Sova #+ATTR_HTML: :height 400px [[file:img/stem.png]] If you don't know what any of that means, that is okay. I will go over the programming language as if this is your first programming language, as stem is one of the most simple programming languages that is feasible for practical use. For information on how to install stem on MacOS or Linux, see [[https://github.com/ret2pop/stem][the github page]]. Documentation on the C API is available on the [[https://stemdoc.nullring.xyz][Doxygen generated page]]. * Language Design In stem, all information is stored on what's called /the stack/, and there are things that you can put on the stack. There is also another type of thing you can do in stem, but that'll have to wait until later. For now, to simplify the explanation, we'll say that /everything that you can do in the programming language stores some information/, and where that information is stored is this thing called /the stack/. With that being said, we will have to define the /what you can do/ part and the /stack/ part in order for you to be able to program in this language. ** Things that can be Stored in the Stack We call things that can be stored on the stack /literals/. They can be in four different forms, of which two are immediately easy to understand: /strings/, or basically any english phrase or list of characters that you want to store, and /numbers/. Strings look like this: #+begin_src stem "this is a string!" "1234678876" "this too is a string" "asdfghjkl" #+end_src and numbers look like this: #+begin_src stem 50 3.1415 1000000 #+end_src The third type of literal is called a /quote/. You can imagine a quote as an ordered list of other literals: #+begin_src stem [ "hello" 50 3.14 [ "inside another quote" ] ] #+end_src between the '[' and the ']' character, you can see a list of four different literals. Because a quote is also another type of literal, quotes can store other quotes. The /fourth/ type of literal we will talk about later, as it is not /just/ a literal. ** The Stack Now it is time to talk about the stack. The stack is what stores the literals, of course as we know, but /how/ does it store the literals, and for what purpose? It stores the literals like a regular stack of objects, such as a stack of plates, would in real life. When something is put on the stack, it is on the /top/ of the stack. When another object is then put on the stack, /that/ object becomes the new top of the stack, and the previous object is under that object. This makes a natural ordering of what is considered /above/ something else on the stack, just like a stack of plates each with some information on each of them would in real life. Note that if you had a real life arrangement of these plates, you would be able to read the top piece of information on the stack but no others, until you took that plate off the stack. Then, another plate would be on the top of the stack, and you would be able to read that plate. This is very much like how stem works, but /how/ do you read information from the stack, when we've only described how to /put things/ on the stack? This is where we introduce the full language: a language of not just literals, but /words/ with meaning. ** Words /Words/ are the last type of thing that can be put on the stack, but they are special in that they can also /do things/. Thus far, none of the things we've talked about can actually add numbers, for example, only store them. /Words/ add meaning to the language, and make it not just a place to store data, but rather, /do things/ with the data. Here are some examples of some words: #+begin_src stem dsc myword myword123 hello_this_is_word IAMAWORDTOO #+end_src But most of these words will actually be put on the stack as well rather than do something. In order for them to do something rather than to be interpreted as data, we must /define/ them. Stem comes with a set of predefined words that you can combine in order to make new definitions which are defined in terms of a combination of the predefined words, just like in the english language. Next, we'll go over some predefined words. ** Predefined Words To follow along, I suggest after following the instructions on the [[https://github.com/ret2pop/stem][github page]] you go into the stem project folder, find the ~stemlib~ folder, go into it with ~cd stemlib~, and then run ~stem repl.stem~. Here you will encounter what is known as the /REPL/, or the read, eval, print loop. What it is called doesn't matter. Just know that it runs stem code interactively. A basic word that prints out the top thing on the stack and removes it is simply a period: #+begin_src stem :exports both # this is a comment, used to explain code but doesn't affect the outcome at all. # comments start with a '#'. "hello world\n" . #+end_src #+RESULTS: : hello world where the ~\n~ just signifies a newline character, basically just telling it to not print the "hello world" on the same line as the next thing printed. You can print the entire stack like so: #+begin_src stem :exports both 1 2 3 [ "some quote" ] "string!" ? #+end_src #+RESULTS: : 1 : 2 : 3 : Q: [ : some quote] : string! Which prints the entire stack, where the bottom-most thing is the top thing on the stack. There are also some basic math operations you can do: #+begin_src stem :exports both 3 4 + . 3 4 - . 3 4 * . 3.0 4 / . #+end_src #+RESULTS: : 7 : -1 : 12 : 0.750000 One can independently verify that these results are accurate. These basic math operations take /two/ things off of the stack, does the operation on those two numbers, and then puts the new value back on the stack, deleting the old values. Then, the period character prints the value and pops them off the stack. #+CAPTION: Demonstration of the stack effect of the plus word #+ATTR_HTML: :height 100px [[file:./img/stack.png]] There are predefined words for other mathematical operations too, all listed here: #+begin_src stem :exports both 0.0 sin . 0.0 cos . 1.0 exp . 2.5 floor . 2.5 ceil . 2.71828 ln . #+end_src #+RESULTS: : 0.000000 : 1.000000 : 2.718282 : 2.000000 : 3.000000 : 0.999999 These operations I will assume you are familiar with, and one can independently verify their (approximate) validity. There are also comparison and logical operations: #+begin_src stem :exports both "hi" "hi" = . 4 3 = . 3 4 < . 3 4 > . 3 4 <= . 3 4 >= . 1 1 and . 1 0 and . 0 1 or . 0 0 or . #+end_src #+RESULTS: #+begin_example 1 0 1 0 1 0 1 0 1 0 #+end_example Which compare the first number to the second number with a certain operation like "greater than or equals to". The result is a zero or one, indicating that the statement is either /true/ or /false/, with 1 being true. With these statements, you can make decisions: #+begin_src stem :exports both 3 4 < [ "3 < 4" . ] [ "3 >= 4" . ] if #+end_src #+RESULTS: : 3 < 4 where the word ~if~ just checks if the third thing from the top of the stack (the first thing you write) is a zero or a one, and if it is, then execute whatever's inside the first quote, otherwise execute the second quote. Note that this wording is a little bit confusing because the /first thing you write/ is also the /last thing on the stack/ because adding new things to the stack puts the first thing /below/ the second. #+caption: Stack effect of if word [[file:./img/stack2.png]] Now, also observe that inside the quotes we are storing valid code. This will become important later on as we introduce the concept of /metaprogramming/. First, though, we have to introduce a couple more important predefined words. #+begin_src stem :exports both [ "hello world!\n" . ] eval 3 quote . [ 1 2 ] [ 3 4 ] compose . 1 [ 2 3 ] curry . #+end_src #+RESULTS: #+begin_example hello world! Q: [ 3 ] Q: [ 1 2 3 4 ] Q: [ 1 2 3 ] #+end_example ~eval~ evaluates the top of the stack as if it were a piece of code; ~quote~ puts the top of the stack in a quote and then pushes it back to the top of the stack; ~compose~ combines two quotes into one; and ~curry~ puts a value in the front of the quote. Note that some of these operations work for strings as well: #+begin_src stem :exports both "hello " "world\n" compose . #+end_src #+RESULTS: : hello world And some other words that we use to operate on quotes and strings are here: #+begin_src stem :exports both [ 1 2 3 4 ] 1 cut . . 0 [ 5 6 7 8 ] vat . "hello\nworld\n" 6 cut . . 1 "asdfghjkl;" vat . #+end_src #+RESULTS: #+begin_example Q: [ 3 4 ] Q: [ 1 2 ] 5 world hello s #+end_example ~cut~ cuts a string or quote into two, where the number in front tells ~cut~ /where/ to cut. Note that normally in programming numbering starts at 0, so 1 is actually the /second/ element of the quote. ~vat~ gets the nth element, where n is the /first/ value passed into ~vat~. It also returns the quote or string on the stack back after, with the value at that index on top. There are two more words that we have to define: #+begin_src stem :exports both 1 2 swap . . 1 2 . . "hello\n" dup . . 1 2 5 [ + ] dip . . #+end_src #+RESULTS: : 1 : 2 : 2 : 1 : hello : hello : 5 : 3 ~swap~ just swaps the top two numbers on the stack, ~dup~ just duplicates the top of the stack, and ~dip~ is just ~eval~ except it does the operation one layer below. In this example, it adds 1 and 2 instead of 2 and 5, thus you see a 5 and a 3 printed instead. Note that there are more words, but we won't need them for now. Now, we are ready to investigate how to define words in terms of other words, or so-called /compound words/. ** Compound Words Compound words, or words made up of other words (and literals), are created with yet /another/ word, ~def~. ~def~ takes an undefined word (all undefined words are just put on the stack) and a quote, and then from there on the word in question is defined as that quote, where whenever stem sees that word in the future, it immediately ~eval~'s that quote. ~undef~ undefines a word, which is self explanatory. #+begin_src stem :exports both hello [ "hello world\n" . ] def hello \hello undef hello . #+end_src #+RESULTS: : hello world : W: hello In order to put words on the stack instead of calling them, just escape them: #+begin_src stem :exports both \def . #+end_src #+RESULTS: : W: def Now, so far, we have discussed making decisions with ~if~, doing various operations and evaluating quotes in a multitude of ways. What we /haven't/ covered is executing the same code some amount of times, or ~looping~. In this language, all looping is done by defining words that call themselves, or what's called /recursion/. ** Recursion We can loop in stem by defining a word that calls itself: #+begin_src stem :exports both loop-forever [ "hello world\n" . loop-forever ] def #+end_src Now, we /don't actually/ want to run this because it will just keep on printing hello world forever, without stopping, and we might want to constrain how much it loops. We can do this by only looping under some condition: #+begin_src stem :exports both loop-some [ dup 0 <= [ ] [ dup . 1 - loop-some ] if ] def 4 loop-some #+end_src #+RESULTS: : 4 : 3 : 2 : 1 and we can see that it actually loops. You can modify the code to do more complex looping, and in the standard library (the ~stemlib~ folder), there is a ~loop~ function that loops any code any amount of times, written by Matthew Hinton. * Metaprogramming So what is this talk of metaprogramming? To put it simply, metaprogramming is a method by which one can autonomously build code and then evaluate it, thus allowing oneself to talk about code, or make decisions to make different code based on some inputs, before running the code. So how might we use metaprogramming? In the standard library, we define a couple of words ~dupd~, ~dupt~: #+begin_src stem :exports both dupd [ [ dup ] dip ] def dupt [ [ [ dup ] dip ] dip ] def 3 2 dupd ? #+end_src #+RESULTS: : 3 : 3 : 2 which duplicates the second and third value on the stack respectively. However, we might want to define ~dupn~ for any n, which takes in an integer and computes ~dup~ ~n~ values down. We can do that with metaprogramming, or less abstractly, we can do it by repeatedly putting quotes inside quotes, and then we can ~eval~ the resultant quote. Here is the code that programs ~dipn~ in its entirety, without any sugarcoating: #+begin_src stem :exports both # dsc simply discards the top object on the stack dsc2 [ dsc dsc ] def dupd [ [ dup ] dip ] def over [ dupd swap ] def dup2 [ over over ] def dip2 [ swap [ dip ] dip ] def loop [ dup2 [ swap [ ] if ] dip2 dup [ 1 - loop ] [ dsc2 ] if ] def dipn [ [ [ dip ] curry ] swap loop eval ] def dupn [ [ dup ] swap dipn ] def # this is the code that does stuff 1 2 3 4 5 6 7 8 3 dupn ? #+end_src #+RESULTS: : 1 : 2 : 3 : 4 : 5 : 5 : 6 : 7 : 8 As you can see, in the early days of programming in this language, you must use quite a lot of words in order to talk about even basic concepts. As the language evolves, however, it becomes ever more easy to "talk" about abstract subjects in it. What this piece of code does is it adds ~dip~ to the right of the previous quote, nesting quotes like russian dolls over and over again until it becomes suitable to call ~eval~. Thus, we have built up a piece of code in the language and then automatically executed it! Note that because ~def~ is also a word, you can automatically define words as well, which is a powerful concept. * C FLI The C "FLI" (I don't know what it is really) is a system by which custom objects and functions can be added into the language. A library which implements the bindings for this language have to implement the following functions: #+begin_src C void add_funcs(); void add_objs(); #+end_src and has to include the ~stem.h~ file as a library. The C file has a couple of global variables that it must recognize with extern: #+begin_src C extern array_t *STACK; extern array_t *EVAL_STACK; extern ht_t *WORD_TABLE; extern parser_t *PARSER; extern ht_t *FLIT; extern ht_t *OBJ_TABLE; #+end_src not all of these have to be used but it is a complete list of global variables. You may use ~add_func~ and ~add_obj~ (defined in stem.h) to add custom functions and custom objects. As long as an object has a specified way to free, deep copy, and create itself, it can be used in a memory safe manner. The builtin word ~clib~ loads a library dynamically and calls both ~add_func~ and ~add_obj~, adding all the functions and objects you've defined to the runtime environment. Note that when making custom functions, they must be in the form ~void my_func(void *value)~, where ~value~ contains the word that is used to call the function. When you are implementing functions, make sure to keep only one copy of a pointer on the stack at all times, otherwise it is prone to segfaulting. Also, for examples on how builtins may be implemented, see ~builtins.c~. Again, you can view the API at the [[https://stemdoc.nullring.xyz][webpage generated by doxygen]].