HackerFoo Writes Stuff Here

Startup Memory Allocation in PoprC

Dustin DeWeese — Sat, 15 Aug 2020 00:00:00 UT

Arbitrary limits are bad

I’ve written PoprC in an embedded style of C, where malloc() is avoided. This has a lot of benefits: no dynamic allocation overhead, repeatable addresses which can be used as identifiers, and the ability to use watchpoints when debugging, as well as being able to easily iterate over arrays.

It does have one large drawback, though: because memory is statically allocated, the sizes cannot change. I generally try to set the limits around 2x a reasonable amount, but while this works okay for development, I know it’s not acceptable for end use.

Startup “static” allocation

To address this, I implemented an easy way to perform “static” allocation on startup (in static_alloc.c). The idea is to perform all allocations with arbitrary limits on startup, which provides an opportunity to adjust the sizes based on flags or a configuration file.

Doing this manually would be annoying, so I used the code generation system that I’ve used elsewhere to collect macros of the form STATIC_ALLOC(name, type, default_size). I then translated allocations of the form type name[size] into that macro throughout PoprC’s source. In addition, there are extended forms of the macro to specify alignment (STATIC_ALLOC_ALIGNED) and sizes that depend on another size (STATIC_ALLOC_DEPENDENT).
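As a rough illustration of the idea (not the actual static_alloc.c), the sketch below uses an X-macro list in place of the code generation step. The names SKETCH_ALLOCS, sketch_set_size, and sketch_alloc_init are hypothetical, as are the two example allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation list: name, element type, default element count. */
#define SKETCH_ALLOCS(X)      \
  X(sketch_cells, long, 1024) \
  X(sketch_names, char, 256)

/* One pointer and one adjustable size per entry, replacing `type name[size]`. */
#define SKETCH_DECLARE(name, type, n) \
  type *name;                         \
  size_t name##_size = (n);
SKETCH_ALLOCS(SKETCH_DECLARE)

/* Override a size by name before startup (e.g. from a flag); false if unknown. */
bool sketch_set_size(const char *which, size_t n) {
#define SKETCH_SET(name, type, dflt) \
  if(strcmp(which, #name) == 0) { name##_size = n; return true; }
  SKETCH_ALLOCS(SKETCH_SET)
#undef SKETCH_SET
  return false;
}

/* Perform every allocation once at startup; false if any allocation failed. */
bool sketch_alloc_init(void) {
  bool ok = true;
#define SKETCH_ALLOC(name, type, dflt)      \
  name = calloc(name##_size, sizeof(type)); \
  ok = ok && name != NULL;
  SKETCH_ALLOCS(SKETCH_ALLOC)
#undef SKETCH_ALLOC
  return ok;
}
```

Because the sizes are ordinary variables until sketch_alloc_init() runs, anything that parses flags or a configuration file only has to call sketch_set_size() beforehand.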

There are a few drawbacks to this system:

Sizes can’t depend on things not visible in static_alloc.c
Types must also be visible in that file.
Static allocations are in a single namespace, so collisions might be a problem.
Addresses are no longer static, so some things must be set up on initialization (e.g. in *_init() procedures.)
LLDB seems to have more difficulty with malloc’ed pointers for some reason.

All of these limitations were fairly easy to work with.

A nice added feature for debugging is that the compiler can identify pointers into the allocated regions, which covers most of the pointers I’m interested in during debugging. I can type pp (some pointer) in LLDB now to display a description of the form variable[offset] (see print_static_alloc()).
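The lookup behind a variable[offset] description can be sketched as a scan over a table of allocations. This is only an illustration of the technique; region_sketch_t and describe_pointer are hypothetical names, and the real print_static_alloc() may work differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one startup allocation. */
typedef struct {
  const char *name; /* variable name, e.g. "cells" */
  const void *base; /* start of the allocation */
  size_t elem_size; /* size of one element in bytes */
  size_t count;     /* number of elements */
} region_sketch_t;

/* Write "name[offset]" into out if p points into a region; false otherwise. */
bool describe_pointer(const region_sketch_t *regions, size_t n,
                      const void *p, char *out, size_t out_size) {
  uintptr_t addr = (uintptr_t)p;
  for(size_t i = 0; i < n; i++) {
    uintptr_t base = (uintptr_t)regions[i].base;
    uintptr_t end = base + regions[i].elem_size * regions[i].count;
    if(addr >= base && addr < end) {
      size_t offset = (addr - base) / regions[i].elem_size;
      snprintf(out, out_size, "%s[%zu]", regions[i].name, offset);
      return true;
    }
  }
  return false;
}
```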

Future work

Some allocations aren’t needed all of the time. I would like to implement a way to perform temporary allocations in an area allocated on startup (using static_alloc.c) which is large enough to hold the maximum temporary allocation. This would be like a statically allocated union where the fields can be declared anywhere.

I have yet to implement reading allocation sizes from a configuration file, so PoprC is just using the default sizes. After implementing this, it would be good to have assertion messages that indicate which size needs to be adjusted.

Error Highlighting in PoprC

Dustin DeWeese — Tue, 21 Jul 2020 00:00:00 UT

What is an error anyway…

I have refrained from implementing error messages for PoprC until now.

There are several reasons for this:

Failure is okay, and expected.

Other than trivial syntax errors, there is only one possible error in a Popr program - program non-determinism. Determinism is the property of having a unique successful execution path for any input, so a non-deterministic program has either no successful path, or multiple successful paths, for some input. While the program must be deterministic, it’s okay for any part to fail (e.g. division when the divisor is zero) as long as it is handled by the consumer of that computation. Every branch must fail except one, so failure is common and expected.
```
1 0 /     __ Division by zero
"one" 2 + __ Addition operates on integers
3 False ! __ Asserting False
```
These all fail, but appending True | to any one of these will result in True.
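For readers more at home in C, the idea of failure as an ordinary result that the consumer handles can be sketched as below. This is an analogy only, not how PoprC represents failure; result_sketch_t, div_sketch, and alt_sketch are hypothetical names, and 1 stands in for True.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A computation either succeeds with a value or fails. */
typedef struct { bool ok; long value; } result_sketch_t;

result_sketch_t succeed(long v) { return (result_sketch_t){ true, v }; }
result_sketch_t failure(void)   { return (result_sketch_t){ false, 0 }; }

/* Like `/`: fails on a zero divisor, and failure of an input propagates. */
result_sketch_t div_sketch(result_sketch_t a, result_sketch_t b) {
  if(!a.ok || !b.ok || b.value == 0) return failure();
  return succeed(a.value / b.value);
}

/* Like `|`: keep the alternatives that succeeded and return how many did.
   A deterministic program leaves exactly one. */
size_t alt_sketch(const result_sketch_t *alts, size_t n, result_sketch_t *out) {
  size_t kept = 0;
  for(size_t i = 0; i < n; i++) {
    if(alts[i].ok) out[kept++] = alts[i];
  }
  return kept;
}
```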

This means detecting true errors requires the entire program context, which is not always available. Automatically proving determinism is not implemented yet, and is not solvable in general.

Reduction is lossy, which can make locating errors in the source imprecise.

Location information is discarded during reduction, and yet the result might fail after reduction. The reduction graph is more expressive than the source language, so a failure might not correspond to a precise location in the original source code.

A simple example is that in 1 2 swap 3 + odd !, 2 is not involved in the error, but after reducing the addition, the location of the result must include the constants 1 and 3 from which the result (4) is derived.

Source locations take up space.

Popr uses a very compact internal representation (IR) graph. For a long time, I’ve been able to fit everything I’ve needed for the IR into 64 byte nodes, but there’s no space for another byte, much less another pointer.

Despite this, the previous behavior of omitting output for all errors is not an acceptable user experience, especially for new users. Just printing “Error!” isn’t acceptable, either. So we need a useful error message, if not a perfectly accurate one. It should be useful for catching simple errors without looking through the logs.

So I made some concessions:

While some failures are expected, the programmer rarely intends an expression that will always fail (such as 1 0 /.)
Imprecise locations are better than nothing.
Error reporting is worth increasing the IR node size, even though this space is “wasted” on a successful compilation.
If this ever becomes a concern, I can use a compile time flag to disable it.

How it works

All nodes are annotated with source locations (seg_t) at parse time. This requires an extra 16 bytes (+25%). Oh well.

union cell {
  uintptr_t c[10]; // <--- UP FROM 8
  struct {
    union {
      cell_t *alt;
      const char *word_name; /* entry */
    };
    union {
      cell_t *tmp;
      val_t tmp_val;
      const char *module_name; /* entry */
      char_class_t char_class; /* tok_list */
    };
    enum op op;
    seg_t src; // <--- SOURCE LOCATION
    union {
      uint8_t pos; /* see below */
      uint8_t arg_index; /* arg index (for dep vars) */
      uint8_t var_index; /* final index for vars in trace */
      priority_t priority; /* used in func_list() & delay_branch() */
    };
    refcount_t n;
    csize_t size;
    union {
      expr_t expr;
      value_t value;
      tok_list_t tok_list;
      entry_t entry;
      mem_t mem;
    };
  };
};

When a node is reduced, the source range in the context is expanded to include that node. If a node fails, the context is limited to just that node. This benefits from logic that prioritizes early failures, so that code is often not highlighted if it doesn’t contribute to the failure.

The final source location from the context is propagated to the result node.

// Reduce then split c->arg[n]
response reduce_arg(cell_t *c,
                csize_t n,
                context_t *ctx) {
  cell_t **ap = &c->expr.arg[n];
  response r = reduce(ap, ctx);
  if(r <= DELAY) {
    ctx->up->alt_set |= ctx->alt_set;
    ctx->up->text = seg_range(ctx->up->text, ctx->text); // <--- PROPAGATE SOURCE LOCATION ON REDUCTION
    split_arg(c, n, dup_alt);
  } else if(r == FAIL) {
    ctx->up->text = ctx->text; // <--- BLAME ONLY THIS REDUCTION ON FAILURE
  }
  return r;
}
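seg_range is not shown here. A minimal sketch of what such a function could look like, assuming a segment is a pointer plus a length (as the .s and .n fields used later suggest) and that both segments point into the same source buffer. seg_sketch_t and seg_range_sketch are hypothetical stand-ins, not PoprC's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of a source segment: start pointer and length. */
typedef struct { const char *s; size_t n; } seg_sketch_t;

/* Smallest segment covering both a and b; an empty segment (s == NULL)
   contributes nothing. Both must point into the same buffer. */
seg_sketch_t seg_range_sketch(seg_sketch_t a, seg_sketch_t b) {
  if(!a.s) return b;
  if(!b.s) return a;
  const char *start = a.s < b.s ? a.s : b.s;
  const char *a_end = a.s + a.n;
  const char *b_end = b.s + b.n;
  const char *end = a_end > b_end ? a_end : b_end;
  return (seg_sketch_t){ start, (size_t)(end - start) };
}
```

Note that a single contiguous range covering 1 and 3 in 1 2 swap 3 + also covers the text in between, which is one reason the reported locations are imprecise.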

On each failure, the location from the context is logged to a buffer.

// Reduce *cp with type t
response reduce(cell_t **cp, context_t *ctx) {
  cell_t *c = *cp;
  const char *module_name, *word_name;
  get_name(c, &module_name, &word_name); // debug
  assert_error(ctx->depth < MAX_CALL_DEPTH, "stack too deep %C", c);

  while(c) {
    assert_error(is_closure(c));
    c = *cp = fill_incomplete(c);
    stats.reduce_cnt++;
    ctx->text = c->src;
    op op = c->op;
    response r = op_call(op, cp, ctx);

    // prevent infinite loops when debugging
    assert_counter(LENGTH(cells));

    if(!*cp) {
      LOG(MARK("FAIL") ": %O %C (%s.%s) %L @abort",
          op, c, module_name, word_name, ctx->loc.raw);
      log_fail(ctx); // <--- LOCATION LOGGED HERE
    }
    c = *cp;
    if(r <= DELAY || (r == RETRY && ctx->retry)) {
      ctx->retry = false;
      return r;
    }
  }

  *cp = &fail_cell;
  return FAIL;
}

If reduction fails entirely, meaning that the expression will always fail, all locations from the failure log are flattened into a set of non-overlapping locations.

size_t get_flattened_error_ranges(seg_t src, pair_t *res) {
  if(!src.s || !src.n) return 0;

  uintptr_t
    l[fail_location_n],
    r[fail_location_n];

  // load offsets into l and r
  COUNTUP(i, fail_location_n) {
    l[i] = clamp(0, (intptr_t)src.n, fail_location[i].s - src.s);
    r[i] = clamp(0, src.n, l[i] + fail_location[i].n);
  }

  return flatten_ranges(l, r, res, fail_location_n);
}

void highlight_errors(seg_t src) {
  pair_t res[fail_location_n];
  size_t n = get_flattened_error_ranges(src, res);
  uintptr_t last = 0;
  COUNTUP(i, n) {
    printf("%.*s", (int)(res[i].first - last), src.s + last);
    printf(UNDERLINE_START);
    print_seg_escape((seg_t) { .s = src.s + res[i].first, .n = res[i].second - res[i].first });
    printf(UNDERLINE_END);
    last = res[i].second;
  }
  printf("%.*s\n", (int)(src.n - last), src.s + last);
}

Ranges are flattened by sorting the start and end points (l & r), incrementing/decrementing depth, and recording the transitions from and to depth == 0. Each transition to a non-zero depth is the start of a flattened range, and each transition back to zero is the end. These will always alternate, so they can be stored as pair_ts.

size_t flatten_ranges(uintptr_t *l, uintptr_t *r, pair_t *res, size_t n) {
  quicksort(l, WIDTH(l), n);
  quicksort(r, WIDTH(r), n);
  int depth = 0;
  size_t out_n = 0;
  uintptr_t *l_end = l + n;
  LOOP(n * 2) {
    if(l >= l_end || *l > *r) {
      depth--;
      if(!depth) {
        res->second = *r;
        res++;
        out_n++;
      }
      r++;
    } else if (*l < *r) {
      if(!depth) {
        res->first = *l;
      }
      depth++;
      l++;
    } else { // *l == *r
      l++;
      r++;
    }
  }
  return out_n;
}
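The same result can also be reached by sorting whole ranges and merging neighbours. The sketch below is that alternative formulation using the standard qsort, not the code PoprC uses; range_sketch_t and merge_ranges_sketch are hypothetical names. Unlike flatten_ranges above, it merges in place and does not drop empty ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Half-open range [first, second). */
typedef struct { size_t first, second; } range_sketch_t;

/* Order ranges by their start offset. */
static int compare_first(const void *a, const void *b) {
  const range_sketch_t *x = a, *y = b;
  return (x->first > y->first) - (x->first < y->first);
}

/* Sort r, then merge overlapping or touching ranges in place.
   Returns the number of flattened ranges left at the front of r. */
size_t merge_ranges_sketch(range_sketch_t *r, size_t n) {
  if(n == 0) return 0;
  qsort(r, n, sizeof *r, compare_first);
  size_t out = 0;
  for(size_t i = 1; i < n; i++) {
    if(r[i].first <= r[out].second) {
      /* overlaps the current range: extend it if needed */
      if(r[i].second > r[out].second) r[out].second = r[i].second;
    } else {
      /* gap: start a new flattened range */
      r[++out] = r[i];
    }
  }
  return out + 1;
}
```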

The original source is printed to the screen, with the failure locations underlined.

For example:

1 2 odd ! 3 +

Because 2 odd is True, causing ! to fail.

Or:

1 2 + A *

Because A is a symbol, and * operates on integers.

There’s still more work to do

This is just the first small step in error reporting. There is much left to do:

While I find source location is usually all I need to spot an error (what is wrong), it would also be useful to add an explanation why it is wrong.
Locations are only reported in the expression being compiled, but it might be helpful to see failures in source expanded from a function call.
The programmer probably doesn’t expect a branch to fail for all inputs, so an option would be useful to report this as an error.

Try it!

Try it out and let me know what you think. For example, try this:

1 3 4 - 2 2 - * /

Note that only the responsible code is underlined in this case.

Popr Tutorial - Dot Machines

Dustin DeWeese — Sat, 31 Mar 2018 00:00:00 UT

Popr is a new programming language (compiler, try it) that works unlike any other programming language that I know of, so a concise description of the language is difficult.

Rather than describe the semantics in relation to other languages, or listing the formal evaluation rules, I will present a graphical notation that, while impractical for larger programs, shows how Popr programs work in an intuitive way.

In this notation, we will build machines that consume and produce dots.

Before we proceed further, here are the rules for these diagrams:

Rules

Arrows point to dots a machine needs to consume, not where to put the dots that the machine produces.
Arrows enter and leave upward and to the left.
Boxes are dots that hold machines.
Arrows cannot freely cross each other or pass over any component, so components must consume from top to bottom.
If a component consumes more dots than available, a new dot is added to the bottom left.
Each component must have an arrow pointing to each dot it produces.

A component is activated by pulling any of the dots that the component produces. Before the component can run, it must pull in a dot for each arrow. This may in turn activate other components, propagating from right to left.

Arrows pull in dots

I will introduce components as needed.

Operators: `not`

not

not is a component that produces a True dot when given a False dot, and vice versa.

True not
  False
False not
  True

True and False are dot names. Dot names start with an uppercase letter.

not is an example of a component that consumes one dot and produces a related dot. There are many similar components, which I will call “operators”, such as arithmetic operators and comparison operators.

pullN: boxes, `popr`, and `swap`

Components and dots can be placed in a box, for example, [A B] is a box containing A and B.

Any machine, or even a partial machine, such as [not], can be placed in a box, and boxes themselves can be consumed and produced like any other dot.

Boxed machines are drawn as a machine surrounded by a rectangle (box.)

popr

popr is a component that pulls one dot from within a box.

: [A B] popr
  [ A ] B

Because every box contains a machine, the machine within is activated when pulling from the box:

: [False not]
  [ False not ]
: [False not] popr
  [] True

swap

swap is a component that crosses the top two arrows, so that the top dot and the one beneath it are exchanged.

: A B swap
  B A

pull: popr swap

pull is a machine that combines popr and swap so that the box is above the dot that was pulled out.

: [A B] pull
  B [ A ]

pull2: pull pull

pull3: pull2 pull

pull is useful because it can be chained (the dotted areas show the machines defined above) to create pull2 and pull3, which pull 2 and 3 dots, respectively, out of a box.

: [A B C] pull2
  C B [ A ]
: [A B swap C] pull2
  C A [ B ]
: [A B C D] pull3
  D C B [ A ]

Some machines can be extended in a straightforward way similar to pull, to form families of machines. These families are named by replacing each number in the name with an uppercase letter, such as pullN.

swap2: `drop`, `pushl`, and `pushr`

drop

drop is a component that cuts off the top arrow, allowing the other arrow to pass underneath.

When an arrow is cut off, the dot to which it pointed can no longer be pulled, which may prevent activating a machine, and the machines that that machine might have activated, and so on.

Exercise: Write the machine head, which removes a single dot from a box.

: :def head: ...
: [A] head
  A

Tip: Use the online evaluator to test your solution.

pushl

pushl is a component that connects a box to an outside dot, which can be seen as “pushing” the dot into the left side of the box.

: A [B] pushl
  [ A B ]

This is a little misleading, though, because machines only pull. pushl does not activate the machine inside the box; it just connects an arrow from the machine inside the box to a dot outside.

: False [not] pushl
  [ False not ]
: False not [] pushl
  [ False not ]

The result may be surprising, but this diagram explains the result:

Box of potential truth

In the second example, False is pulled into the box with not. In both cases [False not] represents a machine that has not been activated.

popr can be used to activate the machine:

: False [not] pushl popr
  [] True

pushr

pushr is a component that pushes a dot into the right side of a box, which can be retrieved with popr.

: [A] B pushr
  [ A B ]
: [A] B pushr popr
  [ A ] B
: [] False not pushr
  [ False not ]

Just like pushl, it does not activate any components or the machine inside the box.

Note: The dot is on top of pushr because it is an abbreviation of a more verbose diagram where the dot is in the middle; see compose.

swap2: [] swap pushr swap pushr pushl pull3 drop

Now, we can build something a little more interesting using these new components. swapN is another family of machines, starting with swap, which rotates two dots.

swap2 rotates three dots, as shown by the labels in the diagram, bringing the bottom dot to the top.

: A B C swap2 
  B C A

Exercise: Write a machine to swap two dots within a box.

: :def swab: ...
: [A B] swab 
  [ B A ]

dip11: `apMN`

pushl and popr use a box to consume and produce (respectively) one dot, but it can be tedious to do this one dot at a time, so there is a family of components, apMN, which consume M dots and produce N dots.

: A B [C] ap21
  [ A B ] C

apMN is equivalent to M pushls followed by N poprs, e.g. ap21 is equivalent to: pushl pushl popr. Therefore, pushl is ap10, and popr is ap01.

dip11: swap pushr ap12 swap2 drop

dip11 runs a dot through a given box, like ap11, but without affecting the dot on top.

The dot labeled f is a box containing a machine that consumes a dot and produces another. dip11 uses pushr to stash the dot on top within the given box, and then uses ap12 to run the machine and pull the stashed dot back out along with the produced dot. There is no need to produce the remaining box, so it is dropped (not pulled.)

: False A [not] dip11
  True A
: False A [] dip11
  False A

ifte: `!` (assert), `|` (alt), and `.` (compose)

assert

! (assert) produces the dot from the lower arrow if the upper arrow consumes a dot called True; otherwise, it produces a broken dot, which breaks everything that consumes it. Broken machines don’t produce anything.

: A True !
  A
: A 42 !

Note: Because a broken machine doesn’t produce anything, nothing is printed in this case.

alt

| (alt) represents two versions of the machine to try: one version using the top arrow, and another version using the bottom arrow.

: A B |
  A
  B
: A B False ! |
  A

Note: The results from both versions of the machine are printed on different lines.

compose

. (compose) puts two boxes together, merging the two machines into one box by connecting them together. This component does not activate the machine within the box.

Note: pushr is equivalent to: [] pushl .

: [A] [B] .
  [ A B ]
: [False] [not] . popr
  [] True
: [A B] [swap] .
  [ A B swap ]

Notice that the last example did not print: [ B A ]

ifte: [] ap20 swap pushr [not !] [swap drop !] | . head

ifte uses the components discussed to select between two dots, depending on the name of the first dot.

The first part of ifte, [] ap20 swap pushr, arranges the three dots in a box, as shown in the fluffy bubble.

There are two versions of the machine, indicated by the circle (alt) pointing to the boxes F and G. The boxed dots and F (for one version, G for the other) are put together into one box, using compose. The result is pulled out of the box, using the component head, defined in the earlier exercise.

In both F and G, there is an assertion ! connected to A. In F, it is negated with not, and C is passed through. In G, A is passed directly to the assertion, and B is passed through, while C is dropped (not needed.)

So, we have two versions of this machine:

Works if A is True, using G to extract B.
Works if A is False, using F to extract C.

Now that we have two machines, we can just try both, and keep the results from the one that doesn’t break. This works as intended, producing B or C depending on whether A is True.

: True A B ifte
  A
: False A B ifte
  B

Exercise: Use ifte to write a machine that inverts the first dot if the second is True.

: :def bxor: ...
: False False bxor
  False
: False True bxor
  True
: True False bxor
  True
: True True bxor
  False

dup

One of the remaining components is dup, which produces two copies.

: A dup
  A A

Exercise: Write the following machine.

: :def abba: ...
: A B abba
  A B B A
: B A abba
  B A A B

So Then

While this notation isn’t ideal for large programs, both the diagrams and Popr programs can be broken into smaller pieces (as seen in ifte). This notation can be useful to introduce the semantics of Popr programs, and can be used to analyze any behavior that is confusing or surprising.

PoprC Runtime Operation Part 4: Alternatives

Dustin DeWeese — Tue, 19 Nov 2013 00:00:00 UT

Every programming language supports some form of branching. A simple form of branching is the C if statement, which evaluates a boolean expression, and then selects a branch based on the value of that expression. This works well if the branching condition can be calculated before choosing a branch, but sometimes a branch can’t be selected until one of the branches has been partially computed. If this is the case, the programmer has to write several nested if statements, one for each decision point, and divide the calculation into incremental fragments, often repeating parts of the same calculation in different branches. Another example of this type of branching is common in error handling, where errors are checked in several places. If an error occurs, the program must jump to an alternate branch which handles the error, often by aborting the calculation and returning an error to the caller, which will continue to propagate upwards until it can be handled reasonably. This is why C++ explicitly supports exceptions, although it is really just a type of branching.

Many pipelined processors handle branching slightly differently from the semantics of the if statement. Instead of stalling while waiting for the condition to be fully evaluated, the processor has a branch prediction unit, which just makes a guess and follows a branch. The processor backtracks if it chose the wrong branch. The programming language Prolog takes this even further; every code path is executed exhaustively until the computation is successful. Whenever a branch fails (is determined to be invalid), the program backtracks to the last branch point with an unfollowed branch, and then continues along that branch. This is repeated until the program finds a successful path to a result.

The Popr language is designed with Prolog style backtracking as the only form of branching, because this subsumes other types of conditional statements. Furthermore, since Popr is partially evaluated at compile time, backtracking implements an advanced type system, because the compiler will just try all versions of each function until it can form a program that is well typed.

Branches in Popr are called alternatives, and are created with the alternative function (|). Two fields are present in cell_t to support alternatives. The alt field points to the next alternative path in the graph. The alt_set field is used to determine if two paths are compatible, by making sure that values can’t take conflicting paths. This prevents results such as 2 3 | dup + evaluating to 5; 4 and 6 are the only valid results.

Alternatives are expanded during reduction by a process called splitting; when a function has N arguments having alternatives, 2^N copies are created from all the combinations generated by replacing an argument with its alternative. If any argument has more than one alternative, they will continue to be expanded further when they are reduced later.

cell_t *dup_alt(cell_t *c, unsigned int n, cell_t *b) {
  unsigned int i = 0, in = closure_in(c), out = 0;
  assert(n < in);
  cell_t *a = copy(c);

  // ref args
  for(; i < in; ++i) {
    if(i != n) ref(a->arg[i]);
  }

  // update deps
  for(; i < c->size; ++i) {
    if(a->arg[i]) a->arg[i] = dep(a);
    c->arg[i]->alt =
	  conc_alt(a->arg[i], c->arg[i]->alt);
    ++out;
  }

  a->arg[n] = b;
  a->n = out;
  c->alt = a;
  return a;
}

dup_alt(c, n, b) copies c, with c->arg[n] replaced with b, and assigns the copy to c->alt; this is the basic operation of creating a new alternative.

void split_arg(cell_t *c, unsigned int n) {
  cell_t
    *a = c->arg[n],
    *p = c,
    **pa;
  if(!a || !a->alt || is_marked(a, 1)) return;
  do {
    pa = &p->arg[n];
    if(*pa == a) {
      // insert a copy with the alt arg
      p = dup_alt(p, n, ref((*pa)->alt))->alt;
      // mark the arg
      *pa = mark_ptr(*pa, 1);
    } else p = p->alt;
  } while(p);
}

split_arg(c, n) splits c on the nth argument. Any alternative of c having the same argument in the same position is split. When a closure is split on an argument n, the new closure has the argument c->arg[n]->alt, but the old one needs a new argument that no longer has an alternative, so that c->arg[n]->alt isn’t split and followed more than once. Rather than make a copy without an alternative, the low bit of the pointer c->arg[n] is set, marking that the alt field should be ignored, as if c->arg[n]->alt was set to 0.
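The pointer marking described above is a standard tagged-pointer trick: suitably aligned pointers have zero low bits, so those bits can carry flags. The helpers below are a sketch in the spirit of mark_ptr, clear_ptr, and is_marked, with hypothetical names and signatures; PoprC's versions may differ, and the sketch assumes allocations are aligned to at least 4 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set flag bits in the unused low bits of an aligned pointer. */
void *mark_ptr_sketch(void *p, uintptr_t bits) {
  assert(((uintptr_t)p & bits) == 0); /* low bits must be free */
  return (void *)((uintptr_t)p | bits);
}

/* Strip flag bits to recover a pointer that is safe to dereference. */
void *clear_ptr_sketch(void *p, uintptr_t bits) {
  return (void *)((uintptr_t)p & ~bits);
}

/* True if any of the given flag bits are set. */
bool is_marked_sketch(const void *p, uintptr_t bits) {
  return ((uintptr_t)p & bits) != 0;
}
```

A marked pointer must always be cleared before it is dereferenced, which is why closure_split clears the flags once splitting is done.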

cell_t *closure_split(cell_t *c, unsigned int s) {
  int i;
  for(i = 0; i < s; ++i) {
    split_arg(c, i);
  }
  for(i = 0; i < s; ++i) {
    c->arg[i] = clear_ptr(c->arg[i], 1);
  }
  return c->alt;
}

closure_split(c, s) splits c from c->arg[0] to c->arg[s-1], and then clears the flags set on the arguments of c (so they are ready for reduction), and returns c->alt. This function must generally be called after c->arg[0] to c->arg[s-1] have been reduced for alternatives to work correctly.

An alt_set_t is a bit field indicating which alternatives have been followed to reach a value. Functions for manipulating alt_sets are prefixed with as.

alt_set_t as(unsigned int k, unsigned int v) {
  assert(k < AS_SIZE);
  return ((alt_set_t)1 << (k + AS_SIZE)) |
    (((alt_set_t)v & 1) << k);
}

Each point where an alternative is chosen (| is reduced) has an associated unique identifier. as(k, v) creates an alt_set_t that indicates which alternative was followed at point k. If v is 0, the first alternative was followed; if v is 1, the second alternative was followed.

alt_set_t as_conflict(alt_set_t a, alt_set_t b) {
  return ((a & b) >> AS_SIZE) &
    ((a ^ b) & (((alt_set_t)1 << AS_SIZE) - 1));
}

as_conflict(a, b) determines if there are any conflicts between a and b, i.e. both alternatives were followed at some point. If two alt_sets do not conflict, they can be combined with bitwise or.

bool entangle(alt_set_t *as, cell_t *c) {
  return !as_conflict(*as, c->alt_set) &&
    (*as |= c->alt_set, true);
}

entangle(&as, c) returns false if c->alt_set conflicts with as, otherwise they are combined and stored in as, and the function returns true. This should be called on all reduced arguments of a function to determine the alt_set of the resulting value. If there are conflicts, the function must fail.
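To see how this rules out 5 as a result of 2 3 | dup +, here is a self-contained sketch. It assumes a 64-bit alt_set with AS_SIZE of 32 (the real width may differ), and tagged_sketch_t and add_compatible are hypothetical helpers that stand in for reducing + over split arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AS_SIZE_SKETCH 32          /* assumed width; the real AS_SIZE may differ */
typedef uint64_t alt_set_sketch_t; /* assumed representation */

/* High half: which choice points were decided. Low half: which way. */
alt_set_sketch_t as_sketch(unsigned k, unsigned v) {
  assert(k < AS_SIZE_SKETCH);
  return ((alt_set_sketch_t)1 << (k + AS_SIZE_SKETCH)) |
         ((alt_set_sketch_t)(v & 1) << k);
}

/* Nonzero where both sets decided the same point but chose differently. */
alt_set_sketch_t as_conflict_sketch(alt_set_sketch_t a, alt_set_sketch_t b) {
  const alt_set_sketch_t low = ((alt_set_sketch_t)1 << AS_SIZE_SKETCH) - 1;
  return ((a & b) >> AS_SIZE_SKETCH) & ((a ^ b) & low);
}

/* A value tagged with the alternatives followed to reach it. */
typedef struct { long value; alt_set_sketch_t alt_set; } tagged_sketch_t;

/* Add every compatible pair from xs and ys; conflicting pairs fail. */
size_t add_compatible(const tagged_sketch_t *xs, size_t nx,
                      const tagged_sketch_t *ys, size_t ny,
                      tagged_sketch_t *out) {
  size_t n = 0;
  for(size_t i = 0; i < nx; i++) {
    for(size_t j = 0; j < ny; j++) {
      if(as_conflict_sketch(xs[i].alt_set, ys[j].alt_set)) continue;
      out[n].value = xs[i].value + ys[j].value;
      out[n].alt_set = xs[i].alt_set | ys[j].alt_set;
      n++;
    }
  }
  return n;
}
```

With 2 tagged as(0, 0) and 3 tagged as(0, 1), passing the same pair of values as both arguments (the effect of dup) leaves only 2 + 2 and 3 + 3.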

bool reduce_arg(cell_t *c,
		unsigned int n,
		alt_set_t *as,
		type_t t) {
  bool r = reduce(&c->arg[n], t);
  split_arg(c, n);
  return r && entangle(as, clear_ptr(c->arg[n], 1));
}

reduce_arg(c, n, &as, t) combines the reduction of an argument with splitting and entanglement. It reduces c->arg[n] with expected type t, splits c on that argument, and entangles the result’s alt_set into as.

Alternatives in Popr are very powerful, allowing logic programming and a powerful type system. The implementation is complex, but partial evaluation can remove alternatives in many cases.

PoprC Runtime Operation Part 3: Quotes

Dustin DeWeese — Sun, 10 Nov 2013 00:00:00 UT

A quote in the PoprC runtime is a fragment of possibly unevaluated code. You can think of a quote as holding a section of code, which can be further assembled and executed. Arguments can be appended to the left (pushl), and results can be removed from the right (popr). Quotes can also be composed using (.). Quote literals are indicated by a section of code surrounded by square brackets ([ ... ]). Quotes can be nested without limit. Quotes can be used as auxiliary stacks.

Quotes cannot be spliced up a level (‘removing the square brackets’.) This would result in all higher order functions having variable arity. Furthermore, the higher order functions’ behavior would be tied to their internal operation, because any function executed within another could consume and modify its internal data, making abstract interpretation and even compile time parsing of library functions difficult. So, the only way to manipulate and access the contents from outside is through pushl and popr.

This is important, because it is what allows PoprC to entirely remove all local (intermediate) quote operations (pushl, popr, and .) from the resulting compiled code. Furthermore, inlining can make more quote operations local, at the expense of code size. This is also interesting because, although most implementations of concatenative languages rely on one or more stacks to store arguments to functions, PoprC aggressively eliminates the only non-primitive datatype (which can be used as a stack), and most function arguments are assigned at compile time.

Figure 1: [1 + 2 +]

Quotes are stored as lists, which are vectors of pointers to closures. Pointers to incomplete closures are always on the left of the list (c->ptr[list_size(c) - 1]). An incomplete closure can contain another incomplete closure (but only one), which forms a chain terminating at the innermost incomplete closure (node 4 in Figure 1, notice it only has one argument). To the right are complete closures, ready to be reduced (except for deps, which although marked as complete, can point to incomplete closures.)

cell_t *empty_list() {
  cell_t *c = closure_alloc(1);
  c->func = func_reduced;
  c->type = T_LIST;
  return c;
}

cell_t *quote(cell_t *x) {
  cell_t *c = closure_alloc(2);
  c->func = func_reduced;
  c->type = T_LIST;
  c->ptr[0] = x;
  return c;
}

It’s easy to make a list; just allocate the cells, set c->func and c->type, and assign the items of the list to c->ptr. Above is code to create an empty list (empty_list()), and a list with just one item (quote(x)).

cell_t *expand(cell_t *c, unsigned int s) {
  if(!c) return 0;
  int n = closure_args(c);
  int cn_p = calculate_cells(n);
  int cn = calculate_cells(n + s);
  if(!c->n && cn == cn_p) {
     c->size += s;
    return c;
  } else {
    /* copy */
    cell_t *new = closure_alloc(n + s);
    memcpy(new, c, cn_p * sizeof(cell_t));
    if(is_placeholder(c)) trace(new, c, tt_copy);
    new->n = 0;
    traverse_ref(new, ARGS_IN | PTRS | ALT);
    new->size = n + s;
    if(is_reduced(c)) alt_set_ref(c->alt_set);
    drop(c);
    return new;
  }
}

When a larger list is needed to insert more items (such as in compose() and pushl_nd()), expand(c, s) is used to expand c to allow s more items, returning the expanded list.
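The decision expand makes (grow in place when the list is exclusively owned and the extra items fit, otherwise copy) can be shown in isolation. The sketch below uses a plain array of ints rather than cells; list_sketch_t and its functions are hypothetical and leave out tracing, alt_sets, and reference traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  unsigned extra_refs; /* references besides the caller's, like c->n */
  size_t size;         /* items in use */
  size_t capacity;     /* items that fit without reallocating */
  int items[];         /* zero-initialized storage */
} list_sketch_t;

list_sketch_t *list_new_sketch(size_t size, size_t capacity) {
  list_sketch_t *l = calloc(1, sizeof *l + capacity * sizeof l->items[0]);
  if(l) {
    l->size = size;
    l->capacity = capacity;
  }
  return l;
}

/* Make room for s more items, returning the list to use from now on. */
list_sketch_t *list_expand_sketch(list_sketch_t *l, size_t s) {
  if(!l) return NULL;
  if(l->extra_refs == 0 && l->size + s <= l->capacity) {
    l->size += s; /* exclusive and it fits: expand in place */
    return l;
  }
  /* shared or too small: copy into a larger list */
  list_sketch_t *n = list_new_sketch(l->size + s, l->size + s);
  if(!n) return NULL;
  memcpy(n->items, l->items, l->size * sizeof l->items[0]);
  /* give up the caller's reference to the old list */
  if(l->extra_refs == 0) free(l);
  else l->extra_refs--;
  return n;
}
```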

cell_t *pushl_nd(cell_t *a, cell_t *b) {
  assert(is_closure(a) &&
	 is_closure(b) && is_list(b));

  int n = list_size(b);
  if(n) {
    cell_t *l = b->ptr[n-1];
    if(!closure_is_ready(l)) {
      cell_t *_b = arg_nd(l, a, b);
      if(is_placeholder(l)) trace(_b->ptr[n-1], l, tt_copy);
      return _b;
    }
  }

  cell_t *e = expand(b, 1);
  e->ptr[n] = a;
  return e;
}

pushl_nd(a, b) tries to fill in a as an argument for the leftmost item in the list using arg_nd, otherwise the list is expanded and a is inserted in the leftmost position.

cell_t *compose_nd(cell_t *a, cell_t *b) {
  int n = list_size(b);
  int n_a = list_size(a);
  int i = 0;
  if(n && n_a) {
    cell_t *l;
    while(!closure_is_ready(l = b->ptr[n-1]) && i < n_a) {
      cell_t *x = a->ptr[i];
      if(is_placeholder(x)) {
	/* ... */
      } else {
	b = arg_nd(l, ref(x), b);
	++i;
      }
    }
  }
  cell_t *e = expand(b, n_a - i);
  int j;
  for(j = n; i < n_a; ++i, ++j) {
    e->ptr[j] = ref(a->ptr[i]);
  }
  drop(a);
  return e;
}

compose_nd(a, b) is similar to pushl_nd, but a is a list, so it is conceptually the same as performing pushl_nd(x, b) for each item x in a, but it is much more efficient because b is expanded only once. Ignore if(is_placeholder(x)) { ... } for now, just look at the else clause.

The _nd variants of functions are nondestructive to their arguments, so that pushl_nd doesn’t affect other references; only the portion of the graph that has an exclusive reference can be modified and the rest must be copied. This is accomplished with arg_nd(c, a, r) which is similar to arg(c, a), except the argument r points to the root of the graph of which arg_nd will return a modified version in which c has been supplied argument a.

arg_nd uses modify_copy(c, r), which returns a copy of r where it is safe to modify c.

cell_t *modify_copy(cell_t *c, cell_t *r) {
  cell_t *new = _modify_copy1(c, r, true);
  if(new && new != r) {
    ref(new);
    drop(r);
  }
  if(new) {
    _modify_copy2(new);
    return new;
  } else return r;
}

void _modify_new(cell_t *r, bool u) {
  cell_t *n;
  if(clear_ptr(r->tmp, 3)) return;
  if(u) {
    n = ref(r);
  } else {
    n = copy(r);
    n->tmp = (cell_t *)3;
    n->n = 0;
  }
  r->tmp = mark_ptr(n, 3);
}

/* first sweep of modify_copy */
cell_t *_modify_copy1(cell_t *c, cell_t *r, bool up) {
  if(!is_closure(r)) return 0;

  r = clear_ptr(r, 3);
  int nd = nondep_n(r);

  /* is r unique (okay to replace)? */
  bool u = up && !nd;

  if(r->tmp) {
    assert(is_marked(r->tmp, 3));
    /* already been replaced */
    return clear_ptr(r->tmp, 3);
  } else r->tmp = (cell_t *)3;
  if(c == r) _modify_new(r, u);
  traverse(r, {
      if(_modify_copy1(c, *p, u))
	_modify_new(r, u);
    }, ARGS | PTRS | ALT);
  return clear_ptr(r->tmp, 3);
}

cell_t *get_mod(cell_t *r) {
  if(!r) return 0;
  cell_t *a = r->tmp;
  if(is_marked(a, 2)) return clear_ptr(a, 3);
  else return 0;
}

/* second sweep of modify copy */
void _modify_copy2(cell_t *r) {

  /* r is modified in place */
  bool s = r == clear_ptr(r->tmp, 3);

  if(!is_closure(r)) return;
  /* already been here */
  if(!is_marked(r->tmp, 1)) return;
  r->tmp = clear_ptr(r->tmp, 1);
  traverse(r, {
      cell_t *u = clear_ptr(*p, 3);
      cell_t *t = get_mod(u);
      if(t) {
	if(!(s && t == u)) {
	  *p = ref(t);
	  if(s) drop(u);
	}
	_modify_copy2(t);
      } else if(!s) ref(u);
      if((!s || t != u) && is_weak(r, *p)) {
	--(*p)->n;
      }
    }, ARGS | PTRS | ALT);
}

modify_copy is complex, but essentially, it makes a lazy copy using the tmp fields in cell_t. _modify_copy1 performs the first sweep, where copied closures are allocated as needed and stored in tmp, and _modify_copy2 performs the second sweep, where references are updated in the copied graph to point to the new closures.
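The core idea, copying only what is shared and modifying the rest in place, can be sketched on a much simpler structure. The following is not the two-sweep graph algorithm above; it is plain path copying on a binary tree, with hypothetical names (tnode_t, node_new, node_ref, set_leftmost_nd) and malloc() used only to keep the sketch short. As in cell_t, n holds the number of references beyond the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct tnode {
  unsigned int n;             /* extra references (0 means unique) */
  int value;
  struct tnode *left, *right;
} tnode_t;

/* allocate a node that takes ownership of the given child references */
tnode_t *node_new(int value, tnode_t *left, tnode_t *right) {
  tnode_t *t = calloc(1, sizeof(*t));
  if(!t) abort();
  t->value = value;
  t->left = left;
  t->right = right;
  return t;
}

/* add a reference; NULL is passed through unchanged */
tnode_t *node_ref(tnode_t *t) {
  if(t) t->n++;
  return t;
}

/* Set the value of the leftmost leaf without disturbing other references.
 * Consumes the caller's reference to r and returns the new root. */
tnode_t *set_leftmost_nd(tnode_t *r, int v) {
  if(r->n) {
    /* shared: make a private shallow copy and give up our reference */
    tnode_t *c = node_new(r->value, node_ref(r->left), node_ref(r->right));
    r->n--;
    r = c;
  }
  /* r is now unique, so it can be modified in place */
  if(r->left) r->left = set_leftmost_nd(r->left, v);
  else r->value = v;
  return r;
}
```

Once a node has been copied, its children are referenced by both the original and the copy, so the copying continues down the modified path while every untouched subtree stays shared.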

As you can see, quotes are very powerful, although the implementation is somewhat complex. The compiler can eliminate any overhead from using quotes in most cases.

In the next article, I will cover how alternatives work in the PoprC runtime.

PoprC Runtime Operation Part 2: Memory Management

Dustin DeWeese — Mon, 04 Nov 2013 00:00:00 UT

Memory management in the PoprC runtime is done using a custom allocator, based on a ring of cells. It’s fairly simple, but it works, and I’m not yet to the point where it needs to be optimized. Memory allocation must be handled carefully, because I want PoprC to be able to produce code suitable for embedded devices, which have very little memory, and need real-time guarantees.

The runtime’s garbage collection is based on reference counting to minimize memory usage, to avoid unpredictable pauses, and also because it’s simple. In addition, the reference count can be used to determine when it is okay to modify a closure, rather than making a modified copy. This is an effective optimization because most closures only have a single reference.

At startup, all cells are added to a free ring by cells_init().

void cells_init() {
  int i;
  const unsigned int n = LENGTH(cells)-1;

  // zero the cells
  memset(&cells, 0, sizeof(cells));
  memset(&alt_live, 0, sizeof(alt_live));

  // set up doubly-linked pointer ring
  for(i = 0; i < n; i++) {
    cells[i].prev = &cells[i-1];
    cells[i].next = &cells[i+1];
  }
  cells[0].prev = &cells[n-1];
  cells[n-1].next = &cells[0];

  cells_ptr = &cells[0];
  alt_cnt = 0;
}

cells_init is fairly straightforward; the cells are zeroed, each cell is linked to its neighbors (i-1 and i+1, wrapping around at the ends of the ring), and cells_ptr, the point from which cells are added and removed, is set to point to cells[0]. alt_cnt counts the number of alternatives (explained later) allocated.

cells_next() is used to obtain a point in the ring from which to allocate. It simply returns cells_ptr and advances cells_ptr to its next pointer.

cell_t *cells_next() {
  cell_t *p = cells_ptr;
  assert(is_cell(p) &&
         !is_closure(p) &&
         is_cell(cells_ptr->next));
  cells_ptr = cells_ptr->next;
  return p;
}

cell_alloc(c) removes the cell_t pointer c from the ring. It also keeps track of allocation metrics.

void cell_alloc(cell_t *c) {
  assert(is_cell(c) && !is_closure(c));
  cell_t *prev = c->prev;
  assert(is_cell(prev) && !is_closure(prev));
  cell_t *next = c->next;
  assert(is_cell(next) && !is_closure(next));
  if(cells_ptr == c) cells_next();
  prev->next = next;
  next->prev = prev;
  measure.alloc_cnt++;
  if(++measure.current_alloc_cnt > measure.max_alloc_cnt)
    measure.max_alloc_cnt = measure.current_alloc_cnt;
}
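The three ring operations above can be condensed into a small standalone sketch. The names here (node_t, pool, ring_init, ring_next, ring_remove, ring_length) and the pool size are hypothetical, and a used flag stands in for the func != 0 check; unlike cells_init, the neighbor links are computed with modular arithmetic so that no out-of-range address is ever formed.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8 /* hypothetical pool size for this sketch */

typedef struct node {
  int used;                 /* stands in for the func != 0 check */
  struct node *prev, *next; /* meaningful only while the node is free */
} node_t;

node_t pool[RING_SIZE];
node_t *ring_ptr;

/* link every node to its neighbors, wrapping around at the ends */
void ring_init(void) {
  for(size_t i = 0; i < RING_SIZE; i++) {
    pool[i].used = 0;
    pool[i].prev = &pool[(i + RING_SIZE - 1) % RING_SIZE];
    pool[i].next = &pool[(i + 1) % RING_SIZE];
  }
  ring_ptr = &pool[0];
}

/* return the current ring position and advance it */
node_t *ring_next(void) {
  node_t *p = ring_ptr;
  ring_ptr = ring_ptr->next;
  return p;
}

/* unlink c from the free ring and mark it as in use */
void ring_remove(node_t *c) {
  assert(!c->used);
  if(ring_ptr == c) ring_next(); /* keep ring_ptr on a free node */
  c->prev->next = c->next;
  c->next->prev = c->prev;
  c->used = 1;
}

/* count the free nodes reachable from ring_ptr */
int ring_length(void) {
  int n = 0;
  node_t *p = ring_ptr;
  do { n++; p = p->next; } while(p != ring_ptr);
  return n;
}
```

Removing a node never touches the nodes themselves beyond two pointer writes, which is what keeps allocation cost predictable.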

Allocation from the free ring is handled by closure_alloc(args), where args is the number of arguments required by the closure.

cell_t *closure_alloc(int args) {
  cell_t *c = closure_alloc_cells(calculate_cells(args));
  c->size = args;
  return c;
}

cell_t *closure_alloc_cells(int size) {
  cell_t *ptr = cells_next(), *c = ptr;
  cell_t *mark = ptr;
  int cnt = 0;
  (void)mark;

  // search for contiguous chunk
  while(cnt < size) {
    if(is_cell(ptr) && !is_closure(ptr)) {
      cnt++;
      ptr++;
    } else {
      cnt = 0;
      c = ptr = cells_next();
      assert(c != mark);
    }
  }

  // remove the found chunk
  int i;
  for(i = 0; i < size; i++) {
    cell_alloc(&c[i]);
  }

  memset(c, 0, sizeof(cell_t)*size);
  return c;
}

closure_alloc uses calculate_cells to determine the number of cells required, and calls closure_alloc_cells(size), where size is the number of cells. This function looks for a contiguous chunk of cells that is large enough to fit the new closure. It starts from the pointer returned from cells_next(), and then looks to see if there are enough unallocated cells after it. If it fails, it pulls the next pointer from cells_next(). mark records the starting point, so that an assertion fails instead of the search looping forever if no chunk that is large enough exists.
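The search for a contiguous run can be illustrated with a simplified sketch. This version scans a plain array of in-use flags from the start rather than walking the free ring, and it reports failure with -1 instead of an assertion; the names (in_use, find_chunk, chunk_alloc) and the pool size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8 /* hypothetical pool size for this sketch */

bool in_use[POOL_SIZE];

/* return the first index of a run of `size` free slots, or -1 if none */
int find_chunk(int size) {
  assert(size > 0);
  int start = 0, cnt = 0;
  for(int i = 0; i < POOL_SIZE && cnt < size; i++) {
    if(in_use[i]) {
      /* run broken: restart just past the used slot */
      cnt = 0;
      start = i + 1;
    } else {
      cnt++;
    }
  }
  return cnt >= size ? start : -1;
}

/* find a run and mark it as allocated; returns its first index or -1 */
int chunk_alloc(int size) {
  int start = find_chunk(size);
  if(start < 0) return -1;
  for(int i = 0; i < size; i++) in_use[start + i] = true;
  return start;
}
```

Like the ring-based search, this is a first-fit strategy, so fragmentation can make a request fail even when enough free slots exist in total.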

cell_t *func(reduce_t *f, unsigned int in, unsigned int out) {
  assert(out > 0);
  unsigned int args = in + out - 1;
  cell_t *c = closure_alloc(args);
  c->out = out - 1;
  c->func = f;
  if(args) c->arg[0] = (cell_t *)(intptr_t)(args - 1);
  closure_set_ready(c, !args && f != func_placeholder);
  return c;
}

After allocation using closure_alloc(n), the new cell is filled with zeroes, except that c->size is set to n. func(f, in, out) wraps closure_alloc to provide a simple way to make incomplete (with missing arguments) closures. f is a reduction function (reduce_t), and in and out are the number of in and out arguments, respectively. c->arg[0] is set to the offset of the first argument (closures fill right to left.) The closure is marked as incomplete by setting the low bit of c->func unless the function requires no arguments or the closure is a placeholder.

Then the arg function can be used to load the arguments from right to left.

/* arg is destructive to *cp */
void arg(cell_t **cp, cell_t *a) {
  cell_t *c = *cp;
  assert(is_closure(c) && is_closure(a));
  assert(!closure_is_ready(c));
  int i = closure_next_child(c);
  // *** shift args if placeholder
  if(is_placeholder(c) &&
     (closure_in(c) == 0 ||
      closure_is_ready(c->arg[0]))) {
    c = expand_inplace(c, 1);
    c->arg[0] = a;
  } else if(!is_data(c->arg[i])) {
    c->arg[0] = (cell_t *)(intptr_t)
	  (i - (closure_is_ready(a) ? 1 : 0));
    c->arg[i] = a;
    if(i == 0 && !is_placeholder(c))
	  closure_set_ready(c, closure_is_ready(a));
  } else {
    arg(&c->arg[i], a);
    if(!is_placeholder(c) &&
       closure_is_ready(c->arg[i])) {
      if(i == 0) closure_set_ready(c, true);
      else --*(intptr_t *)&c->arg[0]; // decrement offset
    }
  }
  *cp = c;
}

arg(&c, a) is pretty complex because it needs to handle placeholders. It works its way down the left side of the graph, looking for the innermost function lacking an argument, and then inserts a into that closure. Intuitively, it simply concatenates the argument a to c. Note that c is passed by reference (as a cell_t **). This is because c could be moved to accommodate an expanding placeholder.

cell_t *ref(cell_t *c) {
  return(refn(c, 1));
}

cell_t *refn(cell_t *c, unsigned int n) {
  c = clear_ptr(c, 3);
  if(c) {
    assert(is_closure(c));
    c->n += n;
  }
  return c;
}

References are made by incrementing c->n. The number of references is c->n + 1: a closure with no references is freed immediately, so a count of zero references never needs to be stored. This means !c->n indicates that a closure is only referenced once, and can be modified in place. clear_ptr(ptr, bits) clears marker bits in ptr (used by other functions).
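Marker bits of this kind rely on pointers to aligned objects having unused low bits. The runtime's own mark_ptr, clear_ptr, and is_marked are not shown here, so the following is only a sketch of the general technique with hypothetical names (tag_ptr, untag_ptr, has_tag); it assumes the pointed-to objects are at least 4-byte aligned, so the two lowest bits are free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* set the given low bits of a pointer */
void *tag_ptr(void *p, uintptr_t bits) {
  return (void *)((uintptr_t)p | bits);
}

/* clear the given low bits, recovering a usable pointer */
void *untag_ptr(void *p, uintptr_t bits) {
  return (void *)((uintptr_t)p & ~bits);
}

/* true if any of the given bits are set */
bool has_tag(void *p, uintptr_t bits) {
  return ((uintptr_t)p & bits) != 0;
}
```

A tagged pointer must always be untagged before it is dereferenced, which is why refn() clears the bits before touching c->n.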

void drop(cell_t *c) {
  if(!is_cell(c) || !is_closure(c)) return;
  if(!c->n) {
    cell_t *p;
    traverse(c, {
	cell_t *x = clear_ptr(*p, 3);
	/* !is_marked condition needed */
	/* during _modify_copy2 */
	if(!is_marked(*p, 2)) {
	  drop(x);
	}
      }, ALT | ARGS_IN | PTRS);
    if(is_dep(c) && !is_reduced(p = c->arg[0]) && is_closure(p)) {
      /* mark dep arg as gone */
      int n = closure_args(p);
      while(n--) {
	if(p->arg[n] == c) {
	  p->arg[n] = 0;
	  break;
	}
      }
    }
    if(is_reduced(c)) alt_set_drop(c->alt_set);
    if(c->func == func_id) alt_set_drop((alt_set_t)c->arg[1]);
    closure_free(c);
  } else {
    --c->n;
  }
}

Dropping (or removing a reference) is accomplished using the drop(c) function, where c is the cell to drop. If c->n == 0, this was the last reference, so it recursively drops all the arguments of an unevaluated function, or the items in a list, as well as the alternative pointer, and then frees c. Otherwise, it just decrements c->n.
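The convention of storing one less than the number of references can be shown in isolation. This is a minimal sketch with hypothetical names (obj_t, obj_new, obj_ref, obj_drop, obj_is_unique), a single child pointer instead of arguments and alternatives, and malloc-family allocation in place of the cell ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct obj {
  unsigned int n;    /* extra references: total references minus one */
  struct obj *child; /* at most one owned child, for brevity */
} obj_t;

int live_objs = 0;   /* number of objects not yet freed */

/* create an object holding the only reference to itself (n == 0) */
obj_t *obj_new(obj_t *child) {
  obj_t *o = calloc(1, sizeof(*o));
  if(!o) abort();
  o->child = child;
  live_objs++;
  return o;
}

/* add a reference */
obj_t *obj_ref(obj_t *o) {
  if(o) o->n++;
  return o;
}

/* a unique object can be modified in place */
bool obj_is_unique(const obj_t *o) {
  return o->n == 0;
}

/* remove a reference, freeing the object when it was the last one */
void obj_drop(obj_t *o) {
  if(!o) return;
  if(o->n) {
    o->n--;
    return;
  }
  obj_drop(o->child); /* release what this object owned */
  free(o);
  live_objs--;
}
```

A freshly zeroed object is therefore already correctly counted, with no separate initialization of the reference count.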

#define traverse(r, action, flags) 			\
  do {							\
    cell_t **p;						\
    if(is_reduced(r)) {					\
      if(((flags) & PTRS) &&				\
		is_list(r)) {				\
	int i, n = list_size(r);			\
	for(i = 0; i < n; ++i) {			\
	  p = (r)->ptr + i;				\
	  action					\
	}						\
      }							\
    } else if((flags) & (ARGS | ARGS_IN)) {		\
      int i, n = ((flags) & ARGS_IN) ?			\
	closure_in(r) :					\
	closure_args(r);				\
      for(i = closure_next_child(r); i < n; ++i) {	\
	p = (r)->arg + i;				\
	if(*p) {action}					\
      }							\
    }							\
    if((flags) & ALT) {					\
      p = &(r)->alt;					\
      action						\
    }							\
  } while(0)

The traverse(r, action, flags) macro traverses the cell r and applies action to the parts of r indicated by flags. This makes the drop function (and many others) less tedious.
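The pattern of passing a block of code to a macro can be tried out on a smaller structure. This sketch uses hypothetical names (parent_t, FOR_EACH_CHILD, sum_children, double_children) and treats a zero entry as a missing child; as with traverse, the action sees a pointer p to the current element, so it can both read and replace it.

```c
#include <assert.h>

#define MAX_CHILDREN 4

typedef struct {
  int count;
  int child[MAX_CHILDREN]; /* 0 means "missing" in this sketch */
} parent_t;

/* run `action` with p pointing at each non-zero child of node */
#define FOR_EACH_CHILD(node, action)            \
  do {                                          \
    for(int i_ = 0; i_ < (node)->count; ++i_) { \
      int *p = &(node)->child[i_];              \
      if(*p) { action }                         \
    }                                           \
  } while(0)

/* read through p */
int sum_children(parent_t *n) {
  int sum = 0;
  FOR_EACH_CHILD(n, { sum += *p; });
  return sum;
}

/* write through p */
void double_children(parent_t *n) {
  FOR_EACH_CHILD(n, { *p *= 2; });
}
```

One caveat of this style is that a comma at the top level of the action, outside parentheses, is parsed as a macro argument separator, so such actions need extra parentheses or a variadic macro.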

void closure_shrink(cell_t *c, int s) {
  if(!is_cell(c)) return;
  int i, size = closure_cells(c);
  if(size > s) {
    assert(is_closure(c));
    for(i = s; i < size; i++) {
      c[i].func = 0;
      c[i].prev = &c[i-1];
      c[i].next = &c[i+1];
    }
    c[s].prev = cells_ptr->prev;
    cells_ptr->prev->next = &c[s];
    c[size-1].next = cells_ptr;
    cells_ptr->prev = &c[size-1];
    measure.current_alloc_cnt -= size - s;
  }
}

void closure_free(cell_t *c) {
  closure_shrink(c, 0);
}

closure_shrink(c, s) shrinks the closure c to s cells, adding the extra cells back to the free ring. closure_free(c), which deallocates c entirely, is then trivially implemented as closure_shrink(c, 0).
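Returning a run of adjacent cells to the ring amounts to linking the run internally and then splicing it in before the ring pointer. Here is a minimal sketch of just that splice, with hypothetical names (node_t, ring_splice_before) and without the allocation counters.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
  struct node *prev, *next;
} node_t;

/* splice the contiguous run c[0..len-1] into the ring just before `at` */
void ring_splice_before(node_t *at, node_t *c, int len) {
  assert(len > 0);
  /* link the run internally; the two outer links are set below */
  for(int i = 0; i < len; i++) {
    if(i > 0) c[i].prev = &c[i - 1];
    if(i < len - 1) c[i].next = &c[i + 1];
  }
  /* connect the ends of the run to the ring */
  c[0].prev = at->prev;
  at->prev->next = &c[0];
  c[len - 1].next = at;
  at->prev = &c[len - 1];
}
```

The cost is proportional to the number of cells returned, and the rest of the ring is untouched apart from two pointer updates.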

That covers the basics of the memory management functions of the PoprC runtime. The next article in this series will explain quotes (also called lists).

PoprC Runtime Operation Part 1: Cells and Closures

Dustin DeWeese — Fri, 01 Nov 2013 00:00:00 UT

Because Popr is a functional language, execution of a Popr expression can be thought of as a series of reductions to reach the final result. PoprC relies on functions in the runtime (rt.c) to handle the reduction of Popr expressions. It can reduce functions with static (known) arguments. If this were all PoprC could do, it would be an interpreter, and not a compiler. It can also reduce functions with variables as arguments, producing a trace which is converted into LLVM IR. This results in code with all static parts fully reduced, and only the dynamic (not known in advance) portions implemented by using calls to the runtime.

The runtime is also responsible for memory management. Unallocated memory is arranged as a free ring (a doubly-linked list with no head or tail) of cells. Each cell stores data required to implement a closure, which is a function and its arguments, but a cell also has more information required for memory management, alternatives, reduced values, and a temporary pointer used during graph copying.

Here is exactly what is stored in the cell type (cell_t):

typedef bool (reduce_t)(cell_t **cell, type_rep_t type);
struct __attribute__((packed)) cell {
  reduce_t *func;
  cell_t *alt;
  cell_t *tmp;
  uint32_t n;
  uint16_t size;
  union {
    uint16_t type;
    uint16_t out;
  };
  union {
    /* unevaluated */
    cell_t *arg[3];
    /* reduced */
    struct __attribute__((packed)) {
      alt_set_t alt_set;
      union {
        intptr_t val[2]; /* value */
        cell_t *ptr[2];  /* list */
      };
    };
    /* unallocated */
    struct __attribute__((packed)) {
      cell_t *prev, *next;
    };
  };
} __attribute__((aligned(4)));

In the description of PoprC, the term closure is used to refer to a function which may be missing arguments, which can be reduced. Closures are stored in one or more cells.

cell_t is complicated, because closures have a life cycle with three stages, and to minimize memory usage, some parts of cells are reused to store different information in some stages.

The first stage is the unallocated cell; it only requires a previous (prev) and next (next) pointer. A way to indicate that it is unallocated is also useful to prevent errors, which is done by setting func to 0. When the runtime is initialized, it builds a ring of unallocated cells in a statically allocated array (cells). This could later be extended to allow dynamically growing and shrinking the memory pool, or if the code is embedded, all heap memory could be consumed by the cell ring at startup.

The next two stages share some common fields. The func field stores a pointer to the function that the closure will execute on reduction. alt stores the next alternative. tmp is a temporary pointer used to implement graph copying. n is a reference count. size indicates the size of the arg array, and consequently is one greater than the size of the val and ptr arrays.

The second stage is the unevaluated closure. In this stage, in addition to the common fields, the out field indicates how many of the arguments are outputs instead of inputs, and the arg array stores pointers to other cells, which are the arguments. In this stage, during building (when the graph is being constructed, before reduction), some arguments might be missing. Arguments are filled in from the left (c->arg[c->size - 1]), with the position of the next argument stored in c->arg[0]. Bit 0 of func is set to indicate that the closure is not ready.

The third stage is the reduced closure. In this stage, the type field indicates a type (and also has some other bits that store metadata), and alt_set is a bit field that controls the interaction of alternatives (which I will describe in another post.) The reduced cell could be either a vector of unboxed values stored in the val array, or a list, which stores pointers to its members in the ptr array.

With all that out of the way, you might wonder what happens if I want a list of more than two items. Multiple contiguous cells can be used together to extend the arg, val, and ptr arrays.
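How many cells a given closure or list needs follows from how many slots fit in the first cell and how many fit in each following cell. The function the runtime uses for this is not shown here, so the following is only a sketch under assumptions: cells_needed is a hypothetical name, and the slot counts are passed in as parameters rather than derived from the real cell_t layout, since they depend on pointer size and packing.

```c
#include <assert.h>

/* Number of cells needed to hold `slots` entries, assuming the first cell
 * has room for `first` entries after its header fields and every extra
 * cell is used entirely for `per_cell` more entries. */
int cells_needed(int slots, int first, int per_cell) {
  assert(slots >= 0 && first > 0 && per_cell > 0);
  if(slots <= first) return 1;
  int extra = slots - first;
  /* round up to whole cells */
  return 1 + (extra + per_cell - 1) / per_cell;
}
```

For example, if the first cell holds 3 arguments and each additional cell holds 7 more (numbers chosen purely for illustration), then 3 arguments fit in one cell, 4 to 10 arguments need two cells, and 11 arguments need three.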

So now you should understand cell_t and closures, upon which the runtime is built. In the next article in this series, I’ll explain the allocator and memory management functions in the runtime.

PoprC Code Explanation: Introduction and Overview

Dustin DeWeese — Wed, 30 Oct 2013 00:00:00 UT

I’ve been working on a compiler called PoprC for my programming language, Popr. It has been about a year since I started, so I want to explain how the compiler currently works to help clarify my ideas. It might also be interesting to others, and I hope to get some feedback.

The compiler consists of four main components: runtime (rt.c), evaluation (eval.c), predefined primitives (primitive.c), and LLVM code generation (llvm.cpp). The runtime provides code to be linked into compiled Popr code, to handle memory management and graph reduction. eval.c has code for parsing Popr expressions and displaying results, as well as functions to generate .dot files (graphing working memory) for debugging. It contains any code related to evaluation that is not required in the runtime. The predefined primitives in primitive.c form the basic building blocks for Popr programs. llvm.cpp handles tracing evaluation to generate LLVM IR that is currently JIT compiled.

The compiler is based on abstract interpretation; it is an interpreter that can also handle variables and placeholders. Variables represent a partially unknown value, such as an argument to a function. Placeholders are partially unknown functions. Functions continue to operate in the presence of variables and placeholders; if the result can’t be known given the arguments, variables and placeholders are used to denote the unknown portions in the resulting value. This allows the interpreter to partially evaluate functions. A tracing function can be hooked into the interpreter to convert operations on variables and placeholders into code implementing the partially applied function. The ability of the interpreter to execute alternatives allows all branches to be explored while generating code for full flow analysis.

Some examples might help illustrate how it works (you can try them online):

: +
?_2 <- arg(0)
?_3 <- arg(1)
?i3 <- type
?i2 <- type
?i1 <- ?3 ?2 add
 [ ?i1 ]

A + is entered at the prompt (user input is shown after the prompt ‘:’). First the interpreter produces arguments until the expression is complete. It creates ?_2 and ?_3 as the first and second arguments, respectively. The character after the ? denotes the type, an underscore (_) represents an unknown type, and i represents an integer. The + function restricts its arguments to integer type, producing the next two lines. Finally + is applied to its arguments, producing the add line, and the result is the variable ?i1.
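The way an operation either reduces to a constant or leaves a trace behind can be mimicked in a few lines. This is only a toy sketch of the idea, not how PoprC represents values or traces: the names (aval_t, constant, variable, abs_add, trace_len) and the printed format are hypothetical, and types and placeholders are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
  bool known; /* true for a constant, false for a variable */
  int val;    /* the constant's value, if known */
  int id;     /* the variable's number, if not known */
} aval_t;

int next_var = 1;
int trace_len = 0; /* number of trace lines emitted so far */

aval_t constant(int v) {
  return (aval_t){ .known = true, .val = v };
}

aval_t variable(void) {
  return (aval_t){ .known = false, .id = next_var++ };
}

void print_operand(aval_t a) {
  if(a.known) printf("%d", a.val);
  else printf("?%d", a.id);
}

/* add two abstract values: fold constants, otherwise emit a trace line */
aval_t abs_add(aval_t a, aval_t b) {
  if(a.known && b.known) return constant(a.val + b.val);
  aval_t r = variable();
  printf("?%d <- ", r.id);
  print_operand(a);
  printf(" ");
  print_operand(b);
  printf(" add\n");
  trace_len++;
  return r;
}
```

Adding two constants produces no trace at all, while adding a variable to anything yields a fresh variable and one trace line, which is the part that would have to become generated code.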

: popr swap drop
?_6 <- arg(0)
?f7 <- ?l6 head
?8 <- f[?7]
 [ ?_8 ]

A more complex example is the head function in lib.peg. It takes a list argument (?l6), extracts the function within (?f7), and applies it, producing a value of unknown type (?_8).

To understand more in depth how PoprC works, you will need to understand the details of the interpreter, which will be the topic of my next post.

A Quick Introduction to the Popr Programming Language

Dustin DeWeese — Tue, 29 Oct 2013 00:00:00 UT

Popr is a pure functional lazy post-fix concatenative programming language. I will briefly explain some of the features of Popr through some examples. You can try these examples online.

1 2 +   -->   [ 3 ]

+ is a function that takes two integers and returns one. There are several similar functions, with the same meaning as in C: -, *, <, <=, ==, >, >=. Booleans are currently represented as integers, non-zero for true, zero for false.

1 2 | 3 +         -->   [ { 4 | 5 } ]
2 5 | dup 2 - !   -->   [ 5 ]

| takes two values and creates two alternatives, where each value is returned. ! takes two values; if the second argument is 0, then it fails, otherwise the first value is returned. Alternatives and failure provide a more general branching mechanism than if/else, similar to speculative execution, or Prolog inference.

[ 1 2 + 3 4 + ] popr   -->   [ [ 1 2 + ] 7 ]
1 [ 2 + ] pushl        -->   [ [ 1 2 + ] ]
1 [ 2 + ] pushl popr   -->   [ [] 3 ]

[ ... ] denotes a quotation, which can be used for constructing functions or for aggregating values. popr pops the rightmost element from a quotation and forces evaluation. pushl takes its first argument and pushes it onto the left of the quotation. The two primitives can be combined to evaluate a quotation. Notice that Popr is lazy, and only reduces functions in quotations when required.

1 2 | dup   -->   { [ 1 1 ] | [ 2 2 ] }
1 2 swap    -->   [ 2 1 ]
1 2 drop    -->   [ 1 ]
1 2 | cut   -->   [ 1 ]

Some other primitives. dup duplicates a value; notice how it respects the constraints introduced by alternatives. swap swaps two values. drop drops a value. cut prunes alternatives after the first successful one. This should only be used to hint that only one unique alternative would succeed, rather than to prune otherwise successful execution paths. The type checker might eventually check this.