Friday, October 30, 2009

Performance update

Performance is now around TinyScheme levels (slightly above in some cases, slightly below in others); that's not really saying much, since TinyScheme is quite slow, but it's better than where Vesta used to be. I'll focus on cleaning up everything (but core evaluator & FFI take top priority), as well as performance. Hopefully I'll be able to get better performance in Vesta before I work on Enyo & Ceres.

Thursday, October 15, 2009

Semantic Overview

So, I was discussing a project with lisp-qix, and promised to give a quick write up of Digamma's semantics. So, let's dive right in.

Data types
Digamma hosts the normal set of scheme data types: vectors, lists, atoms, numbers (integers, rationals, (double) reals, & complex), strings, &c. The only additions to those data types are environments and dictionaries.

Environments are to be used for security in functions such as eval: if you don't know where the code you're using comes from, you may wish to evaluate it in a top-level environment that doesn't have I/O, or has specially restricted I/O functions; Environments allow you define & control evaluation programatically, all from a standard program. To do this, Digamma has special forms that interact with environments, such as define-environment, set-environment!, get-environment, clone-environment, &c. eval and load can both take environments as optional arguments, but you can use the with-environment form to evaluate code in a restricted place. This code is the basis for a W7 environment within Digamma, to be used for security-related code in end-user programs; basically, user code can be sandboxed off from everything else in the system (including, technically, itself: remove def, cset!, and set! and the environment could not even modify itself).

Dictionaries are similar to what is found in most contemporary programming languages:

{ test: "this" keys: "work }

The basic "production" is:

'{' ((STRING | KEYWORD | SYMBOL) SEXPR)* '}'

The reason that I've restricted the storage index to character strings is because I'm using tries as the back end data storage mechanism, due to my initial use of Digamma as a semantic analysis platform. Dictionaries have one primitive form that other collexions do not (explained below): has-key?, which simply tells if the key exists within the dictionary. There is one other primitive that I'd like to add which is partial-key?, which would return #t if all characters within the index match, #f if none match and, N (where N is a number) if there is a partial match.

Collexions
One issue that always irked me about Lisp systems is that while there is type unity (all productions are S-Expressions after all), there is no such thing as a collexion, which have common accesors. Now, to be fair, certain Lisps do fix this issue; Gauche has a collections library. However, I think this should be specified in the base system, since it aids in selecting the correct data structure for the task, whilst still allowing room to expand. For instance, you might use an a-list initially, then switch to a dictionary later, for speed or memory issues. To aid in this endeavor, Digamma has a core set of primitive forms that operate on all collexion types (this is another reason why I chose tries: since they have a total ordering, and can easily play the role of "associative arrays" in a map, filtre, &c.). These forms are:
  • nth, which is an accessor (nth collexion index)
  • first, which grabs the first element of a collexion, ala car
  • rest, which returns everything other than the first, in an non-destructive manner
  • cset!, which destructively set!s a collexion member
  • cupdate, which returns a shared structure with the collexion index modified
These work on all sequences, and have only minor restrictions for the sequence type (i.e. anything other than a char or an integer cannot be inserted into a string, for instance).

Closures
One thing I've said nothing about yet, but that is totally important to Scheme, is closures. First, closures in Digamma are written fn, but pronounced lambda. Second, they are quite a bit different than their Scheme cousins. First, the pathological:

((fn (x) (+ x x)) 3)

Returns '6', as you might have expected. Let's try something different:

(def y (fn (:opt (x 34))) (+ x x)))
(y) ; returns 68
(y 3) ; returns 6

That :opt keyword is inserted into the middle, because :body and :rest arguments can have optionally specified default values as well. It also includes a case variant (ala case-lambda), a SRFI-89 style lambda, and a typed generic lambda.

(def f (fn :case (0 => 1) (1 => 1) (n => (+ (f (- n 1)) (f (- n 2))))))

The match is Hope-style: changing the order would not effect the result, as the matcher attempts to match the most specific case.

Wrap Up
I'll write future posts on the other parts of Digamma (unix/posix integration, networking, FFI, modules, &c.), but this is a decent overview of the semantic changes to Scheme that I've made for Digamma.

Thursday, October 8, 2009

Cleaning House

So, I'm cleaning up & refactoring the "internals" of Digamma's reference implementation, Vesta. There are two evaluators now, one of which is "stackless". Continuations & tasklets capture the evaluation stack & can "replay" it at later points, but it's rather messy, and performance is behind tinyscheme, which is quite a trick, considering that tinyscheme is one of the slowest scheme systems out there. The stackless version is much faster than the C stack version, and closer in speed to tinyscheme, but that still leaves one order of magnitude between Vesta & established systems. I'll close the gap in the upcoming days with a refactor of the stackless core, and then it's just a spruce up of the syntax-rules pattern matcher and the FFI that is holding Vesta back from being released. By the end of the month, I hope to be pushing out labs that you can run at home...