%clay is version-controlled, referentially-transparent, and global.
While this filesystem is stored in
%clay, it is mirrored to Unix for
convenience. Unix tells
%clays whenever a file changes in the Unix
copy of the filesystem so that the change may be applied.
unix whenever an app or vane changes the filesystem so that the change
can be effected in Unix. Apps and vanes may use
%clay to write to the
filesystem, query it, and subscribe to changes in it. Ford and gall use
%clay to serve up apps and web pages.
%clay includes three components. First is the filesystem/version
control algorithms, which are mostly defined in
zuse. Second is the write, query, and subscription logic. Finally, there
is the logic for communicating requests to, and receiving requests from,
This section is up to date as of November 2020
|merge %destination-desk ~source-ship %source-desk |merge %destination-desk ~source-ship %source-desk, =gem %strategy |merge %destination-desk ~source-ship %source-desk, =cas ud+5
Clay supports various merge strategies. A "commit" is a snapshot of
the files with a list of parents plus a date. Most commits have
one parent; a "merge" commit is a commit with two parents. The
%home desk starts with an initial commit with no parents; commits
with several parents ("octopus merges") are possible but we don't
generate them right now.
Unless otherwise specified, all of the following create a new commit with the source and destination commits as parents.
Several strategies need a "merge-base". They find it by identifying the most recent common ancestor of the two desks. If none, fail with
%merge-no-merge-base; if there are two or more, pick one.
%init: the only way to create a desk. Not a true merge, since it
simply assigns the source commit to the destination.
%fine: if source or destination are in the ancestry of each other,
use the newer one; else abort. If the destination is ahead of the
source, succeed but do nothing. If the source is ahead of the
destination, assign the next revision number to the source commit.
Some call this "fast-forward".
%meet: combine changes, failing if both sides changed the same file.
Specifically, take diff(merge-base,source) and
diff(merge-base,destination) and combine them as long as those diffs
touch different files.
%mate: combine changes, failing if both sides changed the same part
of a file. Identical to
%meet, except that some marks, like
allow intelligent merge of changes to different parts of a file.
%meld: combine changes; if both sides changed the same part of a
file, use the version of the file in the merge-base.
%only-this: create a merge commit with exactly the contents of the
%only-that: create a merge commit with exactly the contents of the
%take-this: create a merge commit with exactly the contents of the
destination desk except take any files from the source commit which
are not in the destination desk.
%take-that: create a merge commit with exactly the contents of the
source commit except preserve any files from the destination desk
which are not in the source commit.
%meet-this: merge as in
%meet, except if both sides changed the same
file, use the version in the destination desk.
%meet-that: merge as in
%meet, except if both sides changed the same
file, use the version in the source commit.
Examples and notes:
The most common merge strategy is %mate, which is a normal 3-way
merge which aborts on conflict.
%take-that is useful to "force" an OTA. After running
you're guaranteed to have exactly the files in the source commit plus
any files you separately added.
We speak of merging into a destination desk from a source commit because while you can only merge on top of a desk, you can merge from historical commits. For example,
|merge %old our %home, =cas ud+5, =gem %init
will create a new desk called
%old with the 5th commit in
You can revert the contents of a desk to what they were yesterday
|merge %home our %home, =cas da+(sub now ~d1), =gem %only-that
Note this is a normal
%only-that merge, which means you're creating a
new commit with the old contents.
%meld is rarely used on its own, however if you specify
omit the merge strategy,
%kiln will run a
%meld merge into a scratch
desk and then annotate the conflicts there.
If you resolve merge conflicts manually, for example by mounting the desks, copying the files in Unix and then running
should usually run an
%only-this merge. This will not change the
newly-fixed contents of your desk, but it will record that the merge
happened so that those conflicts don't reappear in later merges.
If you get a
%merge-no-merge-base error, this means you're trying to
merge two desks which have no common ancestors. You need to give
them a common ancestor by choosing a merge strategy which doesn't
need a merge-base, like
%take-this could be useful to install 3rd party software, but you
couldn't get subsequent updates this way, since the files would
already exist in the destination desk. Something like "take only
the files which aren't in my OTA source or any other 3rd party app"
would be basically correct. This would require a parameter listing
the desks to not conflict with.
%meet-that imply the existence of
%mate-that, but those don't exist yet.
The remainder of this document is out of date as of November 2020, but certain parts are still accurate. Please independently confirm information here before acting on it.
Clay is the primary filesystem for the arvo operating system, which is the core of an urbit. The architecture of clay is intrinsically connected with arvo, but we assume no knowledge of either arvo or urbit. We will point out only those features of arvo that are necessary for an understanding of clay, and we will do so only when they arise.
The first relevant feature of arvo is that it is a deterministic system where input and output are defined as a series of events and effects. The state of arvo is simply a function of its event log. None of the effects from an event are emitted until the event is entered in the log and persisted, either to disk or another trusted source of persistence, such as a Kafka cluster. Consequently, arvo is a single-level store: everything in its state is persistent.
In a more traditional OS, everything in RAM can be erased at any time by power failure, and is always erased on reboot. Thus, a primary purpose of a filesystem is to ensure files persist across power failures and reboots. In arvo, both power failures and reboots are special cases of suspending computation, which is done safely since our event log is already persistent. Therefore, clay is not needed in arvo for persistence. Why, then, do we have a filesystem? There are two answers to this question.
First, clay provides a filesystem tree, which is a convenient user interface for some applications. Unix has the useful concept of virtual filesystems, which are used for everything from direct access to devices, to random number generators, to the /proc tree. It is easy and intuitive to read from and write to a filesystem tree.
Second, clay has a distributed revision-control system baked into it. Traditional filesystems are not revision controlled, so userspace software -- such as git -- is written on top of them to do so. clay natively provides the same functionality as modern DVCSes, and more.
clay has two other unique properties that we'll cover later on: it supports typed data and is referentially transparent.
Every urbit has one or more "desks", which are independently revision-controlled branches. Each desk contains its own mark definitions, apps, doc, and so forth.
Traditionally, an urbit has at least a base and a home desk. The base desk has all the system software from the distribution. the home desk is a fork of base with all the stuff specific to the user of the urbit.
A desk is a series of numbered commits, the most recent of which represents the current state of the desk. A commit is composed of (1) an absolute time when it was created, (2) a list of zero or more parents, and (3) a map from paths to data.
Most commits have exactly one parent, but the initial commit on a desk may have zero parents, and merge commits have more than one parent.
The non-meta data is stored in the map of paths to data. It's worth noting that no constraints are put on this map, so, for example, both /a/b and /a/b/c could have data. This is impossible in a traditional Unix filesystem since it means that /a/b is both a file and a directory. Conventionally, the final element in the path is its mark -- much like a filename extension in Unix. Thus, /doc/readme.md in Unix is stored as /doc/readme/md in urbit.
The data is not stored directly in the map; rather, a hash of the data is stored, and we maintain a master blob store. Thus, if the same data is referred to in multiple commits (as, for example, when a file doesn't change between commits), only the hash is duplicated.
In the master blob store, we either store the data directly, or else we store a diff against another blob. The hash is dependent only on the data within and not on whether or not it's stored directly, so we may on occasion rearrange the contents of the blob store for performance reasons.
Recall that a desk is a series of numbered commits. Not every commit in a desk must be numbered. For example, if the base desk has had 50 commits since home was forked from it, then a merge from base to home will only add a single revision number to home, although the full commit history will be accessible by traversing the parentage of the individual commits.
We do guarantee that the first commit is numbered 1, commits are numbered consecutively after that (i.e. there are no "holes"), the topmost commit is always numbered, and every numbered commit is an ancestor of every later numbered commit.
There are three ways to refer to particular commits in the revision history. Firstly, one can use the revision number. Secondly, one can use any absolute time between the one numbered commit and the next (inclusive of the first, exclusive of the second). Thirdly, every desk has a map of labels to revision numbers. These labels may be used to refer to specific commits.
Additionally, clay is a global filesystem, so data on other urbit is easily accessible the same way as data on our local urbit. In general, the path to a particular revision of a desk is /~urbit-name/desk-name/revision. Thus, to get /try/readme/md from revision 5 of the home desk on ~sampel-sipnym, we refer to /~sampel-sipnym/home/5/try/readme/md. Clay's namespace is thus global and referentially transparent.
XXX reactivity here?
A Typed Filesystem
Since clay is a general filesystem for storing data of arbitrary types, in order to revision control correctly it needs to be aware of types all the way through. Traditional revision control does an excellent job of handling source code, so for source code we act very similar to traditional revision control. The challenge is to handle other data similarly well.
For example, modern VCSs generally support "binary files", which are files for which the standard textual diffing, patching, and merging algorithms are not helpful. A "diff" of two binary files is just a pair of the files, "patching" this diff is just replacing the old file with the new one, and "merging" non-identical diffs is always a conflict, which can't even be helpfully annotated. Without knowing anything about the structure of a blob of data, this is the best we can do.
Often, though, "binary" files have some internal structure, and it is possible to create diff, patch, and merge algorithms that take advantage of this structure. An image may be the result of a base image with some set of operations applied. With algorithms aware of this set of operations, not only can revision control software save space by not having to save every revision of the image individually, these transformations can be made on parallel branches and merged at will.
Suppose Alice is tasked with touching up a picture, improving the color balance, adjusting the contrast, and so forth, while Bob has the job of cropping the picture to fit where it's needed and adding textual overlay. Without type-aware revision control, these changes must be made serially, requiring Alice and Bob to explicitly coordinate their efforts. With type-aware revision control, these operations may be performed in parallel, and then the two changesets can be merged programmatically.
Of course, even some kinds of text files may be better served by diff, patch, and merge algorithms aware of the structure of the files. Consider a file containing a pretty-printed JSON object. Small changes in the JSON object may result in rather significant changes in how the object is pretty-printed (for example, by addding an indentation level, splitting a single line into multiple lines).
A text file wrapped at 80 columns also reacts suboptimally with unadorned Hunt-McIlroy diffs. A single word inserted in a paragraph may push the final word or two of the line onto the next line, and the entire rest of the paragraph may be flagged as a change. Two diffs consisting of a single added word to different sentences may be flagged as a conflict. In general, prose should be diffed by sentence, not by line.
As far as the author is aware, clay is the first generalized, type-aware revision control system. We'll go into the workings of this system in some detail.
Central to a typed filesystem is the idea of types. In clay, we call these "marks". A mark is a file that defines a type, conversion routines to and from the mark, and diff, patch, and merge routines.
For example, a
%txt mark may be a list of lines of text, and it
may include conversions to
%mime to allow it to be serialized
and sent to a browswer or to the unix filesystem. It will also
include Hunt-McIlroy diff, patch, and merge algorithms.
%json mark would be defined as a json object in the code, and
it would have a parser to convert from
%txt and a printer to
convert back to
%txt. The diff, patch, and merge algorithms are
fairly straightforward for json, though they're very different
from the text ones.
More formally, a mark is a core with three arms,
++grab is a series of functions to
convert from other marks to the given mark. In
++grow is a
series of functions to convert from the given mark to other
The types are as follows, in an informal pseudocode:
++ grab: ++ mime: <mime> -> <mark-type> ++ txt: <txt> -> <mark-type> ... ++ grow: ++ mime: <mark-type> -> <mime> ++ txt: <mark-type> -> <txt> ... ++ grad ++ diff: (<mark-type>, <mark-type>) -> <diff-type> ++ pact: (<mark-type>, <diff-type>) -> <mark-type> ++ join: (<diff-type>, <diff-type>) -> <diff-type> or NULL ++ mash: (<diff-type>, <diff-type>) -> <diff-type>
These types are basically what you would expect. Not every mark has each of these functions defined -- all of them are optional in the general case.
In general, for a particular mark, the
(if they exist) should be inverses of each other.
++diff takes two instances of a mark and produces
a diff of them.
++pact takes an instance of a mark and patches
it with the given diff.
++join takes two diffs and attempts to
merge them into a single diff. If there are conflicts, it
++mash takes two diffs and forces a merge,
annotating any conflicts.
In general, if
++diff called with A and B produces diff D, then
++pact called with A and D should produce B. Also, if
of two diffs does not produce null, then
++mash of the same
diffs should produce the same result.
Alternately, instead of
++mash, a mark can provide the same functionality by defining
++sted to be the name of another mark to which we wish to
delegate the revision control responsibilities. Then, before
running any of those functions, clay will convert to the other
mark, and convert back afterward. For example, the
is revision-controlled in the same way as
%txt, so its
++sted %txt. Of course,
++txt must be defined in
++grab as well.
Every file in clay has a mark, and that mark must have a
++grad. Marks are used for more than just
clay, and other marks don't need a
++grad, but if a piece of
data is to be saved to clay, we must know how to revision-control
Additionally, if a file is to be synced out to unix, then it must
have conversion routines to and from the
Reading and Subscribing
When reading from Clay, there are three types of requests. A
%sing request asks for data at single revsion. A
request asks to be notified the next time there's a change to
given file. A
%many request asks to be notified on every
change in a desk for a range of changes.
%next, there are generally three things to be
%u request simply checks for the existence of a
file at a path. A
%x request gets the data in the file at a
%y request gets a hash of the data in the file at the
path combined with all its children and their data. Thus,
of a node changes if it or any of its children change.
%sing request is fulfilled immediately if possible. If the
requested revision is in the future, or is on another ship for
which we don't have the result cached, we don't respond
immediately. If the requested revision is in the future, we wait
until the revision happens before we respond to the request. If
the request is for data on another ship, we pass on the request
to the other ship. In general, Clay subscriptions, like most
things in Urbit, aren't guaranteed to return immediately.
They'll return when they can, and they'll do so in a
referentially transparent manner.
%next request checks query at the given revision, and it
produces the result of the query the next time it changes, along
with the revsion number when it changes. Thus, a
%next of a
%u is triggered when a file is added or deleted, a
%next of a
%x is triggered when a file is added, deleted, or changed, and
%next of a
%y is triggered when a file or any of its
children is added, deleted, or changed.
%many request is triggered every time the given desk has a
new revision. Unlike a
%many has both a start and
an end revsion, after which it stops returning. For
single change is reported, and if the caller wishes to hear of
the next change, it must resubscribe. For
%many, every revsion
from the start to the end triggers a response. Since a
request doesn't ask for any particular data, there aren't
%y versions for it.
One of the primary functions of clay is as a convenient user interface. While tools exist to use clay from within urbit, it's often useful to be able to treat clay like any other filesystem from the Unix perspective -- to "mount" it, as it were.
From urbit, you can run
|mount /path/to/directory %mount-point,
and this will mount the given clay directory to the mount-point
directory in Unix. Every file is converted to
%mime before it's
written to Unix, and converted back when read from Unix. The
entire directory is watched (a la Dropbox), and every change is
committed once you run
Merging is a fundamental operation for a distributed revision control system. At their root, clay's merges are similar to git's, but with some additions to accomodate typed data. There are seven different merge strategies.
Throughout our discussion, we'll say that the merge is from Alice's desk to Bob's. Recall that a commit is a date (for all new commits this will be the current date), a list of parents, and the data itself.
%init merge should be used iff it's the first commit to a
desk. The head of Alice's desk is used as the number 1 commit to
Bob's desk. Obviously, the ancestry remains intact through
traversing the parentage of the commit even though previous
commits are not numbered for Bob's desk.
%this merge means to keep what's in Bob's desk, but join the
ancestry. Thus, the new commit has the head of each desk as
parents, but the data is exactly what's in Bob's desk. For those
following along in git, this is the 'ours' merge strategy, not
the '--ours' option to the 'recursive' merge strategy. In other
words, even if Alice makes a change that does not conflict with
Bob, we throw it away. It's Bob's way or the highway.
%that merge means to take what's in Alice's desk, but join
the ancestry. This is the reverse of
%fine merge is a "fast-forward" merge. This succeeds iff one
head is in the ancestry of the other. In this case, we use the
descendant as our new head.
%meld merges, we first find the most
recent common ancestor to use as our merge base. If we have no
common ancestors, then we fail. If we have more than one most
recent common ancestor, then we have a criss-cross situation,
which should be handled delicately. At present, we delicately
throw up our hands and give up, but something akin to git's
'recursive' strategy should be implemented in the future.
There's a functional inclusion ordering on
%meld such that if an earlier strategy would have
succeeded, then every later strategy will produce the same
result. Put another way, every earlier strategy is the same as
every later strategy except with a restricted domain.
%meet merge only succeeds if the changes from the merge base
to Alice's head (hereafter, "Alice's changes") are in different
files than Bob's changes. In this case, the parents are both
Alice's and Bob's heads, and the data is the merge base plus
Alice's changed files plus Bob's changed files.
%mate merge attempts to merge changes to the same file when
both Alice and bob change it. If the merge is clean, we use it;
otherwise, we fail. A merge between different types of changes --
for example, deleting a file vs changing it -- is always a
conflict. If we succeed, the parents are both Alice's and Bob's
heads, and the data is the merge base plus Alice's changed files
plus Bob's changed files plus the merged files.
%meld merge will succeed even if there are conflicts. If
there are conflicts in a file, then we use the merge base's
version of that file, and we produce a set of files with
conflicts. The parents are both Alice's and Bob's heads, and the
data is the merge base plus Alice's changed files plus Bob's
changed files plus the successfully merged files plus the merge
base's version of the conflicting files.
That's the extent of the merge options in clay proper. In
userspace there's a final option
%auto, which is the most
%auto checks to see if Bob's desk exists, and if it
doesn't we use a
%init merge. Otherwise, we progressively try
%mate until one succeeds.
If none succeed, we merge Bob's desk into a scratch desk. Then,
we merge Alice's desk into the scratch desk with the
option to force the merge. For each file in the produced set of
conflicting files, we call the
++mash function for the
appropriate mark, which annotates the conflicts if we know how.
Finally, we display a message to the user informing them of the
scratch desk's existence, which files have annotated conflicts,
and which files have unannotated conflicts. When the user has
resolved the conflicts, they can merge the scratch desk back into
Bob's desk. This will be a
%fine merge since Bob's head is in
the ancestry of the scratch desk.
Tracking and staying in sync with another desk is another fundamental operation. We call this "autosync". This doesn't mean simply mirroring a desk, since that wouldn't allow local changes. We simply want to apply changes as they are made upstream, as long as there are no conflicts with local changes.
This is implemented by watching the other desk, and, when it has changes, merging these changes into our desk with the usual merge strategies.
Note that it's quite reasonable for two desks to be autosynced to each other. This results in any change on one desk being mirrored to the other and vice versa.
Additionally, it's fine to set up an autosync even if one desk, the other desk, or both desks do not exist. The sync will be activated when the upstream desk comes into existence and will create the downstream desk if needed.
The first part of this section will be reference documentation for the data types used by our filesystem. In fact, as a general guide, we recommend reading and attempting to understand the data structures used in any Hoon code before you try to read the code itself. Although complete understanding of the data structures is impossible without seeing them used in the code, an 80% understanding greatly clarifies the code. As another general guide, when reading Hoon, it rarely pays off to understand every line of code when it appears. Try to get the gist of it, and then move on. The next time you come back to it, it'll likely make a lot more sense.
After a description of the data models, we'll give an overview of the interface that vanes and applications can use to interact with the filesystem.
Finally, we'll dive into the code and the algorithms themselves. You know, the fun part.
As you're reading through this section, remember you can always come back to this when you run into these types later on. You're not going to remember everything the first time through, but it is worth reading, or at least skimming, this so that you get a rough idea of how our state is organized.
The types that are certainly worth reading are
(possibly in that order). All in all, though, this section isn't too
long, so many readers may wish to quickly read through all of it. If you
get bored, though, just skip to the next section. You can always come
back when you need to.
++raft, formal state
++ raft :: filesystem $: fat=(map ship room) :: domestic hoy=(map ship rung) :: foreign ran=rang :: hashes == ::
This is the state of our vane. Anything that must be remembered between calls to clay is stored in this state.
fat is the set of domestic servers. This stores all the information
that is specfic to a particular ship on this pier. The keys to this map
are the ships on the current pier. all the information that is specific
to a particular foreign ship. The keys to this map are all the ships
whose filesystems we have attempted to access through clay.
ran is the store of all commits and deltas, keyed by hash. The is
where all the "real" data we know is stored; the rest is "just
++room, filesystem per domestic ship
++ room :: fs per ship $: hun=duct :: terminal duct hez=(unit duct) :: sync duch dos=(map desk dojo) :: native desk == ::
This is the representation of the filesystem of a ship on our pier.
hun is the duct we use to send messages to dill to display
notifications of filesystem changes. Only
%note gifts should be
produced along this duct. This is set by the
hez, if present, is the duct we use to send sync messages to unix so
that they end up in the pier unix directory. Only
%ergo gifts should
be producd along this duct. This is set by
dos is a well-known operating system released in 1981. It is also the
set of desks on this ship, mapped to their data.
++desk, filesystem branch
++ desk ,@tas :: ship desk case spur
This is the name of a branch of the filesystem. The default desks are "arvo", "main", and "try". More may be created by simply referencing them. Desks have independent histories and states, and they may be merged into each other.
++dojo, domestic desk state
++ dojo ,[p=cult q=dome] :: domestic desk state
This is the all the data that is specific to a particular desk on a
p is the set of subscribers to this desk and
q is the
data in the desk.
++ cult (map duct rave) :: subscriptions
This is the set of subscriptions to a particular desk. The keys are the ducts from where the subscriptions requests came. The results will be produced along these ducts. The values are a description of the requested information.
++rave:clay, general subscription request
++ rave :: general request $% [& p=mood] :: single request [| p=moat] :: change range == ::
This represents a subscription request for a desk. The request can be for either a single item in the desk or else for a range of changes on the desk.
++rove, stored general subscription request
++ rove (each mood moot) :: stored request
When we store a request, we store subscriptions with a little extra information so that we can determine whether new versions actually affect the path we're subscribed to.
++mood:clay, single subscription request
++ mood ,[p=care q=case r=path] :: request in desk
This represents a request for the state of the desk at a particular
commit, specfied by
p specifies what kind of information is
r specifies the path we are requesting.
++moat:clay, range subscription request
++ moat ,[p=case q=case r=path] :: change range
This represents a request for all changes between
q on path
r. You will be notified when a change is made to the node referenced
by the path or to any of its children.
++moot, stored range subscription request
++ moot ,[p=case q=case r=path s=(map path lobe)] ::
This is just a
++moat:clay plus a map of paths to lobes. This map
represents the data at the node referenced by the path at case
we've gotten to that case (else null). We only send a notification along
the subscription if the data at a new revision is different than it was.
++care:clay, clay submode
++ care ?(%u %v %w %x %y %z) :: clay submode
This specifies what type of information is requested in a subscription or a scry.
%u requests the
++rang:clay at the current moment. Because this
information is not stored for any moment other than the present, we
crash if the
++case:clay is not a
%da for now.
%v requests the
++dome:clay at the specified commit.
%w requests the revsion number of the desk.
%x requests the file at a specified path at the specified commit. If
there is no node at that path or if the node has no contents (that is,
q:ankh is null), then this produces null.
%y requests a
++arch of the specfied commit at the specified path.
%z requests the
++ankh of the specified commit at the specfied path.
++arch, shallow filesystem node
++ arch ,[p=@uvI q=(unit ,@uvI) r=(map ,@ta ,~)] :: fundamental node
This is analogous to
++ankh:clay except that the we have neither our
contents nor the ankhs of our children. The other fields are exactly the
p is a hash of the associated ankh,
u.q, if it exists, is a
hash of the contents of this node, and the keys of
r are the names of
r is a map to null rather than a set so that the
ordering of the map will be equivalent to that of
++case:clay, specifying a commit
++ case :: ship desk case spur $% [%da p=@da] :: date [%tas p=@tas] :: label [%ud p=@ud] :: number == ::
A commit can be referred to in three ways:
%da refers to the commit
that was at the head on date
%tas refers to the commit labeled
%ud refers to the commit numbered
p. Note that since these
all can be reduced down to a
%ud, only numbered commits may be
referenced with a
++dome:clay, desk data
++ dome :: project state $: ang=agon :: pedigree ank=ankh :: state let=@ud :: top id hit=(map ,@ud tako) :: changes by id lab=(map ,@tas ,@ud) :: labels == ::
This is the data that is actually stored in a desk.
ang is unused and should be removed.
ank is the current state of the desk. Thus, it is the state of the
filesystem at revison
let. The head of a desk is always a numbered
let is the number of the most recently numbered commit. This is also
the total number of numbered commits.
hit is a map of numerical ids to hashes of commits. These hashes are
mapped into their associated commits in
hut:rang:clay. In general, the keys
of this map are exactly the numbers from 1 to
let, with no gaps. Of
course, when there are no numbered commits,
let is 0, so
null. Additionally, each of the commits is an ancestor of every commit
numbered greater than this one. Thus, each is a descendant of every
commit numbered less than this one. Since it is true that the date in
each commit (
t:yaki) is no earlier than that of each of its parents,
the numbered commits are totally ordered in the same way by both
pedigree and date. Of course, not every commit is numbered. If that
sounds too complicated to you, don't worry about it. It basically
behaves exactly as you would expect.
lab is a map of textual labels to numbered commits. Note that labels
can only be applied to numbered commits. Labels must be unique across a
++ankh, filesystem node
++ ankh :: fs node (new) $: p=cash :: recursive hash q=(unit ,[p=cash q=*]) :: file r=(map ,@ta ankh) :: folders == ::
This is a single node in the filesystem. This may be file or a directory or both. In earth filesystems, a node is a file xor a directory. On mars, we're inclusive, so a node is a file ior a directory.
p is a recursive hash that depends on the contents of the this file or
directory and on any children.
q is the contents of this file, if any.
p.q is a hash of the
q.q is the data itself.
r is the set of children of this node. In the case of a pure file,
this is empty. The keys are the names of the children and the values
are, recursively, the nodes themselves.
++cash, ankh hash
++ cash ,@uvH :: ankh hash
This is a 128-bit hash of an ankh. These are mostly stored within ankhs themselves, and they are used to check for changes in possibly-deep hierarchies.
++rung, filesystem per neighbor ship
++ rung $: rus=(map desk rede) :: neighbor desks == ::
This is the filesystem of a neighbor ship. The keys to this map are all the desks we know about on their ship.
++rede, desk state
++ rede :: universal project $: lim=@da :: complete to qyx=cult :: subscribers ref=(unit rind) :: outgoing requests dom=dome :: revision state == ::
This is our knowledge of the state of a desk, either foreign or domestic.
lim is the date of the last full update. We only respond to requests
for stuff before this time.
qyx is the list of subscribers to this desk. For domestic desks, this
p:dojo, all subscribers to the desk, while in foreign desks
this is all the subscribers from our ship to the foreign desk.
ref is the request manager for the desk. For domestic desks, this is
null since we handle requests ourselves.
dom is the actual data in the desk.
++rind, request manager
++ rind :: request manager $: nix=@ud :: request index bom=(map ,@ud ,[p=duct q=rave]) :: outstanding fod=(map duct ,@ud) :: current requests haw=(map mood (unit)) :: simple cache == ::
This is the request manager for a foreign desk.
nix is one more than the index of the most recent request. Thus, it is
the next available request number.
bom is the set of outstanding requests. The keys of this map are some
subset of the numbers between 0 and one less than
nix. The members of
the map are exactly those requests that have not yet been fully
fod is the same set as
bom, but from a different perspective. In
particular, the values of
fod are the same as the values of
p out of the values of
bom are the same as the keys of
Thus, we can map ducts to their associated request number and
and we can map numbers to their associated duct and
haw is a map from simple requests to their values. This acts as a
cache for requests that have already been made. Thus, the second request
for a particular
++mood:clay is nearly instantaneous.
++rang:clay, data store
++ rang $: hut=(map tako yaki) :: lat=(map lobe blob) :: == ::
This is a set of data keyed by hash. Thus, this is where the "real" data is stored, but it is only meaningful if we know the hash of what we're looking for.
hut is a map from hashes to commits. We often get the hashes from
hit:dome:clay, which keys them by logical id. Not every commit has an id.
lat is a map from hashes to the actual data. We often get the hashes
++yaki, a commit, which references this map to get the data.
There is no
++blob:clay in any
++yaki:clay. They are only accessible through
++tako:clay, commit reference
++ tako ,@ :: yaki ref
This is a hash of a
++yaki:clay, a commit. These are most notably used as
the keys in
hut:rang:clay, where they are associated with the actual
++yaki:clay, and as the values in
hit:dome:clay, where sequential ids are
associated with these.
++ yaki ,[p=(list tako) q=(map path lobe) r=tako t=@da] :: commit
This is a single commit.
p is a list of the hashes of the parents of this commit. In most
cases, this will be a single commit, but in a merge there may be more
parents. In theory, there may be an arbitrary number of parents, but in
practice merges have exactly two parents. This may change in the future.
For commit 1, there is no parent.
q is a map of the paths on a desk to the data at that location. If you
understand what a
++lobe:clay and a
++blob:clay is, then the type signature
here tells the whole story.
r is the hash associated with this commit.
t is the date at which this commit was made.
++lobe:clay, data reference
++ lobe ,@ :: blob ref
This is a hash of a
++blob:clay. These are most notably used in
where they are associated with the actual
++blob:clay, and as the values in
q:yaki:clay, where paths are associated with their data in a commit.
++ blob $% [%delta p=lobe q=lobe r=udon] :: delta on q [%direct p=lobe q=* r=umph] :: [%indirect p=lobe q=* r=udon s=lobe] :: == ::
This is a node of data. In every case,
p is the hash of the blob.
%delta is the case where we define the data by a delta on other data.
In practice, the other data is always the previous commit, but nothing
depends on this.
q is the hash of the parent blob, and
r is the
%direct is the case where we simply have the data directly.
q is the
data itself, and
r is any preprocessing instructions. These almost
always come from the creation of a file.
%indirect is both of the preceding cases at once.
q is the direct
r is the delta, and
s is the parent blob. It should always be
the case that applying
s gives the same data as
(with the prepreprocessor instructions in
p.r). This exists purely for
performance reasons. This is unused, at the moment, but in general these
should be created when there are a long line of changes so that we do
not have to traverse the delta chain back to the creation of the file.
++udon, abstract delta
++ udon :: abstract delta $: p=umph :: preprocessor $= q :: patch $% [%a p=* q=*] :: trivial replace [%b p=udal] :: atomic indel [%c p=(urge)] :: list indel [%d p=upas q=upas] :: tree edit == :: == ::
This is an abstract change to a file. This is a superset of what would normally be called diffs. Diffs usually refer to changes in lines of text while we have the ability to do more interesting deltas on arbitrary data structures.
p is any preprocessor instructions.
%a refers to the trival delta of a complete replace of old data with
%b refers to changes in an opaque atom on the block level. This has
very limited usefulness, and is not used at the moment.
%c refers to changes in a list of data. This is often lines of text,
which is your classic diff. We, however, will work on any list of data.
%d refers to changes in a tree of data. This is general enough to
describe changes to any hoon noun, but often more special-purpose delta
should be created for different content types. This is not used at the
moment, and may in fact be unimplemented.
++urge, list change
++ urge |*(a=_,* (list (unce a))) :: list change
This is a parametrized type for list changes. For example,
is a list change for lines of text.
++unce, change part of a list.
++ unce |* a=_,* :: change part $% [%& p=@ud] :: skip[copy] [%| p=(list a) q=(list a)] :: p -> q[chunk] == ::
This is a single change in a list of elements of type
a. For example,
(unce ,@t) is a single change in a lines of text.
%& means the next
p lines are unchanged.
%| means the lines
p have changed to
++umph, preprocessing information
++ umph :: change filter $| $? %a :: no filter %b :: jamfile %c :: LF text == :: $% [%d p=@ud] :: blocklist == ::
This space intentionally left undocumented. This stuff will change once we get a well-typed clay.
++upas, tree change
++ upas :: tree change (%d) $& [p=upas q=upas] :: cell $% [%0 p=axis] :: copy old [%1 p=*] :: insert new [%2 p=axis q=udon] :: mutate! == ::
This space intentionally left undocumented. This stuff is not known to work, and will likely change when we get a well-typed clay. Also, this is not a complicated type; it is not difficult to work out the meaning.
++nori:clay, repository action
++ nori :: repository action $% [& q=soba] :: delta [| p=@tas] :: label == ::
This describes a change that we are asking clay to make to the desk. There are two kinds of changes that may be made: we can modify files or we can apply a label to a commit.
| case, we will simply label the current commit with the given
label. In the
& case, we will apply the given changes.
++ soba ,[p=cart q=(list ,[p=path q=miso])] :: delta
This describes a set of changes to make to a desk. The
cart is simply
a pair of the old hash and the new hash of the desk. The list is a list
of changes keyed by the file they're changing. Thus, the paths are paths
to files to be changed while
miso is a description of the change
++miso:clay, ankh delta
++ miso :: ankh delta $% [%del p=*] :: delete [%ins p=*] :: insert [%mut p=udon] :: mutate == ::
There are three kinds of changes that may be made to a node in a desk.
We can insert a file, in which case
p is the contents of the new file.
We can delete a file, in which case
p is the contents of the old file.
Finally, we can mutate that file, in which case the
udon describes the
changes we are applying to the file.
++mizu:clay, merged state
++ mizu ,[p=@u q=(map ,@ud tako) r=rang] :: new state
This is the input to the
%merg kiss, which allows us to perform a
p is the number of the new head commit. The
q is a map
from numbers to commit hashes. This is all the new numbered commits that
are to be inserted. The keys to this should always be the numbers from
let.dom plus one to
p, inclusive. The
r is the maps of all the new
commits and data. Since these are merged into the current state, no old
commits or data need be here.
++ riff ,[p=desk q=(unit rave)] :: request/desist
This represents a request for data about a particular desk. If
rave, then this opens a subscription to the desk for that
q is null, then this tells clay to cancel the subscription
along this duct.
++ riot (unit rant) :: response/complete
A riot is a response to a subscription. If null, the subscription has
been completed, and no more responses will be sent. Otherwise, the
rant is the produced data.
++rant:clay, response data
++ rant :: namespace binding $: p=[p=care q=case r=@tas] :: clade release book q=path :: spur r=* :: data == ::
This is the data at a particular node in the filesystem.
the type of data that was requested (and is produced).
q.p gives the
specific version reported (since a range of versions may be requested in
r.p is the desk.
q is the path to the filesystem
r is the data itself (in the format specified by
++nako, subscription response data
++ nako $: gar=(map ,@ud tako) :: new ids let=@ud :: next id lar=(set yaki) :: new commits bar=(set blob) :: new content == ::
This is the data that is produced by a request for a range of revisions
of a desk. This allows us to easily keep track of a remote repository --
all the new information we need is contained in the
gar is a map of the revisions in the range to the hash of the commit
at that revision. These hashes can be used with
hut:rang:clay to find the
let is either the last revision number in the range or the most recent
revision number, whichever is smaller.
lar is the set of new commits, and
bar is the set of new content.
As with all vanes, there are exactly two ways to interact with clay.
%clay exports a namespace accessible through
.^, which is described
++care:clay. The primary way of interacting with clay, though,
is by sending kisses and receiving gifts.
++ gift :: out result <-$ $% [%ergo p=@p q=@tas r=@ud] :: version update [%note p=@tD q=tank] :: debug message [%writ p=riot] :: response == :: ++ kiss :: in request ->$ $% [%info p=@p q=@tas r=nori] :: internal edit [%ingo p=@p q=@tas r=nori] :: internal noun edit [%init p=@p] :: report install [%into p=@p q=@tas r=nori] :: external edit [%invo p=@p q=@tas r=nori] :: external noun edit [%merg p=@p q=@tas r=mizu] :: internal change [%wart p=sock q=@tas r=path s=*] :: network request [%warp p=sock q=riff] :: file request == ::
There are only a small number of possible kisses, so it behooves us to describe each in detail.
$% [%info p=@p q=@tas r=nori] :: internal edit [%into p=@p q=@tas r=nori] :: external edit
These two kisses are nearly identical. At a high level, they apply
changes to the filesystem. Whenever we add, remove, or edit a file, one
of these cards is sent. The
p is the ship whose filesystem we're
trying to change, the
q is the desk we're changing, and the
r is the
request change. For the format of the requested change, see the
When a file is changed in the unix filesystem, vere will send a
kiss. This tells clay that the duct over which the kiss was sent is the
duct that unix is listening on for changes. From within Arvo, though, we
should never send a
%into kiss. The
%info kiss is exactly identical
except it does not reset the duct.
[%ingo p=@p q=@tas r=nori] :: internal noun edit [%invo p=@p q=@tas r=nori] :: external noun edit
These kisses are currently identical to
%into, though this
will not always be the case. The intent is for these kisses to allow
typed changes to clay so that we may store typed data. This is currently
[%init p=@p] :: report install
Init is called when a ship is started on our pier. This simply creates a
room to go into our
raft. Essentially, this initializes the
filesystem for a ship.
[%merg p=@p q=@tas r=mizu] :: internal change
This is called to perform a merge. This is most visibly called by
:update to update the filesystem of the current ship to that of its
q are as in
%info, and the
r is the description
of the merge. See
XX [%wake ~] :: timer activate XX
XX This card is sent by unix at the time specified by
time is XX usually the closest time specified in a subscription request.
%wake is XX called, we update our subscribers if there have been
[%wart p=sock q=@tas r=path s=*] :: network request
This is a request that has come across the network for a particular
file. When another ship asks for a file from us, that request comes to
us in the form of a
%wart kiss. This is handled by trivially turning
it into a
[%warp p=sock q=riff] :: file request
This is a request for information about a particular desk. This is, in
its most general form, a subscription, though in many cases it is the
trivial case of a subscription -- a read. See
++riff:clay for the format of
Lifecycle of a Local Read
There are two real types of interaction with a filesystem: you can read, and you can write. We'll describe each process, detailing both the flow of control followed by the kernel and the algorithms involved. The simpler case is that of the read, so we'll begin with that.
When a vane or an application wishes to read a file from the filesystem,
it sends a
%warp kiss, as described above. Of course, you may request
a file on another ship and, being a global filesystem, clay will happily
produce it for you. That code pathway will be described in another
section; here, we will restrict ourselves to examining the case of a
read from a ship on our own pier.
The kiss can request either a single version of a file node or a range of versions of a desk. Here, we'll deal only with a request for a single version.
As in all vanes, a kiss enters clay via a call to
through the arm, we quickly see where
%warp is handled.
?: =(p.p.q.hic q.p.q.hic) =+ une=(un p.p.q.hic now ruf) =+ wex=(di:une p.q.q.hic) =+ ^= wao ?~ q.q.q.hic (ease:wex hen) (eave:wex hen u.q.q.q.hic) =+ ^= woo abet:wao [-.woo abet:(pish:une p.q.q.hic +.woo ran.wao)]
We're following the familar patern of producing a list of moves and an
updated state. In this case, the state is
We first check to see if the sending and receiving ships are the same. If they're not, then this is a request for data on another ship. We describe that process later. Here, we discuss only the case of a local read.
At a high level, the call to
++un sets up the core for the domestic
ship that contains the files we're looking for. The call to
up the core for the particular desk we're referring to.
After this, we perform the actual request. If there is no rave in the
riff, then that means we are cancelling a request, so we call
++ease:de. Otherwise, we start a subscription with
++abet:de to resolve our various types of output into actual
moves. We produce the moves we found above and the
++un core resolved
++pish:un (putting the modified desk in the room) and
(putting the modified room in the raft).
Much of this is fairly straightforward, so we'll only describe
++abet:de. Feel free to look up the code to the other
steps -- it should be easy to follow.
Although it's called last, it's usually worth examining
since it defines in what ways we can cause side effects. Let's do that,
and also a few of the lines at the beginning of
=| yel=(list ,[p=duct q=gift]) =| byn=(list ,[p=duct q=riot]) =| vag=(list ,[p=duct q=gift]) =| say=(list ,[p=duct q=path r=ship s=[p=@ud q=riff]]) =| tag=(list ,[p=duct q=path c=note]) |% ++ abet ^- [(list move) rede] :_ red ;: weld %+ turn (flop yel) |=([a=duct b=gift] [hun %give b]) :: %+ turn (flop byn) |=([a=duct b=riot] [a %give [%writ b]]) :: %+ turn (flop vag) |=([a=duct b=gift] [a %give b]) :: %+ turn (flop say) |= [a=duct b=path c=ship d=[p=@ud q=riff]] :- a [%pass b %a %want [who c] [%q %re p.q.d (scot %ud p.d) ~] q.d] :: %+ turn (flop tag) |=([a=duct b=path c=note] [a %pass b c]) ==
This is very simple code. We see there are exactly five different kinds of side effects we can generate.
yel we put gifts that we wish to be sent along the
to dill. See the documentation for
++room above. This is how we
display messages to the terminal.
byn we put riots that we wish returned to subscribers. Recall that
a riot is a response to a subscription. These are returned to our
subscribers in the form of a
vag we put gifts along with the ducts on which to send them. This
allows us to produce arbitrary gifts, but in practice this is only used
say we put messages we wish to pass to ames. These messages are
used to request information from clay on other piers. We must provide
not only the duct and the request (the riff), but also the return path,
the other ship to talk to, and the sequence number of the request.
tag we put arbitrary notes we wish to pass to other vanes. For now,
the only notes we pass here are
%rest to the timer vane.
Now that we know what kinds of side effects we may have, we can jump into the handling of requests.
++ ease :: release request |= hen=duct ^+ +> ?~ ref +> =+ rov=(~(got by qyx) hen) =. qyx (~(del by qyx) hen) (mabe rov (cury best hen)) =. qyx (~(del by qyx) hen) |- ^+ +>+.$ =+ nux=(~(get by fod.u.ref) hen) ?~ nux +>+.$ %= +>+.$ say [[hen [(scot %ud u.nux) ~] for [u.nux syd ~]] say] fod.u.ref (~(del by fod.u.ref) hen) bom.u.ref (~(del by bom.u.ref) u.nux) ==
This is called when we're cancelling a subscription. For domestic desks,
ref is null, so we're going to cancel any timer we might have created.
We first delete the duct from our map of requests, and then we call
++best to send a
%rest kiss to the timer vane if we
have started a timer. We'll describe
Although we said we're not going to talk about foreign requests yet, it's easy to see that for foreign desks, we cancel any outstanding requests for this duct and send a message over ames to the other ship telling them to cancel the subscription.
++ best |= [hen=duct tym=@da] %_(+> tag :_(tag [hen /tyme %t %rest tym]))
This simply pushes a
%rest note onto
tag, from where it will be
passed back to arvo to be handled. This cancels the timer at the given
duct (with the given time).
++ mabe :: maybe fire function |* [rov=rove fun=$+(@da _+>.^$)] ^+ +>.$ %- fall :_ +>.$ %- bind :_ fun ^- (unit ,@da) ?- -.rov %& ?. ?=(%da -.q.p.rov) ~ `p.q.p.rov %| =* mot p.rov %+ hunt ?. ?=(%da -.p.mot) ~ ?.((lth now p.p.mot) ~ [~ p.p.mot]) ?. ?=(%da -.q.mot) ~ ?.((lth now p.q.mot) [~ now] [~ p.q.mot]) ==
This decides whether the given request can only be satsified in the
future. In that case, we call the given function with the time in the
future when we expect to have an update to give to this request. This is
++best to cancel timers and with
++bait to start them.
For single requests, we have a time if the request is for a particular time (which is assumed to be in the future). For ranges of requests, we check both the start and end cases to see if they are time cases. If so, we choose the earlier time.
If any of those give us a time, then we call the given funciton with the smallest time.
The more interesting case is, of course, when we're not cancelling a subscription but starting one.
++ eave :: subscribe |= [hen=duct rav=rave] ^+ +> ?- -.rav & ?: &(=(p.p.rav %u) !=(p.q.p.rav now)) ~& [%clay-fail p.q.p.rav %now now] !! =+ ver=(aver p.rav) ?~ ver (duce hen rav) ?~ u.ver (blub hen) (blab hen p.rav u.u.ver)
There are two types of subscriptions -- either we're requesting a single file or we're requesting a range of versions of a desk. We'll dicuss the simpler case first.
First, we check that we're not requesting the
rang from any time other
than the present. Since we don't store that information for any other
time, we can't produce it in a referentially transparent manner for any
time other than the present.
Then, we try to read the requested
p.rav. If we can't access
the request data right now, we call
++duce to put the request in our
queue to be satisfied when the information becomes available.
This case occurs when we make a request for a case whose (1) date is after the current date, (2) number is after the current number, or (3) label is not yet used.
++ duce :: produce request |= [hen=duct rov=rove] ^+ +> =. qyx (~(put by qyx) hen rov) ?~ ref (mabe rov (cury bait hen)) |- ^+ +>+.$ :: XX why? =+ rav=(reve rov) =+ ^= vaw ^- rave ?. ?=([%& %v *] rav) rav [%| [%ud let.dom] `case`q.p.rav r.p.rav] =+ inx=nix.u.ref %= +>+.$ say [[hen [(scot %ud inx) ~] for [inx syd ~ vaw]] say] nix.u.ref +(nix.u.ref) bom.u.ref (~(put by bom.u.ref) inx [hen vaw]) fod.u.ref (~(put by fod.u.ref) hen inx) ==
The code for
++duce is nearly the exact inverse of
++ease, which in
the case of a domestic desk is very simple -- we simply put the duct and
qyx and possibly start a timer with
ref is null for domestic desks and that
++mabe fires the
given function with the time we need to be woken up at, if we need to be
woken up at a particular time.
++ bait |= [hen=duct tym=@da] %_(+> tag :_(tag [hen /tyme %t %wait tym]))
This sets an alarm by sending a
%wait card with the given time to the
[~ ~], then we cancel the
subscription. This occurs when we make (1) a
%x request for a file
that does not exist, (2) a
%w request with a case that is not a
number, or (3) a
%w request with a nonempty path. The
exactly what you would expect it to be.
++ blub :: ship stop |= hen=duct %_(+> byn [[hen ~] byn])
We notify the duct that we're cancelling their subscription since it isn't satisfiable.
Otherwise, we have received the desired information, so we send it on to
the subscriber with
++ blab :: ship result |= [hen=duct mun=mood dat=*] ^+ +> +>(byn [[hen ~ [p.mun q.mun syd] r.mun dat] byn])
The most interesting arm called in
++eave is, of course,
where we actually try to read the data.
++ aver :: read |= mun=mood ^- (unit (unit ,*)) ?: &(=(p.mun %u) !=(p.q.mun now)) :: prevent bad things ~& [%clay-fail p.q.mun %now now] !! =+ ezy=?~(ref ~ (~(get by haw.u.ref) mun)) ?^ ezy ezy =+ nao=(~(case-to-aeon ze lim dom ran) q.mun) ?~(nao ~ [~ (~(read-at-aeon ze lim dom ran) u.nao mun)])
We check immediately that we're not requesting the
rang for any time
other than the present.
If this is a foreign desk, then we check our cache for the specific request. If either this is a domestic desk or we don't have the request in our cache, then we have to actually go read the data from our dome.
We need to do two things. First, we try to find the number of the commit specified by the given case, and then we try to get the data there.
Here, we jump into
arvo/zuse.hoon, which is where much of the
algorithmic code is stored, as opposed to the clay interface, which is
arvo/clay.hoon. We examine
++ case-to-aeon :: case-to-aeon:ze |= lok=case :: act count through ^- (unit aeon) ?- -.lok %da ?: (gth p.lok lim) ~ |- ^- (unit aeon) ?: =(0 let) [~ 0] :: avoid underflow ?: %+ gte p.lok =< t %- tako-to-yaki %- aeon-to-tako let [~ let] $(let (dec let)) :: %tas (~(get by lab) p.lok) %ud ?:((gth p.lok let) ~ [~ p.lok]) ==
We handle each type of
case:clay differently. The latter two types are
If we're requesting a revision by label, then we simply look up the
requested label in
lab from the given dome. If it exists, that is our
aeon; else we produce null, indicating the requested revision does not
If we're requesting a revision by number, we check if we've yet reached that number. If so, we produce the number; else we produce null.
If we're requesting a revision by date, we check first if the date is in
the future, returning null if so. Else we start from the most recent
revision and scan backwards until we find the first revision committed
before that date, and we produce that. If we requested a date before any
revisions were committed, we produce
The definitions of
++tako-to-yaki are trivial.
++ aeon-to-tako ~(got by hit) ++ tako-to-yaki ~(got by hut) :: grab yaki
We simply look up the aeon or tako in their respective maps (
Assuming we got a valid version number,
++read-at-aeon:ze, which reads the requested data at the given
++ read-at-aeon :: read-at-aeon:ze |= [oan=aeon mun=mood] :: seek and read ^- (unit) ?: &(?=(%w p.mun) !?=(%ud -.q.mun)) :: NB only for speed ?^(r.mun ~ [~ oan]) (read:(rewind oan) mun)
If we're requesting the revision number with a case other than by
number, then we go ahead and just produce the number we were given.
Otherwise, we call
++rewind to rewind our state to the given revision,
and then we call
++read to get the requested information.
++ rewind :: rewind:ze |= oan=aeon :: rewind to aeon ^+ +> ?: =(let oan) +> ?: (gth oan let) !! :: don't have version +>(ank (checkout-ankh q:(tako-to-yaki (aeon-to-tako oan))), let oan)
If we're already at the requested version, we do nothing. If we're requesting a version later than our head, we are unable to comply.
Otherwise, we get the hash of the commit at the number, and from that we
get the commit itself (the yaki), which has the map of path to lobe that
represents a version of the filesystem. We call
checkout the commit, and we replace
ank in our context with the
++ checkout-ankh :: checkout-ankh:ze |= hat=(map path lobe) :: checkout commit ^- ankh %- cosh %+ roll (~(tap by hat) ~) |= [[pat=path bar=lobe] ank=ankh] ^- ankh %- cosh ?~ pat =+ zar=(lobe-to-noun bar) ank(q [~ (sham zar) zar]) =+ nak=(~(get by r.ank) i.pat) %= ank r %+ ~(put by r.ank) i.pat $(pat t.pat, ank (fall nak _ankh)) ==
Twice we call
++cosh, which hashes a commit, updating
p in an
ankh. Let's jump into that algorithm before we describe
++ cosh :: locally rehash |= ank=ankh :: NB v/unix.c ank(p rehash:(zu ank))
We simply replace
p in the hash with the
cash we get from a call to
++ zu !: :: filesystem |= ank=ankh :: filesystem state =| myz=(list ,[p=path q=miso]) :: changes in reverse =| ram=path :: reverse path into |% ++ rehash :: local rehash ^- cash %+ mix ?~(q.ank 0 p.u.q.ank) =+ axe=1 |- ^- cash ?~ r.ank _@ ;: mix (shaf %dash (mix axe (shaf %dush (mix p.n.r.ank p.q.n.r.ank)))) $(r.ank l.r.ank, axe (peg axe 2)) $(r.ank r.r.ank, axe (peg axe 3)) ==
++zu is a core we set up with a particular filesystem node to traverse
a checkout of the filesystem and access the actual data inside it. One
of the things we can do with it is to create a recursive hash of the
++rehash, if this node is a file, then we xor the remainder of the
hash with the hash of the contents of the file. The remainder of the
0 if we have no children, else we descend into our children.
Basically, we do a half SHA-256 of the xor of the axis of this child and
the half SHA-256 of the xor of the name of the child and the hash of the
child. This is done for each child and all the results are xored
Now we return to our discussion of
We fold over every path in this version of the filesystem and create a
great ankh out of them. First, we call
++lobe-to-noun to get the raw
data referred to be each lobe.
++ lobe-to-noun :: grab blob |= p=lobe :: ^- * %- blob-to-noun (lobe-to-blob p)
This converts a lobe into the raw data it refers to by first getting the
++lobe-to-blob and converting that into data with
++ lobe-to-blob ~(got by lat) :: grab blob
This just grabs the blob that the lobe refers to.
++ blob-to-noun :: grab blob |= p=blob ?- -.p %delta (lump r.p (lobe-to-noun q.p)) %direct q.p %indirect q.p ==
If we have either a direct or an indirect blob, then the data is stored
right in the blob. Otherwise, we have to reconstruct it from the diffs.
We do this by calling
++lump on the diff in the blob with the data
obtained by recursively calling the parent of this blob.
++ lump :: apply patch |= [don=udon src=*] ^- * ?+ p.don ~|(%unsupported !!) %a ?+ -.q.don ~|(%unsupported !!) %a q.q.don %c (lurk ((hard (list)) src) p.q.don) %d (lure src p.q.don) == :: %c =+ dst=(lore ((hard ,@) src)) %- roly ?+ -.q.don ~|(%unsupported !!) %a ((hard (list ,@t)) q.q.don) %c (lurk dst p.q.don) == ==
This is defined in
arvo/hoon.hoon for historical reasons which are
likely no longer applicable. Since the
++umph structure will likely
change we convert clay to be a typed filesystem, we'll only give a
high-level description of this process. If we have a
%a udon, then
we're performing a trivial replace, so we produce simply
we have a
%c udon, then we're performing a list merge (as in, for
example, lines of text). The merge is performed by
++ lurk :: apply list patch |* [hel=(list) rug=(urge)] ^+ hel =+ war=`_hel`~ |- ^+ hel ?~ rug (flop war) ?- -.i.rug & %= $ rug t.rug hel (slag p.i.rug hel) war (weld (flop (scag p.i.rug hel)) war) == :: | %= $ rug t.rug hel =+ gur=(flop p.i.rug) |- ^+ hel ?~ gur hel ?>(&(?=(^ hel) =(i.gur i.hel)) $(hel t.hel, gur t.gur)) war (weld q.i.rug war) == ==
We accumulate our final result in
war. If there's nothing more in our
list of merge instructions (unces), we just reverse
war and produce
it. Otherwise, we process another unce. If the unce is of type
p.i.rug lines of no changes, so we just copy them over from
war. If the unice is of type
|, then we verify that the
source lines (in
hel) are what we expect them to be (
crashing on failure. If they're good, then we append the new lines in
And that's really it. List merges are pretty easy. Anyway, if you
recall, we were discussing
++ checkout-ankh :: checkout-ankh:ze |= hat=(map path lobe) :: checkout commit ^- ankh %- cosh %+ roll (~(tap by hat) ~) |= [[pat=path bar=lobe] ank=ankh] ^- ankh %- cosh ?~ pat =+ zar=(lobe-to-noun bar) ank(q [~ (sham zar) zar]) =+ nak=(~(get by r.ank) i.pat) %= ank r %+ ~(put by r.ank) i.pat $(pat t.pat, ank (fall nak _ankh)) ==
If the path is null, then we calculate
zar, the raw data at the path
pat in this version. We produce the given ankh with the correct data.
Otherwise, we try to get the child we're looking at from our parent ankh. If it's already been created, this succeeds; otherwise, we simply create a default blank ankh. We place ourselves in our parent after recursively computing our children.
This algorithm really isn't that complicated, but it may not be immediately obvious why it works. An example should clear everything up.
hat is a map of the following information.
/greeting --> "customary upon meeting" /greeting/english --> "hello" /greeting/spanish --> "hola" /greeting/russian/short --> "привет" /greeting/russian/long --> "Здравствуйте" /farewell/russian --> "до свидания"
Furthermore, let's say that we process them in this order:
/greeting/english /greeting/russian/short /greeting/russian/long /greeting /greeting/spanish /farewell/russian
Then, the first path we process is
/greeting/english . Since our path
is not null, we try to get
nak, but because our ankh is blank at this
point it doesn't find anything. Thus, update our blank top-level ankh
with a child
%greeting. and recurse with the blank
nak to create the
ankh of the new child.
In the recursion, we our path is
/english and our ankh is again blank.
We try to get the
english child of our ankh, but this of course fails.
Thus, we update our blank
/greeting ankh with a child
produced by recursing.
Now our path is null, so we call
++lobe-to-noun to get the actual
data, and we place it in the brand-new ankh.
Next, we process
/greeting/russian/short. Since our path is not null,
we try to get the child named
%greeting, which does exist since we
created it earlier. We put modify this child by recursing on it. Our
path is now
/russian/short, so we look for a
%russian child in our
/greeting ankh. This doesn't exist, so we add it by recursing. Our
path is now
/short, so we look for a
%short child in our
/greeting/russian ankh. This doesn't exist, so we add it by recursing.
Now our path is null, so we set the contents of this node to
and we're done processing this path.
Next, we process
/greeting/russian/long. This proceeds similarly to
the previous except that now the ankh for
exists, so we simply reuse it rather than creating a new one. Of course,
we still must create a new
Next, we process
/greeting. This ankh already exists, so after we've
recursed once, our path is null, and our ankh is not blank -- it already
has two children (and two grandchildren). We don't touch those, though,
since a node may be both a file and a directory. We just add the
contents of the file -- "customary upon meeting" -- to the existing
Next, we process
/greeting/spanish. Of course, the
already exists, but it doesn't have a
%spanish child, so we create
that, taking care not to disturb the contents of the
We put "hola" into the ankh and call it good.
Finally, we process
/farewell/russian. Here, the
doesn't exist, so we create it. Clearly the newly-created ankh doesn't
have any children, so we have to add a
%russian child, and in this
child we put our last content -- "до свидания".
We hope it's fairly obvious that the order we process the paths doesn't affect the final ankh tree. The tree will be constructed in a very different order depending on what order the paths come in, but the resulting tree is independent of order.
At any rate, we were talking about something important, weren't we? If
you recall, that concludes our discussion of
++rewind, which was
++read-at-aeon. In summary,
++rewind returns a context
in which our current state is (very nearly) as it was when the specified
version of the desk was the head. This allows
++read-at-aeon to call
++read to read the requested information.
++ read :: read:ze |= mun=mood :: read at point ^- (unit) ?: ?=(%v p.mun) [~ `dome`+<+<.read] ?: &(?=(%w p.mun) !?=(%ud -.q.mun)) ?^(r.mun ~ [~ let]) ?: ?=(%w p.mun) =+ ^= yak %- tako-to-yaki %- aeon-to-tako let ?^(r.mun ~ [~ [t.yak (forge-nori yak)]]) ::?> ?=(^ hit) ?^(r.mun ~ [~ i.hit]) :: what do?? need [@da nori] (query(ank ank:(descend-path:(zu ank) r.mun)) p.mun)
If we're requesting the dome, then we just return that immediately.
If we're requesting the revision number of the desk and we're not
requesting it by number, then we just return the current number of this
desk. Note of course that this was really already handled in
If we're requesting a
%w with a specific revision number, then we do
something or other with the commit there. It's kind of weird, and it
doesn't seem to work, so we'll ignore this case.
Otherwise, we descend into the ankh tree to the given path with
++descend-path:zu, and then we handle specific request in
++ descend-path :: descend recursively |= way=path ^+ +> ?~(way +> $(way t.way, +> (descend i.way)))
This is simple recursion down into the ankh tree.
one level, so this will eventually get us down to the path we want.
++ descend :: descend |= lol=@ta ^+ +> =+ you=(~(get by r.ank) lol) +>.$(ram [lol ram], ank ?~(you [*cash ~ ~] u.you))
ram is the path that we're at, so to descend one level we push the
name of this level onto that path. We update the ankh with the correct
one at that path if it exists; else we create a blank one.
Once we've decscended to the correct level, we need to actually deal with the request.
++ query :: query:ze |= ren=?(%u %v %x %y %z) :: endpoint query ^- (unit ,*) ?- ren %u [~ `rang`+<+>.query] %v [~ `dome`+<+<.query] %x ?~(q.ank ~ [~ q.u.q.ank]) %y [~ as-arch] %z [~ ank] ==
Now that everything's set up, it's really easy. If they're requesting
the rang, dome, or ankh, we give it to them. If the contents of a file,
we give it to them if it is in fact a file. If the
arch, then we
calculate it with
++ as-arch :: as-arch:ze ^- arch :: arch report :+ p.ank ?~(q.ank ~ [~ p.u.q.ank]) |- ^- (map ,@ta ,~) ?~ r.ank ~ [[p.n.r.ank ~] $(r.ank l.r.ank) $(r.ank r.r.ank)]
This very simply strips out all the "real" data and returns just our own hash, the hash of the file contents (if we're a file), and a map of the names of our immediate children.
Lifecycle of a Local Subscription
A subscription to a range of revisions of a desk initially follows the
same path that a single read does. In
++aver, we checked the head of
the given rave. If the head was
&, then it was a single request, so we
handled it above. If
|, then we handle it with the following code.
=+ nab=(~(case-to-aeon ze lim dom ran) p.p.rav) ?~ nab ?> =(~ (~(case-to-aeon ze lim dom ran) q.p.rav)) (duce hen (rive rav)) =+ huy=(~(case-to-aeon ze lim dom ran) q.p.rav) ?: &(?=(^ huy) |((lth u.huy u.nab) &(=(0 u.huy) =(0 u.nab)))) (blub hen) =+ top=?~(huy let.dom u.huy) =+ sar=(~(lobes-at-path ze lim dom ran) u.nab r.p.rav) =+ ear=(~(lobes-at-path ze lim dom ran) top r.p.rav) =. +>.$ ?: =(sar ear) +>.$ =+ fud=(~(make-nako ze lim dom ran) u.nab top) (bleb hen u.nab fud) ?^ huy (blub hen) =+ ^= ptr ^- case [%ud +(let.dom)] (duce hen `rove`[%| ptr q.p.rav r.p.rav ear]) ==
++case-to-aeon:ze produces the revision number that a case
corresponds to, if it corresponds to any. If it doesn't yet correspond
to a revision, then it produces null.
Thus, we first check to see if we've even gotten to the beginning of the
range of revisions requested. If not, then we assert that we haven't yet
gotten to the end of the range either, because that would be really
strange. If not, then we immediately call
++duce, which, if you
recall, for a local request, simply puts this duct and rove into our
qyx, so that we know who to respond to when the revision does
If we've already gotten to the first revision, then we can produce some
content immediately. If we've also gotten to the final revision, and
that revision is earlier than the start revision, then it's a bad
request and we call
++blub, which tells the subscriber that his
subscription will not be satisfied.
Otherwise, we find the data at the given path at the beginning of the
subscription and at the last available revision in the subscription. If
they're the same, then we don't send a notification. Otherwise, we call
++gack, which creates the
++nako we need to produce. We call
++bleb to actually produce the information.
If we already have the last requested revision, then we also tell the
++blub that the subscription will receive no further
If there will be more revisions in the subscription, then we call
++duce, adding the duct to our subscribers. We modify the rove to
start at the next revision since we've already handled all the revisions
up to the present.
We glossed over the calls to
++bleb, so we'll get back to those right now.
++bleb is simple, so
we'll start with that.
++ bleb :: ship sequence |= [hen=duct ins=@ud hip=nako] ^+ +> (blab hen [%w [%ud ins] ~] hip)
We're given a duct, the beginning revision number, and the nako that
contains the updates since that revision. We use
++blab to produce
this result to the subscriber. The case is
%w with a revision number
of the beginning of the subscription, and the data is the nako itself.
++lobes-at-path:ze to get the data at the particular path.
++ lobes-at-path :: lobes-at-path:ze |= [oan=aeon pax=path] :: data at path ^- (map path lobe) ?: =(0 oan) ~ %- mo %+ skim %. ~ %~ tap by =< q %- tako-to-yaki %- aeon-to-tako oan |= [p=path q=lobe] ?| ?=(~ pax) ?& !?=(~ p) =(-.pax -.p) $(p +.p, pax +.pax) == ==
At revision zero, the theoretical common revision between all repositories, there is no data, so we produce null.
We get the list of paths (paired with their lobe) in the revision
referred to by the given number and we keep only those paths which begin
pax. Converting to a map, we now have a map from the subpaths at
the given path to the hash of their data. This is simple and efficient
to calculate and compare to later revisions. This allows us to easily
tell if a node or its children have changed.
Finally, we will describe
++ make-nako :: gack a through b |= [a=aeon b=aeon] ^- [(map aeon tako) aeon (set yaki) (set blob)] :_ :- b =- [(takos-to-yakis -<) (lobes-to-blobs ->)] %+ reachable-between-takos (~(get by hit) a) :: if a not found, a=0 (aeon-to-tako b) ^- (map aeon tako) %- mo %+ skim (~(tap by hit) ~) |= [p=aeon *] &((gth p a) (lte p b))
We need to produce four things -- the numbers of the new commits, the number of the latest commit, the new commits themselves, and the new data itself.
The first is fairly easy to produce. We simply go over our map of
numbered commits and produce all those numbered greater than
a and not
The second is even easier to produce --
b is clearly our most recent
The third and fourth are slightly more interesting, though not too
terribly difficult. First, we call
++ reachable-between-takos |= [a=(unit tako) b=tako] :: pack a through b ^- [(set tako) (set lobe)] =+ ^= sar ?~ a ~ (reachable-takos r:(tako-to-yaki u.a)) =+ yak=`yaki`(tako-to-yaki b) %+ new-lobes-takos (new-lobes ~ sar) :: get lobes |- ^- (set tako) :: walk onto sar ?: (~(has in sar) r.yak) ~ =+ ber=`(set tako)`(~(put in `(set tako)`~) `tako`r.yak) %- ~(uni in ber) ^- (set tako) %+ roll p.yak |= [yek=tako bar=(set tako)] ^- (set tako) ?: (~(has in bar) yek) :: save some time bar %- ~(uni in bar) ^$(yak (tako-to-yaki yek))
We take a possible starting commit and a definite ending commit, and we produce the set of commits and the set of data between them.
sar be the set of commits reachable from
a is null,
then obviously no commits are reachable. Otherwise, we call
++reachable-takos to calculate this.
++ reachable-takos :: reachable |= p=tako :: XX slow ^- (set tako) =+ y=(tako-to-yaki p) =+ t=(~(put in _(set tako)) p) %+ roll p.y |= [q=tako s=_t] ?: (~(has in s) q) :: already done s :: hence skip (~(uni in s) ^$(p q)) :: otherwise traverse
We very simply produce the set of the given tako plus its parents, recursively.
++reachable-between-takos, we let
yak be the yaki of
the ending commit. With this, we create a set that is the union of
and all takos reachable from
++new-lobes to get all the lobes referenced by any
tako referenced by
a. The result is passed into
do the same, but not recomputing those in already calculated last
sentence. This produces the sets of takos and lobes we need.
++ new-lobes :: object hash set |= [b=(set lobe) a=(set tako)] :: that aren't in b ^- (set lobe) %+ roll (~(tap in a) ~) |= [tak=tako bar=(set lobe)] ^- (set lobe) =+ yak=(tako-to-yaki tak) %+ roll (~(tap by q.yak) ~) |= [[path lob=lobe] far=_bar] ^- (set lobe) ?~ (~(has in b) lob) :: don't need far =+ gar=(lobe-to-blob lob) ?- -.gar %direct (~(put in far) lob) %delta (~(put in $(lob q.gar)) lob) %indirect (~(put in $(lob s.gar)) lob) ==
Here, we're creating a set of lobes referenced in a commit in
start out with
b as the initial set of lobes, so we don't need to
recompute any of the lobes referenced in there.
The algorithm is pretty simple, so we won't bore you with the details.
We simply traverse every commit in
a, looking at every blob referenced
there, and, if it's not already in
b, we add it to
b. In the case of
a direct blob, we're done. For a delta or an indirect blob, we
recursively add every blob referenced within the blob.
++ new-lobes-takos :: garg & repack |= [b=(set lobe) a=(set tako)] ^- [(set tako) (set lobe)] [a (new-lobes b a)]
Here, we just update the set of lobes we're given with the commits we're given and produce both sets.
This concludes our discussion of a local subscription.
Lifecycle of a Foreign Read or Subscription
Foreign reads and subscriptions are handled in much the same way as
local ones. The interface is the same -- a vane or app sends a
kiss with the request. The difference is simply that the
to the foreign ship.
Thus, we start in the same place -- in
However, since the two side of the
sock are different, we follow a
=+ wex=(do now p.q.hic p.q.q.hic ruf) =+ ^= woo ?~ q.q.q.hic abet:(ease:wex hen) abet:(eave:wex hen u.q.q.q.hic) [-.woo (posh q.p.q.hic p.q.q.hic +.woo ruf)]
If we compare this to how the local case was handled, we see that it's
not all that different. We use
++do rather than
set up the core for the foreign ship. This gives us a
++de core, so we
either cancel or begin the request by calling
exactly as in the local case. In either case, we call
resolve our various types of output into actual moves, as described in
the local case. Finally, we call
++posh to update our raft, putting
the modified rung into the raft.
We'll first trace through
++ do |= [now=@da [who=ship him=ship] syd=@tas ruf=raft] =+ ^= rug ^- rung =+ rug=(~(get by hoy.ruf) him) ?^(rug u.rug *rung) =+ ^= red ^- rede =+ yit=(~(get by rus.rug) syd) ?^(yit u.yit `rede`[~2000.1.1 ~ [~ *rind] *dome]) ((de now ~ ~) [who him] syd red ran.ruf)
If we already have a rung for this foreign ship, then we use that.
Otherwise, we create a new blank one. If we already have a rede in this
rung, then we use that, otherwise we create a blank one. An important
point to note here is that we let
ref in the rede be
Recall, for domestic desks
ref is null. We use this to distinguish
between foreign and domestic desks in
With this information, we create a
++de core as usual.
Although we've already covered
++eave, we'll go through
them quickly again, highlighting the case of foreign request.
++ ease :: release request |= hen=duct ^+ +> ?~ ref +> =+ rov=(~(got by qyx) hen) =. qyx (~(del by qyx) hen) (mabe rov (cury best hen)) =. qyx (~(del by qyx) hen) |- ^+ +>+.$ =+ nux=(~(get by fod.u.ref) hen) ?~ nux +>+.$ %= +>+.$ say [[hen [(scot %ud u.nux) ~] for [u.nux syd ~]] say] fod.u.ref (~(del by fod.u.ref) hen) bom.u.ref (~(del by bom.u.ref) u.nux) ==
Here, we still remove the duct from our cult (we maintain a cult even
for foreign desks), but we also need to tell the foreign desk to cancel
our subscription. We do this by sending a request (by appending to
say, which gets resolved in
++abet:de to a
%want kiss to ames) to
the foreign ship to cancel the subscription. Since we don't anymore
expect a response on this duct, we remove it from
are the maps between ducts, raves, and request sequence numbers.
Basically, we remove every trace of the subscription from our request
In the case of
++eave, where we're creating a new request, everything
is exactly identical to the case of the local request except
We said that
++duce simply puts the request into our cult. This is
true for a domestic request, but distinctly untrue for foreign requests.
++ duce :: produce request |= [hen=duct rov=rove] ^+ +> =. qyx (~(put by qyx) hen rov) ?~ ref +>.$ |- ^+ +>+.$ :: XX why? =+ rav=(reve rov) =+ ^= vaw ^- rave ?. ?=([%& %v *] rav) rav [%| [%ud let.dom] `case`q.p.rav r.p.rav] =+ inx=nix.u.ref %= +>+.$ say [[hen [(scot %ud inx) ~] for [inx syd ~ vaw]] say] nix.u.ref +(nix.u.ref) bom.u.ref (~(put by bom.u.ref) inx [hen vaw]) fod.u.ref (~(put by fod.u.ref) hen inx) ==
If we have a request manager (i.e. this is a foreign desk), then we do
the approximate inverse of
++ease. We create a rave out of the given
request and send it off to the foreign desk by putting it in
that the rave is created to request the information starting at the next
revision number. Since this is a new request, we put it into
bom to associate the request with its duct and its sequence number.
Since we're using another sequence number, we must increment
which represents the next available sequence number.
And that's really it for this side of the request. Requesting foreign
information isn't that hard. Let's see what it looks like on the other
side. When we get a request from another ship for information on our
ship, that comes to us in the form of a
%wart from ames.
We handle a
++call, right next to where we handle the
%wart ?> ?=(%re q.q.hic) =+ ryf=((hard riff) s.q.hic) :_ ..^$ :~ :- hen :^ %pass [(scot %p p.p.q.hic) (scot %p q.p.q.hic) r.q.hic] %c [%warp [p.p.q.hic p.p.q.hic] ryf] ==
Every request we receive should be of type
riff, so we coerce it into
that type. We just convert this into a new
%warp kiss that we pass to
ourself. This gets handled like normal, as a local request. When the
request produces a value, it does so like normal as a
%writ, which is
++take along the path we just sent it on.
%writ ?> ?=([@ @ *] tea) =+ our=(need (slaw %p i.tea)) =+ him=(need (slaw %p i.t.tea)) :_ ..^$ :~ :- hen [%pass ~ %a [%want [our him] [%r %re %c t.t.tea] p.+.q.hin]] ==
Since we encoded the ship we need to respond to in the path, we can just
%want back to ames, so that we tell the requesting ship about
the new data.
This comes back to the original ship as a
%waft from ames, which comes
++take, right next to where we handled
%waft ?> ?=([@ @ ~] tea) =+ syd=(need (slaw %tas i.tea)) =+ inx=(need (slaw %ud i.t.tea)) =+ ^= zat =< wake (knit:(do now p.+.q.hin syd ruf) [inx ((hard riot) q.+.q.hin)]) =^ mos ruf =+ zot=abet.zat [-.zot (posh q.p.+.q.hin syd +.zot ruf)] [mos ..^$(ran.ruf ran.zat)] :: merge in new obj
This gets the desk and sequence number from the path the request was
sent over. This determines exactly which request is being responded to.
++knit:de to apply the changes to our local desk, and we call
++wake to update our subscribers. Then we call
++posh as normal (like in
++wake, in that order.
++ knit :: external change |= [inx=@ud rot=riot] ^+ +> ?> ?=(^ ref) |- ^+ +>+.$ =+ ruv=(~(get by bom.u.ref) inx) ?~ ruv +>+.$ => ?. |(?=(~ rot) ?=(& -.q.u.ruv)) . %_ . bom.u.ref (~(del by bom.u.ref) inx) fod.u.ref (~(del by fod.u.ref) p.u.ruv) == ?~ rot =+ rav=`rave`q.u.ruv %= +>+.$ lim ?.(&(?=(| -.rav) ?=(%da -.q.p.rav)) lim `@da`p.q.p.rav) :: haw.u.ref ?. ?=(& -.rav) haw.u.ref (~(put by haw.u.ref) p.rav ~) == ?< ?=(%v p.p.u.rot) =. haw.u.ref (~(put by haw.u.ref) [p.p.u.rot q.p.u.rot q.u.rot] ~ r.u.rot) ?. ?=(%w p.p.u.rot) +>+.$ |- ^+ +>+.^$ =+ nez=[%w [%ud let.dom] ~] =+ nex=(~(get by haw.u.ref) nez) ?~ nex +>+.^$ ?~ u.nex +>+.^$ :: should never happen =. +>+.^$ =+ roo=(edis ((hard nako) u.u.nex)) ?>(?=(^ ref.roo) roo) %= $ haw.u.ref (~(del by haw.u.ref) nez) ==
This is kind of a long gate, but don't worry, it's not bad at all.
First, we assert that we're not a domestic desk. That wouldn't make any sense at all.
Since we have the sequence number of the request, we can get the duct
and rave from
bom in our request manager. If we didn't actually
request this data (or the request was canceled before we got it), then
we do nothing.
Else, we remove the request from
fod unless this was a
subscription request and we didn't receive a null riot (which would
indicate the last message on the subscription).
Now, if we received a null riot, then if this was a subscription request
by date, then we update
lim in our request manager (representing the
latest time at which we have complete information for this desk) to the
date that was requested. If this was a single read request, then we put
the result in our simple cache
haw to make future requests faster.
Then we're done.
If we received actual data, then we put it into our cache
%w request, we're done.
If it is a
%w request, then we try to get the
%w at our current head
from the cache. In general, that should be the thing we just put in a
moment ago, but that is not guaranteed. The most common case where this
is different is when we receive desk updates out of order. At any rate,
since we now have new information, we need to apply it to our local copy
of the desk. We do so in
++edis, and then we remove the stuff we just
applied from the cache, since it's not really a true "single read", like
what should really be in the cache.
++ edis :: apply subscription |= nak=nako ^+ +> %= +> hit.dom (~(uni by hit.dom) gar.nak) let.dom let.nak lat.ran %+ roll (~(tap in bar.nak) ~) =< .(yeb lat.ran) |= [sar=blob yeb=(map lobe blob)] =+ zax=(blob-to-lobe sar) %+ ~(put by yeb) zax sar hut.ran %+ roll (~(tap in lar.nak) ~) =< .(yeb hut.ran) |= [sar=yaki yeb=(map tako yaki)] %+ ~(put by yeb) r.sar sar ==
This shows, of course, exactly why nako is defined the way it is. To
become completely up to date with the foreign desk, we need to merge
hit with the foreign one so that we have all the revision numbers. We
let so that we know which revision is the head.
We merge the new blobs in
lat, keying them by their hash, which we get
from a call to
++blob-to-lobe. Recall that the hash is stored in the
actual blob itself. We do the same thing to the new yakis, putting them
hut, keyed by their hash.
Now, our local dome should be exactly the same as the foreign one.
This concludes our discussion of
++knit. Now the changes have been
applied to our local copy of the desk, and we just need to update our
subscribers. We do so in
++ wake :: update subscribers ^+ . =+ xiq=(~(tap by qyx) ~) =| xaq=(list ,[p=duct q=rove]) |- ^+ ..wake ?~ xiq ..wake(qyx (~(gas by *cult) xaq)) ?- -.q.i.xiq & =+ cas=?~(ref ~ (~(get by haw.u.ref) `mood`p.q.i.xiq)) ?^ cas %= $ xiq t.xiq ..wake ?~ u.cas (blub p.i.xiq) (blab p.i.xiq p.q.i.xiq u.u.cas) == =+ nao=(~(case-to-aeon ze lim dom ran) q.p.q.i.xiq) ?~ nao $(xiq t.xiq, xaq [i.xiq xaq]) $(xiq t.xiq, ..wake (balk p.i.xiq u.nao p.q.i.xiq)) :: | =+ mot=`moot`p.q.i.xiq =+ nab=(~(case-to-aeon ze lim dom ran) p.mot) ?~ nab $(xiq t.xiq, xaq [i.xiq xaq]) =+ huy=(~(case-to-aeon ze lim dom ran) q.mot) ?~ huy =+ ptr=[%ud +(let.dom)] %= $ xiq t.xiq xaq [[p.i.xiq [%| ptr q.mot r.mot s.mot]] xaq] ..wake =+ ^= ear (~(lobes-at-path ze lim dom ran) let.dom r.p.q.i.xiq) ?: =(s.p.q.i.xiq ear) ..wake =+ fud=(~(make-nako ze lim dom ran) u.nab let.dom) (bleb p.i.xiq let.dom fud) == %= $ xiq t.xiq ..wake =- (blub:- p.i.xiq) =+ ^= ear (~(lobes-at-path ze lim dom ran) u.huy r.p.q.i.xiq) ?: =(s.p.q.i.xiq ear) (blub p.i.xiq) =+ fud=(~(make-nako ze lim dom ran) u.nab u.huy) (bleb p.i.xiq +(u.nab) fud) == == --
This is even longer than
++knit, but it's pretty similar to
We loop through each of our subscribers
xiq, processing each in turn.
When we're done, we just put the remaining subscribers back in our
If the subscriber is a single read, then, if this is a foreign desk
++wake is called from other arms, and not only on foreign
desks). Obviously, if we find an identical request there, then we can
produce the result immediately. Referential transparency for the win. We
produce the result with a call to
++blab. If this is a foreign desk
but the result is not in the cache, then we produce
subscription with no data) for reasons entirely opaque to me. Seriously,
it seems like we should wait until we get an actual response to the
request. If someone figures out why this is, let me know. At any rate,
it seems to work.
If this is a domestic desk, then we check to see if the case exists yet.
If it doesn't, then we simply move on to the next subscriber, consing
this one onto
xaq so that we can check again the next time around. If
it does exist, then we call
++balk to fulfill the request and produce
++balk is very simple, so we'll describe it here before we get to the
++ balk :: read and send |= [hen=duct oan=@ud mun=mood] ^+ +> =+ vid=(~(read-at-aeon ze lim dom ran) oan mun) ?~ vid (blub hen) (blab hen mun u.vid)
++read-at-aeon on the given request and aeon. If you recall,
this processes a
mood at a particular aeon and produces the result, if
there is one. If there is data at the requested location, then we
produce it with
++blab. Else, we call
++blub to notify the
subscriber that no data can ever come over this subscriptioin since it
is now impossible for there to ever be data for the given request.
Because referential transparency.
At any rate, back to
++wake. If the given rave is a subscription
request, then we proceed similarly to how we do in
++eave. We first
try to get the aeon referred to by the starting case. If it doesn't
exist yet, then we can't do anything interesting with this subscription,
so we move on to the next one.
Otherwise, we try to get the aeon referred to by the ending case. If it
doesn't exist yet, then we produce all the information we can. We call
++lobes-at-path at the given aeon and path to see if the requested
path has actually changed. If it hasn't, then we don't produce anything;
else, we produce the correct nako by calling
++bleb on the result of
++make-nako, as in
++eave. At any rate, we move on to the next
subscription, putting back into our cult the current subscription with a
new start case of the next aeon after the present.
If the aeon referred to by the ending case does exist, then we drop this
subscriber from our cult and satisfy its request immediately. This is
the same as before -- we check to see if the data at the path has
actually changed, producing it if it has; else, we call
no more data can be produced over this subscription.
This concludes our discussion of foreign requests.