Data Types
This section will be reference documentation for the data types used by our filesystem. Parts of it may be inaccurate as of March 2021. As a general guide, we recommend reading and attempting to understand the data structures used in any Hoon code before you try to read the code itself. Although complete understanding of the data structures is impossible without seeing them used in the code, an 80% understanding greatly clarifies the code. As another general guide, when reading Hoon, it rarely pays off to understand every line of code when it appears. Try to get the gist of it, and then move on. The next time you come back to it, it'll likely make a lot more sense.
Data Models
As you're reading through this section, remember you can always come back to this when you run into these types later on. You're not going to remember everything the first time through, but it is worth reading, or at least skimming, this so that you get a rough idea of how our state is organized.
The types that are certainly worth reading are ++raft, ++room, ++dome:clay, ++ankh:clay, ++rung:clay, ++rang:clay, ++blob:clay, ++yaki:clay, and ++nori:clay (possibly in that order). All in all, though, this section isn't too long, so many readers may wish to quickly read through all of it. If you get bored, though, just skip to the next section. You can always come back when you need to.
++raft, formal state
++ raft :: filesystem
$: fat=(map ship room) :: domestic
hoy=(map ship rung) :: foreign
ran=rang :: hashes
== ::
This is the state of our vane. Anything that must be remembered between calls to clay is stored in this state.
fat is the set of domestic servers. This stores all the information that is specific to a particular ship on this pier. The keys to this map are the ships on the current pier.
hoy is the set of foreign servers. This stores all the information that is specific to a particular foreign ship. The keys to this map are all the ships whose filesystems we have attempted to access through clay.
ran is the store of all commits and deltas, keyed by hash. This is where all the "real" data we know is stored; the rest is "just bookkeeping".
++room, filesystem per domestic ship
++ room :: fs per ship
$: hun=duct :: terminal duct
hez=(unit duct) :: sync duch
dos=(map desk dojo) :: native desk
== ::
This is the representation of the filesystem of a ship on our pier.
hun is the duct we use to send messages to dill to display notifications of filesystem changes. Only %note gifts should be produced along this duct. This is set by the %init kiss.
hez, if present, is the duct we use to send sync messages to unix so that they end up in the pier unix directory. Only %ergo gifts should be produced along this duct. This is set by %into and %invo kisses.
dos is a well-known operating system released in 1981. It is also the set of desks on this ship, mapped to their data.
++desk, filesystem branch
++ desk ,@tas :: ship desk case spur
This is the name of a branch of the filesystem. The default desks are "arvo", "main", and "try". More may be created by simply referencing them. Desks have independent histories and states, and they may be merged into each other.
++dojo, domestic desk state
++ dojo ,[p=cult q=dome] :: domestic desk state
This is all the data that is specific to a particular desk on a domestic ship. p is the set of subscribers to this desk and q is the data in the desk.
++cult, subscriptions
++ cult (map duct rave) :: subscriptions
This is the set of subscriptions to a particular desk. The keys are the ducts from which the subscription requests came. The results will be produced along these ducts. The values are descriptions of the requested information.
++rave:clay, general subscription request
++ rave :: general request
$% [& p=mood] :: single request
[| p=moat] :: change range
== ::
This represents a subscription request for a desk. The request can be for either a single item in the desk or else for a range of changes on the desk.
++rove, stored general subscription request
++ rove (each mood moot) :: stored request
When we store a request, we store subscriptions with a little extra information so that we can determine whether new versions actually affect the path we're subscribed to.
++mood:clay, single subscription request
++ mood ,[p=care q=case r=path] :: request in desk
This represents a request for the state of the desk at a particular commit, specified by q. p specifies what kind of information is desired, and r specifies the path we are requesting.
++moat:clay, range subscription request
++ moat ,[p=case q=case r=path] :: change range
This represents a request for all changes between p and q on path r. You will be notified when a change is made to the node referenced by the path or to any of its children.
++moot, stored range subscription request
++ moot ,[p=case q=case r=path s=(map path lobe)] ::
This is just a ++moat:clay plus a map of paths to lobes. This map represents the data at the node referenced by the path at case p, if we've gotten to that case (else null). We only send a notification along the subscription if the data at a new revision is different than it was.
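The change check that the stored map makes possible can be sketched in Python. This is only an illustrative model, not clay's Hoon: paths are modeled as tuples of strings, lobes as opaque comparable values, and the name changed_paths is hypothetical.

```python
def changed_paths(old_lobes, new_lobes, prefix):
    """Return the paths at or under prefix whose lobe differs between revisions.

    old_lobes and new_lobes map path tuples to lobes (content hashes).
    A path that appears in only one of the maps counts as changed.
    """
    def under(path):
        # A path is "under" the prefix if it starts with the prefix.
        return path[:len(prefix)] == prefix

    candidates = {p for p in set(old_lobes) | set(new_lobes) if under(p)}
    return sorted(p for p in candidates if old_lobes.get(p) != new_lobes.get(p))


old = {("doc", "a"): 1, ("doc", "b"): 2, ("src", "c"): 3}
new = {("doc", "a"): 1, ("doc", "b"): 9, ("src", "c"): 4, ("doc", "d"): 5}

# A subscriber to /doc is only notified when this list is non-empty.
notify = bool(changed_paths(old, new, ("doc",)))
```

Under this model, a change to /src/c alone would not produce a notification for a subscriber to /doc, because none of the lobes under /doc differ.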
++care:clay, clay submode
++ care ?(%u %v %w %x %y %z) :: clay submode
This specifies what type of information is requested in a subscription or a scry.
%u requests the ++rang:clay at the current moment. Because this information is not stored for any moment other than the present, we crash if the ++case:clay is not a %da for now.
%v requests the ++dome:clay at the specified commit.
%w requests the revision number of the desk.
%x requests the file at a specified path at the specified commit. If there is no node at that path or if the node has no contents (that is, if q:ankh is null), then this produces null.
%y requests a ++arch of the specified commit at the specified path.
%z requests the ++ankh of the specified commit at the specified path.
++arch, shallow filesystem node
++ arch ,[p=@uvI q=(unit ,@uvI) r=(map ,@ta ,~)] :: fundamental node
This is analogous to ++ankh:clay except that we have neither our contents nor the ankhs of our children. The other fields are exactly the same, so p is a hash of the associated ankh, u.q, if it exists, is a hash of the contents of this node, and the keys of r are the names of our children. r is a map to null rather than a set so that the ordering of the map will be equivalent to that of r:ankh, allowing efficient conversion.
++case:clay, specifying a commit
++ case :: ship desk case spur
$% [%da p=@da] :: date
[%tas p=@tas] :: label
[%ud p=@ud] :: number
== ::
A commit can be referred to in three ways: %da refers to the commit that was at the head on date p, %tas refers to the commit labeled p, and %ud refers to the commit numbered p. Note that since these all can be reduced down to a %ud, only numbered commits may be referenced with a ++case:clay.
++dome:clay, desk data
++ dome :: project state
$: ang=agon :: pedigree
ank=ankh :: state
let=@ud :: top id
hit=(map ,@ud tako) :: changes by id
lab=(map ,@tas ,@ud) :: labels
== ::
This is the data that is actually stored in a desk.
ang is unused and should be removed.
ank is the current state of the desk. Thus, it is the state of the filesystem at revision let. The head of a desk is always a numbered commit.
let is the number of the most recently numbered commit. This is also the total number of numbered commits.
hit is a map of numerical ids to hashes of commits. These hashes are mapped into their associated commits in hut:rang:clay. In general, the keys of this map are exactly the numbers from 1 to let, with no gaps. Of course, when there are no numbered commits, let is 0, so hit is null. Additionally, each numbered commit is an ancestor of every commit with a greater number and a descendant of every commit with a smaller number. Since the date in each commit (t:yaki) is no earlier than that of each of its parents, the numbered commits are totally ordered in the same way by both pedigree and date. Of course, not every commit is numbered. If that sounds too complicated to you, don't worry about it. It basically behaves exactly as you would expect.
lab is a map of textual labels to numbered commits. Note that labels can only be applied to numbered commits. Labels must be unique across a desk.
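To make the relationship between let, hit, lab, and a ++case:clay concrete, here is a rough Python model. It is a sketch under stated assumptions, not clay's implementation: the Dome class, hit_is_gapless, and case_to_revision are hypothetical names, dates are plain integers, commit_dates stands in for looking up t:yaki through hut:rang:clay, and numbers outside 1..let are treated as unknown.

```python
from dataclasses import dataclass, field


@dataclass
class Dome:
    """Hypothetical stand-in for ++dome:clay (ang and ank are omitted)."""
    let: int = 0                             # number of the newest numbered commit
    hit: dict = field(default_factory=dict)  # revision number -> commit hash
    lab: dict = field(default_factory=dict)  # label -> revision number


def hit_is_gapless(dome):
    """Check that the keys of hit are exactly 1..let, with no gaps."""
    return set(dome.hit) == set(range(1, dome.let + 1))


def case_to_revision(dome, commit_dates, case):
    """Reduce a (tag, value) case to a revision number, or None if unknown.

    commit_dates maps commit hash -> date (an assumption standing in for
    reading t:yaki out of hut:rang:clay).
    """
    tag, value = case
    if tag == "ud":
        return value if 1 <= value <= dome.let else None
    if tag == "tas":
        return dome.lab.get(value)
    if tag == "da":
        # Numbered commits are ordered by date, so keep the highest revision
        # whose commit date is not after the requested date.
        best = None
        for rev in range(1, dome.let + 1):
            if commit_dates[dome.hit[rev]] <= value:
                best = rev
        return best
    raise ValueError(f"unknown case tag: {tag}")


dome = Dome(let=3, hit={1: "a", 2: "b", 3: "c"}, lab={"release": 2})
dates = {"a": 10, "b": 20, "c": 30}
```

In this model every kind of case ends up as a revision number, which mirrors the observation that only numbered commits can be referenced.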
++ankh, filesystem node
++ ankh :: fs node (new)
$: p=cash :: recursive hash
q=(unit ,[p=cash q=*]) :: file
r=(map ,@ta ankh) :: folders
== ::
This is a single node in the filesystem. This may be a file or a directory or both. In earth filesystems, a node is a file xor a directory. On mars, we're inclusive, so a node is a file ior a directory.
p is a recursive hash that depends on the contents of this file or directory and on any children.
q is the contents of this file, if any. p.q is a hash of the contents while q.q is the data itself.
r is the set of children of this node. In the case of a pure file, this is empty. The keys are the names of the children and the values are, recursively, the nodes themselves.
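The "file ior directory" idea and the recursive hash can be illustrated with a small Python model. This is a sketch only: the Ankh class, recursive_hash, and lookup are hypothetical names, and SHA-256 is an arbitrary stand-in for whatever hash clay actually uses.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ankh:
    """Hypothetical model of a node that may be a file, a directory, or both."""
    contents: Optional[bytes] = None              # like q: file contents, if any
    children: dict = field(default_factory=dict)  # like r: name -> Ankh


def recursive_hash(node):
    """Hash that changes if this node's contents or any descendant changes."""
    h = hashlib.sha256()
    if node.contents is not None:
        h.update(b"file:" + hashlib.sha256(node.contents).digest())
    # Sort the names so the result does not depend on insertion order.
    for name in sorted(node.children):
        child_hash = recursive_hash(node.children[name])
        h.update(b"child:" + name.encode("utf-8") + b"=" + child_hash.encode("ascii"))
    return h.hexdigest()


def lookup(node, path):
    """Walk a list of child names from node; return None if the path is absent."""
    for name in path:
        node = node.children.get(name)
        if node is None:
            return None
    return node


leaf = Ankh(contents=b"hello")
# The root is both a file (it has contents) and a directory (it has children).
root = Ankh(contents=b"readme", children={"sub": Ankh(children={"leaf": leaf})})
```

Because each node's hash folds in the hashes of its children, comparing two root hashes is enough to tell whether anything changed anywhere below them.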
++cash, ankh hash
++ cash ,@uvH :: ankh hash
This is a 128-bit hash of an ankh. These are mostly stored within ankhs themselves, and they are used to check for changes in possibly-deep hierarchies.
++rung, filesystem per neighbor ship
++ rung $: rus=(map desk rede) :: neighbor desks
== ::
This is the filesystem of a neighbor ship. The keys to this map are all the desks we know about on their ship.
++rede, desk state
++ rede :: universal project
$: lim=@da :: complete to
qyx=cult :: subscribers
ref=(unit rind) :: outgoing requests
dom=dome :: revision state
== ::
This is our knowledge of the state of a desk, either foreign or domestic.
lim is the date of the last full update. We only respond to requests for stuff before this time.
qyx is the set of subscribers to this desk. For domestic desks, this is simply p:dojo, all subscribers to the desk, while in foreign desks this is all the subscribers from our ship to the foreign desk.
ref is the request manager for the desk. For domestic desks, this is null since we handle requests ourselves.
dom is the actual data in the desk.
++rind, request manager
++ rind :: request manager
$: nix=@ud :: request index
bom=(map ,@ud ,[p=duct q=rave]) :: outstanding
fod=(map duct ,@ud) :: current requests
haw=(map mood (unit)) :: simple cache
== ::
This is the request manager for a foreign desk.
nix is one more than the index of the most recent request. Thus, it is the next available request number.
bom is the set of outstanding requests. The keys of this map are some subset of the numbers between 0 and one less than nix. The members of the map are exactly those requests that have not yet been fully satisfied.
fod is the same set as bom, but from a different perspective. In particular, the values of fod are the same as the keys of bom, and the p out of the values of bom are the same as the keys of fod. Thus, we can map ducts to their associated request number and ++rave:clay, and we can map numbers to their associated duct and ++rave:clay.
haw is a map from simple requests to their values. This acts as a cache for requests that have already been made. Thus, the second request for a particular ++mood:clay is nearly instantaneous.
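The bookkeeping between nix, bom, and fod can be modeled in a few lines of Python. This is an illustrative sketch, not clay's code: the Rind class and its method names are hypothetical, ducts and raves are opaque values, and it assumes at most one outstanding request per duct (a new request on a duct replaces the old one).

```python
class Rind:
    """Hypothetical model of the request manager for a foreign desk."""

    def __init__(self):
        self.nix = 0    # next available request number
        self.bom = {}   # request number -> (duct, rave)
        self.fod = {}   # duct -> request number
        self.haw = {}   # mood -> cached value

    def add(self, duct, rave):
        """Record a new outstanding request and return its number."""
        if duct in self.fod:
            # Assumption: a duct holds one request, so drop the previous one.
            self.finish(self.fod[duct])
        num = self.nix
        self.nix += 1
        self.bom[num] = (duct, rave)
        self.fod[duct] = num
        return num

    def finish(self, num):
        """Forget a request that has been fully satisfied or cancelled."""
        duct, _rave = self.bom.pop(num)
        del self.fod[duct]

    def consistent(self):
        """bom and fod must describe the same requests from both directions."""
        numbers_ok = all(0 <= num < self.nix for num in self.bom)
        inverse = {duct: num for num, (duct, _rave) in self.bom.items()}
        return numbers_ok and inverse == self.fod


rind = Rind()
first = rind.add("duct-a", "rave-1")
second = rind.add("duct-b", "rave-2")
rind.finish(first)
```

Note that nix never decreases, so request numbers are not reused even after a request is finished.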
++rang:clay, data store
++ rang $: hut=(map tako yaki) ::
lat=(map lobe blob) ::
== ::
This is a set of data keyed by hash. Thus, this is where the "real" data is stored, but it is only meaningful if we know the hash of what we're looking for.
hut is a map from hashes to commits. We often get the hashes from hit:dome:clay, which keys them by logical id. Not every commit has an id.
lat is a map from hashes to the actual data. We often get the hashes from a ++yaki, a commit, which references this map to get the data. There is no ++blob:clay in any ++yaki:clay. They are only accessible through this map.
++tako:clay, commit reference
++ tako ,@ :: yaki ref
This is a hash of a ++yaki:clay, a commit. These are most notably used as the keys in hut:rang:clay, where they are associated with the actual ++yaki:clay, and as the values in hit:dome:clay, where sequential ids are associated with these.
++yaki:clay, commit
++ yaki ,[p=(list tako) q=(map path lobe) r=tako t=@da] :: commit
This is a single commit.
p is a list of the hashes of the parents of this commit. In most cases, this will be a single commit, but in a merge there may be more parents. In theory, there may be an arbitrary number of parents, but in practice merges have exactly two parents. This may change in the future. For commit 1, there is no parent.
q is a map of the paths on a desk to the data at that location. If you understand what a ++lobe:clay and a ++blob:clay are, then the type signature here tells the whole story.
r is the hash associated with this commit.
t is the date at which this commit was made.
++lobe:clay, data reference
++ lobe ,@ :: blob ref
This is a hash of a ++blob:clay. These are most notably used in lat:rang:clay, where they are associated with the actual ++blob:clay, and as the values in q:yaki:clay, where paths are associated with their data in a commit.
++blob:clay, data
++ blob $% [%delta p=lobe q=lobe r=udon] :: delta on q
[%direct p=lobe q=* r=umph] ::
[%indirect p=lobe q=* r=udon s=lobe] ::
== ::
This is a node of data. In every case, p is the hash of the blob.
%delta is the case where we define the data by a delta on other data. In practice, the other data is always the previous commit, but nothing depends on this. q is the hash of the parent blob, and r is the delta.
%direct is the case where we simply have the data directly. q is the data itself, and r is any preprocessing instructions. These almost always come from the creation of a file.
%indirect is both of the preceding cases at once. q is the direct data, r is the delta, and s is the parent blob. It should always be the case that applying r to s gives the same data as q directly (with the preprocessor instructions in p.r). This exists purely for performance reasons. This is unused at the moment, but in general these should be created when there is a long line of changes so that we do not have to traverse the delta chain back to the creation of the file.
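The way a delta chain resolves to actual data can be sketched in Python. This is a simplified model under stated assumptions, not clay's implementation: blobs are plain tuples in the field order given above, lat is a dict from lobe to blob, resolve is a hypothetical name, and the caller supplies apply_delta because the delta format is described separately.

```python
def resolve(lat, lobe, apply_delta):
    """Produce the data for lobe by following %delta blobs back to stored data.

    Blobs are modeled as tuples:
      ("delta", p, q, r)        q is the parent lobe, r is the delta
      ("direct", p, q, r)       q is the data itself
      ("indirect", p, q, r, s)  q is the data itself, s is the parent lobe
    """
    pending = []  # deltas seen so far, newest first
    while True:
        blob = lat[lobe]
        kind = blob[0]
        if kind == "delta":
            _kind, _p, parent, delta = blob
            pending.append(delta)
            lobe = parent
        elif kind in ("direct", "indirect"):
            # Both carry the full data in q, so the walk can stop here.
            data = blob[2]
            break
        else:
            raise ValueError(f"unknown blob kind: {kind}")
    # Replay the deltas oldest first on top of the stored data.
    for delta in reversed(pending):
        data = apply_delta(data, delta)
    return data


def append_delta(data, delta):
    """Toy delta for the example: a delta is just text to append."""
    return data + delta


lat = {
    "h1": ("direct", "h1", "a", None),
    "h2": ("delta", "h2", "h1", "b"),
    "h3": ("delta", "h3", "h2", "c"),
    "h4": ("indirect", "h4", "abc", "c", "h2"),
    "h5": ("delta", "h5", "h4", "d"),
}
```

Resolving "h5" in this model stops at the %indirect blob "h4" instead of walking all the way back to "h1", which is the performance benefit the text describes.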
++udon, abstract delta
++ udon :: abstract delta
$: p=umph :: preprocessor
$= q :: patch
$% [%a p=* q=*] :: trivial replace
[%b p=udal] :: atomic indel
[%c p=(urge)] :: list indel
[%d p=upas q=upas] :: tree edit
== ::
== ::
This is an abstract change to a file. This is a superset of what would normally be called diffs. Diffs usually refer to changes in lines of text while we have the ability to do more interesting deltas on arbitrary data structures.
p is any preprocessor instructions.
%a refers to the trivial delta of a complete replace of old data with new data.
%b refers to changes in an opaque atom on the block level. This has very limited usefulness, and is not used at the moment.
%c refers to changes in a list of data. This is often lines of text, which is your classic diff. We, however, will work on any list of data.
%d refers to changes in a tree of data. This is general enough to describe changes to any hoon noun, but often more special-purpose deltas should be created for different content types. This is not used at the moment, and may in fact be unimplemented.
++urge, list change
++ urge |*(a=_,* (list (unce a))) :: list change
This is a parametrized type for list changes. For example, (urge ,@t) is a list change for lines of text.
++unce, change part of a list
++ unce |* a=_,* :: change part
$% [%& p=@ud] :: skip[copy]
[%| p=(list a) q=(list a)] :: p -> q[chunk]
== ::
This is a single change in a list of elements of type a. For example, (unce ,@t) is a single change in lines of text.
%& means the next p lines are unchanged.
%| means the lines p have changed to q.
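Applying a list change can be sketched as follows in Python. This is an illustration rather than the Hoon standard library routine: apply_urge is a hypothetical name, each change is a tuple tagged "&" or "|", and the sketch assumes the change list must account for every element of the old list.

```python
def apply_urge(old, urge):
    """Apply a list of changes to old and return the new list.

    ("&", n)               copy the next n elements unchanged
    ("|", removed, added)  the elements `removed` are replaced by `added`
    """
    out = []
    pos = 0  # index of the next unconsumed element of old
    for unce in urge:
        if unce[0] == "&":
            count = unce[1]
            if pos + count > len(old):
                raise ValueError("copy runs past the end of the list")
            out.extend(old[pos:pos + count])
            pos += count
        elif unce[0] == "|":
            _tag, removed, added = unce
            if old[pos:pos + len(removed)] != list(removed):
                raise ValueError("removed elements do not match the old list")
            out.extend(added)
            pos += len(removed)
        else:
            raise ValueError(f"unknown change tag: {unce[0]}")
    if pos != len(old):
        # Assumption of this sketch: leftover elements are an error.
        raise ValueError("changes do not cover the whole list")
    return out


old_lines = ["a", "b", "c", "d"]
changes = [("&", 1), ("|", ["b"], ["B", "B2"]), ("&", 2)]
new_lines = apply_urge(old_lines, changes)
```

Because each "|" change carries both the old and the new elements, the same change list could also be checked against the old data or inverted, which this sketch uses only for the mismatch check.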
++umph, preprocessing information
++ umph :: change filter
$| $? %a :: no filter
%b :: jamfile
%c :: LF text
== ::
$% [%d p=@ud] :: blocklist
== ::
This space intentionally left undocumented. This stuff will change once we get a well-typed clay.
++upas, tree change
++ upas :: tree change (%d)
$& [p=upas q=upas] :: cell
$% [%0 p=axis] :: copy old
[%1 p=*] :: insert new
[%2 p=axis q=udon] :: mutate!
== ::
This space intentionally left undocumented. This stuff is not known to work, and will likely change when we get a well-typed clay. Also, this is not a complicated type; it is not difficult to work out the meaning.
++nori:clay, repository action
++ nori :: repository action
$% [& q=soba] :: delta
[| p=@tas] :: label
== ::
This describes a change that we are asking clay to make to the desk. There are two kinds of changes that may be made: we can modify files or we can apply a label to a commit.
In the | case, we will simply label the current commit with the given label. In the & case, we will apply the given changes.
++soba:clay, delta
++ soba ,[p=cart q=(list ,[p=path q=miso])] :: delta
This describes a set of changes to make to a desk. The cart is simply a pair of the old hash and the new hash of the desk. The list is a list of changes keyed by the file they're changing. Thus, the paths are paths to files to be changed while miso is a description of the change itself.
++miso:clay, ankh delta
++ miso :: ankh delta
$% [%del p=*] :: delete
[%ins p=*] :: insert
[%mut p=udon] :: mutate
== ::
There are three kinds of changes that may be made to a node in a desk. We can insert a file, in which case p is the contents of the new file. We can delete a file, in which case p is the contents of the old file. Finally, we can mutate that file, in which case the udon describes the changes we are applying to the file.
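The three kinds of change can be modeled against a flat map of paths to contents. This Python sketch is illustrative only: apply_miso is a hypothetical name, the flat dict stands in for the real node tree, the caller supplies apply_udon, and the strict checks (no insert over an existing file, deleted contents must match) are assumptions of the sketch rather than documented clay behavior.

```python
def apply_miso(files, path, miso, apply_udon):
    """Return a new path -> contents dict with one change applied at path.

    miso is a (kind, payload) tuple where kind is "ins", "del", or "mut".
    """
    kind, payload = miso
    updated = dict(files)  # leave the caller's dict untouched
    if kind == "ins":
        if path in files:
            raise ValueError("cannot insert over an existing file")
        updated[path] = payload  # payload is the contents of the new file
    elif kind == "del":
        if path not in files or files[path] != payload:
            raise ValueError("deleted contents do not match the file")
        del updated[path]        # payload was the contents of the old file
    elif kind == "mut":
        if path not in files:
            raise ValueError("cannot mutate a missing file")
        updated[path] = apply_udon(files[path], payload)
    else:
        raise ValueError(f"unknown miso kind: {kind}")
    return updated


def replace_udon(_old, new):
    """Toy delta for the example: a complete replace of old data with new."""
    return new


start = {("a",): "one"}
step1 = apply_miso(start, ("b",), ("ins", "two"), replace_udon)
step2 = apply_miso(step1, ("a",), ("mut", "ONE"), replace_udon)
step3 = apply_miso(step2, ("b",), ("del", "two"), replace_udon)
```

A list of (path, miso) pairs, as in a soba, could be applied by folding this function over the list.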
++mizu:clay, merged state
++ mizu ,[p=@u q=(map ,@ud tako) r=rang] :: new state
This is the input to the %merg kiss, which allows us to perform a merge. The p is the number of the new head commit. The q is a map from numbers to commit hashes. This is all the new numbered commits that are to be inserted. The keys to this should always be the numbers from let.dom plus one to p, inclusive. The r holds the maps of all the new commits and data. Since these are merged into the current state, no old commits or data need be here.
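The constraint on the keys of q, and the way the new maps fold into the existing state, can be sketched in Python. This is a hedged model, not clay's merge code: apply_mizu is a hypothetical name, the state is passed as plain values and dicts, and the mizu is a tuple (p, q, (hut, lat)).

```python
def apply_mizu(let, hit, hut, lat, mizu):
    """Fold a merged state into the current one and return the new state.

    let, hit  current top id and revision -> commit hash map
    hut, lat  current commit and data stores, keyed by hash
    mizu      (new_let, new_hit, (new_hut, new_lat))
    """
    new_let, new_hit, (new_hut, new_lat) = mizu
    # The new revisions must be exactly let+1 .. new_let, inclusive.
    expected = set(range(let + 1, new_let + 1))
    if set(new_hit) != expected:
        raise ValueError("new revisions must run from let+1 to p with no gaps")
    return (
        new_let,
        {**hit, **new_hit},
        {**hut, **new_hut},
        {**lat, **new_lat},
    )


state = (2, {1: "a", 2: "b"}, {"a": "yaki-a", "b": "yaki-b"}, {"l1": "data-1"})
mizu = (4, {3: "c", 4: "d"}, ({"c": "yaki-c", "d": "yaki-d"}, {"l2": "data-2"}))
merged = apply_mizu(*state, mizu)
```

Since the stores are keyed by hash, merging them is a plain union; that is why the mizu only needs to carry commits and data that are new.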
++riff:clay, request/desist
++ riff ,[p=desk q=(unit rave)] :: request/desist
This represents a request for data about a particular desk. If q contains a rave, then this opens a subscription to the desk for that data. If q is null, then this tells clay to cancel the subscription along this duct.
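The request/desist behavior amounts to adding or removing an entry in the desk's subscription map. The following Python sketch shows that idea only: handle_riff is a hypothetical name, cults is an assumed dict from desk name to that desk's duct -> rave map, and None plays the role of a null q.

```python
def handle_riff(cults, duct, riff):
    """Open or cancel the subscription along duct for the desk named in riff.

    riff is a (desk, rave) tuple; rave is None to cancel.
    Returns the subscription map for that desk.
    """
    desk, rave = riff
    cult = cults.setdefault(desk, {})
    if rave is None:
        # Desist: forget whatever this duct was subscribed to, if anything.
        cult.pop(duct, None)
    else:
        # Request: remember what this duct asked for.
        cult[duct] = rave
    return cult


cults = {}
handle_riff(cults, "duct-a", ("main", "rave-1"))
handle_riff(cults, "duct-b", ("main", "rave-2"))
handle_riff(cults, "duct-a", ("main", None))
```

Keying subscriptions by duct is what lets a later null request identify which subscription to cancel without naming it.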
++riot:clay, response
++ riot (unit rant) :: response/complete
A riot is a response to a subscription. If null, the subscription has been completed, and no more responses will be sent. Otherwise, the rant is the produced data.
++rant:clay, response data
++ rant :: namespace binding
$: p=[p=care q=case r=@tas] :: clade release book
q=path :: spur
r=* :: data
== ::
This is the data at a particular node in the filesystem. p.p specifies the type of data that was requested (and is produced). q.p gives the specific version reported (since a range of versions may be requested in a subscription). r.p is the desk. q is the path to the filesystem node. r is the data itself (in the format specified by p.p).
++nako, subscription response data
++ nako $: gar=(map ,@ud tako) :: new ids
let=@ud :: next id
lar=(set yaki) :: new commits
bar=(set blob) :: new content
== ::
This is the data that is produced by a request for a range of revisions of a desk. This allows us to easily keep track of a remote repository -- all the new information we need is contained in the nako.
gar is a map of the revisions in the range to the hash of the commit at that revision. These hashes can be used with hut:rang:clay to find the commit itself.
let is either the last revision number in the range or the most recent revision number, whichever is smaller.
lar is the set of new commits, and bar is the set of new content.
Public Interface
As with all vanes, there are exactly two ways to interact with clay. %clay exports a namespace accessible through .^, which is described above under ++care:clay. The primary way of interacting with clay, though, is by sending kisses and receiving gifts.
++ gift :: out result <-$
$% [%ergo p=@p q=@tas r=@ud] :: version update
[%note p=@tD q=tank] :: debug message
[%writ p=riot] :: response
== ::
++ kiss :: in request ->$
$% [%info p=@p q=@tas r=nori] :: internal edit
[%ingo p=@p q=@tas r=nori] :: internal noun edit
[%init p=@p] :: report install
[%into p=@p q=@tas r=nori] :: external edit
[%invo p=@p q=@tas r=nori] :: external noun edit
[%merg p=@p q=@tas r=mizu] :: internal change
[%wart p=sock q=@tas r=path s=*] :: network request
[%warp p=sock q=riff] :: file request
== ::
There are only a small number of possible kisses, so it behooves us to describe each in detail.
$% [%info p=@p q=@tas r=nori] :: internal edit
[%into p=@p q=@tas r=nori] :: external edit
These two kisses are nearly identical. At a high level, they apply changes to the filesystem. Whenever we add, remove, or edit a file, one of these cards is sent. The p is the ship whose filesystem we're trying to change, the q is the desk we're changing, and the r is the requested change. For the format of the requested change, see the documentation for ++nori:clay above.
When a file is changed in the unix filesystem, vere will send a %into kiss. This tells clay that the duct over which the kiss was sent is the duct that unix is listening on for changes. From within Arvo, though, we should never send a %into kiss. The %info kiss is identical except that it does not reset the duct.
[%ingo p=@p q=@tas r=nori] :: internal noun edit
[%invo p=@p q=@tas r=nori] :: external noun edit
These kisses are currently identical to %info and %into, though this will not always be the case. The intent is for these kisses to allow typed changes to clay so that we may store typed data. This is currently unimplemented.
[%init p=@p] :: report install
Init is called when a ship is started on our pier. This simply creates a default room to go into our raft. Essentially, this initializes the filesystem for a ship.
[%merg p=@p q=@tas r=mizu] :: internal change
This is called to perform a merge. This is most visibly called by :update to update the filesystem of the current ship to that of its sein. The p and q are as in %info, and the r is the description of the merge. See ++mizu:clay above.
XX [%wake ~] :: timer activate
XX This card is sent by unix at the time specified by ++doze. This time is usually the closest time specified in a subscription request. When %wake is called, we update our subscribers if there have been any changes.
[%wart p=sock q=@tas r=path s=*] :: network request
This is a request that has come across the network for a particular file. When another ship asks for a file from us, that request comes to us in the form of a %wart kiss. This is handled by trivially turning it into a %warp.
[%warp p=sock q=riff] :: file request
This is a request for information about a particular desk. This is, in its most general form, a subscription, though in many cases it is the trivial case of a subscription -- a read. See ++riff:clay for the format of the request.