Perl6 Distribution thoughts and proposals (s22)

Currently Distribution is just a glorified key/value store. After years of digesting s22 I'm comfortable pushing for its adoption. A common complaint is that it's over-engineered or too academic. However, I would guess many such complaints boil down to not weighing all the considerations the original drafters did. I say this because over the years I've gone from being confused by it and disliking it to promoting its implementation. For nearly every problem I've run into, I end up finding a solution in s22. Sometimes these solutions are vague, but their intentions can still be interpreted. So I will lay out my understanding of the Distribution aspect of s22.

DESIGN:

https://design.perl6.org/S22.html#Distribution
The class for installing distributions using CompUnitRepo::Local::Installation. Basically provides the API to access the META6.json file representation of a distribution. It should at least contain the following methods:

method meta

    my $meta = $dist.meta;
    # Return a Hash with the representation of the meta-data,
    # constructed the same way as in the META6.json specification.
    # Please note that an actual META6.json file does not need to
    # exist, just a representation in that format.

method content

    my $content = $dist.content( <provides JSON::Fast lib/JSON/Fast.pm6> );
    my $content = $dist.content( <resource images fido.png> );
    # Return the octet-stream as specified by the given keys,
    # navigating through the META6.json hash.

INTERFACE:

https://design.perl6.org/S22.html#Distribution
(note this is an interface, not a class as currently implemented)

role Distribution {
    method meta {...}
    method content(*@keys --> IO) {...}
}

INTERFACE EXPLANATION:

method meta is simply for giving hash-like access to the META6. In most cases it would probably just be an alias to a json-from-file generated hash, but it doesn't have to be (nor does the META6 have to exist as a file). I don't think there is much confusion here.

method content is the interesting bit.

currently: Distribution assumes everything is a file. CU::R::I (CompUnit::Repository::Installation) concatenates relative paths into strings, so there is no way for anything other than a file to transfer data.

proposed: CU::R::I does not care what representation the Distribution takes, as it only calls .content, so .content may feed it data from a socket, a file, or even a hash.

example:

$dist.content(<provides Module::XXX lib/Module/XXX.pm6>)

(*note: key names found via method .meta)

This would return the raw data of whatever that set of keys points to (in this case lib/Module/XXX.pm6) so that CU::R::I can save that data anywhere it wants on the filesystem. So when CU::R::I gets the Distribution object, the Distribution does not even have to exist on disk yet; CU::R::I is just going to get the raw data and save it to (for instance) rakudo/install/perl6/site/XXX.pm6
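To make the "even a hash" case concrete, here is a minimal sketch of a distribution that lives entirely in memory. The names S22Distribution and InMemoryDistribution are hypothetical, strings stand in for raw bytes, and the lookup is deliberately simplified to the last key only; only .meta and .content come from s22.

```perl6
# Hypothetical names throughout; only .meta and .content come from s22.
role S22Distribution {
    method meta            { ... }   # hash-like view of the META6 data
    method content(*@keys) { ... }   # raw data addressed by META6 keys
}

# A distribution held entirely in memory: nothing here touches the filesystem.
class InMemoryDistribution does S22Distribution {
    has %!meta;
    has %!data;   # raw data, keyed by the final element of the key list

    submethod BUILD(:%!meta, :%!data) { }

    method meta { %!meta }

    method content(*@keys) {
        # Simplification (an assumption of this sketch): look the data up by
        # the last key only. A fuller version would validate the whole key
        # path against .meta first.
        %!data{ @keys.tail } // die "no content for: @keys[]";
    }
}

my $dist = InMemoryDistribution.new(
    meta => { name => 'Module::XXX', provides => { 'Module::XXX' => 'lib/Module/XXX.pm6' } },
    data => { 'lib/Module/XXX.pm6' => 'unit module Module::XXX;' },
);

# The caller never learns that no file was involved.
say $dist.content(<provides Module::XXX lib/Module/XXX.pm6>);
```

Anything that answers these two methods could be handed to an installer in the same way, whether the data comes from a tarball, a socket or a plain directory.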

%?RESOURCE:

https://design.perl6.org/S22.html#%25%3FRESOURCE

"resource" : {
    "images" : [
        "fido.png", "zowie.png"
    ],
    "libraries" : {
        "inline_helper" : "build-time"
    }
}

The current implementation of resource is rather lacking. Right now it just takes an array of paths: "resources" : ["libraries/libfoo", "xxx.img"]
The s22 design would allow for:

  1. each directory is its own hash key (so not "dir/dir2/xxx.txt" but rather "dir" : { "dir2" : ["xxx.txt"] })
  2. each file is not directly part of a string that contains its directory (no directory separators involved)
  3. arbitrary data can be attached to leaf nodes; if a leaf node is a hash
    then it is meant to be understood by a package manager and can be ignored by rakudo/compunit (as these might mean different things to specific package managers).

Let us look at the libraries example above; the arbitrary data here is build-time. This may tell a package manager something about libraries, so for this example we will say it tells the build phase that we will generate a file called inline_helper that does not exist yet (so take this into account when creating a manifest/uninstall). It may also be that the package manager simply added it itself so that it can look up that info later (think dependency chain).

But it's more useful than that. A package manager could then allow a command like <pkger> build-time . to run the build hook manually (similar to npm scripts).
Or it could allow explicitly requesting that $*VM.platform-library-name be applied (or explicitly supplying a type of "-or" => ["lib.so", "lib.dll"] to say one of these will exist). Remember, CU::R::I doesn't need to understand these outer hash nodes, so any code meant to interpret them would live in the Distribution object (or package manager).
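As a rough illustration of how a package manager might read such a tree, the sketch below flattens the resource hash into (key list => annotation) pairs. The helper name flatten-resources is hypothetical, and the reading of the structure is my own assumption rather than something s22 spells out: arrays list plain files, and a hash maps a name either to a nested directory or to annotation data for a single file.

```perl6
# Hypothetical helper; the interpretation of the tree is an assumption:
#   * an array lists plain files in the current directory
#   * a hash maps a name to a nested directory (hash or array), or to
#     annotation data for a single file (anything else, e.g. a string)
sub flatten-resources($node, *@dirs) {
    gather {
        if $node ~~ Positional {
            # Plain files: no annotation attached.
            take [|@dirs, $_] => Any for $node.list;
        }
        else {
            for $node.sort(*.key) -> (:$key, :$value) {
                if $value ~~ (Positional | Associative) {
                    # Nested directory: recurse with the key appended.
                    .take for flatten-resources($value, |@dirs, $key);
                }
                else {
                    # Annotated file: data only a package manager interprets.
                    take [|@dirs, $key] => $value;
                }
            }
        }
    }
}

my %resource =
    images    => [ 'fido.png', 'zowie.png' ],
    libraries => { inline_helper => 'build-time' };

for flatten-resources(%resource) -> $entry {
    say $entry.key.join(' '), ' => ', $entry.value // '(plain file)';
}
```

Each key list has the same shape as the argument to .content, so an installer could ignore the annotations entirely and still fetch every resource.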

t/ hooks/ bin/:

resources/ does not belong in this group. Why? Because t/ hooks/ bin/ are special folders, whereas resources can contain special files (anything with a hash leaf is considered a special file). I will say I cannot explain why these special folders are not required entries in the META. I can only guess it's because:

  1. These folders probably won't end up with any build time generated files meant to be installed (like resources)
  2. The files do not get tied to a specific name (like provides style Module::Foo => 'some/path')

So the prototype distributions I write generally include a .files method to allow access to the files in these special folders that are not included in the META6. This is OK, as none of these files have special properties (like resources) or have to be associated with a name (like provides).

CompUnit::Repository::Installation.install

Currently CompUnit::Repository::Installation has a signature of:

method install(Distribution $dist, %sources, %scripts?, %resources?, :$force)

We'll ignore the flaw of 2 optional positional hashes, but I do have to point out the pointlessness of passing in the 3 hashes at all. The $dist should already know all of this, and if not it should be able to generate it on the fly whenever CompUnit::Repository::Installation makes a request. So when it's time to install the sources it would call something like

for $dist.meta<provides>.kv -> $name, $path {
    # $save-to stands for wherever CU::R::I decides to store this source
    $save-to.spurt: $dist.content(["provides", $name, $path])
}

instead of relying on a hash that gets passed in with the exact same data. In other words we want: method install(Distribution $dist, :$force)
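A fuller, runnable version of that loop might look like the sketch below. TinyDist and install-sources are hypothetical stand-ins (not the real CompUnit::Repository::Installation code), the target directory under $*TMPDIR is chosen only for the demo, and strings stand in for raw bytes.

```perl6
# Hypothetical stand-in for any object offering .meta and .content.
class TinyDist {
    has %.meta;
    has %.data;
    method content(*@keys) { %!data{ @keys.tail } }
}

# Sketch of an installer step that needs nothing but the distribution itself:
# no extra %sources hash is passed in.
sub install-sources($dist, IO::Path $prefix) {
    for $dist.meta<provides>.kv -> $name, $path {
        my $target = $prefix.add($path);
        $target.parent.mkdir;    # creates intermediate directories as needed
        $target.spurt: $dist.content('provides', $name, $path);
    }
}

my $dist = TinyDist.new(
    meta => { provides => { 'Module::XXX' => 'lib/Module/XXX.pm6' } },
    data => { 'lib/Module/XXX.pm6' => 'unit module Module::XXX;' },
);

# Demo location only; a real repository would pick its own layout.
my $prefix = $*TMPDIR.add("s22-demo-{$*PID}");
install-sources($dist, $prefix);
say $prefix.add('lib/Module/XXX.pm6').slurp;
```

The installer decides where and under what name to save each source; the distribution only answers "what is the data for these keys".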

I ask: is it safe to dismiss the s22 Distribution for something else, when the something else probably does not consider many of the things s22 does (things the proposer may not have thought of yet)? I'm willing to assume quite a few hours were put into s22, and I'd hope anyone wanting to do something different puts in an equal amount of time to understand the problems it's meant to solve before we say "let's just rewrite it because I understand this model I've built in my head over $some-fraction of the hours that were put into s22".

(following the programming story of programmer Foo insisting on removing a piece of code to programmer Bar, but programmer Bar denies this course of action because programmer Foo is unable to answer what its original purpose was. Foo wants to change code he does not understand into code he does understand, but Bar knows you should not change code you do not fully understand yet).

Again, I've been guilty of this as well. But I don't want to see the hack-y workarounds I've used for zef's Distribution objects over the last 2 years (see the bottom of this post) somehow become the default implementation because they are easier to understand initially... they were developed as hacks until s22 was implemented, after all.

A simpler interface could be used internally since the CompUnit::Repository already knows the actual location of the distribution it will load. It will also know what type of Distribution to use because the CompUnit installed it itself. This means the CompUnit can save it as any type of Distribution it wants; it may get Distribution::Tar but that does not mean CompUnit::Repository can't save it as a Distribution::Local (which means you no longer need Distribution::Tar to reinstall it when you upgrade rakudo). However, making a simpler interface the default trades away some of the flexibility s22 provides in exchange for making it only a few lines easier to install a Distribution.

To wrap this up: I think we have been avoiding some/many of these things because at the time we did not understand why they were designed that way to begin with. I would ask that, instead of requiring one to explain why we should use the s22 version of Distribution, anyone who would like to see otherwise explain what is actually wrong with it and what they think the original intention was. I do not think anyone should push for alternatives unless they can explain what they feel the original design was meant to solve and what's wrong with the design decision. If these cannot be answered then it could be inferred that you may not have considered or be aware of certain problems that will be encountered. I know I've certainly been guilty of this in the past, but years of working on zef have only shown me that the answers to most problems/features were in fact already in s22 in some way (and that I simply had not understood everything fully before).

Examples / Code

The first 2 examples each show multiple Distribution objects fulfilling the s22 interface (one using roles, one using classes):
Demo/prototype of Distribution::Local and Distribution::Tar

Demo/prototype of Distribution::Local and Distribution::GitHub (with a modified CompUnit::Repository::Installation to use .content and .meta methods)

A type of Distribution::Simple that can be created using the previously mentioned s22 implementation. Making this the actual interface would leave it too simple/limited to be the actual API: let the core have an API that allows slightly lower-level access to the install process, and either leave the ::Simple style 'method per META6 root key' wrappers for distributions to implement themselves (such implementations are simple enough not to require a core Distribution::Simple) or make a simple API available through a different CompUnit::Repository (CompUnit::Repository::Installation::Simple).
zef's Hack-y Distribution::Simple style implementation

Note: ioify is just a way to "absolutify" the relative paths contained in a META. In reality the data doesn't have to come from a file, but it may be used to absolutify a relative url or anything else that could transform a urn to an actual location.