Software Hash Identifier (SWHID)
Similar to DOIs, Software Hash Identifiers (SWHIDs, also known as Software Heritage Identifiers) are persistent, intrinsic identifiers developed by Software Heritage. Software Heritage is a non-profit, long-term archive for software source code. It stores not only software artifacts but also their full version control history.
The objects that SWHIDs identify are organized in a tree-like filesystem hierarchy that includes special nodes to track revisions, releases, full version control repositories, and all branches. Unlike DOIs, which identify a software project or release as a whole, SWHIDs are designed to identify every individual component within a software project.
SWHID Syntax
A SWHID consists of two parts: the core identifier and an optional list of qualifiers. The core identifier identifies a software artifact or object, while the qualifiers specify the context in which the object is meant to be seen or point to a subpart of the object itself.
Core Identifier
The core identifier is composed of four fields separated by colons (:).
Example: swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890.
swh is the identifier prefix.
1 is the version of the identifier scheme.
dir indicates that the object is a directory; the other object types are cnt (contents), rev (revisions), rel (releases), and snp (snapshots).
93711e958cdde0b729ab948d3a904399dae0c890 is the intrinsic identifier, a string of lowercase hex-encoded ASCII characters computed from the content and relevant metadata of the object.
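The four fields can be checked mechanically. The following is a minimal Python sketch, not an official tool: the names CORE_SWHID, parse_core_swhid, and content_swhid are hypothetical and chosen for illustration. It validates a core identifier and, for the cnt type only, shows how the intrinsic identifier can be derived from the bytes of a file, on the understanding that content identifiers use the same SHA-1 computation as Git blob hashes. Other object types are hashed over richer metadata and are not covered here.

```python
import hashlib
import re

# swh : scheme version : object type : 40 lowercase hex digits
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def parse_core_swhid(swhid: str) -> dict:
    """Split a core SWHID into its four fields (hypothetical helper)."""
    match = CORE_SWHID.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {swhid!r}")
    return {
        "prefix": "swh",
        "version": 1,
        "type": match.group(1),
        "hash": match.group(2),
    }


def content_swhid(data: bytes) -> str:
    """Compute the SWHID of a file's content (cnt objects only).

    Assumption: the hash is the Git blob hash, i.e. SHA-1 over
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


fields = parse_core_swhid("swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890")
print(fields["type"])          # object type field
print(content_swhid(b""))      # identifier of an empty file
```

Because the identifier is computed from the object itself, anyone holding the same bytes can recompute it without contacting the archive.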
Qualified Identifiers
A qualifier may be a fragment qualifier, which identifies subparts of a software artifact, or a context qualifier, which provides additional context on the software artifact.
Each qualifier is specified as a key-value pair, using an = character as a separator. Qualifiers are separated from the core identifier and from each other by using a ; character.
Example:
swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890;
origin=https://github.com/McMasterRS/WARIO;
visit=swh:1:snp:0369ad5f0f4b74eb586fbd130ca47e2cb6ac8034
swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890 is the core identifier, as discussed previously.
origin declares the origin of the software.
visit indicates the snapshot of the repository in which the software artifact can be observed. This context qualifier is only valid when the origin qualifier is present.
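The separator rules above can be sketched in a few lines of Python. This is an illustrative parser only; split_swhid is a hypothetical name, and the sketch assumes that any ; inside a qualifier value (for example in an origin URL) has already been percent-encoded, so splitting on ; is safe. It also enforces the rule that visit requires origin.

```python
def split_swhid(swhid: str) -> tuple[str, dict[str, str]]:
    """Split a qualified SWHID into (core identifier, qualifiers)."""
    core, *raw_qualifiers = swhid.split(";")
    qualifiers: dict[str, str] = {}
    for item in raw_qualifiers:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        if not key or not sep:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value
    if "visit" in qualifiers and "origin" not in qualifiers:
        raise ValueError("the visit qualifier requires an origin qualifier")
    return core, qualifiers


# The example above, written as a single string.
example = (
    "swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890;"
    "origin=https://github.com/McMasterRS/WARIO;"
    "visit=swh:1:snp:0369ad5f0f4b74eb586fbd130ca47e2cb6ac8034"
)
core, qualifiers = split_swhid(example)
print(core)
print(qualifiers["origin"])
print(qualifiers["visit"])
```

The example identifier is shown across three lines for readability; in practice it is a single string with no whitespace.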
How to Get a SWHID?
SWHIDs are automatically generated when your source code or software artifacts are archived within the Software Heritage archive.
If your software source code is hosted on a public repository, such as GitHub, you can manually submit it to Software Heritage using their web interface. Additionally, Software Heritage periodically crawls and harvests source code from various public repositories, ensuring that a wide range of software is automatically archived.
Once Software Heritage captures a snapshot of your source code, it is securely preserved in their extensive archive. At this point, a SWHID is generated and becomes available in the Software Heritage archive browser.
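Once you have a SWHID, you can turn it into links that point back to the archived object. The sketch below only builds the URLs and makes no network request. It assumes that the archive at https://archive.softwareheritage.org resolves a core SWHID appended to its base URL and exposes a resolve endpoint under /api/1/resolve/; check the current Software Heritage documentation before relying on either form. The function names are hypothetical.

```python
# Assumed base URL of the Software Heritage archive.
ARCHIVE_BASE = "https://archive.softwareheritage.org"


def browse_url(core_swhid: str) -> str:
    """Link intended to open the object in the archive browser."""
    return f"{ARCHIVE_BASE}/{core_swhid}"


def resolve_api_url(core_swhid: str) -> str:
    """Link to the (assumed) Web API endpoint that resolves a SWHID."""
    return f"{ARCHIVE_BASE}/api/1/resolve/{core_swhid}/"


swhid = "swh:1:dir:93711e958cdde0b729ab948d3a904399dae0c890"
print(browse_url(swhid))
print(resolve_api_url(swhid))
```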