Skip to content

Assign a property to ALL enum values. #458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

gouttegd
Copy link
Contributor

@gouttegd gouttegd commented Jun 6, 2025

Related to #454

  • [ ] docs/ have been added/updated if necessary
  • make test has been run locally
  • [ ] tests have been added/updated (if applicable)
  • [ ] CHANGELOG.md has been updated.
  • run SSSOM-Py test suite against the updated model

This PR ensures that all enum values (except for the mapping_cardinality_enum) throughout the LinkML model have a property assigned to them. This is important for RDF serialisation, so that all enum values are consistently rendered as resource identifiers, instead of having some values being rendered as resource identifiers and some values being rendered as string literals.

I normally agree with @nichtich ’s sentiment (expressed in #450) that we should as much as possible reuse existing properties from outside SSSOM (as we already do, for example, for most values of the entity_type_enum) and avoid creating SSSOM-specific properties in the https://w3id.org/sssom/ namespace. However, for all the enum values concerned in this PR, I am not aware of pre-existing properties that would be suitable, so I did in fact create a bunch of SSSOM-specific properties. I am open to replacing some of them by more appropriate existing properties if someone can suggest some.

Also open to discussing the naming of those properties.

Of note, it would be nice if those properties could resolve to something meaningful to users. In particular, for the values in sssom_version_enum, I think most people would reasonably expect https://w3id.org/sssom/version1.0 to resolve to a frozen version of the full specification as it was in the 1.0 release (and likewise for https://w3id.org/sssom/version1.1 and any futur version). But this is an infrastructure issue that I consider out of scope for this PR.

Ensure that all enum values throughout the specification have an
explicit property assigned to them. This is important for consistency of
the RDF serialisation.
@gouttegd gouttegd self-assigned this Jun 6, 2025
@gouttegd gouttegd requested a review from matentzn June 6, 2025 10:57
matentzn
matentzn previously approved these changes Jun 6, 2025
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing controversial or breaking here?

@gouttegd
Copy link
Contributor Author

gouttegd commented Jun 6, 2025

The only think it breaks is the current RDF serialisation.

For now, enum values that do not have an associated property are rendered as string literals. For example, sssom_version: 1.1 is currently rendered in RDF as

[] a sssom:MappingSet ;
   sssom:sssom_version "1.1"^^xsd:string .

With this PR, it will be rendered instead as

[] a sssom:MappingSet ;
   sssom:sssom_version sssom:version1.1 .

But since the RDF serialisation has yet to be officially specified, I won’t lose too much sleep over that change (and I plan to make my RDF reader still accept literal values to soften the impact of the change).

As for “controversial”, I don’t think so, but since we are naming properties, there’s a lot of potential for bike-shedding. ;)

@matentzn matentzn requested a review from ehartley June 6, 2025 12:53
Copy link
Contributor

@nichtich nichtich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding of version number should be changed to keep it a string value.

@graybeal
Copy link

graybeal commented Jun 6, 2025

I don't like to bring this up, but please consider the following: is version really an enum?

For comparison, I assume we would agree that versionDate (or releaseDate or creationData) are not enums, because they could be anything, and everyone knows what a date format is. So we use a string for that.

But they couldn't be 'anything' in the real world of a particular system or entity. versionDate could only be one of a small set of strings that have been issued when the system is released. And every time you do a new version, you anoint another date as a versionDate, so the number of dates increases by 1. And you don't necessarily know ahead of time which string will be the next date, but you know how to generate it.

So what is the difference between versionDate and version, that makes versionDate a string and version an enum?

I'd argue that in the same way that there can be any number of versionDates in the wild (beyond the ones we end up using for a given system) , there can be any number of versions, even semantic versions. 0.0.0 through infinite #.#.#. Is the idea that by giving version strings enums we have somehow made the world more interoperable?

In the justification above, it seems we are using the version for a second purpose than just indicating the version:

I think most people would reasonably expect https://w3id.org/sssom/version1.0 to resolve to a frozen version of the full specification as it was in the 1.0 release

I agree with that statement on its own. But it's not because this is a version identifier, it's because this is a versioned IRI for the specification. So "https://w3id.org/sssom/version1.0" should resolves as you say. For me it is only meaningful as a full IRI string; referring to it as "sssom:version1.0" would not work in a browser, would not be discoverable in a text search as an IRI, would not be expanded in most tools that are displaying the string—it only works in a context that the prefix has been defined and is being expanded.

The point is that I've seen versionedIRIs and version IRIs and specification IRIs, but I've never seen any of them treated as an enum (unless it's to create a drop down menu to select your specification version, I suppose) and I've never seen any of them in the wild expressed as a CURIE. They are always strings.

Sorry to make this controversial. If you still think it's worth it the sky won't fall if you do it, I just have a different take on whether it will be helpful in the bigger scheme of things.

@gouttegd
Copy link
Contributor Author

gouttegd commented Jun 6, 2025

is version really an enum?

Yes it is. Because the version we are talking about here is not something like “the version of the mapping set”, but the version of the SSSOM specification that the mapping set declares itself to be compliant with. It is not and must not be an open-ended field.

it's because this is a versioned IRI for the specification

Yes.

For me it is only meaningful as a full IRI string; referring to it as "sssom:version1.0" would not work in a browser

The CURIE is nothing but a syntactic sugar for the full IRI. Whther the CURIE ends up being used or not is left to the discretion of the RDF writer, depending on whether it is configured to write IRI in their full-length form or in their short form.

A SSSOM/TSV file like this:

# sssom_version: "1.1"

can be rendered in RDF either like this:

[] a <https://w3id.org/sssom/MappingSet> ;
   <https://w3id.org/sssom/sssom_version> <https://w3id.org/sssom/version1.1> .

or like this:

@prefix sssom: <https://w3id.org/sssom/> .

[] a sssom:MappingSet ;
   sssom:sssom_version sssom:version1.1 .

This is of no consequence. The version IRI for the version 1.1 of the spec is still https://w3id.org/sssom/version1.1 in both cases. The fact that in the second example it is written on file as a CURIE does not matter. Once the file is parsed whatever downstream application consumes the file will see the full IRI.

@nichtich
Copy link
Contributor

nichtich commented Jun 7, 2025

@gouttegd wrote:

but the version of the SSSOM specification that the mapping set declares itself to be compliant with.

You are right, in this case the version could also be treated differently. Either an URI as suggested, or not included in RDF at all just like XML version number is not included in most DOM representations of the document tree. In the end, this issue seems of less relevance compared to how to express the actual content of a mapping set.

@gouttegd
Copy link
Contributor Author

gouttegd commented Jun 7, 2025

or not included in RDF at all just like XML version number is not included in most DOM representations of the document tree.

Not sure the XML version number is the appropriate comparison here. I’d rather compare sssom_version with something like the version attribute in a XSL/Transform stylesheet

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
...
</xsl:stylesheet>

which is always included in the DOM representation of the underlying XML tree – as it should be, because if you are writing a XSL/T engine, you’d want to know which version of the XSL/T language is used by the stylesheet you just read.

Granted, it is less important for SSSOM for now given that there are only two versions (1.0 and 1.1) with objectively very little differences between them. But that may very well change in the future.

For the `mapping_cardinality_enum`, representing values with a resource
identifier in RDF doesn’t really bring any benefit compared to
representing the values with a string literal.
matentzn
matentzn previously approved these changes Jun 10, 2025
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. @nichtich @graybeal ?

@matentzn matentzn requested a review from nichtich June 10, 2025 13:06
Incorrect indentation meant that the `meaning: sssom:NegatedPredicated` would have been interpreted as _another PredicateModifier value_, instead of the meaning of the `Not` value.

Co-authored-by: Emily Hartley <ehartley@c-path.org>
gouttegd and others added 2 commits June 19, 2025 08:17
It is not possible to combine the "short syntax" for enum descriptions
with the use of a `meaning` qualifier.
@matentzn matentzn requested a review from ehartley June 19, 2025 15:30
Copy link
Contributor

@ehartley ehartley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants