Discussion:
[urn] Benjamin Kaduk's Discuss on
John C Klensin
2018-06-08 00:52:21 UTC
--On Thursday, June 7, 2018 16:29 -0600 Peter Saint-Andre
Text along those lines seems appropriate. We might also
discourage people from defining their own canonical
transformation, but rather re-use one that's already defined
(say, a PRECIS profile such as UsernameCaseMapped).
I started to write a much longer note and explanation, but maybe
a summary is better for all concerned (i.e., if more is needed,
it is readily available).

IMO, there are two separate issues involved with this
discussion.

One is whether an Internationalization Considerations section is
needed. It seems to me that it is, but that it should mostly be
part of much clearer explanations in RFC 3986 about what rules,
especially about canonical forms and matching, apply to all URI
types, what is delegated to schemes/methods, and what can
reasonably be further delegated by particular schemes to, e.g.,
for the URN case, particular namespaces. The URNBIS WG
discussed whether to try to incorporate that type of explanation
into what became RFC 8141 but concluded that would be unwise, in
part because it would risk contradictions between the way some
of those in the URN community interpreted (or wanted to
interpret) 3986 and the way some of those in the more
traditional web community (a subset of whom are offended at the
whole idea of URNs) interpret 3986. Pushing those issues and
that debate into the definition of a particular namespace seems
unwise and inappropriate.

While the other is about i18n, it is less about what I would
think of as internationalization considerations for NBNs in
general than it is about free advice to those who are defining
national sub-namespaces for NBNs. Make statements like that if
you like (and please pay attention to whatever Juha, who is
ultimately responsible for the text, has to say on the subject),
but remember that the managers of those national sub-namespaces
are mostly repository libraries who, in general, know far more
about their names, their languages, and how Unicode relates to
those languages than we can hope to. Remembering that NBNs are
already deployed by some of those libraries and that they have
hundreds of years of experience with the types of identifier,
and identifier comparison rules, that work for them, I
personally think that the IETF should be a little careful about
hubris in giving advice on those issues, but whatever works.

best,
john
Peter Saint-Andre
2018-06-08 18:01:51 UTC
Post by John C Klensin
--On Thursday, June 7, 2018 16:29 -0600 Peter Saint-Andre
Text along those lines seems appropriate. We might also
discourage people from defining their own canonical
transformation, but rather re-use one that's already defined
(say, a PRECIS profile such as UsernameCaseMapped).
I started to write a much longer note and explanation, but maybe
a summary is better for all concerned (i.e., if more is needed,
it is readily available).
Indeed. Similarly, I had started writing a longer note yesterday,
pointing out that much of this is already covered in:

1. RFC 3986 (esp. Section 1.2.1, which doesn't say how "to use a wider
range of characters" nor does it "specify the character encoding used to
map those characters to octets prior to being percent-encoded for the URI")

2. RFC 8141 (esp. Section 2.2, which specifies a character encoding of
UTF-8 but doesn't say how to process "those characters" before UTF-8
encoding).

My reading of Adam's and Ben's messages is that they'd like a more
detailed specification of the last-mentioned item, or at least bring the
matter to the attention of anyone brave or foolhardy enough to use
"those characters" in NBNs.

+1 to the rest of what John said.

Peter
Adam Roach
2018-06-08 19:45:03 UTC
Post by Peter Saint-Andre
...
Indeed. Similarly, I had started writing a longer note yesterday,
1. RFC 3986 (esp. Section 1.2.1, which doesn't say how "to use a wider
range of characters" nor does it "specify the character encoding used to
map those characters to octets prior to being percent-encoded for the URI")
2. RFC 8141 (esp. Section 2.2, which specifies a character encoding of
UTF-8 but doesn't say how to process "those characters" before UTF-8
encoding).
My reading of Adam's and Ben's messages is that they'd like a more
detailed specification of the last-mentioned item, or at least bring the
matter to the attention of anyone brave or foolhardy enough to use
"those characters" in NBNs.
What I have in mind is more the second thing than the first. As John
pointed out, what libraries want to do may vary from country to country.
I just want text that warns that there's a gun here and that they should
take care to aim it away from their own foot.

/a
Benjamin Kaduk
2018-06-08 20:25:59 UTC
Post by Adam Roach
Post by Peter Saint-Andre
...
Indeed. Similarly, I had started writing a longer note yesterday,
1. RFC 3986 (esp. Section 1.2.1, which doesn't say how "to use a wider
range of characters" nor does it "specify the character encoding used to
map those characters to octets prior to being percent-encoded for the URI")
2. RFC 8141 (esp. Section 2.2, which specifies a character encoding of
UTF-8 but doesn't say how to process "those characters" before UTF-8
encoding).
My reading of Adam's and Ben's messages is that they'd like a more
detailed specification of the last-mentioned item, or at least bring the
matter to the attention of anyone brave or foolhardy enough to use
"those characters" in NBNs.
What I have in mind is more the second thing than the first. As John
pointed out, what libraries want to do may vary from country to country.
I just want text that warns that there's a gun here and that they should
take care to aim it away from their own foot.
I think that's what I had in mind as well (to the extent that I had
anything at all in mind when I balloted -- I don't want to claim to
be an expert in this space).

-Benjamin
Benjamin Kaduk
2018-06-08 20:32:30 UTC
I'm happy to see the main point of discussion progressing with input
from people who know more about the subject than me ... that said, I
can comment on some of the other points, inline.
Document shepherd here. I expect the document author (and perhaps my
co-author on RFC 8141) to provide further thoughts.
Benjamin Kaduk has entered the following ballot position for
draft-hakala-urn-nbn-rfc3188bis-01: Discuss
When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)
Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.
https://datatracker.ietf.org/doc/draft-hakala-urn-nbn-rfc3188bis/
----------------------------------------------------------------------
----------------------------------------------------------------------
I think this document may benefit from an Internationalization
Considerations sections, but am not entirely sure how needed it is.
So let's discuss it...
In particular, the URN:NBN lexical equivalence rules include several
case-insensitive comparisons, for the prefix and for the case of the
hex digits in any percent-encoded values, but do not specify any
operation on the decoded percent-encoded values/characters.
In particular, with regard to characters outside the ASCII range,
URNs that appear in protocols or that are passed between systems MUST
use only Unicode characters encoded in UTF-8 and further encoded as
required by RFC 3986. To the extent feasible and consistent with the
requirements of names defined and standardized elsewhere, as well as
the principles discussed in Section 1.2, the characters used to
represent names SHOULD be restricted to either ASCII letters and
digits or to the characters and syntax of some widely used models
such as those of Internationalizing Domain Names in Applications
(IDNA) [RFC5890], Preparation, Enforcement, and Comparison of
Internationalized Strings (PRECIS) [RFC7613], or the Unicode
Identifier and Pattern Syntax specification [UAX31].
In order to make URNs as stable and persistent as possible when
protocols evolve and the environment around them changes, URN
namespaces SHOULD NOT allow characters outside the ASCII range
[RFC20] unless the nature of the particular URN namespace makes such
characters necessary.
By my reading of draft-hakala-urn-nbn-rfc3188bis and RFC 8141, the
allowable case-sensitivity for nbn_string constructs generated by a
national library applies to the percent-encoded string because that is
where any comparison or equivalence-matching would occur for these
identifiers. Venturing into case matching of percent-decoded strings
would (IMHO) unnecessarily open up an ugly can of worms.
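
The comparison described above, which operates on the percent-encoded string and never on decoded characters, can be sketched roughly as follows. This is an illustration only, not text from the draft. The names `nbn_urn_key` and `nbn_urn_equivalent` are hypothetical, the example URNs are made up, and the sketch assumes that the prefix ends at the first hyphen and that the nbn_string is otherwise case-sensitive (an individual national library may decide differently). It does not handle r-, q-, or f-components.

```python
import re

# A percent-encoded triplet such as "%c3" or "%C3".
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def nbn_urn_key(urn: str) -> str:
    """Return a comparison key for a URN:NBN without percent-decoding it."""
    # Assumption: the first hyphen separates the prefix from the nbn_string.
    head, sep, nbn_string = urn.partition("-")
    if not sep or not head.lower().startswith("urn:nbn:"):
        raise ValueError(f"not a URN:NBN: {urn!r}")
    # The prefix ("urn:nbn:", country code, sub-namespaces) is case-insensitive.
    prefix = head.lower()
    # Only the hex digits of percent-encoded triplets are case-folded.
    # No decoding and no Unicode normalization is applied to the rest.
    tail = _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), nbn_string)
    return prefix + "-" + tail


def nbn_urn_equivalent(a: str, b: str) -> bool:
    """Compare two URN:NBNs on their percent-encoded form."""
    return nbn_urn_key(a) == nbn_urn_key(b)
```

Under these assumptions, `urn:nbn:fi-a%c3%a4` and `URN:NBN:FI-a%C3%A4` compare equal, while `urn:nbn:fi-%C3%A4` and `urn:nbn:fi-a%CC%88` do not, even though both decode to text that renders identically.
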
In many
(perhaps even most?) cases, ignoring such encoded characters for
purposes of case-insensitive comparison is the wrong thing to do,
but if I understand correctly, it actually is the correct thing to
do in this case. Namely, a NBN (or URN:NBN), once assigned, is
essentially static data and consumers of it should not attempt to
perform modification, Unicode normalization, etc. on it -- that
would potentially change what is being identified (or render the
identifier invalid).
Well, Unicode normalization would be used as part of equivalence
operations (as in IDNA or PRECIS), but in general you are right about
modification. These are identifiers or even numbers, not malleable strings.
On the other hand, a national library or
delegated institution that is assigning NBNs may wish to take into
account Unicode normalization rules and other similar considerations
while assigning NBNs (in particular, the nbn_string component), as
part of their allocation policy.
It could, but as far as I know none of the national libraries have yet
gone down that path or seen the need to. Juha can tell us if I'm wrong.
Because these can be subtle, it
may be worth explicitly pointing out the potential issues for
registration authorities.
"There be dragons and don't go there" seems like fine advice.
That, plus the directive to consumers to
not normalize, seems like it would be appropriate content for an
Internationalization Considerations section.
By "normalize" you mean perform equivalence matching of percent-decoded
strings (of which Unicode normalization might be one step), right? Here
again I think the answer is "don't do that" because equivalence
matching is done on the percent-encoded strings.
I did not have a terribly concrete scenario in mind when I wrote
this; I think the one Adam described is probably enough to get us
thinking about the right things.
Separately, in Section 4.2.1 where we cover r-components, I noted
that RFC 8141 rather discourages actually using r-components until
their semantics are standardized. The text here seems to be giving
free rein for national libraries to assign their own semantics
without any coordination with a broader community.
Juha and perhaps John can clarify, but as I understand it the scope of a
URN resolver for NBNs would likely be within a particular national
library system, not even necessarily across all national libraries (this
is how things are deployed now in the absence of URN resolution, in any
case).
Do we really
want to advocate for this, as opposed to attempting to get broadly
unified semantics for r-components Internet-wide? (Perhaps we
already have and I just missed it; if so, a reference here would be
appropriate.)
The semantics of r-components are yet to be defined. I would venture
that the IETF is probably not the right place to do that work, given how
little energy remained in the URN WG at the end (and we probably didn't
have the right people in the room in the first place).
I won't argue with that. Does it make sense to say something like
"There are not currently any broadly accepted semantics for
r-components at the time of this writing which may be grounds to be
cautious with their use" in this document?
----------------------------------------------------------------------
----------------------------------------------------------------------
I'm a little confused on some of the places in the text that talk
about URN:NBNs being "generated from" NBNs (and non-reuse
thereafter) or restrictions on URN:NBN assignment (e.g.,
uniqueness). The procedure seems to be basically deterministic for
creating a URN:NBN once an NBN is assigned, and potentially
something that could be done by any party in possession of the NBN
(i.e., not necessarily the registration authority that created the
NBN). So I'm not sure why the act of generating the URN:NBN has any
significance, if anyone could do it -- the restrictions would need
to apply at NBN assignment time in order to be useful. (This kind
of gets into Ben's DISCUSS point, too, in the sense that we can only
say what prerequisites there are for national library NBN allocation
policies in order for them to be useful with URN:NBN, but they can
in principle do whatever they like and choose to not use URN:NBN.)
Yes, the process of creating a URN from an NBN is trivial (modulo
potentially interesting encoding of non-ASCII characters). I think the
point of the text is that an NBN URN is not exactly the same as an NBN.
Perhaps that could be worded more clearly.
Okay. (I don't think I have any suggestions for different text.)
Section 3.2
From the library community point of view it is important that the
f-component is not a part of the NSS and therefore f-component
attachment does not mean that the relevant component part is
identified. Moreover, the resolution process still retrieves the
entire resource even if there is an f-component. The fragment
selection is applied by the resolution client (e.g., browser) to the
media returned by the resolution process. In other words, in this
latter case the fragments are logical and physical components of the
identified resource whereas in the former cases these "fragments" are
actually complete, independently named entities.
I'm not sure I'm understanding this correctly -- is the "former
case" the thing that libraries should not do, namely, including the
f-component in the NSS?
Now that you point it out, I'm not sure what the former case is.
Formally speaking the f-component simply is not part of the NSS, see the
ABNF in RFC 8141.
I guess we should wait for Juha to clarify.
If an NBN identifies a work, descriptive metadata about the work
SHOULD be supplied. The metadata record MAY contain links to
Internet-accessible digital manifestations of the work.
This left me confused. Is it only intended to apply in the case
described in the previous paragraph, where the resource identified
by the NBN is not available in the Internet? Or does it always
apply, forcing the metadata to take precedence over delivering the
actual work? (Or maybe I'm just confused, and there's an easy way
to deliver both metadata and the actual work alongside each other
with no ambiguity.)
Juha can clarify this.
Section 4.1
National Bibliography Number (NBN) is a generic term referring to a
group of identifier systems administered by the national libraries
and institutions authorized by them.
"the national libraries" implies a specific set -- which ones? It
may be better to hedge with "some national libraries".
Or remove "the" ... "by national libraries".
That's probably better :)

Thanks,

Benjamin
Section 4.2.2
Do we need to say anything about a URN-to-URI step before talking
about URI-to-resource services?
I'm also wondering about any relationship between "component
resource" NBNs and f-components of the containing work. If there
are NBNs assigned to both an image within a work and that containing
work, and an NBN with f-resource is used to refer to the image
within the containing work, is there any relationship between the
f-resource and the image-specific NBN?
Section 4.3
Expressing NBNs as URNs is usually straightforward, as only ASCII
characters are allowed in NBN strings. If necessary, NBNs MUST be
translated into canonical form as specified in RFC 8141.
When is it necessary?
It seems that in theory an NBN itself could contain non-ASCII
characters, whereas an NBN URN and its nbn_string construct can contain
only ASCII characters. At least that is my understanding.
Being part of the prefix, sub-namespace identifier strings are case-
insensitive. They MUST NOT contain any hyphens.
This MUST seems to just duplicate a syntactic requirement from the
ABNF; is RFC 2119 language really necessary?
/me shrugs
Section 8
John Klensin provided significant editorial and advisory support for
late versions of the draft.
Presumably that's "later versions"?
Yes.
Peter
John C Klensin
2018-06-09 20:20:39 UTC
--On Friday, June 8, 2018 15:32 -0500 Benjamin Kaduk
Post by Benjamin Kaduk
The semantics of r-components are yet to be defined. I would
venture that the IETF is probably not the right place to do
that work, given how little energy remained in the URN WG at
the end (and we probably didn't have the right people in the
room in the first place).
I won't argue with that. Does it make sense to say something
like "There are not currently any broadly accepted semantics
for r-components at the time of this writing which may be
grounds to be cautious with their use" in this document?
Perhaps. But see below.
Post by Benjamin Kaduk
...
If an NBN identifies a work, descriptive metadata about
the work SHOULD be supplied. The metadata record MAY
contain links to Internet-accessible digital
manifestations of the work.
This left me confused. Is it only intended to apply in the
case described in the previous paragraph, where the
resource identified by the NBN is not available in the
Internet? Or does it always apply, forcing the metadata to
take precedence over delivering the actual work? (Or maybe
I'm just confused, and there's an easy way to deliver both
metadata and the actual work alongside each other with no
ambiguity.)
Juha can clarify this.
In the interest of saving time, I can probably get this one.
The answer is "yes", this is intended to be about works not
accessible on the Internet (although a very similar issue
applies in at least one case where the NBN describes a
conceptual work whose components also have NBNs, but not all of
the components are available on the Internet). The extra
paragraph break may be my fault from periodically serving as copy
editor on the I-D.
Post by Benjamin Kaduk
Section 4.1
National Bibliography Number (NBN) is a generic term
referring to a group of identifier systems administered
by the national libraries and institutions authorized by
them.
"the national libraries" implies a specific set -- which
ones? It may be better to hedge with "some national
libraries".
Or remove "the" ... "by national libraries".
That's probably better :)
That would be my preference, but Juha should decide on this.
Post by Benjamin Kaduk
Section 4.2.2
Do we need to say anything about a URN-to-URI step before
talking about URI-to-resource services?
Given what 3986 has to say, a URN-to-URI step would be an
oxymoron. If you meant a URN-to-URL step, that is probably a
matter for 8141 and it may be worth pointing out that members of
the web community (a euphemism for a particular, mostly known,
set of individuals who claim to speak for that community in case
you haven't figured that out) have been violently opposed to
such text, claiming that, if it is needed, then there is really
no need for URNs. On the other hand, while the URNBIS WG could
not reach consensus on any particular proposal and did reach
consensus about not trying to proceed with definitions, that is
much of what r-components are expected to be about.
Post by Benjamin Kaduk
I'm also wondering about any relationship between "component
resource" NBNs and f-components of the containing work. If
there are NBNs assigned to both an image within a work
and that containing work, and an NBN with f-resource is
used to refer to the image within the containing work, is
there any relationship between the f-resource and the
image-specific NBN?
On a per-sub-namespace basis, possibly. In the general case,
maybe. This is not an NBN issue but an issue about how
namespaces are managed, organized, and used, i.e., probably an
8141 issue.
Post by Benjamin Kaduk
Section 4.3
Expressing NBNs as URNs is usually straightforward, as
only ASCII characters are allowed in NBN strings. If
necessary, NBNs MUST be translated into canonical form
as specified in RFC 8141.
When is it necessary?
It seems that in theory an NBN itself could contain non-ASCII
characters, whereas an NBN URN and its nbn_string construct
can contain only ASCII characters. At least that is my
understanding.
That is correct. But, more or less per 3986, _any_ URI can
contain non-ASCII characters in the tail by %-encoding them.
There were some moves in the URNBIS WG to restrict that for
URNs, but it met resistance from the usual suspects. The bottom
line here, and I don't know how loudly to say it, is that using
non-ASCII characters in nbn_strings would probably be nothing short
of stupid, especially given that both IETF and W3C have
suggested that they be avoided in identifiers non-specialist end
users are not expected to see. However, due to a problem that
goes back well before the early decision that ISO 8859-1 was
going to be an adequate encoding for HTML content (but of which
that decision is symptomatic), it would be unsurprising if one
or more national libraries whose local language uses Latin
script with a few lightly-decorated characters had not taken
that advice or had decided to incorporate existing (perhaps for
decades) identifier strings with a few Latin characters outside
the ASCII subset into their national NBNs. One could imagine
rewording the text mentioned above for more clarity (a job I
will happily leave to the experts who make up the RFC Editor
function) but the bottom line is that all we do is to say "don't
do that, but if you decide to do it anyway, this is what you
must do to prevent even worse problems".
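
The rule summarized above (UTF-8 first, then %-encoding) and the hazard raised earlier in the thread can be illustrated with a small sketch. The helper name `percent_encode_utf8` is hypothetical and not part of any NBN tooling. It also percent-encodes more aggressively than RFC 3986 strictly requires, which is a simplification made for the example.

```python
import unicodedata
from urllib.parse import quote


def percent_encode_utf8(local_id: str) -> str:
    """UTF-8 encode, then percent-encode every octet outside the unreserved set."""
    # safe="" means "/" is encoded too; ASCII letters, digits, "-", ".", "_"
    # and "~" pass through unchanged.
    return quote(local_id, safe="", encoding="utf-8")


precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

print(percent_encode_utf8(precomposed))  # %C3%A4
print(percent_encode_utf8(decomposed))   # a%CC%88
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

The two inputs look the same on screen but yield different percent-encoded strings, so a comparison on the encoded form treats them as different identifiers. An assigning library that chooses to use such characters would need to settle on one form at assignment time.
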
Post by Benjamin Kaduk
Being part of the prefix, sub-namespace identifier
strings are case- insensitive. They MUST NOT contain
any hyphens.
This MUST seems to just duplicate a syntactic requirement
from the ABNF; is RFC 2119 language really necessary?
/me shrugs
Probably not, but, while Juha should confirm, I assume that part
of the origin of this text is that several other International
Standard identifiers, e.g., ISBNs, allow hyphens and treat them as
optional. It might be wise to reinforce the message that the
URN:NBN solution to the problems that causes is to clearly say
"no" and say that clearly enough that even those whose eyes
glaze over at ABNF will get the message. Whether it is better
done by something like the sentences above or by saying "Hyphens
are prohibited by the ABNF, see Section XXXX" is, IMO, a matter
of editorial style and preference.
Post by Benjamin Kaduk
Section 8
John Klensin provided significant editorial and advisory
support for late versions of the draft.
Presumably that's "later versions"?
Yes.
I really don't care. If one thinks this is an editorial
problem, leave it to the RFC Editor. If one thinks it is
substantive, remember that, while this is a -00 draft, the I-D
itself has been through many iterations under other names, so it
depends on how you count because I had nothing to do with early
versions of the I-D and at most only a reviewer/participant role
in RFC 3188. If this were a different sort of document and I
cared, I could make a strong case that I've been involved enough
and have written enough text to be listed as a Contributor, but
I think the nature of this document is that it is better if Juha
is sole author without contributors other than Alfred.

FWIW, I can't see why attribution at this level should be an
IESG problem unless you have reason to believe that IPR rules
are being violated.

Finally, to avoid writing a separate note even though it will
make this a paragraph longer, I think several of the comments
you (Benjamin) and Adam have made make a strong case for a
clarifying update to RFC 8141. In principle, I agree with that.
It is little surprise to me that new URN namespace proposals are
exposing issues that, if we had more ability to predict them,
would have been reflected in 8141 itself. The difficulty with
such an update is that, at the time 8141 and 8254 were
completed, the URNBIS WG had run out of energy and was
developing a level of acrimony that made further progress
unlikely. If we were to try to open 8141 to do a clarification,
I can just about guarantee that some of those who were the
sources of frustration that led to that acrimony would insist
that no document move forward until their pet issues and easy
solutions were addressed. That, in turn, would result in a
situation like the i18n one, only with less downside if the
issues are not addressed and more issues that would require
solving fundamental philosophical disagreements in the IETF
community. I can't recommend going there, but it doesn't seem to
me that trying to clarify 8141 by text in a single namespace
definition is the solution either.

best,
john
Hakala, Juha E
2018-06-11 08:42:07 UTC
Hello,

comments inline.

-----Original Message-----
From: urn <urn-***@ietf.org> On Behalf Of John C Klensin
Sent: Saturday, June 9, 2018 23:21
To: Benjamin Kaduk <***@mit.edu>; Peter Saint-Andre <***@mozilla.com>
Cc: ***@ietf.org; The IESG <***@ietf.org>; draft-hakala-urn-nbn-***@ietf.org
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)
Post by Benjamin Kaduk
The semantics of r-components are yet to be defined. I would venture
that the IETF is probably not the right place to do that work, given
how little energy remained in the URN WG at the end (and we probably
didn't have the right people in the room in the first place).
Juha: in order to avoid chaos, the URN user community needs a centrally maintained registry of resolution services and the parameters related to them. As far as I am concerned, r-component syntax and semantics should not be user- or namespace-specific; they have to apply to all namespaces. So if somebody establishes r-component syntax for requesting a Dublin Core metadata record about the identified resource, the components used should be registered. And before creating an r-component, URN users should check the registry to see whether the required components (service and parameters) already exist.

While writing the I-D, my assumption was that r-component usage will only start once there is an agreement on r-component syntax, and there is a central registry for services specified. Developing syntax should not be rocket science; what we need is a way to specify services and parameters related to them in a machine readable way.

Assuming that each r-component is allowed to contain one and only one service and 0-n parameters related to it, syntax might look like this:

s=<service>&<parameter1>=<value>&<parameter2>=<value>&...&<parametern>=<value>

for instance:

s=URC&format=DC

to request metadata about the identified resource in Dublin Core format. I am sure that people who are more technically oriented than myself will come up with something better than my example, but the components (services and their parameters) should be the same.
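
A minimal sketch of how the proposed syntax might be parsed is shown here. It assumes exactly what the example above suggests: one `s=<service>` pair first, followed by zero or more `<parameter>=<value>` pairs separated by `&`. The names `RComponent` and `parse_r_component` are hypothetical, and no such syntax has been standardized.

```python
from dataclasses import dataclass, field


@dataclass
class RComponent:
    """One service plus its parameters, as in the proposal sketched above."""
    service: str
    params: dict = field(default_factory=dict)


def parse_r_component(text: str) -> RComponent:
    """Parse 's=<service>&<param>=<value>&...' into an RComponent."""
    pairs = []
    for item in text.split("&"):
        name, sep, value = item.partition("=")
        if not (sep and name and value):
            raise ValueError(f"malformed name=value pair: {item!r}")
        pairs.append((name, value))
    # Assumption: the single service comes first, written as s=<service>.
    if pairs[0][0] != "s":
        raise ValueError("r-component must start with s=<service>")
    params = {}
    for name, value in pairs[1:]:
        if name == "s":
            raise ValueError("only one service is allowed")
        if name in params:
            raise ValueError(f"duplicate parameter: {name!r}")
        params[name] = value
    return RComponent(service=pairs[0][1], params=params)
```

With this sketch, `parse_r_component("s=URC&format=DC")` yields the service `URC` with the single parameter `format` set to `DC`. A real parser would also check the service and parameter names against the central registry described above.
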

Generic r-component syntax can be specified without any knowledge about the resolution services to be supported, but service specific details may only be provided by experts who know how current applications operate. I do hope that IETF experts can assist with syntax definition; once that (and the registry) is in place, URN user communities can start providing service level specifications.

Users of persistent identifiers (DOI, Handle, ARK, URN) are all currently under pressure to enrich the functionality of resolvers. Unless a central (and shared) registry of resolution services is established, there is a clear danger that each identifier system will develop its own solutions, which will seriously limit interoperability between persistent identifier systems.
Post by Benjamin Kaduk
I won't argue with that. Does it make sense to say something like
"There are not currently any broadly accepted semantics for
r-components at the time of this writing which may be grounds to be
cautious with their use" in this document?
Juha: such text can be added as a clarification. Since my assumption has been that the r-component will not be used at all before we have generally approved syntax and semantics for it, I might use an even stronger formulation than just "cautious". But as noted above, I did not say anything about this because I thought that non-usage of the r-component applies automatically to all URN namespaces as long as r-component syntax is still a work in progress. It seems a bit redundant to repeat the same thing in all namespace registrations.

Having said that, I do hope that the syntax will be formally specified soon, so that the URN user communities can start adding service level specifications into the central registry (which should also be there from the beginning).
Post by Benjamin Kaduk
If an NBN identifies a work, descriptive metadata about
the work SHOULD be supplied. The metadata record MAY
contain links to Internet-accessible digital
manifestations of the work.
This left me confused. Is it only intended to apply in the case
described in the previous paragraph, where the resource identified
by the NBN is not available in the Internet? Or does it always
apply, forcing the metadata to take precedence over delivering the
actual work? (Or maybe I'm just confused, and there's an easy way
to deliver both metadata and the actual work alongside each other
with no ambiguity.)
Juha can clarify this.
Juha: I can understand why this confuses people. Sorry, this bit is library specific and hard to understand unless the reader knows our practices.

Work itself is immaterial. There may be 0-n manifestations of it, some of them hand-held, some digital.

Work is an umbrella concept with which it is possible to bring together all existing manifestations. In practice this can be done by, e.g., adding to the work metadata record links to the metadata records describing these manifestations. A practical example of a similar practice is a splash page describing a research data set ("work level metadata"). Such pages often contain links to all versions of the relevant data set, with manifestation level metadata such as appropriate warnings if some versions of the data set are very large.

A user who requires metadata about a work may not even know if there are digital manifestations of the work. But with the metadata record the user will be able to find this out, and select the version which suits his/her needs best.
Post by Benjamin Kaduk
Section 4.1
National Bibliography Number (NBN) is a generic term
referring to a group of identifier systems administered
by the national libraries and institutions authorized by
them.
"the national libraries" implies a specific set -- which ones? It
may be better to hedge with "some national libraries".
Or remove "the" ... "by national libraries".
That's probably better :)
That would be my preference, but Juha should decide on this.

Juha: by national libraries is better.
Post by Benjamin Kaduk
Section 4.2.2
Do we need to say anything about a URN-to-URI step before talking
about URI-to-resource services?
Given what 3986 has to say, a URN-to-URI step would be an oxymoron. If you meant a URN-to-URL step, that is probably a matter for 8141 and it may be worth pointing out that members of the web community (a euphemism for a particular, mostly known, set of individuals who claim to speak for that community in case you haven't figured that out) have been violently opposed to such text, claiming that, if it is needed, then there is really no need for URNs. On the other hand, while the URNBIS WG could not reach consensus on any particular proposal and did reach consensus about not trying to proceed with definitions, that is much of what r-components are expected to be about.
Post by Benjamin Kaduk
I'm also wondering about any relationship between "component
resource" NBNs and f-components of the containing work. If there
are NBNs assigned to both an image within a work and that
containing work, and an NBN with f-resource is used to refer to the
image within the containing work, is there any relationship between
the f-resource and the image-specific NBN?
On a per-sub-namespace basis, possibly. In the general case, maybe. This is not an NBN issue but an issue about how namespaces are managed, organized, and used, i.e., probably an 8141 issue.

Juha: with NBN, national libraries have a free hand to specify their own naming policies. They may give just one identifier for e.g. an entire EPUB 3.0.1 e-book, or they may assign separate identifiers for all the component parts of the e-book. The best policy depends on many things, including the level of control & access required / possible.

The NBN specification does not specify limits on what can / should be done. If the library prefers to use f-component to identify for instance images in a PDF document, that is fine.
Post by Benjamin Kaduk
Section 4.3
Expressing NBNs as URNs is usually straightforward, as
only ASCII characters are allowed in NBN strings. If
necessary, NBNs MUST be translated into canonical form
as specified in RFC 8141.
When is it necessary?
It seems that in theory an NBN itself could contain non-ASCII
characters, whereas an NBN URN and its nbn_string construct can
contain only ASCII characters. At least that is my understanding.
That is correct. But, more or less per 3986, _any_ URI can contain non-ASCII characters in the tail by %-encoding them.
There were some moves in the URNBIS WG to restrict that for URNs, but it met resistance from the usual suspects. The bottom line here, and I don't know how loudly to say it, is that using non-ASCII characters in nbn_strings would probably be nothing short of stupid, especially given that both IETF and W3C have suggested that they be avoided in identifiers non-specialist end users are not expected to see. However, due to a problem that goes back well before the early decision that ISO 8859-1 was going to be an adequate encoding for HTML content (but of which that decision is symptomatic), it would be unsurprising if one or more national libraries whose local language uses Latin script with a few lightly-decorated characters had not taken that advice or had decided to incorporate existing (perhaps for decades) identifier strings with a few Latin characters outside the ASCII subset into their national NBNs. One could imagine rewording the text mentioned above for more clarity (a job I will happily leave to the experts who make up the RFC Editor function) but the bottom line is that all we do is to say "don't do that, but if you decide to do it anyway, this is what you must do to prevent even worse problems".

Juha: all NBNs I have seen so far have contained just (printable) ASCII characters. But outside Europe there may be national libraries which have been more liberal. If so, non-ASCII characters in their NBNs must be %-encoded when these NBNs become URN:NBNs.

Nobody knows if there are NBNs with non-ASCII characters, and if so, how common they are. Therefore I decided to drop the recommendation that such characters should be avoided in NBN strings.

To clarify the text, the beginning of Section 4.3 could be edited to read:

Expressing NBNs as URNs is straightforward if NBN strings contain only ASCII characters. Non-ASCII characters, if any, MUST be translated into canonical form as specified in RFC 8141.
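
As an illustration of the %-encoding step discussed above, here is a minimal Python sketch. It assumes that non-ASCII characters are encoded as UTF-8 octets before %-encoding; the function name `encode_nbn_string` and the sample NBN `åäö-123` are hypothetical and not taken from any real national library. Passing `safe=""` to `quote` is a simplification that also encodes reserved characters such as colons.

```python
from urllib.parse import quote, unquote


def encode_nbn_string(nbn: str) -> str:
    """Percent-encode an NBN string for use in a URN:NBN.

    Assumption: non-ASCII characters are converted to UTF-8 octets first.
    ASCII letters, digits and "-", ".", "_", "~" pass through unchanged;
    with safe="" every other character is %-encoded.
    """
    return quote(nbn, safe="", encoding="utf-8")


# Hypothetical NBN containing three non-ASCII Latin letters.
encoded = encode_nbn_string("åäö-123")
print(encoded)

# Decoding restores the original string.
print(unquote(encoded))
```

A purely ASCII NBN string made of letters, digits and hyphens comes out unchanged, which matches the "straightforward" case in the proposed text.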
Post by Benjamin Kaduk
Being part of the prefix, sub-namespace identifier
strings are case-insensitive. They MUST NOT contain
any hyphens.
This MUST seems to just duplicate a syntactic requirement from the
ABNF; is RFC 2119 language really necessary?
/me shrugs
Probably not, but, while Juha should confirm, I assume that part of the origin of this text is that several other International Standard identifiers, e.g., ISBNs, allow hyphens and treat them as optional. It might be wise to reinforce the message that the URN:NBN solution to the problems that causes is to clearly say "no" and say that clearly enough that even those whose eyes glaze over at ABNF will get the message. Whether it is better done by something like the sentences above or by saying "Hyphens are prohibited by the ABNF, see Section XXXX" is, IMO, a matter of editorial style and preference.

Juha: I believe this may be an error inherited from RFC 3188. The forbidden character should be colon.

URN:NBNs with sub-namespaces look like this:

urn:nbn:se:uu:diva-284370

This is a Swedish URN:NBN assigned by Uppsala University. Organizations which have a sub-namespace may divide their sub-namespace further if necessary, using colons (e.g. Uppsala could create a namespace ID urn:nbn:se:uu:thesis:). Given the special role the colon has, sub-namespace identifiers must not contain them, since theoretically allowing colons could cause duplicate assignments. So if there is an nbn:fi sub-namespace nbn:fi:aa, every NID in the form nbn:fi:aa:<string> must be a sub-namespace of nbn:fi:aa.

Changing the specification might cause problems with backwards compatibility had some libraries assigned sub-namespaces with colons in them. I don't think that this is the case. So the next version of the I-D could say "MUST NOT contain any hyphens or colons".
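
The role of the two delimiters can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that, as in the RFC 3188 syntax, colons separate the country code and sub-namespace identifiers within the prefix, and the first hyphen separates the prefix from the NBN string. Under that reading `diva` in the example above would count as part of the prefix. The function names are hypothetical.

```python
def split_urn_nbn(urn: str):
    """Split a URN:NBN into (country code, sub-namespace list, NBN string).

    Assumption: the first hyphen ends the prefix, and colons separate
    the prefix elements. The prefix is case-insensitive, so it is
    lowercased; the NBN string is left untouched.
    """
    scheme, nid, rest = urn.split(":", 2)
    if scheme.lower() != "urn" or nid.lower() != "nbn":
        raise ValueError("not a URN:NBN: " + urn)
    prefix, hyphen, nbn_string = rest.partition("-")
    if not hyphen:
        raise ValueError("no hyphen between prefix and NBN string")
    country, *sub_namespaces = prefix.lower().split(":")
    return country, sub_namespaces, nbn_string


def is_within(parent_prefix: str, child_prefix: str) -> bool:
    """True if child_prefix equals or lies below parent_prefix.

    Comparison is element by element, so "fi:aab" is not inside "fi:aa".
    """
    parent = parent_prefix.lower().split(":")
    child = child_prefix.lower().split(":")
    return child[: len(parent)] == parent


print(split_urn_nbn("urn:nbn:se:uu:diva-284370"))
print(is_within("fi:aa", "fi:aa:thesis"))
```

Because the split is on whole elements, a sub-namespace identifier that itself contained a colon or a hyphen could not be told apart from a delimiter, which is the ambiguity the restriction is meant to rule out.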
Post by Benjamin Kaduk
Section 8
John Klensin provided significant editorial and advisory
support for late versions of the draft.
Presumably that's "later versions"?
Yes.
I really don't care. If one thinks this is an editorial problem, leave it to the RFC Editor. If one thinks it is substantive, remember that, while this is a -00 draft, the I-D itself has been through many iterations under other names, so it depends on how you count because I had nothing to do with early versions of the I-D and at most only a reviewer/participant role in RFC 3188. If this were a different sort of document and I cared, I could make a strong case that I've been involved enough and have written enough text to be listed as a Contributor, but I think the nature of this document is that it is better if Juha is sole author without contributors other than Alfred.

FWIW, I can't see why attribution at this level should be an IESG problem unless you have reason to believe that IPR rules are being violated.

Juha: I think "later versions" is fine.

Finally, to avoid writing a separate note even though it will make this a paragraph longer, I think several of the comments you (Benjamin) and Adam have made make a strong case for a clarifying update to RFC 8141. In principle, I agree with that. It is little surprise to me that new URN namespace proposals are exposing issues that, if we had more ability to predict them, would have been reflected in 8141 itself. The difficulty with such an update is that, at the time 8141 and 8254 were completed, the URNBIS WG had run out of energy and was developing a level of acrimony that made further progress unlikely. If we were to try to open 8141 to do a clarification, I can just about guarantee that some of those who were the sources of frustration that led to that acrimony would insist that no document move forward until their pet issues and easy solutions were addressed. That, in turn, would result in a situation like the i18n one, only with less downside if the issues are not addressed and more issues that would require solving fundamental philosophical disagreements in the IETF community. I can't recommend going there, but it doesn't seem to me that trying to clarify 8141 by text in a single namespace definition is the solution either.

Juha: the library community has been using URN:NBNs successfully since RFC 3188 was published. Tens of millions of identifiers have been assigned. From libraries' point of view, the important thing is that the revised RFC validates some new URN:NBN assignment practices which we did not foresee when RFC 3188 was written. I do hope that the revision will not get stuck on technicalities or philosophical disagreements which have only minor impact on practical work.

In the long term I do hope that allowing the use of r-, q- and f-components will help libraries and other URN users such as the film industry to build smarter URN resolvers. There is definitely a need for that.

All the best,

Juha

PS. I volunteer to produce the next version of the I-D, but should I use the txt or XML version? And if the latter, where do I get it (last time I edited the txt version).
Benjamin Kaduk
2018-06-29 22:08:59 UTC
My belated thanks to Juha and everyone for the additional explanations.
I think the changes in the -01 help at least this reader.

I cleared my discuss position; sorry that it took so long to do so.

-Benjamin
Post by Hakala, Juha E
Hello,
comments inline.
-----Original Message-----
Sent: Saturday, 9 June 2018 23:21
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)
Post by Benjamin Kaduk
The semantics of r-components are yet to be defined. I would venture
that the IETF is probably not the right place to do that work, given
how little energy remained in the URN WG at the end (and we probably
didn't have the right people in the room in the first place).
Juha: in order to avoid chaos, the URN user community needs a centrally maintained registry of resolution services and parameters related to them. As far as I am concerned, r-component syntax and semantics should not be user- or namespace-specific; they have to apply to all namespaces. So if somebody establishes r-component syntax for requesting a Dublin Core metadata record about the identified resource, the components used should be registered. And before creating the r-component, URN users should check the registry to see whether the required components (service and parameters) exist already.
While writing the I-D, my assumption was that r-component usage will only start once there is an agreement on r-component syntax, and there is a central registry for services specified. Developing syntax should not be rocket science; what we need is a way to specify services and parameters related to them in a machine readable way. For example, a generic form such as
s=<service>&<parameter1>=<value>&<parameter2>=<value>&...&<parametern>=<value>
could be instantiated as
s=URC&format=DC
to request metadata about the identified resource in Dublin Core format. I am sure that people who are more technically oriented than myself will come up with something better than my example, but the components (services and their parameters) should be the same.
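
To make the s=<service>&<parameter>=<value> idea concrete, here is a rough Python sketch. It is purely illustrative: r-component semantics are not defined anywhere, so the key `s`, the service name `URC`, the parameter `format` and the helper names are all assumptions carried over from the example in this message. The only part taken from RFC 8141 is that an r-component is introduced by "?+".

```python
from urllib.parse import parse_qsl, urlencode


def build_r_component(service: str, **params: str) -> str:
    """Serialize a service name and its parameters as key=value pairs.

    Hypothetical convention: the service goes first under the key "s".
    """
    return urlencode([("s", service), *params.items()])


def attach_r_component(urn: str, r_component: str) -> str:
    """Append an r-component to a URN using the "?+" introducer."""
    return urn + "?+" + r_component


def parse_r_component(r_component: str) -> dict:
    """Parse key=value pairs back into a dict; raises on malformed input."""
    return dict(parse_qsl(r_component, strict_parsing=True))


r = build_r_component("URC", format="DC")
print(r)
print(attach_r_component("urn:nbn:se:uu:diva-284370", r))
print(parse_r_component(r))
```

A resolver could look up the parsed service name and parameters in a shared registry before acting on them, which is where the central registry described above would come in.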
Generic r-component syntax can be specified without any knowledge about the resolution services to be supported, but service specific details may only be provided by experts who know how current applications operate. I do hope that IETF experts can assist with syntax definition; once that (and the registry) is in place, URN user communities can start providing service level specifications.
Users of persistent identifiers (DOI, Handle, ARK, URN) are all currently under pressure to enrich the functionality of resolvers. Unless a central (and shared) registry of resolution services is established, there is a clear danger that each identifier system will develop its own solutions, which will seriously limit interoperability between persistent identifier systems.
Post by Benjamin Kaduk
I won't argue with that. Does it make sense to say something like
"There are not currently any broadly accepted semantics for
r-components at the time of this writing which may be grounds to be
cautious with their use" in this document?
Juha: such text can be added as a clarification. Since my assumption has been that the r-component will not be used at all before we have generally approved syntax and semantics for it, I might use an even stronger formulation than just "cautious". But as noted above, I did not say anything about this because I thought that non-usage of r-component applies automatically to all URN namespaces as long as r-component syntax is still work in progress. It seems a bit redundant to repeat the same thing in all namespace registrations.
Having said that, I do hope that the syntax will be formally specified soon, so that the URN user communities can start adding service level specifications into the central registry (which should also be there from the beginning).
Post by Benjamin Kaduk
If an NBN identifies a work, descriptive metadata about
the work SHOULD be supplied. The metadata record MAY
contain links to Internet-accessible digital
manifestations of the work.
This left me confused. Is it only intended to apply in the case
described in the previous paragraph, where the resource identified
by the NBN is not available in the Internet? Or does it always
apply, forcing the metadata to take precedence over delivering the
actual work? (Or maybe I'm just confused, and there's an easy way
to deliver both metadata and the actual work alongside each other
with no ambiguity.)
Juha can clarify this.
Juha: I can understand why this confuses people. Sorry, this bit is library specific and hard to understand unless the reader knows our practices.
Work itself is immaterial. There may be 0-n manifestations of it, some of them hand-held, some digital.
_______________________________________________
urn mailing list
https://www.ietf.org/mailman/listinfo/urn