RFC Series publishes first RFC with non-ASCII characters


Heather Flanagan (RFC Series Editor)
Hello all,

RFC 8187, "Indicating Character Encoding and Language for HTTP Header
Field Parameters", is the first RFC to be published with UTF-8 encoding
and include characters not in the basic ASCII character set. This
document has been, with the author's consent, patience, and support,
used to test the existing tool chain to produce RFCs to see where the
environment has difficulty in handling non-ASCII characters. The RPC is
continuing this testing with the PRECIS document cluster (C326), which is
currently in AUTH48.

The RPC and the Tools Team have identified several areas that need to be
modified to support these characters. Some of those areas will
ultimately be handled with the new format tools; others have been
modified as part of the more general work to prepare for the new RFC
format. For the documents being used to test the toolchain, a
significant amount of manual processing is required to publish the
RFCs with non-ASCII characters in the final text. In order to keep
overall processing times down, leave staff enough time to test
the other tools that are being developed as part of the RFC format
project, and allow the editors time to create and/or update new
procedures as the v3 tools are released, no additional non-ASCII
documents outside of RFC 8187 and the PRECIS cluster will be published
until the new format tools are in production.

Many thanks go out to Julian Reschke and Peter Saint-Andre for their
support, the Tools Team for answering a variety of technical questions,
and to the RPC staff who worked to ensure that the RFC
publication process resulted in readable RFCs.


Heather Flanagan, RSE

_______________________________________________
rfc-interest mailing list
[hidden email]
https://www.rfc-editor.org/mailman/listinfo/rfc-interest

Re: RFC Series publishes first RFC with non-ASCII characters

HANSEN, TONY L
On 9/14/17, 5:07 PM, "rfc-interest on behalf of Heather Flanagan (RFC Series Editor)" <[hidden email] on behalf of [hidden email]> wrote:

    RFC 8187, "Indicating Character Encoding and Language for HTTP Header
    Field Parameters", is the first RFC to be published with UTF-8 encoding
    and include characters not in the basic ASCII character set. ...
       
    Heather Flanagan, RSE


Congratulations, Heather, to you, your team, and the authors. This is a big step forward, and I know it took lots of hard work and community involvement to get here.

I think the steps the team took used the right balance of progress and restraint in this important area.

Congratulations again.

        Tony Hansen



Re: RFC Series publishes first RFC with non-ASCII characters

Patrik Fältström
On 15 Sep 2017, at 18:18, HANSEN, TONY L wrote:

> Congratulations, Heather, to you, your team, and the authors. This is a big step forward, and I know it took lots of hard work and community involvement to get here.
>
> I think the steps the team took used the right balance of progress and restraint in this important area.
>
> Congratulations again.

Indeed! Congratulations!

   Patrik Fältström


Re: RFC Series publishes first RFC with non-ASCII characters

Martin J. Dürst
In reply to this post by Heather Flanagan (RFC Series Editor)
On 2017/09/15 06:07, Heather Flanagan (RFC Series Editor) wrote:
> Hello all,
>
> RFC 8187, "Indicating Character Encoding and Language for HTTP Header
> Field Parameters", is the first RFC to be published with UTF-8 encoding
> and include characters not in the basic ASCII character set. This
> document has been, with the author's consent, patience, and support,
> used to test the existing tool chain to produce RFCs to see where the
> environment has difficulty in handling non-ASCII characters.

Congratulations to everybody involved! Really great to see this move
forward!

Regards,   Martin.

P.S.: Looking at the RFC, there were exactly 5 non-ASCII characters (two
'£'s and one '€' in the examples, and two 'ü's, one in Julian's address
and one in my name in the acks). Now if I ever need a claim to fame, I
can claim that my name contributed to the first non-ASCII characters in
RFCs :-).
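
A tally like the one in the P.S. can be reproduced with a few lines of Python. This is only a minimal sketch: the function name and the sample string are made up for illustration, and the sample is not the actual text of RFC 8187.

```python
from collections import Counter


def non_ascii_counts(text: str) -> Counter:
    """Count each character outside the 7-bit ASCII range (code point > 127)."""
    return Counter(ch for ch in text if ord(ch) > 127)


# Hypothetical sample text, NOT taken from RFC 8187.
sample = "£ rates, £ and € rates, Münster, Dürst"

for ch, n in sorted(non_ascii_counts(sample).items()):
    # Print the code point, the character itself, and how often it occurs.
    print(f"U+{ord(ch):04X} {ch!r}: {n}")
```

To run it against a real document, read the file with `open(path, encoding="utf-8").read()` and pass the result to `non_ascii_counts`.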

P.P.S.: Regarding the content of that RFC, I think many already know, but
for the record, I'll repeat here that I think it's way outdated. I
really wish the HTTP WG would take some hints from this RFC work and
finally get the message that straightforward UTF-8 is the way to go (we
all knew that ages ago, anyway).



Re: RFC Series publishes first RFC with non-ASCII characters

Patrik Fältström
On 16 Sep 2017, at 2:22, Martin J. Dürst wrote:

> P.P.S.: Regarding the content of that RFC, I think many already know, but for the record, I'll repeat here that I think it's way outdated. I really wish the HTTP WG would take some hints from this RFC work and finally get the message that straightforward UTF-8 is the way to go (we all knew that ages ago, anyway).

+1

RFC 2130 from April 1997:

> This report recommends the use of ISO 10646 as the default Coded
> Character Set, and UTF-8 as the default Character Encoding Scheme in
> the creation of new protocols or new version of old protocols which
> transmit text. These defaults do not deprecate the use of other
> character sets when and where they are needed; they are simply
> intended to provide guidance and a specification for
> interoperability.

;-)

   paf


Re: RFC Series publishes first RFC with non-ASCII characters

Tim Bray
In reply to this post by Heather Flanagan (RFC Series Editor)
My hearty thanks to all those who dragged this process into the 21st century. I might go so far as to say 💓 💓 💓!

Seriously, strong work.


Re: RFC Series publishes first RFC with non-ASCII characters

Jari Arkko
Indeed. This was an important and very useful milestone. Thanks, all who helped make it happen. And yes, I’m sure there’s plenty of work ahead too.

Jari


Re: RFC Series publishes first RFC with non-ASCII characters

Toerless Eckert
In reply to this post by Heather Flanagan (RFC Series Editor)
One usability comment:

As a frequent reader of RFCs, primarily in the text format, I think it
would be great if there were a human-readable tag, e.g. another header
line, indicating enough about the format used by the document to quickly
know the minimum viewer required to get a guaranteed correct display:
ASCII, UTF8, HTML (when graphics are needed), ...

E.g.:


Internet Engineering Task Force (IETF)                        J. Reschke
Request for Comments: 8187                                    greenbytes
Obsoletes: 5987                                           September 2017
Category: Standards Track
Format: ASCII / UTF8 / "graphics" / ...
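
For the two text cases, the value of such a tag could be derived mechanically from the file's bytes. The following is a rough sketch only: the "Format:" line is the suggestion made in this message, not an existing RFC header, and the function name is invented for illustration. The "graphics" case is not covered.

```python
def minimum_text_format(data: bytes) -> str:
    """Return 'ASCII' if every byte is 7-bit, else 'UTF8' if the bytes are valid UTF-8.

    Raises UnicodeDecodeError if the data is neither.
    """
    if data.isascii():  # bytes.isascii() is available from Python 3.7
        return "ASCII"
    data.decode("utf-8")  # validation only; raises on malformed input
    return "UTF8"


# Hypothetical usage: build the proposed header line for a document body.
body = "Acknowledgements: Martin J. Dürst".encode("utf-8")
print(f"Format: {minimum_text_format(body)}")
```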

Cheers
    Toerless

Re: RFC Series publishes first RFC with non-ASCII characters

Julian Reschke
In reply to this post by Heather Flanagan (RFC Series Editor)
On 2017-09-14 23:07, Heather Flanagan (RFC Series Editor) wrote:
> ...
> The RPC and the Tools Team have identified several areas that need to be
> modified to support these characters. Some of those areas will
> ultimately be handled with the new format tools; others have been
> modified as part of the more general work to prepare for the new RFC
> format. For the documents being used to test the toolchain, a
> significant amount of manual processing is required to publish the
> RFCs with non-ASCII characters in the final text. In order to keep
 > ...

Out of curiosity: as the current version of xml2rfc produces the UTF-8
plain text just fine (*), what additional processing are you referring to?

> overall processing times down, leave staff enough time to test
> the other tools that are being developed as part of the RFC format
> project, and allow the editors time to create and/or update new
> procedures as the v3 tools are released, no additional non-ASCII
> documents outside of RFC 8187 and the PRECIS cluster will be published
> until the new format tools are in production.
> ...

Understood.

Now what does this mean for drafts being written right now, and which
are not expected to be ready for publication in the next, let's say, 12
months? Can we start submitting I-Ds with non-ASCII characters right now
(without getting stopped by id-nits...)?

Best regards, Julian

(*) with the possible exception of the BOM :-)
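
The footnote does not say what the concern with the BOM is. Assuming it is simply whether the generated text file starts with the UTF-8 byte order mark (bytes EF BB BF), a check could look like the sketch below. The function name is hypothetical and this is not part of xml2rfc.

```python
import codecs


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was present)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Hypothetical usage on an in-memory document.
raw = codecs.BOM_UTF8 + "Münster".encode("utf-8")
body, had_bom = strip_utf8_bom(raw)
print(had_bom, body.decode("utf-8"))
```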

Re: RFC Series publishes first RFC with non-ASCII characters

Heather Flanagan (RFC Series Editor)
On 9/22/17 5:51 AM, Julian Reschke wrote:

> On 2017-09-14 23:07, Heather Flanagan (RFC Series Editor) wrote:
>> ...
>> The RPC and the Tools Team have identified several areas that need to be
>> modified to support these characters. Some of those areas will
>> ultimately be handled with the new format tools; others have been
>> modified as part of the more general work to prepare for the new RFC
>> format. For the documents being used to test the toolchain, a
>> significant amount of manual processing is required to publish the
>> RFCs with non-ASCII characters in the final text. In order to keep
> > ...
>
> Out of curiosity: as the current version of xml2rfc produces the UTF-8
> plain text just fine (*), what additional processing are you referring
> to?

There are the manual checks to determine if everything renders exactly
as expected. There are a number of discussions as we learn how to deal
with half- and full-width characters in the same document, and try to
determine the best path forward to make such characters in tables
line up properly. There are even more discussions as we learn more about
invisible space characters outside of the ASCII realm and try to
understand if those are something we need to write scripts to identify.
xml2rfc is just one component of getting these documents published.
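
To illustrate why mixed half- and full-width characters make table alignment awkward: the number of code points in a cell is not the number of columns it occupies in a fixed-width display. A minimal sketch using Python's standard `unicodedata` module follows. The helper names are made up, the width rule is an approximation (characters of "ambiguous" width are counted as one column), and it is not how the RPC or xml2rfc handles this.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate fixed-width columns: wide/fullwidth = 2, combining marks = 0, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        # 'W' (wide) and 'F' (fullwidth) take two columns; everything else one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cell(text: str, columns: int) -> str:
    """Right-pad a table cell with spaces up to the given display width."""
    return text + " " * max(0, columns - display_width(text))


# Hypothetical table cells: both rows end up six columns wide.
for cell in ("abc", "日本"):
    print("|" + pad_cell(cell, 6) + "|")
```

Whether the padded cells actually line up still depends on the reader's font and viewer, which is part of the difficulty described above.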

>
>> overall processing times down, leave staff enough time to test
>> the other tools that are being developed as part of the RFC format
>> project, and allow the editors time to create and/or update new
>> procedures as the v3 tools are released, no additional non-ASCII
>> documents outside of RFC 8187 and the PRECIS cluster will be published
>> until the new format tools are in production.
>> ...
>
> Understood.
>
> Now what does this mean for drafts being written right now, and which
> are not expected to be ready for publication in the next, let's say,
> 12 months? Can we start submitting I-Ds with non-ASCII characters
> right now (without getting stopped by id-nits...)?

That's an excellent question. I think this is something to discuss with
both the IESG and the Tools Team, though I am fairly confident that
"right now" isn't possible. An idnits update is one of the tools
contracted for, and is not yet complete (see the expected timeline here:
https://trac.tools.ietf.org/tools/ietfdb/wiki/FormatToolsPlan). Also,
we've seen that the datatracker isn't rendering non-ASCII characters
correctly in the PDF automatically generated from RFC 8187
(https://trac.tools.ietf.org/tools/ietfdb/ticket/2370). So, there is
still some work to do before this will all work.

-Heather

>
> Best regards, Julian
>
> (*) with the possible exception of the BOM :-)
>


Re: RFC Series publishes first RFC with non-ASCII characters

Julian Reschke
On 2017-09-22 18:54, Heather Flanagan (RFC Series Editor) wrote:

> On 9/22/17 5:51 AM, Julian Reschke wrote:
>> On 2017-09-14 23:07, Heather Flanagan (RFC Series Editor) wrote:
>>> ...
>>> The RPC and the Tools Team have identified several areas that need to be
>>> modified to support these characters. Some of those areas will
>>> ultimately be handled with the new format tools; others have been
>>> modified as part of the more general work to prepare for the new RFC
>>> format. For the documents being used to test the toolchain, a
>>> significant amount of manual processing is required to publish the
>>> RFCs with non-ASCII characters in the final text. In order to keep
>>> ...
>>
>> Out of curiosity: as the current version of xml2rfc produces the UTF-8
>> plain text just fine (*), what additional processing are you referring
>> to?
>
> There are the manual checks to determine if everything renders exactly
> as expected. There are a number of discussions as we learn how to deal

Ok.

> with half- and full-width characters in the same document, and try to
> determine the best path forward to make such characters in tables
> line up properly. There are even more discussions as we learn more about

Seriously: don't bother.

> invisible space characters outside of the ASCII realm and try to
> understand if those are something we need to write scripts to identify.

If this is understood to be a problem, let's have a script that checks
for these characters.
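
Such a script could be quite small. The sketch below is one possible shape, under the assumption that "invisible" means non-ASCII characters in the Unicode general categories for separators (Zs, Zl, Zp) and format controls (Cf). That choice of categories and the function name are illustrative, not an agreed definition.

```python
import unicodedata

# Assumed definition of "invisible": space/line/paragraph separators and format controls.
INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cf"}


def find_invisible(text: str):
    """Yield (line, column, description) for each non-ASCII invisible character."""
    # Split on "\n" only: str.splitlines() would also split on U+2028/U+2029
    # and silently swallow two of the characters we want to report.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127 and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
                name = unicodedata.name(ch, "UNNAMED")
                yield lineno, col, f"U+{ord(ch):04X} {name}"


# Hypothetical input containing a no-break space and a zero width space.
sample = "a\u00a0b\nc\u200bd"
for lineno, col, what in find_invisible(sample):
    print(f"line {lineno}, column {col}: {what}")
```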

> xml2rfc is just one component of getting these documents published.

 > ...

>> Understood.
>>
>> Now what does this mean for drafts being written right now, and which
>> are not expected to be ready for publication in the next, let's say,
>> 12 months? Can we start submitting I-Ds with non-ASCII characters
>> right now (without getting stopped by id-nits...)?
>
> That's an excellent question. I think this is something to discuss with
> both the IESG and the Tools Team, though I am fairly confident that
> "right now" isn't possible. An idnits update is one of the tools
> contracted for, and is not yet complete (see the expected timeline here:
> https://trac.tools.ietf.org/tools/ietfdb/wiki/FormatToolsPlan). Also,

Well, idnits is just advisory. All it can block is automatic submission.
(I wish it did not).

> we've seen that the datatracker isn't rendering non-ASCII characters
> correctly in the PDF automatically generated from RFC 8187
> (https://trac.tools.ietf.org/tools/ietfdb/ticket/2370). So, there is
> still some work to do before this will all work.

Looking at the PDF linked from the datatracker
(<https://www.rfc-editor.org/rfc/pdfrfc/rfc8187.txt.pdf>), I don't see a
problem.

That said: I don't think that a minor bug in a conversion tool for a
supplemental output format is sufficient reason to block progress...

Best regards, Julian

Re: RFC Series publishes first RFC with non-ASCII characters

Frank Ellermann
In reply to this post by Heather Flanagan (RFC Series Editor)
On 14/09/2017, Heather Flanagan (RFC Series Editor) <[hidden email]> wrote:

> the first RFC to be published with UTF-8 encoding and include characters not in the
> basic ASCII character set.

Interesting, and a quick sanity test with
https://tools.ietf.org/html/rfc8187 works for me (= can see an € in
3.2.3), that should also cover all WikiMedia and MediaWiki projects
linking to tools.ietf.org (not limited to RFCs, I-Ds are also often
linked via tools.ietf.org). So now I can finally replace an ancient
copy of vintage 2005 draft-hoffman-utf8-rfcs-01.txt by a more up to
date RFC 8140 ;-)

Skeptical, maybe the Unicode emoji for ;-) does not yet work
everywhere, some folks including me are still using Windows 7 and not
planning to upgrade before 2020.

Re: RFC Series publishes first RFC with non-ASCII characters

Julian Reschke
In reply to this post by Heather Flanagan (RFC Series Editor)
On 2017-09-22 18:54, Heather Flanagan (RFC Series Editor) wrote:
> ...
> That's an excellent question. I think this is something to discuss with
> both the IESG and the Tools Team, though I am fairly confident that
> "right now" isn't possible. An idnits update is one of the tools
> contracted for, and is not yet complete (see the expected timeline here:
> https://trac.tools.ietf.org/tools/ietfdb/wiki/FormatToolsPlan). Also,
 > ...

Currently says "End of January". Note that all I'm asking for is a way
to disable the "all ASCII" check.

> we've seen that the datatracker isn't rendering non-ASCII characters
> correctly in the PDF automatically generated from RFC 8187
> (https://trac.tools.ietf.org/tools/ietfdb/ticket/2370). So, there is
> still some work to do before this will all work.

FWIW, this is not about the PDF (which is fine), but the fact that
datatracker's TXT-to-HTML conversion doesn't work properly (in contrast
to the one used for the HTML on tools.ietf.org).

It would be awesome to see some progress here.

Best regards, Julian

Re: RFC Series publishes first RFC with non-ASCII characters

Henrik Levkowetz
Hi Julian,

On 2017-10-24 15:29, Julian Reschke wrote:

> On 2017-09-22 18:54, Heather Flanagan (RFC Series Editor) wrote:
>> ...
>> That's an excellent question. I think this is something to discuss with
>> both the IESG and the Tools Team, though I am fairly confident that
>> "right now" isn't possible. An idnits update is one of the tools
>> contracted for, and is not yet complete (see the expected timeline here:
>> https://trac.tools.ietf.org/tools/ietfdb/wiki/FormatToolsPlan). Also,
>  > ...
>
> Currently says "End of January". Note that all I'm asking for is a way
> to disable the "all ASCII" check.

It should be feasible to tweak the current idnits in this respect -- but
that's going to permit non-ascii in a lot of places where the exact rules
would not permit non-ascii.  Better checks will still have to wait for the
new idnits.
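
The trade-off can be pictured with a toy check. This is not idnits code, and idnits's actual implementation and options are not shown in this thread: the function and the `allow_non_ascii` switch are invented purely to illustrate that a blanket switch turns the check off everywhere, including in places where the rules would still forbid non-ASCII.

```python
def check_ascii(lines, allow_non_ascii=False):
    """Return one warning per line that contains non-ASCII characters.

    With allow_non_ascii=True the check is skipped entirely, so it can no
    longer tell permitted uses (e.g. a name) from forbidden ones.
    """
    if allow_non_ascii:
        return []
    return [
        f"line {n}: non-ASCII character(s) found"
        for n, line in enumerate(lines, start=1)
        if not line.isascii()
    ]


# Hypothetical draft excerpt.
draft = ["Category: Standards Track", "Thanks to Martin J. Dürst."]
print(check_ascii(draft))
print(check_ascii(draft, allow_non_ascii=True))
```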

>> we've seen that the datatracker isn't rendering non-ASCII characters
>> correctly in the PDF automatically generated from RFC 8187
>> (https://trac.tools.ietf.org/tools/ietfdb/ticket/2370). So, there is
>> still some work to do before this will all work.
>
> FWIW, this is not about the PDF (which is fine), but the fact that
> datatracker's TXT-to-HTML conversion doesn't work properly (in contrast
> to the one used for the HTML on tools.ietf.org).
>
> It would be awesome to see some progress here.

It's on my to-do list, but not at the very top.  Will see what I can do.


Best regards,

        Henrik

