Bug in RFC Search page...

Bug in RFC Search page...

Brian E Carpenter-2
https://www.rfc-editor.org/search/rfc_search_detail.php?rfc=8187&pubstatus%5B%5D=Any&pub_date_type=any

This returns a result mentioning an "ASCII" version of the RFC. There is no ASCII
version of RFC 8187.

(According to some people, there is no UTF-8 version either because of the BOM.)
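
A minimal sketch of how a search page could pick its label from the file's bytes rather than assuming "ASCII". The function name and the label strings are hypothetical, and this is not how the RFC Editor site actually works:

```python
import codecs

def label_plain_text(data: bytes) -> str:
    """Return a human-readable label for a plain-text file (labels are illustrative)."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # Pure 7-bit content with no BOM is ASCII (and therefore also valid UTF-8).
    if body.isascii() and not has_bom:
        return "ASCII text"
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "plain text (not valid UTF-8)"
    return "UTF-8 text with BOM" if has_bom else "UTF-8 text"

print(label_plain_text(b"Request for Comments"))
print(label_plain_text(codecs.BOM_UTF8 + "Grüße".encode("utf-8")))
```

Simply saying "plain text" avoids having to take a side on whether a file with a BOM counts as UTF-8.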

Regards
   Brian


_______________________________________________
rfc-interest mailing list
[hidden email]
https://www.rfc-editor.org/mailman/listinfo/rfc-interest

Re: Bug in RFC Search page...

Carsten Bormann
On Sep 26, 2017, at 21:30, Brian E Carpenter <[hidden email]> wrote:
>
> https://www.rfc-editor.org/search/rfc_search_detail.php?rfc=8187&pubstatus%5B%5D=Any&pub_date_type=any
>
> This returns a result mentioning an "ASCII" version of the RFC. There is no ASCII
> version of RFC 8187.

I think we just need to stop calling plain-text ASCII.

(Note that the HTML versions of many documents are ASCII.)

Grüße, Carsten


Re: Bug in RFC Search page...

Toerless Eckert-3
What terminology do you propose

ASCII text -> plain-text
UTF-8 text -> plain-text unicode
HTML ASCII -> HTML
HTML UTF-8 -> HTML unicode
...

?

(as a reminder because i didn't get any response, i would love to see an RFC header
 field indicating the format of the document... ;-)


Re: Bug in RFC Search page...

Carsten Bormann

> On Sep 26, 2017, at 22:41, Toerless Eckert <[hidden email]> wrote:
>
> What terminology do you propose
>
> ASCII text -> plain-text
> UTF-8 text -> plain-text unicode

Well, both are plain-text, and both are unicode.
Plain-text-beyond-ascii would be more like it for the second, but why call that out.
(And no, we don’t need a third category Plain-text-beyond-the-basic-multilingual-plane, or Plain-text-with-astral for short.)

> HTML ASCII -> HTML
> HTML UTF-8 -> HTML unicode

I don’t think that distinction ever needs to be made, because HTML embeds metadata about its charset, and there are no real interop problems when you do that.
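
As an illustration of the "HTML embeds metadata about its charset" point, here is a small sketch that reads the declared charset out of a document's meta tags using only the standard library. `CharsetFinder` and `declared_charset` are hypothetical names, and real browsers follow a more elaborate sniffing algorithm than this:

```python
from html.parser import HTMLParser

class CharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # HTML5 style: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Older style: <meta http-equiv="Content-Type" content="text/html; charset=...">
            content = (attr.get("content") or "").lower()
            if "charset=" in content:
                self.charset = content.split("charset=", 1)[1].split(";")[0].strip()

def declared_charset(html_text: str):
    """Return the charset named in the markup, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset

print(declared_charset('<html><head><meta charset="UTF-8"></head></html>'))
```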

Grüße, Carsten


Re: Bug in RFC Search page...

Brian E Carpenter-2
In reply to this post by Carsten Bormann
On 27/09/2017 09:33, Carsten Bormann wrote:
> On Sep 26, 2017, at 21:30, Brian E Carpenter <[hidden email]> wrote:
>>
>> https://www.rfc-editor.org/search/rfc_search_detail.php?rfc=8187&pubstatus%5B%5D=Any&pub_date_type=any
>>
>> This returns a result mentioning an "ASCII" version of the RFC. There is no ASCII
>> version of RFC 8187.
>
> I think we just need to stop calling plain-text ASCII.

Sure, the first level fix is to change the result to say "plain text"
instead of "ASCII". We can't change it to say "UTF-8 text" without resolving
a little dispute about the BOM ;-).

    Brian

>
> (Note that the HTML versions of many documents are ASCII.)
>
> Grüße, Carsten
>
>


Re: Bug in RFC Search page...

Toerless Eckert-2
In reply to this post by Carsten Bormann
On Tue, Sep 26, 2017 at 10:49:56PM +0200, Carsten Bormann wrote:

>
> > On Sep 26, 2017, at 22:41, Toerless Eckert <[hidden email]> wrote:
> >
> > What terminology do you propose
> >
> > ASCII text -> plain-text
> > UTF-8 text -> plain-text unicode
>
> Well, both are plain-text, and both are unicode.
> Plain-text-beyond-ascii would be more like it for the second, but why call that out.

There are a lot of tool chains out there that will not render UTF-8 correctly,
therefore it is IMHO extremely valuable to have a clear indication of the
minimum toolchain features required to render a document correctly (in the first
page header of an RFC would be great).

Also: As long as the "minimum rendering" features of documents are not explicitly
noted in human readable form anywhere, we do not need to define a terminology.

> (And no, we don’t need a third category Plain-text-beyond-the-basic-multilingual-plane, or Plain-text-with-astral for short.)

Not sure what you are referring to. "More UTF-8 characters than possible with iso8859"?
Given how we should not publish text-only documents other than ascii or UTF-8, i think we can happily
ignore those legacy options.

> > HTML ASCII -> HTML
> > HTML UTF-8 -> HTML unicode
>
> I don’t think that distinction ever needs to be made, because HTML embeds metadata about its charset, and there are no real interop problems when you do that.

Given how browsers are quite inconsistent in the character sets they load, it would be nice to have explicit indication of the character sets required to render a document. I've seen PDF documents where i had to go to page 100 before some crucial text was not rendered because it used some character set not available to me.

> Grüße, Carsten
     ^
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

Oh, the innovation...

---
[hidden email]


Re: Bug in RFC Search page...

Matthew Kerwin
In reply to this post by Toerless Eckert-2


On 27 Sep. 2017 07:58, "Toerless Eckert" <[hidden email]> wrote:
> On Tue, Sep 26, 2017 at 10:49:56PM +0200, Carsten Bormann wrote:
>
> > (And no, we don’t need a third category Plain-text-beyond-the-basic-multilingual-plane, or Plain-text-with-astral for short.)
>
> Not sure what you are referring to "More UTF-8 characters than possible with iso8859" ?
> Given how we should not publish text-only documents other than ascii or UTF-8, i think we can happily
> ignore those legacy options.

Emoji are astral codepoints, aren't they? 🤞🏻

> > > HTML ASCII -> HTML
> > > HTML UTF-8 -> HTML unicode
> >
> > I don’t think that distinction ever needs to be made, because HTML embeds metadata about its charset, and there are no real interop problems when you do that.
>
> Given how browsers are quite inconsistent in the character sets they load, it would be nice to have explicit indication of the character sets required to render a document. I've seen PDF documents where i had to go to page 100 before some crucial text was not rendered because it used some character set not available to me.

If you mean font support (I'm not going to trip myself up over the difference between character sets and encodings and all that, but I'm pretty sure 'Unicode' has you covered for characters/codepoints) that's a mostly-solved problem in the modern web, with webfonts and the like.

So the useful words to put in the table would be basic document types (PDF/text/HTML). We can bicker over what "text" means WRT UTF-8 elsewhere.

Cheers
-- 
Matthew Kerwin 


Re: Bug in RFC Search page...

Toerless Eckert-2
On Wed, Sep 27, 2017 at 08:32:15AM +1000, Matthew Kerwin wrote:
> Emoji are astral codepoints, aren't they? 🤞🏻

Thanks!

> If you mean font support (I'm not going to trip myself up over the
> difference between character sets and encodings and all that, but I'm
> pretty sure 'Unicode' has you covered for characters/codepoints) that's a
> mostly-solved problem in the modern web, with webfonts and the like.

Sure, sitting in an airplane, trying to read a document and getting a
pop-up window to go on the internet, create i think an apple-id to
be able to download some asian character set (if i remember it correctly).

> So the useful words to put in the table would be basic document types
> (PDF/text/HTML). We can bicker over what "text" means WRT UTF-8 elsewhere.

I definitely would like to have an indication if it's "more than ASCII" text
(eg: foreign characters included).

Cheers
    Toerless


Re: Bug in RFC Search page...

Matthew Kerwin


On 27 September 2017 at 09:47, Toerless Eckert <[hidden email]> wrote:
> On Wed, Sep 27, 2017 at 08:32:15AM +1000, Matthew Kerwin wrote:
> > Emoji are astral codepoints, aren't they? 🤞🏻
>
> Thanks!
>
> > If you mean font support (I'm not going to trip myself up over the
> > difference between character sets and encodings and all that, but I'm
> > pretty sure 'Unicode' has you covered for characters/codepoints) that's a
> > mostly-solved problem in the modern web, with webfonts and the like.
>
> Sure, sitting in an airplane, trying to read a document and getting a
> pop-up window to go on the internet, create i think an apple-id to
> be able to download some asian character set (if i remember it correctly).

I don't know about PDFs, but if you download the HTML version of a page there's usually an option to download all the linked resources (images, CSS, etc.) at the same time, so it should continue to work offline.  Although, I don't know if that includes fonts linked from CSS.

> > So the useful words to put in the table would be basic document types
> > (PDF/text/HTML). We can bicker over what "text" means WRT UTF-8 elsewhere.
>
> I definitely would like to have an indication if it's "more than ASCII" text
> (eg: foreign characters included).

Sure, I can understand the need for a multidimensional "requirements for accurately viewing this resource" description.  Soon enough it will have to be able to describe the basic format (PDF/HTML/plain text), the character range (7-bit ASCII, Latin-1[*], BMP, Supplementary[†]), and whether it includes embedded images.  A single-word description probably isn't enough, though.

Meanwhile, I figured the words in that column were basically representative of the formats described in RFC 7990 (and its antecedents.)  In which case, all the rest is implied.

Cheers

[*] i.e. the Basic Latin + Latin-1 Supplement blocks; same as ISO-8859-1

[†] Some tools still have issues with characters that don't fit in UCS-2. There's also "includes four-byte UTF-8 sequences" which is a different thing, but has caused me issues in the past with some tools.
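
The "character range" dimension above could be computed along these lines. This is only a sketch: `character_range` is a hypothetical helper, and the four labels simply mirror the categories listed in the message.

```python
def character_range(text: str) -> str:
    """Name the smallest of the four ranges that covers every character in text."""
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        return "ASCII"          # 7-bit ASCII
    if highest < 0x100:
        return "Latin-1"        # Basic Latin + Latin-1 Supplement
    if highest < 0x10000:
        return "BMP"            # Basic Multilingual Plane
    return "Supplementary"      # beyond the BMP ("astral")

for sample in ["RFC 8187", "Grüße", "Capital ẞ", "🤞"]:
    print(repr(sample), "->", character_range(sample))
```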
--
  Matthew Kerwin
  http://matthew.kerwin.net.au/


Re: character sets, Bug in RFC Search page...

John Levine
In reply to this post by Toerless Eckert-2
In article <[hidden email]> you write:
>Given how browsers are quite inconsistent in the character sets they load, it would be nice to have explicit indication
>of the character sets required to render a document. I've seen PDF documents where i had to go to page 100 before some
>crucial text was not rendered because it used some character set not available to me.

Those are buggy PDFs.  A PDF is supposed to contain all the fonts it
uses beyond a small standard set, but there is a lot of PDF crudware
that wrongly assumes that whatever fonts it finds on the machine where
it's running will exist everywhere.  

>There are a lot of tool chains out there that will not render UTF-8 correctly,
>therefore it is IMHO extremely valuable to have a clear indication of the
>minimum toolchain features required to render a document correctly (in the first
>page header of an RFC would be great).

Hey, maybe we could put a couple of bytes at the beginning that say
this is a UTF-8 document.

R's,
John

Re: character sets, Bug in RFC Search page...

Toerless Eckert-2
On Wed, Sep 27, 2017 at 05:53:48PM -0000, John Levine wrote:
> Those are buggy PDFs.  A PDF is supposed to contain all the fonts it
> uses beyond a small standard set, but there is a lot of PDF crudware
> that wrongly assumes that whatever fonts it finds on the machine where
> it's running will exist everywhere.  

I have stopped trying to really understand all those details for the last
two decades, but i fear what you describe sounds like the original intention
of Adobe when they thought fonts embedded in PDFs could not be extracted
and (illegally) be reused outside of the PDF. Once that hack happened (2 decades ago?),
the problem of licensed/payware fonts and their use has AFAIK led to
less inclusion of fonts into PDF.

> >There are a lot of tool chains out there that will not render UTF-8 correctly,
> >therefore it is IMHO extremely valuable to have a clear indication of the
> >minimum toolchain features required to render a document correctly (in the first
> >page header of an RFC would be great).
>
> Hey, maybe we could put a couple of bytes at the beginning that say
> this is a UTF-8 document.

I smell a BOM joke, even though i do not even WANT TO understand that stuff and nuked that thread.

There are enough people taking care of making documents more machine friendly.
I only care about the elements that make them more human friendly ;-)

Cheers
    Toerless

Re: character sets, Bug in RFC Search page...

Leonard Rosenthol-3
In reply to this post by John Levine
There is a standard BOM for UTF-8, just as there is for UCS2.  Seems like a good idea, though most systems don't use it...
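
For reference, the UTF-8 BOM is the three bytes EF BB BF. The sketch below shows one way a reader could tolerate it, using Python's standard `utf-8-sig` codec; `decode_utf8_text` is a hypothetical helper name:

```python
import codecs

def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 text, silently dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")

with_bom = codecs.BOM_UTF8 + "Grüße".encode("utf-8")

print(codecs.BOM_UTF8.hex())             # the signature bytes
print(repr(with_bom.decode("utf-8")))    # plain "utf-8" keeps U+FEFF as the first character
print(repr(decode_utf8_text(with_bom)))  # "utf-8-sig" strips it
```

A consumer that decodes with plain `utf-8` ends up with an invisible U+FEFF at the start of the text, which is one common way a BOM trips up tools that do not expect it.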

Leonard


Re: character sets, Bug in RFC Search page...

Martin J. Dürst
In reply to this post by Toerless Eckert-2


On 2017/09/28 03:04, Toerless Eckert wrote:

> On Wed, Sep 27, 2017 at 05:53:48PM -0000, John Levine wrote:
>> Those are buggy PDFs.  A PDF is supposed to contain all the fonts it
>> uses beyond a small standard set, but there is a lot of PDF crudware
>> that wrongly assumes that whatever fonts it finds on the machine where
>> it's running will exist everywhere.
>
> I have stopped trying to really understand all those details for the last
> two decades, but i fear what you describe sounds like the original intention
> of Adobe when they thought fonts embedded in PDFs could not be extracted
> and (illegally) be reused outside of the PDF. Once that hack happened (2 decades ago?),
> the problem of licensed/payware fonts and their use has AFAIK led to
> less inclusion of fonts into PDF.

For us, that should be irrelevant, because the fonts that we are
(planning on) using are open source, and can be embedded without
problems. Also, as far as I understand, the variant that we choose for
the official PDFs actually requires that all fonts used (except for a
very small base set) are included.

Regards,   Martin.

P.S.: In addition, please note that font embedding usually uses
subsetting, and our documents have very few non-ASCII characters, so the
additional memory needed for font embedding isn't much of a deal.

Re: Bug in RFC Search page...

Martin J. Dürst
In reply to this post by Matthew Kerwin
On 2017/09/27 07:32, Matthew Kerwin wrote:

> Emoji are astral codepoints, aren't they? 🤞🏻

Many of them indeed are, but not all of them
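
A quick way to check this for a given character (a sketch; the sample characters are my own picks, and the last two are the pair that makes up the emoji quoted above):

```python
import unicodedata

def is_astral(ch: str) -> bool:
    """True if the code point lies outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF

def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed for one code point in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2

samples = ["\u2603", "\u2764", "\U0001F91E", "\U0001F3FB"]
for ch in samples:
    name = unicodedata.name(ch, "(name unknown to this Python)")
    plane = "astral" if is_astral(ch) else "BMP"
    print(f"U+{ord(ch):04X} {plane:6} {utf16_units(ch)} UTF-16 unit(s)  {name}")
```

The first two samples sit in the BMP and fit in a single UTF-16 code unit; the last two need a surrogate pair, which is where UCS-2-only tools tend to struggle.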

> Given how browsers are quite inconsistent in the character sets they load,
> it would be nice to have explicit indication of the character sets required
> to render a document.

> If you mean font support (I'm not going to trip myself up over the
> difference between character sets and encodings and all that, but I'm
> pretty sure 'Unicode' has you covered for characters/codepoints) that's a
> mostly-solved problem in the modern web, with webfonts and the like.

Webfonts is one thing. The other is that even if we move beyond ASCII,
we are still very conservative. The chance that an RFC soon will use
some of the characters accepted in some recent version of Unicode is
quite low (unless somebody wants to make a point, in which case the RFC
editor will probably stop them). And for the characters from older
Unicode versions, the OS and/or the browser has the necessary fonts
available, unless somebody is using a hopelessly outdated OS/browser
(which would be a bad idea anyway).

Regards,   Martin.

Re: Bug in RFC Search page...

Martin J. Dürst
In reply to this post by Toerless Eckert-2
On 2017/09/27 06:57, Toerless Eckert wrote:

> There are a lot of tool chains out there that will not render UTF-8 correctly,
> therefore it is IMHO extremely valuable to have a clear indication of the
> minimum toolchain features required to render a document correctly (in the first
> page header of an RFC would be great).

First, most of the 'toolchain' doesn't render anything, and for most
purposes, will just pass through UTF-8 if it passes through all 8 bits.

Second, rendering usually works so that there might be something
illegible, but the rest of the document (in ASCII) isn't affected. And
the guidelines by the RFC editor are designed so that the document can
still be read in such a case.

Third, I think overall it's much easier to fix the remaining tool chains
than to have everybody be distracted by additional distinctions that
should be irrelevant in this day and age.
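
The first two points can be illustrated with a toy model. Both functions are hypothetical stand-ins: one for an 8-bit-clean stage that never interprets the bytes, the other for a renderer that only understands ASCII:

```python
def pass_through(data: bytes) -> bytes:
    """An 8-bit-clean stage: the bytes leave exactly as they arrived."""
    return bytes(data)

def render_ascii_only(data: bytes) -> str:
    """Model an ASCII-only renderer: every non-ASCII byte becomes U+FFFD."""
    return data.decode("ascii", errors="replace")

line = "Author: Martin J. Dürst".encode("utf-8")

# The pass-through stage preserves the UTF-8 text intact.
print(pass_through(line).decode("utf-8"))

# The ASCII-only renderer garbles only the two bytes of the non-ASCII letter.
print(render_ascii_only(line))
```

In this model the damage stays local to the non-ASCII character, and the ASCII remainder of the line is still readable.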

Regards,   Martin.

Re: Bug in RFC Search page...

Toerless Eckert-3
Most users would not even be able to figure out why something is not displayed
right with some tool unless they have some easy way to map the problem to
some type of content that they are aware of. Therefore such a header
would give very useful content information.


Re: character sets, Bug in RFC Search page...

Leonard Rosenthol-3
In reply to this post by Martin J. Dürst
According to RFC 7995:
For RFCs, require PDF/A-3 with conformance level "U".  This
      captures the archivability and long-term stability of PDF 1.7
      files, mandatory Unicode mapping (Sections 14.8.2.4.2 ("Unicode
      Mapping in Tagged PDF") and 9.10.2 ("Mapping Character Codes to
      Unicode Values") of [PDF]), and many of the requirement features.

PDF/A-3 requires that *all* fonts be embedded – so this should never be a problem.

Leonard


Re: character sets, Bug in RFC Search page...

Toerless Eckert-2
Nice!

I just do not know why there seems to be disinterest in having
human readable document format information/disclaimer. Heck, every other
interest group has managed to get considerations & disclaimers into RFCs
(security, IANA, TSV, rfc2119, etc.).

"This PDF RFC was created to ensure maximum compatibility. It requires
no external data to display (such as fonts), and the formatting
does not use features newer than PDF 1.7 defined in 2007. If you encounter problems
with this document using a viewer/renderer newer than 2012, please call 1-800-RFC-EDIT"

;-P

Cheers
    Toerless


Re: character sets, Bug in RFC Search page...

John Levine
On Mon, 2 Oct 2017, Toerless Eckert wrote:
> "This PDF RFC was created to ensure maximum compatibility. It requires
> no external data to display (such as fonts), and the formatting
> does not use features newer than PDF 1.7 defined in 2007. If you encounter problems
> with this document using a viewer/renderer newer than 2012, please call 1-800-RFC-EDIT"

"This RFC is in PDF/A-3, rather than what some random buggy PDF software
creates.  If you encounter problems with this document using a viewer, you
need a better viewer."

There is no secret to creating PDFs that work, other than the secret that you
really should read the specs to know what to do.

Regards,
John Levine, [hidden email], Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly