Discussion:
[Docutils-users] reST->HTML: internal anchors: problem casting section names to IDs
Tom Roche
2016-03-31 22:54:34 UTC
summary
=======

Small reST document (linked and attached) has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor. Why not create all section IDs in the same way? More importantly, how to fix this (presuming as I do that it's a problem)?

details
=======

background
----------

I frequently generate HTML from reStructuredText, either directly or indirectly. I also frequently do internal linking: i.e., I create explicit links from text in one section of a document to another section. I'm currently working on a reST document which exhibits the following problem (which I have also experienced previously). I have "boiled down" the problem to a relatively simple file, problematic_naming_of_internal_anchors.rst, which I have posted at

https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/HEAD/files/problematic_naming_of_internal_anchors.rst

That is linked in "raw mode" (i.e., no rendering by Bitbucket) so you should see just the characters in the file, as in a text editor. (If you can't follow the link, note I have also attached the contents of the file to this post, following my .sig.) Please also note that the following is NOT about how Bitbucket renders reST (though BB reproduces the problem), since BB has its own problems with {section naming, internal anchors} as detailed here:

https://bitbucket.org/site/master/issues/11314/restructuredtext-link-fragments-require

However, presuming this problem is caused by docutils (as detailed below), fixing it would also improve the lives of everyone writing reST for display "in the cloud."

problem
-------

The problem I wish to raise here is exhibited by `restview <https://pypi.python.org/pypi/restview>`_, which I believe renders by just driving docutils. (Specifically, my version of `restview` renders with docutils-0.12, per header in generated HTML.) The document (problematic_naming_of_internal_anchors.rst) has the following section names, all of which are unique:

for further processing
integrate
move
short-term
next hardware run
short-term bodywear
long-term
long-term bodywear
long-term house goods
lighting

The problem can be illustrated by comparing the section IDs generated for the section names={long-term bodywear, short-term bodywear} and the success of hand-coded links and generated/TOC links to those sections in the text.

1. reST section name='short-term bodywear' generates HTML=

   <div class="section" id="short-term-bodywear">
   <h2><a class="toc-backref" href="#id8">short-term bodywear</a></h2>

   Note the form of the div attribute='id': it is the section name with all spaces replaced by dashes, aka 's/ /-/g'. This is as I expect (therefore good :-)

1.1. My hand-coded internal link to that section

   .. |short-term bodywear| replace:: *short-term bodywear*
   .. _short-term bodywear: #short-term-bodywear
   *see also* |short-term bodywear|_

   generates HTML=

   <p><em>see also</em> <a class="reference external" href="#short-term-bodywear"><em>short-term bodywear</em></a></p>

   (I dunno why 'class="reference external"', since this is an internal link, but that's a quibble.)

1.2. The generated TOC link to that section is

   <li><a class="reference internal" href="#short-term-bodywear" id="id8">short-term bodywear</a></li>

2. reST section name='long-term bodywear' generates HTML=

   <div class="section" id="id1">
   <h2><a class="toc-backref" href="#id10">long-term bodywear</a></h2>

   Note the form of the div attribute='id', which is NOT as I expect. I expect the generated ID to use the same rule (s/ /-/g) as was used to generate the ID from section name='short-term bodywear'; instead the div/section ID is "hashed" by appending a serial number to string='id'.

2.1. This unexpected behavior breaks my hand-coded internal reference to section name='long-term bodywear'

   .. |long-term bodywear| replace:: *long-term bodywear*
   .. _long-term bodywear: #long-term-bodywear
   *see also* |long-term bodywear|_

   which generates HTML=

   <p><em>see also</em> <a class="reference external" href="#long-term-bodywear"><em>long-term bodywear</em></a></p>

2.2. However the generated TOC link to that section works by

   <li><a class="reference internal" href="#id1" id="id10">long-term bodywear</a></li>

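The comparison above can be sketched in a few lines of Python. This is only an illustration of the `s/ /-/g` pattern described in this post; the helper names `expected_section_id` and `hand_coded_link_works` are hypothetical and are not part of docutils, whose real ID generation also normalizes case and punctuation.

```python
def expected_section_id(section_name):
    """Hypothetical helper: apply only the `s/ /-/g` rule described above."""
    return section_name.replace(" ", "-")


def hand_coded_link_works(section_name, generated_id):
    """A hand-coded `#fragment` link resolves only if it equals the generated id."""
    return expected_section_id(section_name) == generated_id


if __name__ == "__main__":
    # The generated ids below are copied from the HTML quoted in this post.
    print(hand_coded_link_works("short-term bodywear", "short-term-bodywear"))
    print(hand_coded_link_works("long-term bodywear", "id1"))
```

The first call reports a working link and the second a broken one, which matches cases 1.1 and 2.1.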
solution/questions
------------------

ISTM docutils should _always_

1. for unique section names: generate `div id`s by `s/ /-/g`
2. for duplicate section names (and all backrefs): generate `div id`s by serial numbering, i.e. appending a serial number to string='id'

So my first question is, am I missing something? Is there a reason to *not* behave thusly? If not:

My second question is, is there any reason to believe that docutils is *not* producing the above behavior? If so, please lemme know and I'll put an `issue on restview <https://github.com/mgedmin/restview/issues>`_. If not:

My third question presumes this behavior is due to a problem with docutils: is there anything else I should do to help get this fixed? Do I need to make an issue in a tracker? or do something to further debug the problem? or Something Completely Different?

conclusion/attachment
---------------------

If possible, please reply to me (directly) as well as to the list, and
TIA, Tom Roche <***@pobox.com>

-----problematic_naming_of_internal_anchors.rst follows to EOF

===
foo
===

.. contents:: **Table of Contents**

for further processing
======================

integrate
---------

move
----

short-term
==========

next hardware run
-----------------

short-term bodywear
-------------------

.. howto style a link (e.g., make it italic): see http://docutils.sourceforge.net/FAQ.html#is-nested-inline-markup-possible
.. |long-term bodywear| replace:: *long-term bodywear*
.. _long-term bodywear: #long-term-bodywear

*see also* |long-term bodywear|_

long-term
=========

long-term bodywear
------------------

.. |short-term bodywear| replace:: *short-term bodywear*
.. _short-term bodywear: #short-term-bodywear

*see also* |short-term bodywear|_

long-term house goods
---------------------

lighting
~~~~~~~~

------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140
_______________________________________________
Docutils-users mailing list
Docutils-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-users

Please use "Reply All" to reply to the list.
Marc 'BlackJack' Rintsch
2016-04-01 10:20:58 UTC
Post by Tom Roche
> ISTM docutils should _always_
> 1. for unique section names: generate `div id`s by `s/ /-/g`
> 2. for duplicate section names (and all backrefs): generate `div
>    id`s by serial numbering, i.e. appending a serial number to
>    string='id'
> So my first question is, am I missing something? Is there a reason
> to *not* behave thusly?
> My second question is, is there any reason to believe that docutils
> is *not* producing the above behavior? If so, please lemme know and I'll
> put an `issue on restview <https://github.com/mgedmin/restview/issues>`_.
> My third question presumes this behavior is due to a problem with
> docutils: is there anything else I should do to help get this fixed? Do
> I need to make an issue in a tracker? or do something to further debug
> the problem? or Something Completely Different?

I would not rely on the way id attributes are generated at all. That's
an implementation detail IMHO and it's also an HTML thing, so this breaks
when generating a PDF via LaTeX anyway.

In the case presented you don't even have to do this: the link text
matches the heading it links to, so you can simply omit the explicit
hyperlink targets (the ``.. _name: #fragment`` lines). Those targets are
also responsible for the "reference external" instead of "reference
internal" classes on the links.

A workaround would be to move the hyperlink targets after the headlines;
then docutils sees the headlines first and generates the ID(s) as you
expect them to be.

Ciao,
Marc 'BlackJack' Rintsch
--
“Give a man a fire and he's warm for a day,
but set fire to him and he's warm for the rest of his life.”
-- Terry Pratchett, Jingo
Guenter Milde
2016-04-01 16:29:39 UTC
Post by Tom Roche
> summary
> =======
> Small reST document (linked and attached) has sections with unique
> names. When I use docutils/restview to convert it to HTML, all but one
> section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one*
> section id is "hashed" (i.e., created like a backref), which breaks an
> explicit internal anchor. Why not create all section IDs in the same
> way?

...

Post by Tom Roche
> The problem can be illustrated by comparing the section IDs generated
> for the section names={long-term bodywear, short-term bodywear} and the
> success of hand-coded links and generated/TOC links to those sections
> in the text.
> 1. reST section name='short-term bodywear' generates HTML=
> <div class="section" id="short-term-bodywear">
> <h2><a class="toc-backref" href="#id8">short-term bodywear</a></h2>
> Note the form of the div attribute='id': it is the section name with
> all spaces replaced by dashes, aka 's/ /-/g'. This is as I expect
> (therefore good :-)

...

Post by Tom Roche
> 2. reST section name='long-term bodywear' generates HTML=
> <div class="section" id="id1">
> <h2><a class="toc-backref" href="#id10">long-term bodywear</a></h2>
> Note the form of the div attribute='id', which is NOT as I expect.
> 2.1. This unexpected behavior breaks my hand-coded internal reference
> to section name='long-term bodywear'
> .. |long-term bodywear| replace:: *long-term bodywear*
> .. _long-term bodywear: #long-term-bodywear
> *see also* |long-term bodywear|_
> <p><em>see also</em> <a class="reference external"
> href="#long-term-bodywear"><em>long-term bodywear</em></a></p>

Here is the problem: you define an external target with the URL
"#long-term-bodywear" (which happens to point to the same document)
therefore also: "reference-external".

Moreover, you define this external link *before* the equally named section.
Now, when Docutils reaches the section header, the name is already "used up"
and the standard fallback naming (via id-number) kicks in.
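
The "used up" effect can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not the docutils implementation: the class `ToyNamespace`, its methods, and the helper `name_to_id` are hypothetical names, and the fallback counter ignores the other ids (e.g. TOC backrefs) that a real conversion also hands out.

```python
import re


def name_to_id(name):
    """Simplified normalization: lowercase, other character runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


class ToyNamespace:
    """Toy model of one namespace shared by explicit targets and section titles."""

    def __init__(self):
        self.taken = set()  # names already claimed by some target
        self.counter = 0    # serial number used for fallback ids

    def add_explicit_target(self, name):
        # Models a line such as: .. _long-term bodywear: #long-term-bodywear
        self.taken.add(name.lower())

    def add_section(self, title):
        key = title.lower()
        if key in self.taken:
            # The name is already "used up": fall back to a numbered id.
            self.counter += 1
            return "id%d" % self.counter
        self.taken.add(key)
        return name_to_id(title)


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.add_explicit_target("long-term bodywear")  # defined before the section
    print(ns.add_section("short-term bodywear"))
    print(ns.add_section("long-term bodywear"))
```

In this model the section whose name was claimed earlier by the explicit target gets a numbered id, while a section registered before any equally named target keeps its name-derived id.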

Post by Tom Roche
> 2.2. However the generated TOC link to that section works by
> <li><a class="reference internal" href="#id1" id="id10">long-term bodywear</a></li>

Yes, as is common for name duplication.

Post by Tom Roche
> solution/questions
> ------------------
> ISTM docutils should _always_
> 1. for unique section names: generate `div id`s by `s/ /-/g`
> 2. for duplicate section names (and all backrefs): generate `div id`s
> by serial numbering, i.e. appending a serial number to string='id'
> So my first question is, am I missing something? Is there a reason to
> *not* behave thusly?

The point is, this could only be done for "link names", a common namespace
for section names and other targets.


The following works here as expected::

  .. howto style a link (e.g., make it italic): see http://docutils.sourcefor
  .. |long-term bodywear| replace:: *long-term bodywear*
  .. |short-term bodywear| replace:: *short-term bodywear*

  short-term bodywear
  -------------------

  *see also* |long-term bodywear|_

  long-term bodywear
  ------------------

  *see also* |short-term bodywear|_

Günter


Tom Roche
2016-04-01 23:17:42 UTC
[footnotes follow .sig]

Tom Roche[1]
> Small reST document[2] has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor.

Marc 'BlackJack' Rintsch[3]
> Here is the problem: you define an external target with the URL "#long-term-bodywear" (which happens to point to the same document) therefore also: "reference-external".

Doh! Thanks for clarifying. I had assumed all links were created in the same way.

> .. howto style a link (e.g., make it italic): see http://docutils.sourcefor
> .. |long-term bodywear| replace:: *long-term bodywear*
> .. |short-term bodywear| replace:: *short-term bodywear*
> short-term bodywear
> -------------------
> *see also* |long-term bodywear|_
> long-term bodywear
> ------------------
> *see also* |short-term bodywear|_

... and that renders correctly locally via `restview` and remotely via Bitbucket[4].

thanks again, Tom Roche <***@pobox.com>

[1]: https://sourceforge.net/p/docutils/mailman/message/34982519/
[2]: flawed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/9277baf6d61904d1725c39dae4df8b7550192ebc/files/problematic_naming_of_internal_anchors.rst
[3]: https://sourceforge.net/p/docutils/mailman/message/34984893/
[4]: fixed version @ https://bitbucket.org/!api/2.0/snippets/tlroche/LR9oL/HEAD/files/problematic_naming_of_internal_anchors.rst

Tom Roche
2016-04-01 23:30:43 UTC
one more mistake to correct :-(
Post by Tom Roche
Tom Roche[1]
Small reST document[2] has sections with unique names. When I use docutils/restview to convert it to HTML, all but one section id is created by (speaking `sed`ishly) `s/ /-/g`. However *one* section id is "hashed" (i.e., created like a backref), which breaks an explicit internal anchor.
- Marc 'BlackJack' Rintsch[3]
+ Günter Milde[3]
[1]: https://sourceforge.net/p/docutils/mailman/message/34982519/
[3]: https://sourceforge.net/p/docutils/mailman/message/34984893/