Discussion:
[Docutils-users] Hyperlink is highlighted incompletely if it contains non-ASCII characters
Helmut Jarausch
2014-03-22 08:17:20 UTC
Given the line
http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke

in an rst-file

rst2html translates this to

<p><a class="reference external"
href="http://wiki.selfhtml.org/wiki/Perl">http://wiki.selfhtml.org/wiki/Perl</a>/Reguläre_Ausdrücke</p>
The closing </a> ends up right after "Perl", but it should come after "Ausdrücke".
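A minimal sketch of how this kind of truncation can happen, assuming the standalone-URL recognizer only accepts ASCII characters in the URL body and requires the link to end at whitespace or punctuation. The pattern `ASCII_URI` and its end-of-URL rule are hypothetical; this is not docutils' actual regular expression.

```python
import re

# Hypothetical recognizer (NOT docutils' real pattern): the URL body may only
# contain ASCII URI characters, the last character must be an ASCII letter or
# digit, and the match must be followed by end of text, whitespace, or one of
# a few punctuation characters (an assumed end-of-URL rule).
ASCII_URI = re.compile(
    r"https?://[-_.!~*'()\[\];/:@&=+$,%a-zA-Z0-9]*[a-zA-Z0-9](?=$|\s|[/.,;:!?])"
)

line = "http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke"
match = ASCII_URI.match(line)

# "ä" is outside the character class, and "Regul" is not followed by an
# allowed end character, so the regex backtracks until the match can end
# just before a "/".
print(match.group(0))  # http://wiki.selfhtml.org/wiki/Perl
```

Under these assumptions the recognized link stops after "Perl", which matches the output shown above.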

Is this a bug, or am I missing something?

Many thanks for a hint,
Helmut
engelbert gruber
2014-03-22 12:28:53 UTC
Post by Helmut Jarausch
Given the line
http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke
try ::

without umlauts http://wiki.selfhtml.org/wiki/Perl/Regulare_Ausdrucke
or this `reguläre Ausdrücke`_
or that way `http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke`_
hope it helps (and maybe --input-encoding=)

.. _reguläre Ausdrücke:
   http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke

.. _http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke:
   http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke

cheers
Philipp A.
2014-03-22 14:24:22 UTC
Post by engelbert gruber
without umlauts http://wiki.selfhtml.org/wiki/Perl/Regulare_Ausdrucke
unacceptable. the link contains them, so removing them breaks not only the
spelling, but also the link.

or this `reguläre Ausdrücke`_
Post by engelbert gruber
or that way `http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke`_
hope it helps (and maybe --input-encoding=)
if --input-encoding=utf-8 can’t fix it, it smells like a bug to me.

best, philipp
engelbert gruber
2014-03-22 14:56:48 UTC
Post by Philipp A.
Post by engelbert gruber
without umlauts http://wiki.selfhtml.org/wiki/Perl/Regulare_Ausdrucke
unacceptable. the link contains them, so removing them breaks not only the
spelling, but also the link.
or this `reguläre Ausdrücke`_
Post by engelbert gruber
or that way `http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke`_
hope it helps (and maybe --input-encoding=)
if --input-encoding=utf-8 can't fix it, it smells like a bug to me.
http://en.wikipedia.org/wiki/Uniform_resource_locator lists

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~

other characters must be percent-encoded, e.g. %20 for a blank, ...

at least that is what i understood.
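The characters listed above correspond to the "unreserved" set of RFC 3986. As a sketch, Python's standard library can show what percent-encoding does to the URL from this thread: `urllib.parse.quote` encodes the string as UTF-8 and escapes each byte outside its safe set. Keeping `:` and `/` safe is an assumption that suits this simple URL (no query string or fragment).

```python
from urllib.parse import quote, unquote

iri = "http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke"

# quote() encodes the string as UTF-8 and replaces every byte outside the
# unreserved set (plus the extra "safe" characters) with %XX. Keeping ":" and
# "/" safe is an assumption that fits this URL (no query or fragment).
uri = quote(iri, safe=":/")
print(uri)  # http://wiki.selfhtml.org/wiki/Perl/Regul%C3%A4re_Ausdr%C3%BCcke

# The encoded form is pure ASCII, and decoding restores the original.
print(uri.isascii(), unquote(uri) == iri)  # True True
```

The encoded form contains only ASCII characters, so it points to the same page while staying inside the character set listed above.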

cheers
Philipp A.
2014-03-22 17:25:48 UTC
Post by engelbert gruber
http://en.wikipedia.org/wiki/Uniform_resource_locator lists
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
other characters must be encoded %20 for a blank, ...
at least that is what i understood.
that’s what the spec says, but everything except low-level code in browsers
and servers accepts such urls when pasted into address bars, links with
verbatim unicode in urls are clicked and so on.

i’d say that RST should support that as well.
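As a rough sketch of what accepting such links could look like, a recognizer could allow any non-whitespace character after the scheme, loosely in the spirit of IRIs (RFC 3987). The pattern `UNICODE_URI` is hypothetical, not taken from docutils, and ignores many edge cases (for example balanced parentheses inside URLs).

```python
import re

# Hypothetical Unicode-aware pattern: after the scheme, accept any character
# that is not whitespace or one of <>"`, so "ä" and "ü" are allowed. The last
# character must be a word character (Unicode-aware \w) or "/", so trailing
# sentence punctuation such as "." or "," is left out of the link.
UNICODE_URI = re.compile(r"""https?://[^\s<>"`]*[\w/]""")

text = "See http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke."
match = UNICODE_URI.search(text)
print(match.group(0))  # http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke
```

Here the whole URL, umlauts included, is recognized, and the sentence-ending period is excluded.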
engelbert gruber
2014-03-22 21:07:15 UTC
docutils does, as does HTML.
The HTML element is::

    <a href="url">text</a>

there is the url and there is the text; they might be identical, or they might not.

reST does::

    `text <url>`_
    or text_

    .. _text: url

just because url and text are sometimes identical, and that might work for
you, does not make it a general solution.

as --input-encoding was already mentioned: what about --output-encoding?
docutils will have to honour it for the text, but not for the url, as the
url references something somewhere else.
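The distinction between url and text can be illustrated with a small sketch that treats the two parts differently: the URL is percent-encoded into ASCII for the `href`, while the link text keeps the original characters and is only HTML-escaped. `make_link` is a hypothetical helper for illustration, not part of docutils, and the `safe` set passed to `quote` is an assumption.

```python
import html
from urllib.parse import quote


def make_link(url: str, text: str) -> str:
    """Hypothetical helper: percent-encode the URL for the href attribute,
    keep the link text verbatim and only escape HTML special characters."""
    # The safe set is an assumption: keep URL delimiters and existing %-escapes.
    href = quote(url, safe=":/?#&=%")
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(text))


url = "http://wiki.selfhtml.org/wiki/Perl/Reguläre_Ausdrücke"
link = make_link(url, url)
print(link)

# With an ASCII-only output encoding, only the visible text needs character
# references; the href is already plain ASCII.
ascii_link = link.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_link)
```

With the example URL, the `href` becomes `.../Regul%C3%A4re_Ausdr%C3%BCcke` while the text still reads `Reguläre_Ausdrücke`; in the ASCII output the text shows `&#228;` and `&#252;` instead, and the `href` is unchanged.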

am i wrong, or simply lost?