Discussion:
[Docutils-users] publish_parts and table of contents?
Tony Narlock
2017-07-18 06:28:34 UTC
Hello docutils users,

Is there a way to get the equivalent to “.. contents::” from publish_parts?

http://docutils.sourceforge.net/0.4/docs/api/publisher.html

If not, is there any way to get the table of contents programmatically?

If not, is there any reason / issue with it being a feature?

The practical use case: a developer pulling parts out of the HTML output would
like control over the table of contents output.
Tony Narlock
2017-07-19 12:34:12 UTC
I’m getting there,

I’ve overridden the Writer and added a new visitor attribute for “toc”.

visitor_attributes = Writer.visitor_attributes + ('toc', )

So “toc” should be collected during assemble_parts.

From here, I want to inject the document’s TOC into “toc”.

There’s transforms.parts.Contents(), but I’m not sure where and how I’d put
it into play.

There is also a directive for parsers.rst.directives.parts.Contents. I’m
curious how I would go about injecting that.

So how do you think I can go about that? Could you give me a code example
of how I could get the equivalent of .. contents:: showing up in a “toc”
part of the writer?

Best,

On July 18, 2017 at 1:32:12 PM, Guenter Milde via Docutils-users (
Post by Tony Narlock
Is there a way to get the equivalent to “.. contents::” from publish_parts?
According to
http://docutils.sourceforge.net/docs/api/publisher.html#publish-parts-details,
no.
Post by Tony Narlock
If not, is there any way to get the table of contents programmatically?
This could be done in a wrapper; or a custom writer or front-end.

Places to look include transforms.parts.Contents() and the HTML writer
in use.
Post by Tony Narlock
If not, is there any reason / issue with it being a feature?
I don't know.
Post by Tony Narlock
The practical use case: a developer pulling parts out of the HTML output would
like control over the table of contents output.
The simple and secure approach would be CSS styling --- unless the task is
more complicated.

Günter




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Docutils-users mailing list
Docutils-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-users

Please use "Reply All" to reply to the list.
Tony Narlock
2017-07-19 21:04:18 UTC
On July 19, 2017 at 11:33:54 AM, Guenter Milde via Docutils-users (
Post by Tony Narlock
I’m getting there,
I’ve overridden the Writer and added a new visitor attribute for “toc”.
visitor_attributes = Writer.visitor_attributes + ('toc', )
So “toc” should be collected during assemble_parts.
From here, I want to inject the document’s TOC into “toc”.
There’s transforms.parts.Contents(), but I’m not sure where and how I’d put
it into play.
The transform generates the TOC by travelling the document tree after
parsing is complete.

You can get a good picture of its output by creating a minimal sample
document, translating it to pseudoxml, and examining the result.
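
A minimal sketch of that inspection step, assuming docutils is installed. The sample source and the helper name `to_pseudoxml` are made up for illustration; `publish_string` and the pseudoxml writer are part of docutils.

```python
from docutils.core import publish_string
from docutils.writers import pseudoxml

# Hypothetical sample document: two top-level sections plus the directive.
SOURCE = """\
.. contents::

First
=====

Text.

Second
======

More text.
"""


def to_pseudoxml(source):
    """Return the document tree as indented pseudo-XML text."""
    out = publish_string(source, writer=pseudoxml.Writer())
    # publish_string may return bytes depending on the output encoding setting.
    return out if isinstance(out, str) else out.decode("utf-8")


tree = to_pseudoxml(SOURCE)
print(tree)
```

In the printed tree, the table of contents shows up as a topic element carrying the "contents" class, with a bullet list of references to the section ids inside it.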

If the toc represents the logical structure you want to show in your output
document, fine. You can leave the transform as-is.
If not, explain the changes you think are necessary.

That means, in my situation, the transform itself is OK as is.

I just want publish_parts to output the HTML contents of TOC to its own
part.

e.g., In the same way publish_parts gives “html_body” and “fragment”, I
want it to return “toc”, which is the HTML output of the table of contents.
Post by Tony Narlock
There is also a directive for parsers.rst.directives.parts.Contents. I’m
curious how I would go about injecting that.
This should be the class triggered by the ``.. contents::`` directive. It
will ensure the transform is called at the right time and does the right
thing.

Yes. I pretty much want a new “part” created by HTMLWriter when
publish_parts is used that only contains “.. contents::” output.
Post by Tony Narlock
So how do you think I can go about that? Could you give me a code example
of how I could get the equivalent of .. contents:: showing up in a “toc”
part of the writer?
Rather, you should provide a minimal working example showing what you want
to achieve (+ a diff between current and desired output).

https://gist.github.com/tony/c4fc5661fcd4b7de71c65dd8a52c9ea4



Günter



Thanks,

P.S. Using the new docutils 0.14rc2 release. Thanks Günter.
Tony Narlock
2017-07-20 13:10:25 UTC
This has been helpful, thank you.

It would be indispensable to get a code example or demonstration.

On July 19, 2017 at 5:27:15 PM, Guenter Milde via Docutils-users (
Post by Tony Narlock
On July 19, 2017 at 11:33:54 AM, Guenter Milde via Docutils-users (
...
Post by Tony Narlock
Post by Tony Narlock
There’s transforms.parts.Contents(), but I’m not sure where and how I’d put
it into play.
The transform generates the TOC by travelling the document tree after
parsing is complete.
I just want publish_parts to output the HTML contents of TOC to its own
part.
e.g., In the same way publish_parts gives “html_body” and “fragment”, I
want it to return “toc”, which is the HTML output of the table of contents.
Then, you can leave the transform and directive alone.
Post by Tony Narlock
Yes. I pretty much want a new “part” created by HTMLWriter when
publish_parts is used that only contains “.. contents::” output.
Post by Tony Narlock
So how do you think I can go about that? Could you give me a code example
of how I could get the equivalent of .. contents:: showing up in a “toc”
part of the writer?
Rather, you should provide a minimal working example showing what you want
to achieve (+ a diff between current and desired output).
https://gist.github.com/tony/c4fc5661fcd4b7de71c65dd8a52c9ea4
1. Currently, the table of contents is only output through the directive.
...
Post by Tony Narlock
3. I want it to be available in "toc" *without* using the directive in the
source.
For this, you would need to run the "Contents" transform even if the
document does not contain the "contents" directive. I don't know how to do
this or whether it is worth the effort. I suppose a wrapper adding the
directive to the input string may achieve the same result far more easily.
Post by Tony Narlock
2. I do not want to position the table of contents in the RST (therefore, I
specifically do not want it in html_body).
You can suppress it with the "strip-elements-with-class" setting.
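
A minimal sketch of that setting in use, assuming docutils is installed and using the html4css1 writer. The sample source is made up for illustration. Turning off `toc_backlinks` is an added assumption, on the reasoning that heading links back into a stripped TOC would have no target.

```python
from docutils.core import publish_parts
from docutils.writers import html4css1

# Hypothetical sample document that contains the directive.
SOURCE = """\
.. contents::

First
=====

Text.

Second
======

More text.
"""

parts = publish_parts(
    SOURCE,
    writer=html4css1.Writer(),
    settings_overrides={
        # Remove every element that carries the "contents" class.
        "strip_elements_with_classes": ["contents"],
        # Assumption: no heading backlinks, since their target is stripped.
        "toc_backlinks": False,
    },
)
body = parts["html_body"]
print(body)
```

This only hides the table of contents from html_body; it does not by itself make the TOC available as a separate part.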


For an example of a part's implementation, have a look at how the "footer" part
is implemented in writers/_html_base.py. It is not trivial; make sure you
understand the code around every occurrence of "footer" in the source file.

I am looking at those, and I think I have figured out how to add “toc”. This
is done by appending 'toc' to visitor_attributes, which gets collected
during assemble_parts().

But I’m not sure of where and how to inject a directive (e.g. Contents) via
Writer. That’d be needed to get the table of contents to be added to “toc”.
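
One way the "toc" part could be filled, as a sketch rather than the approach anyone in this thread settled on. It assumes the source does contain `.. contents::` (a wrapper could add it) and uses the html4css1 writer. `TocTranslator`, `TocWriter` and `_topic_starts` are hypothetical names. The idea mirrors the "footer" handling: remember where the topic starts in `body`, then move that slice into a separate list.

```python
from docutils.core import publish_parts
from docutils.writers import html4css1

# Hypothetical sample document that contains the directive.
SOURCE = """\
.. contents::

First
=====

Text.

Second
======

More text.
"""


class TocTranslator(html4css1.HTMLTranslator):
    """Translator that diverts the contents topic from body into toc."""

    def __init__(self, document):
        super().__init__(document)
        self.toc = []            # becomes the "toc" part
        self._topic_starts = []  # hypothetical helper: body offsets of open topics

    def visit_topic(self, node):
        self._topic_starts.append(len(self.body))
        super().visit_topic(node)

    def depart_topic(self, node):
        super().depart_topic(node)
        start = self._topic_starts.pop()
        if "contents" in node["classes"]:
            # Move the rendered TOC out of the body, so it is absent from html_body.
            self.toc.extend(self.body[start:])
            del self.body[start:]


class TocWriter(html4css1.Writer):
    # Listing "toc" here makes the writer copy translator.toc into parts["toc"].
    visitor_attributes = html4css1.Writer.visitor_attributes + ("toc",)

    def __init__(self):
        super().__init__()
        self.translator_class = TocTranslator


parts = publish_parts(SOURCE, writer=TocWriter())
print(parts["toc"])
```

With this sketch, `parts["toc"]` holds the rendered contents topic and `parts["html_body"]` no longer includes it.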



Günter


Tony Narlock
2017-07-21 16:07:44 UTC
So here is where I am:
https://gist.github.com/tony/1a03b7668c9e33672f4465dd63c6076b

On July 20, 2017 at 11:54:07 AM, Guenter Milde via Docutils-users (
Post by Tony Narlock
On July 19, 2017 at 5:27:15 PM, Guenter Milde via Docutils-users (
...
Post by Tony Narlock
The transform generates the TOC by travelling the document tree after
parsing is complete.
...
Post by Tony Narlock
Post by Tony Narlock
https://gist.github.com/tony/c4fc5661fcd4b7de71c65dd8a52c9ea4
1. Currently, the table of contents is only output through the directive.
...
Post by Tony Narlock
3. I want it to be available in "toc" *without* using the directive in the
source.
For this, you would need to run the "Contents" transform also if the
document does not contain the "contents" directive.
Post by Tony Narlock
2. I do not want to position the table of contents in the RST (therefore, I
specifically do not want it in html_body).
I suppose rather than messing with "parts", you can use the publish_*
functions in a wrapper script:

Don't use ``.. contents::`` in the source.

1. Parse the rst source with publish_doctree()

Returns a doctree object.


2. Export doctree to HTML with publish_from_doctree()

3. Run the toc-generating transform on the doctree.
Returns a "toc doctree".

Where would it be? Am I applying the transform correctly in the paste?



4. Export the "toc doctree" with publish_from_doctree().

Assuming I’m running the transform correctly, I see no difference in the
output.



This is just an idea, not tested or detailed.
Post by Tony Narlock
It would be indispensable to get a code example or demonstration.
This is left as an exercise to the reader.

This has been educational and is helping me understand internals better. I
prefer vanilla docutils whenever possible.

Any more ideas?




Günter



Tony Narlock
2017-07-21 20:26:25 UTC
Here’s where it’s at now (after looking at footer and PEP code):
https://gist.github.com/tony/9c0d5eaa081b5ff611b7ca9e86a83046

Output:

<div class="toc">
<hr class="toc" />
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
</div>

</div>

So stuff is showing in TOC. But the pending contents information doesn’t
seem to be rendering.

Tony Narlock
2017-07-22 00:41:21 UTC
Thanks for your help on this.

This is *way* trickier than it looks, with all due respect. Clocked in
almost two days on this so far.

Just trying to get the table of contents separate from html_body.
Seriously considering adding .. contents:: to the source, building HTML and
ripping out the ToC via LXML.

Love reStructuredText and docutils (been having quite a few internal
successes lately), but this particular task feels like going against the
grain.

On July 21, 2017 at 3:39:58 PM, Guenter Milde via Docutils-users (
Post by Tony Narlock
https://gist.github.com/tony/1a03b7668c9e33672f4465dd63c6076b
No time to look.
Post by Tony Narlock
On July 20, 2017 at 11:54:07 AM, Guenter Milde via Docutils-users (
Post by Tony Narlock
On July 19, 2017 at 5:27:15 PM, Guenter Milde via Docutils-users (
...
I suppose rather than messing with "parts", you can use the publish_*
Don't use ``.. contents::`` in the source.
1. Parse the rst source with publish_doctree()
Returns a doctree object.
2. Export doctree to HTML with publish_from_doctree()
Does this work?

Yes, but this gives CSS + HTML for way more than I need. Am I supposed to
see anything special in the HTML, or are you just checking that
publish_doctree + publish_from_doctree works (it does)?

Way more than html_body (all I need, aside from ToC). And I’m not sure what
I can do with this content?
Post by Tony Narlock
Post by Tony Narlock
3. Run the toc-generating transform on the doctree.
Returns a "toc doctree".
Where would it be?
In docutils/transforms/parts.py
Post by Tony Narlock
Am I applying the transform correctly in the paste?
Post by Tony Narlock
4. Export the "toc doctree" with publish_from_doctree().
Assuming I’m running the transform correctly, I see no difference in the
output.
So I suppose you don't apply it correctly.

The idea is to generate a TOC by travelling over the doctree in
the same manner as it is done by the "Contents" transform.

Therefore, it should be possible to use
docutils.transforms.parts.Contents.build_contents() and pass it the
startnode of the doctree returned by "publish_doctree".
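
A sketch of the traversal idea, assuming docutils is installed. It does not call `Contents.build_contents()`; `build_toc` is a hypothetical hand-written stand-in that walks the sections in a similar way, so it does not depend on the transform's internal state. The sample source is made up for illustration.

```python
from docutils import nodes
from docutils.core import publish_doctree

# Hypothetical sample document without any contents directive.
SOURCE = """\
First
=====

Text.

Sub
---

More.

Second
======

Text.
"""


def build_toc(node):
    """Return a nested bullet_list of section references, or None."""
    items = []
    for child in node.children:
        if not isinstance(child, nodes.section):
            continue
        title = child[0]  # the first child of a section is its title
        ref = nodes.reference("", title.astext(), refid=child["ids"][0])
        item = nodes.list_item("", nodes.paragraph("", "", ref))
        subtoc = build_toc(child)
        if subtoc is not None:
            item += subtoc
        items.append(item)
    return nodes.bullet_list("", *items) if items else None


doctree = publish_doctree(SOURCE)
toc_list = build_toc(doctree)
print(toc_list.pformat())
```

Unlike the real transform, this stand-in ignores options such as depth and backlinks and flattens inline markup in titles to plain text.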
Post by Tony Narlock
Post by Tony Narlock
This is just an idea, not tested or detailed.
Günter



David Goodger
2017-07-22 03:40:12 UTC
Post by Tony Narlock
Thanks for your help on this.
This is way trickier than it looks, with all due respect.
Just because you're trying to hack Docutils without a sufficiently
deep understanding of the internals.
Post by Tony Narlock
Clocked in almost
two days on this so far.
Hopefully this exercise has improved your understanding!
Post by Tony Narlock
Just trying to get the table of contents separate from html_body. Seriously
considering adding .. contents:: to the source, building HTML and ripping out
the ToC via LXML.
Love reStructuredText and docutils (been having quite a few internal
successes lately), but this particular task feels like going against the
grain.
Have you read the documentation? There's no one place for what you
want, it's spread out. See:

* http://docutils.sourceforge.net/docs/ref/transforms.html
* http://docutils.sourceforge.net/docs/peps/pep-0258.html#transformer
* http://docutils.sourceforge.net/docs/dev/hacking.html

Also, see the code. There's lots of inline documentation in docstrings
and comments.

Ultimately, you need to understand the flow of data in Docutils, how
all the components interrelate.

No, no, no, don't tug on that. You never know what it might be attached to.

— Buckaroo Banzai (during brain surgery)

I think the attached code will get you most of the way to where you want to go.

DG
Matěj Cepl
2017-07-22 05:04:23 UTC
Post by David Goodger
Post by Tony Narlock
Thanks for your help on this.
This is way trickier than it looks, with all due respect.
Just because you're trying to hack Docutils without a sufficiently
deep understanding of the internals.
Yeah, but that’s the problem for most people who would like to
hack on docutils. I am following this thread with some level of
dread, because these are exactly the operations I will probably
need if I end up writing that rst2epub. And frankly, this thread
does not increase my faith in my own ability to write such a
script (if I had the time to do so, that is).

I will certainly study your attached example.

Best,

Matěj
--
http://matej.ceplovi.cz/blog/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

He uses statistics as a drunken man uses lamp-posts... for
support, rather than illumination.
-- Andrew Lang



Tony Narlock
2017-07-22 23:13:18 UTC
I can confirm getting the node information.

The issue I have is getting the HTML from toc_list.

The only other problem I have is:

from docutils.writers.html5_polyglot import Writer
core.publish_from_doctree(toc_list, writer=Writer())

Traceback (most recent call last):
  File "./try2.py", line 48, in <module>
    core.publish_from_doctree(toc_list, writer=Writer())
  File ".venv/lib/python3.6/site-packages/docutils/core.py", line 521, in publish_from_doctree
    return pub.publish(enable_exit_status=enable_exit_status)
  File ".venv/lib/python3.6/site-packages/docutils/core.py", line 218, in publish
    self.apply_transforms()
  File ".venv/lib/python3.6/site-packages/docutils/core.py", line 199, in apply_transforms
    self.document.transformer.apply_transforms()
  File ".venv/lib/python3.6/site-packages/docutils/transforms/__init__.py", line 162, in apply_transforms
    self.document.note_transform_message)
AttributeError: 'bullet_list' object has no attribute 'note_transform_message'
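
The traceback suggests that publish_from_doctree() expects a full document node, not a bare bullet_list. A minimal sketch of one possible workaround, assuming docutils is installed: put the list into a fresh document created with `utils.new_document()` and read the rendered fragment from the writer's parts afterwards. The hand-built `toc_list` and the source label "<toc>" are placeholders, and html4css1 is used here instead of html5_polyglot.

```python
from docutils import nodes, utils
from docutils.core import publish_from_doctree
from docutils.writers import html4css1

# Placeholder TOC: a one-entry bullet list pointing at a section id "first".
ref = nodes.reference("", "First", refid="first")
toc_list = nodes.bullet_list(
    "", nodes.list_item("", nodes.paragraph("", "", ref))
)

# Wrap the list in a document so the publisher has a root node to work on.
toc_doc = utils.new_document("<toc>")
toc_doc += toc_list

writer = html4css1.Writer()
publish_from_doctree(toc_doc, writer=writer)

# The publisher assembles the writer's parts; "fragment" is the body
# without the surrounding document wrapper.
toc_html = writer.parts["fragment"]
print(toc_html)
```

The references are rendered as plain "#first" links, so they only resolve if the fragment ends up on the same page as the sections they point to.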