[Docutils-users] extract code from code directive from restructuredtext using docutils

Discussion:

Mark Andrews

2015-02-03 11:54:00 UTC

(I asked the following question on stackoverflow two days ago. It has not
received any answers or comments, and very few views, so I was hoping
that it would be OK to ask it again here.)

I would like to extract the source code verbatim from the code directives
in a reStructuredText string.

What follows is my first attempt at doing this, but I would like to know if
there is a better (i.e. more robust, or more general, or more direct) way
of doing it.

Let's say I have the following reST text as a string in Python:

s = '''

My title
========

Use this to square a number.

.. code:: python

   def square(x):
       return x**2

and here is some javascript too.

.. code:: javascript

   foo = function() {
       console.log('foo');
   }

'''

To get the two code blocks, I could do

from docutils.core import publish_doctree

doctree = publish_doctree(s)
source_code = [child.astext() for child in doctree.children if 'code'
in child.attributes['classes']]

Now *source_code* is a list with just the verbatim source code from the two
code blocks. I could also use the *attributes* attribute of *child* to find
out the code types too, if necessary.

It does the job, but is there a better way?
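
One limitation of iterating over doctree.children is that it only sees
top-level nodes, so a code directive nested inside a section or a list would
be missed. A minimal sketch of a tree-wide variant follows. The function name
extract_code_blocks and the sample text are made up for illustration, the
'syntax_highlight': 'none' override is there only so the sketch does not
depend on Pygments being installed, and reading the language from the second
class is a heuristic, not a documented guarantee.

```python
from docutils import nodes
from docutils.core import publish_doctree

# Sample input (hypothetical): the second code block sits inside a subsection.
RST = '''
My title
========

Use this to square a number.

.. code:: python

   def square(x):
       return x**2

Nested section
--------------

.. code:: javascript

   foo = function() {
       console.log('foo');
   }
'''


def extract_code_blocks(rst_text):
    """Return (language, source) pairs for each code directive in rst_text."""
    doctree = publish_doctree(
        rst_text,
        # Skip token-level highlighting; the verbatim text is all we need.
        settings_overrides={'syntax_highlight': 'none'},
    )
    # Newer Docutils releases offer findall(); older ones only have traverse().
    walk = getattr(doctree, 'findall', None) or doctree.traverse
    blocks = []
    for node in walk(nodes.literal_block):
        classes = node['classes']
        if 'code' not in classes:
            continue  # an ordinary "::" literal block, not a code directive
        # Heuristic: treat the first class other than 'code' as the language.
        others = [c for c in classes if c != 'code']
        blocks.append((others[0] if others else None, node.astext()))
    return blocks


for language, source in extract_code_blocks(RST):
    print(language)
    print(source)
```

Walking the whole tree costs no more parsing than the list comprehension
above, and it keeps working if the document later grows sections.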

Guenter Milde

2015-02-15 16:42:14 UTC


Looks plain and clean to me. If you are using Docutils anyway, then this is
the way to go. If you want to skip the overhead of parsing the complete
document just to extract code, you could also create a "minimal parser",
that just looks for ".. code::" (or maybe also literal blocks) and copies
the following indented block. (This is what PyLit does.)

Günter
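
The "minimal parser" idea above could look roughly like the sketch below.
This is not PyLit's implementation, only an illustration using the standard
library. The names DIRECTIVE, extract_code and SAMPLE are invented for the
example, and it assumes that directive options (such as :number-lines:) are
absent, since they would otherwise be copied into the result.

```python
import re
import textwrap

# Matches a line such as ".. code:: python", remembering its indentation.
DIRECTIVE = re.compile(r'^([ \t]*)\.\. code::[ \t]*(\S*)[ \t]*$')

SAMPLE = '''
Use this to square a number.

.. code:: python

   def square(x):
       return x**2

and here is some javascript too.

.. code:: javascript

   foo = function() {
       console.log('foo');
   }
'''


def extract_code(rst_text):
    """Return (language, source) pairs without running the full reST parser."""
    lines = rst_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        i += 1
        if not match:
            continue
        base_indent = len(match.group(1))
        language = match.group(2) or None
        body = []
        # Copy lines until a non-blank line is indented no deeper than
        # the directive itself; that line ends the block.
        while i < len(lines):
            line = lines[i]
            indent = len(line) - len(line.lstrip())
            if line.strip() and indent <= base_indent:
                break
            body.append(line)
            i += 1
        # Remove the common indentation and surrounding blank lines.
        source = textwrap.dedent('\n'.join(body)).strip('\n')
        blocks.append((language, source))
    return blocks


for language, source in extract_code(SAMPLE):
    print(language)
    print(source)
```

The trade-off is the usual one: this avoids parsing the whole document, but
it knows nothing about reST beyond indentation, so text that merely looks
like a directive (for example inside a literal block) would also match.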

_______________________________________________
Docutils-users mailing list
Docutils-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-users

Please use "Reply All" to reply to the list.

Michael Prisant

2015-02-15 20:13:12 UTC

Thanks for drawing attention to PyLit. This looks like a great package.
Günter, do you intend to continue maintaining it?

Michael

--
Michael G. Prisant

<***@gmail.com>