I mentioned in my last post how useful Ben Welsh's code recipe's are. Count this post as my effort to encourage the practice among coding journalists.
Since launching the NewsHour's Annotated State of the Union, I've gotten a few questions about how it worked, particularly about linking comments to paragraphs. What's needed is paragraph-level permalinks. As it turns out, that's pretty easy to do.
The first thing you'll need is a block of clean HTML. Then, you'll need something that can parse and modify that HTML. Fortunately, tools abound.
Doing it server side
Start by importing the library and loading your HTML:
from BeautifulSoup import BeautifulSoup soup = BeautifulSoup(html)
soup instance gives you a bunch of methods to traverse a block of HTML. In this case, we want to find all the paragraphs, then loop through them and alter each one, added an
id attribute, then adding an anchor link as a child element.
paras = soup.findAll('p', recursive=True)
findAll method does exactly what it sounds like: It finds every
<p> tag. Make sure to add
recursive=True as an argument. That keeps BeautifulSoup from going more than one level deep in the DOM, which means it will ignore anything in
blockquotes or lists. If you want permalinks for those, drop the recursive argument, or make it
Now, let's do something with these paragraphs:
for i, p in enumerate(paras): p['id'] = "p%s" % i
We're enumerating each paragraph here, creating a second variable,
i, for each iteration of the loop. Use that to set the
id attribute on each
<p>, which just follows Python's dictionary style.
Now we just need to add a link back to the same paragraph. That means creating a new tag, then inserting it as a child of the
from BeautifulSoup import BeautifulSoup, Tag, NavigableString soup = BeautifulSoup(html) paras = soup.findAll('p', recursive=False) for i, p in enumerate(paras): p['id'] = 'p%s' % i # here's where we create a new link a = Tag(soup, 'a') a['href'] = '#p%s' % i a['title'] = 'Link to this paragraph' a.insert(0, NavigableString(' #')) # link text p.append(a) # just add our new a tag to the end of p's contents
Create a tag by instantiating a
Tag object, passing in our
soup instance and the tag we're creating,
<a> in this case. Then, like we did before, set attributes like dictionary keys. Link text (or any text) is a
NavigableString instance, which we insert into our
a tag. Once that's done, we have a complete link. Just append that to our
p tag, and it will drop in at the end. The important thing to remember here is that BeautifulSoup treats HTML element attributes like dictionaries and contents like lists.
Finally, let's wrap this in a function so it's reusable. And since I'm a Django guy, I'll make it a template filter, so I can drop it into my blog entry template. Here's the complete code:
from django import template from BeautifulSoup import BeautifulSoup, Tag, NavigableString register = template.Library() @register.filter def enumerate_paras(html): """ Given a block of HTML, create id attributes on each paragraph (p) and write in a link at the end of each. >>> html = '''<p>First line.</p> ... <p>Second line.</p>''' >>> print enumerate_paras(html) <p id="p0">First line.<a href="#p0" title="Link to this paragraph"> #</a></p> <p id="p1">Second line.<a href="#p1" title="Link to this paragraph"> #</a></p> """ soup = BeautifulSoup(html) paras = soup.findAll('p', recursive=False) for i, p in enumerate(paras): p['id'] = 'p%s' % i a = Tag(soup, 'a') a['href'] = '#p%s' % i a['title'] = 'Link to this paragraph' a.insert(0, NavigableString(' #')) # link text p.append(a) # just add our new a tag to the end of p's contents return soup.renderContents()
That's it. A template filter is just a Python function. We're returning
soup.renderContents() in this case, which outputs a string of HTML. You could also use
str(soup) or just
soup (in which case, Django's template system will turn it into unicode). Just make sure you return something.