Friday, May 25, 2007

Image of the Day 'extension' of FeedFeeder experience

Introduction

For a huge work project I was asked to extend FeedFeeder so we could have an image-of-the-day content type. I like FeedFeeder, and this would give me the chance to get up to speed on Five and Zope 3. I was so excited! Five turned out to be lots of fun and not that hard to learn. Zope 3 products looked great too, sort of like Plone products on steroids, or J2EE done right.

And yet the effort turned into an exercise of frustration. Why?

The thing I like about Plone (and now Five / Zope 3 (Z3)) is that once you grok something, it is really easy to make new content types or extend existing ones. You can do that via UML using ArchGenXML, or simply code it yourself. I really enjoy this part of Plone: being able to create new content types so easily.

Things go downhill

Alas, FeedFeeder is a weird amalgam of ArchGenXML output and Five technology. ArchGenXML generated a lot of the boilerplate, but then the authors put code in all sorts of odd places, in ways that violate how you are supposed to work with ArchGenXML output. It's obvious that FeedFeeder was its designers' way of learning Five and component-based design. Which is nice, but the result is very hard to extend and control. Because of the way they combined ArchGenXML, Zope 3, and Five, it is actually more work to make it do what we want.


Normally, to add a new content type in Plone / Five / Zope 3 / ArchGenXML, you just extend or implement an existing content type, modify a ZCML file (Z3, Five), Install.py (Plone), or AppInstall (ArchGenXML), and maybe stick in an adapter (Z3, Five). Not so with FeedFeeder: to add an extended or new content type you have to modify configure.zcml and Install.py, and pretty much rebuild the content type from scratch. Then add in views and skins. Then pray it works. Just getting a simple extension working, without any modifications, was looking like way, way too much work.

After working on it for hours yesterday at CC, for an hour last night, and during the wee hours between 3 and 5 am when I thought I'd had an epiphany, I was getting really frustrated. This wasn't like association classes, where I knew that if I could just find the right bits in the lackluster docs it would work. I really felt like I was walking through someone's spaghetti code.

The Recovery

So I started thinking about building my own product from scratch to do the work. This annoyed me; I hate reinventing the wheel. Oh well, time to take notes on what FeedFeeder was doing. Right away I discovered that FeedFeeder, curiously, had something called enclosures. I researched them, and you know what? Enclosures are a way to attach files, such as images or audio, to entries in Atom and RSS feeds.

I tested it out. I created a set of sample Atom and RSS feeds with image enclosures. Then I created a FeedFeeder folder called 'iotd feed' to grab them. Then I checked, and FeedFeeder had grabbed the images and stored them as enclosure objects!
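For anyone who wants to reproduce this kind of test, here is a minimal sketch of a sample RSS 2.0 feed with image enclosures, plus a small reader that picks out the image enclosures the way a feed aggregator might. It uses only Python's standard xml.etree.ElementTree. This is not FeedFeeder's code; the URLs, titles, and byte lengths are made-up sample values, and Atom (which marks enclosures with `<link rel="enclosure">`) is left out to keep it short.

```python
import xml.etree.ElementTree as ET


def build_sample_rss(images):
    """Build a minimal RSS 2.0 feed where each item carries one <enclosure>.

    `images` is a list of (title, url, mime_type, length_in_bytes) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "iotd feed"
    ET.SubElement(channel, "link").text = "http://example.com/iotd"
    ET.SubElement(channel, "description").text = "Sample image-of-the-day feed"
    for title, url, mime_type, length in images:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # RSS 2.0 requires url, length, and type attributes on <enclosure>
        ET.SubElement(item, "enclosure", url=url, length=str(length), type=mime_type)
    return ET.tostring(rss, encoding="unicode")


def image_enclosures(rss_text):
    """Return (item title, enclosure url) pairs whose enclosure MIME type is image/*."""
    root = ET.fromstring(rss_text)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("type", "").startswith("image/"):
            found.append((item.findtext("title"), enclosure.get("url")))
    return found


feed = build_sample_rss([
    ("Sunset", "http://example.com/images/sunset.jpg", "image/jpeg", 48213),
    ("Podcast", "http://example.com/audio/show.mp3", "audio/mpeg", 1048576),
    ("Harbor", "http://example.com/images/harbor.png", "image/png", 90122),
])
print(image_enclosures(feed))
# [('Sunset', 'http://example.com/images/sunset.jpg'), ('Harbor', 'http://example.com/images/harbor.png')]
```

Filtering on the enclosure's `type` attribute is what lets an image feed and, say, a podcast share the same mechanism.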

POW!!! Done!!! FeedFeeder does it already! All we need to do now is create a smart folder or view that looks in the 'iotd feed' folder to supply image-of-the-day views!
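The selection logic such a view would need can be sketched in plain Python, kept separate from Plone. This is a hypothetical helper, not a Plone or FeedFeeder API: it assumes the enclosures have already been pulled out of the 'iotd feed' folder into dicts with made-up keys `title`, `url`, and `published`, and it picks the newest image published on or before a given day.

```python
from datetime import date


def image_of_the_day(entries, today):
    """Pick the newest entry published on or before `today`.

    `entries` is a list of dicts with hypothetical keys
    'title', 'url', and 'published' (a datetime.date).
    Returns None when nothing qualifies.
    """
    candidates = [e for e in entries if e["published"] <= today]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["published"])


entries = [
    {"title": "Sunset", "url": "http://example.com/images/sunset.jpg", "published": date(2007, 5, 24)},
    {"title": "Harbor", "url": "http://example.com/images/harbor.png", "published": date(2007, 5, 25)},
    {"title": "Meadow", "url": "http://example.com/images/meadow.jpg", "published": date(2007, 5, 26)},
]
print(image_of_the_day(entries, date(2007, 5, 25))["title"])  # Harbor
```

Ignoring future-dated entries means you can queue up images in the feed ahead of time without them showing early.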

Notes & Lessons learned

  • I'm still not happy with FeedFeeder's internal architecture. But since out of the box it does everything we need and probably more, do we need to care?
  • When I hit a brick wall like this, I need to do more research, especially when something is built on a standard. I need to get a 100% understanding of something before I start trying to invent something new.
  • Time Summary:
    • Time spent learning Five: 1 hour
    • Time spent playing with Zope 3: 2 hours
    • Time spent trying to extend FeedFeeder gracefully: 4 hours
    • Time spent extending FeedFeeder via gruesome hacking: 2 hours
    • Time spent examining FeedFeeder for things needed in a new product: 15 minutes
    • Time spent researching Atom/RSS for how they handle enclosures: 5 minutes
    • Time spent testing out how FeedFeeder handled enclosures: 15 minutes
    • Total Time: 9 hours and 35 minutes
  • Lesson Learned: Do your @#$%ing research before you commence work!

Wednesday, May 23, 2007

Interfaces in Zope 3 and Five

We use Plone a lot on the job. And Plone is leaning towards Zope 3 these days, which means Interfaces (thanks to Zope 2's inclusion of Five)! Until now I've not had a reason to really poke at Interfaces, because Python, unlike Java, doesn't really need them. Also, the Plone work I've done so far has been via UML, external methods, views (ZPT), and fancy install scripts.

However, now I'm working with FeedFeeder, a Plone Atom/RSS handler, and we need to extend it to include an image-of-the-day content type. FeedFeeder is built with a lot of Interfaces and Five technology.
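For readers who, like me, haven't poked at Interfaces yet, here is a minimal sketch of what one looks like with the zope.interface package (it assumes `zope.interface` is installed). The names `IImageOfTheDay` and `ImageOfTheDay` are hypothetical and are not part of FeedFeeder. Note that 2007-era Zope 3 code declared this with an `implements()` call inside the class body; the `@implementer` decorator used here is the current spelling.

```python
from zope.interface import Attribute, Interface, implementer
from zope.interface.verify import verifyObject


class IImageOfTheDay(Interface):
    """Hypothetical interface for an image-of-the-day item."""

    title = Attribute("Title of the image")
    image_url = Attribute("URL of the image file")

    def caption():
        """Return a short caption for display."""  # interface methods omit 'self'


@implementer(IImageOfTheDay)
class ImageOfTheDay(object):
    def __init__(self, title, image_url):
        self.title = title
        self.image_url = image_url

    def caption(self):
        return "%s (%s)" % (self.title, self.image_url)


item = ImageOfTheDay("Sunset", "http://example.com/sunset.jpg")
print(IImageOfTheDay.providedBy(item))              # True: the instance provides it
print(IImageOfTheDay.implementedBy(ImageOfTheDay))  # True: the class implements it
print(verifyObject(IImageOfTheDay, item))           # True: attributes and methods are present
```

Python doesn't need this to call `item.caption()`, which is why Interfaces seem optional at first; their value in Zope 3 and Five is that components, views, and adapters get looked up by interface rather than by concrete class.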

Alas, I don't have a Zope 3 book handy here at work.

Until my copy comes in, I'm using the Zope 3 tutorial created by the Zope 3 book author. Good stuff indeed. I'll post my thoughts when I'm done.

Wednesday, May 16, 2007

Beautiful Soup is Beautiful

I have a bunch of content stored on an old instance of pmwiki. I've never liked pmwiki, since it seems to only have a half-hacked state method and just generally feels insecure. Also, I've found that wikis can be useful, but if each page holds only a short piece of content, a FAQ-style treatment often works better than a regular wiki.

So I decided to convert the pmwiki pages into a pbwiki toc construct. It would put all the content onto one page and use the tag to provide a top-level table of contents. That meant I would have to:
  1. Scrape the pmwiki content index for all the meaningful links.
  2. Scrape out the title and urls of each link.
  3. Grab the content from each link.
  4. Reformat it all to work in the pbwiki format.
I've done screen scraping before, but not in Python, and not on this scale. Well, Python seems to do everything well, so I opened up htmllib and started to play, thinking I would be done by brunch time.

Immediately I'm unhappy with htmllib. The docs suck, and it just seems awkward to use once I figure it out. It doesn't feel Pythonic, although I'm sure I'm wrong in that respect somehow. It's just that my Python pseudocode usually ends up close to the final code, and this was not the case.
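To show the callback style I mean: htmllib only ever existed in Python 2 and was removed in Python 3, so this sketch uses the standard html.parser module instead, which works the same event-driven way (you subclass the parser and override handlers). It collects (link text, href) pairs from a small hypothetical index snippet, and notice how much state you have to track by hand just to know whether you're inside a link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs using event callbacks."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


collector = LinkCollector()
collector.feed('<ul><li><a href="/wiki/Main/Install">Install</a></li>'
               '<li><a href="/wiki/Main/Backup">Backup <b>notes</b></a></li></ul>')
collector.close()
print(collector.links)
# [('Install', '/wiki/Main/Install'), ('Backup notes', '/wiki/Main/Backup')]
```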

Then a work buddy told me about Beautiful Soup. It's an HTML/XML parser that is really easy to use and can cope with badly formed HTML, like the sort pmwiki sometimes generates. It's not optimized for speed but for usability. That's fine with me, because this is a one-time operation on maybe 150-200 entries.

The final code worked really well. Not super fast, but really easy to write. Thanks to Beautiful Soup, what I thought would be a quick and simple task stayed that way.
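For comparison, here is a minimal sketch of steps 1 and 2 (find the meaningful links on the index and pull out their titles and URLs) with Beautiful Soup. It assumes the modern `bs4` package rather than the 2007-era BeautifulSoup 3 module, and the index markup is hypothetical: it assumes pmwiki marks internal links with `class="wikilink"`, which may not match every install. The unclosed `<li>` and `<p>` tags are deliberate, to show it coping with sloppy HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical index markup; real pmwiki output will differ.
INDEX_HTML = """
<div id="wikitext">
  <ul>
    <li><a class="wikilink" href="/pmwiki.php/Main/Install">Install</a>
    <li><a class="wikilink" href="/pmwiki.php/Main/Backup">Backup <b>notes</b></a>
  </ul>
  <p>See also <a class="urllink" href="http://example.com/">elsewhere</a>
</div>
"""


def wiki_links(html):
    """Return (title, url) pairs for internal wiki links, skipping external ones."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        # get_text(" ", strip=True) joins nested text like "Backup <b>notes</b>"
        (a.get_text(" ", strip=True), a["href"])
        for a in soup.find_all("a", class_="wikilink")
    ]


print(wiki_links(INDEX_HTML))
# [('Install', '/pmwiki.php/Main/Install'), ('Backup notes', '/pmwiki.php/Main/Backup')]
```

The whole task is one `find_all()` call, with no handler state to juggle, which is pretty much what my pseudocode looked like in the first place.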

Wednesday, May 9, 2007

The Dark Side begins its seduction...

...but I don't think many will convert.

I'm talking about the Python community embracing the Microsoft Silverlight Dynamic Language Runtime (DLR). This lets us write IronPython code that runs in the browser, the same way ActionScript is used in Flash or Java in applets.

It just seems odd. Plus, IronPython, for all its virtues, has to do some Microsoftisms in order to work. So while I can move code written for CPython from Windows to Mac OS X to FreeBSD to UNIX to Linux without worry, I have to be careful when moving it from CPython to IronPython. That means we will end up with a big body of IronPython code for the DLR that won't work anywhere else.

Bleah.

Here's hoping Silverlight doesn't take off.

Tuesday, May 8, 2007

JSON and Python

I hate XML.
I love JSON.

Now that I've made that clear, let's go into how I want Python to handle my JSON.
  1. I want a built-in function to JSONify.
  2. I want a built-in function to deJSONify.
Now, in the Python world there are multiple JSON libraries: cjson, simplejson, demjson, and more. I've used a few of these and they all seem good. However, most have extraneous methods that I don't care about, or give their JSON-handling functions odd names. For example, simplejson has you use loads() and dumps() to convert between JSON strings and Python objects, and load() and dump() to do the same with files. Kind of nice, but all I really want from my JSON handler is a JSONify and a deJSONify function. I can do the rest!
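Here is how that four-function API looks in practice. This sketch uses the json module that later became part of the standard library (Python 2.6 and up), which was derived from simplejson and keeps the same loads()/dumps()/load()/dump() names. An in-memory io.StringIO stands in for a real file.

```python
import io
import json

data = {"name": "iotd", "tags": ["image", "feed"], "count": 3}

# Strings: dumps() turns a Python object into a JSON string, loads() goes back.
text = json.dumps(data, sort_keys=True)
print(text)  # {"count": 3, "name": "iotd", "tags": ["image", "feed"]}
round_trip = json.loads(text)

# Files: dump() writes JSON to a file object, load() reads it back.
# io.StringIO stands in for a real file opened in text mode.
buf = io.StringIO()
json.dump(data, buf)
buf.seek(0)
from_file = json.load(buf)
```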

So what I've thought about doing is writing a Python JSON package that would do the following:
  1. Load a JSON handling package. It would have a list to select from, so that if simplejson was not already on your machine, it might check for cjson, demjson, etc. and grab that alternative instead.
  2. Have a JSONify function that would convert Python objects to JSON.
  3. Have a deJSONify function to convert JSON into Python objects.
This looks like a fun and handy little project.
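A minimal sketch of that package, under a few assumptions: simplejson and the standard json module share the dumps()/loads() API; the cjson and demjson entries assume their encode()/decode() functions, so check those against the versions you actually have. The standard json module comes last as a fallback that is always present on modern Pythons. JSONify and deJSONify are the names from the plan above.

```python
import importlib

# Candidate backends in order of preference: (module name, encode function, decode function).
# The cjson and demjson function names are assumptions; verify them against your installed versions.
_BACKENDS = [
    ("simplejson", "dumps", "loads"),
    ("cjson", "encode", "decode"),
    ("demjson", "encode", "decode"),
    ("json", "dumps", "loads"),  # standard library fallback
]


def _load_backend():
    """Return (name, encode, decode) for the first importable backend."""
    for module_name, encode_name, decode_name in _BACKENDS:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed; try the next candidate
        return module_name, getattr(module, encode_name), getattr(module, decode_name)
    raise ImportError("no JSON library found")


BACKEND, _encode, _decode = _load_backend()


def JSONify(obj):
    """Convert Python objects to a JSON string."""
    return _encode(obj)


def deJSONify(text):
    """Convert a JSON string to Python objects."""
    return _decode(text)


print(BACKEND)
print(deJSONify(JSONify({"a": [1, 2, None]})))  # {'a': [1, 2, None]}
```

Resolving the backend once at import time keeps JSONify and deJSONify as thin one-line wrappers, so the caller never needs to know which library is underneath.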