This module provides classes for parsing web syndication feeds in RSS and
Atom formats.
To parse RSS, use Syndication::RSS::Parser.
To parse Atom, use Syndication::Atom::Parser.
If you want my advice on which to generate, my order of preference would be:
- Atom 1.0
- RSS 1.0
- RSS 2.0
My reasoning is simply that I hate having to sniff for HTML (see Syndication::RSS).
Syndication is Copyright 2005-2006 mathew, and is licensed under the same terms as
Built and tested using Ruby 1.8.4. Needs only the standard library.
Ruby already has an RSS library as part of the standard library, so you
might be wondering why I decided to write another one.
I started out trying to document the standard rss module, but found the
code rather impenetrable. It was also difficult to see how it could be made
documentable via Rdoc.
Then I tried writing code to use the standard RSS library, and discovered
that it had a number of (what I consider to be) defects:
- It doesn’t support RSS 2.0 with extensions (such as iTunes podcast
feeds), and it wasn’t clear to me how to extend it to do so.
- It doesn’t support RSS 0.9.
- It doesn’t support Atom.
- The API is different depending on what kind of RSS feed you are parsing.
I asked around, and discovered that I wasn’t the only person
dissatisfied with the RSS library. Since fixing the problems would have
resulted in breaking existing code that used the RSS module, I opted for an
This is the result. The first release was version 0.4, which was actually
my fourth attempt at putting together a clean, simple, universal API for
RSS and Atom parsing. (The first three never saw public release.)
Here is what I see as the key improvements over the rss module in the Ruby
standard library:
- Supports all RSS versions, including RSS 0.9, as well as Atom.
- Provides a unified API/object model for accessing the decoded data, with no
need to know what format the feed is in.
- Allows use of extended RSS 2.0 feeds.
- Simple API, fully documented.
- Test suite with over 220 test assertions.
- Commented source code.
- Less source code than the standard library rss module.
- Faster than the standard library (at least, in my tests).
- Optional support for RSS 1.0 Dublin Core, Syndication and Content modules,
Apple iTunes Podcast elements, and Google Calendar.
- Content module decodes CDATA-escaped or encoded HTML content for you.
- Supports namespaces, and encoded XHTML/HTML in Atom feeds.
- Dates decoded to Ruby DateTime objects. Note, however, that this is slow,
so parsing is only performed if you ask for the value.
- Simple to extend to support your own RSS extensions; uses reflection.
- Uses REXML fast stream parsing API for speed, or built-in TagSoup parser
for invalid feeds.
- Non-validating, tries to be as forgiving as possible of structural errors.
- Remaps namespace prefixes to standard values if it recognizes the
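To give a feel for the REXML stream parsing API mentioned in the list above, here is a minimal sketch that collects item titles in a single pass. It is not taken from Syndication's source: the class name TitleCollector and the sample feed are made up for illustration, and only the REXML calls themselves are standard.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: gathers the text of every <title> inside an <item>.
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @in_item = false
    @buffer = nil
  end

  def tag_start(name, _attrs)
    @in_item = true if name == 'item'
    # Start buffering only for titles that belong to an item.
    @buffer = String.new if @in_item && name == 'title'
  end

  def text(data)
    @buffer << data if @buffer
  end

  # CDATA sections arrive as a separate event; treat them like text.
  def cdata(data)
    text(data)
  end

  def tag_end(name)
    if name == 'title' && @buffer
      @titles << @buffer.strip
      @buffer = nil
    end
    @in_item = false if name == 'item'
  end
end

# Invented sample feed, for illustration only.
SAMPLE_FEED = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0">
    <channel>
      <title>Example channel</title>
      <item><title>First post</title><link>http://example.com/1</link></item>
      <item><title>Second post</title><link>http://example.com/2</link></item>
    </channel>
  </rss>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(SAMPLE_FEED, collector)
puts collector.titles
```

The channel title is skipped because the listener only buffers text while it is inside an item. A stream parse like this never builds a document tree, which is where the speed advantage comes from.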
In the interests of balance, here are some key disadvantages compared with the
standard library RSS support:
- No support for generating RSS feeds, only for parsing them. If
you’re using Rails, you can use RXML; if not, you can use rss/maker.
My feeling is that XML generation isn’t a wheel that needs reinventing.
- Different API, not a drop-in replacement.
- Incomplete support for Atom 0.3 draft. (Anyone still using it?)
- No support for base64 data in Atom feeds (yet).
- No Japanese documentation.
- No XSL output options.
- Slower if there are dates in the feed and you ask for their values.
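The date trade-off above (store the raw string, parse only on request) can be sketched roughly as follows. This is an illustration of the idea rather than the library's actual code; the class LazyItem and its parse_count counter are hypothetical names.

```ruby
require 'date'

# Hypothetical item class: keeps the raw date string from the feed and
# converts it to a DateTime only the first time the value is requested.
class LazyItem
  attr_reader :parse_count

  def initialize(raw_pubdate)
    @raw_pubdate = raw_pubdate
    @pubdate = nil
    @parse_count = 0 # counts how often the slow path actually runs
  end

  # Returns a DateTime, or nil if the string is missing or unparseable.
  def pubdate
    return @pubdate if @pubdate
    return nil if @raw_pubdate.nil?
    @parse_count += 1
    @pubdate = DateTime.parse(@raw_pubdate)
  rescue ArgumentError
    nil
  end
end

item = LazyItem.new('Tue, 10 Jan 2006 15:30:00 GMT')
puts item.parse_count   # nothing has been parsed yet
puts item.pubdate.year  # first access triggers the parse
puts item.parse_count   # later accesses reuse the cached value
```

A feed reader that only lists titles and links never pays for date parsing, while one that sorts by date pays the cost once per item.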
There are, of course, other Ruby RSS/Atom libraries out there. The ones I
- Much smaller than syndication or rss.
- Completely non-validating.
- Backwards compatible with rss in standard library.
- Doesn’t use a real XML parser.
- No support for namespaces.
- Incomplete Atom support (e.g. can’t get name and e-mail of
elements as separate fields, you still have to decode XHTML
- No documentation.
For the record, I started work on my library long before simple-rss was
This one solves most of the same problems as Syndication; however the two were
developed in parallel, in ignorance of each other.
Feedtools builds in database caching and persistence, and HTTP fetching.
Personally, I don’t think those belong in a feed parsing
library—they are easily implemented using other standard libraries if
you want them.
- Lots of test cases.
- Used by lots of Rails people.
- Knows about many more namespaces.
- Can generate feeds.
- Skimpy documentation.
- Uses HTree then XPath parsing, rather than a single stream parse.
- Tries to unify RSS and Atom APIs, at the expense of Atom functionality.
(Which could also be a pro, depending on your viewpoint.)
Here’s my design philosophy for this module:
- The interface should be via standard Ruby objects and methods; e.g.
feed.channel.item.title, rather than (say) a dictionary hash.
- It should be easier to parse RSS via the module than to hack something
together using REXML, even if all you want is a list of titles and URLs.
- It should be easy to add support for new RSS extensions without needing to
know anything about reflection or other advanced topics. Just define a
mixin with a bunch of appropriately-named methods, and you’re done.
- The code should be simple to understand.
- Even so, good complete documentation is extremely important.
- Be lenient in what you accept.
- Be conservative in what you generate.
- Get well-formed feeds parsing reliably, then worry about broken feeds.
- Atom will hopefully be the future. Provide full support for RSS, but
don’t hold Atom back by trying to force it into an RSS data model.
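The "mixin with appropriately-named methods" idea from the list above can be illustrated with a small sketch. The naming rule used here (element name with the colon replaced by an underscore) and the names PodcastExtension, Item and store are assumptions made for the example, not Syndication's documented API.

```ruby
# Hypothetical extension: one accessor per supported element, named
# after the tag with ':' replaced by '_'.
module PodcastExtension
  attr_accessor :itunes_author, :itunes_subtitle
end

# Hypothetical container that routes element text to writer methods
# by reflection, skipping elements nobody has defined a method for.
class Item
  attr_accessor :title, :link

  # Returns true if the element was stored, false if it was skipped.
  def store(tag, text)
    setter = "#{tag.tr(':', '_')}="
    return false unless respond_to?(setter)
    public_send(setter, text)
    true
  end
end

# Adding support for the extension is a single include.
class Item
  include PodcastExtension
end

item = Item.new
item.store('title', 'Episode 1')
item.store('itunes:author', 'Example Author')
item.store('media:thumbnail', 'skipped') # no matching method defined
puts item.title
puts item.itunes_author
```

The author of an extension only writes ordinary accessors; the respond_to? check and the dynamic call stay inside the container, so no knowledge of reflection is needed to add new elements.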
Here are some possible improvements:
- RSS and Atom generation. Create objects, then call Syndication::FeedMaker
to generate XML in various flavors. This probably won’t happen until
an XML generator is picked for the Ruby standard library.
- Faster date parsing. It turns out that when I asked for parsed dates in my
test code, the profiler showed Date.parse chewing up 25% of the total CPU
time used. A parser specific to ISO 8601 dates could cut that down.
- Additional Google Data support. I just wanted to be able to display my
upcoming calendar dates, but clearly there is a lot more that could be
implemented. Unfortunately, recurring events don’t seem to have a
clean XML representation in Google’s data feeds yet.
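As a rough idea of what a narrower date parser might look like, here is a sketch that accepts only the timestamp layout Atom uses (for example 2006-01-10T15:30:00Z) and builds the DateTime directly. The method name parse_iso8601 is hypothetical, fractional seconds are matched but discarded, and whether this is actually faster than Date.parse would need to be measured.

```ruby
require 'date'

# Matches YYYY-MM-DDTHH:MM:SS with an optional fraction and a mandatory
# zone, which is either Z or a numeric offset such as +02:00.
ISO8601_RE = /\A(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(?:\.\d+)?([Zz]|[+-]\d{2}:\d{2})\z/

# Hypothetical helper: returns a DateTime, or nil if the string does not
# match the expected layout or names an impossible date.
def parse_iso8601(str)
  m = ISO8601_RE.match(str)
  return nil unless m
  offset = m[7].casecmp('z').zero? ? '+00:00' : m[7]
  DateTime.new(m[1].to_i, m[2].to_i, m[3].to_i,
               m[4].to_i, m[5].to_i, m[6].to_i, offset)
rescue ArgumentError
  nil
end

puts parse_iso8601('2006-01-10T15:30:00Z')
puts parse_iso8601('2006-01-10T15:30:00+02:00')
p parse_iso8601('Tue, 10 Jan 2006 15:30:00 GMT') # not ISO 8601
```

Dates in RSS 2.0 feeds use a different layout, so a parser like this would only cover Atom and Dublin Core style timestamps and would need a fallback for everything else.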
There are doubtless things I could have done better. Comments, suggestions,
etc. are welcome by e-mail.