Website syndication with RSS
By Dafydd Rees
What is RSS and how does it help?
Consuming RSS using AmphetaDesk
Registering new feeds with AmphetaDesk
Tips for publishers creating RSS feeds
It can be difficult staying current when you have to read many different websites and mailing lists. Most community-based websites use mailing lists, but thanks to the ever-rising tide of spam, people are becoming increasingly reluctant to disclose their e-mail addresses.
Although the information that you can glean from websites and mailing lists can be quite valuable, you probably don't have the time to check many websites regularly and manage large numbers of mailing list subscriptions. If this sounds familiar, content-aggregating software based on RSS technology may be just what you're looking for.
What is RSS and how does it help?
In simple terms, an RSS feed is just a file on a website that people can download to get a summary of all changes to published content and the corresponding links. RSS stands for Rich Site Summary or Really Simple Syndication. Rather than browse lots of sites every day, people use RSS aggregator software that regularly downloads these files in the background. The aggregator software then weaves all this content together - usually into a single, locally held web page. Readers can then get daily summaries of all the changes that occur on multiple sites from this page. This means that you can keep up with lots of websites without subscribing to vast numbers of mailing lists, and without divulging your e-mail address to many potentially dubious mailing lists.
Some readers prefer RSS feeds to mailing lists. Subscribing and unsubscribing are merely a matter of telling your aggregator which feeds you wish to read. This is also good for the web publisher because there's no need to maintain a web-enabled subscription list, nor to provide facilities for recovering forgotten subscription passwords. This could also mean less Data Protection Act compliance work.
With RSS, readers who aren't prepared to divulge their e-mail addresses can receive content that would otherwise require a mailing list subscription. The originating website plays a purely passive role by publishing the RSS. When a reader decides to unsubscribe, there's nothing the website publishers can do to stop it. To be successful, RSS publishers have to concentrate on providing compelling, relevant content rather than trying to use technology to force readers to accept content through push technology warfare.
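To make the idea concrete, here is a minimal Python sketch of what an aggregator does once the feed files have been downloaded: it reads each file and weaves the items into one list. The two sample feeds, the example.com addresses and the function names are invented for illustration, and this is not how AmphetaDesk itself is implemented.

```python
import xml.etree.ElementTree as ET

# Two tiny, hypothetical feeds in the simple RSS 0.91 layout.
SAMPLE_FEED_A = """<rss version="0.91">
  <channel>
    <title>Example Site A</title>
    <link>http://example.com/a/</link>
    <description>A hypothetical feed</description>
    <language>en</language>
    <item>
      <title>First article</title>
      <link>http://example.com/a/first</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/a/second</link>
    </item>
  </channel>
</rss>"""

SAMPLE_FEED_B = """<rss version="0.91">
  <channel>
    <title>Example Site B</title>
    <link>http://example.com/b/</link>
    <description>Another hypothetical feed</description>
    <language>en</language>
    <item>
      <title>Only article</title>
      <link>http://example.com/b/only</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dictionary per item found in a single feed document."""
    channel = ET.fromstring(xml_text).find("channel")
    channel_title = channel.findtext("title", default="")
    return [
        {
            "channel": channel_title,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]


def aggregate(feed_documents):
    """Combine the items of several feed documents into a single list."""
    items = []
    for xml_text in feed_documents:
        items.extend(parse_feed(xml_text))
    return items


ALL_ITEMS = aggregate([SAMPLE_FEED_A, SAMPLE_FEED_B])
for entry in ALL_ITEMS:
    print(entry["channel"], "|", entry["title"], "|", entry["link"])
```

A real aggregator would fetch each file over HTTP on a schedule and render the combined list as a web page, but the core step is the same: every feed is just a small XML file listing titles and links.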
Consuming RSS using AmphetaDesk
There are many RSS aggregator programs around. In this article, we'll discuss AmphetaDesk - an open source aggregator available for Linux, Macintosh and Windows. You can download AmphetaDesk from http://www.disobey.com/amphetadesk.
Windows users need to download a .zip file. At present, AmphetaDesk doesn't have a standard, Windows-based installation program, so you have to copy the folder out of the .zip file and move it to a convenient location, say inside your My Documents folder. If you're using Microsoft Windows XP, you can do this by double-clicking on the .zip file, which displays the contents in a new window. Copy the folder shown in this window into My Documents. Users of other versions of Windows will have to use an archiver that can unpack .zip files, such as WinZip.
When you run AmphetaDesk, it's going to try to update all the pre-set feeds registered. For this reason, you should run the program after connecting to the Internet. You run AmphetaDesk by double-clicking on the file with the pill icon in the folder you've copied into My Documents. When you do this, the aggregator starts up. The following window should appear:
Figure 1: The AmphetaDesk Application Window
Don't close this window, because that would shut down the program; instead, you should minimise it. You should see the pill icon in the system tray. After a minute or so, your web browser should pop up with a web page that looks like this:
Figure 2: The main web page
Figure 2 shows the main AmphetaDesk web page, generated and served locally from a running copy of AmphetaDesk. Individual news channels are shown as tables, in which each row is a separate news item within the feed. Clicking on the title of a news item follows a link to the complete article on the originating website.
Feeds are shown in descending order of freshness, so that new material is read first. This means that it's easy to skim-read large numbers of feeds, focusing only on what's changed and jumping directly to articles of interest.
By default, AmphetaDesk polls each RSS feed website every three hours for updates, although you can change this by using the page named My Settings. This is fine if you're prepared to leave AmphetaDesk running on a computer with an always-on broadband or corporate LAN connection. However, if you're using a dial-up link, you're probably going to rely on the fact that AmphetaDesk checks for updates every time it is launched, so you simply start the program after dialling and wait for the main AmphetaDesk web page to appear.
AmphetaDesk can also be forced to check for new feeds, to open a new browser window on the main page, and to shut down, using the menu displayed by right-clicking on the pill icon in the system tray, as shown in Figure 3.
Figure 3: Right-clicking on the pill icon in the system tray presents the three main options.
Registering new feeds with AmphetaDesk
By clicking on Add a Channel in the main window (Figure 2), you can manually enter the address of an RSS feed. Some websites provide an easier way of registering feeds with AmphetaDesk. If, for example, you go to the IT Wales RSS feed page at http://www.itwales.com/rss, you will see rows of buttons. Each row corresponds to an RSS feed. Each feed has three buttons, as shown in Figure 4 below. The button depicting the AmphetaDesk pill allows one-click registration with AmphetaDesk. The button marked XML is a direct link to the RSS file itself. The other button, depicting a mug, is for one-click registration of an RSS feed with the Radio Userland blogging system.
Figure 4: RSS feed registration buttons as they would appear on a website. The IT Wales subscription buttons are at http://www.itwales.com/rss.
Tips for publishers creating RSS feeds
Which RSS format?
There are now many different RSS file formats. Every format is a separate XML dialect, although the complexity of the formats varies considerably, from simple formats like RSS 0.91 to formats based on RDF which provide advanced metadata. Personally, I'd prefer to get started with one of the older and simpler formats like RSS 0.91, because there is more software support for it and because there's probably more business value in providing basic syndication data quickly to a large audience than in investing time and effort to provide very sophisticated features that only a few people will appreciate.
RSS is about summaries
Some feed publishers include multiple paragraphs, complex formatting and even entire articles within an RSS feed. Big, fat RSS feeds don't help the reader and can even be seen as arrogant or anti-social. Although your feed might be important to you, it is only one of many from the viewpoint of the reader. It's unlikely that most people need to publish more than a title, publication date and a link for each article. Publishing a minimal RSS feed means that it downloads quickly and works even with the most basic aggregation software.
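As a rough illustration of how small such a feed can be, the Python sketch below builds one with just a title, a link and a publication date per article. It is a sketch under stated assumptions: the site details, the article list and the function name are invented, and it uses the RSS 2.0 element names because that version defines a per-item pubDate element.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_minimal_feed(site_title, site_link, site_description, articles):
    """Return an RSS 2.0 document, as a string, with one small item per article."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description
    for article in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
        # format_datetime produces the RFC 822 style date that RSS 2.0 uses.
        ET.SubElement(item, "pubDate").text = format_datetime(article["published"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical articles; a real site would read these from its own database.
EXAMPLE_ARTICLES = [
    {
        "title": "Website syndication with RSS",
        "link": "http://example.com/articles/rss",
        "published": datetime(2003, 3, 3, 9, 0, tzinfo=timezone.utc),
    },
    {
        "title": "Another short headline",
        "link": "http://example.com/articles/another",
        "published": datetime(2003, 3, 4, 9, 0, tzinfo=timezone.utc),
    },
]

EXAMPLE_FEED = build_minimal_feed(
    "Example Site",
    "http://example.com/",
    "Recent articles from a hypothetical site",
    EXAMPLE_ARTICLES,
)
print(EXAMPLE_FEED)
```

Because ElementTree escapes characters such as & and < in the text it writes, titles taken straight from a database should still produce well-formed XML.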
Which URL?
Give some consideration to the URL where you choose to publish the files. Changing the URL at which an RSS feed has been published can be a disaster, because users will need to register the new URL in their aggregator. It's better to publish your feed at a short, simple and memorable URL, and to make a long-term commitment to support it at that address. Of course many web servers, especially Java 2 Enterprise Edition-compliant ones such as Jakarta Tomcat, support URL mapping, so that even if the programs and files behind the website change, the published URLs can remain unchanged.
Managing the bandwidth
Publishing RSS feeds could mean encouraging large numbers of people to download a file from your website. The HTTP protocol provides a means of asking a web server whether a file has changed since a particular date. Web servers supporting this only send the file if it has changed since the date supplied, allowing great savings on bandwidth. If you are concerned about bandwidth issues, you might want to read about some other people's experiences at http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers.
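Seen from the aggregator's side, this "has it changed since?" question is an HTTP conditional GET. The sketch below, using only Python's standard library, shows one way a client might ask it; the function names are invented for illustration. The If-Modified-Since header carries the date described above, while the optional If-None-Match (ETag) header is an addition not discussed in this article, which some servers support as a second way to detect changes.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET request that asks the server to skip unchanged content."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value received on the previous download.
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Optional extra validator; not every server provides one.
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, last_modified=None, etag=None):
    """Return (body, last_modified, etag); body is None if nothing changed."""
    request = build_conditional_request(url, last_modified, etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304 Not Modified: the server sent no body, so keep the old copy.
            return None, last_modified, etag
        raise
```

An aggregator would store the returned Last-Modified value alongside each feed and pass it back on the next poll, so an unchanged feed costs only a few hundred bytes of headers.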
Free publicity
There are websites where you can register your RSS feed. This allows third-party websites to provide search and syndication services on your content. For example, you can register your RSS feed at http://www.oreillynet.com/meerkat/ and http://www.syndic8.com.
Hit tracking
Links back to the BBC News website from its RSS feeds don't point directly to the articles. Links in its RSS are special URLs that include the type of RSS feed as well as the identity of the article to be retrieved. It's easy to imagine using HTTP redirects or internal server forwarding to return the same article for a different URL. Web publishers can use this trick to differentiate between ordinary hits and those originating in RSS feeds. The BBC news feeds can be found at http://www.bbc.co.uk/syndication.
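The Python sketch below shows one way this trick might work on the server side. Everything specific in it is an assumption made for illustration: the /go/rss/<feed>/<article> URL layout, the example.com addresses and the function names are hypothetical and are not the BBC's actual scheme.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical location of the real articles on the publisher's site.
ARTICLE_URL_TEMPLATE = "http://example.com/articles/{article_id}"

# Number of hits recorded per feed name.
FEED_HITS = Counter()


def resolve_tracked_link(tracked_url):
    """Split a tracking URL into (feed name, real article URL)."""
    parts = [part for part in urlsplit(tracked_url).path.split("/") if part]
    if len(parts) != 4 or parts[:2] != ["go", "rss"]:
        raise ValueError("not a tracked RSS link: " + tracked_url)
    feed_name, article_id = parts[2], parts[3]
    return feed_name, ARTICLE_URL_TEMPLATE.format(article_id=article_id)


def handle_tracked_hit(tracked_url):
    """Count the hit against its feed and return the URL to redirect to."""
    feed_name, article_url = resolve_tracked_link(tracked_url)
    FEED_HITS[feed_name] += 1
    return article_url


REDIRECT_TARGET = handle_tracked_hit("http://example.com/go/rss/front_page/12345")
print(REDIRECT_TARGET, dict(FEED_HITS))
```

In a real deployment the returned URL would be sent back as an HTTP redirect, or used for an internal forward, so the reader sees the ordinary article while the server log records that the visit came from a feed.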
What else can RSS do?
RSS provides a way of monitoring anything non-confidential that moves through a series of discrete changes over a period of hours, days or months, and that can be reached from an internet or intranet-connected machine.
It's easy to write programs that index data either in the file system or in a database and format it as RSS. RSS feeds don't have to be about website changes. I have a patch for CruiseControl, a piece of software that continually builds and tests software. By patching the web-based reporting, I was able to create a feed that highlights changes in a software repository. This feed can be used in an aggregator just like any other.
Many on-the-fly conversion programs exist that simply re-arrange content fetched live from third-party websites. Perhaps the best known of these is the Bill Gates Wealth Clock (http://philip.greenspun.com/WealthClock). As an experiment, I've written an on-the-fly converter that maps a discussion forum's recent changes page into RSS. Of course, this isn't as reliable as patching the discussion forum software to create the feed directly, but it does provide a workable stopgap measure until the original site can be upgraded, and it does demonstrate that you can manufacture RSS feeds even for websites that don't support them directly.
It isn't difficult to find new, useful applications for RSS.





