Table of contents
2 Technologies and Terminology
2.2 Really Simple Syndication (RSS)
2.4 Blogs, Blogging and Blogosphere
2.5 Microblogging and Twitter
3.1.1 Meta Content Framework (MCF) and Resource Description Framework (RDF)
3.1.2 Channel Definition Format (CDF)
4.1 Online or Offline
4.3 Competitive Analyses
4.3.1 Google Reader
5 Blueprint of a collaborative feed reader
5.1 Graphical User Interface (GUI)
5.1.1 Subscription Lists
5.1.4 Searching feeds
5.2.1 Average Rating
5.2.2 Subscription Count
5.2.4 Coverage Percentage
5.3 Collaborative Filtering
5.4 Supported Formats
5.5 XSL Transformation
5.7 Tag cloud
5.8 Export and Import
5.9 OPML Format
5.11 Podcasts / Video Podcasts
5.12 Search / Search Agent
List of figures
figure 1: rss graphics in accordance to www.rss-specifications.com
figure 2: Google Reader
figure 3: videocast and tag cloud
figure 4: Draft collaborative feed reader
figure 5: IBM CoffeeReader
figure 6: Item-to-Item Collaborative Filtering from Amazon
figure 7: XSLT process
figure 8: Exemplary XSLT process
figure 9: Tag cloud from Amazon.com
figure 10: unofficial podcast logo
List of abbreviations
[Not included in this excerpt]
Information management is becoming increasingly important in knowledge-intensive projects. Many information sources on the web provide feeds for easy access. While a variety of software tools exist for personal feed consumption, collaborative approaches are still rare.
This research project focuses on the theory of feeds and on their technical background, namely the Atom and RSS formats. Furthermore, it gives an overview of the historical development and intention of feeds and of how they are used today.
It also provides an overview of related research projects and existing tools. The paper concludes with the design of an ideal social feed reader built on common principles of social software, such as tagging, social networking, social recommendation and microblogging.
Access to the vast amount of information on the internet and in enterprise-specific intranets has become more crucial. The borders between specialised and common knowledge are blurred. We are overwhelmed by the mass of emails, new Wikipedia entries, project timetable changes and updates on web pages.1
Nevertheless, we have to keep track of the information flow in case something interesting comes along. Publications can be accessed via multiple devices such as PDAs, cell phones and notebooks, as well as MP3 and video players. Furthermore, software vendors are releasing new applications for mobile devices, news agencies are establishing new channels for accessing the newest content, and ordinary internet users are shifting from being content consumers to being content publishers. Participation is the big challenge of the internet as a platform, known as Web 2.0.2
This paper discusses the basic technologies, related research projects and existing tools. It finishes with a blueprint of a ‘nice-to-have’ feed reader that facilitates co-operation in working groups.
2 TECHNOLOGIES AND TERMINOLOGY
Feeds, also known as web feeds or news feeds, are publish-subscribe delivery vehicles that spread recently updated content to subscribers.
Technically, feeds are data in a specific format that is not primarily meant to be human-readable but can be interpreted and pre-processed by applications for the user to read. Feeds “[…] are gaining wide acceptance with applications spanning information delivery, sensor monitoring, auction systems, and air traffic control”.3
Little orange buttons labeled XML, feed, RSS, Atom and further variants indicate that syndication is available - see figure 1 for an excerpt. In contrast to email newsletters, syndication formats do not require any personal information from the user for subscribing or unsubscribing.
figure 1: RSS graphics according to www.rss-specifications.com
[Figure not included in this excerpt]
2.2 REALLY SIMPLE SYNDICATION (RSS)
“RSS is a web content syndication format”.4
This dialect of XML must conform to the XML 1.0 specification of the World Wide Web Consortium (W3C). The current version of RSS is 2.0, but RSS 1.0 can still be found as an alternative. The abbreviation RSS stands for Really Simple Syndication or, in older usage, Rich Site Summary or RDF Site Summary. Software known as ‘aggregators’ or ‘feed readers’ can be used to keep track of content published in the RSS format.
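To illustrate what a feed reader does with such a document, the following minimal sketch parses a small RSS 2.0 feed with Python's standard library. The sample feed, its URLs and the helper name `read_rss` are invented for illustration and are not taken from any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; titles and URLs are invented for illustration.
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>Recently updated content</description>
    <item>
      <title>First entry</title>
      <link>http://example.org/1</link>
      <description>Short summary of the first entry.</description>
    </item>
    <item>
      <title>Second entry</title>
      <link>http://example.org/2</link>
      <description>Short summary of the second entry.</description>
    </item>
  </channel>
</rss>"""


def read_rss(xml_text):
    """Return the channel title and a list of (title, link) pairs."""
    root = ET.fromstring(xml_text)
    # An RSS 2.0 document has <rss version="2.0"> as its top-level element.
    if root.tag != "rss" or root.get("version") != "2.0":
        raise ValueError("not an RSS 2.0 document")
    channel = root.find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


channel_title, entries = read_rss(RSS_SAMPLE)
print(channel_title)
for title, link in entries:
    print("-", title, link)
```

A real aggregator would additionally fetch the document over HTTP and remember which items it has already seen; both steps are left out here to keep the sketch short.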
“Atom is an XML-based document format that describes lists of related information known as ‘feeds’. Feeds are composed of a number of items, known as ‘entries’, each with an extensible set of attached metadata.”5
Atom differs from RSS in several crucial respects:6
- Atom comes with an XML schema.
- RSS elements can contain plain text or HTML, but the content type is not labeled. Atom elements, in contrast, explicitly declare whether they contain plain text, HTML or XHTML.
- You can use relative URIs in an Atom document.
- In RSS, the <description> element can contain either a summary or the whole entry, which is confusing because there is no labeling mechanism. Atom uses separate <summary> and <content> elements instead.
- Each entry in an Atom feed has a unique identifier.
Hence Atom competes with RSS to become the single standardized format for web syndication.
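The differences listed above can be made concrete with a short sketch that reads an Atom 1.0 entry using Python's standard library. It shows the separate <summary> and <content> elements, the explicit `type` label and the per-entry identifier. The sample feed and its identifiers are invented for illustration, and `read_atom` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 as defined in RFC 4287.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; identifiers and texts are invented.
ATOM_SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:uuid:60a76c80-d399-11d9-b93c-0003939e0af6</id>
  <updated>2005-08-01T12:00:00Z</updated>
  <entry>
    <title>First entry</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2005-08-01T12:00:00Z</updated>
    <summary type="text">A short teaser.</summary>
    <content type="html">&lt;p&gt;The full &lt;b&gt;entry&lt;/b&gt;.&lt;/p&gt;</content>
  </entry>
</feed>"""


def read_atom(xml_text):
    """Return one dict per entry with id, summary, content and content type."""
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.findall("atom:entry", ATOM_NS):
        content = entry.find("atom:content", ATOM_NS)
        result.append({
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "summary": entry.findtext("atom:summary", namespaces=ATOM_NS),
            "content": content.text if content is not None else None,
            # A missing type attribute means plain text in Atom.
            "content_type": content.get("type", "text") if content is not None else None,
        })
    return result


atom_entries = read_atom(ATOM_SAMPLE)
for e in atom_entries:
    print(e["id"], e["content_type"], e["summary"])
```

Because the content type is labeled, a reader can decide whether to render the content as markup or display it verbatim, which is exactly the decision an RSS <description> element leaves to guesswork.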
2.4 BLOGS, BLOGGING AND BLOGOSPHERE
“A blog is a web page that contains brief, discrete blocks of information called posts. These posts are arranged in reverse-chronological order […]. Each post is uniquely identified by an anchor tag, and it is marked with a permanent link that can be referred to by others […]”7
Software developers introduced new techniques such as permalinks and trackbacks to the Web 2.0 community, which led to prosperous growth. Publishers of weblogs are known as bloggers, and the whole environment of blogs and bloggers is called the blogosphere.8
2.5 MICROBLOGGING AND TWITTER
Microblogging is a relatively new and fast-growing technology for sharing thoughts, status and information. Mobile users can publish their information with the aid of internet-connected mobile devices such as cell phones, Personal Digital Assistants (PDAs) or even notebooks. A message is typically between 130 and 200 characters long.9
“Text messages are uploaded to a microblogging service such as Twitter10, Jaiku11 and others, and then distributed to group members.”12
The main differences from standard blogging are the frequency and length of the posts. Microblogging posts are restricted in length and published much more frequently than regular blog posts. A “normal” blogger will update his or her blog once every couple of days, whereas microbloggers send several updates to their microblog(s) every day.
The following section gives a short overview of the development history of the two content delivery formats, RSS and Atom, and of their divergence from one single standard. Most of this section is taken from Hammersley.13
3.1.1 META CONTENT FRAMEWORK (MCF) AND RESOURCE DESCRIPTION FRAMEWORK (RDF)
The first steps towards RSS were made by the software developer Ramanathan V. Guha with his invention of the MCF. This Meta Content Framework was driven by the need for a standardized framework able to describe objects with attributes and to link them together in relationships. At Netscape, Guha and Tim Bray transformed the MCF into an XML-based format, which grew into the RDF project.14 In its fullest form it is the basis of the Semantic Web, where “computers can understand the meaning of, and the relationships between, documents and other data.”15
3.1.2 CHANNEL DEFINITION FORMAT (CDF)
Microsoft followed the development of XML-based content description formats with its CDF. It was released in March 1997 and submitted to the W3C shortly afterwards.16 The language is able to characterize the content of a web page, as well as a “sites particular rating, scheduling, logos and metadata.”17
The first fully RDF-based format, RSS 0.9, was published in March 1999 by Dan Libby, with RSS as an acronym for RDF Site Summary.18
RSS 0.91 followed in July 1999 and set a trend for the future. It was not RDF-based and therefore not compatible with the first specification.
The release of version 1.0 on December 6, 2000 marked a return to an RDF-based data model of high complexity.
Two weeks later, Dave Winer from UserLand Software released version 0.92, an alternative to the RDF-based versions (0.9 and 1.0).
After the release of RSS 2.0 on September 16, 2002, UserLand declared the standard frozen, with no further development intended.
On July 15, 2003 UserLand transferred the copyright to Harvard, which published the specification under the Creative Commons Attribution/Share Alike licence. It is assured that version 0.9 and 0.91 files are also valid 2.0 files.19
To sum up, the standard has forked, but both branches are based on the W3C XML standard20 and are extensible with custom namespaces, modules or containers.
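The extensibility through namespaces can be sketched as follows: a reader separates the core RSS elements of an item from elements that belong to an extension namespace. The Dublin Core namespace used here is a widely used real module; the `team` namespace, its `rating` element and the helper `split_item` are purely hypothetical and serve only as an example of a self-defined extension.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
TEAM = "http://example.org/ns/team"       # hypothetical, self-defined namespace

# Hypothetical sample feed with two extension namespaces.
EXTENDED_RSS = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:team="http://example.org/ns/team">
  <channel>
    <title>Project news</title>
    <link>http://example.org/</link>
    <description>Updates</description>
    <item>
      <title>Timetable changed</title>
      <dc:creator>Alice</dc:creator>
      <team:rating>4</team:rating>
    </item>
  </channel>
</rss>"""


def split_item(item):
    """Separate core elements from namespaced extension elements."""
    core, extensions = {}, {}
    for child in item:
        # ElementTree writes namespaced tags as "{namespace}localname".
        if child.tag.startswith("{"):
            namespace, local = child.tag[1:].split("}", 1)
            extensions[(namespace, local)] = child.text
        else:
            core[child.tag] = child.text
    return core, extensions


item = ET.fromstring(EXTENDED_RSS).find("channel/item")
core, extensions = split_item(item)
print(core)
print(extensions)
```

A reader that does not know a given namespace can simply ignore the corresponding entries, which is what keeps extended feeds usable in older software.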
Atom addressed the shortcomings of RSS and spread quickly after the release of Atom 0.3. An early adopter was Google, which implemented it in services such as Gmail and Google News.
In August 2005 an independent group of webloggers proposed the Atom 1.0 format to the IETF. It was approved and published as RFC 4287 with the status of Proposed Standard.21
4.1 ONLINE OR OFFLINE
A decisive question is whether the approach should be available online, and therefore follow the idea of ‘cloud computing’, or whether a desktop-based solution would still be sufficient. Software vendors offer many different kinds of feed readers and feed aggregators that address different needs. Each solution has advantages and disadvantages, and of course the choice is also a question of personal preference.
Online applications take advantage of software that already exists on nearly every computer: the web browser. Features like history, bookmarks and search are already implemented. Modern web applications emulate the behaviour of desktop solutions and provide a familiar environment to the user, with the advantages of relatively easy maintenance and platform independence.
With an online application, feed reading is closely bound to a permanent internet connection. This is certainly a downside, but a minor issue in times of internet flat rates and mobile access.
The user might want to read the full article behind a feed excerpt. With a desktop reader, even if the feed is cached locally, he or she has to switch between the feed reader and the web browser to follow included links, comment on entries, or stream video and audio files.
In a web-based solution you just open another tab to retrieve the external data. To meet the collaborative requirement, online applications began early on to offer communities for sharing thoughts, connecting and networking.
On the other hand, desktop applications offer a responsive interface and fast data processing. Feeds can be stored locally and accessed everywhere, e.g. on a plane or train.
You do not have to develop a very complex online web structure with a user concept, enough hard disk space to store all personal information and guaranteed access to the feed lists.
Platform dependence burdens the project with a higher workload for development in the initial phase and for maintenance. To work around the platform dependence issue you can use Java, but such applications tend to have an unresponsive interface.
Platform dependence also entails the need for synchronization, for example of the current ‘read status’ or of comments.
The feed reader should be web-based because this offers several advantages over a purely desktop-based solution:
- It is platform independent and can be accessed with all types of web browsers.
- It provides central data storage for personalized views, user data, and additional metadata for the feeds and their entries.
- It enriches the platform with a collaborative background for user interaction and recommendation of information.
4.3 COMPETITIVE ANALYSES
As mentioned above, the ‘nice-to-have’ feed reader is web-based, and the following section performs a small market analysis of feed readers already in use. The focus lies on collaborative features and their arrangement in the web browser. The section does not provide a full overview of the online feed reader market; desktop-based solutions are likewise out of scope.
4.3.1 GOOGLE READER
Google Reader is the feed syndication approach from Google, one of the biggest suppliers of web-based software. It also offers some collaborative features such as flagging articles (comparable to favourites in a web browser), recommending articles and publishing comments - see figure 2.22 To keep an overview of all subscriptions, Google also created a trends submenu that shows how often the feeds were updated by the syndicator, how many posts you have read in the past, and other useful statistical evaluations. To round off the collaborative software, users can share their feeds and recommend them to others.
[Figure not included in this excerpt]
figure 2: Google Reader
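Trend statistics of the kind described for Google Reader, such as update frequency and the number of posts read, could be derived from very little data. The following is a minimal sketch under stated assumptions: each entry is reduced to a publication date and a read flag, and the function name `feed_trends`, the field names and the sample data are invented. It does not describe how Google Reader actually computes its statistics.

```python
from datetime import date


def feed_trends(entries):
    """Summarise a feed's activity.

    entries: list of (published, was_read) tuples, where published is a
    datetime.date and was_read is a bool.
    """
    if not entries:
        return {"posts_per_day": 0.0, "read_count": 0, "read_percentage": 0.0}
    days = [published for published, _ in entries]
    # Observation period in whole days, counting both the first and last day.
    period = (max(days) - min(days)).days + 1
    read_count = sum(1 for _, was_read in entries if was_read)
    return {
        "posts_per_day": len(entries) / period,
        "read_count": read_count,
        "read_percentage": 100.0 * read_count / len(entries),
    }


# Hypothetical reading history of one subscriber for one feed.
sample_history = [
    (date(2009, 3, 1), True),
    (date(2009, 3, 1), False),
    (date(2009, 3, 2), True),
    (date(2009, 3, 4), True),
]
trends = feed_trends(sample_history)
print(trends)
```

In a collaborative reader the same figures could be aggregated over all members of a working group instead of a single user.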
NewsGator is a fast-growing software producer providing solutions for information management, desktop-based software and web-based clients to improve productivity.23
1 Houghton-Jan, “Being Wired or Being Tired - 10 Ways to Cope with Information Overload.”
2 Alby, "Web 2.0. Konzepte, Anwendungen, Technologien", chap. 6.
3 Liu, Ramasubramanian, and Sirer, “Client Behavior and Feed Characteristics of RSS, a Publish-Subscribe System for Web Micronews.”
4 “RSS 2.0 Specification (RSS 2.0 at Harvard Law).”
5 IETF, “The Atom Syndication Format.”
6 cf. Alby, "Web 2.0. Konzepte, Anwendungen, Technologien", 154
7 Doctorow, Powers, and Trott, Essential Blogging.
8 cf. Alby, "Web 2.0. Konzepte, Anwendungen, Technologien"
10 cf. “Twitter: What are you doing?.”
11 cf. “Jaiku | Your Conversation.”
12 “What is Microblogging?.”
13 Hammersley, "Developing Feeds with RSS and Atom", p. 2-10.
14 cf. “Resource Description Framework (RDF) / W3C Semantic Web Activity.”
15 Hammersley, "Developing Feeds with RSS and Atom", p. 3.
16 cf. “Channel Definition Format Submission 970309.”
17 Hammersley, "Developing Feeds with RSS and Atom", p. 3.
18 cf. “Resource Description Framework (RDF) / W3C Semantic Web Activity.”
19 cf. “RSS 2.0 Specification (RSS 2.0 at Harvard Law).”
20 cf. “Extensible Markup Language (XML) 1.0 (Fifth Edition).”
21 IETF, “The Atom Syndication Format.”
22 “Google Reader.”
23 “NewsGator - Enterprise Social Computing via Social Sites on SharePoint, RSS and Widgets.”