Since late 2009 there has been a trend towards unprecedented openness in providing governmental data to the public. Valuable data sources are being opened not only to selected scientists but to everybody with an internet connection. This paper gives an overview of Open Linked Data and Government Data Sets and of how they are made available in different parts of the world. Starting with the U.K., countries in northern Europe, the U.S.A. and Australia have already opened up their databases to the public. The differences and similarities between Open Data, Government Data and Linked Open Government Data are outlined. Furthermore, the different user groups and their needs are considered.
Categories and Subject Descriptors
H.4 [ Information Systems ]: public information systems
Management, Documentation, Design, Economics, Standardization
Linked Data, Open Data, Linked Open Government Data, Semantic Web, data.gov, data.gov.uk
Copyright is held by the author
2011 January, Koblenz, Germany
1. Introduction
During the last few years, the vast amounts of data held by governments around the world have increasingly become a matter of public interest. These data sets, collected by governmental organizations, contain valuable data for the economy, education and science.
The web has become the major source of information, so it was only logical that at some point more people would want access to parts of the governmental data. Alongside the worldwide open source movement, Open Data finally appeared.
Several years ago Project Gutenberg appeared and made great literature of the past available. The first documents contained no links at all, and for the most part none were needed. The project was aimed at users who wanted to read old books that are no longer protected by copyright. There are in fact several such isolated projects.
Now that governments all over the world feel increasing pressure for more transparency, they have started opening their data to the web, too. One of the best-known initiatives is probably the “Open Government Directive” by the U.S. President in December 2009. The result of this directive was widely noticed, and the amount of data published on the U.S. portal data.gov has grown rapidly, as Castro summarized just four months after the directive.
On the other side of the Atlantic, similar efforts have been started in Great Britain. The corresponding portal data.gov.uk provides a large amount of data to the public. In the EU, the long-established data source Eurostat is becoming increasingly well known and in demand. Since 2003 the EU PSI Directive has demanded the opening of governmental data. This directive must be implemented in the national laws of the European Union member states and is the driving force behind the efforts across Europe.
Furthermore, data sources are increasingly connected by simple links and new web technologies. The growing possibilities and the need to provide not only data but also context are driving the emergence of semantic web technology. The so-called Web 3.0 is developing with new technologies and standards. Uses and purposes are diversifying, becoming more web based and attracting different groups of users. What was open only to some scientists in the past is becoming accessible and interesting to the broad and heterogeneous web community. The target groups addressed are changing and growing, which leads to different considerations for how existing and new data are presented and connected.
2. Overview on the Terms
As simple as the terms sound, there are still slightly different views on what they mean. Some terms emerge on the internet and develop more or less naturally, which means they are not standardized by any organization. Other terms are developed and explained by organizations such as the W3C.
Some of the definitions are stated explicitly in reliable sources; others are formed implicitly by combining well-defined terms and by the way the combination is used.
2.1 Open Data
When is Open Data really “open”? Open Data is data made available for use by others. It is shared in order to be used and republished. Among the different definitions, the phrase “widest possible use of data” turns up repeatedly. The use of Open Data is generally not restricted. It can be used by individuals and organizations, and commercial use is not explicitly excluded. Open Data can also be downloaded, copied and distributed by everybody, and the results of using this data can be published as well.
In some cases a non-commercial clause is still in place, but according to most organizational definitions on the web, such data is not really open data.
A clear definition of open data is the following: “A piece of knowledge is open if you are free to use, reuse, and redistribute it.”
The Creative Commons license model also defines several grades of openness, ranging from licenses that only allow unchanged distribution to licenses that allow almost anything as long as the original creator is credited. These licenses can include or exclude commercial use, resulting in six versions.
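The six license versions can be written down as a small table. The following Python sketch is only an illustration: the class and function names are hypothetical, and the flags reflect the standard Creative Commons license elements (attribution is required by all six). It filters out the licenses with a non-commercial clause, which, as noted above, most definitions do not regard as fully open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """One Creative Commons license, described by the freedoms it grants."""
    code: str
    commercial_use: bool   # False if the license carries an NC clause
    derivatives: bool      # False if the license carries an ND clause
    share_alike: bool      # True if adaptations must keep the same license


# The six standard Creative Commons licenses (all require attribution).
CC_LICENSES = [
    License("CC BY", commercial_use=True, derivatives=True, share_alike=False),
    License("CC BY-SA", commercial_use=True, derivatives=True, share_alike=True),
    License("CC BY-ND", commercial_use=True, derivatives=False, share_alike=False),
    License("CC BY-NC", commercial_use=False, derivatives=True, share_alike=False),
    License("CC BY-NC-SA", commercial_use=False, derivatives=True, share_alike=True),
    License("CC BY-NC-ND", commercial_use=False, derivatives=False, share_alike=False),
]


def allows_commercial_use(licenses):
    """Return the codes of all licenses without a non-commercial clause."""
    return [lic.code for lic in licenses if lic.commercial_use]


if __name__ == "__main__":
    for code in allows_commercial_use(CC_LICENSES):
        print(code)
```

The filter only checks the commercial-use flag because that is the criterion discussed in this section; stricter definitions of openness could additionally exclude the no-derivatives licenses.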
Open Data does not necessarily mean that the data is linked in one way or another. Nevertheless, Open Data means that it can be linked.
2.2 Linked Data
The term Linked Data alone does not mean that the data is open; first of all it means that the data is linked to other data of the same or another kind. Just as there can be Open Data that is not linked, there can be Linked Data that is not fully open.
Some data might have no meaning on its own but become useful as soon as it is linked to other data. Even meaningful data can be enriched and become more meaningful once it is linked to other data. Other data might be misleading without a connection to the right context.
2.3 Open Linked Data
Open Linked Data is simply Open Data that has been connected by links that add value to this data.
A well-known example of Open Linked Data is Wikipedia. Since building such a framework of Open Linked Data is a task that is never complete, it can only be achieved with the participation of the enormous workforce available within the community, as in the Open Data movement. Open Linked Data does not necessarily follow standards, but following them is desirable because it makes usage easier.
Open Linked Data is one of the major stepping stones on the path to the more standardized, planned and modeled semantic web. Some data sources would naturally be desirable in such an open, linked form but have traditionally not been available that way. Government Data is one of these large sources, and its value is increasing with technical development and the ability to process large amounts of data.
2.4 Government Data
Government Data is not born as Open Data. Government Data can and always will exist in non-open forms. Some Government Data must even remain confidential for security reasons, but other data can be published for the sake of transparency or for its value to science and the economy.
There is more and more Open Government Data these days, but nothing ensures that this data is linked to other data. Some Government Data may be useless on its own even if it is openly available. The data may lack the context that linkage provides, or it may not be usable by non-governmental users. Government data has been on the web for years, e.g. on Eurostat. Most of the available data was simply published without much context. The target group was small, and users consisted mainly of scientists, students, politicians and some, mostly large, companies.
Nowadays this is changing. Governments are trying to show more transparency, and publishing data sets is a rather simple way to demonstrate these efforts.
2.5 Open Government Data Sets
Merely publishing Government Data is no longer of much interest. With the development of new technologies, this data needs to be combined, and complete data sets are desired instead of just slices of the existing data. The data sets should not cover only parts but give a broader view of the issues. One of the best-known examples is the gross domestic product, which is not very interesting or scientifically useful on its own, so it is usually published in relation to previous years or other countries.
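The point about the gross domestic product can be made concrete with a small calculation. The following Python sketch relates each year's value to the previous year; the function name and the index figures are hypothetical and serve only as an illustration, not as real statistics.

```python
def year_over_year_growth(series):
    """Return the percentage change of each year relative to the previous one.

    `series` maps a year to a value (for example a GDP index). The first
    year has no predecessor and therefore no growth figure.
    """
    years = sorted(series)
    growth = {}
    for previous, current in zip(years, years[1:]):
        change = series[current] - series[previous]
        growth[current] = change / series[previous] * 100.0
    return growth


if __name__ == "__main__":
    # Hypothetical GDP index values, NOT real data.
    gdp_index = {2008: 100.0, 2009: 95.0, 2010: 99.75}
    for year, percent in year_over_year_growth(gdp_index).items():
        print(year, round(percent, 2))
```

A single value such as 99.75 says little by itself; only the comparison with the preceding years shows a decline followed by a recovery.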
Data sets from all fields of public service can be interesting to scientists. A more challenging task is to publish the data in a way that targets all possible users in the web community. Since such an aim is hard to reach, context and usability can be created through the participation of the online community itself. Semantic web technology provided the framework for the development towards Linked Open Government Data.
2.6 Linked Open Government Data
The term Linked Open Government Data (LOGD) has become established for Open Government Data Sets that are embedded in portals which link this data according to the standards of the Semantic Web. LOGD includes some kind of infrastructure that supports the Open Government Data and makes it more valuable and usable.
To return to the example of the gross domestic product, it is possible to add information about price levels and relative purchasing power. The more data we link to it, the more conclusions we can draw.
Once these data sets are properly linked, very user-friendly applications can return data such as consumption figures for a certain area after the user simply enters a postcode and chooses from the available data sets.
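The postcode lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the data set names, postcodes and figures are hypothetical, and a real LOGD portal would link data through Semantic Web standards rather than through in-memory dictionaries.

```python
def link_by_key(*datasets):
    """Combine several data sets that share a common key (here: a postcode).

    Each data set is passed as a (name, records) pair, where `records`
    maps the shared key to a value. The result maps each key to a
    dictionary holding the values from all data sets that contain it.
    """
    linked = {}
    for name, records in datasets:
        for key, value in records.items():
            linked.setdefault(key, {})[name] = value
    return linked


def lookup(linked, postcode, wanted):
    """Return the chosen data sets for one postcode, skipping missing ones."""
    record = linked.get(postcode, {})
    return {name: record[name] for name in wanted if name in record}


if __name__ == "__main__":
    # Hypothetical figures, NOT real data.
    consumption = {"56068": 3100, "56070": 2950}
    population = {"56068": 12000, "56072": 8000}
    linked = link_by_key(
        ("electricity_kwh_per_household", consumption),
        ("population", population),
    )
    print(lookup(linked, "56068", ["electricity_kwh_per_household", "population"]))
```

The shared key is what makes the combination possible: each data set alone answers one question, while the linked record for a postcode answers several at once.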
3. Data Quality's path to Semantic Web
Open Data can be available in different forms and qualities. The data can simply be offered for download, or it can be embedded in some way that enriches the data or its possible uses.
3.1 Raw data
Raw Data can come in diverse formats without any connection to other information. It can simply be visible on or downloadable from a web page. Providing such raw data is not a new phenomenon. Raw data was provided to non-governmental institutions through the internet long before it became the connected web of today. In the early days of the internet such data was simply made available for education, research and similar purposes. What is new about the raw data available for download these days is that it is open to everybody and easily discoverable by a much larger audience.
Raw data may come in classic formats such as TXT or CSV, in the somewhat more modern XML, or in Excel and other Microsoft Office formats, which are also supported by open source products. Such Raw Data can be used in various local applications or embedded into web applications by the user. Most Raw Data is provided as tables or spreadsheets, sometimes with an explanation that helps to put it into context.
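How such tabular raw data might be used in a local application can be shown with a short Python sketch based on the standard `csv` module. The column names and figures are hypothetical; an actual download from a government portal would have its own layout and would usually be read from a file rather than from a string.

```python
import csv
import io

# A hypothetical raw data table as it might be offered for download.
RAW_CSV = """region,year,population
North,2009,1200
South,2009,850
North,2010,1230
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["year"] = int(row["year"])
        row["population"] = int(row["population"])
        rows.append(row)
    return rows


def total_by_year(rows):
    """Sum the population column per year."""
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0) + row["population"]
    return totals


if __name__ == "__main__":
    print(total_by_year(load_rows(RAW_CSV)))
```

The table itself carries no links and no context beyond its column headers, which is exactly what distinguishes raw data from the linked forms discussed in this paper.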
The Raw Data might be linkable in different ways. It might only be possible to place links elsewhere that lead to the download. In that case the data remains a kind of dead end for the traffic led there, leaving people no choice but to return to the origin of the link for more related information.
It may also be possible to attach links to the data itself, turning it into linked data. If this option exists, the data is meant to become linked data, and the task of linking it is simply left to the open source and web community in order to save public funds. Data prepared in this way will become linked data within a very short time after the portal goes live.
3.2 Linked Data
Linked Data means, first of all, that the data is connected to other related data. There is, however, a difference between simple links and links that carry additional related information. Linking can be done in different ways, which makes the results difficult to compare. Since this has always been a problem on the web, there have been efforts to standardize linking.
Tim Berners-Lee has published definitions of linked data on W3C.ORG as “rules” and “steps”.
- Institution / College: University of Koblenz-Landau – Institut für Wirtschafts- und Verwaltungsinformatik
- Tags: Open Data, Open Linked Data, Government Data, LOGD, Open Data movement, semantic web, web 3.0, transparency, public data, meta-data, public information systems, RDF