# Open Data @ BFH CAS DA - Part I

Study notes [and slides](./bfh-opendata-slides.html) for the Open Data lecture by Oleg Lavrovsky at the Berne University of Applied Sciences [CAS in Data Analysis](http://www.ti.bfh.ch/de/weiterbildung/weiterbildungsangebote/cas/datenanalyse/tabs/uebersicht.html). The course of study is designed for professionals interested in data projects who are building experience in the analysis of data using desktop tools.

The intent of this lecture is to present a practitioner perspective as well as some introductory background on open data, the open data movement, and several real-world projects - with details of the data involved, legal conditions and technical challenges.

> *"An overabundance of information is no minor problem. Large amounts of raw data constitute a political fact. Growing volumes of data lead to a centralization of control. In communication, by contrast, the amount of information is reduced through people's interaction and their interpretations."* -- Richard Sennett, [Wikipedia - Informationsüberflutung](https://de.wikipedia.org/wiki/Informations%C3%BCberflutung)

## 1. Attention

... is in short supply in our information-overloaded society. *Data* is effectively put to *use* when there is the possibility of change in the *information*. Cycles of transforming data into useful information lead us to *knowledge*. Question: how accessible and trustworthy are the filters to our knowledge?

The open data [movement](https://en.wikipedia.org/wiki/Open_data) is concerned with sustainable and more universal access to data, leading to growth in each of these domains. We may even posit that the value of *data analysis* grows in correlation with the number of degrees of *openness* (i.e. openness to fellow experts, to colleagues, the wider organization, fellow citizens, entrusted algorithms) that are enabled by the transformation of data to knowledge.
Put another way: we are interested in this virtuous cycle common to information systems:

```
Attention -> [ Data -> Information ] -> Knowledge -> Attention
```

The cycle above is dramatically boosted when data can flow directly to the end-user, in machine- and human-usable ways, creating feedback loops of information and knowledge. It still requires people to discover and pay attention to your message, and then creates new opportunities for shared knowledge with constituents, customers, etc.

```
Attention -> Open Data -> New Information -> Shared Knowledge
```

While a similar problem is being addressed in more technical ways in various domains of information security such as [computational trust](https://en.wikipedia.org/wiki/Computational_trust), most of the open data movement is focused on the rewiring of interpersonal and organisational borders through data sharing. A leading light in this area is the [Open Knowledge](https://okfn.org) network, represented by the association [Opendata.ch](https://opendata.ch) in Switzerland.

> *"Where there is perfect certainty, there is no information: There is nothing to be said."* -- Jimmy Soni & Rob Goodman [on Claude Shannon](http://nautil.us/issue/51/limits/how-information-got-re_invented)

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/seeing-theory.png)

> *"At the core of Bayesian statistics is the idea that prior beliefs should be updated as new data is acquired."* -- Above image and quote from [Seeing Theory](http://www.seeingtheory.io), Daniel Kunin et al., Brown University

## 2. Definitions

The concept of **Open Data** can be defined for our purposes as follows:

"A piece of content or data is open if anyone is free to use, reuse, and redistribute it - subject only, at most, to the requirement to attribute and share-alike." -- [The Open Definition](http://opendefinition.org/)

Open Data is legally supported by **licenses**, such as ...
- [ODC-BY](https://opendatacommons.org/licenses/by/summary/)
- [ODC-PDDL](https://opendatacommons.org/licenses/pddl/summary/)
- [Creative Commons](https://creativecommons.org)

Open Data is practically motivated through various **guidelines** ...

- [5 stars](http://5stardata.info/en/) of Linked Open Data
- Open Data Institute [badges](https://certificates.theodi.org/en/about/badgelevels)
- Goals of the [Opendata.ch](https://opendata.ch/organisation/manifest/) community
- Terms of [Opendata.swiss](https://opendata.swiss/en/terms-of-use/), more on this below

## 3. Methods

Open Data is also about more open ...

- **Formats**, e.g. [CSV](http://dataprotocols.org/), [GeoJSON](http://geojson.org/)
- **Standards**, e.g. [Data Packages](http://frictionlessdata.io/data-packages/)
- **Aggregators** of data, e.g. [Morph.io](http://morph.io), [Common Crawl](http://commoncrawl.org/)
- **Catalogues** of metadata, e.g. [CKAN](http://ckan.org), [DCAT-AP](https://handbook.opendata.swiss/de/library/ch-dcat-ap.html)
- **Tools** for wrangling, e.g. [GoodTables](http://goodtables.io)

While there is a lot to say about the current technologies of big relational databases, web services and data harvesters, the future of Open Data is Linked ...

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/tblfivestars.jpg)

-- Five stars of Linked Open Data, via [Cafepress](https://www.cafepress.com/mf/62597433/5-star-linked-open-data_mugs?productId=597992118)

Linked Open Data sounds complicated, but is one of the key mechanisms enabling relevant, detailed information searches - such as what we have gotten used to seeing daily in Google results. See [Moz blog](https://moz.com/blog/what-is-semantic-search) for a colorful explanation. It is also really impressive to think about the Web of data represented by the [Linked Open Data Cloud](https://lod-cloud.net/).

Publishing data as 5-star [Linked RDF](https://de.wikipedia.org/wiki/Linked_Open_Data) is not especially hard.
It requires some new tools, built around the computational approaches of graph databases and supporting the RDF data model and the SPARQL query language - plus an awareness of the idea of the [semantic web](https://de.wikipedia.org/wiki/Semantic_Web) and ontologies, as we briefly covered in class. At a basic level it can be used in any web page with markup such as [microformats](http://microformats.org/code/hcard/creator) and [schema.org](http://schema.org/), which can be mapped to RDF.

A good introduction in German can be found in [Linked (Open) Data - Von der Theorie zur Praxis](http://linkeddata.fh-htwchur.ch/) (HTW Chur), and in English from [Cambridge Semantics](http://www.cambridgesemantics.com/semantic-university/getting-started-semantics). For a more hands-on experience, a great way to get started is the [Wikidata Query engine](https://query.wikidata.org/).

We looked in some detail at the Linked Open Data projects in Switzerland, and watched a video by the [Linked Data Service – LINDAS](https://lindas-data.ch/) of the Swiss National Archives.
I also shared a recent community project, an [Advent Calendar](https://xn--op-yka.ch/) (opü.ch) of Linked Open Data queries, which highlights the diversity and practical range of interesting things to discover on the Semantic Web.

## 4. In Switzerland

Data **portals** build upon the experience of numerous community projects and prior efforts to organize online information that is relevant to a diverse user base. Their main function is to make important **metadata** - such as time of update, terms of use, ownership, typology, schema - available in one place, searchable and cross-referenceable.

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/opendata-swissmetnet.png)

-- Screenshot of [opendata.swiss](https://opendata.swiss)

They also host an important dialogue, serving to illustrate the challenges of publishing and reusing complex data, such as geographic data ("geodata").

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/geoadmin-broadband.jpg)

-- Screenshot of [map.geo.admin.ch](http://map.geo.admin.ch)

Starting with government departments that were early adopters of online publishing, like [BFS](http://www.bfs.admin.ch/bfs/portal/de/index/themen/) and [Swisstopo](https://www.swisstopo.admin.ch/), the central Swiss Open Government Data portal, [opendata.swiss](https://opendata.swiss), harvests datasets from numerous [public organizations](https://opendata.swiss/en/organization) into one place and supports efforts in [data publication](http://handbook.opendata.swiss).

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/opendata-swiss-steuern.png)

-- Screenshot of [opendata.swiss](https://opendata.swiss)

Portals help users to understand and adopt the **terms of use**, both to be able to negotiate the various limitations and responsibilities placed on data reuse, and to consider the possible conditions under which future datasets are accessible.

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/opendata-swiss-terms.png)

-- Screenshot of [opendata.swiss](https://opendata.swiss/en/terms-of-use/)

Note that these Terms (*Nutzungsbedingungen*), while similar in form to the [Creative Commons](http://creativecommons.org) levels, are not the same as **licenses**. Licenses are often applied to open data internationally, providing firmer legal grounding for further use and support. See [Open Licenses Service](http://licenses.opendefinition.org/) for examples.

In Switzerland, data authorship, protection, and the general rights of data producers and users are currently undergoing intense development, and are the subject of legal scrutiny and debate. Stay tuned!

We spent time in class going through a number of datasets from portals, loading them in web and desktop software, and talking about the implications of the licensing constraints and file formats.

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/csvimport.png)

*Screenshot of CSV importer in [LibreOffice Calc](https://de.libreoffice.org/discover/calc/)*

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/qgis.jpg)

*Screenshot of various open datasets we loaded into [QGIS layers](https://www.qgis.org/de/site/)*

## 5. Community building

> *The topic of open data moves a great variety of actors in government agencies, media, companies, and the growing Swiss community of individual developers, designers and activists. The momentum is there, the political will is emerging, the exchange is taking place.* -- [make.opendata.ch](http://make.opendata.ch)

The activity described above is partly an outcome of large international movements affecting all fields of business, academia and the civic sphere, and partly the hard work of local changemakers. The result is fertile ground for an 'ecosystem' of open data providers and builders.
Here are some examples of Swiss open data community projects:

[Open Budgets](http://make.opendata.ch/wiki/project:open_budget)

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/openbudgets.png)

[Transport Open Data](http://transport.opendata.ch)

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/opentransport.png)

[Food Data Packages](http://food.schoolofdata.ch/)

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/datacentral.png)

... and there are [many more](https://forum.schoolofdata.ch/t/showcases-of-open-data-apps/253/1) to discover and contribute to. In class, we looked at several of these open data showcases, and talked about how social impact and business value continue to be generated in this way.

Such communities endeavour to make data - already open data but in principle any data - even more usable and accessible to a wider public. One important vehicle is the **Hackathon**, a public event where data owners and users meet to brainstorm and prototype possible new uses for data.

> Video: [What is the value of open data?](https://vimeo.com/177211784#t=64s) - Interview with Oleg Lavrovsky in English by infoclio.ch

At such hackdays or hackathons we focus on the "Data" and the "Use" in the equation above, trying to solve the chicken-and-egg problem of having no reason to make data available that nobody knows anything about. Visit [hack.opendata.ch](http://hack.opendata.ch) to learn about past and upcoming events.

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/opentourism.png)

In each case, understanding and using such projects - as well as creating new ones - requires special competencies. A fundamental one is the ability to think critically with abstract, factual knowledge. **Data literacy** means being an active user of data, being aware of possible "bugs" in the facts and opinions of others - ultimately the ability to base one's own decisions on verifiable evidence.

![](https://opendata.utou.ch/presentations/bfh%202019.1/img/ogdhandbook.png)

There are several projects in Switzerland to improve educational material and create shared resources for data literacy, such as:

- The OGD Handbook at [handbook.opendata.swiss](http://handbook.opendata.swiss) (a working group of Opendata.ch) provides guidance for government and people who work with the public sector.
- [OpenSchoolMaps](https://OpenSchoolMaps.ch/) (community-based) and [sCHoolmaps.ch](https://www.schoolmaps.ch/) (government-run) are two of the many useful educational resources for working with (open) (geo)data.
- [SchoolofData.ch](http://schoolofdata.ch), part of a civic society initiative involved in [research programs](https://schoolofdata.org/2016/01/08/research-results-part-1-defining-data-literacy/) with a grassroots international organization.

![](https://github.com/school-of-data/r-consortium-proposal/raw/master/R%20course%20survey%20responses/image_6.png)

-- From [R survey responses](https://github.com/school-of-data/r-consortium-proposal/blob/master/R%20course%20survey%20responses/R%20course%20survey%20responses.md), School of Data on GitHub

## 6. Hands-on

In this part of the introductory lecture, we discussed how data is published and how its accessibility can be improved in several scenarios. We learned about some of the boundaries between private, institutional and public data, looked at the mechanisms with which it is published, and the forms in which it leads to effective collaboration.

Data Packages are an integral part of the design and complementary to [open data portals](https://ckan.org/2014/06/09/the-open-knowledge-data-packager/), in that they foster exchange of metadata within a wider community, encourage simple standards of universal access, and provide a mechanism for data validation, stricter attribution and better referencing of terms of use.
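
The validation and attribution roles of a Data Package can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the official tooling: the descriptor follows the general shape of a `datapackage.json` file (`name`, `title`, `licenses`, `resources` with a `schema`), while the dataset name, the CSV content, and the helper functions `validate_rows` and `attribution` are made up for this example. Only three field types are handled.

```python
import csv
import io

# Hypothetical descriptor in the general shape of a datapackage.json file.
DESCRIPTOR = {
    "name": "example-stations",
    "title": "Example measurement stations",
    "licenses": [{"name": "ODC-BY-1.0"}],  # illustrative license entry
    "resources": [
        {
            "name": "stations",
            "path": "stations.csv",
            "schema": {
                "fields": [
                    {"name": "station", "type": "string"},
                    {"name": "altitude_m", "type": "integer"},
                    {"name": "temperature_c", "type": "number"},
                ]
            },
        }
    ],
}

# Made-up CSV content standing in for stations.csv; the last row has a bad value.
SAMPLE_CSV = """station,altitude_m,temperature_c
Bern,553,4.2
Chur,556,3.9
Davos,high,-1.5
"""

# Map the (small subset of) schema types used here to Python conversions.
CASTS = {"string": str, "integer": int, "number": float}


def validate_rows(csv_text, schema):
    """Return (line_number, field_name, value) for every value that fails its declared type."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for field in schema["fields"]:
            value = row.get(field["name"])
            try:
                if value is None:
                    raise ValueError("missing column")
                CASTS[field["type"]](value)
            except ValueError:
                errors.append((line_number, field["name"], value))
    return errors


def attribution(descriptor):
    """Build a one-line credit from the descriptor's title and license names."""
    license_names = ", ".join(lic["name"] for lic in descriptor.get("licenses", []))
    return f"{descriptor['title']} ({license_names})"


schema = DESCRIPTOR["resources"][0]["schema"]
errors = validate_rows(SAMPLE_CSV, schema)
print(attribution(DESCRIPTOR))
for line_number, field_name, value in errors:
    print(f"line {line_number}: {field_name!r} has invalid value {value!r}")
```

Keeping the schema and license next to the data is what makes both checks possible without asking the publisher. In practice a dedicated validator such as GoodTables would perform far more thorough checks than this sketch.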

In Moodle, I have suggested an exercise that works with live Open Data through the [opendata.swiss API](https://handbook.opendata.swiss/support/api.html), and shared an [example R script](https://gist.github.com/loleg/aef3fd6aa91e2a65c80627bb0f29f49d).

---

© [Oleg Lavrovsky](mailto:open@datalets.ch), January 2019

This work is licensed under a Creative Commons Attribution 4.0 International License.