VU Humanities Graduate Seminar January 30th, 2014

Today we attended and presented at the Faculty of Humanities at the VU during the afternoon program of the graduate seminar. The afternoon, themed Revolutions in the Humanities, consisted of several project presentations and ended with a panel discussion between professors from the university whose work contributes to the faculty.

The discussion was triggered by a themed edition of De Groene Amsterdammer, an independent Dutch weekly magazine, titled Humanities, alive and kicking. Part of this edition emphasized the embedding of knowledge and resources from the beta sciences, which was proposed as one reason for the current successes within the field. The search for scientific relevance in the humanities has been an important topic of discussion ever since the economic crisis, with its subsequent budget cuts and reorganizations in scientific institutions, set off a quest for valorization.

The leading idea is that technology and the widespread accessibility of information should play an integral part in the social sciences. Teaming up with the beta sciences, and with information scientists in particular, is a start for this scientific revolution. The idea is not new, and a relevant question is why the humanities are late to the game. The panel discussion exposed a lack of interest, a lack of understanding, or both.

While there is much ground to cover, we believe that our project, INVENiT, is a good example of how close interaction between the alpha and beta fields, combined with collaborative research goals, leads to a fascinating new research approach and holds great potential for acquiring new knowledge and insights. The reactions we received indicate that our brief presentation already inspired people from different fields and backgrounds to rethink how technology and information can influence their research efforts. Take a look at the presentation below and, if you have any questions, please get in touch.

Meeting Rijksmuseum PK-online

On 16 December 2013 we had a meeting with Rianneke van der Houwen at the Rijksmuseum. In this meeting we learned a lot about the digitisation of the Rijksmuseum prints, and especially about the annotating process. The digitisation program started six years ago, when the Dutch government required museums to digitally register the objects in their possession. At this moment 180,000 prints have been digitised, and at the current pace the whole collection will be finished around 2027.

The most important part of this meeting was our visit to the annotators' office, where we got an inside look at the annotating process. This is especially valuable for our project because we now have a better understanding of the kind of information that actually gets digitised. The annotators work in a program called Adlib. In it they can record who created the print, where it was made, what is depicted, and so on. Adlib also lets them create references to other sources, such as Wikipedia or the STCN (Short-Title Catalogue Netherlands).

The annotating process is, of course, not perfect. Time constraints in particular limit the information that gets annotated from a print. Most annotators have a minimum number of prints that they need to annotate, which limits the time they can spend on a single print. One negative consequence is that the annotation of a print with many different icons will not mention most of them. Usually an overarching term is used for a group of icons, such as the name of a certain board game instead of all the icons on the board.

The information that gets annotated also depends on the annotator. What he or she perceives as the most important part of the print will get annotated, while other parts will not. One of the problems with this is that you can never know what people will want from a print, so this subjectivity severely limits the search possibilities of the print database. The expertise of the annotators reduces the problem to a certain extent. Their knowledge allows them to look at what was probably important to the creator of the print. Symbolic figures and their role in the print, for example, will be noticed by the annotator. In this way, icons that we now consider important to the creator will almost always be annotated. But this still leaves out the icons that the annotators perceive as unimportant today.

It is good to see that some basic information almost always gets annotated if it is known: the year the print was created, the creator and the overall subject. Making this common information available in a good and clear way for anyone searching the Rijksmuseum online archive is just one step. Down the line it would be great to make really specific information about a print available to someone searching for it. This would be especially valuable for professional (art) historians. As we have seen, the annotating process puts certain limits on achieving this. Still, a lot of information is being annotated, and making it available to visitors of the Rijksmuseum online prints collection is what we want to achieve.

Delpher

The Koninklijke Bibliotheek launched Delpher, a search engine giving access to millions of historical text resources, ranging from magazines and newspapers to books. The data is available via a public API and can also be downloaded as a complete dataset.

To us, Delpher is a great example of the opportunities that lie in making data publicly available. It serves a wide variety of users, from researchers in historical fields (art history, general history, anthropology, etc.) to a more general public who would like to look up something from their own past.

It is a tremendous effort to digitize, store and make available these enormous amounts of information. To make it accessible without annotation, Delpher uses Optical Character Recognition to build full-text indexes.

While the technological feats are great, it is important to be critical as well. First, Delpher states that it chooses quantity over quality and says the technology it uses is not yet capable of precise OCR, let alone of recognizing context or meaning. How these challenges are currently addressed is unclear.

Playing around with Delpher quickly shows that it is slow. This can be fixed by scaling up resources, by using different or better search algorithms, or both. Because Delpher is not open about either, it is unclear which approach would pay off more in terms of search time and investment cost.

A quick test also shows that Delpher can return a very large number of results. Querying "Philips" returns almost a million results. It is doubtful that these are all relevant, and in such a case, without filtering, prioritizing and ranking of results, the search engine becomes difficult to use and understand.
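As an illustration only, here is a minimal sketch of what ranking could look like. It is not how Delpher works internally, which is not public. The scoring scheme (a plain TF-IDF sum), the function names and the three example texts are all assumptions made up for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (doc_id, score) pairs for matching documents, best first.

    Scores are a plain TF-IDF sum. Documents without any query term are
    filtered out instead of being returned with a zero score.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(1 + n_docs / doc_freq[term])
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example texts, not Delpher data.
documents = {
    "a": "philips opens new factory in eindhoven philips expands",
    "b": "weather report for eindhoven",
    "c": "king philips of spain mentioned once in a long article about trade and war",
}
results = rank("philips", documents)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In this toy example the short text that mentions the query term twice ranks above the long article that mentions it once, and the text without the term is dropped. A real search engine would add much more (stemming, field weights, date filters), but even this simple ordering makes a long result list easier to use.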

We would also argue that the interaction and presentation of the front end are not very modern and lack general UX quality. The aim of the project is to make historical texts available to the public, and with that comes the responsibility to make the data consumable.

At this time, the website still carries the BETA label and we are curious to see improvements over time. Delpher is a great tool for anyone interested in historical text and for data scientists. Head over and take a look at http://www.delpher.nl.

PiLOD 2.0

PiLOD 2.0 is the second iteration of the Linked Open Data think tank, a joint effort to make government data open and linkable by addressing legislative and technical problems. These issues are addressed in the form of seven cases, each focussing on a different aspect of the problem space and worked on by experts from public and private institutions. The cases look at many aspects of LOD, for instance the provenance of laws and court rulings, the valorization of publicly funded scientific efforts within the LOD community, and awareness among ministers and legislators.

The case Frontend, initiated by Waag Society, is particularly interesting as its objectives are similar to what we are doing for INVENiT. Waag has recently published a beautiful interactive map with information about all buildings in the Netherlands, colored by their age. The map data comes from the Kadaster and is made available via the public CitySDK. We urge you to take a look if you haven’t already.

The example embodies our mission: making linked data useful. Of course we can debate what usefulness means, and it depends on many aspects of an application, such as the nature of the data, the audience and the goal of a project. But the example shows us that good application design, in which LOD is an integral part of the design and development process, can lead to stunning results.

We are very eager to see whether there is enough common ground to team up with Waag and others in their case, and whether we can use the Rijksmuseum collection to show the world the benefits and potential of LOD. The next PiLOD meeting will be held at the VU on January 29th (subject to change) and might be the perfect opportunity to get better acquainted with the other parties and the project.

We would also like to take this opportunity to thank the NWO for hosting the event, and the other case leaders and speakers for their inspiring talks and demonstrations.

The PiLOD project has a website that is publicly accessible to anyone interested, and a newsletter that is sent out by Geonovum, one of the more visible participants.

Welcome

In their research, humanities researchers depend on the efficiency and effectiveness of the search functionality provided by various online cultural heritage collections (e.g. images, videos and textual material). The Rijksmuseum Prints collection, with over 600,000 prints, is one of those collections. The results of the current search implementation are primarily centered on information about individual objects. Humanities scholars, on the other hand, search for more complex results based on deeper reflection over clusters of artifacts and concepts. This project aims to meet the demands of these scholars. To do so, we need a novel semantic search approach.

In this project we hope to develop this new search approach by clustering search results based on semantic patterns in linked cultural heritage data. We will be able to rank the semantic patterns by their importance for each specific art-historical genre. We believe that, in this way, we will ultimately help provide the functionality needed to formulate, refine and answer humanities research questions.
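To give a rough feel for the idea, here is a minimal sketch under strong simplifying assumptions. It is not the project's implementation. A "semantic pattern" is reduced to a single shared (property, value) pair, and the property names, genre weights and print identifiers are all hypothetical.

```python
from collections import defaultdict


def cluster_by_pattern(results):
    """Group result ids by every (property, value) pair they have in common."""
    clusters = defaultdict(list)
    for result_id, properties in results.items():
        for prop, value in properties:
            clusters[(prop, value)].append(result_id)
    # Keep only patterns that are shared by at least two results.
    return {pattern: ids for pattern, ids in clusters.items() if len(ids) >= 2}


def rank_patterns(clusters, property_weights):
    """Order patterns by a genre-specific property weight, then by cluster size."""
    return sorted(
        clusters,
        key=lambda pattern: (property_weights.get(pattern[0], 0.0), len(clusters[pattern])),
        reverse=True,
    )


# Hypothetical search results with made-up linked-data properties.
results = {
    "print1": [("creator", "Artist A"), ("depicts", "ship")],
    "print2": [("creator", "Artist A"), ("depicts", "harbour")],
    "print3": [("creator", "Artist B"), ("depicts", "ship")],
}
# Hypothetical weights for one genre in which the depicted subject matters most.
weights = {"depicts": 0.9, "creator": 0.4}

clusters = cluster_by_pattern(results)
ordered = rank_patterns(clusters, weights)
for pattern in ordered:
    print(pattern, clusters[pattern])
```

With these weights the cluster of prints depicting a ship is shown before the cluster of prints by the same creator. A different genre could use different weights and so present the same results in a different order.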

Public kick-off embedded research projects

In order to get more acquainted with other projects that combine the humanities and computer sciences, we decided to visit the Public kick-off embedded research projects, which hosted a lot of projects doing just that.

We were welcomed by Rens Bod, Professor of Computational and Digital Humanities at the UvA, who claims that the humanities are ‘booming’ in ICT research. Our own project and the projects shown at this kick-off certainly show that this claim isn’t an empty one. One of the main aims of these projects, again according to Bod, should be to make concrete products that actually lead to something practical. Implicit in the background was valorization: the research done should lead to commercially viable products.

The valorization aspect, and more generally the expected results of the interdisciplinary projects, weren’t always clearly represented in the projects. This could be a result of the early stage the projects are still in. Another reason could be that some of the projects seemed to aim primarily at results for the humanities but not so much for ICT, or the other way around. With respect to our own research, it seems important to keep this in mind. The research should lead to valuable information or products for both disciplines. We should also emphasize that the goal of the interdisciplinary aspect of this project is for the disciplines to complement each other in order to get more valuable research results.

Some projects stood out in their relevance for our own project. Crowdsourcing for cultural heritage is one of them. Especially interesting for us was its focus on ‘discovering and describing the best practices for crowdsourcing projects conducted by cultural heritage institutions.’ This includes targeting relevant information in order to annotate items in online collections, an aspect that is especially important in our research. Another project was interesting because of its good combination of history and ICT: Sailing Networks: Mapping Colonial Relations with Suriname’s seventeenth-century sailing letters. The balance between the humanities and the ICT side was especially good, something that we also strive for.

All in all the many projects that were presented today are a good example of the increasing cooperation between the humanities and the computer sciences. Hopefully the cooperation of these disciplines will also be evident in the results of these projects. One of the goals of our project is to do just that.