All posts by Cristina Bucur

INVENiT² project presentation for the “Digital Humanities” course

On Tuesday, the 11th of April 2017, Cristina-Iulia Bucur, one of the former academy assistants for INVENiT², gave a presentation about the project during the “Digital Humanities” course at the Vrije Universiteit Amsterdam. The presentation, titled “INVENiT II – New ways of opening up cultural religious heritage”, briefly described the project team's experience over the period in which the project ran.

The presentation focused on why it is important to link data to research in order to better support scholars in the humanities, with an emphasis on the cultural religious heritage held by the UBVU, the University Library of Vrije Universiteit Amsterdam. It also briefly described the workflow this requires and the steps taken during the project. First, two 18th-century illustrated bibles from the UBVU “Special Collections” were digitized; then Linked Data was used as a framework to publish this (meta)data and to link individual prints to their bibles. Next, various crowdsourcing and nichesourcing events were organized to further annotate and enrich the data about these prints. Finally, the new information was incorporated into the UBVU system. As a result, the biblical prints have been enriched with new information and can better support scholarly research.

The presentation slides can be found below.



SIKS Course – Advances in Information Retrieval

On the 18th and 19th of June, Cristina attended a course titled “Advances in Information Retrieval”, organized by the Netherlands Research School for Information and Knowledge Systems (SIKS) in Vught, the Netherlands.

During the first day of the course, Theo van der Weide, professor at Radboud University Nijmegen, presented some basic approaches to Information Retrieval (IR) and discussed the classic IR models (Boolean, vector space, and probabilistic) and some of their computational aspects. He ended with the implications of Big Data, giving examples of why numerical stability matters when operating on large datasets. Next, Evangelos Kanoulas, assistant professor in computer science at the University of Amsterdam, gave a presentation on search engine evaluation, focusing on two evaluation strategies: one based on data collected offline and the other on online, in-situ evaluation methods. He also described how to build and use a benchmark collection to evaluate the quality of a search algorithm, as well as other evaluation techniques such as A/B testing and interleaving.
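The two evaluation strategies can be illustrated with a small Python sketch. It is not taken from the course material: the offline part computes precision at k and average precision against hypothetical relevance judgments from a benchmark collection, and the online part implements team-draft interleaving, one common interleaving method, in which clicks on a merged result list are credited to the ranker that contributed each clicked result. All document IDs, judgments, and rankings are made up for illustration.

```python
"""Sketch of offline (benchmark) and online (interleaving) search evaluation.
All document IDs, relevance judgments and rankings below are hypothetical."""
import random


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of the precision at each rank where a relevant document appears,
    divided by the number of relevant documents (missed ones contribute 0)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Merge two rankings with team-draft interleaving.

    The ranker with fewer picks so far chooses next (ties broken by a coin
    flip) and contributes its highest-ranked result not yet shown. Stops when
    either ranker has nothing new to add. Returns the merged list and a dict
    mapping each shown document to "A" or "B"."""
    merged, credit = [], {}
    picks = {"A": 0, "B": 0}
    while True:
        next_a = next((d for d in ranking_a if d not in credit), None)
        next_b = next((d for d in ranking_b if d not in credit), None)
        if next_a is None or next_b is None:
            return merged, credit
        a_turn = picks["A"] < picks["B"] or (
            picks["A"] == picks["B"] and rng.random() < 0.5)
        team, doc = ("A", next_a) if a_turn else ("B", next_b)
        merged.append(doc)
        credit[doc] = team
        picks[team] += 1


def interleaving_winner(credit, clicked):
    """Credit each click to the ranker that contributed the clicked document."""
    a = sum(1 for doc in clicked if credit.get(doc) == "A")
    b = sum(1 for doc in clicked if credit.get(doc) == "B")
    return "A" if a > b else "B" if b > a else "tie"


if __name__ == "__main__":
    relevant = {"d1", "d4", "d7"}            # hypothetical judgments for one query
    run = ["d1", "d2", "d4", "d5", "d6"]     # hypothetical system output
    print(precision_at_k(run, relevant, 3))  # 2 of the top 3 are relevant
    print(average_precision(run, relevant))  # (1/1 + 2/3) / 3
    merged, credit = team_draft_interleave(run, ["d4", "d7", "d1"], random.Random(42))
    print(merged, interleaving_winner(credit, clicked=["d7"]))
```

In a real online experiment the clicks would come from live users over many queries, and the ranker that wins more comparisons is preferred; here a single hypothetical click stands in for that traffic.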

Prof. dr. Theo van der Weide: Foundations of IR

On the second day of the course, Suzan Verberne, a professor at Radboud University Nijmegen and a researcher in the fields of Natural Language Processing (NLP) and Information Retrieval (IR), gave a presentation on ways to model user behaviour and on how such models can be used to simulate user interaction in order to evaluate IR methods.
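One way to simulate user interaction, shown here only as an illustrative sketch and not necessarily one of the models covered in the presentation, is the cascade click model: a simulated user scans the results from the top, clicks each result with some probability (its assumed "attractiveness") and stops after the first click. Averaging a click-based score over many simulated sessions then lets two rankings be compared without real users. The attractiveness values and document names below are hypothetical.

```python
"""Cascade click model used to simulate users and compare two rankings.
Attractiveness probabilities and document names are hypothetical."""
import random


def simulate_click_rank(ranking, attractiveness, rng):
    """Simulate one session: examine results top-down, click each with its
    attractiveness probability, stop at the first click. Returns the 1-based
    rank of the click, or None if the user clicks nothing."""
    for rank, doc in enumerate(ranking, start=1):
        if rng.random() < attractiveness.get(doc, 0.0):
            return rank
    return None


def simulated_reciprocal_click_rank(ranking, attractiveness, sessions=10_000, seed=0):
    """Mean of 1/rank-of-click over simulated sessions (0 when there is no click)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sessions):
        rank = simulate_click_rank(ranking, attractiveness, rng)
        if rank is not None:
            total += 1.0 / rank
    return total / sessions


def expected_reciprocal_click_rank(ranking, attractiveness):
    """Closed-form value of the same quantity under the cascade model:
    sum over ranks k of P(no earlier click) * P(click at k) / k."""
    no_click_yet, expected = 1.0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        p = attractiveness.get(doc, 0.0)
        expected += no_click_yet * p / rank
        no_click_yet *= 1.0 - p
    return expected


if __name__ == "__main__":
    attractiveness = {"good": 0.7, "bad": 0.1}
    for ranking in (["good", "bad"], ["bad", "good"]):
        print(ranking,
              round(simulated_reciprocal_click_rank(ranking, attractiveness), 3),
              round(expected_reciprocal_click_rank(ranking, attractiveness), 3))
```

With these assumed probabilities the closed form gives 0.715 for the first ranking and 0.415 for the second, and the simulated averages should land close to those values, so the simulated user "prefers" the ranking that puts the more attractive result first. The score is deliberately order-sensitive: the plain probability of at least one click would be the same for both orderings under this model.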

Presentation Suzan Verberne

Finally, Djoerd Hiemstra, associate professor at the University of Twente, presented basic techniques for indexing very large document collections, such as inverted files, index compression, and top-k query optimization. On the practical side, attendees had to estimate indexing and processing times for these techniques. By the end, students had gained a better understanding not only of the scale of the web but also of how to build a large-scale web search engine. It was a crash course in how to build a small-scale Google in one afternoon.
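The core ideas can be sketched in a few lines of Python. This toy version is an illustration, not the course's own exercise: it builds an inverted file mapping each term to a sorted postings list, compresses document IDs as gaps with variable-byte encoding, and answers a query by accumulating TF-IDF scores and selecting the top k with a heap. Real engines add top-k optimizations that skip most of the scoring (for example WAND or MaxScore), which are not shown here; the documents are made up.

```python
"""Toy inverted index with gap + variable-byte compression and top-k retrieval.
The document collection is hypothetical."""
import heapq
import math
from collections import defaultdict
from itertools import accumulate


def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency), sorted by doc_id."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte; the high bit flags the
    last byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = [n % 128]
        while n >= 128:
            n //= 128
            chunk.insert(0, n % 128)
        chunk[-1] += 128
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + byte - 128)
            n = 0
    return numbers


def compress_postings(doc_ids):
    """Store the gaps between sorted doc IDs, which are small and compress well."""
    gaps = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Decode the gaps and turn them back into absolute doc IDs."""
    return list(accumulate(vbyte_decode(data)))


def top_k(index, num_docs, query, k):
    """Score documents term-at-a-time with tf * idf, then keep the k best with a heap."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    docs = ["illustrated bible print", "bible commentary",
            "print catalogue", "web search engine"]
    index = build_index(docs)
    print(top_k(index, len(docs), "bible print", k=2))
    blob = compress_postings([3, 7, 130, 131])
    print(len(blob), decompress_postings(blob))
```

The four doc IDs in the example fit in four bytes because every gap is below 128, which is the reason postings are stored as gaps rather than absolute IDs. Estimating indexing time at web scale then comes down to multiplying such per-posting costs by the number of postings.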

Cristina would like to thank Lora Aroyo and the VU University Amsterdam for making it possible to attend this course.

Link: SIKS course – Advances in IR


INVENiT² team inspired by workshops

During the last two months, members of the INVENiT team participated in two workshops that inspired the project's work. One dealt with knowledge organization systems in general, the other with one system in particular: an Early Modern bibliography designed by the Royal Library (KB). Both workshops are briefly presented below.

    1. KnoweScape Workshop, March 4-5, 2015, Amsterdam

On March 4 and 5, Cristina attended a workshop titled “Evolution and variation of classification systems – KnoweScape workshop”, organized by the eHumanities group of KNAW – the Royal Netherlands Academy of Arts and Sciences. Over the two days, various presentations dealt with how knowledge organization systems (KOSs) are represented, how classifications are made, and how KOSs change over time. The main theme of discussion throughout the workshop was how to create a Metadata Observatory: an atlas in which ontologies, KOSs, and metadata schemes can be included and searched in all their versions. This is an ambitious goal, as the current KOS landscape consists of isolated, heterogeneous systems that exist in different versions, are not linked together, and are built for various scopes. The Metadata Observatory would provide a space where knowledge organization systems can not only be compared with one another over time but also be accessed and searched in a unified manner.

The keynote speaker of the workshop was Joseph Tennis from the University of Washington, Seattle, who gave an interesting talk about various KOSs and their stability, citing as an example the evolution of the Dewey Decimal System – the world's most widely used library classification system. He emphasized that studying how different KOSs have evolved helps both in understanding organization systems and in building future ones. Richard Smiraglia, from the University of Wisconsin-Milwaukee and one of the leaders in the field of Knowledge Organization, presented a few empirical methods for studying knowledge evolution across KOSs as a way to better understand information systems.

Aida Slavic from the UDC Consortium gave a presentation on how KOSs can be managed given the evolution of concepts and their representation, in the particular case of the Universal Decimal Classification (UDC) system. Valentine Charles from Europeana talked about the special case of the Europeana portal as a way of linking cultural heritage with KOSs, while Toby Burrows from the University of Western Australia gave an insight into HuNI, one of the biggest Australian cultural heritage databases. Albert Meroño-Peñuela (Data Archiving and Networked Services – DANS, VU University Amsterdam) gave a presentation on KOS versioning based on his work on Dutch census data, and Paul Groth from Elsevier outlined some strategies for performing data analysis in a rapidly changing environment, using databases from the field of bioinformatics as an example. Almila Akdag Salah from the eHumanities group of KNAW gave an extensive and highly visual presentation on how classification systems can be represented and viewed as different kinds of knowledge maps. The workshop ended with a presentation by Christophe Guéret (DANS, VU University Amsterdam) on WWW standards for publishing web data.

Andrea Scharnhorst from DANS and the eHumanities group chaired the entire workshop, introducing speakers, summarizing ideas, and creating an environment that was fruitful for discussion.

    2. STCN Workshop, April 13, 2015, The Hague

On April 13, Cristina, Leon, and Inger attended a workshop at the Royal Library of the Netherlands (KB). The central theme of the workshop was the renewed Short Title Catalogue of the Netherlands (STCN). The STCN is a so-called retrospective bibliography of books that were written in Dutch and/or published in the (Northern) Netherlands between 1540 and 1800. First, Els Stronks and Marieke van Delft introduced the goals and the new features of the STCN. The renewed STCN is based on the Resource Description Framework (RDF) model, and Juliette Lonij explained the basics of RDF models in general.

During the rest of the morning, the participants tested the new STCN interface. Peter Boot explained how to use the interface and guided the participants through several SPARQL exercises, which included linking authors/publishers to the titles of their works, grouping and ordering titles chronologically, and retrieving various kinds of numerical information. At the end of the day, Leon gave a pitch about INVENiT, showing how a book-historical context could be recreated by combining the STCN and the Rijksmuseum dataset.
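What such SPARQL exercises compute can be sketched without a triple store: a query is a set of triple patterns containing variables, and the answer is every consistent binding of those variables across the data. The Python sketch below is a simplified stand-in, not the STCN interface or its vocabulary; the `ex:` identifiers, titles, and years are placeholders. It links authors to titles, orders the results chronologically, and groups titles per author, mirroring the kinds of exercises described above.

```python
"""Toy basic-graph-pattern matcher illustrating what a SPARQL query such as
    SELECT ?author ?title ?year
    WHERE { ?book ex:author ?author ; ex:title ?title ; ex:year ?year }
    ORDER BY ?year
computes. Identifiers, titles and years are placeholders, not STCN data."""
from collections import defaultdict

TRIPLES = [
    ("ex:book1", "ex:author", "ex:authorA"),
    ("ex:book1", "ex:title", "Title 1"),
    ("ex:book1", "ex:year", 1650),
    ("ex:book2", "ex:author", "ex:authorA"),
    ("ex:book2", "ex:title", "Title 2"),
    ("ex:book2", "ex:year", 1620),
    ("ex:book3", "ex:author", "ex:authorB"),
    ("ex:book3", "ex:title", "Title 3"),
    ("ex:book3", "ex:year", 1700),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None.
    Pattern terms starting with '?' are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def select(triples, patterns):
    """Join the patterns one at a time, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for triple in triples
                    if (new := match(pattern, triple, old)) is not None]
    return bindings


if __name__ == "__main__":
    rows = select(TRIPLES, [("?book", "ex:author", "?author"),
                            ("?book", "ex:title", "?title"),
                            ("?book", "ex:year", "?year")])
    for row in sorted(rows, key=lambda r: r["?year"]):  # ORDER BY ?year
        print(row["?year"], row["?author"], row["?title"])
    titles_per_author = defaultdict(list)                # group titles by ?author
    for row in rows:
        titles_per_author[row["?author"]].append(row["?title"])
    print(dict(titles_per_author))
```

A real SPARQL engine does the same joining far more efficiently over indexed triples; the point of the sketch is only that linking, ordering, and grouping all follow from binding shared variables such as `?book`.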

Pitch presentation of INVENiT² by Leon at KB

Both workshops were helpful and inspiring, allowing the team to communicate with experts in the field and to gain valuable knowledge that can help in the development of the project. The KOS workshop proved very useful for the project's Crowdsourcing Launch on April 1, and in the coming weeks the academy assistants will apply what they learned at the STCN workshop to formulate search queries that may be important for the progress of INVENiT².


  1. KnoweScape Workshop:
  2. STCN Workshop: