Tutorial: Sharing Media Suite annotations following the FAIR principles

Tutorial description, case, and objectives

This tutorial introduces how to share and contextualise annotations created in the Media Suite in a way that follows the FAIR principles (Findability, Accessibility, Interoperability, Reusability). The FAIR guiding principles emphasise “enhancing the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals” (Wilkinson et al., 2016). FAIR should be understood as a way of thinking and a consensus rather than a set of prescriptions: it does not recommend specific technologies or solutions. Instead, it offers a direction that helps you evaluate your choices so that your research, data, and datasets are as accessible to others as possible.

The FAIR guiding principles have become increasingly prominent in various academic disciplines, including digital scholarship, and they are also relevant to the CLARIAH Media Suite. As Susan Aasman et al. explain in their essay on the Media Suite's video annotation tool, “[t]he growing importance of digital research infrastructures, archives and tools, has enticed media historians to rethink their research practices more and more in terms of methodological transparency, tool criticism and reflection.” This rethinking extends to the data you use for your research: organising these data consistently and comprehensively makes them as accessible as possible.

Because of the way the Media Suite is organised, and for privacy reasons, it is not possible to automatically create and share datasets that can be called fully FAIR. It is nevertheless desirable to make your annotations as FAIR as possible, especially if you want to export your research data to another environment. Moreover, sharing your dataset outside the Media Suite is essential for making it FAIR, as this enables others to reuse and/or enhance it. This tutorial focuses on the steps you need to take towards making your annotation data FAIR and the options you should consider. Finally, it provides lists of repositories and thesauri that follow established standards used by a diverse set of research communities, together with their pros and cons, to help you choose the options that best fit your dataset.

Objectives

This tutorial functions as a tool to assess your dataset in light of the FAIR guiding principles and to help you make annotations accordingly.

Types and levels of teaching and research

This tutorial supports sharing and contextualising annotations and research data. In general, it may be used in any field where building, annotating, and linking corpora of different media items are necessary skills, in qualitative research as well as in data-driven approaches to media analysis. It is also useful for students and researchers who want to familiarise themselves with the ethics of, and contemporary discussions in, the digital humanities concerning data management and stewardship, with the CLARIAH Media Suite as a case study.

Prerequisites

This tutorial is suited for advanced users with experience in working with the CLARIAH Media Suite’s annotation functionalities, and you should be familiar with the annotation workflow before following it. If you are not, you can first consult the tutorial on this topic here.

Steps: How to make your data FAIR

Introduction: What is FAIR?

As previously mentioned, FAIR stands for Findability, Accessibility, Interoperability, and Reusability. The principles were first introduced by Wilkinson et al. in 2016 and have since been endorsed and adopted by many data stewards, specialists, researchers, and other stakeholders. How does FAIR data stewardship affect user-generated annotations in the CLARIAH Media Suite? Within CLARIAH, the FAIR principles have been applied in several research projects, such as PLAYFAIR and Fair Photos (see https://www.clariah.nl/nl/projecten/fair-data-for-historical-games and https://zenodo.org/records/8096991). In these projects, the focus lies on reusing and connecting datasets. The FAIR principles also apply within the CLARIAH Media Suite, but there the focus lies on generating FAIR (meta)data. In this part of the tutorial, we look at what the FAIR guiding principles specifically mean for creating annotations. To do so, we first need to consider the kinds of annotations you can make and the options each of them offers. Throughout this tutorial, we ask several questions to help you navigate the annotation options and choose those that fit the specifics of your research.

Creating FAIR annotations

There are four kinds of annotations you can make in the Media Suite: codes, comments, links, and metadata. Each kind can be either standardised or customised. The steps below explain the options within each kind of annotation and the questions they raise about how to document and standardise your data.

Codes

Codes are “classifications from selected or custom vocabularies”, which means that you can categorise an item, or segments of an item, according to the specifics of your research. Within this type of annotation, you can choose between so-called vocabularies: one of two existing thesauri, GTAA and UNESCO, or a custom vocabulary.

GTAA stands for Gemeenschappelijke Thesaurus voor Audiovisuele Archieven (Common Thesaurus for Audiovisual Archives) and is used by several Dutch archives, such as NISV (the Netherlands Institute for Sound and Vision) and Eye Filmmuseum. The UNESCO thesaurus is more suitable for interdisciplinary research that is not specifically tied to audiovisual material or to the Netherlands. Depending on the scope and potential uses of your research, you can choose either thesaurus, or both.


Creating annotations with these thesauri has the benefit that your annotations follow a standard format, which makes your data easier to find alongside other datasets that use the same terms. A downside is that thesaurus terms might not fit the specifics of your research. For example, film-related terms such as “production company” and “militant cinema” may have different meanings in other disciplines or, because of their niche usage, may not be included in a thesaurus at all. In such cases, customised codes might be a better fit.

If you choose to create customised codes, it is advisable to document and explain your reasoning explicitly, for example in a README file. A README file accompanies your dataset and explains how you organised your data and what others need to read and (re)use it properly, such as prerequisites, structure, and technical details, as well as your contact information. For an example of what to include in a README file, click here.
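If you keep your custom codes and their definitions in one place while you work, you can generate the code section of your README from that list, so that the documentation stays in line with the codes you actually used. The Python sketch below is a minimal illustration under that assumption. The dataset title, contact address, code definitions, and the `build_readme` helper are hypothetical examples; the Media Suite does not produce this structure itself.

```python
"""Minimal sketch: generate a plain-text README section from a codebook.

All names and values here are hypothetical examples; the Media Suite
does not produce this structure itself.
"""


def build_readme(title: str, contact: str, codebook: dict[str, str]) -> str:
    """Return README text listing each custom code with its definition."""
    lines = [
        title,
        "",
        f"Contact: {contact}",
        "",
        "Custom codes used in this dataset",
        "",
    ]
    # Sort the codes so the README stays stable between exports.
    for code in sorted(codebook):
        lines.append(f"- {code}: {codebook[code]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical custom codes for a film-historical corpus.
    codebook = {
        "production company": "The company credited with producing the item, "
                              "not the distributor.",
        "militant cinema": "Films made to support a political struggle, as "
                           "defined in the project's research question.",
    }
    print(build_readme("Annotated corpus: example project",
                       "researcher@example.org", codebook))
```

Writing definitions in your own words, as in the example, is what makes a custom code reusable: another researcher can see exactly where your use of a term differs from its use in other disciplines.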

You can also use taxonomies and controlled vocabularies shared by others*. You can find these on, for instance, Onomy (media- and audiovisual-specific taxonomies) and Linked Open Vocabularies (generic controlled vocabularies). On these websites, you can search for a fitting taxonomy or controlled vocabulary by subject, discipline, or language. When creating codes, you can use these as a reference to structure your codes coherently, and explain the reasoning behind your choices in the README file.

*A taxonomy is “a collection of terms that are organized into some structure that provides some semantic understanding of those terms” (https://picturepark.com/content-management-blog/best-practices-for-dam-taxonomy-metadata-tags-and-controlled-vocabularies). A controlled vocabulary is a collection of standardised or agreed-upon terms used to identify content.
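Once you have chosen a taxonomy or controlled vocabulary, a quick consistency check can reveal codes that drift from it, such as spelling variants or terms you forgot to define. The Python sketch below assumes you have copied your annotations into a list of (item identifier, code) pairs. The identifiers, the vocabulary, and the `undefined_codes` helper are illustrative assumptions, not Media Suite features.

```python
"""Minimal sketch: flag codes that are not in your controlled vocabulary.

The vocabulary and annotations below are hypothetical examples.
"""


def normalise(term: str) -> str:
    """Compare terms case-insensitively and ignore surrounding spaces."""
    return term.strip().casefold()


def undefined_codes(annotations: list[tuple[str, str]],
                    vocabulary: set[str]) -> dict[str, list[str]]:
    """Map each code missing from the vocabulary to the items that use it."""
    allowed = {normalise(term) for term in vocabulary}
    problems: dict[str, list[str]] = {}
    for item_id, code in annotations:
        if normalise(code) not in allowed:
            problems.setdefault(code, []).append(item_id)
    return problems


if __name__ == "__main__":
    vocabulary = {"colour", "black and white"}
    annotations = [
        ("item-001", "colour"),
        ("item-002", "Black and White"),  # accepted: only the case differs
        ("item-003", "black & white"),    # flagged: spelling variant
    ]
    print(undefined_codes(annotations, vocabulary))
    # prints {'black & white': ['item-003']}
```

Any code the check reports either needs to be corrected or added, with a definition, to your vocabulary and README.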

Example questions to ask about your research:

Does your research contain a specific case study that cannot be defined well with terms from an existing thesaurus? Can you pragmatically use terms from an existing thesaurus to describe your case, or is this not desirable? In the case of the latter, it is better to customise your codes.
Does your research fit best in a Dutch context or is it more internationally oriented? What consequences may this have for how and where you want to share your research data at a later point?
Does your research fit best within an AV heritage context or can you make use of more generic thesauri (for example UNESCO), and what would be the potential drawbacks of the latter?

Comments

Comments are textual notes and are a good place to write down anything that cannot be tied to a controlled vocabulary. Codes and comments differ in how they are organised within your project. Codes are used to categorise bookmarked items and connect them to other bookmarked items, whereas comments record information specific to a single bookmarked item. Codes therefore lend themselves to a more quantitative approach, such as dividing all your bookmarked films into colour and black-and-white films. Comments, on the other hand, can hold information about the item itself that does not necessarily relate to the other bookmarked items; for instance, you can point out incorrect titles or write down questions. Below, you can see the differences between comments and codes:

[Screenshot: comments and codes compared]

To help potential users of your dataset, it is still advisable to keep this section consistent in some way. For example, you can create an overview of how you have used the comment section. Avoid terms that have ambiguous meanings or are open to multiple interpretations. Here, too, a README file can help you specify what your comments were created for.
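One way to keep comments consistent is to start each comment with a short, defined prefix and to list those prefixes in your README. The Python sketch below counts how often each prefix is used and collects comments that follow none of them, which gives you the overview mentioned above. The prefixes ("QUESTION", "TITLE ERROR", "NOTE") and the sample comments are a hypothetical convention, not a Media Suite feature.

```python
"""Minimal sketch: summarise comments written with a prefix convention.

The prefixes and comments are hypothetical; document your own
convention in the README of your dataset.
"""
from collections import Counter

PREFIXES = ("QUESTION", "TITLE ERROR", "NOTE")


def summarise_comments(comments: list[str]) -> tuple[Counter, list[str]]:
    """Count comments per known prefix and collect those without one."""
    counts: Counter = Counter()
    unprefixed: list[str] = []
    for comment in comments:
        # Only the text before the first colon is treated as a prefix.
        prefix, sep, _ = comment.partition(":")
        label = prefix.strip().upper()
        if sep and label in PREFIXES:
            counts[label] += 1
        else:
            unprefixed.append(comment)
    return counts, unprefixed


if __name__ == "__main__":
    comments = [
        "QUESTION: Is this the original broadcast or a rerun?",
        "TITLE ERROR: The archive title misspells the presenter's name.",
        "note: Audio drops out around 12:30.",
        "Check this later",
    ]
    counts, unprefixed = summarise_comments(comments)
    print(counts)
    print(unprefixed)
```

Comments in the second list are the ones to revisit before sharing, since their purpose is not recorded anywhere.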

Example questions to ask about creating comments:

Which information gathered from the item or segment do you want to capture in the comment? Is the comment section the right place for this information or are there other annotation types that would work as well?
Are there terms you could use to write down your comments accurately? If so, what are your definitions of these terms?

Links

Links refer to external web resources identified by URLs. In the CLARIAH Media Suite, you can use two APIs that provide access to URLs from shared vocabularies, namely Wikidata and Europeana. You can also add your own URLs under Custom if you wish to use a different vocabulary. Linking to shared vocabularies lets you describe your dataset with well-defined, shared terms, making it possible to connect it to data outside the CLARIAH Media Suite that use the same terms. Shared terms help researchers find data, link them across collections, and reuse them in different contexts, in line with the FAIR principles.

Wikidata is a freely accessible knowledge base that can be read and edited by both humans and machines. It offers a practical way to add readable contextual information (for example, persons, organisations, geographical places, historical events) as annotations, provided that Wikidata contains items for the entities you refer to. Wikidata is also used for the Sound and Vision collections and to make connections with them in the Media Suite. The tutorial on searching and exploring with Linked Data and Wikidata explains how Linked Open Data (LOD) can help you distinguish between different entities with the same name, and how to use it when searching inside and outside the Media Suite. For more information on how Wikidata works, click here.
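Every Wikidata item has a stable identifier (a Q-number). Two Wikidata links therefore point to the same entity when their identifiers match, even if one uses the item page URL and the other the concept URI. The Python sketch below extracts that identifier, for example to check your links for duplicates before you share your dataset. It only parses URL strings and does not contact Wikidata. The `wikidata_id` helper is our own illustrative name.

```python
"""Minimal sketch: extract the Wikidata item identifier (Q-number) from a link.

Handles the item page form (https://www.wikidata.org/wiki/Q42) and the
concept URI form (http://www.wikidata.org/entity/Q42). It parses strings
only and does not query Wikidata.
"""
import re
from typing import Optional

WIKIDATA_ITEM = re.compile(
    r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q[1-9][0-9]*)/?$"
)


def wikidata_id(url: str) -> Optional[str]:
    """Return the Q-number if url is a Wikidata item link, else None."""
    match = WIKIDATA_ITEM.match(url.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    links = [
        "https://www.wikidata.org/wiki/Q42",
        "http://www.wikidata.org/entity/Q42",
        "https://www.youtube.com/watch?v=example",
    ]
    for link in links:
        print(link, "->", wikidata_id(link))
    # The first two links refer to the same item (Q42, Douglas Adams);
    # the custom YouTube link yields None.
```

Comparing identifiers rather than full URLs avoids treating the same person or place as two different entities in your dataset.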


Europeana is an organisation that “supports the cultural heritage sector in its digital transformation” (Europeana, last accessed on March 8, 2024, https://www.europeana.eu/nl/share-your-data). On Europeana, you can find a variety of online heritage collections from European archives ranging from film to books, arts, photography, music, fashion, and archaeology. Europeana is therefore a great way to link your audiovisual materials to other types of cultural heritage within a European context.

If you want to link the item to another web source, such as a YouTube video, you can select Custom. To avoid long URLs in your annotations, you can give the link an alternative name under Label.

Example questions to ask about links:

To what kind of information do you want to link your item? According to the descriptions above, which web source fits best: Wikidata, Europeana, or Custom?

Metadata

Metadata annotations are metadata fields that you can customise yourself. Because items in the CLARIAH Media Suite already carry metadata from the database in which they are stored, this annotation type is reserved for your own metadata as a researcher. As with comments, it is advisable to make this section as consistent as possible and to make clear what purpose the metadata you create serve. Note, however, that this functionality is very labour-intensive, so it is best used primarily when you need to create a controlled vocabulary from scratch. You can also find controlled vocabularies that might fit your research on Onomy, a website “where you can create and share taxonomies, folksonomies, and other forms of controlled vocabularies for use on the semantic web” (onomy.org, https://onomy.org/). Once you have created your own thesaurus, you can share it there as well, so that other researchers can use it.


Add your FAIR dataset to a repository

Now that you have documented your data with the FAIR principles in mind, you can share your dataset according to these principles. Sharing and contextualising your data and datasets is an important part of FAIR: it enables your research to be found and potentially reused. This part of the tutorial highlights two repositories that are particularly suitable for datasets created in the CLARIAH Media Suite: the DANS Data Station Social Sciences and Humanities and Zenodo.

The DANS Data Station Social Sciences and Humanities is the central institutional repository for research data in the Netherlands. If you add your dataset to this repository, it is likely to be grouped with and connected to other datasets created with the CLARIAH Media Suite, which contributes to greater transparency around the research practices connected to the Media Suite. As this data station is made specifically for the social sciences and humanities, it is most suitable for datasets that fit well within these disciplines. If you need help with depositing your data, you can consult this manual.

If your research is more interdisciplinary or particular, you might want to consider Zenodo. In Zenodo, you can either add your dataset to an existing ‘community’ or create your own, and because datasets from other fields can be added to it, this repository may suit interdisciplinary work better. Another consideration is the scope of your research: as Zenodo is internationally oriented, your dataset is likely to be more findable by international researchers if you make it available there. You can find a guide on how to submit data to Zenodo here.

If your dataset fits neither the DANS Data Station nor Zenodo, you can use the repository decision tool created by Utrecht University to find trustworthy alternatives. By asking you a few questions, this tool helps you find a repository that fits your specific needs for sharing and publishing your data.

In addition to depositing your data in a repository, we also advise you to save your dataset as a JSON file. JSON is a good format for sharing data because it is machine-readable and can be processed by many different programs, so your data remain usable for purposes that cannot yet be foreseen. A tutorial on exporting JSON files will soon be released.
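To illustrate what machine-readable means in practice, the Python sketch below writes a few annotations to a JSON file with the standard `json` module and reads them back unchanged. The field names (`item_id`, `codes`, `comments`, `links`) and the file name are hypothetical. They do not reproduce the structure of the Media Suite's own export, which is not described here.

```python
"""Minimal sketch: save annotations as JSON and read them back.

The field names and values are hypothetical and do not reproduce the
Media Suite's own export format.
"""
import json
from pathlib import Path

annotations = [
    {
        "item_id": "item-001",
        "codes": ["colour"],
        "comments": ["QUESTION: Is this the original broadcast?"],
        "links": ["http://www.wikidata.org/entity/Q42"],
    },
]

path = Path("annotations.json")
# ensure_ascii=False keeps accented characters (e.g. in Dutch titles) readable.
path.write_text(json.dumps(annotations, indent=2, ensure_ascii=False),
                encoding="utf-8")

# Any JSON-aware program can now load the same structure.
loaded = json.loads(path.read_text(encoding="utf-8"))
assert loaded == annotations
print(loaded[0]["codes"])
```

Because the file is plain text with an explicit structure, it can be opened in other tools or deposited alongside your README in a repository.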

Sources

Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3, no. 1 (2016): 160018.

Overview

Feeling overwhelmed by all the options? Below is an overview of all the annotation options discussed in this tutorial:

[Overview: all annotation options discussed in this tutorial]

Thesauri

Looking for a suitable thesaurus? Here are some thesauri you could consider for your research:

Atria Vrouwenthesaurus (Women's Thesaurus): https://atria.nl/bibliotheek-archief/collectie/thesaurus/?core=thes&facet.field=%7B!ex%3Dcategory%20key%3Dcategory%7Dcategory&facet.mincount=1&facet=true&fq%5B%5D=&fq%5B%5D=taal%3Aned&letter=&q=&rows=10&sort=keywordSort%2Basc&wt=json&start=0
Linked Open Vocabularies: https://lov.linkeddata.es/dataset/lov/
Onomy: https://onomy.org/
Schema: https://schema.org/
Network of Terms: https://termennetwerk.netwerkdigitaalerfgoed.nl/

Tutorial: Sharing Media Suite annotations following the FAIR principles

by Meg Weijers, Universiteit van Amsterdam, April 2024

Media Studies