Is the data enriched automatically?
Some of the collections in the Media Suite have been enriched automatically, via Optical Character Recognition (OCR) or Automatic Speech Recognition (ASR).
Automatic Speech Recognition (ASR)
ASR is a process applied to recorded, digitized audio materials to convert the audio signal into a textual representation. In CLARIAH WP5, this process is being applied to the entire audio-visual collection of The Netherlands Institute for Sound and Vision (NISV).
Since Version 3 (launched in July 2018), the textual outputs have been available for search and interactive navigation of the audio (radio and television) resources.
The most complete, up-to-date automatic speech recognition reports (and other statistics of the NISV collections) are available at the NISV collection statistics website.
Here you will find dynamic, regularly updated reports on the progress of ASR processing (e.g., as in the screenshot below). The numbers cover the entire NISV collection, showing the number of digitized items with a carrier and, of those, the number of items that have ASR.
- At this moment, the timeline charts start in year zero. This will be fixed during spring 2019.
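The relationship between the two numbers in these reports can be expressed as a coverage percentage. The following is a minimal sketch, not part of the statistics website: the function name `asr_coverage` and the counts are hypothetical and are made up for illustration only.

```python
def asr_coverage(items_with_carrier: int, items_with_asr: int) -> float:
    """Return the percentage of digitized items with a carrier that also have ASR."""
    if items_with_carrier < 0 or items_with_asr < 0:
        raise ValueError("counts must be non-negative")
    if items_with_asr > items_with_carrier:
        raise ValueError("items with ASR cannot exceed items with a carrier")
    if items_with_carrier == 0:
        # No digitized items for this period, so there is nothing to cover.
        return 0.0
    return 100.0 * items_with_asr / items_with_carrier


# Made-up counts per year: (items with a carrier, items with ASR).
# These are NOT real NISV figures.
example_counts = {2016: (1000, 250), 2017: (1200, 600)}

coverage_by_year = {
    year: asr_coverage(with_carrier, with_asr)
    for year, (with_carrier, with_asr) in example_counts.items()
}
```

With real counts read from the charts, the same calculation shows which periods of the collection are well covered by ASR and which are not.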
If you would like to use these graphs for a publication, you can use the direct link to the graph, zoom in to the timeline, and use the camera icon above the graph to download the result. Please don’t forget to cite the chart properly by including:
- Description of the image or title of the image (as it appears in each graphic’s caption, or adding more detail if necessary to interpret the graphic)
- Publisher: The Netherlands Institute for Sound and Vision
- Editors: Mari Wigham and Willem Melder
- Edition or version (charts on this website are updated automatically, so the date of update is the same as the date of download or copy)
- Access information (Website’s URL and/or graphic URL)
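The citation elements listed above can be assembled into a single reference string. This is only a sketch under stated assumptions: the class `ChartCitation`, its field names, and the output layout are hypothetical, and no particular citation style is prescribed here. The publisher and editor defaults are taken from the list above; the title and URL in the usage example are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChartCitation:
    """Hypothetical container for the citation elements of a downloaded chart."""

    title: str      # description or title, as in the graphic's caption
    url: str        # website URL and/or graphic URL
    accessed: date  # date of download, which also serves as the version
    publisher: str = "The Netherlands Institute for Sound and Vision"
    editors: str = "Mari Wigham and Willem Melder"

    def format(self) -> str:
        # The order and punctuation are an illustrative choice, not a required style.
        return (
            f"{self.title}. Edited by {self.editors}. {self.publisher}. "
            f"Version of {self.accessed.isoformat()}. {self.url}"
        )


# Placeholder values for illustration only.
citation = ChartCitation(
    title="Example chart title",
    url="https://example.org/chart",
    accessed=date(2019, 4, 4),
)
citation_text = citation.format()
```

Adapt the layout to the citation style required by your publication.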
Other sources of information about ASR processing:
- Overview: in the release notes for each version; see Release notes Version 3.
- Paper on Speech Recognition and Scholarly research
- The Collection Inspector tool: add the audio-visual collection of Sound and Vision and inspect the fields with the label ASR (use the search box).
Optical Character Recognition (OCR)
This type of automatic enrichment is available for these collections:
- The Newspaper collection of the National Library of the Netherlands
- The Desmet paper collection provided by the Eye Film Museum, enriched in collaboration with CREATE (University of Amsterdam)
(Last update: April 4, 2019). If you have any questions, please contact us.