Choosing a document typology is always a conundrum
document types · vocabulary · coar · metadata
As the title suggests, defining a document typology for a library platform is not a simple job. This article gives a very brief overview of the problem and of the choices made for SONAR.
Why a document typology
As always, we have to start the analysis with the end users (back to our personas!), in order to define the objectives of the typology.
- All user groups, but especially students and researchers, will use types for discovery, for example to refine search results or to identify a specific document precisely.
- Librarians, and information specialists more broadly, will use a typology to monitor a repository (OA rate, content and consultation statistics, etc.).
Monitoring is becoming increasingly important, particularly with the Swiss national strategy on Open Access and its action plan. The latter document states that at least books, book parts and articles should be monitored.
Comparing uses and standards
As good professionals of the library domain, we analysed existing standards.
Looking at the sister project RERO ILS, which uses RDA (Resource Description and Access), we observed that this standard recommends combining 4 different typologies to determine a document type:
- content types: 23 values possible
- media types: 8 values possible
- carrier types: 56 values possible
- mode of issuance: 4 values possible
Theoretically, 5’152 possible document types can be defined with RDA (not considering that multiple values can be used per category!), yet none of these combinations makes it possible to identify a thesis or to distinguish a book chapter from a journal article. RDA is therefore not adequate for the target audience.
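The combinatorics can be checked with a few lines of Python. This is only an illustrative sketch: the category sizes are those listed above, and the variable names are made up for the example. The product of all four categories is larger than the figure quoted in the text, which matches the product without the media types. Why they would be left out is not stated, so treat that reading as an assumption.

```python
from math import prod

# Number of values per RDA category, as listed above.
rda_categories = {
    "content type": 23,
    "media type": 8,
    "carrier type": 56,
    "mode of issuance": 4,
}

# One value per category, all four categories combined.
all_four = prod(rda_categories.values())

# Same calculation without the media types (assumption: this is how the
# figure quoted in the text was obtained).
without_media = prod(
    size for name, size in rda_categories.items() if name != "media type"
)

print(all_four)       # 41216
print(without_media)  # 5152
```

Either way, the order of magnitude supports the point made above: thousands of combinations, none of which names a thesis.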
Another standard typology, this one specifically dedicated to institutional repositories, is proposed by the OA repository association COAR. This vocabulary is published in RDF with persistent identifiers, mappings to other vocabularies and labels in about 15 languages, including the three official Swiss languages. It includes a list of 71 controlled values, designed in the form of a small thesaurus (hierarchy with broader and narrower terms as well as synonyms).
The COAR vocabulary covers our needs well. To verify this, we compared it with the document types used in some of the Swiss OA repositories (Arodes, BORIS, EDOC, EPFL, ETH, RERO DOC, UNIGE, Zora and of course COAR). The picture below gives an overview of this comparison. Please note that this is a working document!
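To make the idea of a "small thesaurus" more concrete, here is a minimal Python sketch of controlled values with broader terms and synonyms. It is not the COAR data model or its RDF serialisation: the `Concept` class, the helper functions and the synonym are hypothetical, and only a handful of labels are included for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One controlled value in a small thesaurus."""
    label: str
    broader: Optional[str] = None  # label of the parent concept, if any
    synonyms: list[str] = field(default_factory=list)


# Tiny excerpt for illustration only; the synonym is made up for the example.
concepts = {
    c.label: c
    for c in [
        Concept("thesis"),
        Concept("bachelor thesis", broader="thesis", synonyms=["bachelor's thesis"]),
        Concept("report"),
        Concept("research report", broader="report"),
        Concept("internal report", broader="report"),
    ]
}


def narrower(label: str) -> list[str]:
    """Return the labels whose broader term is `label`."""
    return sorted(c.label for c in concepts.values() if c.broader == label)


def resolve(term: str) -> Optional[str]:
    """Map a preferred label or a synonym to its preferred label."""
    for c in concepts.values():
        if term == c.label or term in c.synonyms:
            return c.label
    return None


print(narrower("report"))            # ['internal report', 'research report']
print(resolve("bachelor's thesis"))  # bachelor thesis
```

In the real vocabulary, each concept would additionally carry a persistent identifier and labels in several languages.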
You can also download this comparison in PDF.
Planned typology for SONAR (at this stage of the project)
The typology adopted for SONAR is shown below. It is inspired by the COAR vocabulary and consists of 14 main types. When creating metadata, the user selects a type and, where applicable, a subtype. In the discovery interface, a facet can display the types either on two levels (as currently in RERO DOC) or, in a simplified way, all on one level (as in RERO Explore).
Only a subset of the COAR types has been selected, on the one hand to obtain a simplified typology with only two levels, and on the other hand to keep only the relevant types. Some values, marked with “(S)” for “SONAR”, are not part of COAR but can easily be mapped to a corresponding broader COAR value.
|Type|Subtypes|
|---|---|
|2. book part||
|3. conference object|conference paper; conference paper not in proceedings; conference poster not in proceedings|
|4. contribution to journal|data paper; journal article (default)|
|7. non-textual object (S)|moving image|
|11. report|internal report; other type of report; report to funding agency; research report (default)|
|12. thesis|bachelor thesis; habilitation thesis (S); advanced studies thesis (S)|
|13. working paper||
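The two facet displays and the mapping of “(S)” values can be sketched in Python as follows. This is a hypothetical illustration, not the SONAR implementation: the dictionary holds only some of the entries visible in the table, all function names are invented, and it assumes that the broader COAR value of a SONAR-only subtype is simply its main type.

```python
from collections import Counter
from typing import Optional

# Partial two-level typology: only entries visible in the table above.
# "(S)" values are SONAR additions that are not part of COAR.
TYPOLOGY = {
    "conference object": ["conference paper", "conference poster not in proceedings"],
    "contribution to journal": ["data paper", "journal article"],
    "report": ["internal report", "research report"],
    "thesis": ["bachelor thesis", "habilitation thesis (S)", "advanced studies thesis (S)"],
    "working paper": [],
}

# Reverse lookup: subtype -> main type.
PARENT = {sub: main for main, subs in TYPOLOGY.items() for sub in subs}


def main_type(value: str) -> str:
    """Return the main type of a subtype, or the value itself otherwise."""
    return PARENT.get(value, value)


def facet_two_levels(records: list[str]) -> dict[str, Counter]:
    """Count records per main type, then per selected value."""
    facet: dict[str, Counter] = {}
    for value in records:
        facet.setdefault(main_type(value), Counter())[value] += 1
    return facet


def facet_one_level(records: list[str]) -> Counter:
    """Count records per selected value, ignoring the hierarchy."""
    return Counter(records)


def coar_value(value: str) -> Optional[str]:
    """SONAR-only values fall back to their main type (assumption);
    None means no COAR value can be derived from this sketch."""
    if not value.endswith("(S)"):
        return value
    parent = main_type(value)
    return None if parent.endswith("(S)") else parent


# Hypothetical records, each reduced to its selected type or subtype.
records = [
    "journal article",
    "journal article",
    "bachelor thesis",
    "habilitation thesis (S)",
    "working paper",
]

print(facet_two_levels(records)["thesis"])
print(facet_one_level(records).most_common(1))
print(coar_value("habilitation thesis (S)"))
```

Keeping a single reverse lookup from subtype to main type means both facet displays can be derived from the same stored value, so the choice between them stays a pure interface decision.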