Choosing a document typology is always a conundrum

As the title suggests, defining a document typology for a library platform is no simple task. This article gives a very brief overview of the problem and of the choices made for SONAR.

Why a document typology

As always, we have to start the analysis with the end users (back to our personas!), in order to define the objectives of the typology.

Monitoring is becoming more and more important, particularly with the Swiss national strategy on Open Access and its action plan. This document states that at least books, book parts and articles should be monitored.

Comparing uses and standards

As good professionals of the library domain, we analysed existing standards.

Looking at the sister project RERO ILS, which uses RDA (Resource Description and Access), we observed that this standard recommends combining 4 different typologies to determine a document type.

Theoretically, 5’152 possible document types can be defined with RDA (not considering that multiple values can be used per category!), yet this does not even make it possible to identify a thesis or to distinguish a book chapter from a journal article. RDA is therefore not suitable for our target audience.

Another standard typology, this one specifically dedicated to institutional repositories, is proposed by the OA repository association COAR. This vocabulary is published in RDF with persistent identifiers, mappings to other vocabularies and labels in about 15 languages, including the three official Swiss languages. It includes a list of 71 controlled values, organised as a small thesaurus (a hierarchy with broader and narrower terms as well as synonyms).

The COAR vocabulary covers our needs well. To verify this, we compared the document types used in some of the Swiss OA repositories (Arodes, BORIS, EDOC, EPFL, ETH, RERO DOC, UNIGE, Zora and of course COAR). The picture below gives an overview of this comparison. Please note that this is a working document!

Table: typology of Swiss IR (working document)

You can also download this comparison as a PDF.

Planned typology for SONAR (at this stage of the project)

The typology adopted for SONAR is shown below. It is inspired by the COAR vocabulary and consists of 14 main types. When creating the metadata, the user selects a type and, where applicable, a subtype. In the discovery interface, a facet can display the types either in two levels (as currently in RERO DOC) or in a simplified way, all in one level (as in RERO Explore).

Only a subset of the COAR types has been selected, on the one hand to obtain a simplified typology with only two levels, and on the other hand to keep only the relevant types. Some values, marked with “(S)” for “SONAR”, are not part of COAR but can easily be mapped to a corresponding broader COAR value.

Type                          Subtype

1. book
2. book part
3. conference object          conference paper
                              conference paper not in proceedings
                              conference poster
                              conference poster not in proceedings
                              conference proceedings
                              other (S)
4. contribution to journal    data paper
                              journal article (default)
                              review article
                              other (S)
5. dataset
6. lecture
7. non-textual object (S)     moving image
                              still image
                              cartographic material
                              musical notation
8. patent
9. periodical                 journal
10. preprint
11. report                    internal report
                              other type of report
                              policy report
                              project deliverable
                              report part
                              report to funding agency
                              research report (default)
                              technical report
                              other (S)
12. thesis                    bachelor thesis
                              doctoral thesis
                              master thesis
                              habilitation thesis (S)
                              advanced studies thesis (S)
                              other (S)
13. working paper
14. other
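
To make the two-level structure more concrete, here is a minimal Python sketch of how such a typology could be represented and used. It is not SONAR code: the names TYPOLOGY, DEFAULT_SUBTYPE, SONAR_ONLY_SUBTYPES, SONAR_ONLY_TYPES, resolve, coar_label and facet_entries are all hypothetical. The sketch also makes two assumptions that the article does not confirm: that the broader COAR value of an "(S)" subtype is simply its parent type, and that the one-level facet lists the most specific value of each branch.

```python
from typing import Optional

# The 14 main types and their subtypes, copied from the table above.
TYPOLOGY: dict[str, list[str]] = {
    "book": [],
    "book part": [],
    "conference object": [
        "conference paper", "conference paper not in proceedings",
        "conference poster", "conference poster not in proceedings",
        "conference proceedings", "other",
    ],
    "contribution to journal": [
        "data paper", "journal article", "review article", "other",
    ],
    "dataset": [],
    "lecture": [],
    "non-textual object": [
        "moving image", "still image", "cartographic material", "musical notation",
    ],
    "patent": [],
    "periodical": ["journal"],
    "preprint": [],
    "report": [
        "internal report", "other type of report", "policy report",
        "project deliverable", "report part", "report to funding agency",
        "research report", "technical report", "other",
    ],
    "thesis": [
        "bachelor thesis", "doctoral thesis", "master thesis",
        "habilitation thesis", "advanced studies thesis", "other",
    ],
    "working paper": [],
    "other": [],
}

# Subtypes marked "(default)" in the table.
DEFAULT_SUBTYPE = {
    "contribution to journal": "journal article",
    "report": "research report",
}

# Values marked "(S)" in the table, i.e. not part of COAR.
SONAR_ONLY_SUBTYPES = {
    ("conference object", "other"),
    ("contribution to journal", "other"),
    ("report", "other"),
    ("thesis", "habilitation thesis"),
    ("thesis", "advanced studies thesis"),
    ("thesis", "other"),
}
SONAR_ONLY_TYPES = {"non-textual object"}


def resolve(doc_type: str, subtype: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Validate a (type, subtype) selection and fill in the default subtype, if any."""
    if doc_type not in TYPOLOGY:
        raise ValueError(f"unknown type: {doc_type!r}")
    if subtype is None:
        return doc_type, DEFAULT_SUBTYPE.get(doc_type)
    if subtype not in TYPOLOGY[doc_type]:
        raise ValueError(f"{subtype!r} is not a subtype of {doc_type!r}")
    return doc_type, subtype


def coar_label(doc_type: str, subtype: Optional[str]) -> Optional[str]:
    """Return the label to map to COAR.

    Assumption: an "(S)" subtype falls back to its parent type. The article
    does not say which COAR value "non-textual object" maps to, so that type
    alone yields None here.
    """
    if subtype is not None and (doc_type, subtype) not in SONAR_ONLY_SUBTYPES:
        return subtype
    return None if doc_type in SONAR_ONLY_TYPES else doc_type


def facet_entries(two_levels: bool = True) -> list[str]:
    """List facet entries in two levels, or flattened into a single level."""
    entries: list[str] = []
    for doc_type, subtypes in TYPOLOGY.items():
        if two_levels:
            entries.append(doc_type)
            entries.extend(f"{doc_type} > {sub}" for sub in subtypes)
        elif subtypes:
            # Qualify "other" so that it stays unambiguous in a flat list.
            entries.extend(
                f"{doc_type}: other" if sub == "other" else sub for sub in subtypes
            )
        else:
            entries.append(doc_type)
    return entries
```

With this sketch, resolve("report") fills in the default subtype "research report", and coar_label("thesis", "habilitation thesis") falls back to "thesis". Keeping the "(S)" markers in separate sets rather than in the labels themselves means the same labels can be reused for validation, export and display.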