Digital humanities and the technologies of the semantic web: decolonizing description for the sake of digital humanities

Organizers: Marijana Tomić, Manuel Burghardt, Mirna Willer, Anne Gilliland & Gordon Dunsire

In cooperation with: Theresa Zammit Lupi

Despite many efforts to standardize the description of manuscripts and early prints, or at least to agree on a best-practice list of data elements, full standardization has not been achieved, because almost every manuscript and early print is unique and requires some specific element of description. Being produced by human hand rather than by machine, almost any manuscript possesses some unique characteristic. These characteristics are typically described as carefully as ordinary manuscript characteristics, as they usually bear witness to the tradition and culture of the author, scribe, reader, scriptorium, place of production, etc. It is usually this data that humanities researchers need in order to interpret the material, explain it knowledgeably and draw conclusions about it. It can also change or add to end users’ perception of a particular period, author, scribe, nation, etc. The description of this kind of material therefore results in a complex set of data elements and requires, apart from skills in bibliographic description, skills in palaeography, codicology, history, literacy studies, etc. Typically, these descriptions are characterized by local rules and by diverse levels of detail.

The problem of standardized manuscript and early print description arose mostly with the advent of machine-readable cataloguing, and it grew with large-scale projects to digitize manuscripts and early prints. The introduction of semantic web technologies and the concept of the Open World Assumption (OWA) added another layer of complexity, but also of possibilities.

The workshop is based on the following assumptions:

(1) With the advent of digital humanities, the needs of heritage institutions’ users from humanities research have undergone a paradigm shift, driven by the advanced use of information technology, quantitative research, project-based research involving remote scientists from different disciplines, re-use of research data, etc.

(2) Manuscripts and early prints held in heritage institutions (with emphasis on library and archival collections) are of special interest for humanities research. The description of that material should therefore serve humanities researchers (in terms of data elements, level of description and access points) so that they can search the collection and find enough information to identify items of interest (whether previously known or unknown).

(3) The prerequisite for that is to identify the features of interest to humanities researchers (taking into account the diverse nature of research in the humanities and, accordingly, the diverse interests of researchers), to use verified vocabularies (the diversity of research is reflected in the diversity of vocabularies in use, so collaborating systems have to accommodate multiple vocabularies), and to map those features to the metadata element sets used for bibliographic and/or archival description of manuscripts and early prints. Close cooperation between librarians and archivists on one side and humanities researchers on the other is needed in order to achieve a standardized, or at least harmonized and meaningful, description of that material.

(4) The semantic web environment requires data sets and their vocabularies to be identified and represented in RDF. It therefore requires a high level of standardization of description, and promotes harmonization of the vocabularies used for descriptive purposes by different communities.
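
As a rough illustration of assumptions (3) and (4), the sketch below represents two local descriptions as subject-predicate-object triples and rewrites their local element names to a shared property through a mapping table. Every name in it (the `localLib:`, `localArch:` and `shared:` prefixes, the `ms:` identifiers, the `harmonize` function) is hypothetical and not taken from any published vocabulary; an actual project would use real URIs and an RDF library rather than plain tuples.

```python
# Minimal sketch: harmonizing locally named elements to a shared vocabulary.
# Every prefix, property name and identifier below is invented for illustration.

# Triples as (subject, predicate, object), as two institutions might record them.
library_triples = [
    ("ms:A1", "localLib:scriptType", "Glagolitic"),
    ("ms:A1", "localLib:support", "parchment"),
]
archive_triples = [
    ("ms:B7", "localArch:writingSystem", "Latin"),
    ("ms:B7", "localArch:bindingNote", "later rebinding"),
]

# Hypothetical mapping from local element names to shared properties.
LOCAL_TO_SHARED = {
    "localLib:scriptType": "shared:script",
    "localArch:writingSystem": "shared:script",
    "localLib:support": "shared:material",
}


def harmonize(triples, mapping):
    """Return triples with mapped predicates rewritten; unmapped ones are kept as-is."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


merged = harmonize(library_triples + archive_triples, LOCAL_TO_SHARED)

# One query now reaches both collections through the shared property.
scripts = {s: o for s, p, o in merged if p == "shared:script"}
print(scripts)
```

Local elements without a shared equivalent (here the binding note) are carried over unchanged, so a unique characteristic of one item is not lost when descriptions are harmonized.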

The goals of the workshop are to:

(1) Discuss the main questions concerning description of manuscripts and early prints

a. Existence of (inter)national standards, guidelines, prescriptive lists, cataloguing codes, etc., for the description of manuscripts and early prints

b. Vocabularies (ontologies) used for the description of manuscripts and early prints, and possibilities for their harmonization and use by different communities

(2) Discuss the representation of bibliographic data for manuscripts and early prints as linked data

a. Interaction of the manuscript community with the semantic web community

b. Possibilities of using local description schemas and linking them to global ones (if such exist, or at least to ones that are considered best practice)

(3) Discuss the implications of the information multiverse for bibliographic and archival information

(4) Discuss the problems of research data in humanities research and the concept of the Open World Assumption (OWA). Manuscript and early print description has to take OWA into account: “The absence of a statement is not a statement about absent data; the data may be stated elsewhere or at another time,” and “the absence of a statement is not a statement of non-existence”. Specifically:

a. Features of humanities research that distinguish it from other sciences, one of them being the place given to differences of opinion, whereas in other sciences statements are expected to be either true or false

b. The Open World Assumption in linked data concepts and technology, where data from different sources, viewpoints, world assumptions and values are linked and juxtaposed so that users can form their own opinion. Building a corpus for humanities research, like the research itself, takes a long time; therefore any complex bibliographic interrelations between published and unpublished texts should be organized and described.

A case study will be used to explore how to use RDF to describe manuscripts and early prints in multiple contexts and from multiple perspectives.
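
To make the Open World Assumption and the idea of multiple perspectives a little more concrete, the following sketch stores statements together with their source and answers lookups with "stated" or "unknown" rather than "true" or "false". It is only an illustration under stated assumptions: the source labels, identifiers, property names and both functions are hypothetical, and plain tuples stand in for RDF statements.

```python
# Minimal sketch: statements from several sources, read under the Open World Assumption.
# Source names, identifiers and property names are hypothetical.

# Each statement keeps its source, so differing opinions can sit side by side.
statements = [
    ("ms:A1", "shared:dateOfProduction", "14th century", "catalogue"),
    ("ms:A1", "shared:dateOfProduction", "early 15th century", "palaeographer"),
    ("ms:A1", "shared:script", "Glagolitic", "catalogue"),
]


def opinions(subject, predicate):
    """Return every recorded value for a property, grouped by source."""
    found = {}
    for s, p, o, source in statements:
        if s == subject and p == predicate:
            found.setdefault(source, []).append(o)
    return found


def has_value(subject, predicate, value):
    """Open-world lookup: True if stated, otherwise None (unknown), never False."""
    for s, p, o, _source in statements:
        if (s, p, o) == (subject, predicate, value):
            return True
    return None  # absence of a statement is not a statement of non-existence


# Two sources disagree on the date; both views are returned, neither is discarded.
print(opinions("ms:A1", "shared:dateOfProduction"))

# Nothing is stated about music notation, so the answer is "unknown", not "no".
print(has_value("ms:A1", "shared:hasMusicNotation", "yes"))
```

Returning `None` instead of `False` is the design choice that mirrors the quoted principle: the data may be stated elsewhere or at another time.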

Participants will be encouraged in the workshop call (see below) to bring their own examples of describing manuscripts, early prints and/or archival records in multiple contexts from multiple perspectives, to be raised during the discussion. Short workshop presentations will be scheduled under the appropriate topic.

The workshop will be divided into two parts:

(1) Discussing the description of manuscripts and early prints

a. Presentations from the point of view of information sciences (bibliographic description, standards, metadata) and humanities (needs of users of heritage collections from humanities)

b. Description of and access to heritage collections: discussion with the goal of comparing the description principles, standards and guidelines used for describing manuscripts and early prints

c. Description of manuscripts (codex or archival document) using different vocabularies or metadata element sets[1]

d. Closing discussion with conclusions on standards used for describing manuscripts and early prints: mapping and/or alignment – harmonization

(2) Semantic web: publishing vocabularies in OMR (Open Metadata Registry)[2]

The final part of the workshop will be a discussion of the workshop results.

Topics for the call for short workshop presentations:

• Bibliographic information organization in the semantic web: bibliographic models, standards and practices

• The local and the global, the global and the local: bibliographic and technical methodologies

• Ontologies for humanities research in the semantic web

• Digital Humanities Research Across Disciplines and Cultures

• Research data and digital humanities

• Bibliographic information organization and digital humanities

• Semantic web and digital humanities

• Heritage collections and digital humanities research

• Description in liminal spaces: subverting systems to liberate description

• Conceptual modelling for cultural heritage and recordkeeping

• Moving across diverse standards

• Human rights and description: exposing archives, exposing people

• Archives, libraries, museums integration through metadata

• Building and discovering/uncovering virtual collections

• Decolonising international/national standards

• The role of independent/community/radical descriptive practices

• Theoretical foundations and possible new paradigms in information sciences

[1] Examples can be taken from:

(1) RBMS Controlled Vocabularies: Controlled Vocabularies for Use in Rare Book and Special Collections Cataloging, developed and maintained by the Bibliographic Standards Committee of the Rare Books and Manuscripts Section (ACRL/ALA). The thesauri provide standardized vocabulary for retrieving special collections materials by form, genre, or by various physical characteristics that are typically of interest to researchers and special collections librarians, and for relating materials to individuals or corporate bodies

(2) Glossary of the BL Catalogue of Illuminated Manuscripts (Michelle P. Brown, Understanding Illuminated Manuscripts: A Guide to Technical Terms (J. Paul Getty Museum: Malibu and British Library: London, 1994))

(3) Vocabulaire codicologique

(4) Theresa Zammit Lupi: Terminology for music manuscript codicological description

(5) TEI Manuscript description