Making digital scholarly editions based on Domain Specific Languages

Authors

Simone Zenzaro
Institute for Computational Linguistics “A. Zampolli”
https://orcid.org/0009-0009-4361-345X
Federico Boschetti
Institute for Computational Linguistics “A. Zampolli”
https://orcid.org/0000-0002-7810-7735
Angelo Mario Del Grosso
Istituto di linguistica computazionale Consiglio Nazionale delle Ricerche
https://orcid.org/0000-0002-4867-6304

Synopsis

Over time, textual scholars have refined the methods used to represent the codicological, paleographic, philological, and other aspects relevant to the study of documents (i.e. material objects) and texts (i.e. immaterial entities). Following a general trend observable over the last four centuries, not only in the STEM disciplines but in every domain of knowledge, the specialized languages that scholars adopt to represent their objects of study have grown both more precise and more concise. Comparing critical apparatuses sampled across a wide time span is enough to verify this.

It is therefore surprising that, in the digital age, this collective effort to optimize the representation and transmission of domain-specific knowledge has been penalized: scholars have adopted either verbose solutions (e.g. XML encoding) or, at the opposite extreme, non-verbal ones (e.g. GUIs).

Classical scholarly practices represent a valuable synthesis of centuries of knowledge in specific domains, so it is paramount to preserve such standards.

It is equally important to give scholars a methodology that retains, and extends, all the expressiveness needed to address textual challenges.

Digital scholarly editing has, in turn, produced and established standards of its own.

The methodology based on Domain Specific Languages (DSLs; see note 1) requires the definition of a formal language derived from well-established ecdotic practices, which already constitute a set of editorial conventions and convey an analytical representation of the information in the text. For example, critical apparatuses are already a quasi-formal domain language and are therefore suitable for definition as a DSL via a context-free grammar.
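To make the idea of a grammar-defined apparatus language concrete, the following is a minimal sketch in Python. The entry syntax (`12 arma] arma A B : arva C`), the grammar, and every name in the code are hypothetical and chosen only for illustration; they are not the DSL described in the cited work.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar, invented for this sketch (EBNF):
#   entry   ::= LINE lemma "]" reading (":" reading)*
#   lemma   ::= WORD+
#   reading ::= WORD+ SIGLUM+
# LINE is a number, WORD is lowercase, SIGLUM is a single capital letter.
TOKEN = re.compile(
    r"\s*(?:(?P<LINE>\d+)|(?P<SIGLUM>[A-Z])\b|(?P<WORD>[a-z]+)|(?P<PUNCT>[\]:]))"
)

@dataclass
class Reading:
    text: str
    witnesses: list

@dataclass
class Entry:
    line: int
    lemma: str
    readings: list

def tokenize(source):
    """Split an apparatus entry into (kind, value) pairs."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind, value=None):
        if self.pos >= len(self.tokens):
            raise SyntaxError(f"expected {value or kind}, found end of input")
        tok_kind, tok_value = self.tokens[self.pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, found {tok_value!r}")
        self.pos += 1
        return tok_value

    def many(self, kind):
        # One or more tokens of the same kind.
        items = [self.expect(kind)]
        while self.peek() == kind:
            items.append(self.expect(kind))
        return items

    def reading(self):
        words = self.many("WORD")
        sigla = self.many("SIGLUM")
        return Reading(" ".join(words), sigla)

    def entry(self):
        line = int(self.expect("LINE"))
        lemma = " ".join(self.many("WORD"))
        self.expect("PUNCT", "]")
        readings = [self.reading()]
        while self.peek() == "PUNCT":
            self.expect("PUNCT", ":")
            readings.append(self.reading())
        if self.peek() is not None:
            raise SyntaxError("trailing input after entry")
        return Entry(line, lemma, readings)

def parse_entry(source):
    return Parser(tokenize(source)).entry()
```

Under these assumptions, `parse_entry("12 arma] arma A B : arva C")` yields a structured entry with two readings, while an entry whose reading lacks a witness siglum is rejected with a `SyntaxError` instead of being silently accepted.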

The next step is to feed the DSL to a rich text editing tool so that the tool can interpret the language. Scholars thus obtain a reusable and modular computer-assisted environment that eases the creation and analysis of the scholarly edition. At the same time, computational functionalities empower the process with, among others, multimodal search; strategies for classifying and predicting philological phenomena; consistent and systematic checks of editorial conventions and errors; and the analysis and retrieval, via machine learning algorithms, of information deduced from the context or from external sources (e.g. vocabularies and corpora).
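As an illustration of the kind of systematic check such an environment could run, the sketch below reports witness sigla that are cited in apparatus entries but never declared. The data layout (a list of declared sigla and a mapping from entry labels to cited sigla) and the function name are assumptions made for this example only.

```python
def undeclared_sigla(declared, cited_by_entry):
    """Return, sorted, the sigla cited in some entry but absent from the witness list.

    declared:       iterable of sigla listed in the edition's conspectus (assumed layout)
    cited_by_entry: dict mapping an entry label to the sigla it cites (assumed layout)
    """
    cited = {siglum for sigla in cited_by_entry.values() for siglum in sigla}
    return sorted(cited - set(declared))


# Hypothetical data: witness D is cited at line 14 but was never declared.
problems = undeclared_sigla(
    declared=["A", "B", "C"],
    cited_by_entry={"12": ["A", "B"], "14": ["C", "D"]},
)
```

Because the DSL gives each entry a predictable structure, checks of this kind can run over the whole edition rather than relying on proofreading.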

Moreover, a fully collaborative environment allows scholars to contribute to an ongoing cooperative edition. This makes it possible to widen access to the text to scholars, students, practitioners, and volunteers.

Finally, this approach ensures compatibility with existing standards: a standard-compliant representation of the edited text can be both imported into and exported from the DSL, so that the edition can interoperate with the digital humanities community and its galaxy of related tools.
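A minimal sketch of the export direction follows, assuming the target standard is TEI and using its `app`, `lem`, and `rdg` elements with the `wit` attribute. The function name and its input layout are hypothetical, and a real export would also need the surrounding TEI document and witness declarations.

```python
import xml.etree.ElementTree as ET

def to_tei_app(lemma, lemma_witnesses, variants):
    """Serialize one apparatus entry as a TEI-style <app> element.

    lemma:           accepted reading
    lemma_witnesses: sigla supporting the lemma
    variants:        list of (reading_text, sigla) pairs (assumed layout)
    """
    def wit(sigla):
        # TEI points to witnesses with space-separated '#id' references.
        return " ".join(f"#{siglum}" for siglum in sigla)

    app = ET.Element("app")
    lem = ET.SubElement(app, "lem", wit=wit(lemma_witnesses))
    lem.text = lemma
    for text, sigla in variants:
        rdg = ET.SubElement(app, "rdg", wit=wit(sigla))
        rdg.text = text
    return ET.tostring(app, encoding="unicode")


xml_fragment = to_tei_app("arma", ["A", "B"], [("arva", ["C"])])
# xml_fragment == '<app><lem wit="#A #B">arma</lem><rdg wit="#C">arva</rdg></app>'
```

The scholar writes the compact form, and the verbose encoding is generated from it rather than typed by hand.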

The DSL-based methodology is well known, but it is exploited mostly outside the scholarly editing domain. Being a formal language, a DSL has its roots in formal language theory, and the first attempts sought to use such languages to describe natural languages. That path proved infeasible because of the ambiguity of natural languages, but this is not the case in the philological domain. Markdown is an example of a commonly used DSL, but its scope is the general-purpose description of a document's structure; it is not meant to describe philological textual phenomena. Leiden+ (see note 2), instead, is a good example of a DSL applied to the traditional conventions of papyrology.

Adopting a DSL for the scholarly editing process allows the philologist to remain close to classical practices while making it possible to enhance the process with digital capabilities. The only constraint this approach imposes is the elimination of ambiguity.

 

1 For further details see S. Zenzaro, A. M. Del Grosso, F. Boschetti, G. Ranocchia. 2022. “Verso la definizione di criteri per valutare soluzioni di scholarly editing digitale: il caso d’uso GreekSchools”, in Proceedings of AIUCD2022. Lecce. DOI:10.6092/unibo/amsacta/6848

2 Cf. https://papyri.info/docs/leiden_plus

Published

April 29, 2025