
From the beginning of April, the Historical Corpus of Dutch (HCD) can be consulted online; it is made available by the Institute for the Dutch Language.
The HCD is a diachronic, regionally balanced, multi-genre corpus of written Dutch. It is constructed along three variational dimensions: time, region and genre. Built by researchers from the VUB and Leiden University, it aims to fill an important gap in the research infrastructure for historical Dutch, which for a long time lacked a balanced corpus with data from all centuries and from different regions and genres.
Variational dimensions:
Time: The HCD covers the sixteenth to the nineteenth century. Textual material was chosen from around the middle of each century: 1550, 1650, 1750, and 1850. For each of these dates, a margin of 20 years before and 20 years after the date was built in to find sufficient sources, resulting in four time periods: 1530-1570, 1630-1670, 1730-1770, and 1830-1870.
Region: The HCD comprises textual material from four regions in the northern and southern Low Countries: Holland and Zeeland in the north (in the present-day Netherlands), and Brabant and Flanders in the south (in present-day Belgium). Holland and Brabant can be considered central regions, while Zeeland and Flanders occupy a more peripheral position, so the corpus can also be used to investigate centre-periphery dynamics. Texts originate from larger cities such as Amsterdam, Antwerp, Middelburg, and Ghent, but also from smaller towns and villages (e.g. Arnemuiden, Strijpen).
Genre: The HCD comprises administrative texts, ego-documents, and pamphlets. The administrative texts are handwritten, formal texts, such as town council meeting reports and resolutions. The authors of these texts were generally used to writing because of their profession. The sources for this genre were related to guilds or to industry on the one hand, and to the general administration on the other. Ego-documents are less formal, handwritten texts such as travelogues, diaries and chronicles of local events or family history. The pamphlets are published texts, mostly commentaries or polemics about current affairs, politics or religious topics, but they also include public ordinances and regulations. Because of this variety, the printed pamphlets range along a continuum from more to less formal.
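The three dimensions together define a grid of sampling cells. As a minimal sketch, assuming nothing beyond the values stated above, the grid can be written out as follows. The variable and function names are illustrative only and are not part of the HCD or of any tool that serves it.

```python
from itertools import product

# Values taken from the description above; all names here are illustrative.
MIDPOINTS = [1550, 1650, 1750, 1850]
MARGIN = 20  # years allowed before and after each midpoint
REGIONS = ["Holland", "Zeeland", "Brabant", "Flanders"]
GENRES = ["administrative texts", "ego-documents", "pamphlets"]

# Each time period is the midpoint plus or minus the margin.
periods = [(year - MARGIN, year + MARGIN) for year in MIDPOINTS]

# One sampling cell per combination of period, region and genre.
cells = list(product(periods, REGIONS, GENRES))


def period_for(year):
    """Return the (start, end) period containing `year`, or None if it falls in a gap."""
    for start, end in periods:
        if start <= year <= end:
            return (start, end)
    return None


print(periods)             # the four time periods as (start, end) pairs
print(len(cells))          # number of period x region x genre cells
print(period_for(1655))    # a year inside the second period
print(period_for(1700))    # a year between two periods
```

Under these assumptions the grid has 4 x 4 x 3 = 48 cells, and a text dated between two periods (for example 1700) belongs to none of them.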
Procedure
All textual materials were manually transcribed from photographs of the original documents and checked multiple times. When we used existing transcriptions, as in the case of some administrative texts, these were checked against the original archival material. References to publications, libraries and archives can be found in Van de Voorde (2022).
Scope
The HCD consists of 209 texts, together accounting for 463,248 words: 58 administrative texts, 60 ego-documents and 91 pamphlets. The design aimed for 10,000 words per region and per period for each genre. For the sake of representativeness, these 10,000 words were preferably spread over multiple documents, which in most cases are fragments rather than complete texts. The figure below, taken from Van de Voorde et al. (2023), shows the number of words per genre, period and region. Most deviations from the intended 10,000 words can be found in the sixteenth century. A smaller gap can be noted for the nineteenth-century ego-documents from Brabant.

The value of this new corpus is illustrated by means of case studies in Van de Voorde, Rutten, Vosters, Van der Wal & Vandenbussche 2023.
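The figures given under Scope can be compared with the size the design implies. The sketch below is a plain arithmetic check that uses only the numbers stated in this description; the variable names are illustrative, and it assumes the target of 10,000 words applies equally to every combination of period, region and genre.

```python
# Figures stated in the corpus description; names are illustrative.
TARGET_PER_CELL = 10_000            # intended words per region, period and genre
N_PERIODS, N_REGIONS, N_GENRES = 4, 4, 3
ACTUAL_WORDS = 463_248
TEXTS_PER_GENRE = {
    "administrative texts": 58,
    "ego-documents": 60,
    "pamphlets": 91,
}

# Size of the corpus if every cell reached its target.
target_words = TARGET_PER_CELL * N_PERIODS * N_REGIONS * N_GENRES  # 480,000

# Difference between the intended and the actual size.
shortfall = target_words - ACTUAL_WORDS  # 16,752

# Share of the intended size that was reached.
coverage = ACTUAL_WORDS / target_words

# The per-genre text counts should add up to the stated total of 209.
total_texts = sum(TEXTS_PER_GENRE.values())

print(f"target: {target_words:,} words")
print(f"shortfall: {shortfall:,} words")
print(f"coverage: {coverage:.1%}")
print(f"texts: {total_texts}")
```

On these assumptions the corpus reaches about 96.5% of its intended 480,000 words, which matches the description of the deviations as limited to a few cells.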
Literature
Van de Voorde, Iris. 2022. Pluricentricity in language history: Building blocks for an integrated history of Dutch (16th-19th century). Amsterdam: LOT. Open access.
Van de Voorde, Rutten, Vosters, Van der Wal & Vandenbussche. 2023. ‘Historical Corpus of Dutch: A new multi-genre corpus of Early and Late Modern Dutch’. Taal & Tongval 75: 114-132. Open access.