Beserman Corpus

You are viewing the main page of the Beserman Udmurt corpus. The corpus contains transcribed texts of various genres recorded during fieldwork in 2003–2015. It currently contains about 65,000 tokens and will grow as more texts are recorded and transcribed.

Several oral genres are represented in the corpus, including monologues, dialogues (some of them spontaneous), and recordings of linguistic experiments. All texts have been morphologically annotated, glossed and translated into Russian. To see the full glossing and translations of sentences, click “Display options” and choose the glossed output layout. By default, search results are displayed in random order. If you would like to read a whole text sentence by sentence, you can download it in PDF format from this page.

To provide online access to the corpus, we used the search platform developed for the Eastern Armenian National Corpus (EANC). Instructions on how to make search queries on this platform are available on the EANC help page.

We also invite everyone interested in the Udmurt language to visit the Literary Udmurt corpus, which we developed earlier.