Boglárka VERMEKI
Keywords: corpus linguistics, child language, spontaneous speech, language teaching


The purpose of this study is to reveal the characteristics of children's language use, with particular attention to the composition of their vocabulary, by means of corpus linguistic methods such as word frequency and keyword analyses. The research is based on the KorSzak Children's Language Corpus, a dynamic corpus built for pedagogical purposes that currently consists of 73 recordings of twenty-seven children aged 11–15. In the video and audio recordings, the child informants talk freely in pairs or small groups about given topics (e.g. animals, leisure activities). The study presents the informants' most frequently used words in detail, classifies them by word class, identifies the lexical units associated with them, and examines their lexico-grammatical patterns. As an outlook, the presentation discusses how the findings can be applied in language teaching.
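The two analyses named above can be illustrated with a minimal sketch. A frequency list counts how often each word form occurs in the corpus. A keyword list compares normalized frequencies in a focus corpus (here, children's speech) against a reference corpus. The abstract does not state which tool or keyness statistic was used, so the following assumes Kilgarriff's "simple maths" score, (fpm_focus + n) / (fpm_ref + n), where fpm is frequency per million tokens and n is a smoothing constant (n = 1 is an assumption here). The tokenizer, the function names, and the two toy Hungarian strings are hypothetical illustrations. They are not KorSzak data, and the sketch does not lemmatize or tag word classes as a real corpus pipeline would.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer (hypothetical); Python's Unicode-aware \w
    # keeps Hungarian accented letters such as "á" inside words.
    return re.findall(r"\w+", text.lower())


def per_million(counts: Counter, total: int) -> dict[str, float]:
    # Normalize raw counts to frequency per million tokens.
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def keywords(focus_tokens: list[str], ref_tokens: list[str],
             n: float = 1.0, top: int = 10) -> list[tuple[str, float]]:
    # Assumed "simple maths" keyness: (fpm_focus + n) / (fpm_ref + n).
    # Words absent from the reference corpus get fpm_ref = 0.
    f_pm = per_million(Counter(focus_tokens), len(focus_tokens))
    r_pm = per_million(Counter(ref_tokens), len(ref_tokens))
    scores = {w: (f_pm[w] + n) / (r_pm.get(w, 0.0) + n) for w in f_pm}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Invented toy texts, for illustration only.
    focus = tokenize("a kutya játszik a kutya fut a macska alszik")
    reference = tokenize("a város nagy a ház szép a kutya ugat")
    print("Frequency list:", Counter(focus).most_common(3))
    for word, score in keywords(focus, reference, top=6):
        print(f"{word}\t{score:.2f}")
```

With a larger n the score favors more frequent words, while a small n pushes rare words that are missing from the reference corpus to the top, as happens with the toy data here.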

Author Biography

Boglárka VERMEKI

University of Belgrade, Faculty of Philology
Department of Hungarian Studies
Belgrade, Serbia


24. 11. 2023.