
Linguistic corpus of native Czech and Slovak learners acquiring Italian




Czech-IT! is an open-source, open-data linguistic corpus of native Czech learners acquiring Italian. It is composed of texts produced in different kinds of communicative situations, so as to reveal a wide range of phenomena.



The primary goal of this project is to provide a queryable, searchable corpus for research in Second Language Acquisition.

Keeping the data strictly separate from its analysis makes the research process transparent and allows hypotheses to be tested from different perspectives and theoretical frameworks.


Second Language Acquisition (SLA) is a fertile field of linguistic research, approached both from applied and empirical standpoints and from theoretical and general perspectives.

The corpus is meant to support comparative and contrastive analyses of the linguistic patterns that emerge across languages along the acquisition path.



The project is based on quantitative analyses of the dataset, which covers many different kinds of communicative situations in order to capture a wide range of linguistic behaviors and styles.

Browse the Corpus


The corpus is stored in a tabular file format (a spreadsheet) in which the entries are listed in raw form. On the website, entries can be browsed by communicative situation, which yields a diamesic taxonomy, i.e. one based on the medium of communication (for example, written vs. spoken).
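As an illustration of browsing entries by communicative situation, the sketch below reads a spreadsheet exported as CSV and groups the raw entries by situation. It is a minimal sketch, not the project's code: the column names (`learner_id`, `situation`, `text`) and the sample rows are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows, NOT taken from the corpus. The column names
# (learner_id, situation, text) are hypothetical, not the real schema.
SAMPLE_CSV = """learner_id,situation,text
L01,written,Mi chiamo Jana e studio italiano.
L02,spoken,Io abito a Olomouc.
L01,spoken,Mi piace la pizza.
"""


def group_by_situation(csv_text: str) -> dict[str, list[dict[str, str]]]:
    """Return the raw entries indexed by their communicative situation."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    # DictReader uses the first line as the header row.
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["situation"]].append(row)
    return dict(groups)


if __name__ == "__main__":
    for situation, entries in sorted(group_by_situation(SAMPLE_CSV).items()):
        print(situation, len(entries))
```

With the sample above, the spoken group holds two entries and the written group one; a real export would only need the column name passed to `row[...]` adjusted.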

Computational analyses

Computational analyses such as tokenization, part-of-speech (POS) tagging and word counting are provided so that the corpus is fully queryable. The data can also be queried by several fields describing the learners' knowledge.
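The tokenization and word-counting steps can be sketched as follows. This is a deliberately naive regular-expression tokenizer, assumed for illustration since the project's actual tools are not described here; it keeps Italian elided forms such as `l'` as separate tokens and excludes punctuation from the counts. POS tagging needs a trained tagger and is not reproduced.

```python
import re
from collections import Counter

# Naive tokenizer (an assumption, not the project's tokenizer):
# a run of word characters with an optional trailing apostrophe
# (so "l'italiano" -> "l'", "italiano"), or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+'?|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def word_counts(sentences: list[str]) -> Counter:
    """Count word tokens case-insensitively, skipping punctuation."""
    counts: Counter = Counter()
    for sentence in sentences:
        for token in tokenize(sentence):
            if token[0].isalnum():  # punctuation tokens start with a non-alphanumeric
                counts[token.lower()] += 1
    return counts
```

For example, `tokenize("Studio l'italiano a Olomouc.")` yields `["Studio", "l'", "italiano", "a", "Olomouc", "."]`. Python's `\w` is Unicode-aware, so accented forms such as `città` stay in one token.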


Research activities

  • M. Petolicchio. Sintagmi nominali in Italiano L2: apprendenti cechi e slovacchi (Noun phrases in Italian L2: Czech and Slovak learners)
    November 24th, 2018
    XIX. mezinárodní setkání romanistů (19th International Meeting of Romanists), Palacky University Olomouc (CZ)
  • M. Petolicchio. On Noun Phrases in Italian L2 by Czech and Slovak Learners
    October 26th, 2018
    CIAL 2018, University of Languages of Azerbaijan, Baku
  • M. Petolicchio. Open access linguistic corpus (Czech/Slovak to Italian L2)
    October 4th, 2018
    TechLing 2018, Universidade Autonoma de Lisboa (PT)
  • M. Bolpagni, M. Petolicchio. Presentazione di Czech-IT! (Presentation of Czech-IT!)
    April 26th, 2018
    Univerzita Konštantína Filozofa v Nitre [survey]
  • M. Bolpagni, M. Petolicchio. 7th AIUCD Conference
    January 31st - February 3rd, 2018
    Università degli Studi di Bari
  • M. Bolpagni, M. Petolicchio. Introducing Czech-IT!
    December 1st, 2017
    Università degli Studi di Udine
  • M. Bolpagni, M. Petolicchio. Loquit - Colloquia di Italianistica
    November 3rd, 2017
    Palacky University, Olomouc

Active members

  • Marco Petolicchio
    Computational analyses, digital strategies
    Palacky University, Olomouc

Cite the project

Petolicchio, Marco, & Bolpagni, Marcello. (2017).
Czech-IT! - Linguistic corpus of native Czech learners acquiring Italian language
(Version v.1.0) [Data set]. Zenodo.
DOI: 10.5281/zenodo.824985



Acquire data and add learners to the dataset consistently.
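One way to keep insertion consistent is to validate every learner record against a fixed schema and reject duplicates. The sketch below assumes a hypothetical schema (`learner_id`, native language `l1`, proficiency `level`) and uses CEFR levels as the allowed values; neither the schema nor the level scale comes from the project itself.

```python
from dataclasses import dataclass

# Hypothetical allowed values, chosen for illustration only.
ALLOWED_L1 = {"cs", "sk"}  # native Czech or Slovak (ISO 639-1 codes)
ALLOWED_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}  # CEFR levels (assumed)


@dataclass(frozen=True)
class Learner:
    """A hypothetical learner record."""
    learner_id: str
    l1: str
    level: str


class LearnerRegistry:
    """Keeps learner IDs unique and checks each record before insertion."""

    def __init__(self) -> None:
        self._learners: dict[str, Learner] = {}

    def add(self, learner: Learner) -> None:
        # Reject the record before storing it, so the dataset never
        # contains a partially valid entry.
        if learner.learner_id in self._learners:
            raise ValueError(f"duplicate learner id: {learner.learner_id}")
        if learner.l1 not in ALLOWED_L1:
            raise ValueError(f"unexpected native language: {learner.l1}")
        if learner.level not in ALLOWED_LEVELS:
            raise ValueError(f"unexpected proficiency level: {learner.level}")
        self._learners[learner.learner_id] = learner

    def __len__(self) -> int:
        return len(self._learners)
```

Validating at insertion time, rather than cleaning the table afterwards, keeps every later analysis working on records with the same shape.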


Automatically obtain information about the grammatical category (part of speech) of the elements in each sentence, in order to support quantitative analysis.
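Once sentences are POS-tagged, the categorial information can be turned into frequencies. The sketch assumes that tagging has already been done by an external tool; the `(token, tag)` pairs are hand-written and the Universal Dependencies-style tag set is an assumption.

```python
from collections import Counter

# Hand-written example of tagger output (token, tag); NOT corpus data.
TAGGED = [
    [("Il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB"), (".", "PUNCT")],
    [("La", "DET"), ("casa", "NOUN"), ("è", "AUX"), ("grande", "ADJ"), (".", "PUNCT")],
]


def category_distribution(tagged_sentences, skip=frozenset({"PUNCT"})):
    """Return the relative frequency of each grammatical category.

    Tags listed in `skip` (punctuation by default) are left out of
    both the counts and the total.
    """
    counts = Counter(
        tag
        for sentence in tagged_sentences
        for _, tag in sentence
        if tag not in skip
    )
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}
```

Here seven non-punctuation tokens are counted, so `NOUN` and `DET` each receive 2/7 of the total.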


Present the corpus in quantitative terms and provide statistics on the learning path as a function of the learners and their language abilities.
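A per-level summary of this kind might look like the following. The records, field names and proficiency levels are hypothetical, tokenization is a naive whitespace split with punctuation stripped, and the type-token ratio (distinct words divided by total words) is one common measure of lexical variety, chosen here only as an example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records pairing each text with its author's level.
RECORDS = [
    {"learner_id": "L01", "level": "A2", "text": "Io studio italiano."},
    {"learner_id": "L02", "level": "A2", "text": "Io studio e io lavoro."},
    {"learner_id": "L03", "level": "B1", "text": "Ho visitato Firenze con i miei amici."},
]


def words_of(text: str) -> list[str]:
    """Naive word split: whitespace, trailing punctuation stripped, lowercased."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def type_token_ratio(words: list[str]) -> float:
    """Distinct words divided by total words (0.0 for an empty text)."""
    return len(set(words)) / len(words) if words else 0.0


def stats_by_level(records):
    """Number of texts, mean length in words and mean TTR per level."""
    by_level = defaultdict(list)
    for rec in records:
        by_level[rec["level"]].append(words_of(rec["text"]))
    return {
        level: {
            "texts": len(texts),
            "mean_words": mean(len(w) for w in texts),
            "mean_ttr": mean(type_token_ratio(w) for w in texts),
        }
        for level, texts in by_level.items()
    }
```

In the sample, the A2 texts average four words with a mean type-token ratio of 0.9, because "io" is repeated in the second text.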