Ensuring the confidentiality of statistical outputs

As in the 2011 census, some of the tables contain deliberate errors in the data – admittedly minor, but still confusing to the attentive observer. This is to ensure the confidentiality of individuals.

No one has the right to know someone's marital status, whether they own or rent their home, their ethnic nationality, or mother tongue. Since census and population data are also published at the village level and for narrow population groups, a table may reveal that only one man aged 40–44 lives in village N. Everything the census asked about him would then be disclosed to anyone who knows who he is. To avoid such a situation, it is internationally forbidden to publish tables in which the value (frequency) of any cell is 1 or 2.

Various methods can be used to avoid this situation, all of which modify the data set. One way is not to publish the low-frequency cells, replacing their value with a symbol. Unfortunately, this does not always work: in most cases, the contents of the hidden cell can be worked out from the other cells with a simple calculation.
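To illustrate why hiding a cell is often not enough, here is a minimal sketch in Python. The table, the age groups, and the function name recover_suppressed are invented for illustration; the only assumption is that the row total is published alongside the visible cells.

```python
def recover_suppressed(row_total, visible_cells):
    """Return the value of a single hidden cell in a row whose total is published."""
    return row_total - sum(visible_cells)


# Hypothetical table for village N: men by age group, one cell replaced by a symbol.
published = {"0-39": 14, "40-44": None, "45+": 22}
total_men = 37  # assumed to be published next to the table

visible = [v for v in published.values() if v is not None]
hidden_value = recover_suppressed(total_men, visible)
print(hidden_value)  # 1
```

The hidden cell is simply the total minus the visible cells, so the symbol protects nothing as long as the marginal total is available.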

Another option is to add random error, or noise, to the results. In this census, Estonia applied a slightly adapted variant of the cell key method to remove confidential values from the tables. The cell key method, developed by the Australian Bureau of Statistics and recommended by Eurostat, adds a small amount of noise to selected cells in a frequency table so that no confidential values remain in it. A random key is generated for each person, which ensures that the value of the same characteristic remains the same across different tables.

Since the cell key method also adds noise to numbers other than the confidential ones, and Estonia is a very small country, we had to adjust the method slightly so that big numbers (e.g. the population of Estonia) do not change. For this purpose, we applied the cell key method only to cities, rural municipalities, towns, small towns, and villages (excluding Tallinn, Tartu, and Narva, as these cities are big enough). Tables at the level of Tallinn, Tartu, Narva, all counties, and larger units show original values. The total population of every settlement is also correct, but within counties the values are randomly scattered across age groups, ethnic groups, and other population groups. Therefore, for example, if the numbers of Estonians in each age group in rural municipality M are added together, the sum may differ from the total number of Estonians living in rural municipality M shown in the table.
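The general idea of the cell key method can be sketched in Python as follows. This is a simplified illustration, not the procedure actually used in the census: the way record keys are combined into a cell key (fractional part of their sum) and the perturbation rule in perturb are assumptions made up for this example, and all function names are hypothetical.

```python
import random


def assign_record_keys(person_ids, seed=2021):
    """Give every person a fixed random key in [0, 1). The seed is arbitrary."""
    rng = random.Random(seed)
    return {pid: rng.random() for pid in person_ids}


def cell_key(members, record_keys):
    """Combine the record keys of the people in a cell into one key in [0, 1).

    Sorting makes the result independent of the order in which members are listed.
    """
    return sum(record_keys[pid] for pid in sorted(members)) % 1.0


def perturb(count, key):
    """Hypothetical perturbation rule: never output 1 or 2, rarely shift by 1."""
    if count == 0:
        return 0
    if count in (1, 2):
        return 0 if key < 0.5 else 3  # confidential values move to 0 or 3
    if key < 0.1 and count > 3:
        return count - 1
    if key >= 0.9:
        return count + 1
    return count


record_keys = assign_record_keys(range(100))
cell_members = [3, 17, 42]  # hypothetical person ids falling into one cell
key = cell_key(cell_members, record_keys)
published_value = perturb(len(cell_members), key)
```

Because the cell key depends only on which people fall into the cell, the same group of people receives the same noise in every table in which it appears. Each cell is perturbed on its own, however, which is why perturbed cells need not add up to a separately published total.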

However, it is worth noting that in the whole table, at most 25% (more often 15–20%) of all frequencies are scattered. The modified numbers differ from the original by an average of 2, and 50% of the modified numbers are less than 25, while 25% of the modified numbers are less than 5.