---
title: "Pilot Program for Novel Music Industry Statistical Indicators in the Slovak Republic"
subtitle: "Business-to-government data sharing, novel re-use of public sector information for the creation of missing macro-, industry-, and institutional KPIs for the Slovak cultural and creative industry strategy implementation (working document version 0.1)"
version: 0.1.3
doi: "10.5281/zenodo.10372026"
title-block-banner: "#00348A"
author:
  - name: Daniel Antal
    orcid: 0000-0003-1689-0557
papersize: A4
format:
  html:
    toc-depth: 3
  epub: default
  docx:
    reference-doc: docx/OpenMusE_simple_template.docx
  pdf:
    colorlinks: true
  latex:
    - lof: true
editor: visual
toc: true
lang: en-GB
date: today
bibliography:
- bib/capsurveys.bib
- bib/datalicensing.bib
- bib/indicators.bib
- bib/datagovernance.bib
- bib/datapooling.bib
- bib/privatelyhelddata.bib
- bib/administrativedata.bib
- bib/OpenMusE.bib
- bib/openmusicrepositories.bib
- bib/slovakia.bib
- bib/statreg.bib
- bib/surveyharmonization.bib
- bib/statisticalmethodology.bib
- bib/wikidata.bib
---
{{< pagebreak >}}
## Executive Summary
Access to high-quality official statistics is a public good that also improves decision-making at both the enterprise and the public policy level. Cultural and creative sector institutions and enterprises lack the kind of dedicated "music industry" or "film industry" statistics that banking or car manufacturing businesses can rely on. As a result, they are less likely to engage effectively in, for example, the democratic tax policy-making process. Cultural policy designers are likewise disadvantaged compared to their colleagues in tourism, agriculture, financial services or manufacturing.
Slovakia has already made a significant and exemplary investment in creating a satellite account system for the cultural and creative industries [@horecka_summary_2022], which allows architecture and advertising to have high-quality data. However, the other creative industries can still only rely on more aggregated statistical indicators that are useful for macro-level public policy design but of minimal use for institutional or business policies. Our suggested cooperation aligns with the best European and UN statistical practices and recommendations and with the data and digitisation strategies of the European Union and the Slovak government. It would enable the Ministry of Culture and the Institute for Cultural Policy to rely on evidence when adopting and implementing their public policies; it would also empower music businesses to make better use of the public good of high-quality indicators.
`Open Music Europe` offers a 'data-to-policy' pipeline, which extends the music data pipeline to evidence-based business and policy administration [@openmuse_2023]. A data pipeline is a method in which raw data is ingested from various data sources and then ported to a data store for further analysis; in this case, to an open, shared, collaborative music observatory. We extend this pipeline using reproducible research techniques, a novel application of the Open Policy Analysis Guidelines, and good statistical practices to support evidence-based policy analysis, scientific music research and sound business strategy building. In this last leg of the pipeline, we emphasise usability for our project's target audiences and good documentation practices. We want to ensure that our data is of high quality and well understood so that it supports robust and correct business, scientific or policy conclusions.
Based on the *Memorandum o porozumení o využití výsledkov analýz otvorených politík v kontexte slovenského kultúrneho a kreatívneho priemyslu a sektorových verejných politík v spolupráci s konzorciom pre výskum a inovácie s názvom OpenMuse* \[Memorandum of Understanding on utilizing the Open Policy Analysis results of the OpenMuse Research and Innovation Consortium in the context of Slovak cultural and creative industries and sectors' public policies, [@open_music_europe_sk_mou_2023]\], we held our first stakeholder consultation in Bratislava on **21 September 2023**. We invited representatives of the [Ministerstvo kultúry SR](https://www.culture.gov.sk/) / [Ministry of Culture of the Slovak Republic](https://www.culture.gov.sk/en/) and its Institute of Cultural Policy / [Inštitút kultúrnej politiky](https://via-cultura.sk/), the [Štatistický úrad SR](https://www.susr.sk/) / *Statistical Office of the Slovak Republic* (henceforth: `SOSR`), [Infostat](http://www.infostat.sk/web2015/en), the Institute of Informatics and Statistics, and representative Slovak stakeholders from the industry ([SOZA](https://moja.soza.sk/)) and from the music heritage sector and music libraries ([Hudobné centrum](https://hc.sk/)).
::: callout-tip
Latest consultation version: [Pilot Program for Novel Music Industry Statistical Indicators in the Slovak Republic](https://music.dataobservatory.eu/documents/open_music_europe/slovakia/slovak-cult-stat-pilot.html) on the Digital Music Observatory website with `docx`, `epub` and `pdf` downloads. You can comment on the docx file in Word/Google Docs if you do not want to work on the markdown [source](https://github.com/antaldaniel/data-ppp/blob/main/slovak-cult-stat-pilot.qmd).
For reference: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10372026.svg)](https://doi.org/10.5281/zenodo.10372026). See source documents and version history on [GitHub](https://github.com/antaldaniel/data-ppp).
You can download the PowerPoint presentations and summaries of our stakeholder discussion with Dr Joost Poort (University of Amsterdam), Daniel Antal, CFA (Reprex), Dr James Edwards (SINUS) and Mgr. Tomas Miks (SOZA) [here](https://github.com/dataobservatory-eu/report-european-music-economy/raw/main/presentations/Open_Music_Europe_Slovakia_20230921.pptx).
:::
With colleagues from IKP and the `SOSR`, we reviewed all the strategic indicators of the cultural, sustainable development and digitisation policy goals of Slovakia [@2030_digital_transformation_slovakia_sk_2019; @slovak_sdg_strategy_2020; @slovak_cci_strategy_2023] and the potential data sources of desired but unavailable policy indicators at the macro-economic, industry, and institute/enterprise levels. Creating such KPIs is essential for monitoring policy execution and for creating transmission mechanisms that bring policy execution down to the level of at least national organisations like SOZA or Hudobné centrum. The following document is a detailed summary of the statistical aspects of the data coordination and collaboration presented in the workshop; we would like to discuss this approach on **10 October in Bratislava** with relevant statistical and music policy stakeholders.
We believe that our approach conforms not only with the relevant Slovak cultural, development, digitisation and statistical policies, but would constitute a best practice in terms of implementing and improving the creation of structural business indicators with the help of *privately-held data* [@ess_position_privately_held_2017; @hleg_towards_b2g_data_2020; @european_data_strategy_2020; @un_sbr_guidelines_2020].
The structure of this working document is as follows:
- [Our approach to the problem](#our-approach-to-the-problem) in a nutshell, conforming to the case study and sandbox recommendations of the High-Level Expert Group report *Towards a European strategy on business-to-government data sharing for the public interest*, which is a supporting document of the *European Strategy for Data*, using good examples from various European countries (Norway is reviewed here).
- [Data coordination](#data-coordination) proposed by `Open Music Europe` among IKP, SOZA, Hudobné centrum, SOSR and the Open Music Europe Consortium.
- [Statistical production](#statistical-production) of the novel indicators by voluntary application of official statistical guidelines and quality assurance, using advanced survey harmonisation, and application of the Eurostat indicator development methodologies.
- [Conclusions](#conclusions)
- [References](#references) to the cited regulations, policy documents and academic literature.
::: callout-warning
```{r bannerpng, echo=FALSE, message=FALSE, fig.align='center', out.width="560px"}
knitr::include_graphics(file.path("png", "banner_slovak-cult-stat-pilot.png"), dpi = 300)
```
Funded by the European Union under Grant No. 101095295. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission's Citizens, Equality, Rights and Values Programme, or the official views of the Ministerstvo kultúry SR, Štatistický úrad SR, Infostat or IKP. Neither the European Union, the Slovak public bodies nor the granting authority can be held responsible for them.
:::
## Our approach to the problem {#our-approach-to-the-problem}
Unlike banking or car manufacturing, the music industry has no access to dedicated high-quality governmental statistics, a public good that `SOSR` produces. We want to remedy this problem and its consequences with a proposed public-private partnership pilot project to improve the Slovak cultural and creative satellite accounts and, more generally, the KPIs of the Slovak cultural strategy.
- State statistical survey programs under-sample the music sector, for example, in the `Roč 1-01 and Roč 1-01` enterprise surveys or in LFS. The reason is the relatively small weight of the music sector in the Slovak economy, the fragmentation into small units, and the relatively small importance of the music professionals in the labour force. We want to remedy this situation by adding privately collected, ex ante harmonised questionnaire-based survey data to the existing Slovak surveys to reach a necessary joint dataset that allows statistical generalisation for music.
- State statistical survey programs use the NACE and ISCO classification systems to group enterprises and individuals, which are not suitable for the creative industries due to their high level of informality and the prevalence of informal learning instead of accredited education. We suggest the creation of the Slovak Music Industry Register, and a derived SBR satellite following UN/Eurostat methodology to improve the identification of relevant data in the existing collection (music is covered in the collection but cannot be identified as "music").
- The music industry, due to its high international standardisation and digitisation, has an unusually large number of well-maintained digital data resources that are not yet utilised in the way that credit card, supermarket, or energy distributor data are already used to create more granular and timely statistical products. Our Slovak Music Industry Register proposal would allow data coordination with well-standardised and quality-controlled music industry data.
In short, the statistical system does not have sufficient data about the music industry or the film industry; furthermore, it uses categorisation nomenclatures in the form of NACE and ISIC that are not suitable to filter out the existing data for most cultural and creative industries (CCIs) except architecture and advertising.
Several European countries are experimenting with new regulations that oblige certain enterprises, like supermarkets, mobile telephone operators or credit card companies, to provide privately-held, commercial data for official statistical purposes. The European Statistical System (ESS), of which `SOSR` is a member, has been advocating the use of privately-held data over the last ten years.
> Statistical surveys and administrative registers are and will remain important sources for the production of official statistics. However, the traditional data gathering methods could be in the future enhanced and enriched by big data analytics. To achieve this, it is essential that data held by private actors can be used by statistical authorities as raw material for innovative value-added services and statistical products, which will also boost the economy by creating new jobs and encouraging investment in data-driven sectors. Increased efficiency and faster delivery by statistical offices of innovative digital products and services will be one of the cornerstones for evidence-based decisions and contribute to lessening the burden on statistical respondents and notably on businesses. Improved access to data would enable statistical offices to provide more granular and timely statistics that would be more useful to enterprises and the citizens alike. It would also mean providing valuable feedback information to the data holders or delivering more tailored statistical services to companies, which might help them in return to better develop their business model. [@ess_position_privately_held_2017 pp. 3-4]
In preparation for several new data regulations within the European data strategy, the High-Level Expert Group on Business-to-Government Data Sharing presented an important and practical report under the title *Towards a European strategy on business-to-government data sharing for the public interest.* The group uses the abbreviation B2G (business-to-government) for the new statistical procedures in which official statisticians start to use private data; this is the opposite of the traditional "open data" regime, which allows the private sector to re-use public sector information.
The *High-Level Expert Group* advises that
> B2G data-sharing collaborations should be organised:
>
> -- in testing environments ('sandboxes') for pilot testing ('pilots') to help assess the potential value of data for new situations in which a product or service could potentially be used ('use cases'),
>
> -- via public-private partnerships. [@hleg_towards_b2g_data_2020, p. 8.]
> Member States should have in place structures to support B2G data sharing. These structures could be a body (or bodies) tasked with assisting public-sector organisations and private companies or civil-society organisations in entering into new data-sharing partnerships and facilitating the sharing of good practice. Over time, such structures could become trusted third parties between the public and private sectors, by bringing the relevant players together. [@hleg_towards_b2g_data_2020, p. 37]
> All B2G data sharing can ultimately be understood as a form of public-private partnership (PPP). However, from a practical point of view, sandboxes and specific PPPs can be put in place for specific public-interest purposes or to target specific challenges related to B2G data sharing. [@hleg_towards_b2g_data_2020, p. 41]
Our recommendation is to test in the sandbox of the `Open Music Europe` research and innovation action and project supported by the European Union, based on the *Memorandum of Understanding on utilizing the Open Policy Analysis results of the OpenMuse Research and Innovation Consortium in the context of Slovak cultural and creative industries and sectors' public policies* [@open_music_europe_sk_mou_2023] to develop a use case in 2023-2025 that can inform interested partners by 2025 about potential public-private partnerships.
Sæbø and Dimakos review the statistical production methods for compilation with privately held data, which provide insights for the data management of `Open Music Europe`, too. They also present the current state of affairs with the example of the Household Budget Survey [@saebo_new_data_sources_2023], which, after decades of household surveying, has been replaced by sampling the privately-held data of credit card companies and supermarkets.
Until 2012, Norwegian statisticians measured household spending with a sample survey of Norwegians reporting their purchases; they asked a large sample of people drawn from the Norwegian population register (established in 1964) to fill in questionnaires. The data quality was particularly poor with food and beverages because such purchases, as opposed to infrequent buying of new television sets or festival tickets, happen daily; the reporting burden or the burden on memory is taxing when people retrospectively must fill out a questionnaire about buying bread, butter and orange juice.
Norway created statistical registers to tap into governmental data stores in 1990 and into municipal ones in 1995; by 2019, it utilised about 100 registers and drew data from 30 public institutional sources [@saebo_new_data_sources_2023, p. 1]. As in most countries, such "administrative data" was retrieved from other governmental entities, not the private sector. Collecting data from people, companies, and non-profits still relies on census-like comprehensive surveys and sample surveys, in which respondents fill out a digital or paper form or answer questions from an interviewer who fills out the form on their behalf.
Our project is facing similar problems. Surveying is costly and often inaccurate. Asking randomly selected music creators about the royalties they received over a year either requires respondents to open and review various royalty statements before answering, or yields answers filtered through the individual's cognitive biases and memory lapses. `Statistics Norway` realised that instead of asking 7000 households what they were buying in supermarkets, it is far more accurate and potentially cheaper to acquire the data directly from the sales logs of the supermarkets or the payment transaction records of the credit card companies. CEEMID, the predecessor of `Open Music Europe`, relied on techniques similar to those used by experimenting government statisticians: we kept surveying randomly selected music creators anonymously about their received royalties, but we also compared the data with the actual anonymised payouts of Artisjus in Hungary and SOZA in Slovakia.
The lessons of Norway are interesting because the new statistical law (in force since 2021) allows such data collection after a cost/benefit analysis and risk reduction carried out by `Statistics Norway`. Norway, like all EFTA countries, participates in the ESS alongside Eurostat and applies the same EU/EEA statistical regulations as Slovakia, Bulgaria or Hungary. In `Open Music Europe`, we are neither planning nor advocating state-mandated data collection; nevertheless, we find these criteria useful for voluntary data sharing with the government based on individual agreements, which we endorse in the music sector.
## Data coordination {#data-coordination}
"A register aims to be a complete list of the objects in a specific group of objects or population." [@anders_register-based_2007] We are planning music industry registers where the objects are *music works* and *sound recordings* (in statistical terms, music products), and the populations are *music authors*, *music performers*, *groupings of performers* (as the majority of the musicians perform, record, release in groups, ensembles, orchestras), *record labels* (which may be formal and informal businesses) and *music publishers* (enterprises.) From a statistical point of view, our planned music industry registers can be seen as "administrative registers" because they were not initially created for a statistical purpose by a statistical authority.
A *statistical register* is a continuously or regularly updated set of objects for a given population. It contains information on the identification and accessibility of population units and other attributes supporting the population surveying process. It serves as a constantly updated list of potential data sources: people or enterprises, for example, who may be invited to a sample survey or a census. The statistical register is a coordination tool for data collection (everybody who should provide data is found) and, at the same time, a significant data quality management tool (if somebody was not found, we know how this will distort our resulting datasets). For example, as stated earlier, `Statistics Norway` applies about 100 statistical registers. Our primary concern in Slovakia is the creation of a music business register because this could provide indicators for the public policy-level and institutional/enterprise-level implementation of the Slovak cultural strategy.
The authoritative source on statistical business registers is the *United Nations Guidelines on Statistical Business Registers. Final draft prior to official editing* [@un_sbr_guidelines_2020], which is heavily based on the former UNECE *Guidelines on statistical business registers* [@unece_sbr_guidelines_2015]. The European guideline is the *European business statistics methodological manual for statistical business registers. 2021 edition* [@eurostat_sbr_manual_2021].
The statistical business register is an essential tool for creating survey frames or sample frames, in other words, to organise statistical data collection. In non-technical terms, this register is necessary to decide who should get a data request.
- For a *sample survey*, the register is used to draw a lottery of those members of the population who will be invited to provide data.
- In a *census*-type survey, all registered members of the population, for example, all music labels, will receive an invitation to an interview or form.
- In the case of *a register-based survey*, all members of the register, for example, all collective management societies in the territory, will be requested to send data directly from their databases.
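
The three uses of a register as a survey frame can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the `SOSR` procedure: the register entries, the `unit_type` field and all function names are hypothetical, and simple random sampling stands in for whatever sampling design a real survey would use.

```python
import random

# Hypothetical register: every entry is an invented placeholder.
REGISTER = [
    {"id": "unit-01", "unit_type": "label"},
    {"id": "unit-02", "unit_type": "label"},
    {"id": "unit-03", "unit_type": "publisher"},
    {"id": "unit-04", "unit_type": "cmo"},  # collective management organisation
    {"id": "unit-05", "unit_type": "label"},
    {"id": "unit-06", "unit_type": "cmo"},
]


def frame(register, unit_type):
    """Survey frame: the IDs of all registered units of one type."""
    return [unit["id"] for unit in register if unit["unit_type"] == unit_type]


def sample_survey(register, unit_type, n, seed=0):
    """Sample survey: draw n units at random from the frame (assumed design)."""
    rng = random.Random(seed)
    return sorted(rng.sample(frame(register, unit_type), n))


def census(register, unit_type):
    """Census: every unit in the frame receives the questionnaire."""
    return frame(register, unit_type)


def register_based(register, unit_type="cmo"):
    """Register-based survey: every data holder of the given type is asked
    to send data directly from its own database."""
    return frame(register, unit_type)


labels_sampled = sample_survey(REGISTER, "label", n=2)
labels_all = census(REGISTER, "label")
data_holders = register_based(REGISTER)
```

In all three cases the register decides who receives a request; only the selection rule differs.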
![Comparison of three types of statistical surveys. Based on A. Wallgren and B. Wallgren: Register-based Statistics---Administrative Data for Statistical Purposes](png/three-types-of-surveys.png).
### Slovak music industry enterprise register
The music industry needs a special register because the main nomenclature for categorising businesses, NACE, does not have a music industry entry; furthermore, only very few music enterprises participate in the state survey program because of their small size.
Most, but not all, music businesses are categorised as `J58` (together with enterprises working for the film, television or radio industries) or `R90` (together with dance, theatrical, circus and other performers, actors, and even film producers.) Any statistics calculated using the `J58` or `R90` category, for example, the total or average net turnover of such enterprises or their share in GVA and GDP, are only helpful for the macro-level cultural and creative industry policy design. The management of a music enterprise cannot compare their revenue, employee compensation, or corporate profit and loss to such statistics because their changes over time represent not only the business results of other music enterprises but also unrelated film or theatrical revenues. For a public policy planner, a decrease in `J58` is an early indicator of possible GVA decline and a reduction in tax receipts but gives no guidance on whether the problem is in the film, television or music industry. For the management of music enterprises, this indicator is far less helpful. When the total or average turnover in `J58` increases, for instance, such a change in a structural business statistical indicator may be consistent with decreasing music *and* growing film industry revenues at the same time.
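
A small worked example makes the masking effect concrete. The turnover figures below are invented purely for illustration and do not describe any real Slovak data; they show an aggregate that grows by 5% while its music component shrinks by 25%.

```python
# Hypothetical turnover (million EUR) of two sub-sectors pooled in one NACE code.
year_1 = {"music": 40.0, "film": 60.0}
year_2 = {"music": 30.0, "film": 75.0}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


aggregate_change = pct_change(sum(year_1.values()), sum(year_2.values()))
music_change = pct_change(year_1["music"], year_2["music"])
film_change = pct_change(year_1["film"], year_2["film"])

print(f"aggregate: {aggregate_change:+.1f}%")
print(f"music:     {music_change:+.1f}%")
print(f"film:      {film_change:+.1f}%")
```

A music enterprise benchmarking itself against the aggregate would read growth where its own industry contracted.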
### Informal enterprises, ensembles and projects
The music industry also shares a characteristic with many creative industries in the sense that it is often project-based: many professionals work together to stage or record music for a limited time. Because music (like, for example, theatre and film) is project-based, music professionals are often attached to many enterprises that do not take a legal form. For instance, a classical musician may work partly in a symphonic orchestra and partly in a chamber orchestra. These orchestras may have overlapping and distinct staff members, and they work on different cultural products (recordings) and services (ticketed performances), potentially even with other cultural groups, such as with actors in staging a theatrical play, opera or ballet, or working on a film or television project.
This problem is highlighted in the administration of the `KULT05` *Survey of musical ensembles and artistic ensembles in Slovakia* \[Hudobné telesá a umelecké súbory v SR\], where the subjects are ensembles (or project "enterprises") of music performers and their managerial, artistic or technical support staff, like tour managers, sound engineers, or orchestra directors. In enterprise surveys (even among non-profits), SOSR uses the ICO identifier as a unique Slovak identifier for the subjects instead of the often ambiguous and error-prone names. But in `KULT05`, we often find subjects with the same ICO number: there are institutions that maintain several orchestras or ensembles whose "employees", i.e., the musicians and their support staff, may overlap but are not necessarily the same. Consequently, the statistical subjects, i.e., the ensembles, do not have an ICO identifier of their own because they are semi-independent entities or "projects" of a larger institution or legal person. The `KULT05` survey can be seen as a mixed survey because it targets both formal institutions and informal institutions (i.e. ensembles of music professionals) that do not have an enterprise form. Managing such a mixed survey requires a better identifier than the ICO.
The use of ICO or the names of ensembles is problematic in the retrospective analysis of the microdata that we will touch upon in [Retrospective harmonisation of survey data](#retrospective-survey-harmonisation) later.
Our solution to this problem is the creation of a strict namespace and using authority files to identify non-enterprise (and enterprise) entities for our music industry register.
::: callout-tip
#### Authority Files
When we want to work with longitudinal or panel data, we have to ensure that observational units are followed consistently over time. In a fictional example, `New Košice Orchestra` and `Contemporary Košice Orchestra` are kept as one observational unit if the ensemble only went through a name change; if its identity fundamentally changed, the two are kept as separate units.
The KULT microdata files (**to our knowledge**) use two identifiers: names and the *organisation identification number* (ICO), which the SOSR assigns to every legal person and other non-governmental or governmental institutions. For our purposes, neither the ICO nor the name titles are ideal identifiers alone. The use of ICO alone is not practical for the retrospective harmonisation of the datasets because some entities do not have an ICO number, or some ICO organisations have several statistical subjects (for example, the same organisation maintains several orchestras.) The name titles have been manually registered, and they are inconsistent. Over an extended period, names and ICO numbers may change (for example, with the fusion of two or more organisations.)
:::
Namespaces (in the statistical and data science practice) and authority files (in information and library science) ensure that entities are matched correctly. As it is getting more and more common to join data from different databases (this is the aim of our data coordination program, too), it is an increasingly widespread good practice to use globally unique identifiers that are also "permanent" (PID), or at least, very-long term. The ICO is a very good namespace connecting enterprise data within the Slovak Republic. Still, it fails when we want to link data from neighbouring countries or with informal enterprises, bands, and ensembles.
Global namespaces provide global identifiers for (statistical) units beyond the territory of Slovakia, and authority files connect such identifiers with known names and name variations, and even define a preferred name (in our fictional example, is it `New Košice Orchestra` or `Contemporary Košice Orchestra`?)
By design, global namespaces and authority files are both human-readable and machine-actionable: they offer an HTML representation for human browsing and an RDF serialisation for data applications. We will encourage using at least two PIDs for each observational unit, following the best practices that will be elaborated in greater detail in [Retrospective harmonisation of survey data](#retrospective-survey-harmonisation) later.
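
The role of an authority file in matching records can be sketched as follows. This is a minimal sketch under stated assumptions: the orchestra names are the fictional example used in this document, the choice of preferred name is arbitrary, and `pid:example-0001` is a placeholder rather than a real ISNI, VIAF or Wikidata identifier.

```python
# Minimal authority-file sketch with one fictional entity.
# "pid:example-0001" is a placeholder, not a real persistent identifier.
AUTHORITY_FILE = {
    "pid:example-0001": {
        "preferred_name": "Contemporary Košice Orchestra",  # arbitrary choice
        "variants": {"New Košice Orchestra", "Contemporary Košice Orchestra"},
    },
}


def normalise(name):
    """Case- and whitespace-insensitive form used for matching."""
    return " ".join(name.casefold().split())


# Reverse index: normalised name variant -> persistent identifier.
NAME_INDEX = {
    normalise(variant): pid
    for pid, record in AUTHORITY_FILE.items()
    for variant in record["variants"]
}


def resolve(name):
    """Return (pid, preferred_name) for a known variant, else (None, name)."""
    pid = NAME_INDEX.get(normalise(name))
    if pid is None:
        return None, name
    return pid, AUTHORITY_FILE[pid]["preferred_name"]


# Hypothetical survey rows from different years with inconsistent spellings.
survey_rows = [
    {"year": 2015, "name": "New Košice Orchestra"},
    {"year": 2020, "name": "contemporary  košice orchestra"},
    {"year": 2020, "name": "Unlisted Ensemble"},
]
resolved = [resolve(row["name"]) for row in survey_rows]
```

Rows that resolve to the same identifier can be treated as one observational unit across survey waves, while unmatched names are flagged for manual curation instead of being silently merged.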
- We will use `ISNI` or `VIAF` identifiers for natural and legal persons. The use of `ISNI` is paid, and the organisation must initiate it. `VIAF` is free because it is a public service of the Slovak National Library, but it goes through a curation process. For our purposes, they are equally good and they can be used interchangeably.
- We will also use a `QID` for data coordination. The `QID` is a globally unique identifier in Wikidata and Dbpedia. They are used in the statistical, research and cultural heritage domains as temporary or necessary global IDs when using an authority file is not possible or takes a long time (for example, we have to encourage each orchestra to obtain its own `ISNI` number.) *We also asked Wikimedia Slovensko to formally partner with our project.*
The use of Wikidata is getting more and more common among knowledge organisations and even EU organisations for the coordination of namespaces or authority files. Wikidata was originally developed as a reconciliation tool for Wikipedia; Europeana recognised its value for pan-European data harmonisation as early as 2015. Since then, several European countries have used it as a decentralised, curated, shared authority control system. We think that VIAF is the most suitable authority control, but the flexibility and functionality of Wikidata make it a worthy parallel system in itself [@bianchini_beyond_2021; @van_veen_wikidata_2019; @rossenova_wikidata_2022]. We reached out to the Wikimedia Foundation and *WMSK* (former official legal name: *Wikimedia Slovenská republika*) not only to use their open source product, Wikibase, for authority control reconciliation but also to push our knowledge and our namespace to Wikidata [@fagerving_wikidata_2023].
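
Before identifiers are used for data coordination, their form can be checked automatically. The sketch below assumes that an `ISNI` is a 16-character identifier whose last character is an ISO 7064 MOD 11-2 check character, and that a `QID` is the letter Q followed by digits; the function names are our own. It only tests well-formedness and cannot confirm that an identifier has actually been assigned to an entity. The test value is the ORCID from this document's header, which follows the same check-character scheme.

```python
import re


def mod_11_2_check_char(first_15_digits):
    """ISO 7064 MOD 11-2 check character for a 15-digit base."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni_like(identifier):
    """True if a 16-character ID (spaces and hyphens ignored) carries
    the correct check character."""
    compact = re.sub(r"[\s-]", "", identifier).upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    return mod_11_2_check_char(compact[:15]) == compact[15]


def is_qid(identifier):
    """True if the string has the shape of a Wikidata item ID."""
    return re.fullmatch(r"Q[1-9]\d*", identifier) is not None


orcid_ok = is_valid_isni_like("0000-0003-1689-0557")   # ORCID from the header
typo_caught = not is_valid_isni_like("0000-0003-1689-0558")
```

A check of this kind catches transcription errors early, before a mistyped identifier silently fails to join two datasets.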
### Music professional population register
Creating a registry of persons who engage professionally in music activities is conceptually no more complex than registering legal persons. In practice, however, the larger population size, the special protection of personal data and the privacy of individuals pose challenges in creating and using the population register.
What type of natural persons should be part of the Slovak Music Industry Register?
- **Authors** who register without a publisher and self-publish music works and music lyrics.
- **Creators of sound recordings**, i.e., self-releasing producers and performers of recorded fixations of music performances that are released to be bought, streamed, broadcast or played in public against payment. These are private persons who are *not* represented by a music label that is a legal person.
- **Concert** and **festival promoters**, **booking agents**, **solo performers** and members of musical groups (ensembles, choirs, orchestras, bands) that create live music performances without being employed in a music institution or enterprise as a legal person.
- **Music professionals** that work within the broader music industry as suppliers of transport for music groups, producers or merchants of "merchandise", music schools and rehearsal studios and other enterprises, as sole proprietors or in other forms that do not constitute a form of employment in a legal person.
National collective management societies like SOZA, SLOVGRAM, or OZIS have a comprehensive list of creators who register their intellectual property for future payments or receive payments, because certain forms of such revenues, like radio broadcasting royalties, are only paid out by these organisations. The creators they represent individually or as members form the economically more active part of the music industry population. The experience of these organisations shows that some groups of music creators, particularly young creators in music genres that are not radio-friendly because of their eccentric music or explicit lyrics, often do not register with these organisations because they do not expect broadcasting royalties. National collective management societies must try to attract such creators to their member base because the copyright management infrastructure offered by such organisations goes beyond the administration of the royalties that may not be available for such artists.
Creators tend to be the minority of the music workforce. Behind a solo artist or a band of four, we usually find at least five music professionals backstage who manage the sound, light and stage technology, handle the administration and sale of tickets and merchandise, transport and install the technology, or control the crowd at entry, at exit and in emergencies. On more prominent stages and with larger ensembles, the support crew expands to as many as 200 people, while even a symphonic orchestra is made up of "only" 80-100 performers. Although our earlier research shows that within the music ecosystem, individuals usually perform more than one activity (they promote concerts and perform music or act as both sound engineers and music creators), a very sizable part of the music industry population has no copyrights or neighbouring rights and remains invisible to the collective management societies.
The natural person population of the music industry appears in population censuses every ten years and randomly in sample surveys, like the Labour Force Survey (LFS) or the Adult Education Survey (AES), which are conducted in Slovakia and every EU member state in a harmonised manner. The problem with these sample surveys is similar to that of the enterprise surveys: they are based on a classification, ISCO, which does not allow effective filtering of the responses given by people working or learning in the music industry. Because of the relatively small size of the music sector, it is possible (but far from certain) that there are not enough responses recorded in the `SOSR` surveys to create statistics about the music industry; unfortunately, even if the sample size is large enough, it is impossible to select those responses that relate to music.
In the case of natural persons, we suggest the creation of ex-ante harmonised surveys, which ensure that enough music professionals (as persons) answer the questions that only a few of them answer as members of a random sample in LFS or AES. The coordination of these surveys is nevertheless more complicated because we cannot simply use a natural person identifier to file or join data in the way we use a company registry number. Yet, the creation of the population register is still a first step. Until we define who belongs personally to the music industry, it is not possible to administer a survey that gauges their opinions or characteristics.
In the past years, SOZA, in methodological cooperation with Artisjus in Hungary and HDS in Croatia, carried out sample surveys in the music professional population that achieved a high level of representativeness among creators, even though these surveys were not based on a music professional population register. Although the organisations did not know who filled out the CEEMID music professional surveys, two mechanisms secured representativeness. First, even though explicit stratification was not possible without a register, SOZA (and Artisjus) could ascertain that the entire population received a questionnaire: the form remained anonymous, but the whole population was invited to fill it out. Second, as SOZA (and Artisjus) administers particular types of royalties exclusively and comprehensively, we could compare the true statistics (mean, median, standard deviation) of actual royalty payments to the anonymously reported amounts. We collected answers until the anonymous sample's statistics converged to the known, true statistical values at these anchoring points.
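The anchoring check described above can be sketched as a simple stopping rule. The following Python snippet is only an illustration under stated assumptions: the function name `anchors_converged`, the 5% relative tolerance and the example figures are hypothetical and are not taken from the CEEMID surveys.

```python
import statistics


def anchors_converged(reported, true_mean, true_median, true_sd, rel_tol=0.05):
    """Check whether anonymously reported amounts match the known anchors.

    `reported` is the list of royalty amounts reported in the anonymous survey.
    The true values come from the administrative records of the collecting
    society. `rel_tol` is an assumed tolerance, not a documented threshold.
    """
    if len(reported) < 2:
        return False  # a standard deviation needs at least two answers
    sample = {
        "mean": statistics.mean(reported),
        "median": statistics.median(reported),
        "sd": statistics.stdev(reported),  # sample standard deviation
    }
    truth = {"mean": true_mean, "median": true_median, "sd": true_sd}
    # Every anchor statistic must lie within the relative tolerance.
    return all(
        abs(sample[key] - truth[key]) <= rel_tol * abs(truth[key])
        for key in truth
    )


# Hypothetical example: fieldwork continues while this returns False.
answers = [100, 200, 300, 400, 500]
print(anchors_converged(answers, true_mean=300, true_median=300, true_sd=158.1))
```

In practice the tolerance and the choice between sample and population standard deviation would have to be agreed with the data controller before fieldwork starts.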
This leads to the third part of the use of the register: access to administrative data, i.e., demographic and income information from comprehensive records of IT systems that were not designed primarily for statistical production but for the administration of music royalties (SOZA, SLOVGRAM, OZIS), or the documentation of music heritage (Hudobné centrum and the Slovak music libraries.)
### Ex ante survey harmonisation
Retrospective survey harmonisation usually refers to social science surveys conducted with a questionnaire, when researchers expose randomly selected respondents to randomly assigned treatments---for example, they ask respondents about their subjective well-being before and after taking a pill or a placebo and combine the answers with blood pressure or weight measurements. These measurements may be transcribed by hand into the survey questionnaire or recorded by a different tool, in which case a new data integration problem arises.
Our researchers have long been engaged in retrospective survey harmonisation, for example, in the case of Cultural Access and Participation surveys with the methodology created by the ESSNet-Culture working group of Eurostat and the participating EU national statistical authorities [@de_haan_virtuele_2008; @frank_guy_essnet-culture_2012; @de_haan_nowadays_2012]. We have extensive experience administering CAP surveys in Slovakia and Hungary and retroactively harmonising them with CAP surveys carried out within various EU-harmonised survey programs, such as Eurobarometer, EU-SILC and AES. Retrospective survey harmonisation can join data from different surveys if they use a similar sampling method and questionnaire items. If ex ante harmonisation is possible before the fieldwork, a much higher quality of harmonisation is possible.
"To ensure that answers from respondents surveyed in different settings carry minimal methodological errors and biases and can be meaningfully compared, both data producers and secondary users combine surveys from different sources, that is, they harmonize survey data. Generally, they do so at different stages of the survey lifecycle. Data producers mostly employ harmonization ex-ante, when designing and implementing comparative studies (input harmonization) and when processing the survey data in preparation for their public release (ex-ante output harmonization). \[...\] Secondary users apply harmonization methods retrospectively to already released data files." [@wysmulek_expost_2022]
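Retrospective harmonisation of a single questionnaire item can be sketched as follows. This is a minimal Python illustration, assuming two hypothetical surveys that asked about concert attendance with different answer codings; the survey names, codes and the target variable `visited_concert` are invented for the example and do not reproduce any actual CAP questionnaire.

```python
# Hypothetical recoding tables: each survey's answer codes are mapped to a
# common binary target variable (the lowest common denominator).
RECODE_SURVEY_A = {1: False, 2: True, 3: True, 4: True}  # 1 = never, 2-4 = frequency bands
RECODE_SURVEY_B = {0: False, 1: True}                    # 0 = no, 1 = yes


def harmonise(rows, survey_id, recode):
    """Recode one survey's answers to the common coding.

    Unknown codes become None, so they can be treated as missing values
    instead of being silently misclassified.
    """
    return [
        {
            "survey": survey_id,
            "respondent": row["id"],
            "visited_concert": recode.get(row["answer"]),
        }
        for row in rows
    ]


survey_a = [{"id": "a1", "answer": 1}, {"id": "a2", "answer": 3}]
survey_b = [{"id": "b1", "answer": 1}, {"id": "b2", "answer": 9}]

# The harmonised rows can be stacked into one joined dataset.
joined = harmonise(survey_a, "A", RECODE_SURVEY_A) + harmonise(survey_b, "B", RECODE_SURVEY_B)
for row in joined:
    print(row)
```

The information lost by collapsing frequency bands into a yes/no variable is the typical price of retrospective harmonisation; agreeing on the answer scale before fieldwork (ex ante) avoids it.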
We would like to ex ante harmonise additional surveys to meet the subsample requirements for music professionals and music enterprises in `Roč 1-01` and `Roč 2-01`; or, if this is not possible, to use "small area statistics" to work from smaller samples.
## Statistical production {#statistical-production}
The methodological framework that `Statistics Norway` is preparing has three interlinked pillars:
- Data minimisation by sampling
- Improving confidentiality
- Minimisation of data storage.
After consultation with the data protection authority of Norway, `Statistics Norway` concluded that sampling should occur outside the statistical office, i.e., before the data leaves the original data controller, for example, a collective rights management organisation. The `Open Music Europe` consortium members hold a similar view with a slightly different data governance angle. In our case, we are dealing with strictly voluntary data provision for the statistical authorities. Collective management organisations can only maintain their duty to their members if they carry out the sampling themselves and can guarantee that the risks of data sharing are reduced. Of course, we must consider the differences in the underlying data: in the Norwegian example, "nano-level" transnational data is considered; in our case, we are considering microdata coordination and harmonisation.
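Sampling at the data controller, before anything is shared, can be sketched as below. This Python snippet is a hedged illustration only: the function `minimised_sample`, the field names `member_id` and `royalty_eur`, and the use of a keyed hash for pseudonymisation are assumptions made for the example, not the procedure of `Statistics Norway` or of any collective management organisation.

```python
import hashlib
import hmac
import random


def minimised_sample(records, fraction, secret_key, keep_fields, seed=None):
    """Draw a simple random sample and strip it down before sharing.

    - only a fraction of the records leaves the controller (sampling);
    - the member identifier is replaced by a keyed hash whose key stays
      with the controller (confidentiality);
    - only the fields listed in `keep_fields` are shared (data minimisation).
    """
    if not records:
        return []
    rng = random.Random(seed)
    sample_size = max(1, round(len(records) * fraction))
    shared = []
    for record in rng.sample(records, sample_size):
        token = hmac.new(
            secret_key, str(record["member_id"]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        row = {"pseudo_id": token[:16]}
        row.update({field: record[field] for field in keep_fields})
        shared.append(row)
    return shared


# Hypothetical administrative records held by the data controller.
members = [
    {"member_id": i, "name": f"member {i}", "royalty_eur": 10.0 * i}
    for i in range(100)
]
to_share = minimised_sample(members, 0.1, b"controller-only-key", ["royalty_eur"], seed=1)
print(len(to_share), sorted(to_share[0]))
```

A simple random sample is used here for brevity; a stratified design would need the stratum variable to stay with the controller as well.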
`Statistics Norway` also experienced the new data quality and reproducibility issues that `Open Music Europe` faces. It had been unprecedented in the production of official statistics that crucial parts of the statistical production or quality control are "outsourced", but in the case of privately-held data, this is necessary for data protection. (From the viewpoint of SOZA and other collective management societies, they are "insourcing" those statistical procedures that government statisticians are outsourcing due to data protection concerns.)
In this case, the problem arises that the privately held organisations incur significant data processing costs. In the case of `Open Music Europe`, these costs are recovered from a public European Union grant, which pays for the data programming, coordination, and management carried out by SOZA, UTU and Reprex. There is a lively debate on the European level on how and to what extent national (official) statistical budgets should reimburse the costs of privately-held data providers. Filling out census or sample surveys is mandatory. Citizens and institutions usually pay with their time or their employees' time when they fill out these questionnaires or answer interviews. There is a consensus that participation in statistical interviews, for example, should not be incentivised with payments. These arguments do not hold in the case of a few select private institutions that are expected to in-source the costly statistical production and quality control measures necessary for trustworthy data provision.
`Open Music Europe` aims to solve these problems with the introduction of several novelties:
- The voluntary application of specific guidelines originally designed for government statisticians;
- The use of open-source statistical software that applies statistical methods and procedures in a way that satisfies the needs of official statisticians;
- The use of highly standardised, stable datasets;
- The use of open data management practices in the open music observatory, which design and document procedural safeguards for data quality.
### Voluntary application of procedures
Our data management plan and practices aim for high-level compatibility and interoperability with official statistics. We use the definitions from the appropriate ESS glossaries and regulations whenever possible and try to apply them with the necessary modifications.
- We follow the definitions from Article 2 of the *2019/1700 EU Regulation on establishing a common framework for European statistics relating to persons and households, based on data at individual level collected from samples*, the definitions of the *2016/679 EU General Data Protection Regulation*, and Article 3 of *223/2009 EC Regulation on European statistics* [@eu_regulation_2019-1700; @gdpr; @gdpr_consolidated_text; @eur-lex_consolidated_ec_regulation_223-2009_2015; @ec_regulation_223-2009]. While the latter regulation is no longer in force, its definitions became parts of the European statistical vocabulary. For administrative data we use the [Cross-linked glossary for ADMIN data](https://cros-legacy.ec.europa.eu/content/admin-cross-linked-glossary_en) in the *ADMIN Knowledge repository* of the European Statistical System.
- The *European Statistics Code of Practice* and the *Quality Assurance Framework of the European Statistical System* [@european_statistics_code_of_practice_2017; @ESS_QAF_2019] are aimed at national statistical authorities. We aim to apply their provisions wherever possible, with the necessary difference in interpretation for state-mandated and voluntary data collection.
- On the level of data coordination, we follow the recent statistical guidelines on using privately-held data to create official statistics [@unece_sbr_guidelines_2015; @un_sbr_guidelines_2020; @eurostat_sbr_manual_2021], and, in broader terms, the management and policy literature of business-to-government data sharing [@ess_position_privately_held_2017; @hleg_towards_b2g_data_2020; @vigorito_privately_held_2022; @susha_b2g_data_2022].
- We strive to build a best policy practice in Slovakia within the context of the *European data strategy* [@european_data_strategy_2020] and utilise the regulatory innovations of the *Data Governance Act* [@data_governance_act_2022].
### Application of open algorithms and open-source statistical code
Within the framework of rOpenGov, UTU and Reprex are developing extensions (software packages) to the R statistical environment and language. National statistical offices widely use R in statistical production. R is a high-level, interpreted language that production statisticians can read even if they are used to working with different computer languages. To provide the necessary quality control of the statistical production pipeline, the data providers (in Slovakia, initially SOZA) will only use open-source statistical code that government statisticians can vet. The researchers of UTU and Reprex will submit all key elements of their software to peer review, first on CRAN, then within the broader data science and academic statistical community. This measure will provide a necessary safeguard for the quality and reproducibility of the production process itself.
### Data stability: highly-standardised administrative data sources
Another safeguard of reproducibility, this time concerning the data sources, is the relative stability of collective management data. Music licensing is a very highly standardised global activity. Royalty accounting and music metadata standards ensure that SOZA can cooperate and exchange information about distributing money to rightsholders in almost every country. Global organisations like CISAC and BIEM govern these standards, often implementing stable international standards or agreements. Data stability, another critical aspect of statistical reproducibility within a state statistical program, can be considered better founded than in the case of many public administrative records that `SOSR` or other national statistical offices have been using in the past decades.
### Application of the open collaboration method
We practice the open collaboration method, rooted in available knowledge management (library science) and open-source software development. As long-time developers of open-source statistical software on the rOpenGov platform, we use good project management practices, code sharing and data sharing to facilitate work across business, scientific, and government entities and allow the participation of individual researchers, developers or citizen scientists, too.
- We have adopted the *Open Policy Analysis Guidelines* to make our work easy to follow, review and connect to [@OPA_framework_2020]. This means that, as open-source software developers would do, we make our policy analytic work, including this document and its supporting reader, references and charts, available for review, criticism and improvements in a standard file structure in an open GitHub repository.
- The software library extensions to the R statistical environment and language are available in a development version, refreshed daily, in the GitHub repositories of [rOpenGov](https://ropengov.org/) managed by the University of Turku Data Science Group; and, after the necessary peer review and testing, on [The Comprehensive R Archive Network](https://cran.r-project.org/).
- All our developers and researchers must adhere to the [Contributor Covenant](https://www.contributor-covenant.org/version/2/1/code_of_conduct/) stewarded by the Organization for Ethical Source.
### Retrospective harmonisation of survey data {#retrospective-survey-harmonisation}
| No. | Survey Title (SK) | Survey Title (EN) |
|----------|-------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| `KULT05` | Hudobné telesá a umelecké súbory v SR | Musical ensembles and artistic ensembles in Slovakia |
| `KULT10` | Ročný výkaz o knižnici                                                        | Annual survey of libraries                                                            |
| `KULT11` | Ročný výkaz o audiovízii                                                      | Annual audiovisual survey                                                             |
| `KULT16` | Ročný výkaz o verejných podujatiach v oblasti profesionálnej hudobnej kultúry | Annual survey on the public events in the field of professional music |
| `KULT19` | Ročný výkaz o výrobe a distribúcii zvukových záznamov hudobných diel | Annual report on the production and distribution of sound recordings of musical works |
#### Codebook creation
The KULT survey documentation contains Slovak language descriptions of the variables in a narrative format, often with coding information. We create a machine-readable and programmatically easy-to-handle variable name for each variable, creating, for example, the mapping of `168` to `total_own_turnover`, and whenever necessary, we make a new auxiliary variable.
#### Variable names, auxiliary variables
The auxiliary variables are usually constant attribute variables in each dataset that were not explicitly coded, such as the time reference for the year 2007 or the unit of measure reference for SKK or EUR. For programmatic use, we will use only ASCII characters and avoid spaces and characters that have special meanings in various programming languages. Variable naming usually follows either the camelCase or the snake_case convention. Because the tidy use of R is recommended, we will create snake_case variable names.
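The naming and auxiliary variable steps can be sketched as follows. This Python snippet is a minimal illustration, not the project's R code: the helper names, the interpretation of `168` as an original column key, and the auxiliary variable names `time_reference` and `unit_of_measure` are assumptions made for the example.

```python
import re
import unicodedata


def to_snake_case(label):
    """Turn a free-text or camelCase label into an ASCII snake_case name."""
    # Decompose accented letters and drop the non-ASCII combining marks.
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Insert an underscore at camelCase boundaries.
    ascii_label = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", ascii_label)
    # Collapse every run of other characters (spaces, punctuation) into "_".
    return re.sub(r"[^A-Za-z0-9]+", "_", ascii_label).strip("_").lower()


def apply_codebook(row, codebook):
    """Rename the keys of one record according to the codebook mapping."""
    return {codebook.get(key, key): value for key, value in row.items()}


def add_auxiliary(rows, time_reference, unit_of_measure):
    """Make implicit constants (reference year, currency) explicit columns."""
    return [
        {**row, "time_reference": time_reference, "unit_of_measure": unit_of_measure}
        for row in rows
    ]


codebook = {"168": to_snake_case("Total own turnover")}  # hypothetical entry
raw_rows = [{"168": 1200}]                               # hypothetical value
tidy_rows = add_auxiliary(
    [apply_codebook(row, codebook) for row in raw_rows], 2007, "SKK"
)
print(tidy_rows)
```

Stripping diacritics keeps names such as `hudobne_telesa` usable in any programming language, while the original Slovak label can stay in the codebook as a human-readable description.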
#### Concept mapping
We will translate each variable label to English and add the 1-2 most relevant concepts for all variables to make the variable descriptions machine-actionable. Concept mapping requires a good command of the domain and of statistical controlled vocabularies. Our initial concept mapping will go through two layers of peer review: Hudobné centrum, the data steward organisation of the KULT surveys, will review our mapping first from a musical point of view, and we will ask the SOSR for a statistical revision.
As an end result, we can bring each dataset column to a machine-actionable format that already offers itself for ex-ante harmonisation with further surveys and data sources. We will make the new joined datasets available in CSV, Excel, and SPSS formats for convenience.
#### Namespace and entity matching
The KULT surveys are enterprise surveys by nature, although they relate to social or cultural enterprises, which often do not have an enterprise form or even a legal personality.
### Crosswalking
A schema crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another. In this case, we will create a schema crosswalk table for connecting the SOSR statistical microdata to a more user-friendly, bilingual, improved, and enriched KULT database.
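A schema crosswalk of this kind can be kept as a small table and applied programmatically. The Python sketch below is illustrative only: apart from the `168` to `total_own_turnover` pair mentioned in the codebook section, the field names, labels and values are hypothetical and do not describe the actual SOSR or KULT schemas.

```python
import csv
import io

# A crosswalk table with one row per pair of equivalent fields.
CROSSWALK_CSV = "\n".join(
    [
        "source_field,target_field,label_en",
        "168,total_own_turnover,Total own turnover",
        "ico,organisation_id,Organisation identifier",  # hypothetical row
    ]
)


def load_crosswalk(text):
    """Read the crosswalk table into a source-to-target dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_field"]: row["target_field"] for row in reader}


def crosswalk_record(record, mapping):
    """Translate one source record; report fields the crosswalk does not cover."""
    mapped = {mapping[key]: value for key, value in record.items() if key in mapping}
    unmapped = sorted(key for key in record if key not in mapping)
    return mapped, unmapped


mapping = load_crosswalk(CROSSWALK_CSV)
mapped, unmapped = crosswalk_record({"168": 1200, "ico": "X1", "note": "n/a"}, mapping)
print(mapped)
print(unmapped)
```

Reporting the unmapped fields instead of dropping them silently makes gaps in the crosswalk visible during review.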
### Creation of novel indicators for the Slovak music industry and potentially other CCIs
In our projects, we follow the best practices of key business information, statistical, and evidence-based policy indicator design. In doing so, we would like to find synergies among various recent innovations in statistics and open science. Throughout the project, we will follow the Eurostat guidelines on creating new indicators [@eurostat_harmonised_indicators_1_2014; @eurostat_harmonised_indicators_2_2017; @eurostat_harmonised_indicators_3_2017], which will ensure broad consensus forming among stakeholders around the objectives and methodology of the improved measurements.
We reviewed three Slovak policy documents that are relevant for our work: the sectorial *Stratégia kultúry a kreatívneho priemyslu Slovenskej republiky 2030* [@slovak_cci_strategy_2023] (henceforth: Slovak CCI Strategy); *Slovakia's Vision and Development Strategy 2030 - a long-term strategy for sustainable development* [@slovak_sdg_strategy_2020] (henceforth: Slovak Strategic SDGs), which has significant cultural elements; and the *2030 Digital Transformation Strategy for Slovakia* [@2030_digital_transformation_slovakia_en_2019; @2030_digital_transformation_slovakia_sk_2019] (henceforth: Slovak Digitisation Strategy), which is highly relevant for the innovation we are doing in Slovakia under the label "Listen Local" in WP2.3. These documents were made in different years, and each is consistent with the EU policy documents of its year; however, they show little cross-referencing among themselves. For example, the oldest of the three, the Slovak Digitisation Strategy, has no reference to the cultural and creative industries at all, while the later Slovak CCI Strategy has very little concrete reference to the SDG policies and no reference to the Slovak Digitisation Strategy. The Slovak CCI Strategy is nevertheless a well-designed policy paper, and it is easy to link to the relevant, horizontal SDG and digitisation policies.
Our literature and data source review is building on the earlier work of KEA European Affairs, who, in cooperation with the European Commission's Structural Reform Support Service (SRSS), reviewed the policy documents preceding the final version of the Slovak CCI Strategy, and almost all data sources that are relevant for our work, too [@kea_cultural_industries_slovakia_2020].
The "triple transition" is highly relevant to our work because these horizontal policies guide public investment, have new rules on private investment spending, and foresee changes in economic and tax policies. It immediately connects with the 2nd strategic priority of the Slovak CCI Strategy, *2 Efficiently funded culture*, which aims to systematically reduce the infrastructure and modernisation gap, increase the efficiency of the finance management and financing of culture and creative industries, and complement public funding sources with private sources. Furthermore, we also agree with the strategy's prioritisation to put *3 Dignified culture*, i.e., the proper remuneration for the workforce of the Slovak cultural sectors, at the top of the policy intervention agenda, because in CCSIs, the larger part of the gross value added or gross product is the value created by the workforce (labour) and not by fixed capital assets. The following KPIs of this strategic objective are very relevant to our work.
The indicators from the text below are added to the slides, too.
The Slovak Strategic SDGs have a strong cultural component, too [@slovak_sdg_strategy_2020, p. 58.], which partly overlaps with the Slovak CCI Strategy.
## Conclusion
## References {#references}