COVID-19 Integrated Surveillance Data in Italy
COVID-19 integrated surveillance data provided by the Italian National Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly simple moving averages.
Every week the National Institute for Nuclear Physics (INFN) imports an anonymous individual-level dataset from the Italian National Institute of Health (ISS) and converts it into incidence time series organized by date of event and disaggregated by sex, age and administrative level, with a consolidation period of approximately two weeks. The information available to the INFN is summarised in the following meta-table:
The original data are stored here and the reorganised data here.
The input data are stored here and contain the following information:
daily_incidences_by_region folder:
daily_incidences_by_region_sex_age folder:
The output data are stored here and contain the following information:
Raw data are downloaded from the INFN (direct download here), decompressed, stored in the 0_archive folder and then organized into the 1_structured_archive folder via the execution of the data_organization.jl script.
In general, given the moving average (or rolling mean) of a time series, it's not possible to recover the original series unless n original points are known, where n is the width of the window adopted in the moving average. However, epidemiological surveillance incidence series are strictly composed of natural numbers, and we can leverage this property to come up with a finite number of candidate original series, and then prune these down to as few as possible, hopefully to a single final recovered series.
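The idea can be illustrated with a minimal Python sketch. It is not the algorithm implemented in UnrollingAverages.jl, only a brute-force toy under stated assumptions: the moving average is taken over full windows only, it is handled as integer window sums (average times window, which is an integer whenever the original counts are natural numbers), and the window is 3 rather than 7 to keep the search small. The function names `window_sums` and `unroll_candidates` are hypothetical. In this formulation, fixing the first `window - 1` values determines every later value, so enumerating those values and discarding series with negative counts yields the finite candidate set.

```python
from itertools import product


def window_sums(series, window):
    """Sums over each full window, i.e. window * simple moving average."""
    return [sum(series[i:i + window]) for i in range(len(series) - window + 1)]


def unroll_candidates(sums, window):
    """Enumerate every natural-number series whose window sums equal `sums`."""
    candidates = []
    # Each of the first window-1 values is at most the first window sum.
    for head in product(range(sums[0] + 1), repeat=window - 1):
        series = list(head)
        # The first window sum fixes the value that closes the first window.
        series.append(sums[0] - sum(head))
        # s[k] - s[k-1] = x[k+window-1] - x[k-1] fixes every later value.
        for k in range(1, len(sums)):
            series.append(series[k - 1] + sums[k] - sums[k - 1])
        # Prune candidates that contain negative counts.
        if min(series) >= 0:
            candidates.append(series)
    return candidates


original = [0, 2, 1, 0, 0, 3, 1, 0]  # toy daily counts, not real data
sums = window_sums(original, window=3)
candidates = unroll_candidates(sums, window=3)
```

The toy series is one of the returned candidates, but it is not the only one, which is why a further pruning step is needed when no additional information is available.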
The whole procedure is performed via the execution of the main.jl script, and the related technical details can be found in the documentation of the UnrollingAverages.jl package.
The averaged time series to be unrolled (i.e. recovered, reconstructed or de-averaged) are those stored in the 2_input/daily_incidences_by_region_sex_age folder: they are organized in .csv files, each reporting the 10 age-specific time series of a particular incidence in a particular region. Each dataset has two counterparts that are further stratified by sex.
Since UnrollingAverages.jl seems to perform better the smaller the numbers involved, we opted to unroll the sex-stratified series first and aggregate them afterwards. Not all the age- and sex-stratified averaged series allow UnrollingAverages.jl to find a unique original series, and no further sex-stratified information is provided by INFN. We therefore also attempted to unroll the sex-aggregated time series directly: for these, CovidStat provides additional information in the form of age-aggregated original time series, which we used to select the combination of age-disaggregated series proposed by UnrollingAverages.jl that sums to the age-aggregated original time series provided by INFN. The age- and sex-aggregated series used for this purpose can be found in the 2_input/daily_incidences_by_region folder. We'll refer to this last selection algorithm as the cross-sectional consistency constraint.
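A minimal Python sketch of the cross-sectional consistency constraint follows. It is an illustration under assumptions, not the code in main.jl: the function name `cross_sectional_select`, the list-of-lists data layout and the toy numbers are all hypothetical. For every combination of one candidate per age class, it keeps the combination only if the day-by-day sum over age classes equals the known age-aggregated series.

```python
from itertools import product


def cross_sectional_select(candidates_by_age, aggregated_series):
    """Return the combinations (one candidate per age class) that sum,
    day by day, to the known age-aggregated series."""
    target = list(aggregated_series)
    matches = []
    for combo in product(*candidates_by_age):
        # zip(*combo) groups the values of all age classes for each day.
        daily_totals = [sum(values) for values in zip(*combo)]
        if daily_totals == target:
            matches.append(combo)
    return matches


# Toy example with two age classes and two candidates each (not real data).
candidates_by_age = [
    [[0, 2, 1, 0], [1, 1, 1, 1]],  # candidates for the first age class
    [[1, 0, 0, 2], [0, 1, 0, 2]],  # candidates for the second age class
]
aggregated_series = [1, 2, 1, 2]   # known age-aggregated original series

matches = cross_sectional_select(candidates_by_age, aggregated_series)
```

In this toy case exactly one combination satisfies the constraint. The search is exhaustive, so its cost grows with the product of the numbers of candidates per age class.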
The successfully reconstructed time series are then saved in the 3_output/data folder (both aggregated and disaggregated by sex), while the visualisations of those that are age-stratified and sex-aggregated may be found in 3_output/figures.
If you use these data in your work, please cite this repository using the metadata in CITATION.bib.
COVID-19 Surveillance Data Modelling and Management Pipeline in Piedmont.
A Julia package to deconvolve simple moving averages of time series.