COVID-19 in Spain and Italy
=================

This repository hosts two different projects:

 * R scripts to analyze COVID-19 in Spain, Italy and France. Website with updated charts: https://lab.montera34.com/covid19
 * A collection of COVID-19 data by province in Spain: a citizen initiative that gathers the data from various sources, given that the government does not provide it. [More info](https://github.com/montera34/escovid19data). It is currently moving to an independent repository.

## License

[GNU GENERAL PUBLIC LICENSE. V3 (GNU GPLv3)](https://code.montera34.com:4443/numeroteca/covid19/-/blob/master/LICENSE.md).

## How to use it

If you are going to publish visualizations, first read the article [Ten Considerations Before You Create Another Chart About COVID-19](https://medium.com/nightingale/ten-considerations-before-you-create-another-chart-about-covid-19-27d3bd691be8). With great power comes great responsibility.

Each script can be run on its own and produces all of its visualizations. Some scripts read their data directly from the source; others use data stored locally.

## File structure

```
├── coronavirus.Rproj                     # R project
├── analysis                              # scripts to process data and generate charts
│   ├── comparativa-bases-de-datos.R      # R script: compare databases: ISCIII, Datadista and esCOVID19data
│   ├── count_catalunya.R                 # R script: process Catalunya data
│   ├── evolution_compare.R               # R script: process data and create plots comparing countries
│   ├── evolution_france.R                # R script: process data and create plots for France
│   ├── evolution_italia.R                # R script: process data and create plots for Italy
│   ├── process_spain_regions_data.R      # R script: process Spain data by comunidad autónoma
│   ├── charts_spain_regions.R            # R script: create charts for Spain by comunidad autónoma
│   ├── evolution_spain_provinces_maps.R  # R script: generate maps by province to make an animated GIF
│   ├── charts_spain_provinces.R          # R script: create charts for Spain by province
│   └── process_spain_provinces_data.R    # R script: process Spain data by province
├── data
│   ├── original                              # original data
│   │   └── spain
│   │       ├── ccaa-poblacion.csv            # population per region
│   │       ├── provincias-poblacion.csv      # population per province
│   │       └── covid10_spain_provincias.csv  # COVID-19 data by province downloaded from a spreadsheet
│   └── output                                # processed data: by date and comunidad autónoma in Spain
│       ├── spain
│       │    └── covid19-provincias-spain_consolidated.csv  # exported province data
│       ├── covid19-cases-uci-deaths-by-ccaa-spain-by-day-accumulated.csv     # all variables merged
│       ├── covid19-casos-registrados-por-ccaa-espana-por-dia-acumulado.csv   # registered cases, accumulated
│       ├── covid19-fallecimientos-por-ccaa-espana-por-dia-acumulado.csv      # deceased
│       └── covid19-ingresos-uci-por-ccaa-espana-por-dia-acumulado.csv        # intensive care
├── img
│   ├── france    # France plots
│   ├── italia    # Italy plots
│   └── spain
│       ├── regions                       # Comunidades autónomas (regions) charts
│       │    ├── 20200312                 # plots from previous days
│       │    ├── 20200313                 # plots from previous days
│       │    └── covid19_*.png            # latest plots generated
│       └── provinces                     # Provinces charts
├── LICENSE.md
└── README.md
```

## Data

### Spain

<img src="https://lab.montera34.com/covid19-r/img/spain/regions/covid19_muertes-por-dia-comunidad-autonoma-superpuesto-log_media.png"  width="450" alt="Average deaths per day over the previous 6 days by autonomous community. Spain. COVID-19">

<img src="https://lab.montera34.com/covid19-r/img/compare/covid19_fallecimientos-por-region-superpuesto-offset-log_since-5deceased.png"  width="450" alt="Cumulative deaths per day since the first day with 5 or more deaths in regions of Spain (CCAA), Italy and France. Logarithmic scale. COVID-19">

#### Provinces

Read more about the collection of data by province at [EsCOVID19data](https://github.com/montera34/escovid19data).

#### Autonomous communities (comunidades autónomas)

The data come from the [Datadista repository](https://github.com/datadista/datasets/tree/master/COVID%2019), which in turn extracts them from the daily tables on the COVID-19 situation in Spain that the Ministerio de Sanidad, Consumo y Bienestar Social [publishes](https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov-China/situacionActual.htm) as inconvenient PDFs. Please cite Datadista as the source of the extracted data.

Processed, ready-to-use data (long format) are available in the [data/output](https://code.montera34.com:4443/numeroteca/covid19/-/tree/master/data/output) directory.

One file contains all the data (registered cases, intensive care patients and deaths); see its data structure below: [/data/output/covid19-cases-uci-deaths-by-ccaa-spain-by-day-accumulated.csv](https://code.montera34.com:4443/numeroteca/covid19/-/blob/master/data/output/covid19-cases-uci-deaths-by-ccaa-spain-by-day-accumulated.csv)

Data structure:

* `date` Day
* `region_code` Region code (INE code number for the comunidad autónoma)
* `region` Spanish region (comunidad autónoma)
* `country` Country the region belongs to
* `population` Population of the region
* `cases_registered` Number of registered cases. It is the sum of positive PCR tests (`PCR`) and positive antibody tests (`TestAc`).
* `PCR` Number of positive PCR tests
* `TestAc` Number of positive antibody (anticuerpos) tests
* `cases_per_100000` Number of registered cases per 100,000 people
* `intensive_care` Number of intensive care patients (UCI in Spanish). [Read Datadista README for exceptions](https://github.com/datadista/datasets/blob/master/COVID%2019/readme.md)
* `intensive_care_per_1000000` Number of intensive care patients (UCI in Spanish) per 100,000 people
* `deceassed` Number of deceased
* `deceassed_per_100000` Number of deceased per 100,000 people
* `recovered` Number of recovered
* `recovered_per_100000` Number of recovered per 100,000 people
* `hospitalized` Number of hospitalized
* `hospitalized_per_100000` Number of hospitalized per 100,000 people

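As a rough illustration of the structure above, the following Python sketch (standard library only; the repository's own scripts are written in R) parses rows in that layout and turns `NA` into `None`. The inline sample uses a subset of the columns with values from the Andalucía example rows in this README; it is not the real file, and the helper name `parse_rows` is hypothetical.

```python
import csv
import io

# Sample in the documented layout (subset of columns; values taken from
# the Andalucía example rows in this README). Not the real file.
SAMPLE = """date,region_code,region,country,population,cases_registered,intensive_care,deceassed
2020-03-15,1,Andalucía,Spain,8414240,437,NA,6
2020-03-16,1,Andalucía,Spain,8414240,554,11,7
"""

INT_COLUMNS = ("region_code", "population", "cases_registered",
               "intensive_care", "deceassed")


def parse_rows(text):
    """Parse CSV text into dicts, mapping 'NA' to None and counts to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for column in INT_COLUMNS:
            value = row[column]
            row[column] = None if value == "NA" else int(value)
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["date"], row["region"], row["cases_registered"], row["intensive_care"])
```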
Example of observations:

| "date"       | "region\_code" | "region"    | "country" | "population" | "cases\_registered" | "cases\_per\_100000" | "intensive\_care" | "intensive\_care\_per\_1000000" | "deceassed" | "deceassed\_per\_100000" | "recovered" | "recovered\_per\_100000" |
|--------------|----------------|-------------|-----------|--------------|---------------------|----------------------|-------------------|---------------------------------|-------------|--------------------------|-------------|--------------------------|
| 2020\-03\-14 | 1              | "Andalucía" | "Spain"   | 8414240      | 269                 | 3\.2                 | NA                | NA                              | 2           | 0\.24                    | NA          | NA                       |
| 2020\-03\-15 | 1              | "Andalucía" | "Spain"   | 8414240      | 437                 | 5\.19                | NA                | NA                              | 6           | 0\.71                    | NA          | NA                       |
| 2020\-03\-16 | 1              | "Andalucía" | "Spain"   | 8414240      | 554                 | 6\.58                | 11                | 0\.13                           | 7           | 0\.83                    | 0           | 0                        |
| 2020\-03\-17 | 1              | "Andalucía" | "Spain"   | 8414240      | 683                 | 8\.12                | 13                | 0\.15                           | 11          | 1\.31                    | 0           | 0                        |
| 2020\-03\-18 | 1              | "Andalucía" | "Spain"   | 8414240      | 859                 | 10\.21               | 21                | 0\.25                           | 19          | 2\.26                    | 38          | 4\.52                    |


`NA` indicates that no data is available. Intensive care patient data have not been published since March 13th.

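The per-capita columns can be checked by hand from `population`. The Python sketch below is only an arithmetic check on the example rows above, not code from the repository, and the helper name `rate` is hypothetical. On those rows, `cases_per_100000` and `intensive_care_per_1000000` match a rate per 100,000 people, while the `deceassed_per_100000` and `recovered_per_100000` values match a rate per 1,000,000, so it may be worth confirming the units against the current file before comparing columns.

```python
POPULATION = 8414240  # Andalucía, from the example rows


def rate(count, population, per):
    """Return count per `per` inhabitants, rounded to two decimals."""
    return round(count / population * per, 2)


# 2020-03-18 row: 859 cases, 21 intensive care, 19 deceased, 38 recovered
cases_rate = rate(859, POPULATION, 100_000)        # table shows 10.21
icu_rate = rate(21, POPULATION, 100_000)           # table shows 0.25
deceased_rate = rate(19, POPULATION, 1_000_000)    # table shows 2.26
recovered_rate = rate(38, POPULATION, 1_000_000)   # table shows 4.52

print(cases_rate, icu_rate, deceased_rate, recovered_rate)
```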
#### Population by autonomous community (2019)

Population by autonomous community, from INE: https://www.ine.es/jaxiT3/Datos.htm?t=2853#!tabs-tabla


### Italy

Italian data: https://github.com/pcm-dpc/COVID-19

### France

Warning: this section needs an update.

Official data from the national government: https://www.data.gouv.fr/fr/datasets/cas-confirmes-dinfection-au-covid-19-par-region/

The main source for French data is this repo: https://github.com/opencovid19-fr/data

The repo gathers data from national and regional administrations and unifies it. All the structured data is available in two formats:

 * CSV. Direct link to CSV file: https://github.com/opencovid19-fr/data/blob/master/dist/chiffres-cles.csv
 * JSON. Direct link to JSON file: https://github.com/opencovid19-fr/data/blob/master/dist/chiffres-cles.json

This is the structure:

| Columns       | Description                                                |
|---------------|------------------------------------------------------------|
| date          | date                                                       |
| granularite   | disaggregation level                                       |
| maille_code   | code of the state, region or country (just FRA for France) |
| maille_nom    | name of the state, region or country (just France)         |
| cas_confirmes | registered cases                                           |
| deces         | deceased                                                   |
| reanimation   | intensive care hospitalizations                            |
| hospitalises  | hospitalizations                                           |
| gueris        | recovered                                                  |
| depistes      | number of discovered cases\*                               |
| source_nom    | name of the source                                         |
| source_url    | URL of the source                                          |
| source_type   | type of source                                             |

\* It is unclear how this column differs from registered cases. It is empty except for Polynésie and Nouvelle Calédonie.

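A minimal Python sketch of reading this structure with the standard library, assuming the CSV is comma-separated and uses the column names listed above. The rows in `SAMPLE` are made up for illustration and are not real figures. The `granularite` values `pays` and `region`, and the `maille_code` values, are assumptions about how the file labels each level, so check the actual file before relying on them.

```python
import csv
import io

# Made-up rows in the documented layout (subset of columns). Not real data.
SAMPLE = """date,granularite,maille_code,maille_nom,cas_confirmes,deces
2020-03-17,pays,FRA,France,100,10
2020-03-17,region,REG-11,Île-de-France,40,4
2020-03-18,region,REG-11,Île-de-France,55,
2020-03-18,region,REG-93,Provence-Alpes-Côte d'Azur,20,2
"""


def to_int(value):
    """Convert a CSV cell to int; empty cells become None."""
    return int(value) if value else None


def latest_by_area(text, level):
    """Return the most recent row per maille_code for one granularite level."""
    latest = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["granularite"] != level:
            continue
        code = row["maille_code"]
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if code not in latest or row["date"] > latest[code]["date"]:
            latest[code] = row
    return latest


regions = latest_by_area(SAMPLE, "region")
for code, row in sorted(regions.items()):
    print(code, row["date"], to_int(row["cas_confirmes"]), to_int(row["deces"]))
```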
#### Regions (Régions)

Regional data is in the main output file linked above.

#### Departments (Départements)

The data for départements is also in the main file. In addition, [this dataset](https://github.com/opencovid19-fr/data/tree/master/data-sources/sante-publique-france) gathers the data before it is pushed to the output file.

This is the structure:

| Input | Output       | Description                              |
|-------|--------------|------------------------------------------|
| dep   | code & nom   | Code and name of the département         |
| sexe  | --           | Not carried over to the output file      |
| jour  | date         | Date of the data point                   |
| hosp  | hospitalises | Number of people hospitalized            |
| rea   | reanimation  | Number of people in intensive care       |
| rad   | gueris       | Number of people recovered               |
| dc    | deces        | Number of people deceased                |

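The input-to-output mapping in the table can be written down as a small lookup. This Python sketch only illustrates the renaming described above; it is not the code that opencovid19-fr uses, and the sample values are invented. Because the table maps `dep` to both a code and a name, the sketch keeps only the code under the assumed key `maille_code` and leaves the name lookup out.

```python
# Input column -> output column, following the table above.
# `sexe` is absent because it is not carried over to the output file.
COLUMN_MAP = {
    "dep": "maille_code",   # assumption: the table only says "code & nom"
    "jour": "date",
    "hosp": "hospitalises",
    "rea": "reanimation",
    "rad": "gueris",
    "dc": "deces",
}


def to_output(row):
    """Rename the mapped columns and drop every other column."""
    return {COLUMN_MAP[key]: value for key, value in row.items() if key in COLUMN_MAP}


# Invented values, for illustration only.
source_row = {"dep": "01", "sexe": "0", "jour": "2020-03-18",
              "hosp": "2", "rea": "0", "rad": "1", "dc": "0"}
output_row = to_output(source_row)
print(output_row)
```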
Direct link to this dataset: https://github.com/opencovid19-fr/data/blob/master/data-sources/sante-publique-france/covid_hospit.csv

## Authorship

Pablo Rey Mazón ([@numeroteca](https://twitter.com/numeroteca)) and Alfonso Sánchez Uzábal ([@skotperez](https://twitter.com/skotperez)) from [montera34.com](https://montera34.com).

Contact: covid19@montera34.com

In the post [Análisis de propagación de COVID-19 por comunidades autónomas en España](http://numeroteca.org/2020/03/12/covid19-comunidades-autonomas-espana/) we collect some results and reflections.


# This repository

This repository lives at https://code.montera34.com:4443/numeroteca/covid19 and is mirrored on [GitHub](https://github.com/numeroteca/covid19).

If you are using these scripts or data, please let us know at info@montera34.com