This function constructs time series of counts for Germany's municipalities (Gemeinden) and districts (Kreise).
xwalk_ags(
  data,
  ags,
  time,
  xwalk,
  variables = NULL,
  strata = NULL,
  weight = NULL,
  fuzzy_time = FALSE,
  verbose = TRUE
)
data: A data frame or a data frame extension (e.g. a tibble).
ags: Name of the character variable (quoted) with municipality AGS (Gemeinden, 8 digits) or district AGS (Kreise, 5 digits).
time: Name of the variable (quoted) identifying the year (YYYY format). Values will be coerced to integers.
xwalk: Name of the crosswalk. The following crosswalks are available:
  xd19, xd20: for district-level data between 1990 and 2019/2020.
  xm19, xm20: for municipality-level data between 1990 and 2019/2020.
variables: Either a vector of names (quoted) for variables to interpolate or NULL to disable interpolation and return the data matched with the xwalk.
strata: Vector of variable names (quoted) or NULL. See details.
weight: Name of the interpolation weight or NULL. The following are available:
  pop: Population weights.
  size: Area weights.
  emp: Weights based on the number of employees (1998 onwards).
fuzzy_time: If FALSE, the crosswalk and the data are matched exactly by ags and time. If TRUE, they are matched exactly by ags and as best as possible on time. See details below.
verbose: If TRUE, the function outputs information on the number of matched and unmatched rows.
If interpolation is requested, the crosswalked and interpolated data are returned. If interpolation is not requested, the data matched with the crosswalk are returned. The following variables are added:
row_id: row number of data before matching.
ags[*]: the crosswalked AGS.
year_xw: the matched year from the crosswalk.
[*]_conv: the interpolation weight.
diff: the absolute difference between year_xw and time.
This function facilitates the use of crosswalks constructed by the BBSR for municipalities and districts in Germany (Milbert 2010). The crosswalks map one year's set of district/municipality identifiers to a later year's identifiers and provide weights for area- or population-weighted interpolation.
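To make the idea of weighted interpolation concrete, here is a minimal base R sketch. The crosswalk table, its column names (ags_from, ags_to, w) and all numbers are made up for illustration; they are not the package's internal data or the BBSR weights, and the function itself may work differently.

```r
# Hypothetical counts for three source-year units.
counts <- data.frame(
  ags_from = c("A", "B", "C"),
  voters   = c(1000, 500, 200),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk: unit A is split 60/40 between X and Y,
# units B and C are absorbed completely by Y.
xw <- data.frame(
  ags_from = c("A", "A", "B", "C"),
  ags_to   = c("X", "Y", "Y", "Y"),
  w        = c(0.6, 0.4, 1, 1),
  stringsAsFactors = FALSE
)

# Attach the weights, scale each count, and sum within the target units.
m <- merge(counts, xw, by = "ags_from")
m$voters_w <- m$voters * m$w
interpolated <- aggregate(voters_w ~ ags_to, data = m, FUN = sum)
interpolated
```

Because the weights of each source unit sum to one in this toy crosswalk, the total count is the same before and after interpolation.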
All data rows with NAs in either the ags or time variable are excluded. The same applies to all rows with a value in ags or time that never appears in the crosswalk.
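The exclusion rules can be checked before calling the function. The following base R sketch flags the affected rows; the data, the set of known identifiers (xw_ags) and the year range (xw_years) are hypothetical stand-ins, not the contents of an actual crosswalk.

```r
# Made-up data with a missing AGS, a missing year, an unknown AGS,
# and a year outside the assumed crosswalk range.
dat <- data.frame(
  ags  = c("14511", NA, "99999", "14521", "14511"),
  year = c(1998, 1998, 1998, NA, 1850),
  stringsAsFactors = FALSE
)

# Hypothetical identifiers and years covered by a crosswalk.
xw_ags   <- c("14511", "14521")
xw_years <- 1990:2020

# Rows with NA in ags or time.
has_na <- is.na(dat$ags) | is.na(dat$year)

# Rows whose ags or year never appears in the (assumed) crosswalk.
unknown <- !has_na & (!(dat$ags %in% xw_ags) | !(dat$year %in% xw_years))

dat[has_na, ]                        # would be excluded because of NAs
dat[unknown, ]                       # would be excluded as unknown
dat_ok <- dat[!has_na & !unknown, ]  # rows that remain
```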
Fuzzy matching uses the absolute difference between the year reported in the data and a crosswalk year. If there is a tie, crosswalk years from before the year reported in the data are preferred.
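The matching rule described above can be sketched in a few lines of base R. The helper match_year_fuzzy() and the crosswalk years are hypothetical and only illustrate the rule; they are not the package's implementation.

```r
# Hypothetical helper: pick the crosswalk year closest to the data year.
match_year_fuzzy <- function(year, xwalk_years) {
  d <- abs(xwalk_years - year)
  candidates <- xwalk_years[d == min(d)]
  # On a tie, one candidate lies before and one after the data year;
  # taking the minimum prefers the earlier crosswalk year.
  min(candidates)
}

# Made-up crosswalk years with gaps, so that fuzzy matching matters.
xwalk_years <- c(1990L, 1994L, 1998L, 2002L)

match_year_fuzzy(1995L, xwalk_years)  # 1994
match_year_fuzzy(1996L, xwalk_years)  # tie between 1994 and 1998, gives 1994
match_year_fuzzy(2010L, xwalk_years)  # 2002
```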
If area- or population-weighted interpolation is requested (i.e., when variables are supplied), the combination of the variables set in ags, time and strata must uniquely identify a row in data.
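Whether a given combination of columns identifies rows uniquely can be verified up front. This base R sketch uses a hypothetical helper key_is_unique() and made-up data with a stratum column named party; neither is part of the package.

```r
# Made-up data: one row per district, year and party.
dat <- data.frame(
  district = c("14511", "14511", "14521", "14511"),
  year     = c(1998, 2002, 1998, 1998),
  party    = c("a", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if no combination of the key columns repeats.
key_is_unique <- function(data, keys) {
  anyDuplicated(data[, keys, drop = FALSE]) == 0L
}

key_is_unique(dat, c("district", "year"))           # FALSE: party is missing
key_is_unique(dat, c("district", "year", "party"))  # TRUE
```

In this toy example, the stratum column would have to be passed via strata for the rows to be uniquely identified.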
Caution: Data from https://www.regionalstatistik.de/ sometimes include annual values both for merged units (e.g., Städteregion Aachen, 05334) and for their former parts (Kreis Aachen, 05354, and Stadt Aachen, 05313). When such data are crosswalked with fuzzy_time=TRUE and interpolated, the final counts will be off by a factor of approximately 2. The reason is that the final output is the sum of the interpolated counts for the parts and the measured count of the merged unit.
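One way to guard against this double counting is to drop the merged unit's rows in years where its former parts are also reported. The base R sketch below uses the AGS codes from the Aachen example with made-up years and counts; whether to keep the merged unit or the parts is a judgment call for the analyst and not something the function decides.

```r
# Made-up data: in 2005 both the merged unit and its parts report values.
dat <- data.frame(
  ags   = c("05334", "05354", "05313", "05334"),
  year  = c(2005, 2005, 2005, 2015),
  count = c(300, 200, 100, 320),
  stringsAsFactors = FALSE
)

merged_unit  <- "05334"
former_parts <- c("05354", "05313")

# Years in which at least one former part reports a value.
years_parts <- unique(dat$year[dat$ags %in% former_parts])

# Rows of the merged unit that overlap with those years.
overlap <- dat$ags == merged_unit & dat$year %in% years_parts

# Keep the parts in overlapping years and the merged unit elsewhere.
dat_clean <- dat[!overlap, ]
dat_clean
```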
Milbert, Antonia. 2010. "Gebietsreformen – politische Entscheidungen und Folgen für die Statistik." BBSR-Berichte kompakt 6/2010. Bundesinstitut für Bau-, Stadt- und Raumforschung.
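# Load the example data set
data(btw_sn)

# Crosswalk the districts with the xd20 crosswalk and interpolate
# the variables voters and valid using population weights
btw_sn_ags20 <- xwalk_ags(
  data = btw_sn,
  ags = "district",
  time = "year",
  xwalk = "xd20",
  variables = c("voters", "valid"),
  weight = "pop"
)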
#>
#> Total number of obs: 155
#>
#> Excluded obs:
#> id/time NA AGS unk Year unk
#> 0 0 0
#>
#> Matched obs:
#> exact fuzzy
#> 126 NA
#>
#> Unmatched obs: 29
#>
head(btw_sn_ags20)
#> # A tibble: 6 × 4
#> # Groups: year [1]
#> year ags20 voters valid
#> <dbl> <chr> <dbl> <dbl>
#> 1 1998 14511 234333. 190286.
#> 2 1998 14521 344668 284724
#> 3 1998 14522 297625. 242744.
#> 4 1998 14523 228689 181538
#> 5 1998 14524 306745. 247699.
#> 6 1998 14612 402716. 328183.