---
title: "Divvy up events with partitions"
date: "`r format(Sys.Date(), '%d %B %Y')`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Divvy up events with partitions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r include=FALSE}
library(diyar)
```

This vignette introduces `partitions()`, which provides an alternative approach to implementing case definitions. In summary, it uses specific temporal boundaries as the window of occurrence. This differs from `episodes()`, where the boundaries are calculated as durations relative to index events. `partitions()` returns a similar `S4` class identifier (`pane`), referred to as panes, and shares many of its arguments with `episodes()`.

To demonstrate this difference, let's review the `homes` dataset below. It has data on household members, including their ages. We'll attempt to apply a case definition to identify a three-generation home, where each generation includes individuals aged no more than 16 years apart.

```{r warning=FALSE}
homes <- data.frame(member = c("son_1", "son_2", "daughter_1", "father",
                               "mother", "grand_father", "grand_mother"),
                    age = c(4, 6, 17, 43, 40, 74, 69))
homes
```

The simplest approach is to specify the age bands for each generation. In this context, these are the temporal boundaries.

```{r warning=FALSE, fig.width=7, message=FALSE}
# Age bands of 0-16, 17-33, 34-50, 51-67 and 68-84 years
age_bands <- seq(0, 69, by = 17)
age_bands <- number_line(age_bands, age_bands + 16)
age_bands

homes$grp_1 <- partitions(homes$age, window = list(age_bands), separate = TRUE)
homes

schema(homes$grp_1, seed = 4,
       custom_label = paste0(homes$member, " \n(", homes$age, " yrs)"))
```

However, we can make the case that the children all belong to the same generation, since no two of them are more than 16 years apart. This highlights the main difference between `partitions()` and `episodes()`.
Unlike `episodes()`, the duration (age gap) between records is not a factor. Here, records or events are linked together simply because they fall within the same interval (age gap). To correct this, we could start the age bands at age 6, but this becomes difficult to manage when analysing multiple homes. Instead, we can use the `by` or `length.out` argument to create windows (`window`) relative to the first event (or `custom_sort`) only. Although this makes it more like `episodes()`, it is still different, since all age gaps are relative to only one reference event `(I)`.

```{r warning=FALSE, fig.width=7}
homes$grp_2 <- partitions(homes$age, by = 16, separate = TRUE)
schema(homes$grp_2, seed = 4,
       custom_label = paste0(homes$member, " \n(", homes$age, " yrs)"))
```

Now that we have identified the generations, we can build on this by linking every record on the condition that there is a specified number of generations (windows). Below, we ask for three to four generations.

```{r warning=FALSE, fig.width=7}
homes$grp_3 <- partitions(homes$age, by = 16, separate = FALSE,
                          windows_total = number_line(3, 4))
homes
schema(homes$grp_3, seed = 4,
       custom_label = paste0(homes$member, " \n(", homes$age, " yrs)"))
```

Despite the use of `by` and `length.out`, if the configuration of records relative to the index record changes, the resulting identifier can change as well. For example, if the `"mother"` and `"father"` were five years younger, they would fall into two different age gaps, resulting in a total of four generations.
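The arithmetic behind this sensitivity can be reproduced with a simplified model. The base R sketch below assumes, for illustration only, that windows are consecutive 16-year bins anchored at the youngest age (the index event), so each record's window is `floor((age - min(age)) / 16)`. This is not necessarily how `partitions()` computes its boundaries, and `window_index()` is a hypothetical helper, not part of diyar.

```r
# Hypothetical helper (not part of diyar): assign each age to an
# equal-width window anchored at the youngest age (the index event).
window_index <- function(age, width = 16) {
  floor((age - min(age)) / width)
}

members <- c("son_1", "son_2", "daughter_1", "father", "mother",
             "grand_father", "grand_mother")
age <- c(4, 6, 17, 43, 40, 74, 69)

# Same household, but the parents are five years younger
alt_age <- age
is_parent <- members %in% c("father", "mother")
alt_age[is_parent] <- alt_age[is_parent] - 5

win_orig <- window_index(age)
win_alt <- window_index(alt_age)

# Number of distinct windows ("generations") in each scenario
n_orig <- length(unique(win_orig))
n_alt <- length(unique(win_alt))

data.frame(members, age, win_orig, alt_age, win_alt)
c(original = n_orig, shifted = n_alt)
# original = 3, shifted = 4 under this model
```

Under this model, the five-year shift moves the mother (35) below the 36-year boundary that the father (38) stays above, so the parents are split across two windows and the count rises from three to four.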
```{r warning=FALSE, fig.width=7}
# Make the parents five years younger
homes$alt_age <- homes$age
lgk <- homes$member %in% c("father", "mother")
homes$alt_age[lgk] <- homes$alt_age[lgk] - 5

homes$grp_4 <- partitions(homes$alt_age, by = 16, separate = TRUE,
                          windows_total = number_line(3, 4))
homes
schema(homes$grp_4, seed = 4,
       custom_label = paste0(homes$member, " \n(", homes$alt_age, " yrs)"))
```

This makes a difference if we change the condition to exactly three generations for a three-generation household.

```{r warning=FALSE, fig.width=7}
homes$grp_5 <- partitions(homes$alt_age, by = 16, separate = FALSE,
                          windows_total = number_line(3, 3))
homes
schema(homes$grp_5, seed = 4,
       custom_label = paste0(homes$member, " \n(", homes$alt_age, " yrs)"))
```

We see that the household no longer has a common identifier that would identify it as a three-generation household. If we wish to address this, `episodes()` is the better option.

```{r warning=FALSE, fig.width=7}
homes$grp_6 <- episodes(homes$alt_age, case_length = 16)
homes
schema(homes$grp_6, seed = 4, show_labels = c("length_arrow", "length"),
       custom_label = paste0(homes$member, " \n(", homes$alt_age, " yrs)"))
```

Unlike with `partitions()`, additional analysis is required to flag the whole household as a three-generation household. For example, we can count the number of "occurrences" (age gaps, in `epid` terms).

```{r warning=FALSE, fig.width=7}
as.data.frame(homes$grp_6)
homes$t3_home <- length(unique(homes$grp_6@wind_id[[1]])) == 3
homes
```

Similar to `episodes()`, everything we've discussed above can be done separately for different subsets of the dataset, such as different households, by using the `strata` argument.
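Conceptually, stratifying means repeating the same grouping within each subset. As an illustration only, the base R sketch below models episode-style grouping as greedy chaining of fixed windows: the youngest unassigned age opens a window covering that age plus 16 years (assumed inclusive), every age inside it joins, and the next age outside it opens a new window. It runs this per household with `split()` and flags households with exactly three windows, mirroring the `t3_home` flag. `chain_windows()` and the two households are hypothetical; `episodes()` supports many more options than this model captures.

```r
# Hypothetical helper (not part of diyar): greedy chaining of fixed
# windows. The smallest unassigned value opens a window of
# [value, value + case_length]; later values inside it join that window.
chain_windows <- function(x, case_length = 16) {
  ord <- order(x)
  win <- integer(length(x))
  current <- 0L
  window_end <- -Inf
  for (i in ord) {
    if (x[i] > window_end) {
      # Outside the current window: this value becomes a new index event
      current <- current + 1L
      window_end <- x[i] + case_length
    }
    win[i] <- current
  }
  win
}

ages <- c(4, 6, 17, 43, 40, 74, 69)
alt_ages <- c(4, 6, 17, 38, 35, 74, 69)  # parents five years younger

# Two hypothetical households, analysed separately (like `strata`)
households <- data.frame(
  house_hold = rep(c("london", "hull"), each = 7),
  age = c(ages, alt_ages)
)

per_household <- lapply(split(households$age, households$house_hold),
                        chain_windows)

# Flag households whose members fall into exactly three windows
t3_home <- vapply(per_household,
                  function(w) length(unique(w)) == 3,
                  logical(1))
t3_home
# hull = TRUE, london = TRUE
```

Because each window starts at the first record outside the previous one, the parents' five-year shift in the "hull" household still yields three windows, which is why the episode-style count is stable where the fixed partitions were not.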
```{r warning=FALSE, fig.width=7}
duplicate <- rbind(homes[1:2], homes[1:2])
duplicate$house_hold <- c(rep("london", 7), rep("hull", 7))
duplicate$grp_1 <- partitions(duplicate$age, by = 16, separate = FALSE,
                              windows_total = number_line(3, 4),
                              strata = duplicate$house_hold)
duplicate$grp_2 <- episodes(duplicate$age, case_length = 16,
                            strata = duplicate$house_hold)
```

```{r warning=FALSE, fig.width=7, fig.height=8}
duplicate
schema(duplicate$grp_1, seed = 5,
       custom_label = paste0(duplicate$member, " (", duplicate$age, " yrs) in \n",
                             duplicate$house_hold))
```

```{r warning=FALSE, fig.width=7, fig.height=7}
schema(duplicate$grp_2, seed = 4, show_labels = c("length_arrow", "length"),
       custom_label = paste0(duplicate$member, " (", duplicate$age, " yrs) in \n",
                             duplicate$house_hold))
```