tabulate occurrences and exposures
Given a variable indicating when an event happened,
a time window we are interested in, and possibly a set
of covariates, tabulate counts of event occurrences and
exposures in the given time interval.
Note that you have to be careful about observations that
don't experience an event, but still count for exposure;
see the example below.
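As a toy illustration of why this matters, the sketch below computes a crude rate (events divided by exposure) with and without an observation that has no event. The numbers are made up and the variable names are hypothetical; this is not the package's code.

```r
# Made-up data: three individuals, each observed for 120 months;
# the third never experiences the event
exposure.months <- c(120, 120, 120)
events          <- c(1, 1, 0)

# Rate using everyone, including the individual with no event
rate.all <- sum(events) / sum(exposure.months)

# Rate if the individual with no event is (wrongly) dropped
keep <- events > 0
rate.dropped <- sum(events[keep]) / sum(exposure.months[keep])

# Dropping the event-free exposure shrinks the denominator,
# so the rate is biased upward
c(all = rate.all, dropped = rate.dropped)
```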
Usage
occ.exp(
data,
start.obs,
end.obs,
event,
age.groups,
time.periods,
id.var = NULL,
covars = NULL,
age.offsets = NULL,
time.offsets = NULL,
weights = NULL,
discretize = FALSE,
exp.scale = 1
)
Arguments
- data
the dataset containing events
- start.obs
vector of values (one per row of data) with the starting point of the observation window for each row, in CMC format
- end.obs
vector of values (one per row of data) with the ending point of the observation window for each row, in CMC format
- event
the column of the dataset that indicates the date of an event. Observations that contribute exposure but no events should have this set to a value that will never occur in the time period, for example -1
- age.groups
an age.groups object
- time.periods
a time.periods object
- id.var
the variable giving the unique rows of the dataset for each individual (UNDER DEVELOPMENT)
- covars
the names of the covariates over which occurrences and exposures should be aggregated; defaults to NULL, meaning totals are computed over the entire dataset
- age.offsets
if not NULL, then the age.groups are to be interpreted relative to these times (one for each row). This is usually a birth date
- time.offsets
if not NULL, then the time.periods are to be interpreted relative to these times (one for each row). useful for computing quantities like "X months before interview", where interview happened at different times for different respondents
- weights
the weight to apply to occurrences and exposures; defaults to 1
- discretize
if TRUE, convert each reported amount of exposure into a 0/1 indicator: 0 when less than half of the time interval was spent in the cell, and 1 when at least half of the time interval was spent in the cell; thus, each sib can contribute to exposure in one and only one cell
- exp.scale
amount by which to scale exposure; if, for example, dates are measured in months, but you want to measure rates in years, then this should be 1/12. It defaults to 1
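A minimal sketch of the rules described for discretize and exp.scale, applied to made-up exposure values. The helpers `discretize_exposure` and `scale_exposure` are hypothetical names introduced only for illustration; they are not part of the package, and the actual logic inside occ.exp may differ.

```r
# Hypothetical helpers illustrating the documented rules; not the package's code.

# exposure: time spent in a cell, in the same units as the dates (e.g. months)
# interval.length: total length of the time interval, in the same units
discretize_exposure <- function(exposure, interval.length) {
  # 1 when at least half of the interval was spent in the cell, otherwise 0
  as.numeric(exposure >= interval.length / 2)
}

# Rescale exposure, e.g. from months to years with exp.scale = 1/12
scale_exposure <- function(exposure, exp.scale = 1) {
  exposure * exp.scale
}

# Made-up exposures (in months) within a 12-month interval
exposure.months <- c(2, 6, 11)

discretized    <- discretize_exposure(exposure.months, interval.length = 12)
exposure.years <- scale_exposure(exposure.months, exp.scale = 1 / 12)
```

With dates measured in months, exp.scale = 1/12 expresses exposure in years, so rates computed from it are per year rather than per month.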
Details
TODO
- write unit tests
- fill an example in the documentation below
- at start of code, handle defaults more elegantly
- id.var is not implemented; might be better to directly handle multiple events (see below)
- handle multiple time periods
- handle multiple events
- handle GK weights (possibly not in this function)
- possibly refactor in the future to pass in dataset of lifelines and separate list of event dates
- what if event date is missing?
- what about things that vary with the event, eg mother's age when child was born?
- better description of dates; this was developed using CMC codes from the DHS surveys, but it should work for any interval scale
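For readers unfamiliar with CMC (century month code) dates as used in the DHS surveys: a CMC counts months from the start of 1900, with January 1900 coded as 1. The sketch below converts between calendar dates and CMC values. The helper names `to_cmc` and `from_cmc` are hypothetical and are not provided by this package.

```r
# Hypothetical helpers for DHS-style century month codes; not part of the package.

# Calendar year and month (1-12) to CMC
to_cmc <- function(year, month) {
  (year - 1900) * 12 + month
}

# CMC back to calendar year and month
from_cmc <- function(cmc) {
  year  <- 1900 + (cmc - 1) %/% 12
  month <- cmc - (year - 1900) * 12
  list(year = year, month = month)
}

# Bounds of an observation window running from January 1980 to December 1990
window.start <- to_cmc(1980, 1)
window.end   <- to_cmc(1990, 12)
```

Because every real date from 1900 onward maps to a positive CMC, a value such as -1 can safely mark observations that contribute exposure but no event.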
Examples
if (FALSE) { # \dontrun{
## THESE EXAMPLES ARE NOT UPDATED!
## Please disregard for the time being...
## RECODE so that observations w/ no births show up
## in the dataset at least once by giving them a first
## birth at the (impossible) CMC code of -1. This ensures
## that they never contribute a birth, but that they
## still count for exposure.
##
## NB: this is a key step. if we don't do this,
## women who haven't had any births are removed
## from the dataset, biasing rates upward...
bdata.coded <- bdata
bdata.coded$bdate[is.na(bdata.coded$bdate) &
                  bdata.coded$bnum == "01"] <- -1
bdata.coded <- subset(bdata.coded, !is.na(bdate))
## NO COVARIATES:
## now use compute.occ.exp to get counts of
## births and exposure between 1980 and 1990
## for ages 0 to 60
## TODO -- need to write this example using new
## version
## WITH COVARIATES:
## use compute.occ.exp to get counts of
## births and exposure by 5-year period
## between 1970 and 2005,
## for 5-year age groups [0,5), ..., [60,65)
## by the covariates
## urban, highestedlevel, and religion
## (NOTE: this is just illustrative. we wouldn't recommend
## substantively interpreting the results of this example.)
## TODO -- need to write this example using new
## version
} # }