abortion_attitudes.Rmd
This vignette demonstrates estimation of public attitudes toward abortion from responses to a single survey item, using the dynamic multi-level regression and post-stratification (MRP) model implemented in dgmrp()
.
shape()
prepares input data for use with the modeling functions dgirt()
and dgmrp()
. Here we use the included opinion
dataset.
dgirt_in_abortion <- shape(opinion, item_names = "abortion", time_name = "year",
geo_name = "state", group_names = "race3", geo_filter = c("CA", "GA", "LA",
"MA"), id_vars = "source")
#> Applying restrictions, pass 1...
#> Dropped 5 rows for missingness in covariates
#> Dropped 633 rows for lacking item responses
#> Applying restrictions, pass 2...
#> No changes
In this call to shape()
we specified:
abortion
);year
), since dgo models are dynamic;state
and race3
), because dgo models are group-level.Notice that we named only one of these variables defining respondent groups using the group_names
argument. The geo_name
argument always takes the variable giving respondents’ local geographic area; it will be modeled differently.
Using the argument geo_filter
, we subset the input data to the given values of the geo_name
variable. And with the id_vars
argument, we named an identfier that we’d like to keep in the processed data. (Other unused variables will be dropped.)
summary()
gives a high-level description of the result.
summary(dgirt_in_abortion)
#> Items:
#> [1] "abortion"
#> Respondents:
#> 23,007 in `item_data`
#> Grouping variables:
#> [1] "year" "state" "race3"
#> Time periods:
#> [1] 2006 2007 2008 2009 2010
#> Local geographic areas:
#> [1] "CA" "GA" "LA" "MA"
#> Hierarchical parameters:
#> [1] "GA" "LA" "MA" "race3other" "race3white"
#> Modifiers of hierarchical parameters:
#> NULL
#> Constants:
#> Q T P N G H D
#> 1 5 5 60 12 1 1
get_n()
and get_item_n()
give response counts.
dgmrp()
fits a dynamic multi-level regression and post-stratification (MRP) model to data processed by shape()
. Here, we’ll use it to estimate public attitudes toward abortion over time, for the groups defined by state
and race3
. (Specifically, by their Cartesian product.)
Under the hood, dgmrp()
uses RStan for MCMC sampling, and arguments can be passed to RStan’s stan()
via the ...
argument of dgmrp()
. This is almost always desirable. Here, we specify the number of sampler iterations, chains, and cores.
The model results are held in a dgmrp_fit
object. Methods from RStan like extract()
are available if needed because dgmrp_fit
is a subclass of stanfit
. But dgo provides its own methods for typical post-estimation tasks.
For a high-level summary of the result, use summary()
.
summary(dgmrp_out_abortion)
#> dgirt samples from 4 chains of 1500 iterations, 750 warmup, thinned every 1
#> Drawn Mon Jul 16 21:11:34 2018
#> Package version 0.3.0
#> Model version 2017_01_04_singleissue
#> 117 parameters; 60 theta_bars (year state race3)
#> 5 periods 2006 to 2010
#>
#> n_eff
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 3.83 334.38 797.44 1197.63 2004.03 3000.00
#>
#> Rhat
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.9993 1.0013 1.0033 1.0107 1.0136 1.3423
#>
#> Elapsed time
#> chain warmup sample total
#> 1: 1 9S 9S 18S
#> 2: 2 11S 7S 18S
#> 3: 3 9S 9S 18S
#> 4: 4 12S 8S 20S
To apply scalar functions to posterior samples, use summarize()
. The default output gives summary statistics for the model’s theta_bar
parameters, which represent group means. These are indexed by time (year
) and group, where groups are again defined by local geographic area (state
) and any other respondent characteristics (race3
).
head(summarize(dgmrp_out_abortion))
#> param state race3 year mean sd median q_025
#> 1: theta_bar CA black 2006 0.7750365 0.02147546 0.7754757 0.7313382
#> 2: theta_bar CA black 2007 0.7973074 0.02879764 0.7965672 0.7430051
#> 3: theta_bar CA black 2008 0.7241110 0.02386927 0.7240320 0.6772661
#> 4: theta_bar CA black 2009 0.6866352 0.02207097 0.6874647 0.6395651
#> 5: theta_bar CA black 2010 0.7404908 0.01661350 0.7411805 0.7066591
#> 6: theta_bar CA other 2006 0.7346545 0.02353331 0.7347746 0.6880095
#> q_975
#> 1: 0.8150533
#> 2: 0.8537069
#> 3: 0.7701108
#> 4: 0.7282560
#> 5: 0.7703233
#> 6: 0.7796612
Alternatively, summarize()
can apply arbitrary functions to posterior samples for whatever parameter is given by its pars
argument.
summarize(dgmrp_out_abortion, pars = "xi", funs = "var")
#> param year var
#> 1: xi 2006 0.01793388
#> 2: xi 2007 0.05022481
#> 3: xi 2008 0.05459178
#> 4: xi 2009 0.04633517
#> 5: xi 2010 0.04034133
To access posterior samples in tabular form use as.data.frame()
. By default, this method returns post-warmup samples for the theta_bar
parameters, but like other methods takes a pars
argument.
head(as.data.frame(dgmrp_out_abortion))
#> param state race3 year iteration value
#> 1: theta_bar CA black 2006 1 0.7866555
#> 2: theta_bar CA black 2006 2 0.7356818
#> 3: theta_bar CA black 2006 3 0.7688791
#> 4: theta_bar CA black 2006 4 0.8016083
#> 5: theta_bar CA black 2006 5 0.7893962
#> 6: theta_bar CA black 2006 6 0.7904047
To poststratify the results use poststratify()
. Here, we use the group population proportions bundled as annual_state_race_targets
to reweight and aggregate estimates to strata defined by state-years.
poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
c("state", "year"), aggregated_names = "race3")
#> state year iteration value
#> 1: CA 2006 1 0.7242562
#> 2: CA 2006 2 0.7055358
#> 3: CA 2006 3 0.7156720
#> 4: CA 2006 4 0.7197653
#> 5: CA 2006 5 0.7169266
#> ---
#> 59996: MA 2010 2996 0.6823029
#> 59997: MA 2010 2997 0.7185541
#> 59998: MA 2010 2998 0.6714349
#> 59999: MA 2010 2999 0.7143973
#> 60000: MA 2010 3000 0.6911097
To plot the results use dgirt_plot()
. This method plots summaries of posterior samples by time period. By default, it shows a 95% credible interval around posterior medians for the theta_bar
parameters, for each local geographic area. Here we omit the CIs.
dgirt_plot()
can also plot the data.frame
output from poststratify()
, given arguments that identify the relevant variables. Below, we aggregate over the demographic grouping variable race3
, resulting in a data.frame
of estimates by state-year.
ps <- poststratify(dgmrp_out_abortion, annual_state_race_targets, strata_names =
c("state", "year"), aggregated_names = "race3")
head(ps)
#> state year iteration value
#> 1: CA 2006 1 0.7242562
#> 2: CA 2006 2 0.7055358
#> 3: CA 2006 3 0.7156720
#> 4: CA 2006 4 0.7197653
#> 5: CA 2006 5 0.7169266
#> 6: CA 2006 6 0.7088478
dgirt_plot(ps, group_names = NULL, time_name = "year", geo_name = "state")
In the call to dgirt_plot()
, we passed the names of the state
and year
variables. The group_names
argument was then NULL
, because there were no grouping variables left after we aggregated over race3
.