shape.Rd
This function shapes data for use in a `dgirt()` or `dgmrp()` model. Most
arguments give the name or names of key variables in the data. These
arguments end in `_name` or `_names` and should be character vectors.
shape(item_data = NULL, item_names = NULL, time_name, geo_name, group_names = NULL, id_vars = NULL, time_filter = NULL, geo_filter = NULL, min_t_filter = 1L, min_survey_filter = 1L, survey_name = NULL, modifier_data = NULL, modifier_names = NULL, t1_modifier_names = NULL, standardize = TRUE, target_data = NULL, raking = NULL, max_raked_weight = NULL, weight_name = NULL, proportion_name = "proportion", aggregate_data = NULL, aggregate_item_names = NULL, constant_item = TRUE, ...)
Argument | Description |
---|---|
item_data | A table in which items appear in columns and each row represents an individual's responses in some time period and local geographic area. |
item_names | Item response variables. |
time_name | A time variable with numeric values. |
geo_name | A geographic variable representing local areas. |
group_names | Discrete grouping variables, usually demographic. Using numeric variables is allowed but not recommended. |
id_vars | Additional variables that should be included in the result, other than those specified elsewhere. |
time_filter | A numeric vector giving possible values of the time variable. Observed and unobserved time periods can be given. Defaults to observed values. |
geo_filter | A character vector giving values of the geographic variable. Defaults to observed values. |
min_t_filter | The minimum number of time periods in which an item must appear to be retained (an integer). |
min_survey_filter | The minimum number of surveys in which an item must appear to be retained (an integer). |
survey_name | A variable giving survey identifiers. |
modifier_data | Table giving characteristics of local geographic areas in time periods. See details below. |
modifier_names | Variables giving modifiers of geographic hierarchical parameters in `modifier_data`. |
t1_modifier_names | Variables to be used instead of those in `modifier_names` in the first time period. |
standardize | Whether to standardize the variables given by `modifier_names` and `t1_modifier_names`. |
target_data | A table giving population proportions for groups by local geographic area and time period. See details below. |
raking | A formula or list of formulas specifying the variables on which to rake survey weights. |
max_raked_weight | A maximum over which raked weights will be trimmed. Only applied after raking. To trim unraked weights, manipulate the input data directly. |
weight_name | A variable giving survey weights. |
proportion_name | The variable giving population proportions for strata in `target_data`. |
aggregate_data | A table of trial and success counts by group and item. See details below. |
aggregate_item_names | A subset of values of the `item` variable in `aggregate_data`. |
constant_item | Whether item difficulty parameters should be constant over time. |
... | Further arguments. |
An object of class `dgirtIn` expected by `dgirt()` and `dgmrp()`.
Individual-level item response data are expected in argument `item_data`.
Required arguments `time_name` and `geo_name` give the names of variables in
`item_data` that indicate time period and local geographic area. Optional
argument `group_names` gives other respondent characteristics to be modeled.
`item_data` is optional if argument `aggregate_data` is used. Note that the
`dgirt()` model assumes, for identification, that the polarity of item
responses is coded consistently across items.
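As an illustration of the expected layout, the sketch below builds a small, hypothetical `item_data` table (all column names and values are made up) and reverse-codes one item so that higher values mean the same thing for both items. It assumes both items use a 1 to 4 scale; it only prepares input and does not call `shape()`.

```r
# Hypothetical item_data: one row per respondent in a time period and local
# area, one column per item. All names and values are made up.
item_data <- data.frame(
  year   = c(2006, 2006, 2007, 2007),
  state  = c("AK", "AL", "AK", "AL"),
  race3  = c("white", "black", "other", "white"),
  item_a = c(1L, 4L, 2L, NA),  # 1-4 scale, higher = more conservative
  item_b = c(4L, 1L, 3L, 2L)   # 1-4 scale, higher = more liberal
)

# Reverse-code item_b on its 1-4 scale so that both items share one polarity.
item_data$item_b <- 5L - item_data$item_b

# The *_name(s) arguments are character vectors naming these columns.
name_args <- list(item_names  = c("item_a", "item_b"),
                  time_name   = "year",
                  geo_name    = "state",
                  group_names = "race3")
stopifnot(all(unlist(name_args) %in% names(item_data)))
```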
Data for modeling geographic hierarchical parameters can be given with
argument `modifier_data`, in which case argument `modifier_names` is required
and arguments `t1_modifier_names` and `standardize` are optional.
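A hypothetical `modifier_data` table might hold one row per local area and time period with area-level covariates as columns. The sketch below assumes that standardizing means centering each modifier to mean 0 and scaling it to standard deviation 1 over all rows, as base R's `scale()` does; the exact transformation applied by `shape()` may differ. The column names `income` and `urban` are made up.

```r
# Hypothetical modifier_data: local area x time period rows, covariate columns.
modifier_data <- data.frame(
  state  = c("AK", "AL", "AK", "AL"),
  year   = c(2006, 2006, 2007, 2007),
  income = c(60, 40, 62, 41),
  urban  = c(0.66, 0.59, 0.66, 0.60)
)
modifier_names <- c("income", "urban")

# Assumed standardization: z-scores computed over all rows of each modifier.
standardized <- modifier_data
standardized[modifier_names] <- lapply(modifier_data[modifier_names],
                                       function(x) as.vector(scale(x)))
```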
`shape()` aggregates the individual-level item response data given as
`item_data` for modeling. Data already aggregated to the group level can be
provided with argument `aggregate_data`.
The data given by `aggregate_data` must be in a long table of trial and
success counts indexed by item, group, and time period. The variable names
given by arguments `group_names`, `geo_name`, and `time_name` should exist in
`aggregate_data`. Three fixed variable names must also appear in
`aggregate_data`: `item`, giving item identifiers; `n_grp`, giving counts of
item-response trials; and `s_grp`, giving counts of item-response successes.
These counts should be adjusted in the same way that `shape()` transforms the
individual-level `item_data` during aggregation.
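To show the expected long layout, this sketch builds an `aggregate_data`-style table from hypothetical binary responses with base R's `aggregate()`: one row per item, time period, local area, and group, with `n_grp` counting trials and `s_grp` counting successes. The counts are plain unweighted tallies; they do not reproduce the adjustments `shape()` applies, so treat them as a shape-of-the-data illustration only.

```r
# Hypothetical individual-level binary responses (1 = success, 0 = failure).
responses <- data.frame(
  year  = c(2006, 2006, 2006, 2006, 2007, 2007),
  state = c("AK", "AK", "AL", "AL", "AK", "AK"),
  race3 = c("white", "white", "black", "black", "white", "white"),
  item  = "item_a",
  y     = c(1, 0, 1, 1, 0, NA)
)
responses <- responses[!is.na(responses$y), ]  # missing responses are not trials

# Unweighted trial (n_grp) and success (s_grp) counts per item and cell.
aggregate_data <- aggregate(y ~ item + year + state + race3, data = responses,
                            FUN = function(v) c(n_grp = length(v), s_grp = sum(v)))
aggregate_data <- do.call(data.frame, aggregate_data)          # flatten matrix column
names(aggregate_data) <- sub("^y\\.", "", names(aggregate_data))
```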
Use argument `target_data` to adjust the weighting of groups toward
population targets via raking, using an adaptation of `rake()`. To adjust
existing survey weights in `item_data`, provide argument `weight_name`.
Otherwise, observations in `item_data` will be assigned equal starting
weights.
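Raking is iterative proportional fitting: weights are rescaled so that weighted group shares match the population targets on one margin, then the next, until they stop changing. The generic sketch below is not the adaptation `shape()` uses; the helper `rake_sketch()`, its arguments, and the data are hypothetical. Its default weights of one mirror the equal starting weights described above.

```r
# Generic raking (iterative proportional fitting) sketch; hypothetical helper.
rake_sketch <- function(data, margins, weights = rep(1, nrow(data)),
                        max_iter = 100, tol = 1e-10) {
  for (iter in seq_len(max_iter)) {
    old <- weights
    for (var in names(margins)) {
      groups  <- as.character(data[[var]])
      totals  <- tapply(weights, groups, sum)            # weighted total per level
      current <- setNames(as.vector(totals), names(totals)) / sum(weights)
      ratio   <- margins[[var]][names(current)] / current
      weights <- weights * unname(ratio[groups])         # rescale each row's weight
    }
    if (max(abs(weights - old)) < tol) break             # weights have stabilized
  }
  weights
}

sample_data <- data.frame(
  sex  = c("f", "f", "m", "m", "m"),
  race = c("white", "other", "white", "white", "other")
)
targets <- list(sex  = c(f = 0.5, m = 0.5),             # hypothetical population shares
                race = c(white = 0.7, other = 0.3))
w <- rake_sketch(sample_data, targets)                   # equal starting weights
```

Passing existing survey weights through `weights` would correspond to the `weight_name` case. Because each step multiplies by target shares that sum to one, the total weight stays at its starting value.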
Argument `raking` defines strata. If you pass it a list of formulas like
`list(~ x, ~ y)`, raking is first over `x`, then over `y`. Given an additive
formula like `~ x + y`, raking is over the combinations of `x` and `y`. So,
`list(~ x, ~ y + z)` is first over `x`, then over `y`-`z` pairs. Argument
`proportion_name` is optional.
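The stratum semantics can be mimicked in base R: each formula's variables are crossed with `interaction()`, and the list order gives the raking order. This is a minimal sketch with hypothetical data and variables `x`, `y`, and `z`; it shows the strata only, not how `shape()` parses the formulas internally.

```r
# Hypothetical data with the variables used in the raking formulas.
df <- data.frame(x = c("a", "b", "a"),
                 y = c("u", "u", "v"),
                 z = c("p", "q", "p"))
raking <- list(~ x, ~ y + z)   # rake over x first, then over y-z pairs

# One stratum factor per formula, in raking order.
strata <- lapply(raking, function(f) {
  vars <- all.vars(f)                          # c("y", "z") for ~ y + z
  interaction(df[vars], sep = "-", drop = TRUE)
})
```

Each factor in `strata` would then serve as one margin in a raking loop, applied in list order.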
For convenience, data in `item_data`, `modifier_data`, `aggregate_data`, and
`target_data` can be restricted (subsetted) row-wise to the time periods given
by argument `time_filter` and the local geographic areas given by argument
`geo_filter`.
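Row-wise restriction amounts to keeping rows whose time and area values appear in the filters. A minimal base-R sketch with a hypothetical table and filter values; as the argument table notes, `time_filter` may include periods that are not observed in the data.

```r
# Hypothetical table and filters.
item_data <- data.frame(year   = c(2006, 2007, 2008, 2008),
                        state  = c("AK", "AL", "AK", "CA"),
                        item_a = c(1, 0, 1, 1))
time_filter <- c(2007, 2008, 2009)   # 2009 is unobserved in item_data
geo_filter  <- c("AK", "AL")

# Keep rows in the requested time periods and local areas.
keep     <- item_data$year %in% time_filter & item_data$state %in% geo_filter
filtered <- item_data[keep, ]
```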
Data can also be filtered column-wise to retain item variables that appear in
a minimum number of time periods, using argument `min_t_filter`, or a minimum
number of surveys, using argument `min_survey_filter`. Argument `survey_name`
is required when filtering by survey.
If both row-wise and column-wise restrictions are specified, `shape()`
iterates over them until they leave the data unchanged.
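The iteration can be pictured as a fixed-point loop: apply the row-wise filters, drop items observed in too few time periods, drop respondents left with no responses, and repeat until a pass changes nothing. The helper `filter_until_stable()` and the data below are hypothetical; the sketch covers only the time-period criterion (not `min_survey_filter`) and is not `shape()`'s implementation.

```r
# Hypothetical fixed-point filter: repeat row- and column-wise restrictions
# until a full pass removes nothing.
filter_until_stable <- function(d, items, time_name, geo_name,
                                time_filter, geo_filter, min_t) {
  repeat {
    n_before     <- nrow(d)
    items_before <- items
    # Row-wise: keep only the requested time periods and local areas.
    in_scope <- d[[time_name]] %in% time_filter & d[[geo_name]] %in% geo_filter
    d <- d[in_scope, , drop = FALSE]
    # Column-wise: keep items with non-missing responses in >= min_t periods.
    n_periods <- vapply(items, function(i) {
      length(unique(d[[time_name]][!is.na(d[[i]])]))
    }, integer(1))
    keep_items <- items[n_periods >= min_t]
    d <- d[, setdiff(names(d), setdiff(items, keep_items)), drop = FALSE]
    items <- keep_items
    # Drop respondents left with no responses to any retained item.
    d <- d[rowSums(!is.na(d[, items, drop = FALSE])) > 0, , drop = FALSE]
    # Filters only remove rows or items, so unchanged sizes mean no change.
    if (nrow(d) == n_before && identical(items, items_before)) break
  }
  list(data = d, items = items)
}

survey <- data.frame(
  year   = c(2006, 2006, 2007, 2008, 2008),
  state  = c("AK", "AL", "AK", "AL", "CA"),
  item_a = c(1, 0, 1, NA, 1),
  item_b = c(NA, NA, NA, 1, 0)
)
out <- filter_until_stable(survey, c("item_a", "item_b"), "year", "state",
                           time_filter = 2006:2008, geo_filter = c("AK", "AL"),
                           min_t = 2)
```

Here `item_b` is dropped because, once the `CA` row is excluded, it is observed in only one period; the respondent whose only response was to `item_b` is then dropped as well.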
# model individual item responses
shaped_responses <- shape(opinion, item_names = "abortion", time_name = "year",
                          geo_name = "state", group_names = "race3")
#> Items:
#> [1] "abortion"
#> Respondents:
#>    144,250 in `item_data`
#> Grouping variables:
#> [1] "year"  "state" "race3"
#> Time periods:
#> [1] 2006 2007 2008 2009 2010
#> Local geographic areas:
#>  [1] "AK" "AL" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA" "ID" "IL"
#> [16] "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC" "ND" "NE"
#> [31] "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VA" "VT" "WA" "WI" "WV" "WY"
#> Hierarchical parameters:
#>  [1] "AL"         "AR"         "AZ"         "CA"         "CO"
#>  [6] "CT"         "DC"         "DE"         "FL"         "GA"
#> [11] "HI"         "IA"         "ID"         "IL"         "IN"
#> [16] "KS"         "KY"         "LA"         "MA"         "MD"
#> [21] "ME"         "MI"         "MN"         "MO"         "MS"
#> [26] "MT"         "NC"         "ND"         "NE"         "NH"
#> [31] "NJ"         "NM"         "NV"         "NY"         "OH"
#> [36] "OK"         "OR"         "PA"         "RI"         "SC"
#> [41] "SD"         "TN"         "TX"         "UT"         "VA"
#> [46] "VT"         "WA"         "WI"         "WV"         "WY"
#> [51] "race3other" "race3white"
#> Modifiers of hierarchical parameters:
#> NULL
#> Constants:
#>  Q T  P   N   G H D
#>  1 5 52 765 153 1 1

# check sparseness of data to be modeled
get_item_n(shaped_responses, by = "year")
#>    year abortion
#> 1: 2006    33514
#> 2: 2007     9258
#> 3: 2008    32634
#> 4: 2009    13718
#> 5: 2010    55126