diff --git a/README.md b/README.md index 553e549..7b1472e 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,17 @@ # Dominance Analysis ## A Stata Implementaion -Dominance analysis (DA) determines the relative importance of independent variables in an estimation model based on contribution to an overall model fit statistic (see Grömping, 2007 for a discussion). DA is an ensemble method in which importance determinations about independent variables are made by aggregating fit metrics across multiple models, though the method usually requires the ensemble contain each possible combination of the independent variables in the full model. +Dominance analysis (DA) determines the relative importance of independent variables in an estimation model based on contribution to +an overall model fit statistic (see Grömping, 2007 for a discussion). DA is an ensemble method in which importance determinations +about independent variables are made by aggregating fit metrics across multiple models, though the method usually requires the +ensemble contain each possible combination of the independent variables in the full model. -The all possible combinations ensemble with _p_ independent variables in the full model results in $2^{p-1}$ models and fit statistics estimated. That is, each combination of _p_ variables alternating between included versus excluded (i.e., the 2 base to the exponent) where the constant(s)-only model is omitted (i.e., the -1 representing the distinct combination where no independent variables are included; see Budescu, 1993). +The all possible combinations ensemble with _p_ independent variables in the full model results in $2^{p}$ models and fit statistics +estimated. That is, each combination of _p_ variables alternating between included versus excluded (see Budescu, 1993). -`domin` is implemented as a flexible wrapper command that can be used with most Stata estimation commands that follow the standard `depvar indepvars` format and return a scalar-valued fit statistic; commands that do not either follow this format or do not return a scalar-valued fit statistic can be accommodated with a sub-wrapper command (an example of such a command is included below). +`domin` is implemented as a flexible wrapper command that can be used with most Stata estimation commands that follow the +standard `depvar indepvars` format and return a scalar-valued fit statistic; commands that do not either follow this format +or do not return a scalar-valued fit statistic can be accommodated with a sub-wrapper command (an example of such a command is included below). Some examples of the command as applied to Stata estimation commands are shown below after the discussion of installation. @@ -16,9 +22,9 @@ To install `domin` type: `ssc install domin` -In the Stata Command console window. `domin` is supported from Stata version `12.1`. +In the Stata Command console window. `domin`, `domme`, and all wrapper programs are supported from Stata version `15`. -Note that `domin` requires the SSC package `moremata` and will ask to install this package if it is not available. +Note that the `domme` sub-modle requires the SSC package `moremata` and will ask to install this package if it is not available. ## Updating @@ -28,887 +34,8 @@ To update `domin` once installed type: In the Stata Command console window. -# Extensive Introductory Example +# Extensive Examples -The example below focuses on a simpler predictive model but discusses conceptual elements of DA more extensively. +Please see `domin`'s wiki page for examples of the method in use. -## Simple Linear Regression-based DA - -The default analysis for `domin` is `regress` with `fitstat(e(r2))` and these options do not need to be typed (though if they are not, `domin` will throw a warning; see below). The results of the analysis are shown as below including general, conditional, and complete dominance results as well as the strongest dominance designations. - -### General Dominance Statistics - -``` -. webuse auto -(1978 Automobile Data) - -. domin price mpg rep78 headroom -Regression type not entered in reg(). -reg(regress) assumed. - -Fitstat type not entered in fitstat(). -fitstat(e(r2)) assumed. - - -Total of 7 sub-models - -General dominance statistics: Linear regression -Number of obs = 69 -Overall Fit Statistic = 0.2575 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.2262 0.8787 1 - rep78 | 0.0218 0.0847 2 - headroom | 0.0094 0.0366 3 -------------------------------------------------------------------------------------- - --more- -``` - -General dominance statistics are the most commonly reported and easiest to interpret of the dominance statistics. General dominance statistics are derived as the weighted average marginal/incremental contribution to the overall fit statistic an independent variable makes across all models in which the independent variable is included. For example, _rep78_ has a value of **0.0218** which means, on average, _rep78_ results in an increment to the R square of about two percentage points when it is included in the model. - -In addition, the general dominance statistics can be compared to one another to determine which variables generally dominate others. Here, _rep78_'s larger general dominance statistic than _headroom_ indicates that it "generally dominates"--and is thus more important than--_headroom_. The only time a general dominance designation cannot be made between two independent variables is when they are exactly equal. - -General dominance statistics are useful in that they distill the entire ensemble of models estimated into a single value for each independent variable, which is why they are easiest to interpret. An additional useful property of the general dominance statistics is that they are an additive decomposition of the fit statistic associated with the full model (i.e., the general dominance statistics can be summed to obtain the value of the full model's fit statistic) and equivalent to Shapley values (see `findit shapley`). - -Finally, general dominance statistics for an independent variable are the arithmetic average of that independent variable's conditional dominance statistics discussed next. - -### Conditional Dominance Statistics - -``` -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: - 1 2 3 - mpg 0.2079 0.2262 0.2445 - rep78 0.0000 0.0218 0.0436 -headroom 0.0124 0.0094 0.0065 -------------------------------------------------------------------------------------- - --more- -``` -Conditional dominance statistics are computed as the average incremental contributions to the overall model fit statistic an independent variable makes within a single "order" of models--where "order" refers to a distinct number of independent variables in the estimation model. In the example above, order one is `#indepvars:1` and refers to the incremental contribution the independent variable makes by itself. Order two is `#indepvars:2` and refers to the average incremental contribution the independent variable makes beyond a single other independent variable. Finally, order three is `#indepvars:3` and refers to the incremental contribution the independent variable makes beyond the other two independent variables. - -Each independent variable will have _p_ different conditional dominance statistics. - -Conditional dominance statistics, considered as a set, provide more information about each independent variable than general dominance statistics. -Specifically, conditional dominance statistics show the effect of variable redundancy, collinearity, and suppression effects as the user can see how the inclusion of any independent variable is, on average, affected by the inclusion of other independent variables. In the above conditional dominance matrix, observe the difference between the patterns of results for _rep78_ and _headroom_. _rep78_ shows suppression-like effects in that it grows in predictive usefulness with more independent variables included in the model. By contrast, _headroom_ shows the opposite pattern shrinking in importance with more included independent variables. - -As with general dominance statistics, conditional dominance statistics can be compared to determine which conditionally dominates others. Consider _mpg_. It has larger conditional dominance statistics than independent variable _rep78_ across all three orders and thus conditionally dominates _rep78_. Conditional dominance is then determined for an independent variable over another when its conditional dominance statistics at each order are larger than another the other independent variable's statistics across all _p_ orders. If, at any order, the conditional dominance statistics for two independent variables are equal or there is a change rank order no conditional dominance designation can be made between those independent variables. -Because it involves comparing more statistics to one another, conditional dominance is a more stringent importance criterion than general dominance. - -Conditional dominance implies general dominance as well, but the reverse is not true. An independent variable can generally dominate another, but not conditionally dominate it. For instance, _rep78_ generally dominates, but does not conditionally dominate, _headroom_. - -### Complete Dominance Designations -``` -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: - mpg rep78 headroom - dominates?:mpg 0 1 1 - dominates?:rep78 -1 0 0 -dominates?:headroom -1 0 0 -------------------------------------------------------------------------------------- - --more- -``` -Complete dominance designations are made by comparing all possible incremental contributions to model fit for two independent variables. Complete dominance is the strongest evidence of importance across dominance designations as it is the most stringent comparison, requiring that an independent variable **always** have a larger increment to model fit across all, 2^(p-2)^ comparable individual models in the ensemble. This metric is stringent as it is non-compensatory; it is not possible for some better incremental contributions to compensate for other, worse ones. - -By comparison to general and conditional dominance designations, the complete dominance designation has no natural statistic on which to report.That said, `domin` returns a complete dominance matrix which reads from the left to right. Thus, a value of **1** means that the independent variable in the row completely dominates the independent variable in the column. Conversely, a value of **-1** means the opposite, that the independent variable in the row is completely dominated by the independent variable in the column. A **0** value means no complete dominance designation could be made as the comparison independent variables' incremental contributions differ in relative magnitude from model to model. - -the example above shows that, _mpg_ has a larger incremental contribution to model fit than _headroom_ across all comparable models and completely dominates it. By contrast, _headroom_ and _rep78_ show a more complex relationship (as was seen in their conditional dominance statistics) and no conditional dominance designation can be made.Complete dominance implies both general and conditional dominance, but, again, the reverse is not true. Failure to obtain conditional dominance also implies failure to obtain complete dominance. - -### Strongest Dominance Designtions - -``` -Strongest dominance designations - -mpg completely dominates rep78 -mpg completely dominates headroom -rep78 generally dominates headroom -``` -Finally, if all three dominance statistics are reported (i.e., `noconditional` and `nocomplete` options are not used), a _strongest dominance designations_ list is reported. The strongest dominance designations list reports the strongest dominance designation between all pairwise, independent variable comparisons. - -# Brief Introductory Examples - -The examples below focuses expand to discuss more complex predictive models and Stata code but somewhat less regarding conceptual implications. - -## Ordered Logistic Regression-based DA - -A model like `ologit` is easy to accommodate in `domin` like below. - -``` -. domin rep78 trunk weight length, reg(ologit) fitstat(e(r2_p)) all(turn) - -Total of 7 sub-models - -General dominance statistics: Ordered logistic regression -Number of obs = 69 -Overall Fit Statistic = 0.1209 -All Subsets Fit Stat. = 0.1003 - - | Dominance Standardized Ranking - rep78 | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - trunk | 0.0082 0.0680 2 - weight | 0.0021 0.0174 3 - length | 0.0102 0.0845 1 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: - 1 2 3 - trunk 0.0131 0.0077 0.0039 -weight 0.0027 0.0016 0.0020 -length 0.0139 0.0097 0.0070 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: - trunk weight length - dominates?:trunk 0 1 -1 -dominates?:weight -1 0 -1 -dominates?:length 1 1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -length completely dominates trunk -trunk completely dominates weight -length completely dominates weight - -Variables included in all subsets: turn -``` - -As compared to the `regress`-based DA reported in the first example, this example includes a covariate that is controlled for across all model subsets. - -The `All Subsets Fit Stat. = 0.1003` result represents the amount of the McFadden pseudo-R-square that is associated with the variable in all subsets (i.e., in `all()`). - -Note that the dominance statistics reported are now residual-ized and reflect the removal of the all fitstats fit statistic. Variables included in all subsets are also reported at the end of the results display (e.g., `Variables included in all subsets: turn`). - -## Logistic Regression-based DA - -`logit` is another command that can be accommodated in `domin`. In this example, _rep78_ is used as a factor variable. - -``` -. domin foreign trunk weight, reg(logit) fitstat(e(r2_p)) sets((i.rep78)) - -Total of 7 sub-models - -General dominance statistics: Logistic regression -Number of obs = 59 -Overall Fit Statistic = 0.5229 - - | Dominance Standardized Ranking - foreign | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - trunk | 0.0834 0.1595 3 - weight | 0.2611 0.4994 1 - set1 | 0.1783 0.3411 2 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: - 1 2 3 - trunk 0.2002 0.0495 0.0005 -weight 0.4154 0.2272 0.1407 - set1 0.2855 0.1444 0.1051 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: - trunk weight set1 - dominates?:trunk 0 -1 -1 -dominates?:weight 1 0 1 - dominates?:set1 1 -1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -weight completely dominates trunk -set1 completely dominates trunk -weight completely dominates set1 - -Variables in set1: i.rep78 -``` - -The `sets()` option incorporates independent variables as inseparable sets that are considered *an* independent variable in the DA. The label for each set is denoted sequentially as they are included (e.g., _set1_). In addition, variables in sets are included near the end of the output (i.e., `Variables in set1: i.rep78`). - -# Intermediate Examples - -These examples outline more complex applications of DA to Stata models. - -## Linear Regression-based DA with Non-additive and Non-linear Effects - -The example below outlines a process in which two quadratic effects and a product are residual-ized and used as independent variables in a linear regression-based DA to determine their importance. - -The process involves residual-izing the squared and product terms and using them, as is, in the DA. - -``` -. generate mpg2 = mpg^2 - -. generate headr2 = headroom^2 - -. generate mpg_headr = mpg*headroom - -. regress mpg2 mpg - - Source | SS df MS Number of obs = 74 --------------+---------------------------------- F(1, 72) = 2425.89 - Model | 5640570.83 1 5640570.83 Prob > F = 0.0000 - Residual | 167411.003 72 2325.15282 R-squared = 0.9712 --------------+---------------------------------- Adj R-squared = 0.9708 - Total | 5807981.84 73 79561.395 Root MSE = 48.22 - ------------------------------------------------------------------------------- - mpg2 | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- - mpg | 48.04619 .9754908 49.25 0.000 46.10159 49.9908 - _cons | -536.6594 21.51824 -24.94 0.000 -579.5552 -493.7636 ------------------------------------------------------------------------------- - -. predict mpg2r, resid - -. regress headr2 headroom - - Source | SS df MS Number of obs = 74 --------------+---------------------------------- F(1, 72) = 3197.55 - Model | 1947.18924 1 1947.18924 Prob > F = 0.0000 - Residual | 43.8453928 72 .608963789 R-squared = 0.9780 --------------+---------------------------------- Adj R-squared = 0.9777 - Total | 1991.03463 73 27.274447 Root MSE = .78036 - ------------------------------------------------------------------------------- - headr2 | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- - headroom | 6.10485 .1079609 56.55 0.000 5.889633 6.320066 - _cons | -8.607759 .3356446 -25.65 0.000 -9.276855 -7.938664 ------------------------------------------------------------------------------- - -. predict headr2r, resid - -. regress mpg_headr mpg headroom - - Source | SS df MS Number of obs = 74 --------------+---------------------------------- F(2, 71) = 1102.79 - Model | 24303.2769 2 12151.6384 Prob > F = 0.0000 - Residual | 782.348117 71 11.0189876 R-squared = 0.9688 --------------+---------------------------------- Adj R-squared = 0.9679 - Total | 25085.625 73 343.638699 Root MSE = 3.3195 - ------------------------------------------------------------------------------- - mpg_headr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- - mpg | 2.837136 .0737653 38.46 0.000 2.690052 2.98422 - headroom | 20.40514 .5044586 40.45 0.000 19.39927 21.411 - _cons | -59.75086 2.619192 -22.81 0.000 -64.97338 -54.52834 ------------------------------------------------------------------------------- - -. predict mpg_headrr, resid - -. domin price mpg headroom mpg2r headr2r mpg_headrr -Regression type not entered in reg(). -reg(regress) assumed. - -Fitstat type not entered in fitstat(). -fitstat(e(r2)) assumed. - - -Total of 31 sub-models - -Progress in running all sub-models -0%------50%------100% -.................... -General dominance statistics: Linear regression -Number of obs = 74 -Overall Fit Statistic = 0.3948 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.2274 0.5761 1 - headroom | 0.0142 0.0361 4 - mpg2r | 0.1044 0.2644 2 - headr2r | 0.0388 0.0984 3 - mpg_headrr | 0.0099 0.0251 5 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 5 - mpg 0.2196 0.2191 0.2236 0.2323 0.2425 - headroom 0.0131 0.0091 0.0101 0.0156 0.0233 - mpg2r 0.1203 0.1104 0.1029 0.0973 0.0910 - headr2r 0.0429 0.0435 0.0419 0.0376 0.0282 -mpg_headrr 0.0072 0.0109 0.0124 0.0118 0.0072 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: dominated?: - mpg headroom mpg2r headr2r mpg_headrr - dominates?:mpg 0 1 1 1 1 - dominates?:headroom -1 0 -1 0 0 - dominates?:mpg2r -1 1 0 1 1 - dominates?:headr2r -1 0 -1 0 1 -dominates?:mpg_headrr -1 0 -1 -1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -mpg completely dominates headroom -mpg2r completely dominates headroom -mpg completely dominates mpg2r -mpg completely dominates headr2r -mpg2r completely dominates headr2r -mpg completely dominates mpg_headrr -mpg2r completely dominates mpg_headrr -headr2r completely dominates mpg_headrr -headr2r conditionally dominates headroom -headroom generally dominates mpg_headrr -``` - -In this case, the squared term for _mpg_ (i.e., _mpg2r_) was actually more important than most other terms in the model. This approach is based on the article by LeBreton, Tonidandel, and Krasikova (2013). - -As an alternative experimental approach, quadratic and interaction terms are grouped as a set with their main effects term to determine IV-based importance with the use of _sets_ as below. - -``` -. domin price, sets((mpg c.mpg#c.mpg c.mpg#c.headroom) (headroom c.headroom#c.headroom c.mpg#c.headroom)) -Regression type not entered in reg(). -reg(regress) assumed. - -Fitstat type not entered in fitstat(). -fitstat(e(r2)) assumed. - - -Total of 3 sub-models - - -General dominance statistics: Linear regression -Number of obs = 74 -Overall Fit Statistic = 0.3948 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - set1 | 0.2315 0.5864 1 - set2 | 0.1633 0.4136 2 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: - 1 2 -set1 0.3664 0.0966 -set2 0.2982 0.0284 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: - set1 set2 -dominates?:set1 0 1 -dominates?:set2 -1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -set1 completely dominates set2 - -Variables in set1: mpg c.mpg#c.mpg c.mpg#c.headroom -Variables in set2: headroom c.headroom#c.headroom c.mpg#c.headroom -``` - -As can be seen, the pattern of importance is starkly different from that of the individual models as some of _mpg_'s effect in the prior model is carrying over and being ascribed to _headroom_ given the interaction term. - -The sets approach partly simplifies the determination of importance as there are fewer terms to separate out, but also complicates it as it is harder for the model to effectively distinguish between the contributions of the interacting terms. Again, this approach is experimental and empirical research on it as an approach has yet to be conducted. - -## Linear Regression-based Relative Weights Analysis with Bootstrapped Standard Errors - -`domin` can also estimate relative weights analysis (RWA) using the `epsilon` option. RWA is a faster, approximation to DA and obviates the each subset regression by orthogonal-izing independent variables using singular value decomposition (see `matrix svd`). - -`epsilon`'s singular value decomposition approach is not equivalent to the all possible combinations ensemble approach of DA but is many fold faster for models with many independent variables and tends to produce similar answers regarding relative importance. - -DA and RWA can also be `boostrap`-ped to obtain standard errors and evaluate statistic sampling stability if desired. - -``` -. bootstrap, reps(500): domin price mpg headroom trunk turn gear_ratio foreign length weight, epsilon -(running domin on estimation sample) - -Bootstrap replications (500) -----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 -.................................................. 50 -.................................................. 100 -.................................................. 150 -.................................................. 200 -.................................................. 250 -.................................................. 300 -.................................................. 350 -.................................................. 400 -.................................................. 450 -.................................................. 500 - -General dominance statistics: Custom user analysis -Number of obs = 74 -Overall Fit Statistic = 0.5806 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.0803 0.1382 4 - headroom | 0.0148 0.0255 8 - trunk | 0.0349 0.0601 7 - turn | 0.0533 0.0918 6 - gear_ratio | 0.0619 0.1067 5 - foreign | 0.0910 0.1567 2 - length | 0.0824 0.1420 3 - weight | 0.1620 0.2791 1 -------------------------------------------------------------------------------------- - -. estat bootstrap - -Dominance analysis Number of obs = 74 - Replications = 500 - ------------------------------------------------------------------------------- - | Observed Bootstrap - price | coefficient Bias std. err. [95% conf. interval] --------------+---------------------------------------------------------------- - mpg | .08025451 .0056983 .02608266 .0428729 .1343196 (BC) - headroom | .01480511 .005788 .00994976 .0045644 .0253761 (BC) - trunk | .03488086 .0052553 .01569048 .0146097 .0700839 (BC) - turn | .05328024 .0088603 .01346109 .0228937 .0711584 (BC) - gear_ratio | .06193881 .0102595 .02574597 .0342715 .1201322 (BC) - foreign | .09099184 .0039005 .03310621 .0428155 .1721853 (BC) - length | .08242763 .0060164 .01522731 .0471752 .1077507 (BC) - weight | .16203422 .0023967 .03416152 .0966866 .2320645 (BC) ------------------------------------------------------------------------------- -Key: BC: Bias-corrected -``` - -Note that other Stata post-estimation commands such as `test` can be used to evaluate differences between these statistics. - -## Multivariate Linear Regression-based DA - -`domin` comes packaged with a sub-wrapper program `mvdom` that allows for estimating DA statistics with multiple dependent variables. - -The model reflected in the `mvdom`-based DA below is: - -`mvreg price gear_ratio foreign length weight = mpg headroom trunk turn` - -``` -. domin price mpg headroom trunk turn, reg(mvdom, dvs(gear_ratio foreign length weight)) fitstat(e(r2)) - -Total of 15 sub-models - -General dominance statistics: Multivariate regression -Number of obs = 74 -Overall Fit Statistic = 0.8659 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.2605 0.3008 2 - headroom | 0.0794 0.0917 4 - trunk | 0.1779 0.2055 3 - turn | 0.3482 0.4020 1 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 - mpg 0.6798 0.2314 0.0987 0.0319 -headroom 0.2923 0.0191 0.0052 0.0009 - trunk 0.5379 0.1251 0.0339 0.0148 - turn 0.7887 0.3239 0.1782 0.1018 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: - mpg headroom trunk turn - dominates?:mpg 0 1 1 -1 -dominates?:headroom -1 0 -1 -1 - dominates?:trunk -1 1 0 -1 - dominates?:turn 1 1 1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -turn completely dominates mpg -mpg completely dominates headroom -trunk completely dominates headroom -turn completely dominates headroom -mpg completely dominates trunk -turn completely dominates trunk -``` - -The dominance statistics above account for multiple dependent variables and are based on the approach described by Azen and Budescu (2006). An alternative fit metric, _P_xy_, is reported below for the same model. - -``` -. domin price mpg headroom trunk turn, reg(mvdom, dvs(gear_ratio foreign length weight) pxy) fitstat(e(r2)) - -Total of 15 sub-models - -General dominance statistics: Multivariate regression -Number of obs = 74 -Overall Fit Statistic = 0.2229 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.0677 0.3039 2 - headroom | 0.0218 0.0980 4 - trunk | 0.0452 0.2028 3 - turn | 0.0881 0.3952 1 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 - mpg 0.1360 0.0631 0.0412 0.0307 -headroom 0.0585 0.0123 0.0086 0.0079 - trunk 0.1076 0.0373 0.0199 0.0160 - turn 0.1577 0.0830 0.0606 0.0510 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: - mpg headroom trunk turn - dominates?:mpg 0 1 1 -1 -dominates?:headroom -1 0 -1 -1 - dominates?:trunk -1 1 0 -1 - dominates?:turn 1 1 1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -turn completely dominates mpg -mpg completely dominates headroom -trunk completely dominates headroom -turn completely dominates headroom -mpg completely dominates trunk -turn completely dominates trunk -``` - -And a final model uses the _epsilon_ estimator which uses the _P_xy_ metric. - -``` -. domin price mpg headroom trunk turn, reg(mvdom, dvs(gear_ratio foreign length weight)) epsilon - -General dominance statistics: Epsilon-based mvdom -Number of obs = 74 -Overall Fit Statistic = 0.2229 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.0681 0.3056 2 - headroom | 0.0229 0.1030 4 - trunk | 0.0452 0.2028 3 - turn | 0.0866 0.3886 1 -------------------------------------------------------------------------------------- -``` - -## Gamma regression-based DA - -`domin` allows for any scalar valued fit metric to be used as a DA-able metric and thus statistics such as model deviance can be used. - -The example below uses a `glm` model with a gamma distribution. - -``` -. domin price mpg rep78 headroom, reg(glm, family(gamma) link(power -1)) fitstat(e(deviance)) consmodel reverse - -Total of 7 sub-models - -General dominance statistics: Generalized linear models -Number of obs = 69 -Overall Fit Statistic = 7.4580 -Constant-only Fit Stat. = 11.6141 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | -3.7935 0.9127 1 - rep78 | -0.2118 0.0510 2 - headroom | -0.1509 0.0363 3 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: - 1 2 3 - mpg -3.6135 -3.8087 -3.9583 - rep78 -0.0007 -0.2270 -0.4077 -headroom -0.1876 -0.1661 -0.0989 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: - mpg rep78 headroom - dominates?:mpg 0 1 1 - dominates?:rep78 -1 0 0 -dominates?:headroom -1 0 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -mpg completely dominates rep78 -mpg completely dominates headroom -rep78 generally dominates headroom -``` - -Important notes about this model is that a `consmodel` is estimated (e.g., `Constant-only Fit Stat. = 11.6141`). Like entries in the `all()` model, this value is subtracted from the DA statistics. - -The `consmodel` differs from `all()` in that applies only to the estimation and removal of model _cons_-tants/intercepts; this is useful when the constant-only model is non-zero. This DA also `reverse`s the interpretation of the fit metrics such that smaller values represent more important independent variables (i.e., larger negative values). - -Note also that Gamma models can be executed with the _epsilon_ estimator. - -``` -. domin price mpg rep78 headroom, reg(glm, family(gamma) link(power -1)) epsilon - -General dominance statistics: Epsilon-based glm -Number of obs = 69 -Overall Fit Statistic = 0.3157 - - | Dominance Standardized Ranking - price | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - mpg | 0.2874 0.9104 1 - rep78 | 0.0148 0.0469 2 - headroom | 0.0135 0.0427 3 -------------------------------------------------------------------------------------- -``` - -## Linear Mixed-effects Regression-based DA - -`domin` also comes packaged with a sub-wrapper program `mixdom` that allows for estimating DA statistics with linear mixed effects models. - -There can only be two levels of clustering/nesting in the data. The analysis below uses the within-cluster r-square and the underlying model would be represented by: - -`mixed ln_wage tenure hours age collgrad || id:` - -``` -. webuse nlswork, clear -(National Longitudinal Survey. Young Women 14-26 years of age in 1968) - -. -. domin ln_wage tenure hours age collgrad, reg(mixdom, id(id)) fitstat(e(r2_w)) sets((i.race)) - -Total of 31 regressions - -Progress in running all regression subsets -0%------50%------100% -.................... -General dominance statistics: Mixed-effects ML regression -Number of obs = 28036 -Overall Fit Statistic = 0.2688 - - | Dominance Standardized Ranking - ln_wage | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - tenure | 0.0958 0.3564 2 - hours | 0.0031 0.0114 5 - age | 0.0406 0.1510 3 - collgrad | 0.1175 0.4371 1 - set1 | 0.0119 0.0441 4 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 5 - tenure 0.1389 0.1130 0.0916 0.0744 0.0611 - hours 0.0053 0.0040 0.0029 0.0020 0.0012 - age 0.0860 0.0590 0.0364 0.0180 0.0034 -collgrad 0.1461 0.1274 0.1132 0.1034 0.0975 - set1 0.0157 0.0135 0.0117 0.0100 0.0084 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: dominated?: - tenure hours age collgrad set1 - dominates?:tenure 0 1 1 0 1 - dominates?:hours -1 0 -1 -1 -1 - dominates?:age -1 1 0 -1 0 -dominates?:collgrad 0 1 1 0 1 - dominates?:set1 -1 1 0 -1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -tenure completely dominates hours -age completely dominates hours -collgrad completely dominates hours -set1 completely dominates hours -tenure completely dominates age -collgrad completely dominates age -tenure completely dominates set1 -collgrad completely dominates set1 -collgrad conditionally dominates tenure -age generally dominates set1 - -Variables in set1: i.race -``` - -# Advanced Examples - -The examples here outline some of the most advanced features of `domin` - -## Multinomial Logistic Regression-based DA -### Custom Program Generation using BIC as Fit Statistic - -`domin` can be adapted to dominance analyze any user-generated model--so long as they adhere to a specific structure `domin` expects. - -Specifically, the program must `ereturn scalar` or `return scalar` the fit statistic and the program's `syntax` must follow `syntax varlist if , [option]`. - -In the example below, a custom program `myprog` is defined to obtain the Bayesian Information Criterion following a `mlogit` for DA purposes. - -``` -. program define myprog, eclass - 1. -. syntax varlist if , [option] - 2. -. tempname estlist - 3. -. mlogit `varlist' `if' - 4. -. estat ic - 5. -. matrix `estlist' = r(S) - 6. -. ereturn scalar bic = `estlist'[1,6] - 7. -. end - -. -. domin race tenure hours age nev_mar, reg(myprog) fitstat(e(bic)) consmodel reverse - -Total of 15 regressions - -General dominance statistics: Multinomial logistic regression -Number of obs = 28022 -Overall Fit Statistic = 36025.3867 -Constant-only Fit Stat. = 36482.9057 - - | Dominance Standardized Ranking - race | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - tenure | 13.2083 -0.0289 4 - hours | -84.8225 0.1854 2 - age | 3.0220 -0.0066 3 - nev_mar | -388.9268 0.8501 1 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 - tenure 15.4426 10.2512 12.7821 14.3572 - hours -102.4587 -91.2406 -77.0439 -68.5468 - age 9.5965 2.0195 1.4031 -0.9312 -nev_mar -399.0177 -395.1525 -384.8246 -376.7124 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: - tenure hours age nev_mar - dominates?:tenure 0 -1 -1 -1 - dominates?:hours 1 0 1 -1 - dominates?:age 1 -1 0 -1 -dominates?:nev_mar 1 1 1 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -hours completely dominates tenure -age completely dominates tenure -nev_mar completely dominates tenure -nev_mar completely dominates hours -hours completely dominates age -nev_mar completely dominates age -``` - -## Logistic Regression-based DA: Revisited -### Multiply Imputed Estimates - -`domin` can also accommodate multiply imputed data by including the `mi` option with multiply imputed data. By default, all imputations are used but options can be passed to `mi estimate` using `domin`'s `miopts()` option. - -`domin` has been tested on `mi set` data styles of `wide`, `long`, and `flong`. - -``` -. webuse mheart1s20, clear -(Fictional heart attack data; bmi missing) - -. -. domin attack smokes age bmi hsgrad female, reg(logit) fitstat(e(r2_p)) mi - -Total of 31 regressions - -Progress in running all regression subsets -0%------50%------100% -.................... -General dominance statistics: Logistic regression -Number of obs = 154 -Overall Fit Statistic = 0.1056 - - | Dominance Standardized Ranking - attack | Stat. Domin. Stat. -------------+------------------------------------------------------------------------ - smokes | 0.0553 0.5234 1 - age | 0.0248 0.2354 3 - bmi | 0.0250 0.2369 2 - hsgrad | 0.0003 0.0028 4 - female | 0.0002 0.0016 5 -------------------------------------------------------------------------------------- -Conditional dominance statistics -------------------------------------------------------------------------------------- - - #indepvars: #indepvars: #indepvars: #indepvars: #indepvars: - 1 2 3 4 5 -smokes 0.0548 0.0549 0.0551 0.0554 0.0560 - age 0.0231 0.0238 0.0247 0.0257 0.0270 - bmi 0.0222 0.0235 0.0249 0.0264 0.0281 -hsgrad 0.0000 0.0001 0.0002 0.0004 0.0008 -female 0.0000 0.0001 0.0002 0.0002 0.0003 -------------------------------------------------------------------------------------- -Complete dominance designation -------------------------------------------------------------------------------------- - - dominated?: dominated?: dominated?: dominated?: dominated?: - smokes age bmi hsgrad female -dominates?:smokes 0 1 1 1 1 - dominates?:age -1 0 0 1 1 - dominates?:bmi -1 0 0 1 1 -dominates?:hsgrad -1 -1 -1 0 0 -dominates?:female -1 -1 -1 0 0 -------------------------------------------------------------------------------------- - -Strongest dominance designations - -smokes completely dominates age -smokes completely dominates bmi -smokes completely dominates hsgrad -age completely dominates hsgrad -bmi completely dominates hsgrad -smokes completely dominates female -age completely dominates female -bmi completely dominates female -bmi generally dominates age -hsgrad generally dominates female -``` +https://github.com/fmg-jluchman/domin/wiki \ No newline at end of file diff --git a/domin.ado b/domin.ado index e74a2c3..b9abe04 100644 --- a/domin.ado +++ b/domin.ado @@ -1,443 +1,61 @@ -*! domin version 3.4.2 3/7/2023 Joseph N. Luchman +*! domin version 3.5.0 8/14/2023 Joseph N. Luchman +// version information at end of file +**# Pre-program definition quietly include dominance.mata, adopath -program define domin, eclass //history and version information at end of file +program define domin, eclass -version 12.1 - -if replay() { //replay results - error if "by" - - if ("`e(cmd)'" != "domin") error 301 - - if _by() error 190 - - Display `0' - - exit 0 - -} - -syntax varlist(min = 1 ts) [in] [if] [aw pw iw fw] , [Reg(string) Fitstat(string) Sets(string) /// define syntax -All(varlist fv ts) noCOMplete noCONditional EPSilon mi miopt(string) CONSmodel REVerse] - -/*defaults and warnings*/ -if !strlen("`reg'") { //if no "reg" option specified - notify and use default "regress" - - local reg "regress" - - display "{err}Regression type not entered in {opt reg()}. " _newline /// - "{opt reg(regress)} assumed." _newline - -} - -if !strlen("`fitstat'") & !strlen("`epsilon'") { //if no "fitstat" and "epsilon" options specified - notify and make default "e(r2)" - - local fitstat "e(r2)" - - display "{err}Fitstat type not entered in {opt fitstat()}. " _newline /// - "{opt fitstat(e(r2))} assumed." _newline - -} - -if strlen("`epsilon'") & strlen("`fitstat'") { //warning if any "reg" or "fitstat" option specified with "epsilon" - - display "{err}Option {opt epsilon} cannot be used with {opt fitstat(e(r2))}." _newline /// - "Entries in {opt fitstat()} ignored." _newline - -} - -if !strlen("`mi'") & strlen("`miopt'") { //warning if "miopt" is used without "mi" - - local mi "mi" - - display "{err}You have added {cmd:mi estimate} options without adding the {opt mi} option. {opt mi} assumed." _newline - -} - -/*exit conditions*/ -if strlen("`epsilon'") & strlen("`sets'") { //"epsilon" and "sets" cannot go together - - display "{err}Options {opt epsilon} and {opt sets()} not allowed together." - - exit 198 - -} - -if strlen("`epsilon'") & strlen("`all'") { //"epsilon" and "all" cannot go together - - display "{err}Options {opt epsilon} and {opt all()} not allowed together." - - exit 198 - -} - -if strlen("`epsilon'") & strlen("`consmodel'") { //"epsilon" and "consmodel" cannot go together - - display "{err}Options {opt epsilon} and {opt consmodel} not allowed together." +if `c(version)' < 15 { - exit 198 - -} - -if strlen("`epsilon'") & strlen("`reverse'") ///"epsilon" and "reverse" cannot go together -display "{err}Options {opt epsilon} and {opt reverse} not allowed together." _newline /// -"{opt reverse} ignored." - - -if strlen("`epsilon'") & strlen("`weight'") { //"epsilon" disallows weights - - display "{err}Option {opt epsilon} does not allow {opt weight}s." + display "{err}As of {cmd:domin} version 3.5.0, the minimum version of Stata is 15." _newline /// + "If you have an older version of Stata, most functionality of {cmd:domin} is available from" _newline /// + `"{stata net install st0645:this} Stata journal article back to version 12."' exit 198 - -} - -if strlen("`epsilon'") & strlen("`mi'") { //"epsilon" and multiple imputation options cannot go together - - display "{err}Options {opt epsilon} and {opt mi} not allowed together." - exit 198 - } -if strlen("`mi'") { //are data actually mi set? - - capture mi describe +version 15 - if _rc { //if data are not mi set +if replay() { - display "{err}Data are not {cmd:mi set}." + if ("`e(cmd)'" != "domin") error 301 - exit `=_rc' - - } + if _by() error 190 - if !r(M) { //exit if no imputations + Display `0' - display "{err}No imputations for {cmd:mi estimate}." _newline - - exit 2001 + exit 0 - } - } -capture which lmoremata.mlib //is moremata present? - -if _rc { //if moremata cannot be found, tell user to install it. - - display "{err}Module {cmd:moremata} not found. Install {cmd:moremata} here {stata ssc install moremata}." +**# Program definition and argument checks +syntax varlist(min = 1 ts) [in] [if] [aw pw iw fw] , /// + [Reg(string) Fitstat(string) Sets(string) All(varlist fv ts) /// + noCONditional noCOMplete EPSilon CONSmodel REVerse noESAMPleok mi miopt(string)] // remove direct mi support - make it into a wrapper - exit 198 - -} - -/*disallow complete and conditional with epsilon option*/ if strlen("`epsilon'") { - + local esampleok "esampleok" local conditional "conditional" - local complete "complete" - } -/*general set up*/ -if strlen("`mi'") tempfile mifile //produce a tempfile to store imputed fitstats for retreival - -tempname ranks domwgts sdomwgts cdldom cptdom //temporary matrices for results - -gettoken dv ivs: varlist //parse varlist line to separate out dependent from independent variables - -gettoken reg regopts: reg, parse(",") //parse reg() option to pull out estimation command options - -if strlen("`regopts'") gettoken erase regopts: regopts, parse(",") //parse out comma if one is present - -local diivs "`ivs'" //create separate macro to use for display purposes - -local mkivs "`ivs'" //create separate macro to use for sample marking purposes - -if `:list sizeof sets' { //parse and process the sets if included - - /*pull out set #1 from independent variables list*/ - gettoken one two: sets, bind //pull out the first set - - local setcnt = 1 //give the first set a number that can be updated as a macro - - local one = regexr("`one'", "[/(]", "") //remove left paren - - local one = regexr("`one'", "[/)]", "") //remove right paren - - local set1 `one' //name and number set - - local ivs "`ivs' <`set1'>" //include set1 into list of independent variables, include characters for binding in Mata - - local mkivs `mkivs' `set1' //include variables in set1 in the mark sample independent variable list - - local diivs "`diivs' set1" //include the name "set1" into list of variables - - - while strlen("`two'") { //continue parsing beyond set1 so long at sets remain to be parsed (i.e., there's something in the macro "two") - - gettoken one two: two, bind //again pull out a set - - local one = regexr("`one'", "[/(]", "") //remove left paren - - local one = regexr("`one'", "[/)]", "") //remove right paren - - local set`++setcnt' `one' //name and number set - advance set count by 1 - - local ivs "`ivs' <`set`setcnt''>" //include further sets - separated by binding characters - into independent variables list - - local mkivs `mkivs' `set`setcnt'' //include sets into mark sample independent variables list - - local diivs "`diivs' set`setcnt'" //include set number into display list - - } - -} - - -if `:list sizeof ivs' < 2 { //exit if too few predictors/sets (otherwise prodices cryptic Mata error) - - display "{err}{cmd:domin} requires at least 2 independent variables or independent variable sets." +if strlen("`mi'") | strlen("`mi'") { + display "{err}{cmd:mi} and {opt miopt()} are depreciated. Please use " /// + "{help mi_dom}." exit 198 - -} - -/*finalize setup*/ -tempvar touse keep //declare sample marking variables - -tempname obs allfs consfs //declare temporary scalars - -mark `touse' //declare marking variable - -quietly generate byte `keep' = 1 `if' `in' //generate tempvar that adjusts for "if" and "in" statements - -markout `touse' `dv' `mkivs' `all' `keep' //do the sample marking - -local nobindivs = subinstr("`ivs'", "<", "", .) //take out left binding character(s) for use in adjusting e(sample) when obs are dropped by an anslysis - -local nobindivs = subinstr("`nobindivs'", ">", "", .) //take out right binding character(s) for use in adjusting e(sample) when obs are dropped by an anslysis - -if !strlen("`epsilon'") { //don't invoke program checks if epsilon option is invoked - - if !strlen("`mi'") capture `reg' `dv' `nobindivs' `all' [`weight'`exp'] if `touse', `regopts' //run overall analysis - probe to check for e(sample) and whether everything works as it should - - else { - - capture mi estimate, saving(`mifile') `miopt': `reg' `dv' `nobindivs' `all' [`weight'`exp'] if `keep', `regopts' //run overall analysis with mi prefix - probe to check for e(sample) and whether everything works as it should - - if _rc { //if something's amiss with mi... - - display "{err}Error in {cmd:mi estimate: `reg'}. See return code." - - exit `=_rc' - - } - - else estimates use `mifile', number(`:word 1 of `e(m_est_mi)'') //if touse doesn't equal e(sample) - use e(sample) from first imputation and proceed - - } - - quietly count if `touse' //tally up observations from count based on "touse" - - if r(N) > e(N) & !strlen("`mi'") quietly replace `touse' = e(sample) //if touse doesn't equal e(sample) - use e(sample) and proceed; not possible with multiple imputation though - - if _rc { //exit if regression is not estimable or program results in error - return the returned code - - display "{err}{cmd:`reg'} resulted in an error." - - exit `=_rc' - - } - - capture assert `fitstat' != . //is the "fitstat" the user supplied actually returned by the command? - - if _rc { //exit if fitstat can't be found - - display "{err}{cmd:`fitstat'} not returned by {cmd:`reg'} or {cmd:`fitstat'} is not scalar valued. See {help return list}." - - exit 198 - - } - - capture assert sign(`fitstat') != -1 //what is the sign of the fitstat? domin works best with positive ones - warn and proceed - - if _rc { - - display "{err}{cmd:`fitstat'} returned by {cmd:`reg'}." _newline /// - "is negative. {cmd:domin} is programmed to work best" _newline /// - "with positive {opt fitstat()} summary statistics." _newline - - } } -if !inlist("`weight'", "iweight", "fweight") & !strlen("`mi'") { //if weights don't affect obs - - quietly count if `touse' //tally up "touse" if not "mi" - - scalar `obs' = r(N) //pull out the number of observations included - -} - -else if inlist("`weight'", "iweight", "fweight") & !strlen("`mi'") { //if the weights do affect obs - - quietly summarize `=regexr("`exp'", "=", "")' if `touse' //tally up "touse" by summing weights - - scalar `obs' = r(sum) //pull out the number of observations included - -} - -else { - - quietly mi estimate, `miopt': regress `dv' `nobindivs' `all' [`weight'`exp'] if `keep' //obtain estimate of obs when multiply imputed - - scalar `obs' = e(N) //pull out the number of observations included - -} - -/*begin estimation*/ -scalar `allfs' = 0 //begin by defining the fitstat of the "all" variables as 0 - needed for dominance() function - -if `:list sizeof all' { //if there are variables in the "all" list - - if !strlen("`mi'") { //when there is no "mi" option specified - - quietly `reg' `dv' `all' [`weight'`exp'] if `touse', `regopts' //run analysis with "all" independent variables only - - scalar `allfs' = `fitstat' //the resulting "fitstat" is then registered as the value to remove from other fitstats - - } - - else { //if "mi" is specified - - quietly mi estimate, saving(`mifile', replace) `miopt': `reg' `dv' `all' [`weight'`exp'] if `keep', `regopts' //run mi analysis with "all" independent variables only - - mi_dom, name(`mifile') fitstat(`fitstat') list(`=e(m_est_mi)') //call mi_dom program to average fitstats - - scalar `allfs' = r(passstat) //the resulting average fitstat is then registered as the value to remove from other fitstats - - } - -} - -scalar `consfs' = 0 //begin by defining the fitstat of the constant-only model as 0 - needed for dominance() function - -if strlen("`consmodel'") { //if the user desires to know what the baseline fitstat is - - if !strlen("`mi'") { //if "mi" is not declared - - quietly `reg' `dv' [`weight'`exp'] if `touse', `regopts' //conduct analysis without independent variables - - scalar `consfs' = `fitstat' //return baseline fitstat - - } - - else { //if "mi" is declared - - quietly mi estimate, saving(`mifile', replace) `miopt': `reg' `dv' [`weight'`exp'] if `keep', `regopts' //conduct mi analysis without independent variables - - mi_dom, name(`mifile') fitstat(`fitstat') list(`=e(m_est_mi)') //compute average fitstat +**# Wrapper process for -domin- processing in Mata +mata: domin_2mata("`reg'", "`fitstat'", "`sets'", "`all'", /// + "`conditional'", "`complete'", "`epsilon'", "`consmodel'", "`reverse'", /// + "`esampleok'", "`weight'`exp'", "`in' `if'", "`varlist'") - scalar `consfs' = r(passstat) //return average baseline fitstat - - } - - if `:list sizeof all' scalar `allfs' = `allfs' - `consfs' - -} - -if strlen("`epsilon'") { //primary analysis when "epsilon" is invoked - - if ("`reg'" == "mvdom") `reg' `dv' `ivs' if `touse', `regopts' `epsilon' //invoke the epsilon version of mvdom - - else mata: eps_ri("`dv' `ivs'", "`reg'", "`touse'", "`regopts'") //mata function to obtain epsilon-based estimates - - matrix `domwgts' = r(domwgts) //translate r-class matrix into temp matrix that domin expects - - matrix `sdomwgts' = `domwgts'*(1/r(fs)) //produce standardized relative weights (i.e., out of 100%) - - mata: st_matrix("`ranks'", mm_ranks(st_matrix("r(domwgts)")':*-1, 1, 1)') //rank the relative weights - -} - -else { - - mata: model_specs = domin_specs() - - mata: model_specs.mi = st_local("mi") - mata: model_specs.reg = st_local("reg") - mata: model_specs.dv = st_local("dv") - mata: model_specs.all = st_local("all") - mata: model_specs.weight = st_local("weight") - mata: model_specs.exp = st_local("exp") - mata: model_specs.touse = st_local("touse") - mata: model_specs.regopts = st_local("regopts") - - mata: dominance(model_specs, &domin_call(), /// - st_local("conditional'"), st_local("complete"), /// - st_local("ivs"), /// - st_numscalar(st_local("allfs")), st_numscalar(st_local("consfs"))) //invoke "dominance()" function in Mata - - /*translate r-class results into temp results*/ - matrix `domwgts' = r(domwgts) - - matrix `sdomwgts' = r(sdomwgts) - - matrix `ranks' = r(ranks) - - if !strlen("`conditional'") matrix `cdldom' = r(cdldom) - - if !strlen("`complete'") matrix `cptdom' = r(cptdom) - -} - -/*display results - this section will not be extensively explained*/ -/*name matrices*/ -matrix colnames `domwgts' = `diivs' - -if strlen("`reverse'") { //if 'reverse', invert the direction and interpretation of rank and standardized weights - - mata: st_matrix("`sdomwgts'", (st_matrix("`domwgts'"):*-1):/sum(st_matrix("`domwgts'"):*-1)) - - mata: st_matrix("`ranks'", ((st_matrix("`ranks'"):-1):*-1):+cols(st_matrix("`ranks'"))) - -} - -matrix colnames `sdomwgts' = `diivs' - -matrix colnames `ranks' = `diivs' - -if !strlen("`complete'") { - - if strlen("`reverse'") mata: st_matrix("`cptdom'", st_matrix("`cptdom'"):*-1) //if 'reverse', invert the direction and interpretation of complete dominance - - matrix colnames `cptdom' = `diivs' - - matrix coleq `cptdom' = dominated? - - matrix rownames `cptdom' = `diivs' - - matrix roweq `cptdom' = dominates? - -} - -if !strlen("`conditional'") { - - matrix rownames `cdldom' = `diivs' - - local colcdl `:colnames `cdldom'' - - local colcdl = subinstr("`colcdl'", "c", "", .) - - matrix colnames `cdldom' = `colcdl' - - matrix coleq `cdldom' = #indepvars - -} +**# Ereturn processing +tempname domwgts cdldom cptdom stdzd ranks if !strlen("`epsilon'") & strlen("`e(title)'") local title "`e(title)'" @@ -445,8 +63,18 @@ else if strlen("`epsilon'") & strlen("`e(title)'") local title "Epsilon-based `r else local title "Custom user analysis" -/*return values*/ -ereturn post `domwgts' [`weight'`exp'], depname(`dv') obs(`=`obs'') esample(`touse') +matrix `domwgts' = r(domwgts) +ereturn post `domwgts' [`weight'`exp'], depname(`dv') obs(`=`r(N)'') esample(`touse') + +**# Ereturn scalars +ereturn scalar N = `=r(N)' + +matrix `domwgts' = r(domwgts)*J(colsof(r(domwgts)), 1, 1) +if strlen("`epsilon'") ereturn scalar fitstat_o = `=`domwgts'[1,1]' +else ereturn scalar fitstat_o = `=`domwgts'[1,1] + r(allfs) + r(consfs)' + +if `:list sizeof all' ereturn scalar fitstat_a = `=r(allfs)' +if strlen("`consmodel'") ereturn scalar fitstat_c = `=r(consfs)' if strlen("`setcnt'") { @@ -464,6 +92,7 @@ if strlen("`setcnt'") { else ereturn hidden scalar setcnt = 0 +**# Ereturn macros ereturn hidden local dtitle "`title'" ereturn hidden local reverse "`reverse'" @@ -480,14 +109,6 @@ if strlen("`epsilon'") ereturn local estimate "epsilon" else ereturn local estimate "dominance" -if strlen("`mi'") { - - if strlen("`miopt'") ereturn local miopt "`miopt'" - - ereturn local mi "mi" - -} - if strlen("`regopts'") ereturn local regopts `"`regopts'"' ereturn local reg `"`reg'"' @@ -500,21 +121,32 @@ ereturn local title `"Dominance analysis"' ereturn local cmdline `"domin `0'"' -ereturn scalar fitstat_o = r(fs) - -if `:list sizeof all' ereturn scalar fitstat_a = `allfs' - -if strlen("`consmodel'") ereturn scalar fitstat_c = `consfs' +**# Ereturn matrices +if !strlen("`complete'") { + matrix `cptdom' = r(cptdom) + mata: st_matrixcolstripe("`cptdom'", /// + (J(cols(st_matrix("`cptdom'")), 1, "dominated?"), /// + st_matrixcolstripe("e(b)")[,2])) + mata: st_matrixrowstripe("`cptdom'", /// + (J(cols(st_matrix("`cptdom'")), 1, "dominates?"), /// + st_matrixcolstripe("e(b)")[,2])) + ereturn matrix cptdom = `cptdom' +} -if !strlen("`conditional'") ereturn matrix cdldom `cdldom' - -if !strlen("`complete'") ereturn matrix cptdom `cptdom' +if !strlen("`conditional'") { + matrix `cdldom' = r(cdldom) + mata: st_matrixcolstripe("`cdldom'", /// + (J(cols(st_matrix("`cdldom'")), 1, "#indepvars"), /// + strofreal(1::cols(st_matrix("`cdldom'"))))) + ereturn matrix cdldom = `cdldom' +} -ereturn matrix ranking `ranks' +matrix `ranks' = r(ranks) +ereturn matrix ranking = `ranks' -ereturn matrix std `sdomwgts' +matrix `stdzd' = r(stdzd) +ereturn matrix std = `stdzd' -/*begin display*/ Display end @@ -522,7 +154,7 @@ end /*Display program*/ program define Display -version 12.1 +version 15 tempname domwgts sdomwgts ranks @@ -630,14 +262,14 @@ if e(estimate) == "dominance" & `=`cpttest'*`cdltest'' { local names `:colnames e(b)' mata: display((select(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==1):*" completely dominates "):+tokens(st_local("names")))', /// - regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==1):*" completely dominates "):+tokens(st_local("names")))', /// - "completely dominates")) , /// - select(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==2):*" conditionally dominates "):+tokens(st_local("names")))', /// - regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==2):*" conditionally dominates "):+tokens(st_local("names")))', /// - "conditionally dominates")), /// - select(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==3):*" generally dominates "):+tokens(st_local("names")))', /// - regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==3):*" generally dominates "):+tokens(st_local("names")))', /// - "generally dominates")))') + regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==1):*" completely dominates "):+tokens(st_local("names")))', /// + "completely dominates")) , /// + select(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==2):*" conditionally dominates "):+tokens(st_local("names")))', /// + regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==2):*" conditionally dominates "):+tokens(st_local("names")))', /// + "conditionally dominates")), /// + select(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==3):*" generally dominates "):+tokens(st_local("names")))', /// + regexm(vec(tokens(st_local("names"))':+((st_matrix("`bestdom'"):==3):*" generally dominates "):+tokens(st_local("names")))', /// + "generally dominates")))') display "" @@ -657,29 +289,580 @@ if strlen("`e(all)'") display "{txt}Variables included in all subsets: `e(all)'" end -/*program to average fitstat across all multiple imputations for use in domin*/ -program define mi_dom, rclass +**# Mata function adapting Stata input for Mata and initiating the Mata environment +version 15 -version 12.1 +mata: -syntax, name(string) fitstat(string) list(numlist) +mata set matastrict on -tempname passstat +void domin_2mata( + string scalar reg, + string scalar fitstat, + string scalar sets, + string scalar all, + string scalar conditional, + string scalar complete, + string scalar epsilon, + string scalar consmodel, + string scalar reverse, + string scalar esampleok, + string scalar weight, + string scalar inif, + string scalar varlist + ) { + + +/* ~ declarations and initial structure */ + class AssociativeArray scalar model_specs + + real scalar builtin, iv, set, rc, obs, + full_fitstat, all_fitstat, cons_fitstat, index_count + + string scalar regopts, dv, marks + + string rowvector ivs, iv_sets, bltin_varlist, bltin_index, + bltin_all + + real rowvector bltin_dvcor + + real matrix bltin_corrmat + + transmorphic parser + +/* ~ argument checks ~ */ + /*-domin- defaults*/ + if ( (!strlen(reg) & !strlen(fitstat)) & !strlen(epsilon) ) { + + reg = "regress" + + fitstat = "e(r2)" + + weight = + subinword( subinword( weight, + "pweight", "aweight"), "iweight", "aweight") + + builtin = 1 + + } + else if ( (strlen(reg) & strlen(fitstat)) | strlen(epsilon) ) builtin = 0 + else { + + display("Both {opt reg()} and {opt fitstat()} must be chosen or " + + "neither as of {cmd:domin} version 3.5.") + exit(198) + + } + + /*'epsilon' specifics*/ + if ( strlen(epsilon) ) { + + /*exit conditions: user must restructure*/ + if ( strlen(all + sets) ) { + + display("{err}{opt epsilon} not allowed with" + + " {opt all()} or {opt sets()}.") + exit(198) + + } + + } + +/* ~ process iv and sets ~ */ + /*parse varlist - store as 'ivs'*/ + ivs = tokens(varlist) + + /*'dv ivs' structure means first entry is 'dv'*/ + dv = ivs[1] + + /*remaining entries, if any, are 'ivs'*/ + if ( length(ivs) > 1 ) ivs = ivs[2..length(ivs)] + else ivs = "" + + /*set processing*/ + if ( strlen(sets) ) { + /*setup parsing sets - bind on parentheses*/ + parser = tokeninit(" ", "", "()") + tokenset(parser, sets) + + /*get all sets and remove parentehses*/ + iv_sets = tokengetall(parser) + iv_sets = substr(iv_sets, 2, strlen(iv_sets):-2) + st_local("setcnt", strofreal( length(iv_sets) ) ) + for (set=1; set<=length(iv_sets); set++) { + + st_local("set"+strofreal(set), iv_sets[set] ) + + rc = _stata("fvunab waste: " + iv_sets[set], 1) + if ( rc ) { + display("{err}Problem with variables in set position " + strofreal(set) + ": " + iv_sets[set]) + exit(111) + } + + } + + /*combine to single iv vector*/ + if ( length(ivs) > 1 ) ivs = (ivs, iv_sets) + else ivs = iv_sets + + } + + if ( length(ivs) < 2 | ivs == "") { -scalar `passstat' = 0 //placeholder scalar to hold the sum + stata("display " + char(34) + + "{err}{cmd:domin} requires at least 2 independent variables or independent variable sets." + + char(34)) + exit(198) + + } -foreach x of numlist `list' { +/* ~ parse regression options ~ */ + parser = tokeninit(" ", ",") + + tokenset(parser, strtrim(reg)) + + reg = tokenget(parser) + + if ( length(tokenrest(parser)) ) regopts = substr(tokenrest(parser), 2) + else regopts = "" + +/* ~ check primary analysis and set estimation sample ~ */ + if ( !strlen(epsilon) ) { + + st_eclear() + + if (inif == " ") inif = "if _n > 0" + + rc = _stata(reg + " " + dv + " " + invtokens(ivs) + " " + all + + " [" + weight + "] " + inif + ", " + regopts, 1) + + if ( rc ) { + + display("{err}{cmd:" + reg + "} resulted in an error.") + exit(rc) - estimates use `name', number(`x') //find the focal estimates + } + + if ( !length( st_numscalar( strtrim(fitstat) ) ) ) { + + display("{err}{cmd:" + fitstat + + "} not returned by {cmd:" + reg + "} or {cmd:" + fitstat + + "} is not scalar valued. See {help return list}.") + exit(198) + + } + else full_fitstat = st_numscalar( strtrim(fitstat) ) + + marks = st_tempname() + + stata("generate byte " + marks[1] + " = e(sample)", 1) + + stata("count if " + marks[1], 1) + + obs = st_numscalar("r(N)") - scalar `passstat' = `passstat' + `fitstat'*`:list sizeof list'^-1 //add in the weighted fitstat value + } + else obs = 0 + if (obs == 0 & strlen(esampleok) ) { + + marks = st_tempname(2) + + stata("mark " + marks[1]) + + stata("generate byte " + marks[2] + " = 1 " + inif, 1) + + stata("markout " + marks[1] + " " + marks[2] + + " " + invtokens(ivs) + " " + all, 1) + + stata("count if " + marks[1], 1) + + obs = st_numscalar("r(N)") + + } + + if (obs == 0 & !strlen(esampleok)) { + + display("{err}{cmd:esample()} not set. Use {opt noesampleok} to avoid checking {cmd:esample()}.") + exit(198) + } + +/* ~ built-in linear regression-based model ~ */ + if ( builtin ) { + + bltin_varlist = bltin_index = J(1, length(ivs), "") + index_count = 1 + + for (iv = 1; iv <= length(ivs); iv++) { + + stata("fvexpand " + ivs[iv], 1) + + if ( strlen( st_macroexpand("`" + "r(fvops)" + "'") ) | + strlen( st_macroexpand("`" + "r(tsops)" + "'") ) ) { + + bltin_index[iv] = st_macroexpand("`" + "r(varlist)" + "'") + + stata("fvrevar " + ivs[iv], 1) + + bltin_varlist[iv] = st_macroexpand("`" + "r(varlist)" + "'") + + if ( any( + strmatch( tokens(bltin_index[iv]) , "*b.*") ) ) + bltin_varlist[iv] = + invtokens( + select( tokens(bltin_varlist[iv]), + !strmatch( tokens(bltin_index[iv]) , "*b.*") ) ) + + bltin_index[iv] = + invtokens( + strofreal((index_count..index_count+length( tokens( bltin_varlist[iv] ) + )-1 ) ) ) + + index_count = index_count + length( tokens( bltin_varlist[iv] ) ) + + } + else { + + bltin_varlist[iv] = ivs[iv] + + bltin_index[iv] = strofreal(index_count) + + index_count++ + + } + + } + + if ( strlen(all) ) { + + bltin_all = + strofreal( ( index_count..index_count+length(tokens(all))-1 ) ) + + index_count = index_count + length( tokens(all) ) + + } + else bltin_all = "" + + if (regexm(weight, "^[pi]weight=")) + weight = regexr(weight, "^[pi]weight=", "aweight=") + + stata("correlate " + invtokens(bltin_varlist) + " " + all + + " [" + weight + "] if " + marks[1], 1) + + bltin_corrmat = st_matrix("r(C)") + + stata("correlate " + dv + " " + invtokens(bltin_varlist) + + " " + all + " [" + weight + "] if " + marks[1], 1) + + bltin_dvcor = st_matrix("r(C)")[1, 2..index_count] + + } + +/* ~ begin collecting effect sizes ~ */ + all_fitstat = 0 + + if ( strlen(all) ) { + + stata(reg + " " + dv + " " + all + " [" + weight + "] if " + marks[1] + + ", " + regopts, 1) + + all_fitstat = st_numscalar( strtrim(fitstat) ) + + } + + cons_fitstat = 0 + + if ( strlen(consmodel) ) { + + stata(reg + " " + dv + " [" + weight + "] if " + marks[1] + + ", " + regopts, 1) + + cons_fitstat = st_numscalar( strtrim(fitstat) ) + + + } + + if ( strlen(all) ) all_fitstat = all_fitstat - cons_fitstat + + full_fitstat = full_fitstat - all_fitstat - cons_fitstat + + +/* ~ invoke dominance ~ */ + if ( strlen(epsilon) ) { + + if (reg == "mvdom") + stata("mvdom " + dv + " " + invtokens(ivs) + " if " + + marks[1] + ", " + regopts + " epsilon", 1) + else eps_ri(dv + " " + invtokens(ivs), reg, + marks[1], regopts, + ( length(tokens(weight, "=")) == 3 ) ? + tokens(weight, "=")[3] : "") + + } + else if ( builtin ) { + + model_specs.put("corrmat", bltin_corrmat) + model_specs.put("dvcor", bltin_dvcor) + + model_specs.put("all", bltin_all) + model_specs.put("all_fitstat", all_fitstat) + model_specs.put("cons_fitstat", 0) + + dominance( + model_specs, &domin_regress(), + bltin_index', conditional, complete, full_fitstat ) + + } + else { + + model_specs.put("reg", reg) + model_specs.put("fitstat", fitstat) + model_specs.put("weight", weight) + model_specs.put("touse", marks[1]) + model_specs.put("regopts", regopts) + + model_specs.put("all", all) + model_specs.put("dv", dv) + + model_specs.put("consmodel", consmodel) + model_specs.put("reverse", reverse) + + model_specs.put("all_fitstat", all_fitstat) + model_specs.put("cons_fitstat", cons_fitstat) + + dominance( + model_specs, &domin_call(), + ivs', conditional, complete, full_fitstat ) + + } +/* ~ return values ~ */ + if ( length(iv_sets) ) + ivs[(( length(ivs)-length(iv_sets)+1 )..length(ivs))] = + "set":+( strofreal( (1..length(iv_sets)) ) ) + + st_matrixcolstripe("r(domwgts)", (J(length(ivs), 1, ""), ivs') ) + + if ( strlen(reverse) ) + st_matrix("r(cptdom)", + st_matrix("r(cptdom)"):*-1 ) + + if ( !strlen(conditional) ) st_matrixrowstripe("r(cdldom)", (J(length(ivs), 1, ""), ivs') ) + + if ( !strlen(complete) ) { + st_matrixrowstripe("r(cptdom)", (J(length(ivs), 1, ""), ivs') ) + st_matrixcolstripe("r(cptdom)", (J(length(ivs), 1, ""), ivs') ) + } + + st_matrix("r(stdzd)", + (st_matrix("r(domwgts)")):/(sum(st_matrix("r(domwgts)") ) ) ) + st_matrixcolstripe("r(stdzd)", (J(length(ivs), 1, ""), ivs') ) + + st_matrix("r(ranks)", + colsum( + J(cols(st_matrix("r(domwgts)")), + 1, st_matrix("r(domwgts)") ):<=(st_matrix("r(domwgts)")') + ) ) + if ( strlen(reverse) ) + st_matrix("r(ranks)", + colsum( + J(cols(st_matrix("r(domwgts)")), + 1, st_matrix("r(domwgts)") ):>=(st_matrix("r(domwgts)")') + ) ) + st_matrixcolstripe("r(ranks)", (J(length(ivs), 1, ""), ivs') ) + + st_numscalar("r(N)", obs) + + st_local("reg", reg) + st_local("regopts", regopts) + st_local("touse", marks[1]) + st_local("dv", dv) + + st_numscalar("r(allfs)", model_specs.get("all_fitstat")) + st_numscalar("r(consfs)", model_specs.get("cons_fitstat")) + } -return scalar passstat = `passstat' //average fitstat = the MI'd fitstat +end + +**# Mata function to execute 'domin-flavored' models +version 15 + +mata: + + mata set matastrict on + + real scalar domin_call(string scalar IVs_in_model, + class AssociativeArray scalar model_specs) { + + real scalar fitstat + + stata(model_specs.get("reg") + " " + model_specs.get("dv") + " " + + model_specs.get("all") + " " + IVs_in_model + " [" + + model_specs.get("weight") + "] if " + + model_specs.get("touse") + ", " + model_specs.get("regopts"), 1) + + fitstat = st_numscalar(model_specs.get("fitstat")) - + model_specs.get("all_fitstat") - + model_specs.get("cons_fitstat") + + return(fitstat) + + } + +end + +**# Mata function to execute built-in linear regression +version 15 + +mata: + + mata set matastrict on + + real scalar domin_regress(string scalar IVs_in_model, + class AssociativeArray scalar model_specs) { + + real scalar fitstat + + real rowvector index + + real colvector coeffs + + real matrix analysis_mat + + index = + strtoreal( + tokens( + invtokens( (IVs_in_model, model_specs.get("all") ) ) + ) ) + + coeffs = + qrsolve( + model_specs.get("corrmat")[index, index], + model_specs.get("dvcor")[index]' ) + + fitstat = model_specs.get("dvcor")[index]*coeffs - + model_specs.get("all_fitstat") + + return(fitstat) + + } + +end + +**# Mata function to execute epsilon-based relative importance +version 12 + +mata: + +mata set matastrict on + +void eps_ri(string scalar varlist, string scalar reg, + string scalar touse, string scalar regopts, string scalar weight) +{ + /*object declarations*/ + real scalar rc + + real matrix X, L, R, Lm, L2, R2, orth + + real rowvector V, Bt, V2, glmwgts, varloc + + string rowvector orthnames + + real scalar sd_yhat, cor_yhat + + string scalar predmu, wgt_stmnt + + transmorphic view, wgt_view, mu_view + + /*begin processing*/ + st_view(view, ., tokens(varlist), st_varindex(touse)) + + if ( strlen(weight) ) st_view(wgt_view, ., weight, st_varindex(touse)) + else wgt_view = 1 + + X = correlation(view, wgt_view) //obtain correlations + + L = R = X[2..rows(X), 2..cols(X)] //set-up for svd() + + V = J(1, cols(X)-1, .) //placeholder for eigenvalues + + svd(X[2..rows(X), 2..cols(X)], L, V, R) //conduct singular value decomposition + + Lm = (L*diag( sqrt(V) )*R) //process orthogonalized predictors + + if (reg == "regress") Bt = invsym(Lm)*X[2..rows(X), 1] //obtain adjusted regression weights + + else if (reg == "glm") { //if glm-based... + + svd( + ( view[., 2..cols(view)]:-mean( + view[., 2..cols(view)] ) ):/( sqrt( diagonal( variance( + view[., 2..cols(view)] ) ) )' ), + L2, V2, R2) + + orth = L2*R2 //produce the re-constructed orthogonal predictors for use in regression + + orth = (orth:-mean(orth)):/sqrt( diagonal( variance(orth) ) )' //standardize the orthogonal predictors + + orthnames = st_tempname( cols(orth) ) + + varloc = st_addvar("double", orthnames) //generate some tempvars for Stata + + st_store(., orthnames, st_varindex(touse), orth) //put the orthogonalized variables in Stata + + if ( strlen(weight) ) wgt_stmnt = " [iweight=" + else wgt_stmnt = " [" + + rc = _stata("glm " + tokens(varlist)[1] + " " + + invtokens(orthnames) + wgt_stmnt + weight + "] if " + + touse + ", " + regopts, 1) //conduct the analysis + + if ( rc ) { + + display("{err}{cmd:" + reg + "} failed when executing {cmd:epsilon}.") + + exit( rc ) + + } + + glmwgts = st_matrix("e(b)") //record the regression weights to standardize + + predmu = st_tempname() //generate some more tempvars for Stata + + sd_yhat = sqrt( variance(orth*glmwgts[1, 1..cols(glmwgts)-1]') ) //SD of linear predictor + + stata("predict double " + predmu + " if " + touse + ", mu", 1) //translated with link function + + st_view(mu_view, ., st_varindex(predmu), st_varindex(touse)) + + cor_yhat = correlation((view[., 1], mu_view), wgt_view) + + Bt = (glmwgts[1, 1..cols(glmwgts)-1]:*((cor_yhat[2, 1])/(sd_yhat)))' + + } + + else { //asked for invalid reg + + display("{err}{opt reg(" + reg + ")} invalid with {opt epsilon}.") + + exit(198) + + } + + Bt = Bt:^2 //square values of regression weights + + Lm = Lm:^2 //square values of orthogonalized predictors + + st_matrix("r(domwgts)", (Lm*Bt)') //produce proportion of variance explained and put into Stata + +} end +/* programming notes and history +- domin_se version 0.0.0 - date - mth day, year + + /* programming notes and history - domin version 1.0 - date - April 4, 2013 ----- @@ -750,4 +933,18 @@ end - records domin options directly (doesn't call them as local macros) // 3.4.2 - March 7, 2023 - call 'dominance.mata' as opposed to using 'lb_dominance.mlib' to allow backward compatability + --- + domin version 3.5.0 - August 14, 2023 + - most of command migrated to Mata - minimum supported version is 15 + - argument check changes - new requirement that underlying command has e(sample) + - remove support for -mi- make it a wrapper like fitdom() + - remove dependency on -moremata- + - built-in linear regression/-regress- function for speedier DA + - updates to dominance.mata (to 0.1.0) + - updated 'epsilon' method to use st_view() instead of st_data() + - 'epsilon' method accepts weights + ** + --- + future domin + ** planned ** - depreciated by -domin2-'s bmaregress interface; eventually will be defunct and enveloped by its interface */ diff --git a/domin.sthlp b/domin.sthlp index 265d25a..b3c0819 100644 --- a/domin.sthlp +++ b/domin.sthlp @@ -1,5 +1,5 @@ {smcl} -{* *! version 3.4.2 March 7, 2023 J. N. Luchman}{...} +{* *! version 3.5.0 August 14, 2023 J. N. Luchman}{...} {cmd:help domin} {title:Title} @@ -22,9 +22,8 @@ {synopt :{opt s:ets((IVset_1) ... (IVset_x))}}sets of indepdendent variables{p_end} {synopt :{opt a:ll(IVall)}}indepdendent variables included in all subets{p_end} {synopt :{opt cons:model}}adjusts {opt fitstat()} value when {bf:_cons}-only model is not 0{p_end} -{synopt :{opt mi}}uses {cmd:mi set} data{p_end} -{synopt :{opt miopt(mi_options)}}options passed to {cmd:mi estimate}{p_end} {synopt :{opt eps:ilon}}uses the epsilon or relative weights estimator{p_end} +{synopt :{opt noesamp:leok}}allow computation when estimation sample is not set in {opt reg()}{p_end} {syntab:Reporting} {synopt :{opt nocon:ditional}}suppresses computation of conditional dominance statistics{p_end} @@ -50,10 +49,6 @@ Note that {cmd:domin} requires at least two indepvars or sets of indepvars (see option {opt sets()} below). Because it is possible to submit only sets of {it:indepvars}, the initial {it:indepvars} statement is optional. -{p 4 6 2} -{cmd:domin} requires installation of Ben Jann's {cmd:moremata} package -(install {stata ssc install moremata:here}).{p_end} - {title:Table of Contents} @@ -239,13 +234,18 @@ equation {it: cmd depvar indepvars} syntax. added in {opt reg()}, all the arguments following the comma will be passed to each run of the command as options. -{pmore}{opt reg()} defaults to {opt reg(regress)} with a warning denoting the default behavior. +{pmore}As of version 3.5, when {opt reg()} is omitted, {cmd:domin} defaults to using a very fast built-in method +for linear regression-based dominance analysis with the explained variance R2. Using the built-in method is +strongly recommended over using {opt reg(regress)}. The user must omit both {opt reg()} and {opt fitstat()} +to invoke the built-in method. {phang}{opt fitstat(scalar)} refers {cmd:domin} to the scalar valued model fit summary statistic used to compute all dominance statistics/designations. The scalar in {opt fitstat()} can be any {help return:returned}, {help ereturn:ereturned}, or other {help scalar:scalar}. -{pmore}{opt fitstat()} defaults to {opt fitstat(e(r2))} with a warning denoting the default behavior. +{pmore}As is noted in the {opt reg()} section, when {opt reg()} and {opt fitstat()} are omitted, +{cmd:domin} defaults to using a very fast built-in method for linear regression-based dominance analysis with the +explained variance R2. {pmore}See {help fitdom} for wrapper command to use fit statistics computed as postestimation commands such as {cmd: estat ic} (see Example #9b). @@ -292,19 +292,6 @@ model fit statistics that are not 0 when a constant[s]-only model is estimated ( and the user wants to obtain dominance statistics/designations adjusting for the constant[s]-only baseline value. -{phang}{opt mi} invokes Stata's {help mi} options across all sub-models. Thus, each sub-model's -analysis is run using the {cmd:mi estimate} prefix and all the {opt fitstat()} statistics returned -by the analysis program are averaged across all imputations (see Example #10). - -{pmore}To pass specific {opt mi} prefix options to each sub-model, use {opt miopt()} below. - -{phang}{opt miopt(mi_options)} passes options to {cmd:mi estimate} for each sub-model. Each analysis is -passed the options in {opt miopt()}. Each of the entries in {opt miopt()} must be a valid option -for {cmd:mi estimate}. - -{pmore}Invoking {opt miopt()} without {opt mi} invokes {opt mi} and produces a warning noting that -the user neglected to also specify {opt mi}. - {phang}{opt epsilon} is an alternative Shapley value decomposition estimator also known as "Relative Weights Analysis" (Johnson, 2000). {opt epsilon} is a faster implementation as it does not estimate all sub-models to ascertain the effect of each IV independent of each other IV, but rather @@ -327,7 +314,7 @@ Currently, {opt epsilon} works with commands {cmd:regress}, {cmd:glm} (for any { program for multivariate regression; see LeBreton & Tonidandel, 2008; see also Example #6). By default, {opt epsilon} assumes {opt reg(regress)} and {opt fitstat(e(r2))}. Note that {opt epsilon} ignores entries in {opt fitstat()} as it produces its own fit statistic. {opt episilon}'s implementation -does not allow {opt consmodel}, {opt reverse}, {opt mi}, and does not allow the use of {help weights}. +does not allow {opt consmodel} or {opt reverse}. As of version 3.5, {opt epsilon} does allow {help weights}. {pmore}{cmd:Note:} The {opt epsilon} approach has been criticized for being conceptually flawed and biased (see Thomas, Zumbo, Kwan, & Schweitzer, 2014) as an estimator of Shapley values. Despite this criticism @@ -336,6 +323,14 @@ research also shows similarity between DA and {opt epsilon}-based methods in ter to general dominance statistics/Shapley values. Ultimately, the user is cautioned in the use of {opt epsilon} as its speed may come at the cost of bias. +{phang}{opt noesampleok} allows {cmd:domin} to proceed in computing dominance statistics despite the underlying +command in {opt reg()} not setting the esimation sample. {cmd:domin} uses the {cmd:e(sample)} result to restrict +the observation prior to estimating all sub-models. This behavior is new as of version 3.5. + +{pmore} When {opt noesampleok} is invoked, {cmd:domin} will attempt to mark the estimation sample using all variables +the {it:depvar} and {it:indepvars} lists as well as the {opt all()} and the {opt sets()} options. This is {cmd:domin}'s +approach to sample marking in versions prior to 3.5. + {dlgtab:Reporting} {phang}{opt noconditional} suppresses the computation and display of of the conditional dominance @@ -376,8 +371,6 @@ when overall model fit statistics are used that decrease with better fit (e.g., {synopt:{cmd:e(fitstat)}}contents of the {opt fitstat()} option{p_end} {synopt:{cmd:e(reg)}}contents of the {opt reg()} option (before comma){p_end} {synopt:{cmd:e(regopts)}}contents of the {opt reg()} option (after comma){p_end} -{synopt:{cmd:e(mi)}}{cmd:mi}{p_end} -{synopt:{cmd:e(miopt)}}contents of the {opt miopt()} option{p_end} {synopt:{cmd:e(estimate)}}estimation method ({cmd:dominance} or {cmd:epsilon}){p_end} {synopt:{cmd:e(properties)}}{cmd:b}{p_end} {synopt:{cmd:e(depvar)}}name of dependent variable{p_end} @@ -398,7 +391,7 @@ when overall model fit statistics are used that decrease with better fit (e.g., {phang} {stata sysuse auto}{p_end} -{phang}Example 1: linear regression dominance analysis{p_end} +{phang}Example 1: linear regression dominance analysis using built-in method{p_end} {phang} {stata domin price mpg rep78 headroom} {p_end} {phang}Example 2: Ordered outcome dominance analysis with covariate (e.g., Luchman, 2014){p_end} @@ -451,22 +444,23 @@ when overall model fit statistics are used that decrease with better fit (e.g., {phang} {stata matrix `estlist' = r(S)}{p_end} {phang} {stata ereturn scalar bic = `estlist'[1,6]}{p_end} {phang} {stata end}{p_end} -{phang} {stata domin race tenure hours age never_married, reg(myprog) fitstat(e(bic)) consmodel reverse} {p_end} +{phang} {stata domin race tenure hours age nev_mar, reg(myprog) fitstat(e(bic)) consmodel reverse} {p_end} {phang}Example 9b: Multinomial logistic regression with {cmd:fitdom} {p_end} -{phang} {stata "domin race tenure hours age never_married, reg(fitdom, fitstat_fd(r(S)[1,6]) reg_fd(mlogit) postestimation(estat ic)) consmodel reverse fitstat(e(fitstat))"} {p_end} +{phang} {stata "domin race tenure hours age nev_mar, reg(fitdom, fitstat_fd(r(S)[1,6]) reg_fd(mlogit) postestimation(estat ic)) consmodel reverse fitstat(e(fitstat))"} {p_end} {phang}Example 9c: Comparison dominance analysis with McFadden's pseudo-R2 {p_end} -{phang} {stata domin race tenure hours age never_married, reg(mlogit) fitstat(e(r2_p))} {p_end} +{phang} {stata domin race tenure hours age nev_mar, reg(mlogit) fitstat(e(r2_p))} {p_end} -{phang}Example 10: Multiply imputed dominance analysis {p_end} +{phang}Example 10: Multiply imputed dominance analysis using {cmd:mi_dom}{p_end} {phang} {stata webuse mheart1s20, clear} {p_end} -{phang} {stata domin attack smokes age bmi hsgrad female, reg(logit) fitstat(e(r2_p)) mi} {p_end} +{phang} {stata domin attack smokes age bmi hsgrad female, reg(mi_dom, reg_mi(logit) fitstat_mi(e(r2_p))) fitstat(e(fitstat))} {p_end} {phang} Comparison dominance analysis without {cmd:mi} ("in 1/154" keeps only original observations for comparison as in {bf:{help mi_intro_substantive:[MI] intro substantive}}) {p_end} {phang} {stata domin attack smokes age bmi hsgrad female in 1/154, reg(logit) fitstat(e(r2_p))} {p_end} -{phang}Example 11: Random forest with custom in-sample R2 postestimation command (requires {stata ssc install randomforest:randomforest} and Stata V15 or better){p_end} +{phang}Example 11: Random forest with custom in-sample R2 postestimation command (requires {stata ssc install rforest:rforest}){p_end} +{phang} {stata sysuse auto, clear} {p_end} {phang} {stata program define myRFr2, eclass} {p_end} {phang} {stata tempvar rfpred} {p_end} {phang} {stata predict `rfpred'} {p_end} @@ -548,7 +542,7 @@ user need only provide {cmd:domin} the {cmd:pweight} variable for commands that {space 4}{title:6c] Extending Models that can be Dominance Analyzed} -{pstd}{cmd:domin} comes with 3 wrapper programs {cmd:mvdom}, {cmd:mixdom}, and {cmd:fitdom}. +{pstd}{cmd:domin} comes with 4 wrapper programs {cmd:mvdom}, {cmd:mixdom}, {cmd:fitdom}, and {cmd:mi_dom}. {pstd}{cmd:mvdom} implements multivariate regression-based dominance analysis described by Azen and Budescu (2006; see {help mvdom}). @@ -560,8 +554,8 @@ when they do not follow the traditional {it:depvar indepvars} format. As long a be expressed in some way that can be evaluated in {it:depvar indepvars} format, any analysis could be dominance analyzed. -{pstd}Any program used as a wrapper by {cmd:domin} must accept at least one optional argument and must accept -a {help if} statement in its {help syntax}. +{pstd}Any program used as a wrapper by {cmd:domin} must accept an {help if} statement in its {help syntax}. +It is recommended that wrapper programs parse the inputs as a {it:varlist} as well (see Example #9a). {pstd}A third wrapper program, {cmd:fitdom}, takes inspiration from the {browse "https://CRAN.R-project.org/package=domir":R package domir} as it serves as a wrapper for a postestimation @@ -570,6 +564,13 @@ command that produces a fit metric such as {help estat ic} or {help estat classi {pstd}This program allows postestimation commands that return fit metrics to be used directly in {cmd:domin} without having to make a wrapper program for the entire model (i.e., as in Example #9a). +{pstd}The fourth wrapper program, {cmd:mi_dom}, is a replacement for {cmd:domin}'s built in {opt mi} in versions +previous to 3.5 (see Example #10; see also {help mi_dom}). + +{pstd}This program allows multiply imputed model fit statistics to be used in place +of fit statistics with missing data. Use of multiply imputed fit statistics can +reduce the bias of coefficient estimates and dominance statistics when the imputation model is informative. + {marker refs}{...} {title:7. References} @@ -592,11 +593,11 @@ without having to make a wrapper program for the entire model (i.e., as in Examp {phang} Additional discussion of results, options, and conceptual issues on: -{phang}{browse "http://github.com/jluchman/domin/blob/master/README.md"} +{phang}{browse "https://github.com/fmg-jluchman/domin/wiki"} {phang} Please report bugs, requests for features, and contribute to as well as follow on-going development of {cmd:domin} on: -{phang}{browse "http://github.com/jluchman/domin"} +{phang}{browse "http://github.com/fmg-jluchman/domin"} {title:Article} diff --git a/dominance.mata b/dominance.mata index a706421..565424a 100644 --- a/dominance.mata +++ b/dominance.mata @@ -1,72 +1,39 @@ -*! dominance.mata version 0.0.1 1/27/2023 Joseph N. Luchman - -version 12.1 - -**# Container object for domin specs -mata: - -mata set matastrict on - -struct domin_specs { - - string scalar mi, reg, dv, all, weight, exp, touse, regopts - -} - -end +*! dominance.mata version 0.1.0 8/14/2023 Joseph N. Luchman **# Mata function to compute all combinations of predictors or predictor sets run all subsets regression, and compute all dominance criteria +version 15 + mata: mata set matastrict on -void dominance(struct domin_specs scalar model_specs, pointer scalar model_call, /// - string scalar cdlcompu, string scalar cptcompu, string scalar iv_string, /// - real scalar all_subsets_fitstat, real scalar constant_model_fitstat) -{ +void dominance( + class AssociativeArray scalar model_specs, pointer scalar model_call, + string colvector IVs, + string scalar cdlcompu, string scalar cptcompu, + real scalar full_fitstat) { + /*# object declarations*/ - real matrix IV_antiindicator_matrix, conditional_dominance, /// - weighted_order_fitstats, weighted_order1less_fitstats, /// - weight_matrix, complete_dominance, select2IVfits, /// + real matrix IV_antiindicator_matrix, conditional_dominance, + weighted_order_fitstats, weighted_order1less_fitstats, + weight_matrix, complete_dominance, select2IVfits, IV_indicator_matrix, sorted_two_IVs, compare_two_IVs string matrix IV_name_matrix - real rowvector fitstat_vector, orders, combin_at_order, /// - combin_at_order_1less, general_dominance, standardized, /// - focal_row_numbers, nonfocal_row_numbers, cpt_desig, /// + real rowvector fitstat_vector, orders, combin_at_order, + combin_at_order_1less, general_dominance, standardized, + focal_row_numbers, nonfocal_row_numbers, cpt_desig, indicator_weight, antiindicator_weight real colvector binary_pattern, cdl3, cpt_setup_vec, select2IVs - - string colvector IVs - real scalar number_of_IVs, number_of_regressions, display, fitstat, /// + real scalar number_of_IVs, number_of_regressions, display, fitstat, var1, var2, x, y string scalar IVs_in_model - transmorphic cpt_permute_container, t, wchars, pchars, qchars - - /*parse the predictor inputs*/ - t = tokeninit(wchars = (" "), pchars = (" "), qchars = ("<>")) //set up parsing rules - - tokenset(t, iv_string) //register the "iv_string" string scalar for parsing/tokenizing - - IVs = tokengetall(t)' //obtain all IV sets and IVs as a vector - - /*remove characters binding sets together (i.e., "<>")*/ - for (x = 1; x <= rows(IVs); x++) { - - if (substr(IVs[x], 1, 1) == "<") { //if any entry begins with "<" (will be the case for any/all sets)... - - IVs[x] = substr(IVs[x], 1, strlen(IVs[x]) - 1) //first character removed ("<") - - IVs[x] = substr(IVs[x], 2, strlen(IVs[x])) //last character removed (">") - - } - - } + transmorphic cpt_permute_container /*set-up and compute all combinations of predictors and predictor sets*/ number_of_IVs = rows(IVs) //compute total number of IV sets and IVs @@ -117,11 +84,11 @@ void dominance(struct domin_specs scalar model_specs, pointer scalar model_call, fitstat_vector = J(1, cols(IV_indicator_matrix), .) //pre-allocate container vector that will contain fitstats across all models - for (x = 1; x <= number_of_regressions; x++) { //loop to obtain all possible regression subsets + for (x = 2; x <= number_of_regressions; x++) { //loop to obtain all possible regression subsets if (number_of_IVs > 4) { - if (floor(x/number_of_regressions*20) > display) { + if (floor(x/(number_of_regressions-1)*20) > display) { printf(".") @@ -135,12 +102,13 @@ void dominance(struct domin_specs scalar model_specs, pointer scalar model_call, IVs_in_model = invtokens(IV_name_matrix[., x]') //collect a distinct subset of IVs, then collpase names into single string separated by spaces - fitstat = (*model_call)(IVs_in_model, all_subsets_fitstat, /// - constant_model_fitstat, model_specs) //implement called model - will differ for domin vs. domme + fitstat = (*model_call)(IVs_in_model, model_specs) //implement called model - will differ for domin vs. domme - fitstat_vector[1, x] = fitstat //add fitstat to vector of fitstats + fitstat_vector[x] = fitstat //add fitstat to vector of fitstats } + + fitstat_vector[1] = full_fitstat /*define the incremental prediction matrices and combination rules*/ IV_antiindicator_matrix = (IV_indicator_matrix:-1) //matrix flagging which IVs are not included in each model and setting them up for a subtractive effect @@ -162,17 +130,7 @@ void dominance(struct domin_specs scalar model_specs, pointer scalar model_call, general_dominance = colsum((weight_matrix:*fitstat_vector)') //general dominance weights created by computing product of weights and fitstats and summing for each IV row-wise; in implementing the rows are transposed and column summed so it forms a row vector as will be needed to make it an "e(b)" vector - fitstat = rowsum(general_dominance) + all_subsets_fitstat + constant_model_fitstat //total fitstat is then sum of gen. dom. wgts replacing the constant-only model and the "all" subsets stat - st_matrix("r(domwgts)", general_dominance) //return the general dom. wgts as r-class matrix - - standardized = general_dominance:*fitstat^-1 //generate the standardized gen. dom. wgts - - st_matrix("r(sdomwgts)", standardized) //return the stdzd general dom. wgts as r-class matrix - - st_matrix("r(ranks)", mm_ranks(general_dominance'*-1, 1, 1)') //return the ranks of the general dom. wgts as r-class matrix - - st_numscalar("r(fs)", fitstat) //return overall fit statistic in r-class scalar /*compute conditional dominance*/ if (strlen(cdlcompu) == 0) { @@ -245,148 +203,4 @@ void dominance(struct domin_specs scalar model_specs, pointer scalar model_call, } -end - -**# Mata function to execute 'domin-flavored' models -version 12.1 - -mata: - - mata set matastrict on - - real scalar domin_call(string scalar IVs_in_model, - real scalar all_subsets_fitstat, real scalar constant_model_fitstat, /// - struct domin_specs scalar model_specs) - { - - real scalar fitstat - - if (strlen(model_specs.mi) == 0) { //if not multiple imputation, then regular regression - - stata(model_specs.reg + " " + model_specs.dv + " " + /// - model_specs.all + " " + IVs_in_model + " [" + /// - model_specs.weight + model_specs.exp + "] if " + - model_specs.touse + ", " + model_specs.regopts, 1) //conduct regression - - fitstat = st_numscalar(st_local("fitstat")) - all_subsets_fitstat - constant_model_fitstat //record fitstat omitting constant and all subsets values; note that the fitstat to be pulled from Stata is stored as the Stata local "fitstat" - - } - - else { //otherwise, regression with "mi estimate:" - - stata("mi estimate, saving(\`mifile', replace) \`miopt': " + /// - model_specs.reg + " " + model_specs.dv + " " + /// - model_specs.all + " " + IVs_in_model + " [" + /// - model_specs.weight + model_specs.exp + "] if " + - model_specs.touse + ", " + model_specs.regopts, 1) //conduct regression with "mi estimate:" - - stata("mi_dom, name(\`mifile') fitstat(\`fitstat') list(\`=e(m_est_mi)')", 1) //use built-in program to obtain average fitstat across imputations - - fitstat = st_numscalar("r(passstat)") - all_subsets_fitstat - constant_model_fitstat //record fitstat omitting constant and "all" subsets values with "mi estimate:" - - } - - return(fitstat) - - } - -end - -**# Mata function to execute epsilon-based relative importance -version 12.1 - -mata: - -mata set matastrict on - -void eps_ri(string scalar varlist, string scalar reg, string scalar touse, string scalar regopts) -{ - /*object declarations*/ - real matrix X, L, R, Lm, glmdat, glmdats, L2, R2, orth - - real rowvector V, Bt, V2, glmwgts, varloc - - string rowvector orthnames - - real scalar sd_yhat, cor_yhat - - string scalar predmu - - /*begin processing*/ - X = correlation(st_data(., tokens(varlist), st_varindex(touse))) //obtain correlations - - L = R = X[2..rows(X), 2..cols(X)] //set-up for svd() - - V = J(1, cols(X)-1, .) //placeholder for eigenvalues - - svd(X[2..rows(X), 2..cols(X)], L, V, R) //conduct singular value decomposition - - Lm = (L*diag(sqrt(V))*R) //process orthogonalized predictors - - if (reg == "regress") Bt = invsym(Lm)*X[2..rows(X), 1] //obtain adjusted regression weights - - else if (reg == "glm") { //if glm-based... - - glmdat = st_data(., tokens(varlist), st_varindex(touse)) //pull data into Mata - - L2 = R2 = glmdat[., 2..cols(glmdat)] //set-up for svd() - - V2 = V //placeholder for eigenvalues - - glmdats = (glmdat[., 2..cols(glmdat)]:-mean(glmdat[., 2..cols(glmdat)])):/sqrt(diagonal(variance(glmdat[., 2..cols(glmdat)])))' //standardize the input data - - svd(glmdats, L2, V2, R2) //conduct singular value decomposition on full data - - orth = L2*R2 //produce the re-constructed orthogonal predictors for use in regression - - orth = (orth:-mean(orth)):/sqrt(diagonal(variance(orth)))' //standardize the orthogonal predictors - - orthnames = st_tempname(cols(orth)) - - varloc = st_addvar("double", orthnames) //generate some tempvars for Stata - - st_store(., orthnames, st_varindex(touse), orth) //put the orthogonalized variables in Stata - - stata("capture " + reg + " " + tokens(varlist)[1] + " " + invtokens(orthnames) + " if " + touse + ", " + regopts) //conduct the analysis - - if (st_numscalar("c(rc)")) { - - display("{err}{cmd:" + reg + "} failed when executing {cmd:epsilon}.") - - exit(st_numscalar("c(rc)")) - - } - - glmwgts = st_matrix("e(b)") //record the regression weights to standardize - - predmu = st_tempname() //generate some more tempvars for Stata - - sd_yhat = sqrt(variance(orth*glmwgts[1, 1..cols(glmwgts)-1]')) //SD of linear predictor - - stata("quietly predict double " + predmu + " if " + touse + ", mu") //translated with link function - - cor_yhat = correlation((glmdat[., 1], st_data(., st_varindex(predmu), st_varindex(touse)))) - - Bt = (glmwgts[1, 1..cols(glmwgts)-1]:*((cor_yhat[2, 1])/(sd_yhat)))' - - } - - else { //asked for invalid reg - - display("{err}{opt reg(" + reg + ")} invalid with {opt epsilon}.") - - exit(198) - - } - - Bt = Bt:^2 //square values of regression weights - - Lm = Lm:^2 //square values of orthogonalized predictors - - st_matrix("r(domwgts)", (Lm*Bt)') //produce proportion of variance explained and put into Stata - - st_numscalar("r(fs)", sum(Lm*Bt)) //sum relative weights to obtain R2 - -} - -end +end \ No newline at end of file diff --git a/dominance0.mata b/dominance0.mata new file mode 100644 index 0000000..8d45622 --- /dev/null +++ b/dominance0.mata @@ -0,0 +1,248 @@ +*! dominance0.mata version 0.0.0 8/11/2023 Joseph N. Luchman + +version 15 + +**# Container object for domin specs +mata: + +mata set matastrict on + +struct domin_specs { + + string scalar mi, reg, dv, all, weight, exp, touse, regopts + +} + +end + +**# Mata function to compute all combinations of predictors or predictor sets run all subsets regression, and compute all dominance criteria +mata: + +mata set matastrict on + +void dominance0(struct domin_specs scalar model_specs, pointer scalar model_call, /// + string scalar cdlcompu, string scalar cptcompu, string scalar iv_string, /// + real scalar all_subsets_fitstat, real scalar constant_model_fitstat) +{ + /*# object declarations*/ + real matrix IV_antiindicator_matrix, conditional_dominance, /// + weighted_order_fitstats, weighted_order1less_fitstats, /// + weight_matrix, complete_dominance, select2IVfits, /// + IV_indicator_matrix, sorted_two_IVs, compare_two_IVs + + string matrix IV_name_matrix + + real rowvector fitstat_vector, orders, combin_at_order, /// + combin_at_order_1less, general_dominance, standardized, /// + focal_row_numbers, nonfocal_row_numbers, cpt_desig, /// + indicator_weight, antiindicator_weight + + real colvector binary_pattern, cdl3, cpt_setup_vec, select2IVs + + string colvector IVs + + real scalar number_of_IVs, number_of_regressions, display, fitstat, /// + var1, var2, x, y + + string scalar IVs_in_model + + transmorphic cpt_permute_container, t, wchars, pchars, qchars + + /*parse the predictor inputs*/ + t = tokeninit(wchars = (" "), pchars = (" "), qchars = ("<>")) //set up parsing rules + + tokenset(t, iv_string) //register the "iv_string" string scalar for parsing/tokenizing + + IVs = tokengetall(t)' //obtain all IV sets and IVs as a vector + + /*remove characters binding sets together (i.e., "<>")*/ + for (x = 1; x <= rows(IVs); x++) { + + if (substr(IVs[x], 1, 1) == "<") { //if any entry begins with "<" (will be the case for any/all sets)... + + IVs[x] = substr(IVs[x], 1, strlen(IVs[x]) - 1) //first character removed ("<") + + IVs[x] = substr(IVs[x], 2, strlen(IVs[x])) //last character removed (">") + + } + + } + + /*set-up and compute all combinations of predictors and predictor sets*/ + number_of_IVs = rows(IVs) //compute total number of IV sets and IVs + + number_of_regressions = 2^number_of_IVs - 1 //compute total number of regressions + + printf("\n{txt}Total of {res}%f {txt}sub-models\n", number_of_regressions) + + if (number_of_IVs > 12) printf("\n{txt}Computing all independent variable combination sub-models\n") + + IV_indicator_matrix = J(number_of_IVs, 2^number_of_IVs, .) //indicating the IV by it's sequence in the rows (each row is an IV) and presence in a model by the columns (each column is a model); all models and the IVs in those models will be represented in this matrix; the matrix starts off empty/as all missings + + for (x = 1; x <= rows(IV_indicator_matrix); x++) { //fills in the IV indicators matrix - for each row... + + binary_pattern = J(1, 2^(x-1), 0), J(1, 2^(x-1), 1) //...make a binary matrix that grows exponentially; (0,1), then (0,0,1,1), then (0,0,0,0,1,1,1,1) growing in size until it fills the entire set of columns with sequences of 0's and 1's... + + IV_indicator_matrix[x, .] = J(1, 2^(number_of_IVs-x), binary_pattern) //...spread the binary pattern across all rows forcing it to repeat when not long enough to fill all columns - net effect is staggering all binary patters across rows to obtain all subsets in the final matrix + + } + + IV_indicator_matrix = IV_indicator_matrix[., 2..cols(IV_indicator_matrix)] //omit the first column that indicates a model with no IVs + + IV_indicator_matrix = (colsum(IV_indicator_matrix) \ IV_indicator_matrix)' //transpose the indicator matrix and create a column indicating the "order"/number of IVs in the model - used to sort all the models + + IV_indicator_matrix = sort(IV_indicator_matrix, (1..cols(IV_indicator_matrix))) //sort beginning with the order of the model indicator column followed by all other columns's binary values - net effect results in same sort order as cvpermute() as desired/originally designed in domin 3.0 + + orders = (IV_indicator_matrix[.,1]')[rows(IV_indicator_matrix)..1] //keep the orders of each model - sort order is reversed below which is also implemented here + + IV_indicator_matrix = IV_indicator_matrix[., 2..cols(IV_indicator_matrix)]' //omit orders from IV indicators matrix and transpose back to the models as rows + + //IV_indicator_matrix = sort(((cols(IV_indicator_matrix)::1), IV_indicator_matrix'), 1)[., 2..rows(IV_indicator_matrix)+1]' //reverse sort order, functions below expect reversed order + IV_indicator_matrix = IV_indicator_matrix[., cols(IV_indicator_matrix)..1] // reverse the IV indicator matrix's order; functions below expect a reversed order as originally designed in domin 3.0 + + IV_name_matrix = IV_indicator_matrix:*IVs //apply string variable names to all subsets indicator matrix + + /*all subsets regressions and progress bar syntax if predictors or sets of predictors is above 5*/ + display = 1 //for the display of dots during estimation - keeps track of where the regressions are - every 5% of models complete another "." is added + + if (number_of_IVs > 4) { + + printf("\n{txt}Progress in running all sub-models\n{res}0%%{txt}{hline 6}{res}50%%{txt}{hline 6}{res}100%%\n") + + printf(".") + + displayflush() + + } + + fitstat_vector = J(1, cols(IV_indicator_matrix), .) //pre-allocate container vector that will contain fitstats across all models + + for (x = 1; x <= number_of_regressions; x++) { //loop to obtain all possible regression subsets + + if (number_of_IVs > 4) { + + if (floor(x/number_of_regressions*20) > display) { + + printf(".") + + displayflush() + + display++ + + } + + } + + IVs_in_model = invtokens(IV_name_matrix[., x]') //collect a distinct subset of IVs, then collpase names into single string separated by spaces + + fitstat = (*model_call)(IVs_in_model, all_subsets_fitstat, /// + constant_model_fitstat, model_specs) //implement called model - will differ for domin vs. domme + + fitstat_vector[1, x] = fitstat //add fitstat to vector of fitstats + + } + + /*define the incremental prediction matrices and combination rules*/ + IV_antiindicator_matrix = (IV_indicator_matrix:-1) //matrix flagging which IVs are not included in each model and setting them up for a subtractive effect + + combin_at_order = J(1, number_of_regressions, 1):*comb(number_of_IVs, orders) //vector indicating the number of model combinations at each "order"/# of predictors for each models - this is used as a "weight" to construct general dominance statistics + + combin_at_order_1less = J(1, number_of_regressions, 1):*comb(number_of_IVs - 1, orders) //vector indicating the number of model combinations from the previous "order" - this is also used as a "weight" to construct general dominance statistics + + combin_at_order_1less[1] = 0 //replace missing first value in vector with 0 + + combin_at_order = combin_at_order - combin_at_order_1less //remove # of combinations for the "order" less the value at "order" - 1; this gives the number of relevant combinations at each order that include the focal IV and thus produce the right weight for averaging + + indicator_weight = IV_indicator_matrix:*combin_at_order //"spread" the vector indicating number of combinations at the current order involving the relevant IV to all models including that IV - to be used as a weight for summing the fitstat values in raw form (not increments) + + antiindicator_weight = IV_antiindicator_matrix:*combin_at_order_1less //"spread" the vector indicating number of combinations at the previous order not involving the relevant IV to all models not including the relevant IV - these components assist to subtract out values to make the values "increments" + + /*define the full design matrix - compute general dominance (average conditional dominance across orders)*/ + weight_matrix = ((indicator_weight + antiindicator_weight):*number_of_IVs):^-1 //combine weight matrices (which reflect the conditional dominance weights/within order averages) with the number of ivs (between order averages) and invert cell-wise - now can be multiplied by fit stat vector and summed to obtain general dominance statistics + + general_dominance = colsum((weight_matrix:*fitstat_vector)') //general dominance weights created by computing product of weights and fitstats and summing for each IV row-wise; in implementing the rows are transposed and column summed so it forms a row vector as will be needed to make it an "e(b)" vector + + fitstat = rowsum(general_dominance) + all_subsets_fitstat + constant_model_fitstat //total fitstat is then sum of gen. dom. wgts replacing the constant-only model and the "all" subsets stat + + st_matrix("r(domwgts)", general_dominance) //return the general dom. wgts as r-class matrix + + standardized = general_dominance:*fitstat^-1 //generate the standardized gen. dom. wgts + + st_matrix("r(sdomwgts)", standardized) //return the stdzd general dom. wgts as r-class matrix + + st_matrix("r(ranks)", mm_ranks(general_dominance'*-1, 1, 1)') //return the ranks of the general dom. wgts as r-class matrix + + st_numscalar("r(fs)", fitstat) //return overall fit statistic in r-class scalar + + /*compute conditional dominance*/ + if (strlen(cdlcompu) == 0) { + + if (number_of_IVs > 5) printf("\n\n{txt}Computing conditional dominance\n") + + conditional_dominance = J(number_of_IVs, number_of_IVs, 0) //pre-allocate contrainer matrix to hold conditional dominance stats + + weighted_order_fitstats = ((IV_indicator_matrix:*combin_at_order):^-1):*fitstat_vector // create matrix fit stats weighted by within-order counts - to be summed to create the conditional dominance statistics; these fit stats are "raw"/not incrments without the antiindicators + + weighted_order1less_fitstats = ((IV_antiindicator_matrix:*combin_at_order_1less):^-1):*fitstat_vector // create matrix fit stats weighted by within-order counts at the previous order - these are negative and also to be summed to create the conditional dominance statistics; they ensure that the values are increments + + /*loop over orders/number of predictors to obtain average incremental prediction within order*/ + for (x = 1; x <= number_of_IVs; x++) { //proceed order by order + + conditional_dominance[., x] = rowsum(select(weighted_order_fitstats, orders:==x)):+rowsum(select(weighted_order1less_fitstats, orders:==x-1)) //sum the weighted fit statistics at the focal order and the weighted (negative) fit statistics from the previous order + + } + + st_matrix("r(cdldom)", conditional_dominance) //return r-class matrix "cdldom" + + } + + /*compute complete dominance*/ + if (strlen(cptcompu) == 0) { + + if (number_of_IVs > 5) printf("\n{txt}Computing complete dominance\n") + + complete_dominance = J(number_of_IVs, number_of_IVs, 0) //pre-allocate matrix for complete dominance + + cpt_setup_vec = (J(2, 1, 1) \ J(number_of_IVs - 2, 1, 0)) //set-up a vector the length of the number of IVs with two "1"s and the rest "0"s; used to compare each 2 IVs via 'cvpermute()' + + cpt_permute_container = cvpermutesetup(cpt_setup_vec) //create a 'cvpermute' container to extract all permutations of two IVs + + for (x = 1; x <= comb(number_of_IVs, 2); x++) { + + select2IVs = cvpermute(cpt_permute_container) //invoke 'cvpermute' on the container - selects a unique combination of two IVs + + select2IVfits = colsum(select(IV_indicator_matrix, select2IVs)):==1 //filter IV indicator matrix to include just those fit statistic columns that inlude focal IVs - then only keep the columns that have one (never both or neither) of the IVs + + focal_row_numbers = (1..rows(IV_indicator_matrix)) //sequence of numbers for selecting specific rows indicating focal IVs + + nonfocal_row_numbers = select(focal_row_numbers, (!select2IVs)') //select row numbers for all non-focal IVs; needed for sorting (below) + + focal_row_numbers = select(focal_row_numbers, select2IVs') //select row numbers for all focal IVs; also needed for sorting + + sorted_two_IVs = /// + sort((select((IV_indicator_matrix \ fitstat_vector), select2IVfits))', /// //combine the indicators for IVs and fit statistic matrix - selecting only those columns corresponding with fit statistics where the focal two IVs are located, then... + (nonfocal_row_numbers, focal_row_numbers)) // ...sort this matrix on the nonfocal IVs first then the focal IVs. This ensures that sequential rows are comparable ~ all odd numbered rows can be compared with its subsequent even numbered row + + compare_two_IVs = /// + (select(sorted_two_IVs[,cols(sorted_two_IVs)], mod(1::rows(sorted_two_IVs), 2)), /// //select the odd rows for comparison (see constuction of 'sorted_two_IVs') + select(sorted_two_IVs[,cols(sorted_two_IVs)], mod((1::rows(sorted_two_IVs)):+1, 2))) //select the even rows for comparison + + cpt_desig = /// + (all(compare_two_IVs[,1]:>compare_two_IVs[,2]), /// //are all fit statistics in odd/first variable larger than the even/second variable? + all(compare_two_IVs[,1]:" //add constraint set to all other constraints - the chevrons bind sets together; Mata separates them + + mata: model_specs.param_list.put("_set:set" + st_local("setcount"), /// + st_local("cset" + st_local("setcount"))) // add the parameters associated with the name to the parameter mapping*/ + + } + + } + + if ( `:list sizeof ivs' < 2 ) & !`drop_exit' { //exit if too few parameter estimates/sets (otherwise prodices cryptic Mata error) + + display "{err}{cmd:domme} requires at least 2 parameter estimates " /// + "or parameter estimate sets." + + local drop_exit = 1 //indicate that constraints will be dropped + + } + + if strlen("`addconstr'") { //if there are custom constraints added to the analysis by number... + + foreach constrnum of numlist `addconstr' { //foreach constraint... + + local ivs "`ivs' _cns:cns`constrnum'" // format the constraint as a domme-able parameter + + local constrs "`constrs' `constrnum'" // add the constraint to the domme-able constraint list + + } + + } + + local nobrkt_constrs = /// + subinstr(subinstr("`constrs'", ">", "", .), "<", "", .) //local macro where chevrons in constraint list is removed to estimate the constant model + + /*finalize setup*/ + //**markouts and sample**// + tempvar touse keep //declare sample marking variables + + tempname obs allfs consfs //declare temporary scalars + + mark `touse' //declare marking variable using Stata norm "touse" + + quietly generate byte `keep' = 1 `if' `in' //generate tempvar that adjusts for "if" and "in" statements + + markout `touse' `keep' //mark sample for only really works with the if & in; all other missing-based adjustments must derive from running full model + + //**inital run to ensure the syntax works and restrict sample based on full model**// + if !`drop_exit' capture `reg' [`weight'`exp'] if `touse', `ropts' constraints(`extraconstr') //run of overall analysis assuming all is well so far; intended to check e(sample) and whether everything works as it should + + if _rc & !`drop_exit' { //exit if model is not estimable or program results in error - return the returned code + + display "{err}{cmd:`reg'} resulted in an error." + + local exit_code = `=_rc' + + local drop_exit = 1 //indicate that constraints will be dropped + + } + + //**fitstat processing**// + if regexm("`fitstat'", "^e\(\)") & !`drop_exit' { //if internal fitstat indication is in "fitstat" option; must have "e()" + + gettoken omit fit_opts: fitstat, parse(",") //parse the fitstat to see if "mcf", "est", "aic", or "bic" present + + compute_fitstat `fit_opts' //return fitstat using internal computation program; this is just to determine if the options given work + + if `r(many_few)' { //something's wrong - probably too many or too few fitstat options provided + + local drop_exit = 1 //indicate that constraints will be dropped + + display "{err}Options offered to e() fitstat computation " /// + "resulted in error." + + } + + else { //...otherwise nothing's wrong - indicate to program to use built in approach + + local fitstat = "r(fitstat)" //placeholder "returned" statistic - changed prior to program completion + + mata: st_numscalar("r(fitstat)", 0) //placeholder "returned" statistic's value + + local built_in = 1 //indicate that the program is to produce it's own fitstat + + local built_in_style "`r(style)'" //indicate the style of built in statistic to ask for + + } + + } + + else local built_in = 0 //if not built in, then indicate the estimation model produces it's own fitstat + + capture assert `fitstat' != . //is the "fitstat" the user supplied actually returned by the command? + + if _rc & !`drop_exit' { //exit if fitstat can't be found + + display "{err}{cmd:`fitstat'} not returned by {cmd:`reg'} or " /// + "{cmd:`fitstat'} is not scalar valued. See {help return list}." /// + _newline "Alternatively, {err}{cmd:`fitstat'} may have problems " /// + "being computed internally with the information supplied." + + + local drop_exit = 1 //indicate that constraints will be dropped + + } + + capture assert sign(`fitstat') != -1 //what is the sign of the fitstat? -domme- works best with positive ones - warn and proceed + + if _rc & !`drop_exit' { //merely note that -domme- works best with positive ones (that's what's expected) + + display "{err}{cmd:`fitstat'} returned by {cmd:`reg'}" _newline /// + "is negative. {cmd:domme} is programmed to work best" _newline /// + "with positive {opt fitstat()} summary statistics." _newline + + } + + //**count observations**// + if !`drop_exit' { //if nothing's wrong so far... + + quietly replace `touse' = e(sample) //replace sample marking variable with e(sample) from the model run + + if !inlist("`e(wtype)'", "iweight", "fweight") { //if weights don't affect obs (for probability and analytic weights) + + quietly count if `touse' //tally up "touse" + + scalar `obs' = r(N) //pull out the number of observations included + + } + + else if inlist("`e(wtype)'", "iweight", "fweight") { //if the weights do affect obs (for frequency and importance weights) + + quietly summarize `=regexr("`e(wexp)'", "=", "")' if `touse' //tally up "touse" by summing weights + + scalar `obs' = r(sum) //pull out the number of observations included + + } + + //**obtain parameters considered to be "part of model" and not adjusted out of fitstat**// + scalar `allfs' = 0 //defining fitstat of "all subsets" parameters as 0 - needed for dominance() function + + if `:list sizeof all' { //if there is something in the "all" option + + local 2 "`all'" //rename the content of "all" to generalize and simplify loop below + + while strlen("`2'") & !`drop_exit' { //process the equation-to-independent variable mapping for "all"... + + gettoken 1 2: 2, bind //parse the equation-to-independent variable mapping to bind all parentheses together and pull out first equation + + if ( substr("`1'", 1, 1) != "(" ) /// + | ( substr("`1'", strlen("`1'"), strlen("`1'")) != ")" ) { //if a left paren does not begin and a right paren does not end the first all equation... + + display "{err}Equation {cmd:`1'} in {cmd:all()} not bound by " /// + "parentheses." + + local drop_exit = 1 //indicate that constraints will be dropped + + continue, break //stop the -while- loop... + + } + + if !regexm("`1'", "=") { //exit if there is no equal sign to make an equation + + display "{err}Equation {cmd:`1'} in {cmd:all()} is missing a " /// + "{cmd:=} to distinguish equation and independent variable names." + + local drop_exit = 1 //indicate that constraints will be dropped + + continue, break //stop the -while- loop... + + } + + local 1 = subinstr("`1'", "(", "", 1) //remove left paren + + local 1 = subinstr("`1'", ")", "", 1) //remove right paren + + gettoken dv ivlist: 1, parse("=") //further parse the focal equation to separate out dependent from independent variables + + if ( `: list sizeof dv' != 1 ) | regexm("`dv'", "=") { //multiple dependent variables or no dependent variable where one should be... exit + + display "{err}Invalid equation name specified for " /// + "{cmd:(`dv'`ivlist')} in {cmd:all()}." + + local drop_exit = 1 //indicate that constraints will be dropped + + continue, break //stop the -while- loop... + + } + + local ivlist = regexr("`ivlist'", "=", "") //remove the equal sign from the independent variable list + + if ( `: list sizeof ivlist' == 0 ) { //empty independent vatiable list... exit as it breaks process + + display "{err}Empty set of independent variables specified for " /// + "equation {cmd:`dv'} in {cmd:all()}." + + local drop_exit = 1 //indicate that constraints will be dropped + + continue, break //stop the -while- loop... + + } + + local dv = trim("`dv'") //remove white spaces in dependent variable which can affect putting the string into equations + + foreach iv of local ivlist { //loop over the independent variables in the all equations... + + if substr("`dv'", 1, 1) == "~" { //method for parsing (G)SEM parameters with Stata defaults + + local dv "/" //(g)sem's DV is a forward slash + + local iv = subinstr("`iv'", "+", " ", .) //remove all plus signs required for covariance + + if `: list sizeof iv' == 3 /// //formatting if this is a covariance statement (situation where there would be 3 items in the parsed "independent variable") + local iv "`: word 1 of `iv''(`: word 2 of `iv'',`: word 3 of `iv'')" + + else local iv "`: word 1 of `iv''(`: word 2 of `iv'')" //formatting if this is a standard parameter + + } + + capture constraint free //find an unused constraint number + + if !_rc { //if an unused constraint can be found... + + local constr `r(free)' //use the free constraint + + constraint `constr' _b[`dv':`iv'] = 0 //establish this constraint as one that -domme- will use + + local add "`dv':`iv'" //set up the parameter estimate label + + local allset "`allset' `add'" //add the parameter estimate label to set + + local allcset "`allcset' `constr'" //add constraint to set + + } + + else { //if no unused constraints remain... + + display "{err}{cmd:domme} cannot make any more constraints as the " /// + "{help constraint dir} is full (see {help constraint drop})." + + local drop_exit = 1 //indicate that constraints will be dropped + + continue, break //stop the -while- loop... + + } + + if `drop_exit' continue, break //stop the -forvalues- loop... + + } + + } + + } + + //**obtain "constant" model which will adjusted out of fitstat**// + scalar `consfs' = 0 //define constant-only model fitstat as 0 - needed for dominance() function + + quietly `reg' [`weight'`exp'] if `touse', `ropts' /// + constraints(`nobrkt_constrs' `allcset' `extraconstr') //all constraints used - this estimates the "constant" model + + if `built_in' { + + compute_fitstat `fit_opts' iscons //if a built-in fistat desired, estimate it; note constant model + + scalar `consfs' = `fitstat' + + } + + if !`built_in' & !missing(`fitstat') /// + scalar `consfs' = `fitstat' //return constant model's fitstat if user supplied and not missing + + if !`built_in' & missing(`fitstat') scalar `consfs' = 0 //otherwise assume the constant-only model is 0 + + if strlen("`all'") { //distinguishes "all subsets" from "constant" fitstats/models + + quietly `reg' [`weight'`exp'] if `touse', `ropts' /// + constraints(`nobrkt_constrs' `extraconstr') //all "subsets" and extra constraints used - this estimates the "all" model + + if `built_in' compute_fitstat `fit_opts' consmodel(`=`consfs'') //if a built-in fistat desired, estimate it; supply constant model's value + + scalar `allfs' = `fitstat' //return all subset model's fitstat as distinct from any constant model + + + } + + if `built_in' local cons_no_add = /// + !inlist("`built_in_style'", "mcf", "est") //local macro indicating that the constant model fitstat should be included in the total fitstat - will be 0 for built in "mcf" and "est" and the actual value of "consfs" for non-built in where "consmodel" is not chosen will be 0 + + else local cons_no_add = 1 //...all other cases can be allowed to be 0 + + if strlen("`all'") scalar `allfs' = `allfs' - `consfs'*`cons_no_add' //adjust all subsets fitstat for the constant fitstat in situations where it's not 0 + + /*dominance statistic determination*/ + mata: model_specs.built_in = strtoreal(st_local("built_in")) + mata: model_specs.constraint_list = tokens(st_local("nobrkt_constrs")) + mata: model_specs.built_in_cons = st_numscalar(st_local("consfs")) + + mata: dominance0(model_specs, &domme_call(), /// + st_local("conditional'"), st_local("complete"), /// + st_local("ivs"), /// + st_numscalar(st_local("allfs")), /// + st_numscalar(st_local("consfs"))*strtoreal(st_local("cons_no_add"))) //invoke "dominance()" function in Mata + + /*translate r-class results from me_dominance() into temp results*/ + matrix `gendom' = r(domwgts) //general dominance statistics + + matrix `stzd_gendom' = r(sdomwgts) //standardized general dominance statistics + + matrix `ranks' = r(ranks) //ranks based on general dominance statistics + + if !strlen("`conditional'") matrix `cdldom' = r(cdldom) //conditional dominance statistic matrix + + if !strlen("`complete'") matrix `cptdom' = r(cptdom) //complete dominance designation matrix + + /*processing display results*/ + //**name matrices**// + matrix colnames `gendom' = `ivs' //name the columns of general dominance statistic vector + + if strlen("`reverse'") { //reverse the direction and interpretation of ranked and standardized general dominance statistics + + mata: st_matrix("`stzd_gendom'", /// + ( st_matrix("`gendom'"):*-1 ):/sum( st_matrix("`gendom'"):*-1 ) ) //reverse the signs of the standardized general dominance statistics + + mata: st_matrix("`ranks'", /// + ( ( st_matrix("`ranks'"):-1 ):*-1 ):+cols( st_matrix("`ranks'") ) ) //reverse the sign of the ranked general dominance statistics + + } + + matrix colnames `stzd_gendom' = `ivs' //name the columns of stanadrdized general dominance statistic vector + + matrix colnames `ranks' = `ivs' //name the columns of ranked general dominance statistic vector + + if !strlen("`complete'") { //if the complete dominance matrix was not suppressed... + + local cptivs = subinstr("`ivs'", ":", "_", .) //must remove colons between equation and independent variable; only 1 colon allowed - "dominates?" and "dominated?" are equations for complete dominance matrix + + local cptivs = subinstr("`ivs'", ".", "_", .) //must also remove "."s - causes error in naming + + if strlen("`reverse'") mata: /// + st_matrix("`cptdom'", st_matrix("`cptdom'"):*-1 ) //reverse the sign of the complete dominance designations + + matrix colnames `cptdom' = `cptivs' //name the columns of the complete dominance designations + + mata: st_matrixcolstripe("`cptdom'", /// + ("?":+st_matrixrowstripe("`cptdom'")[,1], st_matrixrowstripe("`cptdom'")[,2])) //add name the equation for the rows ">?" replacement for "dominates?" + + } + + if !strlen("`conditional'") { //if the conditional dominance matrix was not suppressed... + + matrix rownames `cdldom' = `ivs' //name the rows of the conditional dominance matrix + + local colcdl `:colnames `cdldom'' //the columns of the conditional dominance matrix are at defaults "c1 c2 ... cN" + + local colcdl = subinstr("`colcdl'", "c", "", .) //remove the "c"s from all the rownames; keep the values + + matrix colnames `cdldom' = `colcdl' //replace the column names of the conditional dominance matrix with the number of "orders" which matches their counting sequennce + + matrix coleq `cdldom' = #param_ests //equation names for all columns are "#param_ests" + + } + + if strlen("`e(title)'") local title "`e(title)'" //if the estimation command has an "e(title)" returned, save it + + else if !strlen("`e(title)'") & strlen("e(cmd)") local title "`e(cmd)'" //...otherwise save the "e(cmd)" as that's informative too + + else local title "Custom user analysis" //...finally, if none of the options are returned, call it "custom user analysis" + + /*return values*/ + ereturn post `gendom' [`weight'`exp'], obs(`=`obs'') esample(`touse') //primary estimation command returned value command; clears ereturn and returns "gendom" as e(b) + + if strlen("`setcount'") { //if there are sets... + + ereturn hidden scalar setcount = `setcount' //hidden scalar for use in "display" + + forvalues set = 1/`setcount' { //for each set... + + ereturn local set`set' = trim("`set`set''") //make a separate local macro with it's parameter estimate label contents + + } + + } + + else ereturn hidden scalar setcount = 0 //...otherwise hidden set count is 0 + + ereturn hidden local disp_title "`title'" //hidden title for display (hence, "disp_title") + + ereturn hidden local reverse "`reverse'" //hidden indicator for reverse - for display + + if `:list sizeof all' ereturn local all = strtrim("`allset'") //parameter estimate labels in all subsets + + if strlen("`ropts'") ereturn local ropts `"`ropts'"' //if there were regression command options return them as macro + + ereturn local reg "`reg'" //return command used in -domme- in macro + + if `built_in' local fitstat "e(`built_in_style')" //if a built-in is used, change "r(fitstat)" to the "e()" form with fitstat name + + ereturn local fitstat "`fitstat'" //return the name of the fitstat used + + ereturn local cmd "domme" //this command is -domme-; return that + + ereturn local title `"Dominance analysis for multiple equations"' //The title/description of -domme- + + ereturn local cmdline `"domme `0'"' //full command as read into -domme- the 0 macro is everything after the command's name + + ereturn scalar fitstat_o = r(fs) //overall fitstat value + + if `:list sizeof all' ereturn scalar fitstat_a = `allfs' + /// + `consfs'*`cons_no_add' //all subsets fitstat value + + if sign( `consfs'*`cons_no_add' ) ereturn scalar fitstat_c = /// //constant model fitstat value + `consfs'*`cons_no_add' + + if !strlen("`conditional'") ereturn matrix cdldom `cdldom' //return conditional dominance matrix + + if !strlen("`complete'") ereturn matrix cptdom `cptdom' //return complete dominance designations + + ereturn matrix ranking `ranks' //return ranked general dominance vector + + ereturn matrix std `stzd_gendom' //return standardized general dominance vector + + /*begin display*/ + Display + + } + + /*drop constraints -domme- made; not ones user supplied*/ + if missing(`exit_code') local exit_code = 198 //make the default exit code 198 + + if `drop_exit' { //if there was a problem during the program's execution + + if `: list sizeof constrs' { //if there are constraints used... + + foreach constr of numlist `constrs' { //go through each constraint that was made... + + constraint drop `constr' //drop the constraint + + } + + } + + exit `exit_code' //exit using applicable exit code + + } + + foreach constr of numlist `nobrkt_constrs' { //go through each constraint that was made... + + constraint drop `constr' //drop the constraint + + } + +end + + +/*Display program*/ +program define Display + + version 15.1 + + /*set up*/ + tempname gendom stzd_gendom ranks //declare names for temporary data + + matrix `gendom' = e(b) //as in original command "gendom" is general dominance statistics vector; now in e(b) + + matrix `stzd_gendom' = e(std) //as in original command "stzd_gendom" is standardized general dominance statistics vector; now in e(std) + + matrix `ranks' = e(ranking) //as in original command "ranks" is ranked general dominance statistics vector; now in e(ranking) + + local diivs: colnames e(b) //obtain independent variable names + + local eqivs: coleq e(b) //obtain dependent variable/equation names + + mata: st_local("cdltest", strofreal(cols(st_matrix("e(cdldom)")))) //indicator macro for presence of conditional dominance matrix + + mata: st_local("cpttest", strofreal(cols(st_matrix("e(cptdom)")))) //indicator macro for presence of complete dominance matrix + + tokenize "`diivs'" //tokenize list of dependent variables to associate numbers with independent variables + + /*begin displays*/ + display _newline "{txt}General dominance statistics: `e(disp_title)'" /// + _newline "{txt}Number of obs{col 27}={res}{col 40}" %12.0f e(N) + + display "{txt}Overall Fit Statistic{col 27}={res}{col 36}" /// + %16.4f e(fitstat_o) + + if !missing( e(fitstat_a) ) display "{txt}All Subsets Fit Stat." /// + "{col 27}={res}{col 36}" %16.4f e(fitstat_a) + + if !missing( e(fitstat_c) ) display "{txt}Constant-only Fit Stat." /// + "{col 27}={res}{col 36}" %16.4f e(fitstat_c) + + display _newline "{txt}{col 13}{c |}{col 20}Dominance" /// + "{col 35}Standardized{col 53}Ranking" + + display "{txt}{col 13}{c |}{col 20}Stat.{col 35}Domin. Stat." + + display "{txt}{hline 12}{c +}{hline 72}" + + local current_eq "" //for separating equation from independent variable names in -forvalues- loop below + + forvalues iv = 1/`:list sizeof diivs' { //for each entry of the e(b) vector... + + if "`current_eq'" != abbrev("`: word `iv' of `eqivs''", 11) /// + display `"{res}`=abbrev("`: word `iv' of `eqivs''", 11)'{txt}{col 13}{c |}"' //...display equation name only if it changes + + local current_eq = abbrev("`: word `iv' of `eqivs''", 11) //note current equation - truncate to 11 chars + + local `iv' = abbrev("``iv''", 10) //abbreviate independent variable to 10 chars + + display "{txt}{col 2}{lalign 11:``iv''}{c |}{col 14}{res}" /// + %15.4f `gendom'[1,`iv'] "{col 29}" %12.4f /// + `stzd_gendom'[1,`iv'] "{col 53}" %-2.0f `ranks'[1,`iv'] + + } + + display "{txt}{hline 12}{c BT}{hline 72}" + + if `cdltest' { //if conditional dominance matrix exists.... + + display "{txt}Conditional dominance statistics" _newline "{hline 85}" + + matrix list e(cdldom), noheader format(%12.4f) + + display "{txt}{hline 85}" + + } + + if `cpttest' { //if complete dominance designations exist... + + display "{txt}Complete dominance designation" _newline "{hline 85}" + + matrix list e(cptdom), noheader + + display "{txt}{hline 85}" + + } + + if `=`cpttest'*`cdltest'' { //if _both_ complete and conditional dominance designations exist - determine strongest dominance designation between each pair of parameter estimates + + display _newline "{res}Strongest dominance designations" _newline + + tempname bestdom cdl gen decision //declare temporary names for strongest designation search + + if strlen("`e(reverse)'") mata: st_matrix("`bestdom'", /// //start by determining complete dominance as "best" - if reversed, then reflect values over 0 + st_matrix("e(cptdom)"):*-1) + + else matrix `bestdom' = e(cptdom) //...otherwise take complete dominance values as is + + forvalues dominator = 1/`=colsof(e(cdldom))-1' { //search through all columns save last... + + forvalues dominatee = `=`dominator'+1'/`=colsof(e(cdldom))' { //...as well as all columns save first, dependent on dominator + + scalar `cdl' = 0 //define conditional dominance as 0 or "not" + + scalar `gen' = 0 //define general dominance as 0 or "not" + + mata: st_numscalar("`cdl'", /// + ( sum( st_matrix("e(cdldom)")[`dominator', .] /// //...sum the number of times the values across all columns of the conditional dominance matrix for the row corresponding to the "dominator" + :>st_matrix("e(cdldom)")[`dominatee', .] ) ) /// //...and compred to the values across the same columns of the conditional dominance matrix for the row corresponding to the "dominatee" + :==rows( st_matrix("e(cdldom)") ) ) //...if that's equal to the number of columns - thus every row for the "dominator" is bigger than every row for the "dominatee" - "dominator" conditionally dominates "dominatee" + + if !`cdl' mata: /// + st_numscalar("`cdl'", -1*((sum(st_matrix("e(cdldom)")[`dominator', .] /// + :st_matrix("e(b)")[1, `dominatee']) //compare dominator's general dominance statistic to dominatee's + + if !`gen' mata: st_numscalar("`gen'", /// + (st_matrix("e(b)")[1, `dominator'] 3 local many_few = 1 //too many fitstat options selected + + if !strlen("`ll'") local ll = "e(ll)" + + if strlen("`mcf'`est'") { + + if !strlen("`obs'") & strlen("`est'") local obs = "e(N)" + + if strlen("`iscons'") local fitstat = `ll' //baseline + + else if !strlen("`iscons'") & strlen("`est'") local /// + fitstat = 1 - (`ll'/`consmodel')^(-2*`consmodel'/`obs') //Estrella R2 + + else local fitstat = 1 - `ll'/`consmodel' //McFadden R2 + + } + + else if strlen("`aic'`bic'") { + + if !strlen("`parm'") local parm = "e(rank)" + + if !strlen("`obs'") & strlen("`bic'") local obs = "e(N)" + + if strlen("`aic'") local fitstat = -2*(`ll') + 2*`parm' //AIC + + else local fitstat = -2*(`ll') + `parm'*ln(`obs') //BIC + + } + + else local many_few = 1 + + return local style "`aic'`bic'`mcf'`est'" + + return scalar many_few = `many_few' + + return scalar fitstat = `fitstat' + +end + +**# Container object for domme specs +version 15.1 + +mata: + + mata set matastrict on + + struct domme_specs { + + real scalar built_in, built_in_cons + + string rowvector constraint_list + + class AssociativeArray scalar param_list + + } + +end + +**# Mata function to execute 'domme-flavored' models +version 15.1 + +mata: + + mata set matastrict on + + real scalar domme_call(string scalar Params_in_Model, /// + real scalar all_subsets_fitstat, real scalar constant_model_fitstat, /// + struct domme_specs scalar model_specs) + { + + real scalar fitstat, loop_length, x + + string scalar Constraints, remove_list, key + + string rowvector Param_List, out_list + + real rowvector which_constraints + + Param_List = tokens(Params_in_Model) + + loop_length = length(Param_List) + + remove_list = "" + + for (x = 1; x <= loop_length; x++) { + + remove_list = remove_list + " " + /// + model_specs.param_list.get(Param_List[x]) + + } + + out_list = tokens(remove_list) + + which_constraints = /// + !colsum( /// + J(length(out_list), 1, model_specs.constraint_list):==out_list' /// + ) + + Constraints = invtokens(select(model_specs.constraint_list, which_constraints)) + + stata("\`reg' [\`weight'\`exp'] if \`touse', \`ropts'" + /// + " constraints(" + Constraints + " \`extraconstr')", 1) //estimate model/regression in Stata + + + if (model_specs.built_in) { + + stata("compute_fitstat \`fit_opts' consmodel(" /// + + strofreal(model_specs.built_in_cons) + ")", 1) + + fitstat = st_numscalar("r(fitstat)") + + } + + else fitstat = st_numscalar(st_local("fitstat")) + + fitstat = fitstat - /// + all_subsets_fitstat - /// + constant_model_fitstat + + return(fitstat) + + } + +end + +/* programming notes and history + - domme version 1.0 - date - July 2, 2019 + -base version + // 1.0.1 - April 17, 2021 (initiating new versioning: #major.#minor.#patch) + -update to documentation for SJ article + -bug fix on constraint dropping with all() option and use with xi: +--- + domme version 1.1.0 - February 7, 2023 + - leverage dominance() function in -domin-; create own function to function passing; Mata struct to handle input specs + - -domin- now a dependency. + - use an AssociativeArray to map parameters/parameter sets to constraints in a way that is conformable with -domin- + - extensive documentation update + - fixed complete dominance table display; retains equation name and adds informative prefix + // 1.1.1 - March 7, 2023 + - call 'dominance.mata' as opposed to using 'lb_dominance.mlib' to allow backward compatability + // 1.1.2 - August 14, 2023 + - migrated to a sub-method in the -domin- module + - call temporary 'dominance0.mata' (v 0.0) for time being until re-designed similar to -domin- to use 'dominance.mata' versions > 0.0 +*/ diff --git a/domme.sthlp b/domme.sthlp new file mode 100644 index 0000000..6ae9421 --- /dev/null +++ b/domme.sthlp @@ -0,0 +1,385 @@ +{smcl} +{* *! version 1.1.2 August 14, 2023 J. N. Luchman}{...} +{cmd:help domme} + +{title:Title} + +{pstd} +{bf:domme} {hline 2} {ul on}Dom{ul off}inance analysis for {ul on}m{ul off}ulitple {ul on}e{ul off}quation models{p_end} + + +{title:Syntax} + +{p 8 16 2} +{cmd:domme} [{cmd:(}{it:eqname1 = parmnamelist1}{cmd:)} +{cmd:(}{it:eqname2 = parmnamelist2}{cmd:)} ... +{cmd:(}{it:eqnameN = parmnamelistN}{cmd:)}] +{ifin} {weight}{cmd:,} [{it:options}] + +{synoptset 35 tabbed}{...} +{synopthdr} +{synoptline} +{syntab:Model} +{synopt :{opt r:eg(full_estimation_command)}}preditive model command to call{p_end} +{synopt :{opt rop:ts(command_options)}}options to command in {opt reg()}{p_end} +{synopt :{opt f:itstat(fitstat_opts)}}fit statistic returned by {opt reg()} or computed using built-in method{p_end} +{synopt :{opt s:ets([PEset_1] ... [PEset_x])}}sets of indepdendent variables{p_end} +{synopt :{opt a:ll(PEall)}}indepdendent variables included in all subets{p_end} + +{syntab:Reporting} +{synopt :{opt nocon:ditional}}suppresses computation of conditional dominance statistics{p_end} +{synopt :{opt nocom:plete}}suppresses computation of complete dominance designations{p_end} +{synopt :{opt rev:erse}}reverses interpretation for statistics that decrease with better fit{p_end} +{synoptline} +{p 4 6 2} +Command in {opt reg()} must accept {help constraint}s as a command option. {p_end} + +{p 4 6 2} +{cmd:aweight}s, {cmd:fweight}s, {cmd:iweight}s, and {cmd:pweight}s are +allowed; see {help weight}. Weight use is restricted to commands in {opt reg()} +that accept them.{p_end} + +{p 4 6 2} +Note that {cmd:domme} requires at least two parameter estimates or +sets of parameter estimates (see option {opt sets()} below). +Because it is possible to submit only sets of parameter estimates, the +initial parameter estimates specification statement is optional. {p_end} + +{p 4 6 2} +{cmd:domme} requires installation of the {cmd:moremata} package +(install {stata ssc install moremata:here}). {p_end} + + +{title:Table of Contents} + +{space 4}{help domme##desc: 1. Description} +{space 4}{help domme##setup: 2. Set-up} +{space 4}{help domme##opts: 3. Options} +{space 4}{help domme##sav: 4. Saved Results} +{space 4}{help domme##examp: 5. Examples} +{space 4}{help domme##remark: 6. Final Remarks} +{space 4}{help domme##refs: 7. References} + + +{marker desc}{...} +{title:1. Description} + +{pstd} +Dominance analysis (DA) is a methodology for determining the relative +importance of independent variables or parameter estimates in a predictive model. +The {cmd: domme} approach extends on {cmd: domin} by being able to accommodate more +predictive models such as those with multiple equations/dependent variables +(see Luchman, Lei, and Kaplan, 2020 for a discussion). As an extension of single +equation dominance analysis, it is recommended that the user familiarize +themselves with single equation dominance analysis (i.e., see {help domin:domin}) +before attempting to use the multiple equation version of the methodology. + +{pstd} +This multiple equation DA implementation differs from the implementation of +the single equation version of DA in how it includes or excludes parameter +estimates from the model. Multiple equation DA uses {help constraint}s to exclude +parameter estimates from the model by constraining their values to be 0. When constrained +to be 0, the parameter estimate cannot affect prediction and, thus, is effectively +excluded. This is why {cmd: domme} can only be used with commands that accept +constraints. + + +{marker setup}{...} +{title:2. Set-up} + +{pstd} +This implementation of multiple equatuion DA must be provided with the components +of the parameter estimates from which it will create "parameter equals 0" +constraints. How these constraints are constructed follows from the way in which +Stata names each parameter estimate. Consider, for example, the following logistic +regression model (estimated on the {cmd: sysuse auto} data). + +{pstd} +{cmd:logit foreign price mpg turn trunk} + +{pstd} +Following the estimation of this model, the user can ask for the names of all +the parameters in the {cmd: e(b)} or coefficient matrix by using: + +{pstd} +{cmd: display "`: colfullnames e(b)'"} + +{pstd} +which produces + +{pstd} +{res:foreign:price foreign:mpg foreign:turn foreign:trunk foreign:_cons} + +{pstd} +This series of names are the parameter names for all the coefficients in the +logit model. In order to use DA on this model, the user needs +to supply {cmd: domme} with the names of all the parameters that will be used. +One way to supply these parameter names is to use the initial +{res:(eqname = parmnamelist)} statements. For instance: + +{pstd} +{cmd: domme (foreign = price mpg turn trunk), ...} + +{pstd} +implies four constraints: + +{phang} +{res: _b[foreign:price] = 0} + +{phang} +{res: _b[foreign:mpg] = 0} + +{phang} +{res: _b[foreign:turn] = 0} + +{phang} +{res: _b[foreign:trunk] = 0} + +{pstd} +{cmd: domme} uses these constraints to "remove" parameters by constraining their +value to 0. This mimics {cmd: domin}'s method where the parameter's name is +removed from the {cmd: indepvars} list directly. + +{pstd} +The way parameter constraints are produced with the {opt all()} and {opt sets()} +options is identical to that of the initial statements to {cmd: domme}. + +... brief discussion of why _cons not included? ... + +{marker opts}{...} +{title:3. Options} + +{dlgtab:Model} + +{phang}{opt reg(full_estimation_command)} refers {cmd:domme} to a command that accepts +{help constraint}s, uses {help ml} to estimate parameters, and that can produce the +scalar-valued statistic referenced in the {opt fitstat()} option. {cmd:domme} can be +applied to any built-in or user-written {help program} that meets these criteria. + +{pmore}{it:full_estimation_command} is the full estimation command, not including +a comma or options following the comma, as would be submitted to Stata. The +{opt reg()} option has no default and the user is required to provide a +{cmd:domme}-compatible statistical model. + +{phang}{opt ropts(command_options)} supplies the command in {opt reg()} with any +relevant estimation options. Any options normally following the comma in standard +Stata syntax can be supplied to the statisical model this way. The only exception +to is the use of {opt constraints()}; {cmd:domme} cannot, at current, accept +constraints other than those it creates. + +{phang}{opt f:itstat(fitstat_opts)} the scalar-valued model fit summary statistic +used in the dominance analysis. There are two ways {cmd:domme} points to fit +statistics. + +{pmore}The first method is identical to {cmd:domin}'s approach. {cmd:domme} +accepts any {help return:returned}, {help ereturn:ereturned}, or other +{help scalar:scalar} produced by the estimation command in {opt reg()}. Note +that some Stata commands change their list of ereturn-ed results when +constraints are applied (e.g., {cmd:logit}, {cmd:poisson}). Ensure that the +command used produces the desired scalar with constraints. + +{pmore}The second method accommodates Stata commands' tendency to not return +pseudo-R-square values with constraints and expands which commands can get a +fit statistic using a built-in fit statistic computation. When {opt fitstat()} +is asked for an empty ereturned statistic indicator (i.e., {res:e()}) you must provide a +the three character code as an option to the {opt fitstat()}. Four fit statistic +options are available. These options are the McFadden pseudo-R squared ({res:mcf}), +the Estrella pseudo-R squared ({res:est}), the Akaike information criterion ({res:aic}), +and the Bayesian information criterion ({res:bic}). For example, to ask {cmd:domme} to +compute McFadden's pseudo-R square as a fit statistic, type {res:fitstat(e(), mcf)} +(See Example #1). + +{pmore}Note that {cmd:domme} has no default fit statistic and the user is +required to provide a fit statistic option. In addition, the built-in options +assume the command in {opt reg()} ereturn specific scalars. {res:mcf} only +requires {res: e(ll)}. {res:est} requires both {res: e(ll)} and {res: e(N)}. +{res:aic} requires {res: e(ll)} and {res: e(parm)}. +Finally, {res: bic} requires {res: e(ll)}, {res: e(parm)}, and {res: e(N)}. + +{phang}{opt sets([PEset_1] ... [PEset_x])} binds together parameter estimate +constraints as a set that are always constrained jointly and act as a +single parameter estimate. + +{pmore}Each {it:PEset} is put together in the same way as the initial statements in +that they are constructed from a series of {res:(eqname = parmnamelist)} statements. +All {it:PEset}s must be bound by brackets "{res:[]}". For example, consider again +the model {cmd:logit foreign price mpg turn trunk}. To produce two sets of parameters, +one that includes {it:price} and {it:mpg} as well as a second that includes {it:turn} +and {it:trunk}, the {opt sets()} type {res:sets( [(foreign = price mpg)]} +{res:[(foreign = turn trunk)] )}. + +{pmore}Note that a single set can include parameters from multiple equations +(see Example #6). + +{phang}{opt all(PEall)} defines a set of parameter estimate constraints that +are allowed to explain the fit metric with a higher priority than the parameter +estimates in the initial statements or the {opt sets()} option (see Example #3). +In effect, the parameter estimates defined in the {opt all()} option are used +like covariates. + +{pmore}The {it:PEall} statement is set up in a way similar to the {it:PE_set}s +in a {res:(eqname = parmnamelist)} format and can accept parameters from multiple +equations. + +{dlgtab:Reporting} + +{phang}{opt noconditional} suppresses the computation and display of of the conditional dominance +statistics which can save computation time when conditional dominance statistics +are not desired. Suppressing the computation of conditional dominance statistics +also suppresses the "strongest dominance designations" list. + +{phang}{opt nocomplete} suppresses the computation of the complete dominance designations +which can save computation time when complete dominance designations are not desired. +Suppressing the computation of complete dominance designations also suppresses +the "strongest dominance designations" list. + +{phang}{opt reverse} reverses the interpretation of all dominance statistics in the +{cmd:e(ranking)} vector, {cmd:e(cptdom)} matrix, and corrects the computation of the +{cmd:e(std)} vector as well as the "strongest dominance designations" list. +{cmd:domme} assumes by default that higher values on overall fit statistics constitute +better fit. {opt reverse} is useful for the interpetation of dominance statistics +based on overall model fit statistics that decrease with better fit (e.g., AIC, BIC). + +{marker sav}{...} +{title:4. Saved Results} + +{phang}{cmd:domme} saves the following results to {cmd: e()}: + +{synoptset 16 tabbed}{...} +{p2col 5 15 19 2: scalars}{p_end} +{synopt:{cmd:e(N)}}number of observations{p_end} +{synopt:{cmd:e(fitstat_o)}}overall fit statistic value{p_end} +{synopt:{cmd:e(fitstat_a)}}fit statistic value associated with variables in {opt all()}{p_end} +{synopt:{cmd:e(fitstat_c)}}fit statistic value computed by default when the constant model is non-zero{p_end} +{p2col 5 15 19 2: macros}{p_end} +{synopt:{cmd:e(cmdline)}}command as typed{p_end} +{synopt:{cmd:e(title)}}{cmd:Dominance analysis for multiple equations}{p_end} +{synopt:{cmd:e(cmd)}}{cmd:domme}{p_end} +{synopt:{cmd:e(fitstat)}}contents of the {opt fitstat()} option{p_end} +{synopt:{cmd:e(reg)}}contents of the {opt reg()} option{p_end} +{synopt:{cmd:e(ropts)}}contents of the {opt ropts()} option{p_end} +{synopt:{cmd:e(properties)}}{cmd:b}{p_end} +{synopt:{cmd:e(set{it:#})}}parameters included in {opt set(#)}{p_end} +{synopt:{cmd:e(all)}}parameters included in {opt all()}{p_end} +{p2col 5 15 19 2: matrices}{p_end} +{synopt:{cmd:e(b)}}general dominance statistics vector{p_end} +{synopt:{cmd:e(std)}}general dominance standardized statistics vector{p_end} +{synopt:{cmd:e(ranking)}}rank ordering based on general dominance statistics vector{p_end} +{synopt:{cmd:e(cdldom)}}conditional dominance statistics matrix{p_end} +{synopt:{cmd:e(cptdom)}}complete dominance designation matrix{p_end} +{p2col 5 15 19 2: functions}{p_end} +{synopt:{cmd:e(sample)}}marks estimation sample{p_end} + + +{marker examp}{...} +{title:5. Examples} + +{phang} {stata sysuse auto}{p_end} + +{phang}Example 1: Path analysis/seemingly unrelated regression (SUR) with built in McFadden pseudo-R squared{p_end} +{phang} {stata sureg (price = length foreign gear_ratio) (headroom = mpg)} {p_end} +{phang} {stata domme (price = length foreign gear_ratio) (headroom = mpg), reg(sureg (price = length foreign gear_ratio) (headroom = mpg)) fitstat(e(), mcf)} {p_end} + +{phang}Example 2: Zero-inflated Poisson with built in BIC{p_end} +{phang} {stata generate zi_pr = price*foreign} {p_end} +{phang} {stata zip zi_pr headroom trunk,inflate(gear_ratio turn)} {p_end} +{phang} {stata domme (zi_pr = headroom trunk) (inflate = gear_ratio turn), reg(zip zi_pr headroom trunk) f(e(), bic) ropt(inflate(gear_ratio turn)) reverse} {p_end} + +{phang}Example 3: Path analysis/SUR model with all option {p_end} +{phang} {stata sem (foreign <- headroom) (price <- foreign length weight) (weight <- turn)} {p_end} +{phang} {stata estat ic} {p_end} +{phang} {stata domme (price = length foreign) (foreign = headroom), all((price = weight) (weight = turn)) reg(sem (foreign <- headroom) (price <- foreign length weight) (weight <- turn)) fitstat(e(), aic) reverse} {p_end} + +{phang}Example 4: Generalized negative binomial with all and parmeters treated as _cons in the dominance analysis (i.e., _b[price:foreign]) {p_end} +{phang} {stata gnbreg price foreign weight turn headroom, lnalpha(weight length)} {p_end} +{phang} {stata domme (price = turn headroom) (lnalpha = weight length), reg(gnbreg price foreign weight turn headroom) f(e(), mcf) ropt(lnalpha(weight length)) all( (price = weight) )} {p_end} + +{phang}Example 5: Generalized structural equation model with factor variables{p_end} +{phang} {stata sysuse nlsw88, clear} {p_end} +{phang} {stata gsem (wage <- union hours, regress) (south <- age ib1.race union, logit)} {p_end} +{phang} {stata domme (wage = union hours) (south = age union), reg(gsem (wage <- union hours, regress) (south <- age ib1.race union, logit)) fitstat(e(), mcf) sets([(south = 2.race 3.race)])}{p_end} + +{phang}Example 6: Generalized structural equation model with sets to evaluate independent variables{p_end} +{phang} {stata gsem (south union <- wage tenure ttl_exp, logit)} {p_end} +{phang} {stata domme, reg(gsem ( south smsa union <- wage tenure ttl_exp, logit)) fitstat(e(), mcf) sets( [(south = wage) (union = wage)] [(south = tenure) (union = tenure)] [(south = ttl_exp) (union = ttl_exp)]) } +{p_end} + +{phang}Examples 7: Replicating results from {cmd:domin}{p_end} +{pmore}7a: Logit model with factor varaible{p_end} +{pmore} {stata sysuse auto, clear} {p_end} +{pmore} {stata domin foreign price mpg turn trunk, reg(logit) fitstat(e(r2_p)) sets((i.rep78))} {p_end} +{pmore} {stata domme (foreign = price mpg turn trunk), reg(logit foreign price mpg turn trunk ib1.rep78) fitstat(e(), mcf) sets([(foreign = 3.rep78 4.rep78)])} {p_end} + +{pmore}7b: Ordered logit model with covariate{p_end} +{pmore} {stata domin rep78 trunk weight length, reg(ologit) fitstat(e(r2_p)) all(turn)} {p_end} +{pmore} {stata domme (rep78 = trunk weight length), reg(ologit rep78 trunk weight length turn) fitstat(e(), mcf) all((rep78 = turn))} {p_end} + +{pmore}7c: Poisson regression with BIC fitstat and constant-only comparison using reverse{p_end} +{pmore} {stata domin price mpg rep78 headroom, reg(fitdom, fitstat_fd(r(S)[1,6]) reg_fd(poisson) postestimation(estat ic)) fitstat(e(fitstat)) consmodel reverse} {p_end} +{pmore} {stata domme (price = mpg rep78 headroom), reg(poisson price mpg rep78 headroom) fitstat(e(), bic) reverse} {p_end} + +{marker remark}{...} +{title:6. Final Remarks} + +{pstd}See {stata help domin:domin's help file} for an extensive discussion of the role of dominance +analysis as a postestimation method and caveats about its use. All these notes +and considerations apply to {cmd:domme} as well. + +{pstd}Any parameter estimates in the model's {opt reg()} specification but not in +the initial statements, the {opt sets()}, or {opt all()} are considered to be a +part of the to a constant-only model (see Examples #4 and #7c). When using {cmd:domme}'s +built-in fit statistics, parameter estimates in the constant-only model +will be used to compute the baseline model for the {res:mcf} and {res:est} +pseudo-R-squares but will be reported as a part of the constant model for the +{res:aic} and {res:bic}. Other fit statistics supplied to {cmd: domme} will, like +{res:aic} and {res:bic}, be reported as a part of the constant model. + +{pstd}Note that {cmd:domme} does not check to ensure that the parameters supplied +it are in the model and it is the user's responsibility to ensure that the +parameter estimates created in the initial statements, as well as those created by +{opt sets()} and {opt all()}, are valid parameters in the estimated model. +{cmd:domme} also attempts to clean-up parameter constraints it creates but +under certain circumstances, when {cmd:domme} fails to execute in a way that it does +not capture, parameter constraints will remain in memory. {cmd:domme} also will never +overwrite existing parameter estimate constraints and, if there are insufficient parameter +constraints in memory, {cmd:domme} will fail with an error noting insufficient free +constraints. Use {help constraint dir} to list all defined constraints in memory. + + +{marker refs}{...} +{title:7. References} + +{p 4 8 2}Luchman, J. N., Lei, X., and Kaplan, S. A. (2020). Relative importance analysis with multivariate models: Shifting the focus from independent variables to parameter estimates. +{it:Journal of Applied Structural Equation Modeling, 4(2)}, 40–59.{p_end} + +{title:Development Webpage} + +{phang} Additional discussion of results, options, and conceptual issues on: + +{phang}{browse "https://github.com/fmg-jluchman/domin/wiki"} + +{phang} Please report bugs, requests for features, and contribute to as well as follow on-going development of {cmd:domme} on: + +{phang}{browse "http://github.com/jluchman/domin"} + +{title:Article} + +Please cite as: + +{p 4 8 2}Luchman, J. N. (2021). Determining relative importance in Stata using dominance analysis: domin and domme. {it:The Stata Journal, 21(2)}, 510–538. https://doi.org/10.1177/1536867X211025837{p_end} + + +{title:Author} + +{p 4}Joseph N. Luchman{p_end} +{p 4}Principal Scientist{p_end} +{p 4}Fors Marsh{p_end} +{p 4}Arlington, VA{p_end} +{p 4}jluchman@forsmarsh.com{p_end} + + +{title:See Also} + +{browse "https://CRAN.R-project.org/package=domir":R package domir}, {browse "https://cran.r-project.org/web/packages/domir/vignettes/domir_basics.html":Detailed description of Dominance Analysis} + + diff --git a/fitdom.ado b/fitdom.ado index 5fc57fe..ee2a764 100644 --- a/fitdom.ado +++ b/fitdom.ado @@ -1,8 +1,8 @@ -*! fitdom version 0.0.0 11/21/2021 Joseph N. Luchman +*! fitdom version 0.1.0 8/14/2023 Joseph N. Luchman program define fitdom, eclass //history and version information at end of file -version 12.1 +version 15 syntax varlist(min = 1 ts fv) if [aw pw iw fw] , [Reg_fd(string) Postestimation(string) Fitstat_fd(string)] @@ -18,10 +18,13 @@ if strlen("`regopts'") gettoken erase regopts: regopts, parse(",") //parse out c ereturn scalar fitstat = `=`fitstat_fd'' +//note to self needs title and esample <- how to work with non-esample commands? + end /* programming notes and history - fitdom version 0.0.0 - Nov 21, 2021 ----- - */ + --- + fitdom version 0.1.0 - August 14, 2023 +- minimum version 15 consistent with base -domin- \ No newline at end of file diff --git a/fitdom.sthlp b/fitdom.sthlp index c2a0cce..932b932 100644 --- a/fitdom.sthlp +++ b/fitdom.sthlp @@ -1,5 +1,5 @@ {smcl} -{* *! version 0.0.0 November 21, 2021 J. N. Luchman}{...} +{* *! version 0.1.0 August 14, 2023 J. N. Luchman}{...} {cmd:help fitdom} {hline}{...} @@ -17,9 +17,9 @@ Wrapper program for {cmd:domin} to obtain fit statistics from postestimation com {opt {ul on}f{ul off}itstat_fd(scalar)} {phang} -{help fvvarlist: Factor} and {help tsvarlist:time series variables} are allowed for commands in {opt reg()} +{help fvvarlist: Factor} and {help tsvarlist:time series variables} are allowed for commands in {opt reg_fd()} that accept them. {cmd:aweight}, {cmd:iweight}s, {cmd:pweight}s, and {cmd:fweight}s are also allowed -(see help {help weights:weights}) for commands in {opt reg()} that accept them. +(see help {help weights:weights}) for commands in {opt reg_fd()} that accept them. {title:Description} @@ -52,7 +52,7 @@ to {cmd:domin}. {title:Saved results} -{phang}{cmd:fitdom} saves the following results to {cmd: e()}: +{phang}{cmd:fitdom} retains all the results from the command in {opt reg_fd()} and adds the following results to {cmd: e()}: {synoptset 16 tabbed}{...} {p2col 5 15 19 2: scalars}{p_end} @@ -134,7 +134,7 @@ matrix name (i.e., {cmd:r(S)}) but also the matrix subscript selecting the AIC i {title:Author} {p 4}Joseph N. Luchman{p_end} -{p 4}Senior Scientist{p_end} -{p 4}Fors Marsh Group LLC{p_end} +{p 4}Principal Scientist{p_end} +{p 4}Fors Marsh{p_end} {p 4}Arlington, VA{p_end} -{p 4}jluchman@forsmarshgroup.com{p_end} +{p 4}jluchman@forsmarsh.com{p_end} diff --git a/mi_dom.ado b/mi_dom.ado new file mode 100644 index 0000000..82b5cd0 --- /dev/null +++ b/mi_dom.ado @@ -0,0 +1,48 @@ +*! fitdom version 0.0.0 8/14/2023 Joseph N. Luchman + +program define mi_dom, eclass //history and version information at end of file + +version 15 + +syntax varlist if [aw fw iw pw], Reg_mi(string) Fitstat_mi(string) [MIOpt(string)] + +tempfile mifile //produce a tempfile to store imputed fitstats for retreival + +tempvar touse + +tempname fitstat + +gettoken reg_mi regopts_mi: reg_mi // separate out reg() options + +quietly generate byte `touse' = 1 `if' + +quietly mi estimate, saving(`mifile') `miopt': `reg_mi' /// + `varlist' [`weight'`exp'] `if', `regopts_mi' //run mi analysis saving results + +scalar `fitstat' = 0 //placeholder scalar to hold the sum + +local num_imputes = `:word count `e(m_est_mi)'' + +foreach x of numlist `=e(m_est_mi)' { + + estimates use `mifile', number(`x') //find the focal estimates + + scalar `fitstat' = `fitstat' + `=`fitstat_mi''*`num_imputes'^-1 //add in the weighted fitstat value + +} + +local title = e(title) + +ereturn clear + +ereturn post, esample(`touse') + +ereturn local title = "Multiply imputed: `title'" + +ereturn scalar fitstat = `fitstat' + +end + +/* programming notes and history +- mi_dom version 0.0.0 - August 14, 2023 + diff --git a/mi_dom.sthlp b/mi_dom.sthlp new file mode 100644 index 0000000..94606aa --- /dev/null +++ b/mi_dom.sthlp @@ -0,0 +1,67 @@ +{smcl} +{* *! version 0.0.0 August 14, 2023 J. N. Luchman}{...} +{cmd:help mi_dom} +{hline}{...} + +{title:Title} + +{pstd} +Wrapper program for {cmd:domin} to obtain multiply imputed fit statistics {p_end} + +{title:Syntax} + +{phang} +{cmd:mi_dom} {it:depvar} {it:indepvars} {it:{help if} {weight}} {cmd:,} +{opt {ul on}r{ul off}eg_mi(command, command_options)} +{opt {ul on}f{ul off}itstat_mi(scalar)} +[{opt {ul on}MIO{ul off}pt(string)}] + +{phang} +{help fvvarlist: Factor} and {help tsvarlist:time series variables} are allowed for commands in {opt reg_mi()} +that accept them. {cmd:aweight}, {cmd:iweight}s, {cmd:pweight}s, and {cmd:fweight}s are also allowed +(see help {help weights:weights}) for commands in {opt reg_mi()} that accept them. + +{title:Description} + +{pstd} +{cmd:mi_dom} adds an additional layer to {cmd:domin} in which the user calls the {cmd:mi_dom} program in +{cmd:domin} and the {cmd:mi_dom} program runs the {cmd:mi estimate} compatible command in {opt reg_mi()}, +saves results for all multiply imputed datasets, and then averages them before submitting them as +{cmd:e(fitstat)} for use in the dominance analysis. + +{pstd} +This wrapper command is a replacement for the built in {opt mi} in {cmd:domin} versions prior to 3.5 and is +intended to add flexiblity to how {cmd:domin} can accommodate multiply imputed data. See Example #10 in the +{cmd:domin} helpfile for an example of {cmd:mi_dom} in use. + +{marker options}{...} +{title:Options} + +{phang}{opt reg_mi()} is the contents of the {opt reg()} option that would normally be supplied to {cmd:domin}. + +{phang}{opt fitstat_mi()} is the contents of the {opt fitstat()} option that would normally be supplied +to {cmd:domin}. + +{phang}{opt miopts()} are the options passed to {cmd:mi estimate} that will be filled in prior to the colon. +This produces a command structure like {it:mi estimate, miopts: reg_mi} for each run of the command in +{opt reg_mi()}. + +{title:Saved results} + +{phang}{cmd:mi_dom} saves the following results to {cmd: e()}: + +{synoptset 16 tabbed}{...} +{p2col 5 15 19 2: scalars}{p_end} +{synopt:{cmd:e(fitstat)}}The value of the fit statistic called by {opt postestimation}{p_end} +{p2col 5 15 19 2: macros}{p_end} +{synopt:{cmd:e(title)}}Adds "Mulltiply Imputed:" to the {cmd:e(title)} of the command in {opt reg_mi()}{p_end} +{p2col 5 15 19 2: functions}{p_end} +{synopt:{cmd:e(sample)}}marks estimation sample{p_end} + +{title:Author} + +{p 4}Joseph N. Luchman{p_end} +{p 4}Principal Scientist{p_end} +{p 4}Fors Marsh{p_end} +{p 4}Arlington, VA{p_end} +{p 4}jluchman@forsmarsh.com{p_end} diff --git a/mixdom.ado b/mixdom.ado index e4043b9..fcf62d8 100644 --- a/mixdom.ado +++ b/mixdom.ado @@ -1,24 +1,21 @@ -*! mixdom version 2.0 2/15/2021 Joseph N. Luchman +*! mixdom version 2.1.0 8/14/2023 Joseph N. Luchman -version 12 +version 15 program define mixdom, eclass -syntax varlist(min = 2 fv ts) [pw fw] if, id(varlist max = 1 min = 1) [REopt(string) XTMopt(string) /// -Mopt(string) noConstant] +syntax varlist(min = 2 fv ts) [pw fw] [if], id(varlist max = 1 min = 1) [REopt(string) XTMopt(string) /// +Mopt(string)] -if strlen("`xtmopt'") & strlen("`mopt'") { - display as err "{cmd:xtmopt} both {cmd:mopt} cannot be used together" +if strlen("`xtmopt'") { + display as err "{cmd:xtmopt()} is defunct. Use {cmd:mopt()}." exit 198 } -if c(stata_version) >= 13 local reg "mixed" -else local reg "xtmixed" - -if strlen("`xtmopt'") local mopt "`xtmopt'" - tempname estmat r2w r2b base_e base_u mean_h +tempvar touse + gettoken dv ivs: varlist foreach temp in base_e base_u mean_h { @@ -29,14 +26,12 @@ foreach temp in base_e base_u mean_h { } -`reg' `dv' `ivs' [`weight'`exp'] `if' , `constant' || `id':, `reopt' `mopt' nostderr +mixed `dv' `ivs' [`weight'`exp'] `if' , `constant' || `id':, `reopt' `mopt' nostderr matrix `estmat' = e(b) scalar `r2w' = (exp(`estmat'[1, `=colsof(`estmat') - 1']))^2 + (exp(`estmat'[1, `=colsof(`estmat')']))^2 -di `r2w' - if missing(`mean_h') { preserve @@ -55,11 +50,9 @@ if missing(`mean_h') { scalar `r2b' = (exp(`estmat'[1, `=colsof(`estmat') - 1']))^2 + ((exp(`estmat'[1, `=colsof(`estmat')']))^2)/`mean_h' -di `r2b' - if missing(`base_e') | missing(`base_u') { - `reg' `dv' [`weight'`exp'] `if' , `constant' || `id':, `reopt' `mopt' nostderr + mixed `dv' [`weight'`exp'] `if' , `constant' || `id':, `reopt' `mopt' nostderr matrix `estmat' = e(b) @@ -73,19 +66,25 @@ scalar `r2w' = 1 - ((exp(`base_u'))^2 + (exp(`base_e'))^2)^-1*`r2w' scalar `r2b' = 1 - ((exp(`base_u'))^2 + ((exp(`base_e'))^2)/`mean_h')^-1*`r2b' -ereturn scalar r2_w = `r2w' +generate `touse' = e(sample) + +ereturn clear -di `r2w' +ereturn post, esample(`touse') + +ereturn scalar r2_w = `r2w' ereturn scalar r2_b = `r2b' -di `r2b' +ereturn local title = "Mixed-effects ML regression" + +ereturn hidden scalar base_e = `base_e' // note to self - make these hidden not official returned values -ereturn scalar base_e = `base_e' +ereturn hidden scalar base_u = `base_u' -ereturn scalar base_u = `base_u' +ereturn hidden scalar mean_h = `mean_h' -ereturn scalar mean_h = `mean_h' +// note to self - return esample and title end @@ -99,6 +98,11 @@ Basic version - time series operators allowed - removed scalars persisting after estimation ----- -- mixdom version 2.0.1 - date - Feb 15, 2021 +- mixdom version 2.0 - date - Feb 15, 2021 - -xtmopt()- depreciated in favor of -mopt()- --- v12... + --- + mixdom version 2.1.0 - mth day, year + - minimum version 15 consistent with base -domin- + - xtmopt defunct - gives warning + - noconstant option removed + - returns e(smaple) to satisfy -domin- 3.5 requirements diff --git a/mixdom.sthlp b/mixdom.sthlp index bdbcc2c..5ba6af5 100644 --- a/mixdom.sthlp +++ b/mixdom.sthlp @@ -1,5 +1,5 @@ {smcl} -{* *! version 2.0.0 Feb 15, 2021 J. N. Luchman}{...} +{* *! version 2.1.0 August 14, 2023 J. N. Luchman}{...} {cmd:help mixdom} {hline}{...} @@ -12,7 +12,7 @@ Wrapper program for {cmd:domin} to conduct linear mixed effects regression domin {phang} {cmd:mixdom} {it:depvar} {it:indepvars} {it:{help if}} {weight} {cmd:,} -{opt id(idvar)} [{opt {ul on}re{ul off}opt(re_options)} {opt {ul on}m{ul off}opt(mixed_options)} {opt {ul on}noc{ul off}onstant}] +{opt id(idvar)} [{opt {ul on}re{ul off}opt(re_options)} {opt {ul on}m{ul off}opt(mixed_options)}] {phang}{cmd:pweight}s and {cmd:fweight}s are allowed (see help {help weights:weights}). {help fvvarlist:Factor} and {help tsvarlist:time series variables} are allowed. @@ -29,7 +29,11 @@ Azen (2013) recommend that even if random coefficients are present in the data, analysis. {pstd} -Negative R2 values indicate model misspecification. +{cmd:mixdom} is intended for use only as a wrapper program with {cmd:domin} for the dominance analysis of mixed-effects linear regression, and its syntax is designed to conform with {cmd:domin}'s expectations. +It is not recommended for use as an estimation command outside of {cmd:domin}. + +{pstd} +Note that negative R2 values indicate likely model misspecification. {marker options}{...} {title:Options} @@ -37,14 +41,11 @@ Negative R2 values indicate model misspecification. {phang}{opt id()} specifies the variable on which clustering occurs and that will appear after the random effects specification (i.e., ||) in the {cmd:mixed} syntax. -{phang}{opt remopt()} passes options to {cmd: mixed} specific to the random intercept effect (i.e., {opt pweight()} the user would +{phang}{opt reopt()} passes options to {cmd: mixed} specific to the random intercept effect (i.e., {opt pweight()} the user would like to utilize during estimation. {phang}{opt mopt()} passes options to {cmd:mixed} that the user would like to utilize during estimation. -This option was named {opt xtmopt()} in {cmd:mixdom} versions previous to 2.0. {opt xtmopt()} still works in -{cmd:mixdom} 2.0 but cannot be used with {opt mopt()}. - -{phang}{opt noconstant} does not estimate an average fixed-effect constant (see option {opt noconstant} of {help mixed}). +This option was named {opt xtmopt()} in {cmd:mixdom} versions previous to 2.0.0 and is now defunct. {title:Saved results} @@ -54,6 +55,10 @@ This option was named {opt xtmopt()} in {cmd:mixdom} versions previous to 2.0. {p2col 5 15 19 2: scalars}{p_end} {synopt:{cmd:e(r2_w)}}within-cluster R2{p_end} {synopt:{cmd:e(r2_b)}}between-cluster R2{p_end} +{p2col 5 15 19 2: macros}{p_end} +{synopt:{cmd:e(title)}}"Mixed-effects ML regression"{p_end} +{p2col 5 15 19 2: functions}{p_end} +{synopt:{cmd:e(sample)}}marks estimation sample{p_end} {title:References} @@ -63,10 +68,10 @@ This option was named {opt xtmopt()} in {cmd:mixdom} versions previous to 2.0. {title:Author} {p 4}Joseph N. Luchman{p_end} -{p 4}Senior Scientist{p_end} -{p 4}Fors Marsh Group LLC{p_end} +{p 4}Principal Scientist{p_end} +{p 4}Fors Marsh{p_end} {p 4}Arlington, VA{p_end} -{p 4}jluchman@forsmarshgroup.com{p_end} +{p 4}jluchman@forsmarsh.com{p_end} {title:Also see} diff --git a/mvdom.ado b/mvdom.ado index 3ce68d1..4bc90ec 100644 --- a/mvdom.ado +++ b/mvdom.ado @@ -1,10 +1,10 @@ -*! mvdom version 1.1 3/11/2015 Joseph N. Luchman +*! mvdom version 1.2.0 8/14/2023 Joseph N. Luchman -version 12.1 +version 15 program define mvdom, eclass -syntax varlist(min = 2) if [aw fw], dvs(varlist min=1) [noConstant epsilon pxy] //epsilon is a hidden option +syntax varlist(min = 2) [if] [aw fw], dvs(varlist min=1) [epsilon pxy] //epsilon is a hidden option tempname canonmat @@ -12,21 +12,19 @@ tempvar touse gettoken dv ivs: varlist -if strlen("`epsilon'") { - - quietly generate byte `touse' = 1 `if' +quietly generate byte `touse' = 1 `if' - quietly replace `touse' = 0 if missing(`touse') +quietly replace `touse' = 0 if missing(`touse') + +if strlen("`epsilon'") { mata: eps_ri_mv("`dv' `dvs'", "`ivs'", "`touse'") } - else { - if strlen("`pxy'") { - quietly correlate `dv' `dvs' `ivs' [`weight'`exp'] `if' + quietly correlate `dv' `dvs' `ivs' [`weight'`exp'] if `touse' local dvnum: word count `dv' `dvs' @@ -38,21 +36,21 @@ else { invsym(`canonmat'[`=`:word count `dv' `dvs''+1'..., `=`:word count `dv' `dvs''+1'...])* /// `canonmat'[`=`:word count `dv' `dvs''+1'..., 1..`:word count `dv' `dvs''] /// ) - - ereturn scalar r2 = `canonmat'[1, 1]/`:word count `dv' `dvs'' - - } + } else { - quietly _canon (`dv' `dvs') (`ivs') [`weight'`exp'] `if', `constant' + quietly _canon (`dv' `dvs') (`ivs') [`weight'`exp'] if `touse' matrix `canonmat' = e(ccorr) - ereturn scalar r2 = `canonmat'[1, 1]^2 - } + ereturn post, esample(`touse') + + if strlen("`pxy'") ereturn scalar r2 = `canonmat'[1, 1]/`:word count `dv' `dvs'' + else ereturn scalar r2 = `canonmat'[1, 1]^2 + ereturn local title "Multivariate regression" } @@ -60,7 +58,7 @@ else { end /*Mata function to execute epsilon-based relative importance with mvdom*/ -version 12.1 +version 15 mata: @@ -69,14 +67,20 @@ mata set matastrict on void eps_ri_mv(string scalar dvlist, string scalar ivlist, string scalar touse) { /*object declarations*/ - real matrix X, Y, L, R, Lm, L2, R2, Lm2, Rxy + real matrix X, Y, L, R, Lm, L2, R2, Lm2, Pxy real rowvector V, Bt, V2, Bt2 + transmorphic view_dv, view_iv + /*begin processing*/ - Y = correlation(st_data(., tokens(dvlist), st_varindex(touse))) //obtain DV correlations + st_view(view_dv, ., tokens(dvlist), st_varindex(touse)) + + st_view(view_iv, ., tokens(ivlist), st_varindex(touse)) + + Y = correlation(view_dv) //obtain DV correlations - X = correlation(st_data(., tokens(ivlist), st_varindex(touse))) //obtain IV correlations + X = correlation(view_iv) //obtain IV correlations L = R = X //set-up for svd(); IV side @@ -94,12 +98,11 @@ void eps_ri_mv(string scalar dvlist, string scalar ivlist, string scalar touse) Lm2 = (L2*diag(sqrt(V2))*R2) //process orthogonalized DVs - Rxy = correlation((st_data(., tokens(ivlist), st_varindex(touse)), /// - st_data(., tokens(dvlist), st_varindex(touse)))) //correlation between original IVs and DVs + Pxy = correlation((view_iv, view_dv)) //correlation between original IVs and DVs - Rxy = Rxy[rows(X)+1..rows(Rxy), 1..cols(X)] //take only IV-DV correlations + Pxy = Pxy[rows(X)+1..rows(Pxy), 1..cols(X)] //take only IV-DV correlations - Bt2 = Rxy'*invsym(Lm2) //obtain adjusted DV interrelations + Bt2 = Pxy'*invsym(Lm2) //obtain adjusted DV interrelations Bt = invsym(Lm)*Bt2 //obtain adjusted regression weights @@ -130,3 +133,13 @@ Basic version - added the Pxy metric - added the epsilon-based function - changed canon to _canon; canon had odd behavior when called from mvdom + --- + mvdom version 1.2.0 - August 14, 2023 + - minimum versions 15 consistent with base -domin- + - 'if' statement optional to accommodate domin v. 3.5.0 + - cleaned up returned values - consistent with reported + - added e(sample) needed for domin v. 3.5.0 + - restructured eps_ri_mv() + - removed noconstant option - not consistently applied, not sure why one would ever use it + - unhide epsilon - note that it produces Pxy as a metric + - use st_view as opposed to st_data for efficiency diff --git a/mvdom.sthlp b/mvdom.sthlp index 94a3d4a..f6be8b2 100644 --- a/mvdom.sthlp +++ b/mvdom.sthlp @@ -1,5 +1,5 @@ {smcl} -{* *! version 1.1 March 11 2015 J. N. Luchman}{...} +{* *! version 1.2.0 August 14, 2023 J. N. Luchman}{...} {cmd:help mvdom} {hline}{...} @@ -12,31 +12,30 @@ Wrapper program for {cmd:domin} to conduct multivariate regression-based dominan {phang} {cmd:mvdom} {it:depvar1} {it:indepvars} {it:{help if} {weight}} {cmd:,} -{opt dvs(varlist)} [{opt {ul on}noc{ul off}onstant} {opt pxy}] +{opt dvs(depvar2 [... depvar_r])} [{opt pxy} {opt epsilon}] -{phang}{cmd:aweight}s and {cmd:fweight}s are allowed (see help {help weights:weights}). {help fvvarlist: Factor} and -{help tsvarlist:time series variables} are not allowed. Use the {help xi} prefix for factor variables. +{phang}{cmd:aweight}s and {cmd:fweight}s are allowed (see help {help weights:weights}). {help fvvarlist: Factor} and {help tsvarlist:time series variables} are not allowed. Use the {help xi} prefix for factor variables. {title:Description} {pstd} -{cmd:mvdom} sets the data up in a way to allow for the dominance analysis of a multivariate regression by utilizing {help canon}ical correlation. -The default metric used is the Rxy metric described by Azen and Budescu (2006). +{cmd:mvdom} sets the data up in a way to allow for the dominance analysis of a multivariate regression by utilizing {help canon}ical correlation. The default metric used is the Rxy metric described by Azen and Budescu (2006). {pstd} -{cmd:mvdom} uses the first variable in the varlist as the first dependent variable in the multivariate regression. All other variables in the -varlist are used as independent variables. All other dependent variables are entered into the regression in an option. The output of the dominance -analysis (i.e., in {cmd:domin}) will only show the first dependent variable in the output. +{cmd:mvdom} uses the first variable in the {it:varlist} as the first dependent variable in the multivariate regression. All other variables in the {it:varlist} are used as independent variables. All dependent variables after the first are entered into the regression in the {opt dvs()} option. {cmd:domin} will only show the first dependent variable in the output. + +{pstd} +{cmd:mvdom} is intended for use only as a wrapper program with {cmd:domin} for the dominance analysis of multivariate linear regression, and its syntax is designed to conform with {cmd:domin}'s expectations. +It is not recommended for use as an estimation command outside of {cmd:domin}. {marker options}{...} {title:Options} -{phang}{opt dvs()} specifies the second through {it:n}th other dependent variables to be used in the multivariate regression. The first dependent -variable is put in the varlist. +{phang}{opt dvs(depvar2 [... depvar_r])} specifies the second through {it:r}th other dependent variables to be used in the multivariate regression. Note the first dependent variable, depvar1, as shown in the syntax. dvs() is required. -{phang}{opt noconstant} does not subtract means when obtaining correlations (see option {opt noconstant} of {help canon}). +{phang}{opt pxy} changes the fit statistic from the default "symmetric" {it:Rxy} to the "nonsymmetric" {it:Pxy} model fit statistic. Both fit statistics are described by Azen and Budescu (2006). -{phang}{opt pxy} uses the Pxy metric outlined by Azen and Budescu instead of the default Rxy metric. +{phang}{opt epsilon} changes the decomposition estimation method to relative weights estimation method described by LeBreton & Tonidandel (2008). The {opt epsilon} method produces a decomposition of the {it:Pxy} statistic. {title:Saved results} @@ -44,19 +43,24 @@ variable is put in the varlist. {synoptset 16 tabbed}{...} {p2col 5 15 19 2: scalars}{p_end} -{synopt:{cmd:e(r2)}}Rxy metric (default) or Pxy metric (with option {opt pxy}){p_end} +{synopt:{cmd:e(r2)}}{it:Rxy} metric (default) or {it:Pxy} metric (with option {opt pxy}){p_end} +{p2col 5 15 19 2: macros}{p_end} +{synopt:{cmd:e(title)}}"Multivariate regression"{p_end} +{p2col 5 15 19 2: functions}{p_end} +{synopt:{cmd:e(sample)}}marks estimation sample{p_end} {title:References} {p 4 8 2}Azen, R., & Budescu, D. V. (2006). Comparing predictors in multivariate regression models: An extension of dominance analysis. {it:Journal of Educational and Behavioral Statistics, 31(2)}, 157-180.{p_end} +{p 4 8 2}LeBreton, J. M., & Tonidandel, S. (2008). Multivariate relative importance: Extending relative weight analysis to multivariate criterion spaces. {it:Journal of Applied Psychology, 93(2)}, 329-345.{p_end} {title:Author} {p 4}Joseph N. Luchman{p_end} -{p 4}Senior Scientist{p_end} -{p 4}Fors Marsh Group LLC{p_end} +{p 4}Principal Scientist{p_end} +{p 4}Fors Marsh{p_end} {p 4}Arlington, VA{p_end} -{p 4}jluchman@forsmarshgroup.com{p_end} +{p 4}jluchman@forsmarsh.com{p_end} {title:Also see}