Using [w = wage] or value(wage) #28

Open
ericmelse opened this issue Nov 12, 2024 · 2 comments

Comments

@ericmelse

Dear Asjad,

Thank you for your new version of alluvial. I just replicated your previous example that uses [w = wage]:

alluvial race married collgrad smsa union [w = wage], smooth(8) alpha(60) palette(CET C6) valsize(2)  ///
	laba(0) labs(1.6) boxw(11) gap(2) novalues ///
	showtotal wrapcat(20) wraplab(15) catgap(8) plotregion(margin(b+5 l+10 r+10)) ///
	xsize(2) ysize(1) showmiss labprop percent

and (want to) compare that with your last example, which uses value(wage):

alluvial race married collgrad smsa union, value(wage) ///
	smooth(8) alpha(60) palette(CET C6) valsize(2)  ///
	laba(0) labs(1.6) boxw(11) gap(2) novalues ///
	showtotal wrapcat(20) wraplab(15) catgap(8) plotregion(margin(b+5 l+10 r+10)) ///
	xsize(2) ysize(1) showmiss labprop percent

(I do not include the plots here, as they are on the main page.)
First, the result values that Stata produces with the community-contributed package fre, with and without analytic weights:

. fre race [aw = wage]
race -- Race
-------------------------------------------------------------
                |      Freq.    Percent      Valid       Cum.
----------------+--------------------------------------------
Valid   1 White |   1703.612      75.85      75.85      75.85
        2 Black |   513.7638      22.87      22.87      98.73
        3 Other |   28.62389       1.27       1.27     100.00
        Total   |       2246     100.00     100.00           
-------------------------------------------------------------

. fre race 
race -- Race
-------------------------------------------------------------
                |      Freq.    Percent      Valid       Cum.
----------------+--------------------------------------------
Valid   1 White |       1637      72.89      72.89      72.89
        2 Black |        583      25.96      25.96      98.84
        3 Other |         26       1.16       1.16     100.00
        Total   |       2246     100.00     100.00           
-------------------------------------------------------------

Next, the result values that the alluvial code using value(wage) produces for the categorical variable race:

White (75.85%)
Black (22.87%)

and the result values that the code using [w = wage] produces for the categorical variable race:

White (72.89%)
Black (25.96%)

You note in the help file (without indicating which type of weight is used): "Weights are allowed but use them cautiously."

Now, I am a bit puzzled, because using [w = wage] appears not to produce the weighted percentages, whereas using value(wage) does.
Maybe this is intentional, but I fail to grasp why using [w = numvar] does not seem to make a difference for alluvial (yet).
Also, your description of value(numvar), "Define a numerical variable that will be aggregated over the categories for the flows. The default is the count of rows.", makes me wonder what "that will be aggregated over the categories" implies: is it weighting, that is, the weighted count of categorical cases?

@asjadnaqvi
Owner

Dear Eric, I would highly recommend just using value() if that is the key variable over which the items need to be summed.

Weighted sums are very likely to give different estimates, since the formulas for weights do more than simple summation. Weights should only be used if the data genuinely have a specific weight type (aw, pw, fw, iw in Stata). I am planning on writing a note on this.

@ericmelse
Author

Hmm, I am most interested in an example that shows where things go wrong!

But, what you write does not explain why the alluvial code that uses [w = wage] does not produce weighted result values while fre race [aw = wage] does. That is confusing (to me). I mean, Stata's tab produces the same weighted result:

. tab race [aw = wage]

       Race |      Freq.     Percent        Cum.
------------+-----------------------------------
      White | 1,703.6123       75.85       75.85
      Black | 513.763792       22.87       98.73
      Other | 28.6238923        1.27      100.00
------------+-----------------------------------
      Total |      2,246      100.00
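
The two sets of percentages above can be compared by hand. The following is a minimal sketch, assuming the examples use Stata's shipped nlsw88 dataset (the thread does not name the dataset; the variable names only suggest it). It computes, per category of race, the plain row count share and the share of total wage. With analytic weights, tab rescales the weights to sum to the number of observations, so its weighted percentage should equal the category's share of total wage. The names wage_sum, nobs, wage_total, n_total, pct_count, pct_wage and aw_freq are hypothetical helpers introduced for illustration.

```stata
* Assumption: the thread's examples are based on Stata's nlsw88 dataset
sysuse nlsw88, clear

* Reference tabulations, as shown in the thread
tab race
tab race [aw = wage]

* Per-category sum of wage and row count, computed by hand
preserve
collapse (sum) wage_sum = wage (count) nobs = wage, by(race)

* Grand totals across all categories
egen double wage_total = total(wage_sum)
egen double n_total    = total(nobs)

* Unweighted share: row count per category over all rows
generate double pct_count = 100 * nobs / n_total

* Share of total wage: simple aggregation of wage over categories
generate double pct_wage = 100 * wage_sum / wage_total

* Analytic-weight style frequency: wage share rescaled to sum to N
generate double aw_freq = n_total * wage_sum / wage_total

list race nobs pct_count aw_freq pct_wage, noobs
restore
```

If the assumption about the dataset holds, pct_count should line up with the unweighted percentages and pct_wage with the [aw = wage] percentages, which would suggest that value(wage) simply sums wage within each category, while [w = wage] in alluvial leaves the row counts unchanged.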
