How do you reorder the values of a categorical variable in Stata according to a specific criterion?

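In day-to-day work it is very common to need a categorical variable reordered by a specified sort criterion, for example by category frequency or by the group mean of another variable. This post introduces the community-contributed Stata command myaxis, which makes this easy.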

Install the community-contributed command myaxis:

. ssc install myaxis, replace

Load the example data:

. sysuse auto, clear
(1978 Automobile Data)

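Check the current order of the categorical variable (rep78), first by frequency and then by the mean of mpg within each category: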

. tab rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab rep78, sum(mpg)

     Repair |      Summary of Mileage (mpg)
Record 1978 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |          21   4.2426407           2
          2 |      19.125   3.7583241           8
          3 |   19.433333   4.1413252          30
          4 |   21.666667   4.9348699          18
          5 |   27.363636   8.7323849          11
------------+------------------------------------
      Total |   21.289855   5.8664085          69

Scenario one: to order the categories of rep78 by frequency, from most to least frequent, run:

(Without the descending option, myaxis sorts in ascending order by default.)

. myaxis Lian1=rep78, sort(count) descending
. tab Lian1

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |         30       43.48       43.48
          4 |         18       26.09       69.57
          5 |         11       15.94       85.51
          2 |          8       11.59       97.10
          1 |          2        2.90      100.00
------------+-----------------------------------
      Total |         69      100.00

// Alternatively, inspect the result with the community-contributed command fre (ssc install fre):
. fre Lian1

Lian1 -- Repair Record 1978
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   1 3   |         30      40.54      43.48      43.48
        2 4   |         18      24.32      26.09      69.57
        3 5   |         11      14.86      15.94      85.51
        4 2   |          8      10.81      11.59      97.10
        5 1   |          2       2.70       2.90     100.00
        Total |         69      93.24     100.00           
Missing .     |          5       6.76                      
Total         |         74     100.00                      
-----------------------------------------------------------
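For readers curious about what such a reordering involves, the result of scenario one can be approximated with official Stata commands only. This is a rough sketch, not the way myaxis itself is implemented; the variable names n_rep, neg_n and manual1 are made up for illustration.

```stata
* Sketch: reproduce the scenario-one ordering with official commands only.
* n_rep, neg_n and manual1 are hypothetical names chosen for this example.
sysuse auto, clear

* Frequency of each rep78 category (left missing where rep78 is missing)
bysort rep78: generate long n_rep = _N if !missing(rep78)

* Negate the frequency so that the most frequent category sorts first
generate long neg_n = -n_rep

* Number the categories 1, 2, ... in that order; ties are broken by rep78
egen manual1 = group(neg_n rep78)

* Label each new code with the original rep78 value it stands for
levelsof manual1, local(codes)
foreach c of local codes {
    quietly summarize rep78 if manual1 == `c', meanonly
    label define manual1 `c' "`r(min)'", modify
}
label values manual1 manual1
label variable manual1 "Repair Record 1978"

tabulate manual1
```

The final tabulate should list the categories in the same order as the myaxis result above (3, 4, 5, 2, 1). The single myaxis call replaces all of these steps, which is the main reason to prefer it.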

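Scenario two: to order the categories of rep78 by the within-category mean of a continuous variable (mpg), from highest to lowest, run:

(Without the descending option, myaxis sorts in ascending order by default.)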

. myaxis Lian2=rep78, sort(mean mpg) descending

. tab Lian2, sum(mpg)

     Repair |      Summary of Mileage (mpg)
Record 1978 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          5 |   27.363636   8.7323849          11
          4 |   21.666667   4.9348699          18
          1 |          21   4.2426407           2
          3 |   19.433333   4.1413252          30
          2 |      19.125   3.7583241           8
------------+------------------------------------
      Total |   21.289855   5.8664085          69


// Alternatively, inspect the result with the tabstat command:
. tabstat mpg, stat(mean sd count) by(Lian2)

Summary for variables: mpg
     by categories of: Lian2 (Repair Record 1978)

  Lian2 |      mean        sd         N
--------+------------------------------
      5 |  27.36364  8.732385        11
      4 |  21.66667   4.93487        18
      1 |        21  4.242641         2
      3 |  19.43333  4.141325        30
      2 |    19.125  3.758324         8
--------+------------------------------
  Total |  21.28986  5.866408        69
---------------------------------------
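A reordered variable like this is typically most useful as the category axis of a graph. The lines below are a minimal sketch of that use, repeating the scenario-two call so the example runs on its own; the axis title text is only an illustrative choice.

```stata
* Sketch: use the reordered variable as the category axis of a graph.
sysuse auto, clear
myaxis Lian2 = rep78, sort(mean mpg) descending

* Horizontal bars of mean mpg, with categories in the order set by myaxis
graph hbar (mean) mpg, over(Lian2) ytitle("Mean mileage (mpg)")

* The same ordering carries over to other graphs that accept over()
graph box mpg, over(Lian2)
```

Because the new variable stores the sort order in its numeric codes and keeps the original values as value labels, the graphs follow the new order while the axis still shows the original rep78 values.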

About the command:

Minimum version: Stata 8.2
Release date: March 19, 2021
Author: Nicholas J. Cox, Durham University
Contact email: [email protected]