SUDAAN is an internationally recognized statistical software package, designed and developed by RTI International, that provides accurate and efficient analysis of data from complex studies. SUDAAN is ideal for analyzing survey and experimental data because its computing procedures properly account for complex design features such as correlated observations, clustering, weighting, and stratification. SUDAAN 11 is the latest release, reflecting a continuing commitment to providing useful statistical tools.
SUDAAN can analyze data from a variety of complex sources:
- Data from complex sample surveys
- Epidemiological study data
- Clinical trial data
- Experiments or observational studies
- Longitudinal data
SUDAAN accommodates studies with design features including:
- Unequally weighted or unweighted data
- Stratification
- With- or without-replacement designs
- Multistage and cluster designs
- Repeated measures
- General cluster correlation (e.g., correlation due to multiple measures taken from patients, pups nested within litters, or students nested within schools)
- Multiply imputed analysis variables
SUDAAN Procedures and Estimation Methods
SUDAAN is a package of analytic procedures together with newer pre-analysis procedures. The two original pre-analysis procedures are WTADJUST, which performs model-based weight calibration, and HOTDECK (replaced by IMPUTE in Release 11), which implements the Cox-Iannacchione Weighted Sequential Hot Deck. All analytic procedures, as well as WTADJUST, offer three robust variance estimation methods:
- Taylor series linearization (GEE for regression models)
- Jackknife (with or without user-specified replicate weights)
- Balanced repeated replication (BRR)
These capabilities are what SUDAAN brings to statistical analysis software: support for diverse complex data sources, accommodation of study-specific design features, and robust variance estimation.
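As a concrete illustration of the first method, the sketch below computes a Taylor-linearized standard error for a weighted ratio under a stratified, with-replacement PSU design. This is illustrative Python, not SUDAAN syntax, and the column names (x, y, weight, stratum, PSU) are hypothetical.

```python
import numpy as np
import pandas as pd

def ratio_taylor(df, x, y, w, stratum, psu):
    """Taylor-linearized SE of R = sum(w*x) / sum(w*y) under a
    stratified, with-replacement (WR) first-stage PSU design."""
    X = (df[w] * df[x]).sum()
    Y = (df[w] * df[y]).sum()
    R = X / Y
    # Linearized variable: the ratio's first-order influence per record
    df = df.assign(z=df[w] * (df[x] - R * df[y]) / Y)
    var = 0.0
    for _, s in df.groupby(stratum):
        t = s.groupby(psu)["z"].sum()          # PSU totals of z
        n_h = len(t)
        if n_h > 1:                            # between-PSU variance
            var += n_h / (n_h - 1) * ((t - t.mean()) ** 2).sum()
    return R, np.sqrt(var)
```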
SUDAAN is a package of ten analytic procedures and three new pre-analysis procedures for analyzing data from complex sample surveys and other observational and experimental studies. The features of each procedure are described below:
Weighting and Imputation Procedures
WTADJUST— Produces nonresponse and post-stratification sample weight adjustments using a model-based, calibration approach. A weight truncation option is available that can be used to trim extreme weights. Any loss/gain in the weight sum is accounted for in the subsequent computation of the weight adjustments.
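A minimal sketch of the underlying idea, assuming hypothetical column names: post-stratification, the simplest calibration adjustment, scales weights within each adjustment cell so that weighted sums match known control totals. WTADJUST's model-based approach generalizes this (and adds the bounding and truncation of extreme adjustments noted above), which this sketch omits.

```python
import pandas as pd

def poststratify(df, weight, cell, controls):
    """Scale weights within each cell so the weighted sum equals a
    known control total. `controls` maps cell value -> control total."""
    cell_sums = df.groupby(cell)[weight].transform("sum")
    factor = df[cell].map(controls) / cell_sums
    return df[weight] * factor
```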
WTADJX —New in Release 11: As in WTADJUST, WTADJX produces nonresponse and post-stratification sample weight adjustments using a model-based, calibration approach. WTADJX, however, allows the user to specify a set of calibration variables used to estimate model parameters that vary from the model explanatory variables. Among other things, this means survey items known only for respondents can be used as explanatory variables in the weight adjustment model.
IMPUTE— Performs the weighted sequential hot deck and, new in Release 11, cell mean, and regression-based (linear and logistic) methods of imputation for item nonresponse.
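To convey the flavor of sequential hot deck imputation, here is a deliberately simplified Python sketch with hypothetical column names. The Cox-Iannacchione Weighted Sequential Hot Deck additionally uses the sample weights to control how often each donor is used; that machinery is omitted here.

```python
import pandas as pd

def sequential_hot_deck(df, item, sort_keys):
    """Sort by imputation-class covariates, then fill each missing item
    with the value from the most recent preceding respondent (donor)."""
    out = df.sort_values(sort_keys).copy()
    out[item] = out[item].ffill()   # carry the last donor value forward
    return out
```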
Descriptive Procedures
CROSSTAB—Computes frequencies, percentage distributions, odds ratios, relative risks, and their standard errors (or confidence intervals) for user-specified cross-tabulations, as well as chi-square tests of independence and a series of Cochran-Mantel-Haenszel chi-square tests associated with stratified two-way tables. Release 11 adds statistics related to the Kappa measure of agreement in square tables and the Breslow-Day test for homogeneity of odds ratios in stratified 2x2 tables.
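For orientation only, the snippet below computes the unweighted versions of two quantities CROSSTAB reports for a 2x2 table: the chi-square test of independence and the odds ratio. SUDAAN's design-based versions account for weights and clustering; this iid sketch does not, and the counts are made up.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: exposure (rows) by outcome (columns)
table = np.array([[30, 70],
                  [15, 85]])

chi2, p, dof, expected = chi2_contingency(table)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"chi-square={chi2:.2f}, p={p:.4f}, OR={odds_ratio:.2f}")
```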
RATIO—Computes estimates, standard errors, and confidence limits of generalized ratios of the form Σᵢ wᵢxᵢ / Σᵢ wᵢyᵢ. Computes standardized estimates and tests single-degree-of-freedom contrasts among levels of a categorical variable.
DESCRIPT—Computes estimates of means, totals, proportions, percentages, geometric means, quantiles, and their standard errors and confidence limits; also computes standardized estimates and tests of single-degree-of-freedom contrasts among levels of a categorical variable.
VARGEN—New in Release 11: Computes point estimates, design-based variances, and contrast estimates for any user-defined parameter that can be expressed as a function of means, totals, proportions, ratios, population variances, population standard deviations, and correlations. This means that VARGEN, for example, can estimate a ratio as well as a ratio of ratios.
Survival Procedures
SURVIVAL—Fits discrete and continuous proportional hazards models to failure time data; also estimates hazard ratios and their confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Includes facilities for time-dependent covariates, the counting process style of input, stratified baseline hazards, and Schoenfeld and Martingale residuals. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds hazard ratios for a multiple-unit increase or decrease in a model covariate.
KAPMEIER—Fits the Kaplan-Meier model, also known as the product-limit estimator, to survival data from sample surveys and other clustered data applications. KAPMEIER uses either a discrete or a continuous time variable to provide point estimates of the survival curve for failure-time outcomes that may contain censored observations.
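The product-limit estimator itself is compact enough to sketch. The unweighted Python below shows what KAPMEIER estimates; the SUDAAN procedure additionally handles sample weights, clustering, and design-based standard errors.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of S(t). `events` is 1 for an observed
    failure, 0 for a censored observation."""
    times, events = np.asarray(times), np.asarray(events)
    curve, s = [], 1.0
    for t in np.unique(times[events == 1]):   # distinct failure times
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return curve
```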
Regression Procedures
REGRESS—Fits linear regression models and performs hypothesis tests concerning the model parameters. Uses Generalized Estimating Equations (GEE) to efficiently estimate regression parameters with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for the marginals.
LOGISTIC—Fits logistic regression models to binary data and computes hypothesis tests for model parameters; estimates odds ratios and their confidence intervals for each model parameter; estimates exponentiated contrasts among model parameters (with confidence intervals); uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals, and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as odds ratios for a multiple-unit increase or decrease in a model covariate.
MULTILOG—Fits logistic and multinomial logistic regression models to ordinal and nominal categorical data and computes hypothesis tests for model parameters; estimates odds ratios and their confidence intervals for each model parameter; estimates exponentiated contrasts among model parameters (with confidence intervals); uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals, and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as odds ratios for a multiple-unit increase or decrease in a model covariate.
LOGLINK—Fits log-linear regression models to count data not in the form of proportions. Typical examples involve counts of events in a Poisson-like process where the upper limit to the number is infinite. Estimates incidence density ratios and confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as incidence density ratios for a multiple-unit increase or decrease in a model covariate.
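Since all four modeling procedures (REGRESS, LOGISTIC, MULTILOG, LOGLINK) estimate predicted marginals, a brief sketch of that concept may help. For a fitted logistic model, the predicted marginal for a covariate level is the (weighted) average of the predicted probabilities obtained after setting every record to that level; the Python below is illustrative, with hypothetical inputs.

```python
import numpy as np

def predicted_marginal(X, beta, col, level, weights):
    """Set column `col` to `level` for all records, then average the
    model-predicted probabilities over the weighted sample."""
    Xs = X.copy()
    Xs[:, col] = level
    p = 1.0 / (1.0 + np.exp(-Xs @ beta))   # logistic predictions
    return np.average(p, weights=weights)
```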
Utility Procedure
RECORDS—Prints observations from the input data set, obtains the contents of the input data set, converts an input data set from one type to another. You can use the SUBPOPN or SUBPOPX statement to create a subset of a given data set, and you can use the SORTBY statement to sort your data. RECORDS is a non-analytic procedure.
NEW PROCEDURES
VARGEN Procedure
VARGEN is a new descriptive statistics procedure introduced in Release 11. This procedure computes point estimates and their associated design-based variances for user-defined parameters that can be expressed as complex functions of estimated means, totals, ratios, percents, population variances, population standard deviations, and correlations. Examples include estimating differences between two variables; estimating the population covariance and Pearson correlation between two variables; testing the significance of a mean (or any statistic) against a nonzero value; and estimating a ratio of means or a ratio of ratios. Point estimates can be computed within subgroups, and subgroup contrasts can also be estimated in similar fashion to other descriptive procedures in SUDAAN.
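To make the "function of estimates" idea concrete, here is an illustrative delete-one jackknife for a user-defined parameter such as a ratio of ratios. It is not VARGEN's algorithm: SUDAAN's jackknife deletes whole PSUs and reweights according to the design, whereas this sketch deletes single records for brevity.

```python
import numpy as np

def jackknife_se(stat, data):
    """Delete-one jackknife SE of an arbitrary statistic `stat`."""
    n = len(data)
    reps = np.array([stat(np.delete(data, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

# User-defined parameter: a ratio of ratios over columns [x, y, u, v]
def ratio_of_ratios(d):
    x, y, u, v = d.sum(axis=0)
    return (x / y) / (u / v)
```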
WTADJX Procedure
WTADJX is very similar to the WTADJUST procedure introduced in Release 10. As with WTADJUST, WTADJX is designed to produce weight adjustments that compensate for unit (i.e., whole-record) nonresponse and coverage errors due to undercoverage or duplications in the frame. The primary difference between WTADJUST and WTADJX is that in WTADJUST, the vector of model explanatory variables and the vector of calibration variables must be the same. In WTADJX, the two vectors are allowed to differ. Among other things, this allows researchers to assess the potential for bias in estimates when nonrespondents are not missing at random.
IMPUTE Procedure
IMPUTE is the new item imputation procedure in Release 11 and replaces the HOTDECK procedure introduced in Release 10. IMPUTE extends the capabilities of the previous HOTDECK procedure by including four methods of item imputation: the Cox-Iannacchione Weighted Sequential Hot Deck, cell mean imputation, linear regression imputation for continuous variables, and logistic regression imputation for binary variables.
NEW STATEMENTS FOR EVERY SUDAAN PROCEDURE
NEWVAR Statement
The NEWVAR statement allows users to recode existing variables, store the recoded variable in a new variable, and use the new variable in the same procedure for processing (e.g., on a CLASS, MODEL, VAR, or TABLES statement). The NEWVAR statement is available in all procedures and more than one NEWVAR statement can be included in the same procedure call. NEWVAR can create new variables via direct assignment or using IF-THEN-ELSE logic.
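The pandas snippet below mimics what a NEWVAR recode accomplishes, using hypothetical variable names: derive a new grouping variable with IF-THEN-ELSE style logic and then use it downstream as a recoded variable would appear on a CLASS or TABLES statement. It is a conceptual analogue, not NEWVAR syntax.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"AGE": [12, 35, 70]})
# IF AGE < 18 THEN 1; ELSE IF AGE < 65 THEN 2; ELSE 3
df["AGEGRP"] = np.select([df.AGE < 18, df.AGE < 65], [1, 2], default=3)
```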
SUBPOPX Statement
The new SUBPOPX statement in Release 11 is used to define a subpopulation in a more flexible way than SUBPOPN.
BY (RBY) Statement
The BY statement (RBY in SAS-Callable SUDAAN) allows users to request output by the values of the variables specified on the BY statement. The new BY statement in Release 11 is very similar to the BY statement in SAS.
CROSSTAB PROCEDURE ENHANCEMENTS
Cohen’s Kappa Measure of Inter-Rater Agreement
The new AGREE statement in CROSSTAB allows one to estimate the kappa measure of agreement in square tables. Cohen's κ (kappa) coefficient is a statistical measure of inter-rater reliability. It is generally thought to be a more robust measure than a simple percent-agreement calculation, since κ takes into account the agreement occurring by chance.
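The statistic itself is simple to state: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the raters' marginal distributions. An unweighted illustration in Python (SUDAAN's version adds design-based variances):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square rater-by-rater contingency table."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                               # observed agreement
    p_e = (t.sum(axis=0) * t.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```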
Breslow-Day Test for Homogeneity of Odds Ratios in Stratified 2x2 Tables
The new BDTEST statement in CROSSTAB provides the Breslow-Day Test for homogeneity of odds ratios in stratified 2x2 tables.
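For comparison on unweighted data, a Breslow-Day-type test is also available in statsmodels via StratifiedTable and its test_equal_odds method (an assumption worth verifying against your statsmodels version). The stratified 2x2 counts below are made up.

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# Hypothetical stratified 2x2 tables (one per stratum)
tables = [np.array([[20, 10], [15, 25]]),
          np.array([[30, 12], [22, 31]])]

result = StratifiedTable(tables).test_equal_odds()  # Breslow-Day test
print(result.statistic, result.pvalue)
```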
REGRESS, LOGISTIC, MULTILOG, AND LOGLINK PROCEDURE ENHANCEMENTS
Confidence Intervals for Predicted and Conditional Marginals
Beginning in Release 11, all modeling procedures produce 100(1-α)% confidence limits for predicted and conditional marginals, in addition to standard errors and associated t-tests.
LOGISTIC, MULTILOG, LOGLINK, AND SURVIVAL PROCEDURE ENHANCEMENTS
Odds Ratios, Incidence Density Ratios, and Hazard Ratios for Multiple Unit Change in a Continuous Variable
The LOGISTIC, MULTILOG, LOGLINK, and SURVIVAL procedures will now exponentiate regression coefficients to estimate odds ratios, incidence density ratios, and hazard ratios associated with any multiple-unit change in a specified continuous covariate. Previously, user-specified odds ratios were available only for a 1-unit change in a covariate.
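The arithmetic behind this enhancement is worth one line: for a log-odds coefficient β, the odds ratio for a k-unit change is exp(kβ), with confidence limits exp(k(β ± z·SE)). A hypothetical numeric illustration:

```python
import numpy as np

def or_for_k_units(beta, se, k, z=1.96):
    """OR and 95% CI for a k-unit change in a continuous covariate."""
    return (np.exp(k * beta),
            np.exp(k * (beta - z * se)),
            np.exp(k * (beta + z * se)))

# e.g., beta = 0.04 per year of age -> OR for a 10-year increase
print(or_for_k_units(beta=0.04, se=0.01, k=10))
```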
ENHANCEMENT TO ALL ITERATIVE MODELING AND WEIGHTING PROCEDURES
Model Parameter Estimates Available at each Iteration
For all modeling and weighting procedures (except REGRESS), the model parameters at each iteration of the Newton-Raphson algorithm used to estimate them can be printed or output to a data file using the new output group ITBETAS or keyword ITBETA. This feature helps researchers detect problematic variables that may prevent the iterative algorithms from converging.
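To show what per-iteration coefficients look like, here is a bare-bones, unweighted Newton-Raphson fit of a logistic model that prints the coefficient vector at each step, analogous to what ITBETAS exposes. It is an illustration, not SUDAAN's estimation code (which handles weights and GEE).

```python
import numpy as np

def logistic_newton_raphson(X, y, tol=1e-8, max_iter=25):
    """Unweighted Newton-Raphson for logistic regression, printing the
    coefficient vector at each iteration."""
    beta = np.zeros(X.shape[1])
    for it in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        print(f"iteration {it + 1}: beta = {beta}")
        if np.max(np.abs(step)) < tol:
            break
    return beta
```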
ENHANCEMENTS TO WEIGHT ADJUSTMENTS IN LOGISTIC, WTADJUST, and WTADJX PROCEDURES
Descriptive Statistics for Weight Adjustment and Response Propensity
The LOGISTIC, WTADJUST, and WTADJX procedures will now produce descriptive statistics for the model-predicted response propensity and the weight adjustment. Descriptive statistics that can be obtained include the mean, population variance, population standard deviation, and relative standard deviation of the response propensity and associated weight adjustment. These new statistics can be obtained using the new PREDSTAT statement in SUDAAN. Standard errors associated with these estimates can also be obtained.
Weighted Response Rates
The new PREDSTAT statement can also be used to obtain estimates of the weighted response rate and the R-indicator (Representativity Indicator) statistic. The R-indicator provides a measure of the representativity of the respondents with respect to the sample or population from which they were drawn.
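The R-indicator has a compact definition: R = 1 − 2·S(ρ), where S(ρ) is the standard deviation of the estimated response propensities; values near 1 indicate respondents who resemble the full sample. A sketch, assuming propensities and base weights are already in hand:

```python
import numpy as np

def r_indicator(propensities, weights):
    """R = 1 - 2 * S(rho), using the weighted SD of propensities."""
    rho, w = np.asarray(propensities, float), np.asarray(weights, float)
    mean = np.average(rho, weights=w)
    sd = np.sqrt(np.average((rho - mean) ** 2, weights=w))
    return 1.0 - 2.0 * sd
```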
Precision Estimates That Properly Account for the Estimated Weight Adjustment
Beginning in Release 11, LOGISTIC, WTADJUST, and WTADJX can now properly account for the sample weight adjustment when estimating descriptive statistics (means, totals, percents, ratios) and their standard errors for any user-supplied variable. For each respondent record on the input file, the adjusted sample weight used in the computations is the product of the base weight (supplied on the WEIGHT statement) and the adjustment factor computed in the procedure. New statements associated with the weight-adjusted descriptive statistics include the VAR, NUMER, DENOM, TABLES, VCONTRAST, VDIFFVAR, VPAIRWISE, and VPOLYNOMIAL statements.
Additional Design Effects in LOGISTIC, WTADJUST, and WTADJX to Measure Impact of Weight Adjustments
In addition to providing standard error estimates that account for the weight adjustment, LOGISTIC, WTADJUST, and WTADJX also provide two sets of design-effect-like statistics, so the realized gains in statistical efficiency (decreases in standard error) from using one of SUDAAN's calibration-weighting procedures can now be measured:
1. One set of design effects measures the potential bias from ignoring the estimation of the weight adjustment. These design effects are called MDEFF statistics in Release 11 and are defined as the variance of an estimate that accounts for the estimation of the nonresponse adjustment relative to the variance of an estimate that ignores this estimation. The variance estimates in both the numerator and denominator of the MDEFF statistics also account for the complex sample design.
2. The second set of design effects provides a measure of the effect of the weight adjustment on the variance of estimated means, percents, ratios, and totals. These are referred to as the ADEFF statistics. The numerator of the ADEFF design effect is a variance estimate that properly accounts for the estimation of the weight adjustment from the model in LOGISTIC, WTADJUST, and WTADJX. The denominators of these design effects are variance estimates that assume the weight adjustment is equal to 1.00. The variance estimates in both the numerator and denominator account for the complex sample design.
WTADJUST PROCEDURE ENHANCEMENT
Additional Summary Statistics in WTADJUST
Several additional summary statistics have been added to the WTADJUST procedure.
Versions of SUDAAN® are available for a wide variety of computing platforms. SUDAAN will read SAS® and SPSS® files and may be installed as an add-on to SAS. See our section on Frequently Asked Questions under Support for details on using SUDAAN with SAS and SPSS files.
SUDAAN Release 11.0.3/11.0.4, for SAS Version 9
- Windows System and Windows Server*
- Red Hat Enterprise Linux 6 Operating System (32 bit and 64 bit, 11.0.3 only)
- Red Hat Enterprise Linux 7.9 Operating System 64 bit (11.0.4 only)
SUDAAN Release 11.0.3, Standalone Version
- Windows System**
SUDAAN Release 11.0.3, Command Prompt
- Windows System*
- Red Hat Enterprise Linux 6 Operating System (32 bit and 64 bit)
*SUDAAN has been tested on Windows 10 64 bit platforms and Windows 10 32 bit platforms
**SUDAAN has been tested on Windows 10 32 bit platforms