Bioequivalence data analysis-SAS
SAS is commonly employed in the analysis of bioequivalence (BE) data. On the other hand, R, a freely available open-source software for general-purpose data analysis, sees less frequent use than SAS in BE data analysis. This tutorial elucidates the utilization of R for BE data analysis, showcasing its capability to yield comparable results to SAS.
The principal SAS procedures for BE data analysis, namely PROC GLM and PROC MIXED, find their counterparts in R through the main packages “sasLM” and “nlme,” respectively. In situations involving fixed effects or balanced data, both SAS PROC GLM and R “sasLM” yield reliable estimates. Conversely, when dealing with a mixed-effects model and unbalanced data, SAS PROC MIXED and R “nlme” are preferable, offering unbiased estimates. For user convenience, SAS and R scripts have been provided.
Introduction
The SAS PROC GLM has been in use for over 40 years, since 1976, while the PROC MIXED is a more recent procedure introduced in 1992. In PROC GLM, all effects are treated as fixed effects for calculations, whereas PROC MIXED is specifically designed to accurately compute mixed-effects models, incorporating random effects. Given the common utilization of both fixed and random effects in bioequivalence (BE) studies, the MIXED procedure offers a superior linear unbiased estimator of random effects compared to GLM in BE analysis.
The distinction between fixed and random effects has been a topic of discussion in BE studies. Factors that exhibit distinct values in experiments or are intentionally chosen by investigators are referred to as fixed factors. In BE studies, period and treatment are typically considered fixed effects because these studies exclusively focus on mean differences, with treatment levels chosen deliberately rather than sampled from a distribution.
Conversely, random factors in many studies represent broader populations. For instance, the subject is considered a random factor as it is chosen to represent a sample from a population with a probability distribution. While level means and differences for fixed factors can be estimated and tested, those for random factors should neither be estimated nor tested; only the degree of variability (i.e., spread) should be estimated.
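One common way to write this distinction for a 2 × 2 crossover is the model below. The symbols are notation introduced here for illustration and are not taken from the original text.

```latex
\log Y_{ijk} = \mu + G_k + S_{i(k)} + P_j + F_{(j,k)} + \varepsilon_{ijk},
\qquad S_{i(k)} \sim N(0, \sigma_B^2), \quad \varepsilon_{ijk} \sim N(0, \sigma_W^2)
```

Here Y is the response of subject i in sequence group k during period j. The group (G), period (P), and formulation (F) terms are fixed effects whose level means can be estimated and tested, whereas the subject term S is random and contributes only the between-subject variance. The residual variance plays the role of the intrasubject variance.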
Despite numerous references emphasizing the use of “PROC MIXED” for analyzing models with both fixed and random effects, the application of GLM rather than MIXED has been predominant in 2 × 2 crossover BE studies. This is because both methods yield comparable results with balanced data. However, it is crucial to note that PROC GLM necessitates balanced data, meaning that subjects in a crossover trial who fail to provide evaluable data for both the test and reference products (e.g., dropouts) should be excluded from the statistical analysis with PROC GLM. In contrast, PROC MIXED accommodates data that are missing at random, eliminating the need to exclude subjects with incomplete data.
Prior to the availability of the R “sasLM” package, replicating the results of SAS PROC GLM in R was impractical. The “Anova” function in the “car” package or the “drop1” function proves ineffective for BE data utilizing nested crossover designs. Therefore, it is recommended to employ SAS PROC MIXED or R “nlme” for conducting significance tests and calculating confidence intervals (CIs).
Methods
In the evaluation of average bioequivalence (BE) between formulations regarding average bioavailability, the customary approach involves the following steps:
Utilize log-transformed values for both the area under the plasma concentration-time curve from 0 to the last measurable concentration (AUC) and the peak concentration (Cmax).
Conduct an analysis of variance (ANOVA), such as PROC GLM from SAS, to assess the impact of group (or sequence), subject, period, and formulation (or treatment).
Following the aforementioned evaluation, derive 90% confidence intervals (CIs) for the formulation difference using the mean squared error of the ANOVA. For BE to be concluded, the CIs should fall within the range of log(0.80) to log(1.25).
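The customary steps above can be sketched in base R as follows. This is a minimal illustration and not the authors' script: `lm` stands in for a fixed-effects ANOVA, the data values are made up, and the column names (Subject, Group, Period, Treatment, Cmax) are assumed to match those used in the model formula later in the text.

```r
# Hypothetical balanced 2x2 crossover: 6 subjects, 2 sequences (values are made up).
be <- data.frame(
  Subject   = factor(rep(1:6, each = 2)),
  Group     = factor(rep(c("RT", "TR"), each = 6)),
  Period    = factor(rep(1:2, times = 6)),
  Treatment = factor(c("R", "T", "R", "T", "R", "T",
                       "T", "R", "T", "R", "T", "R"), levels = c("R", "T")),
  Cmax      = c(102, 95, 88, 91, 120, 101, 97, 110, 85, 90, 105, 99)
)

# All effects treated as fixed. Group is aliased with Subject (subjects are
# nested in groups), so one coefficient is NA; the treatment estimate is unaffected.
fit <- lm(log(Cmax) ~ Group + Subject + Period + Treatment, data = be)

# Treatment difference (T - R) on the log scale and its standard error,
# which is based on the residual mean squared error.
est   <- coef(summary(fit))["TreatmentT", ]
d_hat <- unname(est["Estimate"])
se    <- unname(est["Std. Error"])

# 90% CI on the log scale, then back-transformed with exp().
tcrit  <- qt(0.95, df.residual(fit))
ci_log <- d_hat + c(-1, 1) * tcrit * se
gmr    <- exp(d_hat)
ci     <- exp(ci_log)

# Average BE criterion: the whole CI must lie within 0.80 to 1.25.
be_ok <- ci[1] >= 0.80 && ci[2] <= 1.25
print(round(c(GMR = gmr, lower = ci[1], upper = ci[2]), 4))
```

Comparing the CI with log(0.80) and log(1.25) on the log scale is equivalent to comparing the back-transformed CI with 0.80 and 1.25.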
The proposed process for BE analysis is outlined as follows:
Apply log-transformed values for both AUC and Cmax. Data from dropout subjects can be included if non-compartmental analysis can be executed for one or more periods.
Examine the effects of group (or sequence), subject, period, and formulation (or treatment) using a mixed-effects model, such as PROC MIXED in SAS or nlme::lme in R. This analysis is intended to inform the interpretation of the study, not to invalidate or judge it.
Subsequent to the aforementioned test, determine 90% CIs using the estimate of the intrasubject variance. For BE to be concluded, the CIs should be within the range of log(0.80) to log(1.25).
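A minimal sketch of the proposed process with `nlme::lme` is shown below. The data are made up (two subjects contribute period 1 only), the column names are assumed, and `random = ~ 1 | Subject` presumes that subject IDs are unique across groups.

```r
library(nlme)

# Hypothetical unbalanced 2x2 crossover: subjects 3 and 6 have period 1 only.
be <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8)),
  Group     = factor(c("RT", "RT", "RT", "RT", "RT", "TR", "TR",
                       "TR", "TR", "TR", "RT", "RT", "TR", "TR")),
  Period    = factor(c(1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2)),
  Treatment = factor(c("R", "T", "R", "T", "R", "T", "R",
                       "T", "R", "T", "R", "T", "T", "R"), levels = c("R", "T")),
  Cmax      = c(100, 92, 60, 55, 150, 80, 88, 200, 215, 45, 120, 110, 70, 76)
)

# Group, Period, and Treatment are fixed; Subject is a random intercept.
# REML is the default estimation method in lme; it is stated here for clarity.
fit <- lme(log(Cmax) ~ Group + Period + Treatment,
           random = ~ 1 | Subject, data = be, method = "REML")

# F-tests for the fixed effects.
print(anova(fit))

# 90% CI for the treatment difference (T - R) on the log scale,
# back-transformed to the geometric mean ratio scale.
ci_log <- intervals(fit, level = 0.90, which = "fixed")$fixed["TreatmentT", ]
gmr_ci <- exp(ci_log)
print(round(gmr_ci, 4))
```

Subjects with a single period remain in the dataset and contribute to the estimates of the period and group effects and of the between-subject variance.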
Software
SAS 9.4 and R 4.0.3 were used for the scripts and results.
Statistical tools for BE analysis
Statistical instruments for bioequivalence (BE) analysis were employed to illustrate distinctions between mixed-effects models utilizing restricted maximum likelihood estimation and fixed-effects models. The comparison involved tools initially designed for linear mixed-effects models, such as SAS PROC MIXED and “nlme” in R, juxtaposed with those primarily intended for fixed-effects models, including SAS PROC GLM and “sasLM” in R. The assessment of bioequivalence was demonstrated using a simulated and unbalanced dataset as an example.
An example dataset saved as “BEsim.csv”
Please be aware that Subjects 3 and 6 withdrew from the study after period 1, resulting in the absence of their data for period 2. It is recommended to exclude subjects with missing observations, including dropouts, when employing the GLM procedure. Nevertheless, in this instance, we showcased the BE analysis without excluding dropouts to facilitate a comparison of the results using the complete dataset. For illustrative purposes, only Cmax data are utilized, and AUC data are intentionally omitted. In the presence of AUC data, the corresponding AUC column can be incorporated, and adjustments to the script should be made accordingly.
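As a sketch of how such dropouts might be identified before choosing an analysis, the base R snippet below counts the evaluable periods per subject. The data frame is a made-up stand-in for the example dataset, and its layout (one row per subject and period, with missing values coded as NA) is an assumption.

```r
# Hypothetical stand-in for the example data: subjects 3 and 6 lack period 2.
be <- data.frame(
  Subject = factor(rep(1:6, each = 2)),
  Period  = factor(rep(1:2, times = 6)),
  Cmax    = c(102, 95, 88, 91, 120, NA, 97, 110, 85, 90, 105, NA)
)

# Number of evaluable (non-missing) periods per subject.
n_obs <- tapply(!is.na(be$Cmax), be$Subject, sum)

# Subjects with fewer than two evaluable periods are dropouts.
dropouts <- names(n_obs)[n_obs < 2]
print(dropouts)

# Complete-case subset, as customarily required for a fixed-effects ANOVA.
be_glm <- droplevels(be[!(be$Subject %in% dropouts), ])

# All observed records, usable by a mixed-effects model.
be_mixed <- be[!is.na(be$Cmax), ]
```

The first subset discards the period 1 records of the dropouts, whereas the second retains them.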
SAS script
The SAS script for data preparation and validation is presented in Figure 2, while the scripts for conducting PROC GLM and PROC MIXED analyses on 2 × 2 BE data are depicted in Figures 3 and 4, respectively. In the analysis described above, PROC GLM computes fixed effects for group (or sequence), subject, period, and formulation (or treatment). Subsequently, the subject effect is treated as a random effect in the calculation process.
R script
The R script detailing data preparation is illustrated in Fig., while the R scripts corresponding to the SAS PROC GLM and PROC MIXED analyses of 2 × 2 BE data are presented in Figs. and 7, respectively. Both the “lme” function of the R “nlme” package and PROC MIXED treat the subject effect as a random factor throughout the computation. Furthermore, both are equipped to assess the effects of group, period, and formulation using F-tests or t-tests, in line with the core purpose of ANOVA.
ANOVA, analysis of variance; CI, confidence interval; GMR, geometric mean ratio
R script equivalent to SAS PROC MIXED
The “af” function in the “sasLM” package converts the specified columns to factors; “af” is an abbreviation for “as factor.” Note that differences and confidence intervals (CIs) are estimated on the log scale and are back-transformed to the original scale with the “exp” function.
A point of caution pertains to the uniqueness of subject IDs (Subject) within each group. If there is overlap in subject IDs between groups, it is imperative to modify the random argument in the second line as outlined below:
= lme(log(Cmax) ~ Group + Period + Treatment, random=~1|Group/Subject,data=BEdata)
Result
The GLM and MIXED analyses yield estimates and LSMeans, providing method means along with their corresponding standard errors. The R “nlme” package reproduces the results obtained from SAS PROC MIXED, giving a geometric mean ratio point estimate of 0.8668 with a 90% CI of 0.5565–1.3501 (refer to Table 2). Likewise, the results from the “sasLM” package agree with those of SAS PROC GLM, giving a geometric mean ratio point estimate of 0.8708 with a 90% CI of 0.5515–1.3748. Hence, the “sasLM” package emerges as a potentially viable alternative for computing type III sums of squares.
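Because each CI is symmetric on the log scale, its point estimate should equal the geometric mean of the two limits. The short check below applies this to the reported figures. The helper names are hypothetical, and the 0.80 to 1.25 comparison simply applies the conventional limits stated in the Methods.

```r
# Reported geometric mean ratios and 90% CI limits (copied from the text).
ci_mixed <- c(lower = 0.5565, gmr = 0.8668, upper = 1.3501)
ci_glm   <- c(lower = 0.5515, gmr = 0.8708, upper = 1.3748)

# Geometric mean of the limits, i.e. exp() of the midpoint on the log scale.
geo_mid <- function(x) sqrt(x[["lower"]] * x[["upper"]])

# Does the whole CI lie within the conventional 0.80 to 1.25 limits?
within_limits <- function(x) x[["lower"]] >= 0.80 && x[["upper"]] <= 1.25

print(round(c(mixed = geo_mid(ci_mixed), glm = geo_mid(ci_glm)), 4))
print(c(mixed = within_limits(ci_mixed), glm = within_limits(ci_glm)))
```

Both midpoints agree with the reported point estimates to rounding, and neither interval lies within 0.80 to 1.25 for this simulated dataset.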
ANOVA, analysis of variance; CI, confidence interval.
In the comparison of GLM and MIXED analyses in a balanced incomplete-block (BIB) crossover study with fixed and random effects, it was observed that the F values, means, and mean differences differed between the two methods due to the unbalanced nature of the example dataset. The analysis conducted using the MIXED method (implemented as “nlme” in R software) is advised over the GLM method for the accurate estimation of random between-subject effects and their variance in crossover bioequivalence studies.