ORIGINAL RESEARCH
From the Academic Department of Obstetrics and Gynecology, University of Birmingham, Birmingham; and the Department of Obstetrics and Gynecology, University of Dundee, Dundee, United Kingdom.
Address reprint requests to: Khalid S. Khan, MBBS, MSc, Academic Department of Obstetrics and Gynecology, Birmingham Women's Health Care NHS Trust, Birmingham B15 2TG, United Kingdom. E-mail: k.s.khan{at}bham.ac.uk
| Abstract |
|---|
Methods: Methodologic criteria for reporting logistic regression analyses were developed to identify problems affecting accuracy, precision, and interpretation of this approach to multivariable statistical analysis. These criteria were applied to 193 articles that reported multivariable logistic regression in the issues of four generic obstetrics and gynecology journals in 1985, 1990, and 1995. Rates of compliance with the methodologic criteria and their time trends were analyzed.
Results: The proportion of articles using logistic regression analysis increased over time: 1.7% in 1985, 2.8% in 1990, and 6.5% in 1995 (P < .001 for trend). Violations and omissions of methodologic criteria for reporting logistic models were common. The research question, in terms of dependent and independent variables, was not clearly reported in 32.1%. The process of variable selection was inadequately described in 51.8% of the articles. Among articles with ranked independent variables, 85.1% did not report assessment of conformity to linear gradient. Tests for goodness of fit were not given in 93.2% of articles. The contribution of the independent variables could not be evaluated in 36.2% of the articles because of a lack of coding of the variables. Interactions between variables were not assessed in 86.4% of articles. Analysis of variations in the quality of logistic regression analyses over time showed no increase in reporting of the criteria concerning variable selection and goodness of fit. However, the proportion of articles reporting one quality criterion concerning interpretation of the substantive significance of independent variables showed a trend toward improvement: 42.3% in 1985, 73.6% in 1990, and 75.4% in 1995 (P = .004 for trend).
Conclusion: The reporting of multivariable logistic regression models in the obstetrics and gynecology literature is poor, and the time trends of improvement in quality of reporting are not particularly encouraging.
Multivariable regression modeling is a statistical method used for defining the relation between an outcome and a set of surrogate observations. The procedure is useful in identifying observations that provide independent information with respect to likelihood of the outcome in a particular data set. Failure to perform multivariable regression appropriately can lead to misleading inferences.1,2 Therefore, the need for reporting guidelines for complex multivariable regression models in clinical research is well recognized.1
Most physicians do not have adequate training in multivariable methods of statistical analysis. Despite this inadequacy, they are frequently faced with the results of such analyses in the medical literature, and the application of multivariable analytic methods in medical research has increased in recent years.1,2 Appraisal of the quality of these analytic methods has been limited to the general medical literature, and it shows a paucity of methodologic rigor.1,2 Similar deficiencies are likely to be present in the obstetrics and gynecology literature. Therefore, we generated guidelines for assessing multivariable models and investigated the quality of reporting by applying these guidelines to articles with multivariable logistic regression analyses in the obstetrics and gynecology literature.
| Materials and Methods |
|---|
$$G = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$$

In the logistic regression model, the term G is the log odds of the outcome variable Y; β0 is the intercept term; and β1, β2, . . . are the regression coefficients, indicating the impact of the independent variables X1, X2, . . . on the dependent variable Y. Each coefficient is interpreted as the change in the outcome variable (change measured as log odds) associated with a one-unit change in the corresponding independent variable.
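The interpretation of a coefficient as a change in log odds can be illustrated with a minimal sketch. The coefficient values and the helper names `odds_ratio` and `probability` are hypothetical and chosen only for illustration; they do not come from any of the reviewed articles.

```python
import math


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Odds ratio for a `delta`-unit increase in X, given its coefficient."""
    return math.exp(beta * delta)


def probability(log_odds: float) -> float:
    """Convert the log odds G back into a probability of the outcome."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical intercept and coefficient (assumptions, not study results).
beta0, beta1 = -2.0, 0.5

# Log odds G at X1 = 0 and X1 = 1; they differ by exactly beta1.
g_at_x0 = beta0 + beta1 * 0
g_at_x1 = beta0 + beta1 * 1

print("change in log odds:", g_at_x1 - g_at_x0)
print("odds ratio per unit of X1:", round(odds_ratio(beta1), 3))
print("P(Y) at X1=0 and X1=1:",
      round(probability(g_at_x0), 3), round(probability(g_at_x1), 3))
```

Exponentiating the coefficient gives the odds ratio for a one-unit change, whereas the change in probability depends on the starting value of G.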
Logistic regression is the most frequently used approach to multivariable modeling in the medical literature.1 It has the advantage that its coefficients can be easily transformed into odds ratios, a commonly used measure of association in medical research. Because of these clinically useful properties, we assessed the quality of multivariable analysis in articles that used logistic regression. We assembled a database of articles with multivariable logistic regression analyses by a combination of manual and electronic searches of four general obstetrics and gynecology journals (Acta Obstetricia et Gynecologica Scandinavica, British Journal of Obstetrics and Gynaecology, American Journal of Obstetrics and Gynecology, and Obstetrics & Gynecology) for the years 1975, 1980, 1985, 1990, and 1995. One of the authors manually searched all of the issues of these journals for these years and supplemented the manual search with an electronic MEDLINE search of the same journals and years. The electronic search was conducted using the search term "logistic model." It identified two additional articles that were missed by the manual search.
The journals in this study were selected because they were available in our library and were indexed in the MEDLINE database. They were published in English, and three of them had the highest impact factor for general obstetrics and gynecology journals in the 1995 Journal Citation Reports.4 The 5-year intervals enabled us to show any trends over time. All relevant articles identified by our search were retrieved for analysis. For each article, we extracted information pertaining to the methodologic features highlighted below. In a subgroup of articles, data extraction was done in duplicate.
Several statistics textbooks provide guidelines for the conduct and interpretation of multivariable logistic regression analyses3,5-8; however, there are no criteria based on consensus among experts. Therefore, we generated quality criteria for the evaluation of multivariable logistic regression models (Table 1) based on published methodologic literature. Our criteria were divided into three parts: 1) Does the model appear to be correct? 2) How well does the model describe the data? and 3) If the overall model appears correct and works well, how important is each independent variable?
In each article, we sought a clear research hypothesis and a description of the variable-selection process. Reporting of variable selection was considered adequate if it depended on consideration of the clinical or biologic importance, with forcing of such variables in the model if appropriate. The threshold of statistical significance for inclusion or deletion of variables in an automated algorithm (eg, stepwise forward or backward regression) had to be specified. In the absence of any information on the methods and criteria for selection, the variable-selection process was considered unreported.
An additional issue when assessing the form of the logistic regression model is the conformity of independent variables to linear gradient. This issue is pertinent to ranked independent variables (continuous or ordinal scale) because the value of the regression coefficient is assumed to be accurate as the average effect of the variable as it moves through different zones of measurement from low to high. If this condition is not met, the actual coefficient value may vary in different measurement zones, invalidating the estimated coefficient. In the articles with ranked independent variables, conformity to linear gradient was considered to be reported if an attempt was made to detect this problem (eg, by comparison of observed and predicted values for the outcome over the ordinal zones,6 or by an alternative analysis using cross-stratification9).
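One simple way to look for departures from a linear gradient is to compare the observed log odds of the outcome across ordinal zones of the variable. The sketch below is a simplified, observed-values-only illustration, not the procedure used in any reviewed article; the counts and the helper names `zone_log_odds` and `zone_steps` are hypothetical.

```python
import math


def zone_log_odds(events, totals):
    """Observed log odds of the outcome within each ordinal zone."""
    result = []
    for e, n in zip(events, totals):
        if e <= 0 or e >= n:
            raise ValueError("each zone needs at least one event and one non-event")
        result.append(math.log(e / (n - e)))
    return result


def zone_steps(log_odds):
    """Change in log odds between each pair of adjacent zones."""
    return [b - a for a, b in zip(log_odds, log_odds[1:])]


# Hypothetical counts for four zones of a ranked variable (low to high).
events = [5, 10, 18, 30]
totals = [100, 100, 100, 100]

steps = zone_steps(zone_log_odds(events, totals))
print([round(s, 2) for s in steps])
```

If the steps between adjacent zones are roughly equal, a single coefficient is a reasonable summary; markedly unequal steps suggest that the effect differs across measurement zones.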
The second part of our quality checklist ("How well does the overall model work?") focused on the goodness of fit, or the accuracy with which the final regression model described the data. This evaluation allowed us to determine whether knowing the values of all the independent variables collectively would predict the dependent variable any better than if we had no information on any of the independent variables. The validity of the results and conclusions from multivariable regression models rests on goodness of fit.1,6,8 Coefficients of logistic regression can be unreliable if there is overfitting or underfitting, and it is the relative paucity of outcome events that leads to these problems.10 If the ratio of outcome events to independent variables is less than 10:1, then the extent to which the independent variables, as a group, explain the dependent variable is of questionable accuracy.1,8 We searched in each article for a test assessing goodness of fit.
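The 10:1 rule for outcome events per independent variable can be expressed as a short check. This is a minimal sketch; the function names and the example counts are hypothetical, and only the 10:1 threshold is taken from the text.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Ratio of outcome events to independent variables in the model."""
    if n_variables <= 0:
        raise ValueError("the model must contain at least one independent variable")
    return n_events / n_variables


def accuracy_questionable(n_events: int, n_variables: int,
                          threshold: float = 10.0) -> bool:
    """True when the ratio falls below the 10:1 threshold described above."""
    return events_per_variable(n_events, n_variables) < threshold


# Hypothetical study: 45 outcome events and 6 independent variables.
print(events_per_variable(45, 6))
print(accuracy_questionable(45, 6))
```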
If the multivariable logistic regression model appears to be correct and it seems to describe the data well, then the clinical importance of the regression results can be assessed by using our criteria under "How important is each independent variable?" The contribution of each independent variable is initially evaluated by testing for its statistical significance, ie, its P value. The substantive significance of the independent variables is determined by examination of the magnitude of effect, ie, the change in dependent variable associated with a unit change in the independent variable.8 For example, a variable such as blood pressure (BP) measured in mmHg can be coded in 1-mmHg increments, 10-mmHg intervals, or dichotomously as less than 90 or greater than 90 mmHg. If the odds ratio for seizure (outcome variable) at different 10-mmHg categories of diastolic BP (independent variable) were found to be 1.20, then the odds of developing a seizure would be interpreted as increasing by 20% for each 10-mmHg rise in diastolic BP. This interpretation would be different if the coding of BP were different. Many reports do not include enough information on the coding of independent variables to allow this interpretation.1 We reviewed each article to see whether the authors reported the units of measurements for the variables and whether they were on an interval, ordinal, or binary scale. In the absence of such information, the coding of independent variables was classified as unreported.
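The dependence of the odds ratio on coding can be made concrete with the blood pressure example. The sketch below assumes the log odds change linearly with diastolic BP (the linear-gradient condition discussed earlier); the helper name `rescale_odds_ratio` is hypothetical.

```python
import math


def rescale_odds_ratio(odds_ratio: float, from_units: float, to_units: float) -> float:
    """Re-express an odds ratio reported per `from_units` as one per `to_units`.

    Assumes the log odds change linearly with the independent variable.
    """
    beta_per_unit = math.log(odds_ratio) / from_units
    return math.exp(beta_per_unit * to_units)


or_per_10 = 1.20  # odds ratio per 10-mmHg rise, from the example in the text

print("per 1 mmHg:", round(rescale_odds_ratio(or_per_10, 10, 1), 4))
print("per 20 mmHg:", round(rescale_odds_ratio(or_per_10, 10, 20), 4))
```

The same underlying relation therefore yields quite different odds ratios depending on the unit of coding, which is why an odds ratio without its coding cannot be interpreted.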
Testing for interactions is important when the impact of one variable on the outcome is dependent on the level of another variable.1 Interactions between independent variables were considered reported if a statement in the article mentioned that they were suspected on clinical grounds and were evaluated. Interactions were classified as unreported if neither of the above was mentioned in the text.
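As an illustration, an interaction between two independent variables is commonly represented by adding a product term to the model. The form below is the standard textbook representation with two variables, given here as an assumption rather than as the specification used in any reviewed article.

```latex
G = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
```

With this term, a one-unit change in X1 changes the log odds by β1 + β3X2, so the impact of X1 depends on the level of X2. When β3 is zero, the model reduces to the form without interaction.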
Our main analysis was based on the assessment of compliance of the articles with our criteria for quality of multivariable logistic regression analyses. We compared compliance rates among the journals for each of the quality criteria using the χ2 test. We also performed analyses to identify any trends in the publications and the quality of articles using logistic regression over time. The quality trends were evaluated separately for the three parts of our checklist. To assess the trends in reporting of criteria concerning "Did the model appear to be correct?" we classified articles as adequate if they reported at least one of the following three criteria concerning independent-variable selection: a clear research hypothesis, use of biologic sensibility, and description of the statistical method used for selection. For trends analysis of "How important was each of the independent variables?" we classified articles as adequate if they reported at least one of three criteria concerning interpretation of the substantive significance of independent variables: reporting of units, coding, and interaction of variables. Variations in publication and the quality of articles over time were analyzed statistically using the χ2 test for trend.
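A test for trend in proportions can be sketched as follows. The article does not state which variant was used, so this sketch assumes the Cochran-Armitage form with equally spaced scores (0, 1, 2) for the three study years and no continuity or N/(N - 1) correction; the function name is hypothetical. The counts are those reported in the Results for the criterion on substantive significance of independent variables.

```python
import math


def chi_square_trend(events, totals, scores=None):
    """Chi-square test for linear trend in proportions (1 degree of freedom).

    Assumed variant: Cochran-Armitage with equally spaced scores by default.
    Returns the chi-square statistic and its two-sided P value.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion
    sum_nx = sum(n * x for n, x in zip(totals, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(totals, scores))
    s_xx = sum_nx2 - sum_nx ** 2 / n_total  # spread of the scores
    t = sum(r * x for r, x in zip(events, scores)) - p_bar * sum_nx
    chi2 = t ** 2 / (p_bar * (1 - p_bar) * s_xx)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Articles meeting the criterion / all articles, for 1985, 1990, and 1995.
chi2, p = chi_square_trend([11, 39, 86], [26, 53, 114])
print(round(chi2, 2), round(p, 4))
```

Under these assumptions the result is of the same order as the P value reported for this criterion; a different scoring or correction would shift it slightly.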
| Results |
|---|
Our analysis of changes in methodologic rigor over time showed that the proportion of articles that reported at least one quality criterion concerning independent-variable selection was 61.5% (16 of 26) in 1985, 71.7% (38 of 53) in 1990, and 70.2% (80 of 114) in 1995 (P = .52 for trend). Despite a lack of an overall trend, Obstetrics & Gynecology showed a significant trend toward improvement in reporting of this quality feature (Table 4). Percentages of articles reporting goodness of fit did not change over time: 7.7% (two of 26) in 1985, 9.4% (five of 53) in 1990, and 5.3% (six of 114) in 1995 (P = .59 for trend). For reporting of at least one quality criterion concerning interpretation of the substantive significance of independent variables, there was a significant time trend: 42.3% (11 of 26) in 1985, 73.6% (39 of 53) in 1990, and 75.4% (86 of 114) in 1995 (P = .004 for trend). This overall trend was supported only by the American Journal of Obstetrics and Gynecology and Obstetrics & Gynecology (Table 5).
| Discussion |
|---|
The observational study design is the most common publication type. For example, in 1996, Obstetrics & Gynecology published 241 articles using observational methodology, which represented 76% of the total that year.11 Validity of the results from observational studies depends on the degree of confounding,12-16 which arises because of differences between subjects in the study groups that are separately related to the outcome. Therefore, investigators must attempt to remove the effect of confounding before assessing statistical significance.12 This control for confounding is usually done by multivariable analysis.5 Methodologic inadequacies in multivariable analyses can result in bias and imprecision. Bias refers to the existence of a systematic tendency for the estimated regression coefficients to be too high or too low compared with the true values of the coefficient. Imprecision refers to the tendency for the coefficients to have large standard errors (and confidence intervals), which makes it difficult to reject the null hypothesis even when it is false.
Focusing on validity of results, the randomized controlled trial is believed to be the most methodologically robust study design.17 However, there are many instances in health care when the randomized experimental design is not practical. For example, when evaluating the association of cigarette smoking with lung cancer16,18 or that of breast-feeding with infection,15,19 the observational design is more feasible. The primary aim of randomization is to exclude the effects of confounding factors,12,20 which are expected to be equally distributed in the randomized groups, leaving the intervention or exposure under study as the only disparity. Sometimes, randomization produces unbalanced groups, which requires multivariable analysis for adjustment of the imbalances.
Factors that invalidate multivariable logistic regression analyses were included in our methodologic criteria for quality (Table 1). In some of our criteria when addressing whether the models appeared to be correct or incorrect, there was room for leniency in rigor. This was permissible because the variable-selection process is an art combining biologic and statistical sensibility, leading to some subjectivity in assessing the compliance of the articles with the quality criteria. However, if the multivariable model is incorrectly specified, irrelevant variables may be included or relevant variables may be omitted. Inclusion of irrelevant variables increases the standard error of the regression coefficients, reducing precision.5,8 Omission of relevant variables, on the other hand, results in biased coefficients.5,8 We found that the variable-selection process was unspecified in 51.8% of the articles. In the general medical literature, however, this feature was found to be inadequate in only 14%.1 This finding suggests that investigators in general medicine may be more aware of the impact of the variable-selection mechanism on the results of multivariable analyses than researchers in obstetrics and gynecology. It was discouraging to note that the trend analysis for reporting of independent-variable selection did not show a significant improvement over time for all of the journals pooled together. Only Obstetrics & Gynecology showed increasingly better reporting rates over time (Table 4).
The validity of the logistic regression results also depends on meeting certain assumptions, such as conformity to linear gradient with ranked independent variables1 (ordinal or continuous) and checking for possible overfitting21 and interactions between the independent variables.5 Multivariable analyses in The Lancet and New England Journal of Medicine (1985-1989) were reported to have risks of overfitting in 42% of articles.1 In these journals, there was also a lack of testing for interactions in 73% of the articles,1 a figure not too different from the 86.4% of articles reported without interactions in our study. Another related methodologic issue is the mathematical fit of the model, ie, how effectively the calculated model fits the actual data for estimating outcome variables.1,6,10 Similar to our finding of a lack of goodness-of-fit tests in 93.2% of articles, The Lancet, New England Journal of Medicine, British Medical Journal, and Journal of the American Medical Association also failed to report such tests in 93.5% of their articles that used logistic regression analyses from 1991 to 1994 (Bender R, Grouven U. Logistic regression models used in medical research are poorly reported [letter]. BMJ 1996;313:628).
The findings of our study may be viewed with skepticism by critics who would argue that the lack of reporting does not necessarily mean that validation procedures and assumption testing were not used in the data analyses. It is plausible that the investigators conducted their analyses rigorously without reporting all of the elements contained in our quality checklist because there has not been a clear standard for reporting multivariable models in the medical literature. Our study may be seen as having biased the existing articles against compliance with our quality criteria. In addition, because of space constraints in medical journals, some of the material dealing with methods of analysis may have been deleted between manuscript submission and publication. Without information on analytic methods, however, it is impossible to make confident inferences about the validity of logistic regression results. Hence there is a need for improvement in the conduct and reporting of multivariable analyses in the medical literature.
| Footnotes |
|---|
Received September 1, 1998. Received in revised form November 6, 1998. Accepted November 25, 1998.
| References |
|---|
2. Katz MH, Hauck WW. Proportional hazards (Cox) regression. J Gen Intern Med 1993;8:702-11.
3. Hirsch RP, Riegelman RK. Statistical first aid. An interpretation of health research data. Boston: Blackwell Scientific Publications, 1992.
4. 1995 Journal citation reports (JCR). Philadelphia: Institute for Scientific Information, Inc, 1996.
5. Kleinbaum DG, Kupper LL, Muller KE. Applied regression analysis and other multivariable methods. Boston: PWS-Kent Publishing Co, 1988.
6. Hosmer DW, Lemeshow S. Applied logistic regression. New York: Wiley, 1989.
7. Armitage P, Berry G. Statistical methods in medical research. 3rd ed. London: Blackwell Scientific, 1994.
8. Menard S. Applied logistic regression analysis. Sage University paper series on quantitative applications in social sciences, 07-106. Thousand Oaks, California: Sage Publications, 1995.
9. Feinstein AR. Prognostic stratification. In: Feinstein AR, ed. Clinical biostatistics. St. Louis: CV Mosby, 1977:385-443.
10. Hosmer DW, Taber S, Lemeshow S. The importance of assessing the fit of logistic regression models: A case study. Am J Public Health 1991;81:1630-5.
11. Funai EF. Obstetrics & gynecology in 1996: Marking the progress toward evidence-based medicine by classifying studies based on methodology. Obstet Gynecol 1997;90:1020-2.
12. Brennan P, Croft P. Interpreting the results of observational research: Chance is not such a fine thing. BMJ 1994;309:727-30.
13. Goldberg RJ, Pastides H, Ellison RC, Tuthill RW, Dewitt T. Uses of the case-control and cohort epidemiological approaches in pediatric practice and research. Pediatr Res 1985;19:787-90.
14. Bracken MB. Reporting observational studies. Br J Obstet Gynaecol 1989;96:383-8.
15. Bauchner H, Leventhal JM, Shapiro ED. Studies of breast-feeding and infections. How good is the evidence? JAMA 1986;256:887-92.
16. Smith GD, Shipley MJ. Confounding of occupation and smoking: Its magnitude and consequences. Soc Sci Med 1991;32:1297-300.
17. Sibbald B, Roland M. Why are randomised controlled trials important? BMJ 1998;316:201.
18. Loeb LA, Ernster VL, Warner KE, Abbotts J, Laszlo J. Smoking and lung cancer: An overview. Cancer Res 1984;44:5940-58.
19. Jason JM, Nieburg P, Marks JS. Mortality and infectious disease associated with infant-feeding practices in developing countries. Pediatrics 1984;74:702-27.
20. Treasure T, MacRae KD. Minimisation: The platinum standard for trials? BMJ 1998;317:362-3.
21. Harrell FE Jr, Lee KL, Matchar DB, Reichert TA. Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Cancer Treat Rep 1985;69:1071-7.