NDT Advance Access originally published online on November 19, 2007
Nephrology Dialysis Transplantation 2007 22(12):3422-3430; doi:10.1093/ndt/gfm777
© The Author [2007]. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org



Clinical research of kidney diseases III: Principles of regression and modelling

Pietro Ravani1,2, Patrick Parfrey2, Veeresh Gadag3, Fabio Malberti1 and Brendan Barrett2

1Divisione di Nefrologia e Dialisi, Azienda Istituti Ospitalieri di Cremona, Cremona, Italy, 2Clinical Epidemiology Unit and 3Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, Canada

Correspondence and offprint requests to: Pietro Ravani, Divisione di Nefrologia, Azienda Istituti Ospitalieri di Cremona, Italy, Largo priori 1, Cremona, 26100, Italy. E-mail: pietro.ravani{at}med.mun.ca

Keywords: confounding; interaction; interval estimate; point estimate; regression models



   Introduction
 Top
 Introduction
 Regression analysis
 Statistical models
 Modelling issues
 Reporting
 References
 
Inappropriate data analysis is a source of measurement error in clinical studies [1]. Descriptive methods (graphs, summary statistics and relational plots) are used to assess variable distributions, identify possible outliers and reveal the form of the relationship of interest. For example, in a study of hyperparathyroidism in chronic kidney disease, researchers are interested in the sample mean and standard deviation (SD) of both parathyroid hormone and kidney function levels, and in the form of their possible relationship (i.e. whether it is present across all variable levels and whether it can be described by a line, a curve, etc.). The next step is to extend the conclusions beyond the immediate sample (inference) and estimate, for example, the amount of parathyroid hormone increase as kidney function declines. Statistical models are used to test whether an input–output relationship is supported by observed data and assess its direction and strength [1,2]. Most researchers and consumers of clinical research are familiar with the preliminary steps of data analysis. However, there is a growing interest in filling the gap between elementary notions and more advanced knowledge. The present paper provides introductory notes on general principles of statistical modelling, including how regression methods are chosen and used to address epidemiological phenomena such as confounding and interaction.



   Regression analysis
 
Role of statistics
The task of statistics in the analysis of epidemiological data is to distinguish between chance findings and results that may be replicated upon repetition of the study [1]. For example, if a relationship between left ventricular mass (LVM) and systolic blood pressure (SBP) exists, LVM is expected to change by a certain amount as SBP changes. Data from a recent large-scale multicentre application of cardiac magnetic resonance in people without clinical cardiovascular disease were modelled to estimate the average change in LVM per unit change in SBP. LVM was 9.6 g greater (point estimate), with a 95% confidence interval (CI) from 8.5 to 10.8 g (interval estimate), for each 21 mmHg (1 SD) higher SBP [3]. The point estimate (fit) reflects the output variability explained by the model, whereas the difference between recorded and predicted values (random error) is the variability left unexplained. The latter is used to calculate the 95% CI (a measure of precision).

The residual error implies that the value of the response for an individual can never be predicted with certainty, even knowing his/her SBP (and the other inputs in the multivariable model). For example, considering the adjusted effects of SBP (9.6 g per SD of 21 mmHg) and body mass index (BMI, 11.7 g per 5 kg/m²), the expected LVM of a subject with SBP of 147 mmHg (147/21 = 7 SD units) and BMI of 30 kg/m² (30/5 = 6 units) is 9.6 × 7 + 11.7 × 6 = 137.4 g [3]. This LVM may not correspond to the observed value for a subject with SBP of 147 and BMI of 30 (if that subject exists). Further information can reduce this error. However, even when several inputs are included in the model, the ‘exact’ response value can never be established. In other words, some amount of variation will remain unexplained after fitting a model to any data. Measures of the explained variability in the response, such as the overall R² statistic in linear models or equivalent (likelihood) measures in other models, inform on the clinical relevance of the effects as opposed to their statistical significance [2].
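The arithmetic of this worked example can be retraced with a short sketch. The helper name `expected_lvm` is hypothetical, and, as in the example above, the model intercept and all other inputs are left out, so the result illustrates how scaled effects are combined rather than a full prediction from the published model.

```python
# Adjusted effects quoted in the text [3]; intercept and other inputs are
# deliberately ignored, exactly as in the worked example.
SBP_EFFECT_PER_SD = 9.6   # g of LVM per 1 SD of SBP
SBP_SD = 21.0             # mmHg in 1 SD of SBP
BMI_EFFECT_PER_STEP = 11.7  # g of LVM per 5 kg/m2 of BMI
BMI_STEP = 5.0            # kg/m2


def expected_lvm(sbp_mmhg, bmi):
    """Sum of the two adjusted contributions (hypothetical helper)."""
    sbp_units = sbp_mmhg / SBP_SD   # 147 mmHg -> 7 SD units
    bmi_units = bmi / BMI_STEP      # 30 kg/m2 -> 6 steps of 5 kg/m2
    return SBP_EFFECT_PER_SD * sbp_units + BMI_EFFECT_PER_STEP * bmi_units


lvm = expected_lvm(147, 30)
print(round(lvm, 1))
```

Rescaling each input to the unit in which its coefficient is expressed (SD of SBP, 5 kg/m² of BMI) is the step most often forgotten when published coefficients are reused.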

Finally, statistics only convey the effect of the chance element in the data but can neither identify nor reduce systematic errors [1,2]. The only bias that can be controlled during statistical analyses is ‘measured’ confounding. However, the interpretation of the results would be wrong if the statistical tool is incorrect. This implies the choice of the proper function to model the data and regression technique.

Concept of function
Most clinical research can be simplified as an assessment of the relationship between exposure (X, independent variable) and disease (Y, dependent variable). For example, if the study hypothesis is that LVM depends on BMI, smoking habit, diabetes and SBP [3], then the observed values of LVM (y) are said to have a functional relationship with these four variables (x1, x2, x3 and x4). This implies a link between input(s) and output.

A function (equation) can be thought of as a ‘machine’ transforming some ingredients (inputs) into a final product (output). Technically the ingredients on which a function operates are the ‘argument’ of that function. Just as any machine produces a specific output and has a typical shape and characteristics, similarly any function has its specific response variable and a typical mathematical form and graphical shape.

A special ‘ingredient’ is the linear predictor (LP), which is the ‘argument’ of most statistical functions of interest to clinical epidemiology. LP contains one or more inputs (the ‘Xs’) combined in linear fashion (i.e. LP is a linear function of the Xs). Figure 1 shows three important functions of the LP: the identity function, which does not modify its argument and gives LP as output; the exponential function of LP; and the logistic function of LP. The underlying mathematical structure is not important here. However, two aspects should be noted: first, different transformations change the ‘shape’ of the relationship between LP (input) and its function (output); second, although LP can range from −∞ to +∞ (allowing any type of input to be accommodated into it), its function can be constrained into a range between 0 and 1 (logistic function), can have a lower limit of 0 (exponential function) or can just have the same range as the LP (identity function). These aspects are crucial as the model choice is based on the distribution of the response. Different responses require different ways of modelling LP.


Fig. 1. Example of three common functions of the linear predictor (LP). The identity function does not change the LP (linear model) and yields graphs that are straight lines; the exponential function is the exponentiated LP (Poisson model); the logistic function is a sigmoid function of LP (logistic model). Note that different functions have not only different shapes but also different ranges.
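The three functions of the LP can be written out in a few lines. This is a minimal sketch: the function names and the example values of LP are chosen for illustration only and do not come from the article.

```python
import math


def linear_predictor(beta0, betas, xs):
    """LP = beta0 + sum of beta_i * x_i (inputs combined in linear fashion)."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def identity(lp):
    """Linear model: output has the same range as LP."""
    return lp


def exponential(lp):
    """Poisson-type model: output is always greater than 0."""
    return math.exp(lp)


def logistic(lp):
    """Logistic model: output is constrained between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-lp))


# Illustrative LP values only; the same input gives outputs on three scales.
for lp in (-5.0, 0.0, 5.0):
    print(lp, identity(lp), exponential(lp), logistic(lp))
```

Evaluating the same LP values through each function makes the difference in range visible: the identity output can be negative, the exponential output cannot, and the logistic output never leaves the interval between 0 and 1.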

 
Regression methods
Once a function has been chosen to describe the relationship of interest, its coefficients are estimated using regression strategies. Regression and correlation are often confused as they measure the degree of relationship between two or more variables in two related but different ways. Correlation (or more generally, covariation) measures the degree of association between two variables without distinction between input(s) and output. The variables can be two inputs or two outcome measures in the same subject. In regression analysis the output is modelled as a function of one or more inputs to predict its future values.

The term regression implies the tendency towards an average value. For example, if there is a linear relationship between age and 5-year mortality, the average change in mortality per unit change in age can be estimated using linear regression. This estimation task is accomplished by obtaining specific values (estimates) for the ‘unknowns’ (parameters) of the specific regression function. In the above example, the linear function of mortality is mortality = LP + ε = β0 + βage × age + ε, where LP is the linear function β0 + βx × x (fit) and ε is the variability in the data unexplained by the model. The parameters of this univariable model are β0, representing the intercept of the line describing the input–output relationship, and βx, representing its slope (average change in mortality per year of age). For each ‘β’ the model provides a point estimate and 95% CI.

Different regression methods exist. The method commonly used in linear regression, for example, is the ordinary least-squares method (OLS). In lay words, this method chooses the values of the function parameters (β0, βage) that minimize the sum of the squared distances between the observed values of the response y and their fitted mean at each value of x (thus minimizing ‘ε’). Graphically this corresponds to finding the line on the Cartesian axes that passes through the observed points and minimizes their distance from it, i.e. the line of the average values towards which the observed measures are ‘regressed’ (Figure 2, left). Other estimation methods exist for other types of data, the most important of which is maximum likelihood estimation (MLE). As opposed to OLS, MLE works well for both normally (Gaussian) and non-normally distributed responses (for example Binomial or Poisson). However, all estimation procedures choose the most likely values of the parameters given the data, i.e. those that minimize the amount of error or difference between what is observed and what is expected.


Fig. 2. Ordinary least-squares method and components of a statistical model. The regression line drawn through a scatter plot of two variables is ‘the best fitting line’ of the response (left). In fact, this line is as close to the points as possible, providing the ‘least sum of squares’ of the deviates or residuals (vertical dashed lines). These discrepancies are the differences between each observation (‘y’) and the fitted value at a given exposure value (‘ŷ’). The statistical model of the response (Y) includes a systematic component (SC) corresponding to the regression line (the linear predictor LP in linear regression or some transformation of the LP in other models) and an error term (E, right) characterized by some known distribution (for the linear model the distribution is normal, with mean = 0 and constant variance = σ²).
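For a single input, the least-squares estimates have a well-known closed form (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x). The sketch below applies it to made-up age and mortality values, which are assumptions for illustration and not data from any study cited here, and checks that moving the slope away from the OLS value increases the sum of squared residuals.

```python
def ols_fit(x, y):
    """Closed-form ordinary least squares for one input: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sum_of_squares(x, y, intercept, slope):
    """Sum of squared vertical distances between observations and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: age in years and 5-year mortality as a proportion.
age = [40, 50, 60, 70, 80]
mortality = [0.05, 0.09, 0.16, 0.20, 0.25]

b0, b_age = ols_fit(age, mortality)
rss_best = residual_sum_of_squares(age, mortality, b0, b_age)
# Any other slope (same intercept) gives a larger sum of squares.
rss_other = residual_sum_of_squares(age, mortality, b0, b_age + 0.001)
print(b0, b_age, rss_best, rss_other)
```

The comparison of the two sums of squares is the numerical counterpart of the dashed vertical residuals in Figure 2: the fitted line is the one for which their squared total is smallest.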

 


   Statistical models
 
Definition
Models are representations of essential structures of objects or real processes. For example, the earth may be approximated to a sphere in geographic calculations although it is flattened at the poles. Given a reasonably linear relationship between kidney function and haemoglobin concentration, a linear model may be used to study anaemia in chronic kidney diseases. Even in the presence of mild deviations from ideal circumstances, the representation of a process by means of a simple model, such as the linear model, helps grasp the intimate nature and mechanisms of that process. Obviously, critical violations of model assumptions would make the model inappropriate. The linear model would be wrong if the relationship was exponential. Similarly, the sphere would not be an acceptable model if the earth were a cone. A useful model is a good compromise between appropriateness of the chosen function and interpretability of the effects (β), while leaving as little unexplained variability in the data as possible (ε).

Indeed, biologic phenomena, as opposed to the deterministic phenomena of physics or chemistry, are characterized by considerable variability, yielding different results when repeated under the same experimental conditions. Probabilistic rather than deterministic models are applied to biomedical sciences as they include indexes of uncertainty around the population parameters estimated using samples (e.g. 95% CI). These characteristics of statistical models are reflected in their fit and random components. For example, the fit portion of a linear model is a line, and the errors are distributed normally with mean equal to zero (Figure 2, right). In other models the fit portion has different shapes and the residuals have different distributions.

Model choice
The most appropriate statistical model to fit the data depends on the type of response variable because this determines the shape of the systematic portion and the typical error distribution. Previous literature often provides useful information to guide the choice of model before data are collected. Once the model has been built, its systematic and random components are verified graphically and with formal tests based on residuals in order to ensure that the chosen model fits the data well. These procedures are called model checks (including the search for influential observations and outliers) and assumption verification. They will not be discussed in this paper. However, three major principles can be summarized using the linear model as an example.

First, the relationship between input and output must reflect the mathematical form of that model. For example, to use the linear model the relationship must be linear. In other models, the functional form of the relationship is describable by other curve shapes (Figure 1) and the meaning of the parameters is different. This ‘shape’ assumption pertains to the systematic component (Figure 3). The other two conditions to use a statistical model pertain to its random component. Second, the residuals must follow a distribution compatible with the specific model: normal in linear regression, binomial in logistic regression and Poisson in Poisson regression. For example, in a study of asymmetric dimethylarginine (ADMA) and glomerular filtration rate (GFR), the observed GFR was approximately symmetrically distributed above and below the fitted line of GFR over ADMA (error mean = zero), with equal (constant) variance along the whole line [4]. Third, the residuals must be independent. This is possible only if the observations are independent. This condition is violated if repeated measures are taken on the same subjects or if there are clusters in the data, i.e. some individuals sharing some experience/conditions that make them not fully independent. Consequently, once some measurements have been made, it becomes possible to more accurately ‘guess’ the values of further measurements within the same individual/cluster, and the corresponding errors are no longer due to chance alone. This final assumption must be satisfied in the study design. In the presence of correlation, appropriate statistical techniques are required.


 
Fig. 3. Linearity and equal variance. In the left panel, the response is linearly related to the exposure and has constant variance (homoscedasticity). In the right plot, two possible important violations are depicted: non-linearity and unequal variance (heteroscedasticity).

 
When the necessary conditions to use a certain model are clearly violated, they can be carefully diagnosed and treated. For instance, often non-linearity and unstable variance of a continuous response can be at least partially corrected by some mathematical transformations of the output and/or the inputs in order to permit use of the linear model. Urinary protein excretion, for example, is often log-transformed both when it is treated as output [5] and input [6]. However, any data transformation changes the meaning of the model parameters and their interpretation may become obscure. Reports often fail to explain clearly the meaning of the parameters of some complex models [5–8].
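A minimal sketch of how a log transformation can linearize a relationship, and of how it changes the meaning of the slope, is given here. The data are generated from an assumed exponential curve, y = 2 × exp(0.5 × x), without noise; both the curve and the helper `ols_fit` are illustrative assumptions and are not taken from the cited studies.

```python
import math


def ols_fit(x, y):
    """Closed-form ordinary least squares for one input: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed exponential relationship, noise-free for clarity.
x = [0, 1, 2, 3, 4]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# On the log scale the relationship becomes a straight line.
log_y = [math.log(yi) for yi in y]
intercept, slope = ols_fit(x, log_y)

# The slope is now a difference in log(y) per unit of x; back-transformed,
# it is a ratio: y is multiplied by exp(slope) for each unit increase in x.
ratio_per_unit = math.exp(slope)
print(intercept, slope, ratio_per_unit)
```

After the transformation the coefficient no longer describes an absolute change in the response per unit of input but a relative one, which is why the interpretation of parameters from transformed models should be stated explicitly.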

These three conditions have the following meaning. Once the model has been fitted to the data, (1) it must be possible to quantify the amount of change of the output per unit change of the input(s), i.e. the parameter estimates are constant and apply over the whole range of the predictors; (2) what remains to be explained around the fit is unknown and independent of the input(s) values; and (3) it is also independent of the measurement process. For a more detailed discussion of applied regression, the interested reader is referred to specific texts [9].

Multivariable versus univariable analysis
Multiple regression models contain more than one input. Therefore, they estimate more effects simultaneously. A graphical approach using the linear model may help understand this concept.

When only the response variable is considered, e.g. the overall mean and standard deviation of LVM [3], the largest possible variability is observed in the data (unconditional response). The output variability becomes smaller if the response is studied as a function of one input at a time or, better, two inputs at the same time (conditional distribution of the response). As the systematic component of a multivariable model contains more information on the variability of the response, the amount of unexplained variability gets smaller (Figure 4, left). The intercept and residual standard deviation of the model without input variables (‘null model’, i.e. y = β0 + ε) are the parameters of the unconditional distribution of the response (mean and SD).
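The relationship between the null model and the unconditional distribution can be checked numerically. In this sketch the SBP and LVM values are invented for illustration (they are not the data of reference [3]) and `ols_fit` is a hypothetical helper implementing the usual closed-form least-squares solution for one input.

```python
import statistics


def ols_fit(x, y):
    """Closed-form ordinary least squares for one input: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical observations: SBP in mmHg and LVM in g.
sbp = [110, 120, 130, 140, 150, 160]
lvm = [120, 128, 131, 142, 147, 155]

# Null model y = b0 + e: the least-squares intercept is the sample mean,
# and the residual SD equals the sample SD of the response.
null_intercept = statistics.mean(lvm)
null_sd = statistics.stdev(lvm)
rss_null = sum((yi - null_intercept) ** 2 for yi in lvm)

# Model with one input: the unexplained variability gets smaller.
b0, b_sbp = ols_fit(sbp, lvm)
rss_sbp = sum((yi - (b0 + b_sbp * xi)) ** 2 for xi, yi in zip(sbp, lvm))
r_squared = 1.0 - rss_sbp / rss_null
print(null_intercept, null_sd, rss_null, rss_sbp, r_squared)
```

The ratio of the two residual sums of squares gives the proportion of variability explained by the input, which is the R² statistic referred to earlier.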


 
Fig. 4. Information gain and residual variance and three-dimensional representation of the linear model. The residual variance of left ventricular mass (LVM) gets progressively smaller in comparison to the unconditional response (distribution of the response without any knowledge about exposure) as more informative inputs are introduced into the model. The inputs are systolic blood pressure, SBP (x1, in mmHg) and body mass index, BMI (x2, in kg/m2). These quantitative predictors generate a plane in the three-dimensional space. The number of fitted planes increases with the number of levels of a qualitative input (e.g. diabetes, DM).

 
Figure 4 (right) shows the multidimensional consequences of introducing more inputs. With two quantitative predictors such as SBP and BMI, the fitted values of LVM lie on a plane in the three-dimensional space, the plane that minimizes the residuals. The addition of a third quantitative variable would create a hyper-plane in the multidimensional space and so on. Of note, qualitative inputs, such as diabetes, separate the fitted values onto several planes, one for each level of the independent variable. The fitted surface would have a more complex shape in other models, but the multidimensional meaning of multivariable analysis would be the same.



   Modelling issues
 
Confounding
Definition.
A confounder is an ‘extraneous’ variable associated with both exposure and response without lying in the pathway between them (Figure 5). Conversely, a marker or proxy is only related to the exposure, an intermediate variable explains the outcome, and two inputs are colinear when they carry the same or at least similar information. For example, in a recent ultrasound study of renal resistance indices (RI) in chronic kidney disease, carotid intima–media thickness was significantly associated with RI in baseline models that did not include age [10]. However, older patients had thicker carotid artery walls and once age was entered into the model intima–media thickness lost its predictive power (confounding by age). In the final model of RI the introduction of phosphate ‘lowered’ the coefficient of GFR (another input). However, phosphate increase may be one mechanism through which kidney function reduction contributes to higher RI, i.e. be in the causal chain between exposure and response (intermediate variable). Adjustment of the effect of GFR for phosphate (which is affected by GFR) may be biased, as it does not merely reflect confounding. Although modelling strategies help identify multiple relationships, their direction and temporal sequence should be made explicit in the design and ideally tested in experimental studies [1,2]. A further challenge of longitudinal data is that some covariates may play the dual role of confounders and intermediates over time [18,19]. For example, in a study of the effect of obesity on mortality, the development of clinical cardiac or respiratory disease is an independent predictor of both mortality and subsequent weight loss and is influenced by prior weight gain. In a study of anti-proteinuric agents and mortality, the time-dependent covariate proteinuria is an independent predictor of both survival and initiation of therapy and is itself influenced by prior treatment.


 
Fig. 5. Possible relationships between input and outcome. A confounding factor (XC) is independently associated with the response (YO) even in the absence of the exposure of interest (XE), and with XE without being the consequence of XE (e.g. in a study of coffee drinking, XE, and coronary artery disease, CAD, smoking, XC, is an independent risk factor for CAD, and related to coffee drinking without being in the causal chain between them). A marker or proxy (XM) is associated with the exposure only and has no direct relationship with the outcome (e.g. yellow fingers are related to smoking but not to CAD). Two inputs (XE and XF) may also have an independent association with the response; this is the ideal situation as it maximizes the information in the data (e.g. both male gender and age are independently related to CAD). Colinearity is a phenomenon whereby two inputs (X1 and X2) carry (at least partially) the same information on the response (e.g. age and kidney function estimates including age). An intermediate variable (XI) lies in the pathological path leading to the outcome (e.g. cholesterol levels in the causal chain between diet and CAD). Inclusion of XI in a multivariable model can be useful to assess the amount of change in the estimated effect of XE (βE) on YO mediated by XI. However, the adjusted βE is biased as (at least) part of the effect of XE is due to an effect of XE on XI rather than confounding.

 
Control.
Confounding can be prevented using randomization, restriction or matching in the design phase [1,2] and controlled through stratification or modelling during analysis (Table 1). Stratification refers to cross-tabulation of data on exposure and response by categories of one or more potential confounders (Table 2). Adjusted estimates are obtained by aggregating stratum-specific information using pooling or standardization [12]. The way multivariable regression removes the association between confounder and outcome (the necessary condition for confounding) is straightforward. Consider the following linear model of the response (Y) including exposure (E) and a confounder (C): Y = β0 + βE × E + βC × C + ε. The difference between Y and the effect of C left in the model gives the effect of E: Y − βC × C = β0 + βE × E + ε. The right-hand part of the equation is a simple regression. The same applies to other models. This is the epidemiological concept of independence: ‘independent’ means purified from the effects of other inputs kept in the model.
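The subtraction argument can be illustrated with a small, noise-free data set. Everything in this sketch is assumed for illustration: the binary exposure E and confounder C, the ‘true’ coefficients (β0 = 1, βE = 2, βC = 3) and the helper `ols_slope`. In practice βC is not known in advance and is estimated together with βE by the multivariable model; it is treated as known here only to mirror the equation above.

```python
def ols_slope(x, y):
    """Least-squares slope of y on a single input x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical exposure and confounder; C is more frequent among the exposed.
E = [0, 0, 0, 1, 1, 1, 0, 1]
C = [0, 0, 1, 1, 1, 1, 0, 0]

# Assumed data-generating model: Y = 1 + 2*E + 3*C (no random error).
BETA_0, BETA_E, BETA_C = 1.0, 2.0, 3.0
Y = [BETA_0 + BETA_E * e + BETA_C * c for e, c in zip(E, C)]

# Crude (unadjusted) effect of E: inflated by the association between E and C.
crude_beta_e = ols_slope(E, Y)

# Removing the effect of C from Y leaves a simple regression on E.
y_minus_c = [y - BETA_C * c for y, c in zip(Y, C)]
adjusted_beta_e = ols_slope(E, y_minus_c)

print(crude_beta_e, adjusted_beta_e)
```

With these values the crude slope overstates the assumed effect of E because part of the effect of C is attributed to the exposure, whereas the slope obtained after removing the contribution of C returns the coefficient used to generate the data.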


 
Table 1. Confounding and interaction in multiple regression models

 

 
Table 2. Hypothetical cardiac event data by the presence of diabetes (DM versus non-diabetics, ND) with age (A < 65 versus A ≥ 65) as a potential confounder

 
Interaction
Definition.
An interaction between two inputs is a modification of the effect of one input in the presence of the other (Table 1). For example, in the CARE trial, inflammation was associated with higher progression rate of kidney disease, whereas pravastatin treatment was associated with slower progression rate only in the presence of inflammation [11]. Inflammation modified the effect of pravastatin (and vice versa). The interacting variables (main terms) can be of the same type (qualitative or quantitative) or of different types. The interaction effect can be qualitative (antagonism) or quantitative (synergism). For example, in one study [7] both BMI and HbA1C were directly related to the log albumin/creatinine ratio (output) when considered separately. However, the interaction coefficient had a negative sign, indicating that the total change in the response in the presence of a one-unit increase of both inputs was lower than the sum of the two main effects, i.e. (0.1535 + 0.0386) − 0.0036 = 0.1885.

The term interaction is challenging as it describes both the biologic interdependence of two factors in exerting their effects and statistically the necessity for a new term in a model.

Statistical assessment versus epidemiological interpretation of interaction.
The formal test for the presence of interaction tests whether there is a deviation from the underlying form of that model (Figure 6). For example, if the effects of age and diabetes on some event rate are respectively βAGE = 0.001 (per year) and βDM = 0.02 and there is no (significant) interaction, then the two fitted lines corresponding to the presence and absence of diabetes are constantly 0.02 rate units apart but have the same slope (additive model). Conversely, if there is an interaction effect βINT = 0.001, the two lines of the interaction model also diverge by a certain amount due to the further rate change per year of age in diabetics, graphically a difference in slope (multiplicative model). Statistically, the interaction coefficient estimates the amount of departure from the underlying form of the model. Epidemiologically, the coefficient of interaction is a difference between differences (in terms of LP). For example, there is a rate difference of 0.001 to consider if a subject is 1 year older and diabetic in addition to the differences of the main effects (Figure 6). In linear models, interactions between two continuous variables would change the slope of the fitted line without affecting the model intercept [7]. Interactions involving only qualitative inputs change the intercept of the line.


 
Fig. 6. Interaction parameter as a measure of the departure from the underlying form of a model. The plot shows two models of some event rate as a function of age and diabetes without interaction and with their interaction term. When diabetes is absent (ND, bottom line) the event rate is explained by age only in both models. When diabetes is present (DM) the fitted line of the event rate depends on age and diabetes according to the no interaction model (middle line) and on age, diabetes and their product (INT) in the interaction model (top line). In the no interaction model the effect of diabetes consists in shifting the event rate by a certain amount quantified by the coefficient of diabetes (change in the intercept of the line). In the interaction model, the (dashed) fitted line is not only shifted apart for the effect of diabetes but also diverging from the bottom line (absence of diabetes). The amount of change in the slope is the effect of the interaction between age and diabetes and is a measure of the departure from the underlying additive form of the model.
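The two models compared in Figure 6 can be sketched with the coefficients quoted in the text (βAGE = 0.001, βDM = 0.02, βINT = 0.001). The intercept is not given in the article and is set to zero here as an assumption; the function names are hypothetical.

```python
B0 = 0.0       # assumed intercept (not reported in the text)
B_AGE = 0.001  # rate change per year of age
B_DM = 0.02    # rate change associated with diabetes
B_INT = 0.001  # further rate change per year of age in diabetics


def event_rate(age, diabetic, with_interaction):
    """Fitted event rate from the linear predictor, with or without the product term."""
    dm = 1 if diabetic else 0
    rate = B0 + B_AGE * age + B_DM * dm
    if with_interaction:
        rate += B_INT * age * dm
    return rate


def diabetes_gap(age, with_interaction):
    """Vertical distance between the diabetic and non-diabetic fitted lines."""
    return (event_rate(age, True, with_interaction)
            - event_rate(age, False, with_interaction))


for age in (40, 60, 80):
    print(age,
          round(diabetes_gap(age, False), 3),  # constant gap: parallel lines
          round(diabetes_gap(age, True), 3))   # widening gap: diverging lines
```

Without the product term the gap between the lines is the same at every age; with it, the gap grows by βINT for each additional year, which is the ‘difference between differences’ estimated by the interaction coefficient.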

 
Measurement scale and biological implications.
The definition of interaction as a measure of the departure from the underlying form of the model meets both statistical and biological interpretation of the phenomenon as an amount of effect unexplained by the main terms. However, when this effect is measured, the interpretations differ, depending on the model scale [14]. In linear models, the statistical and biological perspectives coincide: input effects are (untransformed) differences. Interaction parameters are differences chosen to measure departure from an additive model: antagonistic interaction results in a change lower than expected (under-additive) [7]; synergistic interaction results in a change greater than expected (over-additive) (Figure 6). Statistical testing of this departure also measures the biologic phenomenon. Conversely, Cox's, logistic and Poisson regressions are multiplicative models because the joint effect of two or more factors is the product (rather than the sum) of their effects as LP is the argument of some non-identity function. For example, if the risk of death associated with diabetes is twice as high as in non-diabetics and is three times as high in men as in women, diabetic men have a risk six times higher than non-diabetic women. In these models effects are measured as ratios and interaction parameters are ratios chosen to measure departures from a multiplicative model: antagonistic interaction results in a change lower than expected (under-multiplicative), whereas a synergistic interaction results in a change greater than expected (over-multiplicative). Statistical assessment of this departure tests whether there is a departure from multiplicativity and not the existence of a biologic phenomenon [14]. Thus, from the statistical viewpoint, interaction depends on how the effects are measured. 
However, lack of evidence of deviation from the multiplicative scale supports the existence of biologic interaction, as the resulting change in the response is greater than the sum of the effects (over-additivity). This requires a biological explanation. For example, if diabetic men have a risk six times as high as non-diabetic women and the relative risks associated with the main effects are 3 and 2, there is no deviation from the multiplicative scale (6 = 3 × 2) but there is over-additivity because 6 − 1 > (3 − 1) + (2 − 1), i.e. 5 > 3. On the other hand, the choice of the model depends on the distribution of the response variable and cannot be dictated by the need to study interaction. There are ways to use multiplicative models and still assess the biological meaning of the phenomenon (Tables 3 and 4).
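The comparison of the two scales can be reproduced with the relative risks of the example (2 for diabetes, 3 for male gender, 6 for both). The function name and the two summary quantities returned are chosen for this sketch; the additive excess computed here is the joint excess risk minus the sum of the separate excess risks.

```python
def interaction_summary(rr_a, rr_b, rr_joint):
    """Departure from multiplicativity (as a ratio) and from additivity (as a difference)."""
    # 1.0 means the joint effect is exactly the product of the two effects.
    multiplicative_ratio = rr_joint / (rr_a * rr_b)
    # 0.0 means the joint excess risk is exactly the sum of the two excess risks.
    additive_excess = (rr_joint - 1.0) - ((rr_a - 1.0) + (rr_b - 1.0))
    return multiplicative_ratio, additive_excess


# Relative risks from the example in the text.
ratio, excess = interaction_summary(2.0, 3.0, 6.0)
print(ratio, excess)
```

A ratio of 1 together with a positive excess is the situation described above: no departure from the multiplicative model, yet a joint effect larger than the sum of the separate effects.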



 
Table 3. Hypothetical cardiac event data expressed as an incidence rate ratio (IRR) by level of two risk factors: smoking and hypertension, where there is no interaction on a multiplicative scale but there is on an additive scale

 


 
Table 4. Hypothetical cardiac event data expressed as an incidence rate ratio (IRR) by level of two risk factors: smoking and hypertension, where there is antagonism on a multiplicative scale and synergism on an additive scale

 
Analysis power
The study size should be much larger than the number of coefficients estimated in the model. Most authors recommend at least 10 to 20 times as many observations as there are coefficients; otherwise the estimates are very unstable [15]. Models of binary outcomes require at least 10 events per parameter [16]. For example, age entered as a continuous input has one coefficient, whereas age in three categories has two parameters (one category serves as the reference), and so on.
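The parameter count and the events-per-parameter rule can be sketched as follows. This is an illustration under stated assumptions only: the helper names `count_parameters` and `max_parameters` are hypothetical, the model inputs are invented for the example, and the threshold of 10 events per parameter is the one quoted above.

```python
def count_parameters(inputs):
    """Count regression coefficients (excluding the intercept).

    inputs maps each variable name to its number of categories,
    or to None when the variable is entered as a continuous input.
    """
    total = 0
    for name, levels in inputs.items():
        if levels is None:
            total += 1            # continuous input: one coefficient
        elif levels >= 2:
            total += levels - 1   # categorical: one level is the reference
        else:
            raise ValueError(f"{name}: a categorical input needs >= 2 levels")
    return total


def max_parameters(n_events, events_per_parameter=10):
    """Largest number of parameters supported by the observed events."""
    return n_events // events_per_parameter


# Hypothetical model: continuous age, sex (2 levels), three-level stage.
model = {"age": None, "sex": 2, "stage": 3}
n_params = count_parameters(model)
print(n_params, "parameters need at least", 10 * n_params, "events")
print("35 events support at most", max_parameters(35), "parameters")
```

In this hypothetical model the four parameters would call for at least 40 events, so a data set with 35 events would support only three of them under the rule of thumb.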



   Reporting
 
The reporting of statistical methods and results in the medical literature is often suboptimal. A few tips are summarized in Table 5. More detailed checklists for reading and reporting statistical analyses are available in textbooks [17].



 
Table 5. Reporting statistical methods and regression results

 



   Acknowledgments
 
P.R. held a young investigator award from the Italian Society of Nephrology for the year 2005–2006 and received funding from the EU (Marie Curie Actions-OIF, proposal 021676) for the year 2006–2007.

Conflict of interest statement. None to declare.



   References
 

  1. Ravani P, Parfrey PS, Curtis B, et al. Clinical research of kidney diseases I: researchable questions and valid answers. Nephrol Dial Transplant (2007) 22:2459–68.
  2. Ravani P, Parfrey PS, Dicks E, et al. Clinical research of kidney diseases II: problems of study design. Nephrol Dial Transplant (2007) 22:2785–94.
  3. Heckbert SR, Post W, Pearson GD, et al. Traditional cardiovascular risk factors in relation to left ventricular mass, volume, and systolic function by cardiac magnetic resonance imaging: the multiethnic study of atherosclerosis. J Am Coll Cardiol (2006) 48:2285–2292.
  4. Ravani P, Tripepi G, Malberti F, et al. Asymmetrical dimethylarginine predicts progression to dialysis and death in patients with chronic kidney disease: a competing risks modeling approach. J Am Soc Nephrol (2005) 16:2449–2455.
  5. Palatini P, Mormino P, Dorigatti F, et al.; HARVEST Study Group. Glomerular hyperfiltration predicts the development of microalbuminuria in stage 1 hypertension: the HARVEST. Kidney Int (2006) 70:578–584.
  6. Malik AR, Sultan S, Turner ST, et al. Urinary albumin excretion is associated with impaired flow- and nitroglycerin-mediated brachial artery dilatation in hypertensive adults. J Hum Hypertens (2007) 21:231–238.
  7. Kohler KA, McClellan WM, Ziemer DC, et al. Risk factors for microalbuminuria in black Americans with newly diagnosed type 2 diabetes. Am J Kidney Dis (2000) 36:903–913.
  8. Verhave JC, Hillege HL, Burgerhof JG, et al.; PREVEND Study Group. Cardiovascular risk factors are differently associated with urinary albumin excretion in men and women. J Am Soc Nephrol (2003) 14:1330–1335.
  9. Glantz SA, Slinker BK. A Primer of Applied Regression and Analysis of Variance, 2nd edn. (2001) New York: McGraw-Hill.
  10. Heine GH, Reichart B, Ulrich C, et al. Do ultrasound renal resistance indices reflect systemic rather than renal vascular damage in chronic kidney disease? Nephrol Dial Transplant (2007) 22:163–170.
  11. Tonelli M, Sacks F, Pfeffer M, et al. Biomarkers of inflammation and progression of chronic kidney disease. Kidney Int (2005) 68:237–245.
  12. Rothman KJ. Controlling confounding by stratifying data. In: Epidemiology: An Introduction. (2002) Oxford: Oxford University Press. 144–167.
  13. Winkelmayer WC, Kurth T. Propensity scores: help or hype? Nephrol Dial Transplant (2004) 19:1671–1673.
  14. Rothman KJ. Measuring interaction. In: Epidemiology: An Introduction. (2002) Oxford: Oxford University Press. 168–180.
  15. http://www.statsoft.com/textbook/stmulreg.html.
  16. Hosmer DW, Lemeshow S. Special topics. In: Applied Logistic Regression. (2000) New York: Wiley. 339–351.
  17. Altman DG, Machin D, Bryant TN, et al. Statistics with Confidence, 2nd edn. (2000) London: BMJ Books.
  18. Robins J. The control of confounding by intermediate variables. Stat Med (1989) 8:679–701.
  19. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology (2000) 11:550–560.
Received for publication: 31.7.07
Accepted in revised form: 4.10.07





