NDT Advance Access originally published online on January 8, 2008
Nephrology Dialysis Transplantation 2008 23(2):475-482; doi:10.1093/ndt/gfm880

© The Author [2008]. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org



Clinical research of kidney diseases IV: standard regression models

Pietro Ravani1, Patrick Parfrey2, Sean Murphy2, Veeresh Gadag3 and Brendan Barrett2

1Divisione di Nefrologia e Dialisi, Azienda Istituti Ospitalieri di Cremona, Cremona, Italy, 2Clinical Epidemiology Unit and 3Division of Community Health and Humanities, Faculty of Medicine, Memorial University of Newfoundland, Canada

Correspondence and offprint requests to: Pietro Ravani, Divisione di Nefrologia, Azienda Istituti Ospitalieri di Cremona, Largo priori 1, Cremona 26100, Italy. E-mail: pietro.ravani{at}med.mun.ca

Keywords: Generalized linear models; linear regression; logistic regression; Poisson regression; survival analysis



   Introduction
 Top
 Introduction
 Generalized linear models
 Models for time-to-event data
 Appendix A
 Appendix B
 Appendix C
 Acknowledgements.
 References
 
Statistical modelling is similar to the engineering concept of the study outcome being a mixture of signal and noise. For example, the signal of a model of left ventricular mass (LVM) as a function of systolic blood pressure (SBP) [1] is the average change in LVM as SBP changes (systematic component). The noise is what remains to be explained of LVM variability once the effect of SBP has been taken into account (random component). Statisticians assess the characteristics of these two elements in different ways, to establish whether a model is appropriate [2].

The present review introduces two popular families of standard regression models: generalized linear models and models for time-to-event data. The conditions that make each model appropriate are summarized along with the epidemiological meaning of its coefficients (parameters). The interested reader is referred to specific textbooks for details on model specification and assumption verification methods [3–8].



   Generalized linear models
 
Generalized linear models form a large family of ‘parametric’ models. Parametric models estimate population characteristics (parameters) based on assumptions about the shape of the input–output relationship and the distribution of the residuals [2]. The chosen model must satisfy these assumptions. For example, in the linear model of LVM [1] the input–output relationship is linear and the residuals are normally distributed around the fitted line [2]. The linear model is a crucial member of this family as all generalized linear models can be viewed as extensions of the linear model. In fact, they all contain a linear function of the inputs in their structure or linear predictor, LP [2]. Two other attributes common to these models are the presence of a ‘link function’ and a specific error distribution (Figure 1). Some practical examples may help grasp the meaning of these concepts.


Fig. 1. Generalized linear models. The model choice is based on the type of response y (e.g. continuous, binary, counts) and error distribution. Errors (residuals) are deviations from the expected y values (second column, continuous line). Each model has a specific invertible function (grey), linking the expected response to the linear predictor (LP = β0 + β1x1 + β2x2 + ... + βkxk). LP contains the estimated input effects (βs), whose meaning depends on the model function. All these functions are like ‘machines’ transforming the inputs (xs) into LP [2]. Each machine can work in the opposite direction, estimating the expected response from LP (inverse). For example, linear models (top) are used for continuous responses with normally distributed residuals (e.g. LVM); they have an identity link function and the coefficients are differences in the expected response (average change) per unit change in the inputs (e.g. effect of SBP on LVM); using the inverse, predictions about future observations (LVM values) can be obtained from the model (i.e. individual SBP value and estimated SBP effect). A logit function (middle) can be used to link the expected presence versus absence of a disease to the LP, and its inverse (logistic function) allows estimating expected probabilities of future outcomes from LP. In Poisson regression (bottom) the natural log links the expected counts (e.g. number of deaths or hospitalizations) to the LP; its inverse is the exponential function, which allows estimating expected incidence rates based on LP.

 
Linear model for quantitative continuous responses
Linear regression is appropriate to model continuous response variables (Table 1). For example, in a cohort study of chronic kidney disease progression and patient survival, glomerular filtration rate (GFR) was inversely related to asymmetrical dimethylarginine (ADMA) levels at baseline, being on average 0.17 mL/min per 1.73 m² lower per 0.1 µmol/L increase of ADMA [9]. This inverse relationship means that one variable tends to change in the opposite direction to the other, although only 48% of the variability in GFR was explained by the systematic component of the multivariable model (R² statistic) [2].


Table 1. Characteristics and conditions of validity of the general linear model [2]

 
The systematic component of this model has a linear shape, i.e. a line describes the input–output relationship. The estimated parameters are the intercept (‘β0’) and the effects (e.g. change in GFR) associated with the predictor (e.g. ADMA) and other input variables in the model (‘βk’). The response is modelled as the identity function of LP (Figure 1). The random component [2] is normally distributed around the fitted line. Its size is summarized by the unexplained variability of GFR, which was as high as 52% in the ADMA study [9]. Examples of the different predictive ability of the same set of covariates for five different cardiac outcomes are described in the Multiethnic Study of Atherosclerosis. In this study the R² statistic decreased from almost 60% for the model of LVM to <20% for the model of LV ejection fraction [1]. However, the R² statistic is more commonly used to compare models with the same response fitted to the same set of observations (nested models) [2]. The ‘best’ model (i.e. the predictors to include, confounding and interaction terms, and any transformations to consider) is selected by considering the improvement in fit in terms of the R² statistic (i.e. reduction of the residual variance).

Linear models can include one (e.g. simple linear regression, t-test, one-way ANOVA) or more inputs (e.g. multiple linear regression, two-way ANOVA, ANCOVA).

The regression coefficients of the linear model estimate the average change in the output per unit change in each input. Therefore they are ‘differences’ in the average response by level or unit of exposure. For example, in a study of renal resistance indices in pre-dialysis [10], age, GFR and diabetes were inputs of the final model. The coefficient associated with diabetes (5.59, 95% CI 2.1, 9) indicates that diabetics had on average values of the response about 5.6 units higher than non-diabetics. Resistance indices tended to be higher in older subjects (0.18 per year of age) and lower in those with more preserved GFR (−0.07 per mL/min). The P-value associated with each estimate is the probability of obtaining an estimate at least as extreme as the one observed if the null hypothesis that the coefficient is zero were true [11].
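The reading of a linear coefficient as a ‘difference’ in the average response can be checked with a minimal sketch. The data below are hypothetical (they are not taken from the cited studies) and the function name `simple_linear_fit` is introduced here only for illustration. With a single binary input, the least-squares slope equals the difference between the two group means, and R² is the share of the response variance explained by the fitted line.

```python
# Hypothetical data: a continuous response in unexposed (x = 0)
# and exposed (x = 1) subjects. Values are illustrative only.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [60.0, 62.0, 64.0, 66.0, 66.0, 68.0, 70.0, 70.0]


def simple_linear_fit(x, y):
    """Ordinary least squares with one input: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = simple_linear_fit(x, y)

# Group means, for comparison with the fitted coefficients.
mean_unexposed = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean_exposed = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

# R-squared: 1 - residual sum of squares / total sum of squares.
mean_y = sum(y) / len(y)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print(f"difference in means = {mean_exposed - mean_unexposed:.2f}")
print(f"R-squared = {r_squared:.2f}")
```

In this toy example the intercept is the mean response among the unexposed and the slope is the exposed-minus-unexposed difference, which is the interpretation given above for the diabetes coefficient.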

Logistic model for qualitative responses
Logistic regression is appropriate when the response variable is a binary outcome that is either present or absent, such as the presence of a disease in a survey or the occurrence of an event in a prevention study [12]. Researchers are interested in identifying factors associated with the probability (or risk) that an event happens.

The characteristics of logistic regression may be more easily appreciated by showing why linear regression cannot be used to model probabilities. For example, plotting heart disease status (present/absent, y) over age (in years, x) makes all observations fall on one of two possible values (Figure 2, left). This plot shows the binary nature of the response and suggests the existence of an association, as younger individuals tend to fall on the bottom line of no disease. However, the large variability of the response at all ages does not allow appreciation of the relationship [13]. Some variability can be removed by categorizing the input and plotting the mean output value (proportion) over the input levels (Figure 2, right). However, in contrast to quantitative responses, the shape of the relationship is sigmoid rather than straight, as the conditional mean (probability) is confined between 0 and 1, approaching 0 and 1 ‘gradually’. The logistic model rather than the linear model is appropriate in these cases, as the logistic function transforms a continuous variable ranging between −∞ and +∞ (LP) into a response ranging from 0 to 1 [2].


Fig. 2. Study of the presence or absence of a disease as a function of age. The first scatter-plot uses the original measurements on 100 study subjects (left). The values of the response lie on two lines indicating the presence (y = 1) or absence (y = 0) of the disease. However, subjects tend to be younger when y = 0 (older when y = 1). Since within the age values subjects may be on either line (high variability) the possible relationship cannot be easily appreciated. The second curve represents the proportion of subjects with the disease over age category midpoints (right). The relationship between the two variables is more clearly appreciated, but the curve is sigmoid rather than straight [13].

 
The logistic function of LP represents the expected risk of disease for a given combination of inputs (Table 2). The coefficients are differences in log-odds, so exponentiating a coefficient gives the odds ratio (OR) associated with a unit increase of the corresponding input. For example, if the coefficient of male gender is 0.3, the transformed coefficient, exp(0.3) = 1.35, is the OR of men versus women, i.e. male gender is associated with 35% higher odds for disease. If the coefficient has a negative sign, the predictor is associated with reduced odds. For example, if the coefficient of serum albumin in grams per litre is −0.2, the OR of exp(−0.2) = 0.82 implies an 18% reduction in the odds per gram-per-litre increase in serum albumin. As the OR is a symmetric effect measure (Table 2), logistic regression is the model of choice in case–control designs, where subjects are selected retrospectively based on disease status. For risk estimation in prospective longitudinal studies Poisson and Cox's regressions are the methods of choice.
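As a minimal sketch of these transformations, the snippet below exponentiates the two example coefficients and applies the logistic function to a linear predictor. The function names `odds_ratio` and `logistic` are chosen here for illustration, and the LP value of 0 is an arbitrary example.

```python
import math


def odds_ratio(beta):
    """Odds ratio per unit increase of an input with log-odds coefficient beta."""
    return math.exp(beta)


def logistic(lp):
    """Inverse of the logit link: maps any real LP to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lp))


or_male = odds_ratio(0.3)      # exp(0.3), about 1.350
or_albumin = odds_ratio(-0.2)  # exp(-0.2), about 0.819

print(f"OR, men versus women: {or_male:.3f}")
print(f"OR per g/L of serum albumin: {or_albumin:.3f}")
print(f"change in odds per g/L: {(or_albumin - 1) * 100:.1f}%")

# An LP of 0 (log-odds of 0, odds of 1) corresponds to a risk of 0.5.
print(f"risk at LP = 0: {logistic(0.0):.2f}")
```

Very negative and very positive LP values give risks close to 0 and 1, respectively, which is the sigmoid behaviour described for Figure 2.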


Table 2. Meaning of the coefficients in logistic regression

 
Poisson model for quantitative discrete responses
Poisson regression is appropriate when the individual risk for an event is small and constant but the number of individuals is large, and thus the total number of events is considerable. The outcome variable is a count of independent events, such as the number of deaths, over a period of time at risk. The principal covariate in the model is the exposure time, which is recorded for each observation or aggregated data.

Poisson regression does not model risks but rates. Risks (event count/person during a specified period of time) are dimensionless and range from 0 to 1. Rates (λ = event count/person-time) have the dimension of 1/time and range from 0 to +∞. Risks can be estimated directly in short studies where subject follow-up is approximately complete [12]. Rates are estimated in longer studies because, as the study duration increases, fewer subjects have complete follow-up [14]. Rates treat one time unit as equivalent to another, regardless of which individual it comes from (e.g. one person observed for 10 years and another for 20 years contribute a total of 30 person-years of follow-up, equivalent to 30 persons observed for 1 year each). Depending on the chosen time unit, the same rate can have different numerical values and can exceed 1 (100%). For example, if eight cases occur among 36 subjects in 1 month, then the same rate can be expressed as 0.22 cases per person-month or 2.67 cases per person-year.
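The unit dependence of a rate can be illustrated with a short sketch using the figures of the example above (eight cases among 36 subjects observed for 1 month). The variable names are illustrative.

```python
cases = 8
subjects = 36
months_observed = 1

# Risk: events per person over the period; dimensionless, between 0 and 1.
risk_one_month = cases / subjects

# Rate: events per unit of person-time; its value depends on the time unit.
person_months = subjects * months_observed
rate_per_person_month = cases / person_months
rate_per_person_year = rate_per_person_month * 12  # 12 months per year

# Person-time pools follow-up across individuals:
# one person followed for 10 years plus another followed for 20 years.
pooled_person_years = 10 + 20

print(f"risk over 1 month: {risk_one_month:.3f}")
print(f"rate: {rate_per_person_month:.3f} per person-month")
print(f"rate: {rate_per_person_year:.3f} per person-year")
print(f"pooled follow-up: {pooled_person_years} person-years")
```

The rate per person-year exceeds 1, which is legitimate for a rate but would be impossible for a risk.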

Incidence rates can be used to estimate risks. If the underlying risk is constant and small (e.g. <0.2) it can be estimated as the product of the estimated rate and the observation time. For example, if 1000 subjects are followed for 10 years and experience a mortality rate of 0.01 per person-year (0.01/year), the risk can be estimated as 0.01 × 10 or 0.1 over 10 years (each individual has a probability of 10% of dying in 10 years). However, as deaths occur over time the same mortality rate applies to a steadily smaller population at risk. Since this shrinking of the population is neglected in the calculation, the risk approximation based on the incidence rate does not work well for high risks or very long time durations (Table 3).
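A minimal sketch of why the approximation degrades is given below. It compares the product rate × time with the standard closed-cohort result for a constant rate, risk = 1 − exp(−rate × time), which accounts for the shrinking population at risk. The exponential formula is not stated in the text above and is used here as an assumption of a constant rate with no competing events; the function names are illustrative.

```python
import math


def risk_linear(rate, time):
    """Approximate risk as rate x time (ignores the shrinking population at risk)."""
    return rate * time


def risk_exponential(rate, time):
    """Risk under a constant rate in a closed cohort: 1 - exp(-rate x time)."""
    return 1.0 - math.exp(-rate * time)


# Low rate (the example in the text) versus a high rate, both over 10 years.
for rate in (0.01, 0.2):
    approx = risk_linear(rate, 10)
    exact = risk_exponential(rate, 10)
    print(f"rate {rate}/year: rate x time = {approx:.3f}, "
          f"1 - exp(-rate x time) = {exact:.3f}")
```

For the rate of 0.01 per year the two figures are close, whereas for 0.2 per year the simple product exceeds 1, which is not a valid risk.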


Table 3. Meaning of the coefficients in Poisson regression

 
Finally, rate estimates are unaffected by the precision of the measurement time scale. If three deaths are observed in 10 subjects and their exposure times are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 months (bands of 1/12 = 0.0833 years duration), the event rate is 3/(55 * 0.0833) = 0.655 per year = 655 deaths per 1000 person-years. Had the data been updated every day, then the length of the band would have been 1 day and there would have been 55 * (365.25/12) = 1674 bands (units of n) of 0.002737 year duration. However, the rate would have been the same, 3/(1674 * 0.002737) = 0.655.
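The arithmetic of this example can be reproduced with a short sketch; the variable names are illustrative, and the month length of 365.25/12 days follows the text.

```python
deaths = 3
exposure_months = list(range(1, 11))  # 1, 2, ..., 10 months of follow-up
total_months = sum(exposure_months)   # 55 person-months

# Monthly bands: each band lasts 1/12 of a year.
band_years_monthly = 1 / 12
rate_monthly_bands = deaths / (total_months * band_years_monthly)

# Daily bands: each band lasts 1/365.25 of a year.
days_per_month = 365.25 / 12
total_days = total_months * days_per_month
band_years_daily = 1 / 365.25
rate_daily_bands = deaths / (total_days * band_years_daily)

print(f"number of daily bands: {total_days:.0f}")
print(f"rate with monthly bands: {rate_monthly_bands:.3f} per person-year")
print(f"rate with daily bands:   {rate_daily_bands:.3f} per person-year")
```

Both calculations divide the same number of deaths by the same total person-time, so only the bookkeeping changes, not the estimate.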



   Models for time-to-event data
 
Survival data
In many clinical studies, the main outcome under assessment is the time to an event of interest. This time is called survival time, although it may be applied to the time ‘survived’ from complete remission to disease relapse or progression as equally as to the time from diagnosis to death. For appropriate outcome measurement and analysis in survival studies it is especially important to define precisely the event and when the period of observation starts and finishes. For example, in studies of survival post-myocardial infarction, time is recorded from a starting point (time zero, e.g. the date of diagnosis of myocardial infarction), and the observation continues for each subject until either a recurrent fatal or non-fatal event occurs, the study ends, or further observation becomes impossible.

Key requirements for survival analysis
A critical aspect of survival analysis is that some individuals have not had the event of interest at the end of follow-up. Therefore, their true time to event remains unknown. This phenomenon is called censoring. It may arise because a patient (a) has not (yet) experienced the outcome event by the study close date; (b) is lost to follow-up during the study period (e.g. due to transfer to another centre or for consent withdrawal) or (c) experiences another (competing) event that makes further follow-up impossible (e.g. heart transplantation, a new health problem or even a car accident). Censored observations are those who survived at least as long as they remained in the study but for whom their actual event-free survival times are not known exactly. Such right-censored survival times underestimate the true (but unknown) time to event. If the event occurred in all individuals, other methods of analysis would be applicable. However the presence of censoring and distribution of the failure times make survival analysis necessary for time-to-event data [6,7].

The analytical tools used to study survival data assume that if censoring occurs it occurs randomly and is unrelated to the reason for failure (independent censoring principle). In practical terms, this means that censoring must carry no prognostic information about the subsequent survival experience. This uninformative censoring assumption would be violated if subjects were highly likely to leave the study just prior to failure or if dropout rates differed between groups. Other key requirements for a valid survival study are a follow-up duration chosen according to disease severity (sufficient to capture enough events), a homogeneous cohort effect on survival (similar survival probabilities for subjects recruited early and late in the study) and independence of the failure times (absence of correlation in the data).

Functions of time-to-event data
Survival data are generally described and modelled in terms of three related functions, namely the survivor, the hazard and the cumulative hazard functions. They are different functions of the LP meant to summarize the information on the outcome components described above (time zero, end date and censor status) in one response variable. The survival probability (cumulative survival probability or survivor function) is the probability (from 1 at t = 0 to 0 as time goes to infinity) that an individual survives from time zero up to a specified future time t (observation end). Survival probabilities at different times provide essential summary information from time-to-event data. For example, a survivor function of 0.85 at 2.5 years indicates that 85% of the subjects (observed from t = 0) are still event free at 2.5 years (risk of 0.15 at 2.5 years).

The hazard is the instantaneous probability that an individual who is under observation at time t has an event at that time. The hazard is therefore a rate, i.e. a probability per unit time evaluated over a very small time interval. Put another way, it gives the instantaneous potential for the event to occur, given that the subject has survived up to that instant (conditional rate). In contrast to the survivor function, which can only decrease over time, the hazard function can remain constant or vary with different shapes over time. The hazard is like a speed, with the risk of failure over time instead of distance covered over time, and may assume different values over time (from 0 to +∞) independent of the average value calculated in an interval.

Despite the defined relationship between survival and hazard functions, estimation of the hazard is not simple. Another quantity, the cumulative hazard, is calculated instead as an intermediary measure for estimating the hazard. The cumulative hazard at t is the integral of the hazard (area under the hazard function between times 0 and t). To understand the concept it is useful to go back to the speed example. If a person faces a hazard rate of death of 0.1 events per hour (a speed of 0.1 miles/h), then the cumulative hazard is such that, were that rate to continue for 2 days (the speed constantly at 0.1 mph), 4.8 failures would be expected to occur (4.8 miles travelled) in 2 days. Since an integral is indeed just a sum, a cumulative hazard is not unlike the total number of times the subject ‘would fail’ over the interval period (cumulative force of mortality).
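The speed analogy can be sketched numerically. The snippet below integrates a hazard function with the trapezoidal rule to obtain the cumulative hazard, and then converts it into a survival probability using the standard identity S(t) = exp(−H(t)); this identity is not spelled out in the text above and is stated here as background. The function name `cumulative_hazard` and the linearly increasing hazard are illustrative assumptions.

```python
import math


def cumulative_hazard(hazard, t_end, steps=10_000):
    """Trapezoidal approximation of the integral of hazard(t) from 0 to t_end."""
    dt = t_end / steps
    total = 0.0
    for i in range(steps):
        t0 = i * dt
        t1 = t0 + dt
        total += 0.5 * (hazard(t0) + hazard(t1)) * dt
    return total


def constant_hazard(t):
    """Constant hazard of 0.1 events per hour (the example in the text)."""
    return 0.1


def increasing_hazard(t):
    """Hypothetical hazard that grows linearly with time."""
    return 0.01 * t


H_48h = cumulative_hazard(constant_hazard, 48.0)     # 0.1 per hour over 2 days
H_linear = cumulative_hazard(increasing_hazard, 10.0)

print(f"cumulative hazard over 48 h at 0.1/h: {H_48h:.2f}")
print(f"survival probability at 48 h: {math.exp(-H_48h):.4f}")
print(f"cumulative hazard of the increasing hazard at t = 10: {H_linear:.2f}")
```

A cumulative hazard of 4.8 is not a probability: it can exceed 1, while the corresponding survival probability remains between 0 and 1.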

To compare hazards, survival functions or times across groups, there are different approaches more or less free from specific distributional assumptions about the hazard function (Table 4). Furthermore, some parametric models have an accelerated failure time metric, i.e. the estimated coefficients (the covariate effects) are interpretable as log-time ratios and some have both the proportional hazard and the log-time interpretation. The two interpretations are different. The proportional hazard metric focuses on the actual risk process (the hazard function) that causes failure and how the risk changes with the value of the covariates in the model. The accelerated failure time metric gives a more prominent role to time in the analysis (how the survival time changes with the value of the covariates in the model). Despite their clinical appeal, these parametric models remain underutilized in nephrology.


Table 4. Forms of survival analyses

 
Poisson regression can also be used to assess risks in survival studies. However, Poisson regression models rates, which are assumed to be low and constant (with variance equal to the mean). Poisson regression and other generalized linear models can be applied to model event counts over a specified time interval (events per person-year) using aggregated data or to simplify complex survival models [6,7].

Cox's model
Cox's model is by far the most commonly used survival procedure [8]. It is a semi-parametric model: no parametric form is specified for the baseline hazard function (the hazard when all covariates are equal to zero), and yet the effects of the covariates (inputs) are parameterized (i.e. modelled based on assumptions) as alterations of that baseline hazard. Cox's model makes estimation possible by assuming that the covariates multiplicatively shift the baseline hazard (Figures 3 and 4).


Fig. 3. Hazards proportionality. The individual hazard at time t (i.e. at any time during follow-up) given the exposure ‘X’ is a function of the baseline hazard (λ0) and the hazard ratio (HR) associated with each unit change of the input, i.e. λ(t) = λ0(t) × HR. This HR is estimated as ‘exp(β)’. As the estimated coefficients are constant, and constant differences on the log scale (LP1 − LP0) correspond to constant ratios on the exponential scale, the model assumes proportional hazards. If more inputs are in the model each HR is adjusted for the effect of all the other independent variables. See Appendix C for details.

 

Fig. 4. Stratified Cox's model. There can be groups with different baseline hazard λ0(t), such as race groups (e.g. race A and B). If the effect on survival of race A versus B is not of interest (does not need to be estimated) but needs to be controlled for, a stratification variable can be used to specify the model (e.g. with possible values A and B). LP remains the same (i.e. contains the same predictors, such as gender) but the model allows the basal risk to vary, i.e. λA0(t) ≠ λB0(t). The difference in LP (β) between two groups of subjects (e.g. men = 1 and women = 0) in terms of hazard (e.g. for cardiovascular event) is still constant (hazard proportionality), and is the same in both strata. However, as λA0(t) ≠ λB0(t), also λA(t) ≠ λB(t). See Appendix C for details.

 
Besides the ease of coefficient interpretation, freedom from distributional assumption is the greatest advantage of Cox's regression. The cost is a loss of efficiency (precision) since the parameters are estimated comparing subjects at the times when failures happen to occur whereas parametric models maximize the use of the information in the data.

As seen from previous models, differences in logs imply taking the exponential to interpret the meaning of the coefficients. The exponentiated coefficient represents the ratio of the hazards or HR between two levels or units of exposure.

Hazard ratios estimate the true risk ratios (RR) as ratios of instantaneous event rates. In fact, the HR is an instantaneous RR, i.e. the limiting value of the RR as time approaches zero. As time approaches zero, the risks also approach zero. However, the value of the HR is different from zero and approaches that of the true RR. In the same way, the incidence rate ratio is the limiting value of the RR as time approaches zero. Further details on survival analysis can be found in specific textbooks [6–8].



   Appendix A
 
Consider the data in Table 2.

The null model of myocardial infarction (ignoring the exposure x) is Logit(π) = log-odds = ln[π/(1 − π)] = LP = β0 = ln(40/60) = −0.405. The logistic function of β0 gives the unconditional risk estimate π = Logistic(LP) = 1/[1 + exp(−LP)] = 0.4.

The full model considering x is Logit(π|xi) = β0* + βx·xi, where:

1. The new intercept, β0* = −0.405 + ln(LR−) = −1.386, is the log-odds among the unexposed. In fact, post-test odds = pre-test odds × likelihood ratio, LR (post-test log-odds = pre-test log-odds + log-likelihood ratio). In the example, LR− = [c/(a + c)]/[d/(b + d)] = 0.375 and LR+ = [a/(a + c)]/[b/(b + d)] = 2.25.
2. The coefficient of the exposure (βx = 1.8) is the log-odds ratio (log-OR). In fact, when x = 0, the odds for disease are exp(−1.386 + 0) = 0.25 (as LP = β0* + 0); when x = 1 the odds are exp(−1.386 + 1.8) = 1.5 (as LP = β0* + βx). The corresponding logistic functions give the conditional risks π0 = 1/[1 + exp(1.386)] = 0.2 and π1 = 1/[1 + exp(1.386 − 1.8)] = 0.6. Therefore βx represents the difference in log-odds per unit change in the level of the exposure (log-OR) and the exponentiated βx gives the OR of exposed versus unexposed (per unit increase of x).
3. The model parameters are estimated using the maximum likelihood method [2]. Of note, the OR is a symmetric effect measure as the OR for disease by exposure level ([30/20]/[10/40]) is equivalent to the OR for exposure by disease status ([30/10]/[20/40]).
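The figures in this appendix can be reproduced with a short sketch. Table 2 is not shown here, so the cell counts below (a = 30 exposed with disease, b = 20 exposed without disease, c = 10 unexposed with disease, d = 40 unexposed without disease) are inferred from the ratios quoted above and should be read as an assumption. With a single binary exposure the maximum likelihood estimates coincide with the values computed directly from the table.

```python
import math

# Cell counts inferred from the ratios quoted in the appendix (assumption).
a, b, c, d = 30, 20, 10, 40


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def logistic(lp):
    """Inverse of the logit: probability corresponding to a linear predictor."""
    return 1.0 / (1.0 + math.exp(-lp))


# Null model: log-odds of disease ignoring the exposure, ln(40/60).
beta0_null = logit((a + c) / (a + b + c + d))

# Likelihood ratios of the exposure, read as a 'test' for the disease.
lr_neg = (c / (a + c)) / (d / (b + d))
lr_pos = (a / (a + c)) / (b / (b + d))

# Full model: intercept (log-odds among unexposed) and log-OR of the exposure.
beta0_full = beta0_null + math.log(lr_neg)
beta_x = math.log(lr_pos) - math.log(lr_neg)

# Symmetry of the OR: disease by exposure versus exposure by disease.
or_disease_by_exposure = (a / b) / (c / d)
or_exposure_by_disease = (a / c) / (b / d)

print(f"beta0 (null) = {beta0_null:.3f}, risk = {logistic(beta0_null):.2f}")
print(f"LR- = {lr_neg:.3f}, LR+ = {lr_pos:.2f}")
print(f"beta0* = {beta0_full:.3f}, beta_x = {beta_x:.3f}")
print(f"risk unexposed = {logistic(beta0_full):.2f}, "
      f"risk exposed = {logistic(beta0_full + beta_x):.2f}")
print(f"OR = {or_disease_by_exposure:.1f} = {or_exposure_by_disease:.1f}")
```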



   Appendix B
 
Consider the data in Table 3.

The expected rate per person per unit time, ignoring the exposure, is the exponentiated coefficient (intercept) of the null model ln(λ) = LP = β0. Using the inverse function, the unconditional event rate is λ = exp(β0) = 0.01410096 per person-year.

To estimate the effect of gender on mortality, the covariate xi indicating male (i = 1) or female gender (i = 0) is introduced into the model. The pieces of information needed for this model are ni, the group exposure time, and di, the number of deaths per group. The expected number of deaths is E(d1|x1) = n1λ1 in males and E(d0|x0) = n0λ0 in females. Since the incidence rate ratio is IRR = λ1/λ0, then ln[E(d1|x1)] = ln[λ1] + ln[n1], and ln[E(d1|x1)] = ln[n1] + ln[IRR] + ln[λ0].

Renaming ln[IRR] = βx and ln[λ0] = β0, the Poisson model is ln[E(d1|x1)] = ln[n1] + β0 + βx, and more generally ln[E(di|xi)] = ln[ni] + β0 + xiβx.

Therefore, the meaning of the coefficient βx is βx = ln[IRR] = ln[λ1] − ln[λ0] and the IRR is estimated as exp(βx). The model parameters are estimated using the maximum likelihood method [2].
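The derivation can be illustrated with a minimal sketch. Table 3 is not reproduced here, so the death counts and person-years below are hypothetical. With one binary covariate and two groups, the maximum likelihood estimates of the rates are simply the observed deaths divided by the person-time in each group, so no iterative fitting is needed for this special case.

```python
import math

# Hypothetical aggregated data (not the data of Table 3).
deaths = {0: 20, 1: 45}                 # d0 (women), d1 (men)
person_years = {0: 2000.0, 1: 2500.0}   # n0, n1

# Observed rates per person-year in each group.
rate = {g: deaths[g] / person_years[g] for g in deaths}

# Coefficients of the Poisson model ln[E(d)] = ln[n] + beta0 + x * beta_x.
beta0 = math.log(rate[0])             # log baseline rate (women)
beta_x = math.log(rate[1] / rate[0])  # log incidence rate ratio
irr = math.exp(beta_x)


def expected_deaths(x, n):
    """Expected count for covariate value x and exposure time n (the offset)."""
    return math.exp(math.log(n) + beta0 + x * beta_x)


print(f"rate in women = {rate[0]:.4f}, rate in men = {rate[1]:.4f} per person-year")
print(f"beta0 = {beta0:.3f}, beta_x = {beta_x:.3f}, IRR = {irr:.2f}")
print(f"expected deaths in men = {expected_deaths(1, person_years[1]):.1f}")
```

The term ln[n] enters the linear predictor with a coefficient fixed at 1, which is why the model describes rates rather than raw counts.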



   Appendix C
 
Consider Figures 3 and 4.

The individual hazard at time t given the exposure X (x1, x2, x3, ..., xq) is λ(t|X) = λ0(t) × exp(LP), where exp(LP) = HR, the hazard ratio. This means that the hazard experienced at any time during follow-up, λ(t), depends on the basal hazard λ0(t) and the HR. In fact, the ratio λ(t|x1)/λ(t|x0) is [λ0(t) × exp(LP1)]/[λ0(t) × exp(LP0)] = exp(LP1 − LP0), which, taking the logs, is simply the difference LP1 − LP0. As this difference remains constant over time (i.e. it does not involve t), the hazards are proportional on the exponential scale (equidistant on the log scale).

This is also true in the stratified model λk(t|X) = λk0(t) × exp(LP), where ‘k’ indicates the stratum.
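A minimal numerical sketch of this property follows. The two baseline hazard functions and the coefficient value are hypothetical and chosen only to show that, whatever the shape of the baseline hazard, the ratio of hazards between two covariate levels equals exp(β) at every time and in every stratum.

```python
import math

BETA = 0.5  # hypothetical log hazard ratio (e.g. men = 1 versus women = 0)


def baseline_a(t):
    """Hypothetical baseline hazard in stratum A (increasing over time)."""
    return 0.02 + 0.01 * t


def baseline_b(t):
    """Hypothetical baseline hazard in stratum B (decreasing over time)."""
    return 0.05 * math.exp(-0.1 * t)


def hazard(baseline, t, x):
    """Proportional hazards form: baseline hazard times exp(LP), with LP = BETA * x."""
    return baseline(t) * math.exp(BETA * x)


times = [0.5, 1.0, 2.0, 5.0, 10.0]
ratios_a = [hazard(baseline_a, t, 1) / hazard(baseline_a, t, 0) for t in times]
ratios_b = [hazard(baseline_b, t, 1) / hazard(baseline_b, t, 0) for t in times]

print(f"exp(beta) = {math.exp(BETA):.4f}")
print("hazard ratios in stratum A:", [round(r, 4) for r in ratios_a])
print("hazard ratios in stratum B:", [round(r, 4) for r in ratios_b])
```

The hazards themselves differ between strata at any given time, but the hazard ratio does not, which is the situation depicted in Figure 4.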



   Acknowledgements.
 
P.R. held a young investigator award from the Italian Society of Nephrology for the year 2005–2006 and received funding from the EU (Marie Curie Actions-OIF, proposal #021676) for the year 2006–2007.

Conflict of interest statement. None to declare.



   References
 

  1. Heckbert SR, Post W, Pearson GD, et al. Traditional cardiovascular risk factors in relation to left ventricular mass, volume, and systolic function by cardiac magnetic resonance imaging: the Multiethnic Study of Atherosclerosis. J Am Coll Cardiol (2006) 48:2285–2292.
  2. Ravani P, Parfrey P, Gadag V, et al. Clinical research of kidney diseases III: principles of regression and modeling. Nephrol Dial Transplant (2007) 22:3422–3430.
  3. Glantz SA, Slinker BK. A Primer of Applied Regression and Analysis of Variance. (2001) 2nd edn. New York: McGraw-Hill.
  4. Hosmer DW, Lemeshow LS. Applied Logistic Regression. (2000) 2nd edn. New York: Wiley.
  5. Kleinbaum DG, Kupper LL, Muller KE, et al. Poisson regression analysis. In: Applied Regression Analysis and Multivariable Methods. (1997) North Scituate, MA: Duxbury Press. 687–710.
  6. Hosmer DW, Lemeshow LS. Applied Survival Analysis, Regression Modelling of Time to Event Data. (1999) New York: Wiley.
  7. Kleinbaum DG. Survival Analysis, a Self-Learning Text. (2005) New York: Springer.
  8. Cox DR. Regression models and life-tables. J R Stat Soc B (1972) 34:187–220.
  9. Ravani P, Tripepi G, Malberti F, et al. Asymmetrical dimethylarginine predicts progression to dialysis and death in patients with chronic kidney disease: a competing risks modeling approach. J Am Soc Nephrol (2005) 16:2449–2455.
  10. Heine GH, Reichart B, Ulrich C, et al. Do ultrasound renal resistance indices reflect systemic rather than renal vascular damage in chronic kidney disease? Nephrol Dial Transplant (2007) 22:163–170.
  11. Ravani P, Parfrey PS, Dicks E, et al. Clinical research of kidney diseases II: problems of study design. Nephrol Dial Transplant (2007) 22:2785–2794.
  12. Merten GJ, Burgess WP, Gray LV, et al. Prevention of contrast-induced nephropathy with sodium bicarbonate: a randomized controlled trial. JAMA (2004) 291:2328–2334.
  13. Hosmer DW, Lemeshow LS. Introduction to logistic regression model. In: Applied Logistic Regression. (2000) 2nd edn. New York: Wiley. 1–30.
  14. Ravani P, Parfrey PS, Curtis B, et al. Clinical research of kidney diseases I: researchable questions and valid answers. Nephrol Dial Transplant (2007) 22:2459–2468.
  15. Rothman KJ. Measuring disease occurrence and causal effects. In: Epidemiology, an Introduction. Rothman KJ, ed. (2002) New York: Oxford University Press. 24–56.
  16. Dupont WD. Introduction to Poisson regression: inferences on morbidity and mortality rates. In: A simple introduction to the analysis of complex data. (2002) New York: Cambridge University Press. 269–294.
Received for publication: 21.10.07
Accepted in revised form: 19.11.07

