NDT Advance Access published online on September 22, 2007
Nephrology Dialysis Transplantation, doi:10.1093/ndt/gfm560
Prediction of early progression in recently diagnosed IgA nephropathy
1Division of Nephrology, 2Department of Pathology, Stanford University School of Medicine and 3Department of Statistics, Stanford University, Stanford, CA 94305, USA
Correspondence and offprint requests to: Kevin V. Lemley, MD, PhD, Division of Nephrology, MS#40, Children's Hospital Los Angeles, 4650 Sunset Boulevard, Los Angeles, CA 90027, USA. Email: klemley{at}chla.usc.edu
| Abstract |
|---|
Background. Most studies of prognosis in IgA nephropathy (IgAN) have tried to predict dichotomous outcomes based on a small number of clinical or semi-quantitative histological variables in large numbers of patients.
Methods. We pursued a quite different approach. We measured GFR annually for 4–5 years in 22 adult patients with recently diagnosed IgAN. Quantitative morphology was performed on the diagnostic biopsy specimens and baseline glomerular filtration dynamics were assessed at study entry. An initial set of 30 plausible predictor variables (half demographic or physiological, half structural) was reduced to 22 using phylogenetic trees. Least-angle regression (LARS) was used to predict the rate of GFR change from these variables.
Results. The rate of GFR change ranged from a loss of 41 ml/min/year to a gain of 8.6 ml/min/year. We found an optimum predictor set of five baseline variables: the percentage of glomeruli with global sclerosis, the fractional interstitial area, the serum creatinine, the average tuft volume of non-sclerotic glomeruli and the renal plasma flow.
Conclusions. The strong predictive relationship of the three structural variables with the slope of GFR in our subjects suggests that even at the time of their initial diagnosis many patients with IgAN already manifest a remnant kidney phenomenon. The distinctive pathophysiological insights derived from this study suggest some of the advantages of intense quantitative investigations applied to a small number of subjects.
Keywords: GFR; glomerular sclerosis; IgA nephropathy; least-angle regression; morphometry; progression
| Introduction |
|---|
IgA nephropathy (IgAN) is the most common form of glomerular nephritis among children and young adults [1,2]. The disease is clinically heterogeneous and slowly progressive in most cases, leading to kidney failure in 15–20% of patients over 10 years and in 20–30% over 20 years.
The slow rate of progression of IgAN makes studies of the disease and its response to treatment difficult. Since some forms of therapy (e.g. long-term steroids) are associated with significant morbidity, it would be useful to be able to predict the prognosis of individual patients, so as to avoid exposing those with a good prognosis to unnecessarily aggressive therapy. In addition, selection of patients for clinical trials would be enhanced if individuals with a high likelihood of long-term stable renal function could be effectively excluded.
Numerous attempts have been made to define factors that identify patients likely to have poor outcomes. D'Amico [3,4] identified azotemia, heavy proteinuria and the presence of glomerular sclerosis and tubulointerstitial fibrosis at presentation as predictors of loss of renal function. Fofi and colleagues [5] have also described increased blood pressure, increased age and the presence of tubulointerstitial lesions as predictors of impaired renal function. In a study by Alamartine and coworkers [6], a Cox regression model indicated only a global pathology score, initial proteinuria, initial hypertension and HLA-B35 antigen as predictors of chronic renal failure. Multivariate prediction analyses from different studies have generally been based on a small number of clinical or semi-quantitative histological variables. It is possible that use of more detailed physiological and histological variables might yield both better predictors and more insight into the primary mechanisms responsible for progression. For example, relationships between podocyte number per glomerulus and both the rate of progression [7] and the severity of glomerular sclerosis [8] have been described in studies of IgAN. Quantitative histological features such as podocyte number have not in general been included in previous predictive studies.
Dichotomous survival analyses require long observation times in order to determine progression because of the nature of the measures of changes in renal function used, mostly doubling of serum creatinine or the development of end-stage kidney failure. An alternative approach is the use of serial measurements of true GFR to estimate the rate of change of GFR as a continuous outcome variable.
Experimental designs in which the number of predictors is large compared to the total number of outcomes run the risk of overfitting of the statistical model. Overfitted models tend to reflect the idiosyncratic (noisy) structure of the data sets from which they were developed, to the detriment of their ability to capture the true pathophysiological relationships present within the underlying population. The present study therefore uses the novel statistical methodology of least-angle regression (LARS [9]) to predict the rate of loss of GFR from an expanded set of quantitative structural, demographic and physiological predictors measured at baseline. The LARS method is robust against problems of overfitting that may occur with standard multivariate methods, including the Cox proportional hazards model.
Our long-term goal is to characterize the principal pathophysiological factors responsible for disease progression in IgAN by determining important predictors of outcome. The current study investigates whether the use of measured GFR, a large set of quantitative predictors and a relatively small number of subjects with IgAN can yield a reasonable predictive model of disease progression over the short to intermediate term.
| Subjects and methods |
|---|
Patient population
The study subjects were 22 adult patients (11 men, 11 women) from 23 to 61 years of age diagnosed with IgAN by biopsy within 2.5 years (median 6 months) of enrollment in the study. The subjects were enrolled in an on-going study of IgAN after referral to Stanford University Medical Center. Self-described ethnicity was Hispanic in three, Asian in seven and Caucasian in 12 subjects. Clinical findings consistent with IgAN were first noted at a median of 17 months prior to biopsy. Other causes of IgA-positive glomerular staining, such as Henoch–Schönlein nephritis, lupus nephritis or chronic active hepatitis, were excluded. Each patient was put on renoprotective therapy with an angiotensin-converting enzyme inhibitor. Because of the development of a cough, three subjects were later switched to an angiotensin receptor blocker. Six patients were already taking fish oil supplements prescribed by their treating physicians at the time of their physiological studies. This therapy was continued throughout the study in these six individuals.
Control values for the structural analyses came from 17 normal healthy kidney donors who had renal biopsies taken at the time of organ donation. Control values for the renal clearance studies were derived from 117 healthy volunteers with no history of renal disease, hypertension or diabetes. The structural control subjects ranged from 23 to 55 years of age; physiological controls were between 23 and 60 years.
Patients and physiological controls were admitted to the General Clinical Research Center at Stanford University Hospital for their clearance studies. The study protocol was approved by the Stanford University Administrative Panel on Human Subjects in Medical Research. Written informed consent was obtained from all participants.
Study protocol
The IgAN patients underwent annual clearance studies (median number 6) over 4–5 years. Two subjects had a shorter period of follow-up (8 and 20 months) due to rapid progression to kidney failure.
Glomerular function
Glomerular filtration dynamics were examined using urinary clearances of inulin and para-aminohippurate (PAH) as previously described [10]. The renal plasma flow rate was derived by dividing the PAH clearance by the estimated arteriovenous extraction ratio for PAH (0.7 in those patients with a GFR <80 ml/min/1.73 m² and 0.8 in patients with a GFR >80 ml/min/1.73 m² [11]; for the healthy controls a ratio of 0.9 was used). Urine and plasma PAH and inulin concentrations were determined with an autoanalyzer technique [11]. Serum creatinine concentration was determined using a Beckman Creatinine Analyzer (Model 2; Beckman, Brea, CA). Plasma oncotic pressure (πA) was determined by membrane osmometry [12]. Urine and plasma concentrations of albumin were determined by nephelometry (Array 360; Beckman, Brea, CA).
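The renal plasma flow calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function and parameter names are hypothetical, and treating a GFR of exactly 80 ml/min/1.73 m² as belonging to the higher band is an assumption, since the text does not specify that boundary.

```python
def renal_plasma_flow(pah_clearance, gfr, healthy_control=False):
    """Estimate renal plasma flow (ml/min) from PAH clearance (ml/min).

    Extraction ratios follow the text: 0.9 for healthy controls,
    0.7 for patients with GFR < 80 and 0.8 for patients with GFR > 80
    (ml/min/1.73 m^2). GFR == 80 is assigned 0.8 here (assumption).
    """
    if healthy_control:
        extraction_ratio = 0.9
    elif gfr < 80:
        extraction_ratio = 0.7
    else:
        extraction_ratio = 0.8
    return pah_clearance / extraction_ratio


# Hypothetical patient: PAH clearance 420 ml/min, GFR 60 ml/min/1.73 m^2
rpf = renal_plasma_flow(420.0, 60.0)
```

Because the extraction ratio is below 1, the estimated plasma flow always exceeds the measured PAH clearance.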
Renal structure
The initial (diagnostic) renal biopsy was performed a median of 6 months (range 1–30 months) prior to the first clearance study. Biopsies were performed and tissue processed as previously described [10]. For one biopsy, performed at an outside institution, summary data were obtained from the pathology report.
The volume of glomeruli was estimated from the planar area of the glomerular tuft profiles using the equation [13]:
|
| (1) |
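For orientation, a widely used relation of this kind is the Weibel–Gomez formula. The form and the symbols below are assumptions for illustration and are not confirmed as the exact expression used in this study: $\bar{A}_G$ is the mean planar area of the tuft profiles, $\beta$ is a shape coefficient (1.38 for spheres) and $d$ is a size-distribution coefficient (commonly taken as 1.01 or 1.1).

```latex
V_G = \frac{\beta}{d}\,\left(\bar{A}_G\right)^{3/2}
```

The exponent of 3/2 converts an area into a volume; the ratio $\beta/d$ corrects for glomerular shape and for variation in glomerular size.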
For transmission electron microscopy, photomontages were made of two complete glomerular profiles (magnification ×2880) to determine the filtration surface area density, mesangial volume density, and podocyte and endocapillary cell densities using standard methods [10,13]. The density of endocapillary cells was estimated rather than that of mesangial and endothelial cells separately, since a clear distinction between these two cell types is not always possible in single sections. Higher-magnification prints (×11280) were used for the determination of the thickness of the glomerular basement membrane by the orthogonal intercept method of Jensen et al. [16], as well as the filtration slit frequency (the number of slits per millimeter of basement membrane length) [10]. Adequate material for ultrastructural analysis was not available for three subjects.
A mathematical model of glomerular filtration was used to estimate the total ultrafiltration capacity (Kf) of the two kidneys [17]. The model uses an assumed value for the average glomerular transcapillary hydraulic pressure difference of 40 mmHg, as this cannot be measured directly in humans [18]. The calculated Kf for this value is denoted as Kf-40. To the degree that the pressure differences exceed 40 mmHg, the model will overestimate Kf. Given the mild degree of our subjects' hypertension, it seems likely that the actual pressure difference would be fairly close to 40 mmHg. The single-nephron ultrafiltration coefficient (SNKf) was calculated as the product of the intrinsic hydraulic permeability of the glomerular capillary wall and the total anatomic capillary surface area per glomerulus available for filtration obtained from the morphometric studies. The filtration surface area per glomerulus was corrected for the effects of paraffin embedding and for the decreased glomerular dimensions resulting from immersion fixation [14]. The hydraulic permeability was estimated using the hydrodynamic model of Drumond and Deen [19] as previously modified for use in humans [20]. An estimate of the total number of filtering glomeruli was made by dividing the two-kidney Kf-40 by the SNKf [17].
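The two ratios in this paragraph can be written out as a short sketch. The function names are hypothetical, and the code assumes the caller supplies Kf-40 and SNKf in the same units; the study's actual units and corrections are not restated here.

```python
def single_nephron_kf(hydraulic_permeability, filtration_surface_area):
    """SNKf as the product of the capillary-wall hydraulic permeability
    and the filtration surface area per glomerulus."""
    return hydraulic_permeability * filtration_surface_area


def estimate_glomerular_number(two_kidney_kf40, snkf):
    """Number of filtering glomeruli = two-kidney Kf-40 / SNKf.

    Both arguments must be in the same units (assumption of this sketch).
    """
    return two_kidney_kf40 / snkf


# Hypothetical values chosen only to show the arithmetic
snkf = single_nephron_kf(2.0, 3.0)
n_glomeruli = estimate_glomerular_number(6.0e6, snkf)
```

A low Kf-40 with a normal SNKf therefore yields a low estimated number of filtering glomeruli.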
Statistics
Results are reported as mean ± SD or as median (range). Group differences between the baseline physiological values for IgAN subjects and controls were assessed by ANCOVA, using age as a covariate, since the age-distributions of the groups differed. Group differences in structural variables were assessed by unpaired t-test or Mann–Whitney test. Bifurcation of the IgAN subjects into two groups based on their progression rates was done using fuzzy clustering (NCSS, Kaysville UT), a method that allows a less extreme separation than the standard k-means procedure, by allowing the data points to participate to some extent in multiple clusters. This division with regard to rates of progression is not intended to define what constitutes clinically relevant progression, but rather to divide the subjects into two groups based on the natural structure of the data set itself. The clustering algorithm divided the subjects into those with rates of GFR decline greater than the average for the entire group (Group 1) and those with rates of decline less than the average (Group 2).
Our principal goal was to predict the rate of change of GFR during follow-up using a combination of demographic, functional and structural measurements made at baseline. The rate of change of GFR over time was determined by least squares linear regression of the absolute GFR (ml/min) on time. Absolute GFR was used to eliminate the influence of any changes in body weight during follow-up. In four cases, patients progressed to kidney failure during follow-up. In these patients, the value of 5 ml/min was imputed as the final GFR [21]. The slope was expressed as the negative of the regression line slope (ml/min/1000 days). Thus, slopes associated with a loss of GFR were positive. In order to avoid the distorting effects of extreme values, slopes were truncated to the interval [–20, 30]. Three slopes outside this range were truncated. From the potential list of over 40 measured variables, we selected a subset of 30 putative predictors, of which 15 were functional or demographic and 15 were quantitative structural variables. These were chosen on pathophysiological grounds and in order to include factors previously described to predict progression of IgAN. The percentage of glomeruli with crescents was not included in the prediction, as it did not differ between Group 1 and Group 2. Using a phylogenetic tree approach, eight variables (four functional/demographic and four structural) with high correlations to other variables were eliminated. The final analysis was performed using 22 potential predictors (Table 1). Some of the potential predictors are not independent. For example, the estimated number of glomeruli is the ratio Kf-40/SNKf, and the filtration fraction is the ratio GFR/RPF. However, our method of analysis is not strongly impacted by this.
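The slope calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name and the example data are hypothetical, and the caller is assumed to have already appended the imputed 5 ml/min value for patients who reached kidney failure.

```python
def gfr_loss_rate(days, gfr, lower=-20.0, upper=30.0):
    """Rate of GFR loss in ml/min/1000 days.

    Fits GFR (ml/min) against time (days) by ordinary least squares,
    negates the slope so that a loss is positive, rescales to
    1000 days and truncates the result to [lower, upper].
    """
    n = len(days)
    mean_t = sum(days) / n
    mean_g = sum(gfr) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (g - mean_g) for t, g in zip(days, gfr))
    slope_per_day = sxy / sxx
    loss_per_1000_days = -slope_per_day * 1000.0
    return max(lower, min(upper, loss_per_1000_days))


# Hypothetical subject losing 5 ml/min per year over two years
rate = gfr_loss_rate([0, 365, 730], [100.0, 95.0, 90.0])
```

Truncation keeps a few very steep slopes from dominating the regression while preserving their rank as the most extreme values.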
Using standard multilinear regression methods for 22 predictor variables and 22 outcome measures runs a very high risk of overfitting. This is the reason we used least angle regression (LARS) for our analysis [9]. The LARS approach is an automated forward-selection type algorithm that protects against overfitting of data (see appendix).
The quality of predictions can be assessed by examining the correlation coefficient of the predicted with the actual GFR slopes. Since the model is assessed on its fit to the observed data set on which it was developed, however, this will generally yield an overly optimistic assessment of its performance [22]. Some form of cross-validation is needed for the error estimate. To achieve this, we performed 100 bootstrap replications [23] of the LARS procedure (see appendix).
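One common way to obtain such a corrected estimate is the optimism bootstrap, sketched below. Two assumptions should be read loudly: the text does not state which bootstrap correction was used, and a one-variable least-squares fit stands in for the LARS procedure to keep the example short. All names and data are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / (saa * sbb) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for the real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return slope, my - slope * mx


def bootstrap_corrected_correlation(x, y, n_boot=100, seed=0):
    """Return (apparent, optimism-corrected) predicted-vs-observed correlation."""
    rng = random.Random(seed)
    n = len(x)
    slope, intercept = fit_line(x, y)
    apparent = pearson([intercept + slope * u for u in x], y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # cannot fit a line to a single x value
        s, c = fit_line(xb, yb)
        on_boot = pearson([c + s * u for u in xb], yb)  # fit seen by the model
        on_orig = pearson([c + s * u for u in x], y)    # fit on original data
        optimism.append(on_boot - on_orig)
    if not optimism:
        return apparent, apparent
    return apparent, apparent - sum(optimism) / len(optimism)
```

Each replicate refits the model on a resampled data set and compares its performance there with its performance on the original data; the average gap estimates how optimistic the apparent correlation is.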
Another means to assess the quality of our model prediction is based on proper assignment of individual patients into Groups 1 or 2, corresponding to above-average and below-average rates of loss of GFR, respectively. We determined the ability of the LARS-based model to predict whether or not the rate of loss of GFR would exceed 15 ml/min/1000 days, the cut-off value indicated by fuzzy clustering. As above, we used bootstrap replications to estimate a corrected error rate (see appendix). Since the particular value chosen to divide the subjects into two groups is somewhat arbitrary, we assessed the sensitivity of the predictive accuracy to the particular cut-off value used by estimating the error rate of assignment for GFR changes greater than 10, 12, 14, 16, 18 and 20 ml/min/1000 days.
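The uncorrected assignment error at a given cut-off can be computed as in the sketch below. The function name and the example values are hypothetical, and treating a slope exactly equal to the cut-off as "not exceeding" it is an assumption.

```python
def assignment_error_rate(observed, predicted, cutoff):
    """Fraction of subjects whose predicted GFR-loss rate falls on the
    other side of the cut-off (ml/min/1000 days) from the observed rate."""
    errors = sum(
        (obs > cutoff) != (pred > cutoff)
        for obs, pred in zip(observed, predicted)
    )
    return errors / len(observed)


# Hypothetical observed and predicted slopes for five subjects
observed = [30.0, 20.0, 5.0, -3.0, 12.0]
predicted = [25.0, 13.0, 8.0, 0.0, 16.0]
rates = {c: assignment_error_rate(observed, predicted, c) for c in (10, 15, 20)}
```

Evaluating the same function over several cut-offs gives the sensitivity analysis described in the text; two errors among 22 subjects, for example, correspond to 2/22, or about 9%.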
We also compared the results of the LARS procedure with that of standard stepwise multilinear regression (S-PLUS, Insightful Corp., Seattle, WA).
| Results |
|---|
Baseline results
Our subjects reflected the typical clinical features of IgAN: variably decreased GFR, moderate arterial hypertension and moderate proteinuria (Table 2). Their diagnostic biopsies showed segmental and global glomerular sclerosis, expansion of the glomerular mesangium and interstitial fibrosis (Table 3). Most of the potential physiological predictor variables at baseline differed significantly from control values (Table 2). The potential structural predictor variables at baseline also differed from control values in most cases (Table 3). Only the number of podocytes (P = 0.10), the filtration slit frequency and the single-nephron ultrafiltration coefficient (SNKf) did not differ between the subjects with IgAN and controls. Crescents were present in four of eight subjects in Group 1 (in 3–7% of glomeruli) and in 5 of 14 subjects in Group 2 (in 4–28% of glomeruli). The discrepant depression of the 2-kidney ultrafiltration coefficient (Table 2) given the normal SNKf (Table 3) suggests functional glomerulopenia in the subjects with IgAN.
Change in GFR with time
The average rate of decrease in GFR was 14.5 ml/min/1000 days (42.7 ± 39.2 in Group 1 and –0.5 ± 12.0 in Group 2). Three outlying values (less than –20 and greater than 30) were truncated to this range for the statistical analysis. Linear regression coefficients for the fits to the serial GFR measurements averaged 0.61. Many of those with lower regression coefficients (<0.6) had only very small changes over time in GFR and were actually well approximated by linear fits (Figure 1). Excluding a small number of apparent outliers from the regression lines did not change the results of the LARS prediction.
Four of the subjects developed end-stage kidney failure during the 4–6 years of follow-up, two of these within 2 years. Linear regression fits to the GFR data are shown in Figure 2.
Predictors of change in GFR with time
The LARS procedure reached a minimum estimated true predictive error Cp (maximum adjusted R2) with five variables: the frequency of global sclerosis (g1), the fractional interstitial area (FIA), the serum creatinine, the average volume of patent glomerular tufts (VG) and the renal plasma flow (RPF). The adjusted R2 with these five predictors was 0.60. The variable g1 had substantial correlation with the other four predictors, so the LARS procedure was repeated with g1 left out of the possible predictor set. In this case, the maximum adjusted R2 was reached after the addition of six variables: FIA, serum creatinine, fractional clearance of albumin, average volume of patent glomerular tufts, RPF and GFR. Since the predictive capability of this model was no better than the original five-variable model including the frequency of global sclerosis, the original model was used in the remaining analyses.
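For reference, the two criteria named above have standard textbook definitions. The notation is an assumption introduced for illustration: $n$ is the number of observations, $p$ the number of predictors in the model, $\mathrm{RSS}_p$ its residual sum of squares and $\hat{\sigma}^2$ an estimate of the error variance. How these quantities were computed in this study is not detailed here.

```latex
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p,
\qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Both criteria penalize added predictors, so $C_p$ stops falling and adjusted $R^2$ stops rising once an extra variable no longer reduces the residual error enough to offset its cost.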
The uncorrected correlation coefficient between the predicted and the observed values of GFR slope was 0.89. Using 100 bootstrap repetitions of the LARS procedure to estimate a more realistic correlation between modeled and observed values yielded an average correlation of 0.78, with a standard error of 0.01. This corresponds to an R2 of 0.61, close to the maximum adjusted R2 (0.60) of the original five-variable LARS model.
The bootstrap repetitions of the LARS procedure (Table 4) show a clear tendency for the frequency of global sclerosis and the fractional interstitial area to be the most important predictors of GFR change, while the serum creatinine and glomerular tuft volume cluster in the second rank (orders 2 through 4) and renal plasma flow appears most often in the next rank order (4 through 7). Initial GFR occurred in all seven positions, with no clear concentration in any particular rank.
We also assessed the quality of the prediction in terms of proper dichotomous classification of individuals as belonging to either Group 1 (above average GFR loss) or Group 2 (below average GFR loss). Fuzzy clustering suggested a cut-off value to divide the rates of progression as a change in GFR of 14.5 ml/min/1000 days. Using this cut-off, the LARS procedure made two assignment errors out of 22 individuals, a 9% error rate. Bootstrap correction for the effects of overfitting gave a revised error rate of 17%. The corrected error rate varied only modestly (12.7–22.2%) for cut-off values ranging from 10 to 20 ml/min/1000 days. The baseline values of the predictor variables selected by LARS, either in the primary or bootstrap iterations, for Group 1 (n = 8) and Group 2 (n = 14) are given in Table 5.
Using standard stepwise multilinear regression on the same data set, we found a similar set of predictors with one notable exception: serum creatinine was not among the first 10 variables in the stepwise model. The first five variables entered in the stepwise regression were the frequency of global sclerosis (g1), the fractional interstitial area (FIA), renal plasma flow (RPF), the volume of patent glomerular tufts (VG) and the number of podocytes per glomerulus. Podocyte number entered the LARS model most often as the sixth variable in the 100 bootstrap repetitions (Table 4).
| Discussion |
|---|
IgAN is the most common primary glomerular nephritis worldwide among children and young adults [1,2]. Our subjects reflected the typical clinical features of IgAN: variably decreased GFR, arterial hypertension and moderate proteinuria (Table 2). Their diagnostic biopsies showed segmental and global glomerular sclerosis, expansion of the glomerular mesangium and interstitial fibrosis (Table 3).
Numerous attempts have been made to define those clinical and pathological findings that predict poor outcomes in IgAN patients. Most models have been based on a relatively small number of clinical or semi-quantitative histological variables often in a large number of patients. Most outcome studies have also been based on dichotomous survival analyses (for example, the Cox proportional hazards model) that require relatively long observation times in order to determine progression. The proportional hazards model is also subject to problems of overfitting [24].
In the current study, we used the rate of change of GFR derived from serial determinations of inulin clearance as our outcome measure. The rate of change of GFR has the advantage of requiring a shorter follow-up period to establish than dichotomous outcomes. As it reflects the rate of progression, it may also be more informative with regard to stage-specific mechanisms of progression. We also studied an extensive set of possible predictors, including a large set of quantitative structural (morphometric) variables—such as glomerular tuft volume and the number of podocytes per glomerulus [7,8]—as well as fundamental physiological variables, such as GFR, renal plasma flow and the fractional albumin clearance. Using more fundamental structural and functional predictors may allow a finer dissection of primary pathological processes. This approach, however, excludes the possibility of studying large numbers of patients due to the significant commitment of resources required for each subject. It thus entails the risk of overfitting the data, since there are a small number of outcomes for a comparatively large set of predictors. For this reason, we employed a novel statistical method—least-angle regression [9]—as an alternative to the usual multivariate methodologies.
Despite studying a relatively small number of subjects, our findings agree in large part with those of previous, larger studies [25]. We found that the incidence of global glomerular sclerosis, the degree of interstitial scarring, and the baseline serum creatinine were all significant multivariate predictors of declining GFR. In addition, we found significant roles for two variables seldom included as predictors of outcome in clinical studies of IgAN: glomerular tuft volume and renal plasma flow. Some factors that have been reported to be associated with progression (gender, age, hypertension) were notably absent from our set of predictors. In fact, the serum creatinine concentration was the only classic, clinical variable in the prediction set. The fractional albumin clearance (θalb), which is probably a better representative of the filtered protein load than quantitative proteinuria, was a relatively strong factor in the predictor set in many of the bootstrap replicates. Other strong, second-tier factors included the number of podocytes per glomerulus, the baseline GFR, the percentage of glomeruli with segmental sclerosis and the number of endocapillary cells per glomerulus (Table 4).
The ability of the LARS model to predict loss of GFR as a dichotomous process (at 15 ml/min/1000 days) was quite good. The bootstrap-corrected error rate was 17%. The error rate was similar over a range of cut-off values (from 10 to 20 ml/min/1000 days).
What do the factors that made significant contributions to the predictive model tell us about the pathophysiology of disease progression in recently diagnosed IgAN? The high incidence of global sclerosis, expanded fractional interstitial area and an enlarged volume of remnant patent glomeruli, all reflect a loss of functioning glomeruli. Glomerular enlargement reflects this loss indirectly as compensatory glomerular hypertrophy is a concomitant of adaptive hyperfiltration. Glomerular hypertrophy may in fact also contribute to progression by its effect of lowering glomerular podocyte density [26]. The reasons for a strong independent role of serum creatinine in a model potentially containing both measured GFR and RPF are not clear. Interestingly, when the predictor set was perturbed (by excluding g1), both serum creatinine and GFR were included in the model as significant predictors. This suggests that the serum creatinine contains information content beyond its inverse relationship to GFR. Looked at another way, in 100 bootstrap replications of LARS, although baseline GFR and serum creatinine tied for the number of times they entered the model as the first factor, serum creatinine was clearly preferred in second, third and fourth places (Table 4). In fact, serum creatinine and GFR were together among the five most significant predictors 11 times in the 100 bootstrap replications. It is possible that the inclusion of serum creatinine over GFR actually derives from the hyperbolic relationship between serum creatinine and GFR [27,28]. This relationship means that changes in serum creatinine are a relatively insensitive measure of changes in GFR at normal or near-normal renal function. When GFR decreases below 50% of normal, however, further decrements lead to relatively over-weighted changes in serum creatinine. It is notable that serum creatinine did not enter the stepwise multilinear model among the first 10 variables. 
LARS may be better able than standard multilinear regression to incorporate non-linear effects of predictors such as serum creatinine. Baseline GFR has in fact failed to predict progression in several other kidney diseases [29]. This finding and its relatively weak position among predictors in the present study may reflect the offsetting, adaptive hyperfiltration that limits correlation of early changes in GFR with the extent of nephron loss. We found a lower filtration fraction in our subjects compared to controls and a slightly lower value in Group 1 compared to Group 2 (data not shown). This could reflect the effects of inhibition of angiotensin converting enzyme on a heightened activity in the intrarenal renin–angiotensin system as previously described [30], or the relative hyperperfusion of the surviving glomeruli as a part of the process of adaptation.
The number of endocapillary glomerular cells in our subjects reflected mesangial cell proliferation typical of active IgAN. Endocapillary hypercellularity was quite similar in Groups 1 and 2 (Table 3), however, and the number of endocapillary cells per glomerulus was the weakest of the first 10 LARS predictors of declining GFR (Table 4), suggesting that disease activity at the time of the diagnostic biopsy is not a major determinant of disease progression. In contrast, the strongest structural variables in our predictor set (global sclerosis, interstitial expansion and hypertrophy of patent glomeruli) are all pathological hallmarks of a self-perpetuating injury that follows a reduction in the number of functioning glomeruli, the so-called remnant kidney phenomenon [31–33]. It seems likely that in many IgAN patients, glomerular injury early in the course of disease reduces the number of functioning nephrons sufficiently to result in hypertrophy and hyperfunction of the remaining glomeruli, followed by an autonomous injury process leading to focal and segmental glomerulosclerosis. We propose that a remnant kidney form of chronic injury rather than the extent of active proliferative disease present at the time of biopsy is the most significant factor leading to an unfavourable course in recently diagnosed IgAN.
There are certain limitations to the present study. The multi-ethnic origin of our patients reflects the diversity of our referral population. It was not a significant factor in the multivariate analysis, but may limit application of our findings to other populations. As a referral population to an academic medical center, it is possible that our subjects represented the more severe end of the spectrum of IgAN. At the same time, only 8 of the 22 subjects had an average GFR loss greater than about 5 ml/min/year. Although we tried to study the progression of IgAN during its early stages, there was evidence of advanced injury in some of the subjects. Four of the 22 subjects had an incidence of global sclerosis in their diagnostic biopsies >40%. This may be due to a relatively fulminant course in the earliest stages of the disease, before a biopsy had been obtained, as the majority of subjects had their diagnostic biopsy within 1.5 years of the onset of symptoms. We also studied the rate of progression in IgAN, rather than an endpoint such as the time to end-stage kidney failure. Evidence of the relevance of rates of GFR loss to the endpoint of kidney failure is provided by the fact that the four subjects who progressed to kidney failure during follow-up had the steepest GFR slopes.
In conclusion, we have described a robust predictive model of progression in recently diagnosed IgAN based on an expanded set of baseline structural and functional variables measured in a relatively small number of intensively studied subjects. The power of our novel approach depends upon the use of statistical methods resistant to problems of overfitting. Our results suggest that disease progression in newly diagnosed patients is largely related to the degree of glomerular loss already present by the time of the initial diagnostic biopsy. Therapeutic strategies therefore need to distinguish progression regimes associated with active, proliferative glomerular nephritis from those associated with established chronic injury (the remnant kidney phenomenon). In the former, anti-inflammatory therapies may have a more important role, while in the latter greater reliance on blockade of the renin–angiotensin–aldosterone system is probably indicated.
| Appendix |
|---|
Least-angle regression (LARS) for prediction
One of the most common methods in biomedical research for developing linear predictive models of one outcome (dependent) variable on several predictor (independent) variables is multiple linear regression by least squares. This method is subject to a variety of computational and other limitations, among them the fact that when the number of predictor variables is on the order of the number of observations to which the model is fit, there is a high risk of overfitting. Overfitting (or over-specification) is a well-known problem for complex predictive models. It refers to use of a model whose structural complexity exceeds that of the underlying population being modeled. Because of the large number of adjustable parameters (in the case of multiple linear regression, the regression coefficients), the model can be made to fit the training dataset very well, including the idiosyncratic (noisy) structure of this dataset that does not reflect the true relationships among the variables in the underlying population, while doing less well when applied to the population from which the training set was selected. In multilinear regression a major problem is multicollinearity [34] or redundancy among the independent variables as reflected in significant univariate correlations of the variables with each other. This increases the error associated with estimation of the regression coefficients and thus diminishes the predictive power of the model when applied to the broader population from which the training dataset was drawn. LARS on the other hand is relatively resistant to problems of overfitting. This was the principal motivation for using LARS as an automated model-building algorithm in our study, since the study is characterized by a large number of potential predictor variables relative to the number of observations.
How do standard multiple linear regression and LARS differ? In standard forward stepwise multiple linear regression of one outcome (dependent) variable on k potential predictor variables, the predictor variable (x1) with the highest individual correlation with the outcome variable is entered first into the model, with a regression coefficient chosen to minimize the residual variance in the dependent variable (that is, the sum of the squared differences between the predicted and the measured values) after removing the linear effect of the predictor. The remaining predictors are then projected orthogonal to this first predictor (to remove their correlation with it), giving a new set of k − 1 predictors. Of these, the predictor (say x2) having the greatest correlation with the orthogonal complement (with respect to x1) of the previous residuals is added to the linear equation with a regression coefficient that minimizes the variance remaining after removing the linear effect of x2. The linear influences of the predictor variables on the dependent variable are thus eliminated one at a time by minimizing the residual variances (of nested orthogonal subsets) for a linear model based on more and more variables. The coefficient of determination (R²) in general increases with each step, so stopping model building depends on determining when adding another variable does not make a significant contribution to the predictive value of the model (usually by means of an F test).
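The forward stepwise procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function name `forward_stepwise`, the predictor names and all data values are hypothetical, and the F-test stopping rule is replaced by a fixed number of steps.

```python
import math

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward_stepwise(predictors, y, n_steps):
    """Greedy forward selection by successive orthogonalization.

    predictors: dict mapping a name to a list of values; y: list of outcomes.
    Returns the order of entry and the residual sum of squares after each step.
    """
    resid = center(y)
    remaining = {name: center(x) for name, x in predictors.items()}
    order, rss_path = [], []
    for _ in range(n_steps):
        def abs_corr(name):
            x = remaining[name]
            return abs(dot(x, resid)) / math.sqrt(dot(x, x) * dot(resid, resid))
        # Enter the predictor most correlated with the current residual
        best = max(remaining, key=abs_corr)
        xb = remaining.pop(best)
        # Full least-squares step: remove all of the linear effect of xb
        beta = dot(xb, resid) / dot(xb, xb)
        resid = [r - beta * a for r, a in zip(resid, xb)]
        # Project the remaining predictors orthogonal to xb
        for name, x in remaining.items():
            g = dot(x, xb) / dot(xb, xb)
            remaining[name] = [a - g * b for a, b in zip(x, xb)]
        order.append(best)
        rss_path.append(dot(resid, resid))
    return order, rss_path

# Hypothetical data: y depends strongly on x_a, weakly on x_b, not on x_c
predictors = {
    "x_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "x_c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [2.6, 3.3, 6.55, 7.6, 10.4, 11.7, 14.45, 15.4]

order, rss_path = forward_stepwise(predictors, y, n_steps=3)
```

Because each entered predictor takes the full least-squares step, the residual sum of squares falls as fast as possible at every stage. This is the greedy behaviour that LARS avoids. The sketch assumes no predictor is an exact linear combination of the others.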
In least-angle regression [9] the regression coefficient of the first predictor variable is not chosen to minimize the residual variance in the dependent variable, but only to decrease it until another predictor variable can account for an equal amount of the variance. From this point, the first and second predictors are added into the model with equal weighting (geometrically, this means moving along the direction that bisects the angle between the vectors of these two variables, hence the name least-angle regression). This process continues until a third variable can in turn account for an equal amount of the residual variance, at which point the three variables start to be added in equally. This is why LARS is said to be non-greedy. By not using up the variance too early, LARS allows more potential independent variables into the model. The stopping rule (i.e. when to stop adding predictors to the model) is also different with LARS. A penalty is included for increasing the number of variables (degrees of freedom) in the model. This takes the form of Mallows' Cp statistic, which represents an estimate of the true predictive error and reaches a minimum (corresponding to the maximum adjusted R²) after a certain number of variables are entered in the model. This selection of the natural minimal model defines the sense in which LARS is parsimonious. Returning an optimally simple model increases the chances that the model variables chosen will be easily interpretable and (patho)physiologically relevant.
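The first move of LARS can be sketched as follows. This is an illustration only, with hypothetical names and data, and it covers just the first step. The step-length formula follows the general rule given in [9], specialized to a single active predictor with all predictors centered and scaled to unit length.

```python
import math

def standardize(v):
    """Center a vector and scale it to unit Euclidean length."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    norm = math.sqrt(sum(a * a for a in c))
    return [a / norm for a in c]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def first_lars_step(predictors, y):
    """Move along the leading predictor only until a second one ties with it."""
    X = {name: standardize(x) for name, x in predictors.items()}
    mean_y = sum(y) / len(y)
    resid = [a - mean_y for a in y]
    corr = {name: dot(x, resid) for name, x in X.items()}
    first = max(corr, key=lambda name: abs(corr[name]))
    C = abs(corr[first])
    sign = 1.0 if corr[first] > 0 else -1.0
    u = [sign * a for a in X[first]]      # direction of travel
    gamma, second = C, None               # gamma == C would be the full least-squares step
    for name, x in X.items():
        if name == first:
            continue
        a = dot(x, u)                     # assumes |a| < 1 (no perfectly collinear pair)
        for cand in ((C - corr[name]) / (1.0 - a), (C + corr[name]) / (1.0 + a)):
            if 0.0 < cand < gamma:
                gamma, second = cand, name
    resid = [r - gamma * b for r, b in zip(resid, u)]
    return first, second, gamma, C, resid, X

# Hypothetical data (illustration only, not study data)
predictors = {
    "x_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x_b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "x_c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [2.6, 3.3, 6.55, 7.6, 10.4, 11.7, 14.45, 15.4]

first, second, gamma, C, resid, X = first_lars_step(predictors, y)
tie_first = abs(dot(X[first], resid))    # correlation of the leading predictor with the residual
tie_second = abs(dot(X[second], resid))  # equal to tie_first at the stopping point
```

The step length `gamma` is shorter than the full least-squares step `C`, so some of the variance attributable to the first predictor is left unexplained when the second predictor enters. That is the non-greedy behaviour described above.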
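An estimate of the true predictive error (Cp) taking into account both bias and variance is given by
$$C_p = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 - n + 2k$$
where $y_i$ and $\hat{y}_i$ are the measured and predicted GFR slopes, $\sigma^2$ is the variance of the GFR slopes, n is the number of outcomes, and k is the number of variables used in the prediction.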
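The stopping rule can be sketched as follows, assuming the standard Mallows form Cp = RSS/σ² − n + 2k, where RSS is the residual sum of squares of the k-variable model. The function names and the RSS values are hypothetical and chosen only to show where the minimum falls.

```python
def mallows_cp(rss, sigma2, n, k):
    """Mallows' Cp for a model with k variables and residual sum of squares rss."""
    return rss / sigma2 - n + 2 * k

def best_model_size(rss_by_k, sigma2, n):
    """rss_by_k[k] is the RSS of the model with k variables; return (best k, all Cp)."""
    cp = [mallows_cp(rss, sigma2, n, k) for k, rss in enumerate(rss_by_k)]
    best_k = min(range(len(cp)), key=cp.__getitem__)
    return best_k, cp

# Hypothetical RSS path for models with 0, 1, ..., 6 variables and n = 22 subjects
rss_by_k = [100.0, 60.0, 40.0, 30.0, 27.0, 26.5, 26.2]
best_k, cp = best_model_size(rss_by_k, sigma2=1.0, n=22)
```

In this made-up path the fit keeps improving with every added variable, but beyond four variables the improvement no longer pays for the 2k penalty, so Cp starts to rise.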
Bootstrap methods
Bootstrap procedures [23] are data-based, computation-intensive simulation methods that can be used to estimate robustly (non-parametrically) the precision of statistical estimates of population parameters. In this case, the estimate is of the predictive accuracy of a linear model. Bootstrap estimates are based on resampling the same dataset used to develop the model. Thus, bootstrap methods represent an alternative to standard validation (i.e. holding out a certain percentage of the training set to use as a validation set) or cross-validation methods (such as leave-one-out). In the bootstrap approach, resampling is performed with replacement. In our bootstrap application, 22 patients were chosen randomly with replacement from the total set of 22 subjects and subjected to the same LARS procedure as the training set. Thus, the original sample size is inherently represented in the simulation process. By creating 100 bootstrap samples, a distribution of the errors of the LARS model could be developed from which to estimate its precision. The .632+ rule is used to adjust for the inherent downward bias in error estimates using standard bootstrap methods and is resistant to some effects of overfitting [35].
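The resampling scheme can be sketched as follows. Two simplifications are assumed and should be read as such: a one-predictor least-squares line stands in for the LARS fit, and the plain .632 weighting (0.368 × apparent error + 0.632 × out-of-sample bootstrap error) is used in place of the .632+ rule of [35], which adds a further correction based on a no-information error rate. All data are simulated.

```python
import random

def fit_line(xs, ys):
    """Least-squares line; a stand-in for the LARS fit (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:                 # degenerate resample: predict the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def bootstrap_632(xs, ys, n_boot=100, seed=0):
    """Return (apparent error, out-of-sample bootstrap error, .632 estimate)."""
    rng = random.Random(seed)
    n = len(xs)
    slope, icpt = fit_line(xs, ys)
    apparent = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys)) / n
    errs = [[] for _ in range(n)]  # squared errors for each subject when left out
    for _ in range(n_boot):
        # Draw n subjects with replacement from the n available
        idx = [rng.randrange(n) for _ in range(n)]
        s, c = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        drawn = set(idx)
        for i in range(n):
            if i not in drawn:     # subject i was not used to fit this resample
                errs[i].append((ys[i] - (s * xs[i] + c)) ** 2)
    per_subject = [sum(e) / len(e) for e in errs if e]
    oob = sum(per_subject) / len(per_subject)
    return apparent, oob, 0.368 * apparent + 0.632 * oob

# Simulated data for 22 hypothetical subjects (not study data)
data_rng = random.Random(1)
xs = [float(i) for i in range(22)]
ys = [0.5 * x + data_rng.gauss(0.0, 2.0) for x in xs]

apparent, oob, err_632 = bootstrap_632(xs, ys)
```

Each resample has the same size as the original dataset, so on average roughly a third of the subjects are left out of any one resample and can serve as its test cases.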
| Acknowledgements |
|---|
This work was supported by NIH/NIDDK grant R01DK49372 and NIH/NCRR grant M01-RR-00070. Dr Lemley was supported in part by a Faculty Scholar Award from Satellite Dialysis Corporation. The assistance of the staff of the General Clinical Research Center at Stanford is gratefully acknowledged.
Conflict of interest statement. None declared.
| References |
|---|
1. Rychlik I, Andrassy K, Waldherr R, et al. Clinical features and natural history of IgA nephropathy. Ann Med Interne (1999) 150:117–126.
2. Barratt J, Feehally J. IgA Nephropathy. J Am Soc Nephrol (2005) 16:2088–2097.
3. D'Amico G. Influence of clinical and histological features on actuarial renal survival in adult patients with idiopathic IgA nephropathy, membranous nephropathy, and membranoproliferative glomerulonephritis: survey of the recent literature. Am J Kidney Dis (1992) 20:315–323.
4. D'Amico G. Natural history of idiopathic IgA nephropathy: role of clinical and histological prognostic factors. Am J Kidney Dis (2000) 36:227–237.
5. Fofi C, Pecci G, Galliani M, et al. IgA nephropathy: multivariate statistical analysis aimed at predicting outcome. J Nephrol (2001) 14:280–285.
6. Alamartine E, Sabatier JC, Guerin C, Berliet JM, Berthoux F. Prognostic factors in mesangial IgA glomerulonephritis: an extensive study with univariate and multivariate analyses. Am J Kidney Dis (1991) 18:12–19.
7. Hishiki T, Shirato I, Takahashi Y, et al. Podocyte injury predicts prognosis in patients with IgA nephropathy using a small amount of renal biopsy tissue. Kidney Blood Pressure Res (2001) 24:99–104.
8. Lemley KV, Lafayette RA, Safai M, et al. Podocytopenia and disease severity in IgA nephropathy. Kidney Int (2002) 61:1475–1485.
9. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat (2004) 32:407–499.
10. Squarer A, Lemley KV, Ambalavanan S, et al. Mechanisms of progressive glomerular injury in membranous nephropathy. J Am Soc Nephrol (1998) 9:1389–1398.
11. Battilana C, Zhang HP, Olshen RA, Wexler L, Myers BD. PAH extraction and estimation of plasma flow in diseased human kidneys. Am J Physiol (1991) 261:F726–F733.
12. Canaan-Kuhl S, Venkatraman ES, Ernst SI, Olshen RA, Myers BD. Relationships among protein and albumin concentrations and oncotic pressure in nephrotic plasma. Am J Physiol (1993) 264:F1052–F1059.
13. Weibel ER. Stereologic Methods. Volume 1: Practical Methods for Biological Morphometry. London: Academic Press, Inc. (1979).
14. Miller PL, Meyer TW. Effects of tissue preparation on glomerular volume and capillary structure in the rat. Lab Invest (1990) 63:862–866.
15. Gundersen HJ. Notes on the estimation of the numerical density of arbitrary profiles: the edge effect. J Microsc (1977) 111:219–223.
16. Jensen EB, Gundersen HJ, Osterby R. Determination of membrane thickness distribution from orthogonal intercepts. J Microsc (1979) 115:19–33.
17. Hladunewich MA, Lemley KV, Blouch KL, Myers BD. Determinants of GFR depression in early membranous nephropathy. Am J Physiol (2003) 284:F1014–F1022.
18. Ting RH, Kristal B, Myers BD. The biophysical basis of hypofiltration in nephrotic humans with membranous nephropathy. Kidney Int (1994) 45:390–397.
19. Drumond MC, Deen WM. Structural determinants of glomerular hydraulic permeability. Am J Physiol (1994) 266:F1–F12.
20. Drumond MC, Kristal B, Myers BD, Deen WM. Structural basis for reduced glomerular filtration capacity in nephrotic humans. J Clin Invest (1994) 94:1187–1195.
21. Obrador GT, Arora P, Kausz AT, et al. Level of renal function at the initiation of dialysis in the U.S. end-stage renal disease population. Kidney Int (1999) 56:2227–2235.
22. Efron B. How biased is the apparent error rate of a prediction rule? J Am Stat Assoc (1986) 81:461–470.
23. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall (1993).
24. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med (1993) 118:201–210.
25. D'Amico G. Natural history of idiopathic IgA nephropathy and factors predictive of disease outcome. Sem Nephrol (2004) 24:179–196.
26. Kretzler M. Role of Podocytes in Focal Sclerosis: Defining the Point of No Return. J Am Soc Nephrol (2005) 16:2830–2832.
27. Myers BD, Boothroyd D, Olshen RA. Angiotensin-converting enzyme inhibitor for slowing progression of diabetic and nondiabetic kidney disease. J Am Soc Nephrol (1998) 9 [Suppl 12]:S66–S70.
28. Shemesh O, Golbetz H, Kriss JP, Myers BD. Limitations of creatinine as a filtration marker in glomerulopathic patients. Kidney Int (1985) 28:830–838.
29. Hunsicker LG, Adler S, Caggiula A, et al. Predictors of the progression of renal disease in the Modification of Diet in Renal Disease Study. Kidney Int (1997) 51:1908–1919.
30. Coppo R, Amore A, Gianoglio B, et al. Angiotensin II local hyperreactivity in the progression of IgA nephropathy. Am J Kidney Dis (1993) 21:593–602.
31. Meyer TW, Rennke HG. Progressive glomerular injury after limited renal infarction in the rat. Am J Physiol Renal Physiol (1988) 254:F856–F862.
32. Abdi R, Dong VM, Rubel JR, et al. Correlation between glomerular size and long-term renal function in patients with substantial loss of renal mass. J Urol (2003) 170:42–44.
33. Novick AC, Gephardt G, Guz B, Steinmuller D, Tubbs RR. Long-term follow-up after partial removal of a solitary kidney. New Engl J Med (1991) 325:1058–1062.
34. Glantz SA, Slinker BK. Primer of Applied Regression & Analysis of Variance. 2nd ed. McGraw-Hill Medical (2000).
35. Efron B, Tibshirani R. Improvements on cross-validation: The .632+ bootstrap method. J Am Stat Assoc (1997) 92:548–560.
Accepted in revised form: 24. 7.07