# Penalized additive logistic regression for cardiovascular risk prediction


\* Corresponding author. Affiliations: (1) Biostatistique; (2) Heudiasyc - Heuristique et Diagnostic des Systèmes Complexes, Compiègne

Abstract: Predicting individual risk is needed to target preventive interventions toward the people with the highest probability of benefit over a given time period. Estimates of cardiovascular risk are currently based on statistical models inferred from cohort data with methods such as logistic regression. Although attractively simple, the logistic model fails in some situations:

(1) If the number of prognostic factors is large relative to the number of observations, or if the factors are highly correlated, the variance of the coefficient estimates may be high, leading to inaccurate predictions. Subset selection is widely used to address this difficulty. Another way to overcome it is to impose a penalty on large fluctuations of the estimated parameters. The lasso estimates a vector of linear regression coefficients by minimizing the residual sum of squares subject to a constraint on the $\ell_1$-norm of the coefficient vector. An interesting feature of the $\ell_1$-norm constraint is that it shrinks some coefficients and sets others exactly to zero. In addition, the smooth form of the constraint leads to a less variable model than the one provided by subset selection.

(2) In real life, effects are generally not linear. When the exposure under study is continuous, linear models may not accurately characterize the exposure-response curve. The additive logistic model is a generalization of the standard logistic model that accommodates such effects.

The aim of this study is to model parsimoniously the relationship between a binary response and several continuous covariates when the effects of the covariates may be nonlinear. We present a new method for variable selection and function estimation in nonparametric additive logistic models fitted by cubic smoothing splines: penalized additive logistic regression. The method is based on a generalization of the lasso. By their nature, its constraints shrink both linear and nonlinear coefficients, setting some of them exactly to zero. Hence, they yield parsimonious models, select significant variables, and reveal nonlinearities in the effects of predictors. Penalized additive logistic regression is applied to predict the risk of cardiovascular disease in a real database from the INDANA project (Individual Data Analysis of Antihypertensive Intervention Trials).
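To make the shrink-and-select behaviour of the $\ell_1$ penalty concrete, here is a minimal sketch in Python. It is not the method of the paper: it fits only a plain linear logistic model (no spline components) and uses proximal gradient descent, a solver chosen here purely for illustration. The function names (`soft_threshold`, `fit_l1_logistic`, `make_toy_data`), the toy data, the step size, and the penalty values are all assumptions introduced for this example. The penalized form minimized, mean log-loss plus `lam` times the $\ell_1$-norm of the coefficients, is the usual Lagrangian counterpart of the constrained formulation described in the abstract.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero; exactly 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=1.0, n_iter=500):
    """Minimize mean log-loss + lam * ||beta||_1 by proximal gradient descent.

    Assumes features are scaled to roughly [-1, 1] so that step=1.0 is safe.
    The intercept b0 is not penalized. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current parameters.
        resid = [
            sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Plain gradient step for the intercept, gradient step followed by
        # soft-thresholding for the penalized coefficients.
        b0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def make_toy_data(n=40):
    """Hypothetical data: x1 determines the outcome, x2 is an unrelated alternating signal."""
    half = (n - 1) / 2.0
    X = [[(i - half) / half, 0.5 if i % 2 == 0 else -0.5] for i in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for lam in (0.1, 5.0):
        b0, beta = fit_l1_logistic(X, y, lam)
        print(lam, round(b0, 3), [round(b, 3) for b in beta])
```

On this toy data, a moderate penalty (`lam=0.1`) keeps a positive coefficient for the informative feature while the coefficient of the unrelated feature stays exactly zero, and a very large penalty (`lam=5.0`) sets both coefficients to zero. This is the selection effect the abstract attributes to the $\ell_1$-norm constraint. Extending the idea to additive models with spline components, as the paper proposes, requires penalties on the nonlinear parts as well and is not attempted here.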

Keywords: model selection, function estimation, hypertension

Authors: Marta Avalos, Yves Grandvalet, Christophe Ambroise

Source: https://hal.archives-ouvertes.fr/