Feature selection: Wrapper methods
Author: Joan Puigcerver Ibáñez
Email: j.puigcerveribanez@edu.gva.es
License: CC BY-NC-SA 4.0
(Attribution - NonCommercial - ShareAlike) 🅭
Wrapper methods are a dimensionality-reduction technique based on feature selection: features are chosen according to the performance that a machine learning model achieves with them.
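Before looking at the specific strategies, the sketch below illustrates the core wrapper idea: each candidate feature subset is scored by training and evaluating a model on it, and the best-scoring subset is kept. This is only an illustrative sketch, not part of the examples that follow; it assumes X_train (a DataFrame) and y_train are already defined, as in the script at the end of this section, and the two candidate subsets are chosen arbitrarily.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Two arbitrary candidate subsets, taken from the available columns
candidate_a = list(X_train.columns[:3])
candidate_b = list(X_train.columns[3:6])

# The wrapper idea: score each subset with the model it will be used with
for subset in (candidate_a, candidate_b):
    score = cross_val_score(LinearRegression(), X_train[subset], y_train,
                            scoring='r2', cv=5).mean()
    print(subset, round(score, 3))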
Forward selection
Forward selection is a wrapper method that builds the feature subset one feature at a time, at each step adding the feature that most improves the model's performance.
The SequentialFeatureSelector class from mlxtend implements this method when used with the parameter forward=True.
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LinearRegression

k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=True,
          floating=False,
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

initial_features = X_train.columns.tolist()
selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print("Initial features: ", initial_features)
print()
print(f"## Forward selection (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)
Backward elimination
Backward elimination is a wrapper method that starts from the full set of features and removes them one at a time, at each step discarding the feature whose removal least harms (or most improves) the model's performance, that is, the least useful feature.
This method can also be implemented with the SequentialFeatureSelector class from mlxtend, using the parameter forward=False.
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=False,
          floating=False,
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Backward elimination (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)
Bidirectional selection
Bidirectional selection is a wrapper method that combines the two previous methods, adding and removing features at each step.
This method can also be implemented with the SequentialFeatureSelector class from mlxtend, using the parameter floating=True combined with forward:
forward=True and floating=True: forward selection with the possibility of removing features that were selected in earlier steps.
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=True,
          floating=True,  # Bi-directional
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Bidirectional forward (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)
forward=False and floating=True: backward elimination with the possibility of re-adding features that were removed in earlier steps.
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=False,  # Backward
          floating=True,  # Bi-directional
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Bidirectional backward (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)
Source code
reduccio_envoltura.py

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LinearRegression

# Load the California Housing dataset
california = fetch_california_housing()
X = pd.DataFrame(california.data, columns=california.feature_names)
Y = california.target

# 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.7)
print(X_train.shape)
print(X_test.shape)

# Forward selection: add the best feature at each step
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=True,
          floating=False,
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

initial_features = X_train.columns.tolist()
selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print("Initial features: ", initial_features)
print()
print(f"## Forward selection (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)

# Backward elimination: remove the least useful feature at each step
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=False,
          floating=False,
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Backward elimination (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)

# Bidirectional (floating) forward selection
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=True,
          floating=True,  # Bi-directional
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Bidirectional forward (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)

# Bidirectional (floating) backward elimination
k_features = 5
sfs = SFS(LinearRegression(),
          k_features=k_features,
          forward=False,  # Backward
          floating=True,  # Bi-directional
          scoring='r2',
          cv=0)
sfs.fit(X_train, y_train)

selected_features = list(sfs.k_feature_names_)
removed_features = list(set(initial_features) - set(selected_features))

print()
print(f"## Bidirectional backward (k = {k_features})")
print("Selected features: ", selected_features)
print("Removed features: ", removed_features)