pysensors.classification package
Module contents
- class pysensors.classification.SSPOC(basis=None, classifier=None, n_sensors=None, threshold=None, l1_penalty=0.1)
Bases: BaseEstimator
Sparse Sensor Placement Optimization for Classification (SSPOC) object.
As the name suggests, this class can be used to select optimal sensor locations (measurement locations) for classification tasks.
The time complexity of the SSPOC algorithm can be decomposed as
\[C_{total} = C_{basis} + C_{classification} + C_{optimization}\]
- \(C_{basis}\): the complexity of fitting the selected basis object and producing the matrix inverse. The matrix inverse is “free” to compute for pysensors.basis.Identity and pysensors.basis.SVD. For pysensors.basis.RandomProjection the complexity is that of calling numpy.linalg.pinv on a matrix of size n_input_features * n_basis_modes.
- \(C_{classification}\): the cost of fitting the chosen classifier to n_examples examples with n_basis_modes features.
- \(C_{optimization}\): the cost of solving the sensor optimization problem. For binary classification we use sklearn.linear_model.OrthogonalMatchingPursuit. For multi-class classification we use sklearn.linear_model.MultiTaskLasso. The costs for each depend on the fit options that are specified. In both cases there are n_basis_modes examples with n_features features.
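The claim that the inverse is "free" for the identity and SVD bases follows from their structure, which can be illustrated with plain NumPy. This is a minimal sketch that assumes a basis is stored as a matrix of shape (n_input_features, n_basis_modes). The variable names are illustrative and are not taken from pysensors, whose internal storage may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input_features, n_basis_modes = 50, 8

# Identity basis: the matrix is its own inverse, so nothing is computed.
identity_basis = np.eye(n_input_features)
identity_inverse = identity_basis

# SVD basis (assumed layout): the columns are orthonormal, so the
# pseudo-inverse is just the transpose.
data = rng.standard_normal((200, n_input_features))
_, _, vt = np.linalg.svd(data, full_matrices=False)
svd_basis = vt[:n_basis_modes].T      # (n_input_features, n_basis_modes)
svd_inverse = svd_basis.T             # no extra factorization needed

# Random projection basis: no special structure, so pinv is required.
random_basis = rng.standard_normal((n_input_features, n_basis_modes))
random_inverse = np.linalg.pinv(random_basis)  # (n_basis_modes, n_input_features)
```

Only the last case pays for a call to numpy.linalg.pinv, which is why it is singled out in the description of \(C_{basis}\).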
The space complexity likewise depends on the same three factors. Generally, the basis requires O(n_basis_modes * n_features) space. The space requirements for classification and optimization depend on the particular algorithms being employed. See the Scikit-learn documentation for specifics.
See the following reference for more information:
Brunton, Bingni W., et al. “Sparse sensor placement optimization for classification.” SIAM Journal on Applied Mathematics 76.5 (2016): 2099-2122.
- Parameters
basis (basis object, optional (default pysensors.basis.Identity)) – Basis in which to represent the data. Default is the identity basis (i.e. raw features).
classifier (classifier object, optional (default Linear Discriminant Analysis (LDA))) – Classifier for which to optimize sensors. Must be a linear classifier with a coef_ attribute and fit and predict methods.
n_sensors (nonnegative integer, optional (default None)) – Number of sensor locations to be used after fitting. If n_sensors is not None then it overrides the threshold parameter. If set to 0, then classifier will be replaced with a dummy classifier which predicts the class randomly.
threshold (nonnegative float, optional (default None)) – Threshold for selecting sensors. Overridden by n_sensors. If both threshold and n_sensors are None when the fit method is called, then the threshold will be set to
\[\frac{\|s\|_F}{2rc}\]
where \(s\) is the sensor coefficient matrix, \(r\) is the number of basis modes, and \(c\) is the number of distinct classes, as suggested in Brunton et al. (2016).
l1_penalty (nonnegative float, optional (default 0.1)) – The L1 penalty term used to form the sensor coefficient matrix, s. Larger values will result in a sparser s and fewer selected sensors. This parameter is ignored for binary classification problems.
- Attributes
n_basis_modes (nonnegative integer) – Number of basis modes to be used when deciding sensor locations.
basis_matrix_inverse_ (np.ndarray, shape (n_basis_modes, n_input_features)) – The inverse of the matrix of basis vectors.
sensor_coef_ (np.ndarray, shape (n_input_features, n_classes)) – The sensor coefficient matrix, s.
sparse_sensors_ (np.ndarray, shape (n_sensors, )) – The selected sensors.
Examples
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.datasets import make_classification
>>> from pysensors.classification import SSPOC
>>>
>>> x, y = make_classification(n_classes=3, n_informative=3, random_state=10)
>>>
>>> model = SSPOC(n_sensors=10, l1_penalty=0.03)
>>> model.fit(x, y, quiet=True)
SSPOC(basis=Identity(n_basis_modes=100), classifier=LinearDiscriminantAnalysis(), l1_penalty=0.03, n_sensors=10)
>>> print(model.selected_sensors)
[10 13 6 19 17 16 15 14 12 11]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.66
>>>
>>> model.update_sensors(n_sensors=5, xy=(x, y), quiet=True)
>>> print(model.selected_sensors)
[10 13 6 19 17]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.6
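The default threshold described for the threshold parameter, \(\|s\|_F / (2rc)\), can be worked through by hand. The helper below is a hypothetical illustration of that formula only. It is not a pysensors function, and it takes the number of classes as an explicit argument instead of inferring it from the fitted model.

```python
import numpy as np

def default_threshold(s, n_basis_modes, n_classes):
    """Hypothetical helper: ||s||_F / (2 * r * c) for a coefficient matrix s."""
    # With no `ord` argument, np.linalg.norm of a 2-D array is the Frobenius norm.
    return np.linalg.norm(s) / (2 * n_basis_modes * n_classes)

# Toy coefficient matrix with Frobenius norm sqrt(3**2 + 4**2) = 5.
s = np.array([[3.0, 0.0],
              [0.0, 4.0]])
value = default_threshold(s, n_basis_modes=5, n_classes=2)  # 5 / (2 * 5 * 2)
```

With r = 5 basis modes and c = 2 classes the threshold works out to 5 / 20 = 0.25, so a larger basis or more classes lowers the cutoff for a given coefficient matrix.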
- fit(x, y, quiet=False, prefit_basis=False, refit=True, **optimizer_kws)
Fit the SSPOC model, determining which sensors are relevant.
- Parameters
x (array-like, shape (n_samples, n_input_features)) – Training data.
y (array-like, shape (n_samples,)) – Training labels.
quiet (boolean, optional (default False)) – Whether or not to suppress warnings during fitting.
prefit_basis (boolean, optional (default False)) – Whether or not the basis has already been fit to x. For example, you may have already fit and experimented with a SVD object to determine the optimal number of modes. This option allows you to avoid an unnecessary SVD.
refit (boolean, optional (default True)) – Whether or not to refit the classifier using measurements only from the learned sensor locations.
optimizer_kws (dict, optional) – Keyword arguments to be passed to the optimization routine.
- Returns
self
- Return type
a fitted SSPOC instance
- predict(x)
Predict classes for given measurements. If self.n_sensors is 0 then a dummy classifier is used in place of self.classifier.
- Parameters
x (array-like, shape (n_samples, n_sensors) or (n_samples, n_features)) – Examples to be classified. The measurements should be taken at the sensor locations specified by self.selected_sensors.
- Returns
y – Predicted classes.
- Return type
np.ndarray, shape (n_samples,)
- update_sensors(n_sensors=None, threshold=None, xy=None, quiet=False, method=<function amax>, **method_kws)
Update the selected sensors by changing either the preferred number of sensors or the threshold used to select the sensors, refitting the classifier afterwards, if possible.
- Parameters
n_sensors (nonnegative integer, optional (default None)) – The number of sensor locations to select. If None, then threshold will be used to pick the sensors. Note that n_sensors and threshold cannot both be None.
threshold (nonnegative float, optional (default None)) – The threshold to use to select sensors based on the magnitudes of entries in self.sensor_coef_ (s). Overridden by n_sensors. Note that n_sensors and threshold cannot both be None.
xy (tuple of np.ndarray, length 2, optional (default None)) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ). If not None, the classifier will be refit after the new sensors have been selected.
quiet (boolean, optional (default False)) – Whether to silence warnings.
method (callable, optional (default np.max)) – Function used along with threshold to select sensors. For binary classification problems one need not specify a method. For multiclass classification problems, sensor_coef_ (s) has multiple columns and method is applied along each row to aggregate coefficients for thresholding, i.e. method is called as follows: method(np.abs(self.sensor_coef_), axis=1, **method_kws). Other examples of acceptable methods are np.min, np.mean, and np.median.
**method_kws (dict, optional) – Keyword arguments to be passed into method when it is called.
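The row-wise aggregation described for method can be sketched outside the class. The function below is a hypothetical stand-in, not the library's implementation. In particular, ranking by descending score when n_sensors is given and using >= for the threshold comparison are assumptions made for illustration.

```python
import numpy as np

def select_sensors(sensor_coef, n_sensors=None, threshold=None,
                   method=np.max, **method_kws):
    """Hypothetical sketch of selecting sensors from a coefficient matrix s."""
    if n_sensors is None and threshold is None:
        raise ValueError("n_sensors and threshold cannot both be None")
    s = np.abs(np.asarray(sensor_coef, dtype=float))
    # Binary problems have one coefficient per feature; multiclass problems
    # have one column per class, aggregated along each row with `method`.
    scores = s if s.ndim == 1 else method(s, axis=1, **method_kws)
    if n_sensors is not None:
        # n_sensors overrides threshold (assumed: keep the largest scores).
        return np.argsort(-scores, kind="stable")[:n_sensors]
    # Assumed comparison: keep features whose score reaches the threshold.
    return np.flatnonzero(scores >= threshold)

s = np.array([[0.0, 0.1, 0.0],
              [0.9, 0.2, 0.1],
              [0.0, 0.0, 0.0],
              [0.3, 0.8, 0.4]])
by_count = select_sensors(s, n_sensors=2)
by_max = select_sensors(s, threshold=0.5)
by_mean = select_sensors(s, threshold=0.45, method=np.mean)
```

The choice of method matters in the multiclass case: np.max keeps a feature that is important for any single class, while np.mean favors features that carry weight across all classes. In this toy matrix, feature 1 passes under np.max but not under np.mean.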
- update_n_basis_modes(n_basis_modes, xy, **fit_kws)
Re-fit the SSPOC object using a different value of n_basis_modes.
This method allows one to relearn sensor locations for a different number of basis modes without re-fitting the basis in many cases. Specifically, if n_basis_modes <= self.basis.n_basis_modes then the basis does not need to be refit. Otherwise this function does not save any computational resources.
- Parameters
n_basis_modes (positive int, optional (default None)) – Number of basis modes to be used during fit. Must be less than or equal to n_samples.
xy (tuple of np.ndarray, length 2) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ).
**fit_kws (dict, optional) – Keyword arguments to pass to SSPOC.fit.
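One way to see why a smaller n_basis_modes needs no refit is that a mode-ordered basis, such as one obtained from an SVD, can simply be truncated. The sketch below uses plain NumPy and a hypothetical truncated_basis helper. Whether pysensors reuses its fitted basis in exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((40, 12))    # (n_samples, n_input_features)

# Fit once with the largest number of modes of interest.
_, _, vt = np.linalg.svd(x, full_matrices=False)
max_modes = 10
basis = vt[:max_modes].T             # (n_input_features, max_modes)

def truncated_basis(basis, n_basis_modes):
    """Hypothetical helper: reuse an already-fitted basis with fewer modes."""
    if n_basis_modes > basis.shape[1]:
        # More modes than were fitted: nothing is saved, a refit is required.
        raise ValueError("n_basis_modes exceeds the fitted number of modes")
    # Modes are ordered by singular value, so the leading columns suffice.
    return basis[:, :n_basis_modes]

small = truncated_basis(basis, 4)    # (n_input_features, 4), no new SVD
```

Requesting more modes than were originally fitted raises an error here, mirroring the case in which the method "does not save any computational resources".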
- property selected_sensors
Get the indices of the selected sensors.
- Returns
sensors – Indices of the selected sensors.
- Return type
numpy array, shape (n_sensors,)