pysensors.classification package

Module contents

class pysensors.classification.SSPOC(basis=None, classifier=None, n_sensors=None, threshold=None, l1_penalty=0.1)

Bases: BaseEstimator

Sparse Sensor Placement Optimization for Classification (SSPOC) object.

As the name suggests, this class can be used to select optimal sensor locations (measurement locations) for classification tasks.

The time complexity of the SSPOC algorithm can be decomposed as

\[C_{total} = C_{basis} + C_{classification} + C_{optimization}\]
  • \(C_{basis}\): the complexity of fitting the selected basis object and producing the matrix inverse. The matrix inverse is “free” to compute for pysensors.basis.Identity and pysensors.basis.SVD. For pysensors.basis.RandomProjection the complexity is that of calling numpy.linalg.pinv on a matrix of size n_input_features * n_basis_modes.

  • \(C_{classification}\): the cost of fitting the chosen classifier to n_examples examples with n_basis_modes features.

  • \(C_{optimization}\): the cost of solving the sensor optimization problem. For binary classification we use sklearn.linear_model.OrthogonalMatchingPursuit. For multi-class classification we use sklearn.linear_model.MultiTaskLasso. The cost of each depends on the fit options specified. In both cases the problem has n_basis_modes examples with n_input_features features.
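The C_basis bullet can be made concrete with plain NumPy arrays. An identity basis is its own inverse, and a basis with orthonormal modes (such as the leading right singular vectors returned by an SVD) has its transpose as its inverse, whereas a random projection has no such structure and needs numpy.linalg.pinv. The sketch below does not call pysensors; the arrays are hypothetical stand-ins for fitted basis objects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_input_features, n_basis_modes = 50, 30, 5
x = rng.standard_normal((n_samples, n_input_features))

# SVD-style basis (hypothetical stand-in): the leading right singular vectors
# are orthonormal, so the inverse is just the transpose (no extra work).
_, _, vt = np.linalg.svd(x, full_matrices=False)
svd_modes = vt[:n_basis_modes].T                 # (n_input_features, n_basis_modes)
svd_inverse = svd_modes.T                        # (n_basis_modes, n_input_features)

# Random-projection-style basis (hypothetical stand-in): columns are not
# orthonormal, so an explicit pseudo-inverse is required.
random_modes = rng.standard_normal((n_input_features, n_basis_modes))
random_inverse = np.linalg.pinv(random_modes)    # (n_basis_modes, n_input_features)

# Both inverses map the modes back to the identity in mode space.
print(np.allclose(svd_inverse @ svd_modes, np.eye(n_basis_modes)))        # True
print(np.allclose(random_inverse @ random_modes, np.eye(n_basis_modes)))  # True
```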

The space complexity likewise depends on the same three factors. Generally, the basis requires O(n_basis_modes * n_input_features) space. The space requirements for classification and optimization depend on the particular algorithms employed; see the scikit-learn documentation for specifics.
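For the binary case of C_optimization, the sparse solve can be illustrated with a textbook orthogonal matching pursuit (OMP). This is a minimal NumPy sketch, not sklearn.linear_model.OrthogonalMatchingPursuit, whose stopping rules and precomputation differ. It assumes the constraint has the form A s = w as in Brunton et al. (2016), where A plays the role of basis_matrix_inverse_ (n_basis_modes rows, n_input_features columns) and w stands in for the classifier coefficients in basis coordinates; the function name omp_sketch is hypothetical.

```python
import numpy as np


def omp_sketch(A, w, n_nonzero, tol=1e-10):
    """Hypothetical helper: greedy sparse solve of A @ s ~= w with at most n_nonzero nonzeros."""
    n_features = A.shape[1]
    support = []
    residual = w.copy()
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break  # w is already reproduced exactly
        # Pick the column (candidate sensor) most correlated with the residual.
        scores = np.abs(A.T @ residual)
        scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected coefficients jointly: the "orthogonal" step.
        coef, *_ = np.linalg.lstsq(A[:, support], w, rcond=None)
        residual = w - A[:, support] @ coef
    s = np.zeros(n_features)
    s[support] = coef
    return s


rng = np.random.default_rng(1)
n_basis_modes, n_input_features = 4, 12
A = rng.standard_normal((n_basis_modes, n_input_features))  # stand-in for basis_matrix_inverse_
w = rng.standard_normal(n_basis_modes)                      # stand-in for classifier coefficients

s = omp_sketch(A, w, n_nonzero=n_basis_modes)
print(np.count_nonzero(s) <= n_basis_modes)  # True
print(np.allclose(A @ s, w))                 # True
```

Each iteration adds one column, so n_nonzero directly caps the number of nonzero entries of s, and hence the number of candidate sensors.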

See the following reference for more information:

Brunton, Bingni W., et al. “Sparse sensor placement optimization for classification.” SIAM Journal on Applied Mathematics 76.5 (2016): 2099-2122.

Parameters
  • basis (basis object, optional (default pysensors.basis.Identity)) – Basis in which to represent the data. Default is the identity basis (i.e. raw features).

  • classifier (classifier object, optional (default Linear Discriminant Analysis (LDA))) – Classifier for which to optimize sensors. Must be a linear classifier with a coef_ attribute and fit and predict methods.

  • n_sensors (nonnegative integer, optional (default None)) – Number of sensor locations to be used after fitting. If n_sensors is not None, it overrides the threshold parameter. If set to 0, the classifier is replaced with a dummy classifier that predicts the class randomly.

  • threshold (nonnegative float, optional (default None)) –

    Threshold for selecting sensors. Overridden by n_sensors. If both threshold and n_sensors are None when the fit method is called, then the threshold will be set to

    \[\frac{\|s\|_F}{2rc}\]

    where \(s\) is a sensor coefficient matrix, \(r\) is the number of basis modes, and \(c\) is the number of distinct classes, as suggested in Brunton et al. (2016).

  • l1_penalty (nonnegative float, optional (default 0.1)) – The L1 penalty term used to form the sensor coefficient matrix, s. Larger values will result in a sparser s and fewer selected sensors. This parameter is ignored for binary classification problems.
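A worked instance of the default threshold, computed directly from the formula above with NumPy. The helper default_threshold and the small matrix s are hypothetical; the sum-of-squares form of the Frobenius norm is used so that the same code also accepts a 1-D s.

```python
import numpy as np


def default_threshold(s, n_basis_modes, n_classes):
    """Hypothetical helper: ||s||_F / (2 r c), as in Brunton et al. (2016)."""
    frobenius = np.sqrt(np.sum(np.square(s)))  # works for 1-D and 2-D s
    return frobenius / (2 * n_basis_modes * n_classes)


# Hypothetical 3-feature, 2-class sensor coefficient matrix with ||s||_F = 5.
s = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])
print(default_threshold(s, n_basis_modes=5, n_classes=2))  # 0.25
```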

Attributes
  • n_basis_modes (nonnegative integer) – Number of basis modes to be used when deciding sensor locations.

  • basis_matrix_inverse_ (np.ndarray, shape (n_basis_modes, n_input_features)) – The inverse of the matrix of basis vectors.

  • sensor_coef_ (np.ndarray, shape (n_input_features, n_classes)) – The sensor coefficient matrix, s.

  • sparse_sensors_ (np.ndarray, shape (n_sensors, )) – The selected sensors.

Examples

>>> from sklearn.metrics import accuracy_score
>>> from sklearn.datasets import make_classification
>>> from pysensors.classification import SSPOC
>>>
>>> x, y = make_classification(n_classes=3, n_informative=3, random_state=10)
>>>
>>> model = SSPOC(n_sensors=10, l1_penalty=0.03)
>>> model.fit(x, y, quiet=True)
SSPOC(basis=Identity(n_basis_modes=100),
      classifier=LinearDiscriminantAnalysis(), l1_penalty=0.03, n_sensors=10)
>>> print(model.selected_sensors)
[10 13  6 19 17 16 15 14 12 11]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.66
>>>
>>> model.update_sensors(n_sensors=5, xy=(x, y), quiet=True)
>>> print(model.selected_sensors)
[10 13  6 19 17]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.6

fit(x, y, quiet=False, prefit_basis=False, refit=True, **optimizer_kws)

Fit the SSPOC model, determining which sensors are relevant.

Parameters
  • x (array-like, shape (n_samples, n_input_features)) – Training data.

  • y (array-like, shape (n_samples,)) – Training labels.

  • quiet (boolean, optional (default False)) – Whether or not to suppress warnings during fitting.

  • prefit_basis (boolean, optional (default False)) – Whether or not the basis has already been fit to x. For example, you may have already fit and experimented with an SVD object to determine the optimal number of modes. This option allows you to avoid an unnecessary SVD.

  • refit (boolean, optional (default True)) – Whether or not to refit the classifier using measurements only from the learned sensor locations.

  • **optimizer_kws (dict, optional) – Keyword arguments to be passed to the optimization routine.

Returns

self

Return type

a fitted SSPOC instance
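The stages that fit goes through can be outlined for a two-class problem in a short NumPy sketch. It is deliberately simplified and is not the pysensors implementation: a least-squares fit to ±1 labels stands in for LDA, a dense minimum-norm solve of basis_inverse @ s = w stands in for the sparse OMP/MultiTaskLasso step (so sensors are ranked by |s| rather than selected by sparsity), and the function name sspoc_fit_sketch is hypothetical. The last stage mirrors refit=True by retraining the classifier on the selected columns only.

```python
import numpy as np


def sspoc_fit_sketch(x, y, n_basis_modes, n_sensors):
    """Hypothetical simplified pipeline for labels y in {0, 1}."""
    # 1. Basis: leading right singular vectors of the centered data.
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis_inverse = vt[:n_basis_modes]                # (r, n_input_features)

    # 2. Linear classifier in basis coordinates (least squares on ±1 targets,
    #    a stand-in for LDA).
    targets = np.where(y == 1, 1.0, -1.0)
    z = (x - mean) @ basis_inverse.T                  # (n_samples, r)
    w, *_ = np.linalg.lstsq(z, targets, rcond=None)   # (r,)

    # 3. Sensor coefficients with basis_inverse @ s = w
    #    (dense min-norm solve, a stand-in for the sparse optimizer).
    s, *_ = np.linalg.lstsq(basis_inverse, w, rcond=None)

    # 4. Keep the n_sensors entries of s with the largest magnitude.
    sensors = np.argsort(-np.abs(s))[:n_sensors]

    # 5. "refit": retrain on the selected measurements only; an intercept
    #    column is appended because the raw data are not centered.
    design = np.column_stack([x[:, sensors], np.ones(len(x))])
    refit_weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return sensors, refit_weights


rng = np.random.default_rng(2)
n_samples, n_input_features = 200, 20
y = rng.integers(0, 2, size=n_samples)
x = rng.standard_normal((n_samples, n_input_features))
x[:, 3] += 3.0 * y  # make feature 3 strongly informative

sensors, weights = sspoc_fit_sketch(x, y, n_basis_modes=10, n_sensors=3)
design = np.column_stack([x[:, sensors], np.ones(n_samples)])
predictions = (design @ weights > 0).astype(int)
print(sensors, (predictions == y).mean())
```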

predict(x)

Predict classes for given measurements. If self.n_sensors is 0 then a dummy classifier is used in place of self.classifier.

Parameters

x (array-like, shape (n_samples, n_sensors) or (n_samples, n_features)) – Examples to be classified. The measurements should be taken at the sensor locations specified by self.selected_sensors.

Returns

y – Predicted classes.

Return type

np.ndarray, shape (n_samples,)
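In array terms, measurements "taken at the sensor locations" are the columns of a full measurement matrix indexed by the selected sensors. The sketch below shows that slicing, plus a hypothetical uniform-random fallback of the kind described for n_sensors = 0; pysensors' own dummy classifier may sample classes differently.

```python
import numpy as np

rng = np.random.default_rng(3)
x_full = rng.standard_normal((6, 20))     # 6 samples, 20 candidate sensor locations
selected_sensors = np.array([10, 13, 6])  # e.g. indices returned by model.selected_sensors

# Keep only the columns measured by the selected sensors: (n_samples, n_sensors).
x_sensors = x_full[:, selected_sensors]
print(x_sensors.shape)  # (6, 3)


def random_fallback_predict(n_samples, classes, rng):
    """Hypothetical stand-in for a dummy classifier: draws labels uniformly at random."""
    return rng.choice(classes, size=n_samples)


print(random_fallback_predict(len(x_full), np.array([0, 1, 2]), rng).shape)  # (6,)
```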

update_sensors(n_sensors=None, threshold=None, xy=None, quiet=False, method=<function amax>, **method_kws)

Update the selected sensors by changing either the preferred number of sensors or the threshold used to select them, then refit the classifier if training data are supplied via xy.

Parameters
  • n_sensors (nonnegative integer, optional (default None)) – The number of sensor locations to select. If None, then threshold will be used to pick the sensors. Note that n_sensors and threshold cannot both be None.

  • threshold (nonnegative float, optional (default None)) – The threshold to use to select sensors based on the magnitudes of entries in self.sensor_coef_ (s). Overridden by n_sensors. Note that n_sensors and threshold cannot both be None.

  • xy (tuple of np.ndarray, length 2, optional (default None)) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ). If not None, the classifier will be refit after the new sensors have been selected.

  • quiet (boolean, optional (default False)) – Whether to silence warnings.

  • method (callable, optional (default np.max)) – Function used along with threshold to select sensors. For binary classification problems, method need not be specified. For multi-class problems, sensor_coef_ (s) has multiple columns, and method is applied along each row to aggregate the coefficients for thresholding; i.e., it is called as method(np.abs(self.sensor_coef_), axis=1, **method_kws). Other acceptable choices include np.min, np.mean, and np.median.

  • **method_kws (dict, optional) – Keyword arguments to be passed into method when it is called.
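The selection rule described above can be sketched with NumPy alone: take |s|, aggregate each row with method (np.max by default) when s has several columns, then keep either the n_sensors rows with the largest aggregate or every row whose aggregate reaches threshold. The function select_sensors_sketch is hypothetical, and comparing with >= (rather than >) at the threshold is an assumption.

```python
import numpy as np


def select_sensors_sketch(sensor_coef, n_sensors=None, threshold=None,
                          method=np.max, **method_kws):
    """Hypothetical helper mirroring the documented n_sensors/threshold rule."""
    if n_sensors is None and threshold is None:
        raise ValueError("n_sensors and threshold cannot both be None")
    magnitudes = np.abs(sensor_coef)
    # Binary problems give a 1-D s, so no aggregation is needed.
    if magnitudes.ndim == 1:
        scores = magnitudes
    else:
        scores = method(magnitudes, axis=1, **method_kws)
    ranked = np.argsort(-scores, kind="stable")  # largest aggregate first
    if n_sensors is not None:  # n_sensors overrides threshold
        return ranked[:n_sensors]
    return ranked[scores[ranked] >= threshold]  # ">=" is an assumption


s = np.array([[0.9, 0.0, 0.1],
              [0.0, 0.2, 0.2],
              [0.3, 0.3, 0.3],
              [0.0, 0.0, 0.05]])
print(select_sensors_sketch(s, n_sensors=2))                   # [0 2]
print(select_sensors_sketch(s, threshold=0.25))                # [0 2]
print(select_sensors_sketch(s, threshold=0.25, method=np.min)) # [2]
```

With np.min a sensor must matter for every class to pass the threshold, whereas np.max keeps a sensor that matters for any single class.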

update_n_basis_modes(n_basis_modes, xy, **fit_kws)

Re-fit the SSPOC object using a different value of n_basis_modes.

This method allows one to relearn sensor locations for a different number of basis modes without refitting the basis in many cases. Specifically, if n_basis_modes <= self.basis.n_basis_modes, the basis does not need to be refit. Otherwise, this method saves no computation compared with refitting from scratch.

Parameters
  • n_basis_modes (positive int) – Number of basis modes to be used during fit. Must be less than or equal to n_samples.

  • xy (tuple of np.ndarray, length 2) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ).

  • **fit_kws (dict, optional) – Keyword arguments to pass to SSPOC.fit.
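A smaller n_basis_modes can skip refitting because, for an SVD-type basis, the leading r modes of an already-fitted basis are exactly the modes an r-mode fit would produce, so slicing the stored inverse is enough. A NumPy sketch follows; the helper name truncated_inverse and its raise-on-larger behavior are hypothetical (per the description above, the real method falls back to a full refit in that case).

```python
import numpy as np


def truncated_inverse(full_inverse, n_basis_modes):
    """Hypothetical helper: reuse the first n_basis_modes rows of a fitted basis inverse."""
    if n_basis_modes > full_inverse.shape[0]:
        # More modes than were fitted: a real implementation would refit the basis here.
        raise ValueError("basis must be refit to provide this many modes")
    return full_inverse[:n_basis_modes]


rng = np.random.default_rng(4)
x = rng.standard_normal((40, 15))
_, _, vt = np.linalg.svd(x, full_matrices=False)
fitted_inverse = vt[:8]  # basis originally fit with 8 modes

small_inverse = truncated_inverse(fitted_inverse, 3)
print(small_inverse.shape)                 # (3, 15)
print(np.allclose(small_inverse, vt[:3]))  # True: the same leading modes
```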

property selected_sensors

Get the indices of the selected sensors.

Returns

sensors – Indices of the selected sensors.

Return type

numpy array, shape (n_sensors,)

get_selected_sensors()

Convenience function for getting indices of the selected sensors.

Returns

sensors – Indices of the selected sensors.

Return type

numpy array, shape (n_sensors,)