pysensors package

Subpackages

Module contents

class pysensors.SSPOR(basis=None, optimizer=None, n_sensors=None)

Bases: BaseEstimator

Sparse Sensor Placement Optimization for Reconstruction: a model for selecting the best sensor locations for state reconstruction.

Given a basis in which to represent the state (e.g. PCA modes) along with measurement data, an SSPOR instance produces a list of sensor locations (a permutation of the numbers 0, 1, …, n_input_features - 1) ranked in descending order of importance. One can then select the top k sensors and take future measurements at that limited set of locations.

The overall time complexity of fitting an SSPOR object is O(n_basis_modes * n_input_features * n_input_features) plus the cost of fitting the basis, which differs from basis to basis. The space complexity is O(n_basis_modes * n_input_features).
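
As a rough illustration of how a pivoting-based optimizer can turn a basis into a ranked list of locations, the sketch below runs a greedy column-pivoted Gram-Schmidt sweep over the transposed basis matrix. It is not the library's implementation: the helper name rank_sensors_qr and the toy polynomial basis are assumptions made only for this example. Locations that are never picked as pivots are appended in their original order.

```python
import numpy as np

def rank_sensors_qr(basis_matrix):
    """Greedy column-pivoted ranking (illustrative helper, not pysensors API).

    basis_matrix has shape (n_input_features, n_basis_modes). The result is a
    permutation of 0..n_input_features-1 whose leading entries are the pivots,
    in the order they were chosen.
    """
    work = np.array(basis_matrix, dtype=float).T  # (n_basis_modes, n_features)
    n_modes, n_features = work.shape
    chosen = np.zeros(n_features, dtype=bool)
    pivots = []
    for _ in range(min(n_modes, n_features)):
        norms = np.linalg.norm(work, axis=0)
        norms[chosen] = -1.0           # never pick the same location twice
        j = int(np.argmax(norms))
        if norms[j] < 1e-12:
            break                      # remaining columns are numerically zero
        pivots.append(j)
        chosen[j] = True
        # Remove the component along the chosen column from every column.
        q = work[:, j] / norms[j]
        work -= np.outer(q, q @ work)
    rest = [i for i in range(n_features) if not chosen[i]]
    return np.array(pivots + rest)

x = np.linspace(0, 1, 11)
basis = np.vander(x, 3, increasing=True)  # toy basis: modes 1, x, x**2
ranking = rank_sensors_qr(basis)
top_k = ranking[:3]                       # keep only the top 3 locations
```

Only the first n_basis_modes entries of such a ranking carry information, which is why the remaining locations are treated separately during fitting.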

Parameters
  • basis (basis object, optional (default pysensors.basis.Identity)) – Basis in which to represent the data. Default is the identity basis (i.e. raw features).

  • optimizer (optimizer object, optional (default pysensors.optimizers.QR)) – Optimization method used to rank sensor locations.

  • n_sensors (int, optional (default n_input_features)) – Number of sensors to select. Note that s = SSPOR(n_sensors=10); s.fit(x) is equivalent to s = SSPOR(); s.fit(x); s.set_number_of_sensors(10).

Attributes
  • n_basis_modes (int) – Number of basis modes considered during fitting.

  • basis_matrix_ (np.ndarray) – Internal representation of the basis.

  • ranked_sensors_ (np.ndarray) – Sensor locations ranked in descending order of importance.

Examples

>>> import numpy as np
>>> from pysensors import SSPOR
>>>
>>> x = np.linspace(0, 1, 501)
>>> monomials = np.vander(x, 15).T
>>>
>>> model = SSPOR(n_sensors=5)
>>> model.fit(monomials)
SSPOR(basis=Identity(n_basis_modes=15), n_sensors=5, optimizer=QR())
>>> print(model.selected_sensors)
[500 377   0 460 185]
>>> print(x[model.selected_sensors])
[1.    0.754 0.    0.92  0.37 ]
>>> model.set_n_sensors(7)
>>> print(x[model.selected_sensors])
[1.    0.754 0.    0.92  0.37  0.572 0.134]
>>> f = np.sin(3*x)
>>> f_pred = model.predict(f[model.selected_sensors])
>>> print(np.linalg.norm(f - f_pred))
0.022405698005838044
fit(x, quiet=False, prefit_basis=False, seed=None, **optimizer_kws)

Fit the SSPOR model, determining which sensors are relevant.

Parameters
  • x (array-like, shape (n_samples, n_input_features)) – Training data.

  • quiet (boolean, optional (default False)) – Whether or not to suppress warnings during fitting.

  • prefit_basis (boolean, optional (default False)) – Whether or not the basis has already been fit to x. For example, you may have already fit and experimented with an SVD object to determine the optimal number of modes. This option allows you to avoid an unnecessary SVD.

  • seed (int, optional (default None)) – Seed for the random number generator used to shuffle the sensors ranked after the first self.basis.n_basis_modes. Most optimizers only rank the top self.basis.n_basis_modes sensors, leaving the rest virtually untouched. As a result, the remaining sensors are randomly permuted.

  • optimizer_kws (dict, optional) – Keyword arguments to be passed to the get_sensors method of the optimizer.
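
The effect of seed can be pictured with the following minimal sketch, which keeps the first n_basis_modes entries of a ranking and permutes the rest with a seeded generator. The helper shuffle_unranked and the use of numpy.random.default_rng are assumptions for illustration; the library may draw its random numbers differently.

```python
import numpy as np

def shuffle_unranked(ranked_sensors, n_basis_modes, seed=None):
    """Keep the ranked head and permute the unranked tail (hypothetical helper)."""
    ranked_sensors = np.asarray(ranked_sensors)
    rng = np.random.default_rng(seed)
    head = ranked_sensors[:n_basis_modes]                    # ranked by the optimizer
    tail = rng.permutation(ranked_sensors[n_basis_modes:])   # order carries no meaning
    return np.concatenate([head, tail])

ranking = np.array([7, 2, 5, 0, 1, 3, 4, 6])
a = shuffle_unranked(ranking, n_basis_modes=3, seed=0)
b = shuffle_unranked(ranking, n_basis_modes=3, seed=0)  # same seed, same order
```

Fixing the seed makes the tail of the ranking reproducible between runs, while the leading entries are unaffected either way.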

Returns

self

Return type

a fitted SSPOR instance

predict(x, **solve_kws)

Predict values at all positions given measurements at sensor locations.

Parameters
  • x (array-like, shape (n_samples, n_sensors)) – Measurements from which to form prediction. The measurements should be taken at the sensor locations specified by self.get_selected_sensors().

  • solve_kws (dict, optional) – keyword arguments to be passed to the linear solver used to invert the basis matrix.

Returns

y – Predicted values at every location.

Return type

numpy array, shape (n_samples, n_features)
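
Conceptually, prediction amounts to solving a small linear system for the basis coefficients from the measured rows and then expanding those coefficients over every location. The sketch below shows that idea with numpy.linalg.lstsq. The function reconstruct and the toy basis are assumptions for illustration, and the solver the library actually uses may differ.

```python
import numpy as np

def reconstruct(basis_matrix, sensors, measurements):
    """Least-squares reconstruction from point measurements (illustrative only).

    basis_matrix: shape (n_features, n_basis_modes)
    sensors: indices of the measured locations, shape (n_sensors,)
    measurements: shape (n_samples, n_sensors) or (n_sensors,)
    Returns an array of shape (n_samples, n_features).
    """
    measured_rows = basis_matrix[sensors, :]          # (n_sensors, n_basis_modes)
    rhs = np.atleast_2d(measurements).T               # (n_sensors, n_samples)
    coefs, *_ = np.linalg.lstsq(measured_rows, rhs, rcond=None)
    return (basis_matrix @ coefs).T

x = np.linspace(0, 1, 21)
basis = np.vander(x, 3, increasing=True)   # toy basis: modes 1, x, x**2
sensors = np.array([20, 0, 10])            # assumed sensor locations
f = 2.0 - x + 0.5 * x**2                   # lies in the span of the basis
f_pred = reconstruct(basis, sensors, f[sensors])
```

Because f lies in the span of the toy basis and three independent rows are measured, the reconstruction here is exact up to rounding; for general signals it is only an approximation.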

get_selected_sensors()

Get the indices of the sensors chosen by the model.

Returns

sensors – Indices of the sensors chosen by the model (i.e. the sensor locations) ranked in descending order of importance.

Return type

numpy array, shape (n_sensors,)

property selected_sensors

Get the indices of the sensors chosen by the model.

Returns

sensors – Indices of the sensors chosen by the model (i.e. the sensor locations) ranked in descending order of importance.

Return type

numpy array, shape (n_sensors,)

get_all_sensors()

Get a ranked list consisting of all the sensors. The sensors are given in descending order of importance.

Returns

sensors – Indices of sensors in descending order of importance.

Return type

numpy array, shape (n_features,)

property all_sensors

Get a ranked list consisting of all the sensors. The sensors are given in descending order of importance.

Returns

sensors – Indices of sensors in descending order of importance.

Return type

numpy array, shape (n_features,)

set_number_of_sensors(n_sensors)

Set n_sensors, the number of sensors to be used for prediction.

Parameters

n_sensors (int) – The number of sensors. Must be a positive integer. Cannot exceed the number of available sensors (n_features).

set_n_sensors(n_sensors)

A convenience function accomplishing the same thing as set_number_of_sensors. Set n_sensors, the number of sensors to be used for prediction.

Parameters

n_sensors (int) – The number of sensors. Must be a positive integer. Cannot exceed the number of available sensors (n_features).

update_n_basis_modes(n_basis_modes, x=None, quiet=False)

Re-fit the SSPOR object using a different value of n_basis_modes.

This method allows one to relearn sensor locations for a different number of basis modes without re-fitting the basis in many cases. Specifically, if n_basis_modes <= self.basis.n_basis_modes then the basis does not need to be refit. Otherwise this function does not save any computational resources.

Parameters
  • n_basis_modes (positive int) – Number of basis modes to be used during fit. Must be less than or equal to n_samples.

  • x (numpy array, shape (n_examples, n_features), optional (default None)) – Only used if n_basis_modes exceeds the number of available basis modes for the already fit basis.

  • quiet (boolean, optional (default False)) – Whether or not to suppress warnings during refitting.

score(x, y=None, score_function=None, score_kws={}, solve_kws={})

Compute the reconstruction error for a given set of measurements.

Parameters
  • x (numpy array, shape (n_examples, n_features)) – Measurements with which to compute the score. Note that x should consist of measurements at every location, not just the recommended sensor locations, i.e. its shape should be (n_examples, n_features) rather than (n_examples, n_sensors).

  • y (None) – Dummy input to maintain compatibility with Scikit-learn.

  • score_function (callable, optional (default None)) – Function used to compute the score. Should have the call signature score_function(y_true, y_pred, **score_kws). Default is the negative of the root mean squared error (sklearn expects higher scores to correspond to better performance).

  • score_kws (dict, optional) – Keyword arguments to be passed to score_function. Ignored if score_function is None.

  • solve_kws (dict, optional) – Keyword arguments to be passed to the predict method.

Returns

score – The score.

Return type

float
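
To make the score_function contract concrete, here is a small sketch of a negative root mean squared error together with a custom metric that accepts an extra keyword. Both functions are written for illustration only. In particular, averaging over all entries at once is an assumption, and max_abs_error with its scale argument is hypothetical.

```python
import numpy as np

def negative_rmse(y_true, y_pred):
    """Negative root mean squared error; values closer to 0 are better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -np.sqrt(np.mean((y_true - y_pred) ** 2))

def max_abs_error(y_true, y_pred, scale=1.0):
    """Example custom metric; `scale` would be supplied through score_kws."""
    return scale * np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

y_true = np.array([[0.0, 1.0], [2.0, 3.0]])
y_pred = np.array([[0.0, 1.0], [2.0, 5.0]])
default_score = negative_rmse(y_true, y_pred)            # -1.0
custom_score = max_abs_error(y_true, y_pred, scale=0.5)  # 1.0
```

Any callable with the signature score_function(y_true, y_pred, **score_kws) fits the same pattern; returning larger values for better reconstructions keeps it compatible with Scikit-learn's conventions.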

reconstruction_error(x_test, sensor_range=None, score=None, **solve_kws)

Compute the reconstruction error for different numbers of sensors.

Parameters
  • x_test (numpy array, shape (n_examples, n_features)) – Measurements to be reconstructed.

  • sensor_range (1D numpy array, optional (default None)) – Numbers of sensors at which to compute the reconstruction error. If None, will be set to [1, 2, … , min(n_sensors, basis.n_basis_modes)].

  • score (callable, optional (default None)) – Function used to compute the reconstruction error. Should have the signature score(x, x_pred). If None, the root mean squared error is used.

  • solve_kws (dict, optional) – Keyword arguments to be passed to the linear solver.

Returns

error – Reconstruction scores for each number of sensors in sensor_range.

Return type

numpy array, shape (len(sensor_range),)
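
The idea behind this method can be sketched as a loop over candidate sensor counts: for each k, reconstruct the test data from the top k ranked sensors and record the error. The helper error_curve, the hand-picked ranking, and the least-squares solve below are assumptions for illustration rather than the library's code.

```python
import numpy as np

def rmse(x, x_pred):
    """Root mean squared error over all entries."""
    return np.sqrt(np.mean((x - x_pred) ** 2))

def error_curve(basis_matrix, ranked_sensors, x_test, sensor_range, score=rmse):
    """Reconstruction error using the top-k sensors, for each k in sensor_range."""
    errors = np.zeros(len(sensor_range))
    for i, k in enumerate(sensor_range):
        sensors = ranked_sensors[:k]
        # Fit basis coefficients from the k measured locations only.
        coefs, *_ = np.linalg.lstsq(
            basis_matrix[sensors, :], x_test[:, sensors].T, rcond=None
        )
        errors[i] = score(x_test, (basis_matrix @ coefs).T)
    return errors

x = np.linspace(0, 1, 21)
basis = np.vander(x, 3, increasing=True)          # toy basis: modes 1, x, x**2
ranked = np.array([20, 0, 10, 5, 15])             # assumed ranking
x_test = (1.0 + 2.0 * x - x**2)[np.newaxis, :]    # one test example
errors = error_curve(basis, ranked, x_test, sensor_range=[1, 2, 3])
```

In this toy setup the error drops to roundoff once the number of sensors reaches the number of basis modes, which is consistent with the default sensor_range stopping at min(n_sensors, basis.n_basis_modes).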

class pysensors.SSPOC(basis=None, classifier=None, n_sensors=None, threshold=None, l1_penalty=0.1)

Bases: BaseEstimator

Sparse Sensor Placement Optimization for Classification (SSPOC) object.

As the name suggests, this class can be used to select optimal sensor locations (measurement locations) for classification tasks.

The time complexity of the SSPOC algorithm can be decomposed as

\[C_{total} = C_{basis} + C_{classification} + C_{optimization}\]
  • \(C_{basis}\): the complexity of fitting the selected basis object and producing the matrix inverse. The matrix inverse is “free” to compute for pysensors.basis.Identity and pysensors.basis.SVD. For pysensors.basis.RandomProjection the complexity is that of calling numpy.linalg.pinv on a matrix of size n_input_features * n_basis_modes.

  • \(C_{classification}\): the cost of fitting the chosen classifier to n_examples examples with n_basis_modes features.

  • \(C_{optimization}\): the cost of solving the sensor optimization problem. For binary classification we use sklearn.linear_model.OrthogonalMatchingPursuit. For multi-class classification we use sklearn.linear_model.MultiTaskLasso. The costs for each depend on the fit options that are specified. In both cases there are n_basis_modes examples with n_features features.

The space complexity likewise depends on the same three factors. Generally, the basis requires O(n_basis_modes * n_features) space. The space requirements for classification and optimization depend on the particular algorithms being employed. See the Scikit-learn documentation for specifics.

See the following reference for more information:

Brunton, Bingni W., et al. “Sparse sensor placement optimization for classification.” SIAM Journal on Applied Mathematics 76.5 (2016): 2099-2122.

Parameters
  • basis (basis object, optional (default pysensors.basis.Identity)) – Basis in which to represent the data. Default is the identity basis (i.e. raw features).

  • classifier (classifier object, optional (default Linear Discriminant Analysis (LDA))) – Classifier for which to optimize sensors. Must be a linear classifier with a coef_ attribute and fit and predict methods.

  • n_sensors (nonnegative integer, optional (default None)) – Number of sensor locations to be used after fitting. If n_sensors is not None then it overrides the threshold parameter. If set to 0, the classifier is replaced with a dummy classifier that predicts the class randomly.

  • threshold (nonnegative float, optional (default None)) –

    Threshold for selecting sensors. Overridden by n_sensors. If both threshold and n_sensors are None when the fit method is called, then the threshold will be set to

    \[\frac{\|s\|_F}{2rc}\]

    where \(s\) is a sensor coefficient matrix, \(r\) is the number of basis modes, and \(c\) is the number of distinct classes, as suggested in Brunton et al. (2016).

  • l1_penalty (nonnegative float, optional (default 0.1)) – The L1 penalty term used to form the sensor coefficient matrix, s. Larger values will result in a sparser s and fewer selected sensors. This parameter is ignored for binary classification problems.
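
For reference, the default threshold formula given for the threshold parameter can be evaluated directly with NumPy. The helper default_threshold and the toy coefficient matrix are assumptions used only to show the arithmetic.

```python
import numpy as np

def default_threshold(sensor_coef, n_basis_modes, n_classes):
    """Evaluate ||s||_F / (2 r c) for a coefficient matrix s (illustrative helper)."""
    s = np.atleast_2d(np.asarray(sensor_coef, dtype=float))
    return np.linalg.norm(s, "fro") / (2 * n_basis_modes * n_classes)

# Toy matrix with shape (n_input_features, n_classes) = (6, 2).
s = np.zeros((6, 2))
s[0, 0] = 3.0
s[1, 1] = 4.0   # Frobenius norm is sqrt(3**2 + 4**2) = 5
threshold = default_threshold(s, n_basis_modes=3, n_classes=2)  # 5 / 12
```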

Attributes
  • n_basis_modes (nonnegative integer) – Number of basis modes to be used when deciding sensor locations.

  • basis_matrix_inverse_ (np.ndarray, shape (n_basis_modes, n_input_features)) – The inverse of the matrix of basis vectors.

  • sensor_coef_ (np.ndarray, shape (n_input_features, n_classes)) – The sensor coefficient matrix, s.

  • sparse_sensors_ (np.ndarray, shape (n_sensors, )) – The selected sensors.

Examples

>>> from sklearn.metrics import accuracy_score
>>> from sklearn.datasets import make_classification
>>> from pysensors.classification import SSPOC
>>>
>>> x, y = make_classification(n_classes=3, n_informative=3, random_state=10)
>>>
>>> model = SSPOC(n_sensors=10, l1_penalty=0.03)
>>> model.fit(x, y, quiet=True)
SSPOC(basis=Identity(n_basis_modes=100),
      classifier=LinearDiscriminantAnalysis(), l1_penalty=0.03, n_sensors=10)
>>> print(model.selected_sensors)
[10 13  6 19 17 16 15 14 12 11]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.66
>>>
>>> model.update_sensors(n_sensors=5, xy=(x, y), quiet=True)
>>> print(model.selected_sensors)
[10 13  6 19 17]
>>>
>>> acc = accuracy_score(y, model.predict(x[:, model.selected_sensors]))
>>> print("Accuracy:", acc)
Accuracy: 0.6
fit(x, y, quiet=False, prefit_basis=False, refit=True, **optimizer_kws)

Fit the SSPOC model, determining which sensors are relevant.

Parameters
  • x (array-like, shape (n_samples, n_input_features)) – Training data.

  • y (array-like, shape (n_samples,)) – Training labels.

  • quiet (boolean, optional (default False)) – Whether or not to suppress warnings during fitting.

  • prefit_basis (boolean, optional (default False)) – Whether or not the basis has already been fit to x. For example, you may have already fit and experimented with an SVD object to determine the optimal number of modes. This option allows you to avoid an unnecessary SVD.

  • refit (boolean, optional (default True)) – Whether or not to refit the classifier using measurements only from the learned sensor locations.

  • optimizer_kws (dict, optional) – Keyword arguments to be passed to the optimization routine.

Returns

self

Return type

a fitted SSPOC instance

predict(x)

Predict classes for given measurements. If self.n_sensors is 0 then a dummy classifier is used in place of self.classifier.

Parameters

x (array-like, shape (n_samples, n_sensors) or (n_samples, n_features)) – Examples to be classified. The measurements should be taken at the sensor locations specified by self.selected_sensors.

Returns

y – Predicted classes.

Return type

np.ndarray, shape (n_samples,)

update_sensors(n_sensors=None, threshold=None, xy=None, quiet=False, method=np.max, **method_kws)

Update the selected sensors by changing either the preferred number of sensors or the threshold used to select the sensors, refitting the classifier afterwards, if possible.

Parameters
  • n_sensors (nonnegative integer, optional (default None)) – The number of sensor locations to select. If None, then threshold will be used to pick the sensors. Note that n_sensors and threshold cannot both be None.

  • threshold (nonnegative float, optional (default None)) – The threshold to use to select sensors based on the magnitudes of entries in self.sensor_coef_ (s). Overridden by n_sensors. Note that n_sensors and threshold cannot both be None.

  • xy (tuple of np.ndarray, length 2, optional (default None)) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ). If not None, the classifier will be refit after the new sensors have been selected.

  • quiet (boolean, optional (default False)) – Whether to silence warnings.

  • method (callable, optional (default np.max)) – Function used along with threshold to select sensors. For binary classification problems one need not specify a method. For multiclass classification problems, sensor_coef_ (s) has multiple columns and method is applied along each row to aggregate coefficients for thresholding; that is, method is called as method(np.abs(self.sensor_coef_), axis=1, **method_kws). Other examples of acceptable methods are np.min, np.mean, and np.median.

  • **method_kws (dict, optional) – Keyword arguments to be passed into method when it is called.
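
The interplay of n_sensors, threshold, and method described above can be sketched as follows. The function select_sensors is hypothetical: the descending ordering of the result and the use of >= in the threshold comparison are assumptions, and only the aggregation call method(np.abs(s), axis=1, **method_kws) is taken from the parameter description.

```python
import numpy as np

def select_sensors(sensor_coef, n_sensors=None, threshold=None,
                   method=np.max, **method_kws):
    """Pick sensors from a coefficient matrix s (illustrative, not pysensors code)."""
    if n_sensors is None and threshold is None:
        raise ValueError("n_sensors and threshold cannot both be None")
    s = np.abs(np.asarray(sensor_coef, dtype=float))
    # One aggregated magnitude per candidate location (row of s).
    magnitude = s if s.ndim == 1 else method(s, axis=1, **method_kws)
    order = np.argsort(-magnitude, kind="stable")   # largest magnitude first
    if n_sensors is not None:                        # n_sensors overrides threshold
        return order[:n_sensors]
    return order[magnitude[order] >= threshold]

s = np.array([[0.9, 0.1],
              [0.0, 0.05],
              [0.2, 0.6],
              [0.3, 0.3]])
by_count = select_sensors(s, n_sensors=2)                     # rows 0 and 2
by_threshold = select_sensors(s, threshold=0.5)               # rows 0 and 2
by_min = select_sensors(s, threshold=0.15, method=np.min)     # rows 3 and 2
```

Switching method from np.max to np.min changes the question from "is this location important for any class" to "is it important for every class", which is why the last call selects a different set of rows.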

update_n_basis_modes(n_basis_modes, xy, **fit_kws)

Re-fit the SSPOC object using a different value of n_basis_modes.

This method allows one to relearn sensor locations for a different number of basis modes without re-fitting the basis in many cases. Specifically, if n_basis_modes <= self.basis.n_basis_modes then the basis does not need to be refit. Otherwise this function does not save any computational resources.

Parameters
  • n_basis_modes (positive int) – Number of basis modes to be used during fit. Must be less than or equal to n_samples.

  • xy (tuple of np.ndarray, length 2) – Tuple containing training data x and labels y for refitting. x should have shape (n_samples, n_input_features) and y shape (n_samples, ).

  • **fit_kws (dict, optional) – Keyword arguments to pass to SSPOC.fit.

property selected_sensors

Get the indices of the selected sensors.

Returns

sensors – Indices of the selected sensors.

Return type

numpy array, shape (n_sensors,)

get_selected_sensors()

Convenience function for getting indices of the selected sensors.

Returns

sensors – Indices of the selected sensors.

Return type

numpy array, shape (n_sensors,)