Slicefinder

class sliceline.Slicefinder(alpha: float = 0.6, k: int = 1, max_l: int = 4, min_sup: int | float = 10, verbose: bool = True)

Slicefinder class.

SliceLine is a fast, linear-algebra-based slice-finding method for ML model debugging.

Given an input dataset (X) and a model error vector (errors), SliceLine finds the top k slices of X on which the model performs significantly worse. A slice is a subspace of X defined by one or more predicates. The maximal dimension of this subspace is controlled by max_l.

The slice scoring function is a linear combination of two objectives:
  • Slice size: find sufficiently large slices, with more than min_sup elements (high impact on the overall model)

  • Slice error: find slices with substantial errors (high negative impact on the sub-group/model)

The importance of each objective is controlled through a single parameter alpha.
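
The trade-off between the two objectives can be made concrete with the scoring formula given in the SliceLine paper. The sketch below is a plain-Python transcription of that formula. The function name slice_score and its argument names are illustrative and not part of the sliceline API, and the library's internal _score method may differ in detail.

```python
def slice_score(slice_error_sum, slice_size, average_error, n_rows, alpha=0.6):
    """Score one slice with the formula from the SliceLine paper (sketch).

    The first term rewards slices whose average error exceeds the dataset
    average; the second term penalises slices that are small relative to
    the dataset.
    """
    if slice_size <= 0:
        # Assumption: an empty slice gets the worst possible score.
        return float("-inf")
    relative_error = (slice_error_sum / slice_size) / average_error
    relative_size = n_rows / slice_size
    return alpha * (relative_error - 1) - (1 - alpha) * (relative_size - 1)


# Hypothetical dataset: 1000 rows with an average error of 0.1.
# A small slice with a high average error (0.4) ...
small_bad = slice_score(20.0, 50, average_error=0.1, n_rows=1000)
# ... versus a large slice with a mildly elevated average error (0.15).
large_mild = slice_score(75.0, 500, average_error=0.1, n_rows=1000)
```

With the default alpha of 0.6 the large slice scores higher in this example. With alpha set to 1 the size term vanishes and the small, high-error slice scores higher instead.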

Slice enumeration and pruning techniques are done via sparse linear algebra.

Parameters

alpha: float, default=0.6

Weight parameter for the importance of the average slice error. 0 < alpha <= 1.

k: int, default=1

Maximum number of slices to return. Note: if slices ranked after the k-th have the same score as the k-th slice, all of them are returned as well. The number of slices actually returned, _n_features_out, can therefore exceed k (_n_features_out >= k).
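
The tie behaviour described above can be illustrated as follows. This is a sketch of the selection rule only, not the library's code; top_k_with_ties is a hypothetical helper and the scores are made up.

```python
def top_k_with_ties(scores, k):
    """Return the indices of the k best scores, plus any ties with the k-th."""
    if k <= 0 or not scores:
        return []
    # Indices sorted by descending score (stable for equal scores).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if len(order) <= k:
        return order
    threshold = scores[order[k - 1]]  # score of the k-th best slice
    return [i for i in order if scores[i] >= threshold]


# Two slices tie for second place, so k=2 yields three slices.
selected = top_k_with_ties([0.9, 0.4, 0.7, 0.7, 0.1], k=2)
```

Here selected contains three indices although k is 2, which corresponds to _n_features_out being 3.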

max_l: int, default=4

Maximum lattice level. In other words: the maximum number of predicates used to define a slice.

min_sup: int or float, default=10

Minimum support threshold. Inspired by frequent itemset mining, it ensures statistical significance. If min_sup is a float (0 < min_sup < 1), it represents the fraction of the input dataset (X).
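
As a sketch of how a fractional min_sup relates to an absolute row count: resolve_min_sup below is a hypothetical helper, and rounding up to the next whole row is an assumption of this example rather than documented behaviour.

```python
import math


def resolve_min_sup(min_sup, n_rows):
    """Convert min_sup to an absolute number of rows (illustrative only)."""
    if isinstance(min_sup, float) and 0 < min_sup < 1:
        # Fraction of the dataset; rounding up is an assumption of this sketch.
        return math.ceil(min_sup * n_rows)
    # Otherwise min_sup is already an absolute count.
    return int(min_sup)


absolute_from_fraction = resolve_min_sup(0.25, 1000)  # a quarter of 1000 rows
absolute_unchanged = resolve_min_sup(10, 1000)        # already a row count
```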

verbose: bool, default=True

Controls the verbosity.

Attributes

top_slices_: np.ndarray of shape (_n_features_out, number of columns of the input dataset)

The _n_features_out slices with the highest scores. A None value in a slice means that the corresponding column is not used by that slice.
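
A row of top_slices_ can be read as a conjunction of per-column predicates. The sketch below assumes each non-None entry is an equality predicate on its column. The helpers describe_slice and in_slice, the column names, and the values are all hypothetical.

```python
def describe_slice(slice_row, column_names):
    """Render one slice row as a readable conjunction of predicates."""
    predicates = [
        f"{name} == {value!r}"
        for name, value in zip(column_names, slice_row)
        if value is not None  # None marks a column the slice does not use
    ]
    return " AND ".join(predicates)


def in_slice(record, slice_row):
    """Check whether a record satisfies every predicate of the slice."""
    return all(
        expected is None or actual == expected
        for actual, expected in zip(record, slice_row)
    )


columns = ["sex", "class", "embarked"]
slice_row = ["male", None, "S"]  # the "class" column is unused

description = describe_slice(slice_row, columns)
member = in_slice(["male", 3, "S"], slice_row)
non_member = in_slice(["female", 3, "S"], slice_row)
```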

average_error_: float

Mean value of the input error.

top_slices_statistics_: list of dict of length len(top_slices_)

The statistics of the slices found, sorted by slice score. For each slice, the following statistics are stored:

  • slice_score: the score of the slice (defined in _score method)

  • sum_slice_error: the sum of all the errors in the slice

  • max_slice_error: the maximum of all errors in the slice

  • slice_size: the number of elements in the slice

  • slice_average_error: the average error in the slice (sum_slice_error / slice_size)
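
The size and error statistics listed above follow directly from the error vector and the slice membership. A minimal sketch, assuming a boolean membership mask per row: slice_statistics is a hypothetical helper and omits slice_score, which is computed separately.

```python
def slice_statistics(errors, mask):
    """Compute per-slice statistics from row errors and a membership mask."""
    slice_errors = [error for error, selected in zip(errors, mask) if selected]
    size = len(slice_errors)
    total = sum(slice_errors)
    return {
        "sum_slice_error": total,
        "max_slice_error": max(slice_errors) if slice_errors else 0.0,
        "slice_size": size,
        # Guard against empty slices to avoid dividing by zero.
        "slice_average_error": total / size if size else 0.0,
    }


# Four rows, of which the second and third belong to the slice.
stats = slice_statistics([0.0, 1.0, 0.5, 0.25], [False, True, True, False])
```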

References

SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging, by Svetlana Sagadeeva and Matthias Boehm, Graz University of Technology.