Performance
Performance(
scores: np.ndarray, get_ci: bool = False, method: str = 'percentile',
task_bootstrap: bool = False, reps: int = 50000,
confidence_interval_size: float = 0.95,
random_state: Optional[random.RandomState] = None
)
Evaluate the performance of an algorithm. Based on: https://github.com/google-research/rliable/blob/master/rliable/metrics.py
Args
- scores (np.ndarray) : A matrix of size (num_runs x num_tasks) where scores[n][m] represents the score on run n of task m.
- get_ci (bool) : Whether to compute confidence intervals (CIs).
- method (str) : One of basic, percentile, bc (identical to debiased, bias-corrected), or bca.
- task_bootstrap (bool) : Whether to perform bootstrapping over tasks in addition to runs. Defaults to False. See StratifiedBootstrap for more details.
- reps (int) : Number of bootstrap replications.
- confidence_interval_size (float) : Coverage of confidence interval.
- random_state (Optional[random.RandomState]) : If specified, ensures reproducibility in uncertainty estimates.
Returns
Performance evaluator.
Methods:
.aggregate_mean
Computes mean of sample mean scores per task.
.aggregate_median
Computes median of sample mean scores per task.
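As a rough illustration of the two aggregates above, the following NumPy-only sketch first averages each task's scores over runs and then takes the mean or median of those per-task means. The score matrix is made up for the example, and the code is not the library's implementation.

```python
import numpy as np

# Hypothetical score matrix: 3 runs (rows) x 3 tasks (columns).
scores = np.array([[1.0, 10.0, 3.0],
                   [2.0, 20.0, 4.0],
                   [3.0, 60.0, 5.0]])

# Sample mean score of each task, averaged over runs.
task_means = scores.mean(axis=0)

# Mean and median of the per-task sample means.
aggregate_mean = float(np.mean(task_means))
aggregate_median = float(np.median(task_means))
```

Here the second task has much larger scores than the others, so the mean is pulled towards it while the median is not.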
.aggregate_og
Computes optimality gap across all runs and tasks.
Args
- gamma (float) : Threshold for optimality gap. All scores above gamma are clipped to gamma.
Returns
Optimality gap at threshold gamma.
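The clipping rule can be sketched as follows, assuming the gap is computed as gamma minus the mean of the clipped scores, as in the rliable metrics module this class is based on. The function name optimality_gap and the example matrix are illustrative only.

```python
import numpy as np

def optimality_gap(scores: np.ndarray, gamma: float) -> float:
    """Return gamma minus the mean of all scores clipped from above at gamma."""
    clipped = np.minimum(scores, gamma)
    return float(gamma - np.mean(clipped))

# Hypothetical 2 runs x 2 tasks matrix of normalized scores.
scores = np.array([[0.5, 1.5],
                   [1.0, 0.0]])
gap = optimality_gap(scores, gamma=1.0)
```

Because scores above gamma are clipped, the 1.5 entry contributes no more than a score of exactly 1.0 would, and lower values of the gap indicate better performance.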
.aggregate_iqm
Computes the interquartile mean across runs and tasks.
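A minimal sketch of the interquartile mean, assuming it is taken over all runs and tasks pooled together: sort every score, discard the lowest and highest 25%, and average what remains. The helper name interquartile_mean is hypothetical, and the library may rely on a different routine internally.

```python
import numpy as np

def interquartile_mean(scores: np.ndarray) -> float:
    """Mean of the middle 50% of all scores, pooled over runs and tasks."""
    flat = np.sort(scores, axis=None)  # axis=None flattens before sorting
    cut = int(0.25 * flat.size)        # number of scores dropped at each end
    return float(flat[cut:flat.size - cut].mean())

# Hypothetical 2 runs x 4 tasks matrix with one outlier.
scores = np.array([[0.0, 1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0, 100.0]])
iqm = interquartile_mean(scores)
plain_mean = float(scores.mean())
```

The outlier dominates the plain mean but falls in the discarded top quarter, so it has no effect on the interquartile mean.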
.get_interval_estimates
Computes interval estimates for the above performance evaluators.
Args
- scores (np.ndarray) : A matrix of size (num_runs x num_tasks) where scores[n][m] represents the score on run n of task m.
- metric (Callable) : One of the above performance evaluators used for estimation.
Returns
Confidence intervals.
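To make the idea concrete, here is a simplified NumPy sketch of a percentile bootstrap over runs, resampling runs independently within each task. It covers only the percentile method, does not bootstrap over tasks, and is not the library's implementation. The names percentile_interval and mean_of_task_means are hypothetical.

```python
from typing import Callable, Optional, Tuple

import numpy as np

def percentile_interval(
    scores: np.ndarray,
    metric: Callable[[np.ndarray], float],
    reps: int = 2000,
    confidence_interval_size: float = 0.95,
    random_state: Optional[np.random.RandomState] = None,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for metric(scores), resampling runs per task."""
    rng = random_state if random_state is not None else np.random.RandomState()
    num_runs, num_tasks = scores.shape
    task_idx = np.arange(num_tasks)
    estimates = np.empty(reps)
    for i in range(reps):
        # Draw run indices with replacement, independently for each task.
        run_idx = rng.randint(0, num_runs, size=(num_runs, num_tasks))
        estimates[i] = metric(scores[run_idx, task_idx])
    alpha = 1.0 - confidence_interval_size
    lower, upper = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)

def mean_of_task_means(x: np.ndarray) -> float:
    """Mean of the per-task sample means."""
    return float(np.mean(np.mean(x, axis=0)))

# Hypothetical scores: 10 runs x 4 tasks.
scores = np.random.RandomState(0).normal(loc=1.0, scale=0.5, size=(10, 4))
low, high = percentile_interval(
    scores, mean_of_task_means, reps=500, random_state=np.random.RandomState(1)
)
```

Passing a seeded RandomState makes the interval reproducible, which mirrors the purpose of the random_state argument described above.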
.create_performance_profile
.create_performance_profile(
tau_list: Union[List[float], np.ndarray], use_score_distribution: bool = True
)
Method for calculating performance profiles.
Args
- tau_list (Union[List[float], np.ndarray]) : List or 1D NumPy array of threshold values on which the profile is evaluated.
- use_score_distribution (bool) : Whether to report score distributions or average score distributions.
Returns
Point and interval estimates of profiles evaluated at all thresholds in 'tau_list'.
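The point estimates of a profile can be sketched as below, assuming the score distribution is the fraction of all run scores above each threshold and the average score distribution is the fraction of tasks whose mean score is above it, as in rliable. The function performance_profile is a hypothetical helper and omits the interval estimates.

```python
import numpy as np

def performance_profile(scores, tau_list, use_score_distribution=True):
    """Fraction of scores (or per-task mean scores) above each threshold."""
    tau = np.asarray(tau_list, dtype=float)
    if use_score_distribution:
        # Fraction of all (run, task) scores above each threshold.
        return np.mean(scores[:, :, None] > tau, axis=(0, 1))
    # Fraction of tasks whose mean score over runs is above each threshold.
    return np.mean(scores.mean(axis=0)[:, None] > tau, axis=0)

# Hypothetical 2 runs x 2 tasks matrix of normalized scores.
scores = np.array([[0.2, 0.8],
                   [0.6, 1.0]])
tau_list = [0.0, 0.5, 0.95]
run_profile = performance_profile(scores, tau_list)
mean_profile = performance_profile(scores, tau_list, use_score_distribution=False)
```

Both profiles are non-increasing in the threshold, since raising the threshold can only reduce the fraction of scores above it.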