Comparison
Comparison(
scores_x: np.ndarray, scores_y: np.ndarray, get_ci: bool = False,
method: str = 'percentile', reps: int = 2000, confidence_interval_size: float = 0.95,
random_state: Optional[random.RandomState] = None
)
Compare the performance of two algorithms. Based on: https://github.com/google-research/rliable/blob/master/rliable/metrics.py
Args
- scores_x (np.ndarray) : A matrix of size (num_runs_x x num_tasks) where scores_x[n][m] represents the score on run n of task m for algorithm X.
- scores_y (np.ndarray) : A matrix of size (num_runs_y x num_tasks) where scores_y[n][m] represents the score on run n of task m for algorithm Y.
- get_ci (bool) : Whether to compute confidence intervals.
- method (str) : One of 'basic', 'percentile', 'bc' (identical to 'debiased' and 'bias-corrected'), or 'bca'.
- reps (int) : Number of bootstrap replications.
- confidence_interval_size (float) : Coverage of confidence interval.
- random_state (np.random.RandomState) : If specified, ensures reproducibility in uncertainty estimates.
Returns
Comparison instance.
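A minimal construction sketch. The score matrices below are synthetic placeholders, and the import path of Comparison is an assumption; only the parameters documented above are used:

import numpy as np

# Hypothetical import path; adjust to where Comparison lives in the package.
from comparison import Comparison

# Synthetic scores: 10 runs of algorithm X and 8 runs of algorithm Y on 5 tasks.
scores_x = np.random.rand(10, 5)
scores_y = np.random.rand(8, 5)

comparison = Comparison(
    scores_x,
    scores_y,
    get_ci=True,
    method='percentile',
    reps=2000,
    confidence_interval_size=0.95,
    random_state=np.random.RandomState(42),
)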
Methods:
.compute_poi
Compute the overall probability of improvement of algorithm X over Y.
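The statistic follows the rliable implementation linked above: for each task, every run of X is compared against every run of Y (ties count as 0.5), and the per-task probabilities are averaged. A standalone sketch of that computation, not the library's exact code:

import numpy as np

def probability_of_improvement(scores_x: np.ndarray, scores_y: np.ndarray) -> float:
    # Overall P(X > Y), averaged over tasks, per the rliable definition.
    num_tasks = scores_x.shape[1]
    task_probs = []
    for task in range(num_tasks):
        x, y = scores_x[:, task], scores_y[:, task]
        # Compare every run of X against every run of Y; ties contribute 0.5.
        greater = (x[:, None] > y[None, :]).mean()
        ties = (x[:, None] == y[None, :]).mean()
        task_probs.append(greater + 0.5 * ties)
    return float(np.mean(task_probs))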
.get_interval_estimates
Compute interval estimates for the above performance evaluators.
Args
- scores_x (np.ndarray) : A matrix of size (num_runs_x x num_tasks) where scores_x[n][m] represents the score on run n of task m for algorithm X.
- scores_y (np.ndarray) : A matrix of size (num_runs_y x num_tasks) where scores_y[n][m] represents the score on run n of task m for algorithm Y.
- metric (Callable) : One of the above performance evaluators used for estimation.
Returns
Confidence intervals.
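A usage sketch continuing the construction example above; passing compute_poi as the metric callable is an assumption based on the description of the metric argument:

# Bootstrap confidence interval for the probability of improvement of X over Y.
ci = comparison.get_interval_estimates(
    scores_x, scores_y, metric=comparison.compute_poi
)
print("95% CI for P(X > Y):", ci)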