actableai.classification package

Submodules

actableai.classification.config module

actableai.classification.cross_validation module

class actableai.classification.cross_validation.AverageEnsembleClassifier(predictors)

Bases: object

predict(X) → pandas.core.series.Series

Predicts the class for each sample in X.

Parameters
X – DataFrame with features.

Returns
Predicted class for each sample.
Return type
pd.Series
predict_proba(X, *args, **kwargs)

Predict probabilities for each predictor for each class for each sample.

Parameters
X – DataFrame with features.
Returns
List of probabilities for each predictor for each class
Return type
List[np.ndarray]
unpersist_models()

Unpersists all models in the ensemble.
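
As an illustration of the averaging idea described above, here is a minimal sketch. It is not the actableai implementation: the class name SketchAverageEnsemble, the class_labels argument and the ConstantPredictor toy are assumptions made for this example. It only assumes that each predictor exposes predict_proba(X) returning one row of class probabilities per sample.

```python
from typing import List

import numpy as np
import pandas as pd


class SketchAverageEnsemble:
    """Hypothetical stand-in, not the actableai class."""

    def __init__(self, predictors, class_labels):
        # class_labels is an assumption of this sketch; the documented
        # constructor only takes `predictors`.
        self.predictors = predictors
        self.class_labels = list(class_labels)

    def predict_proba(self, X: pd.DataFrame) -> List[np.ndarray]:
        # One (n_samples, n_classes) array per predictor.
        return [np.asarray(p.predict_proba(X)) for p in self.predictors]

    def predict(self, X: pd.DataFrame) -> pd.Series:
        # Average over predictors, then pick the most probable class.
        mean_proba = np.mean(self.predict_proba(X), axis=0)
        winners = mean_proba.argmax(axis=1)
        return pd.Series([self.class_labels[i] for i in winners], index=X.index)


class ConstantPredictor:
    """Toy predictor returning the same probabilities for every row."""

    def __init__(self, proba):
        self.proba = np.asarray(proba, dtype=float)

    def predict_proba(self, X):
        return np.tile(self.proba, (len(X), 1))


X = pd.DataFrame({"feature": [1.0, 2.0, 3.0]})
ensemble = SketchAverageEnsemble(
    [
        ConstantPredictor([0.9, 0.1]),
        ConstantPredictor([0.2, 0.8]),
        ConstantPredictor([0.3, 0.7]),
    ],
    class_labels=["no", "yes"],
)
predictions = ensemble.predict(X)
```

In this toy setup the first predictor favours "no" while the other two favour "yes", so the averaged probabilities select "yes" for every row.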

actableai.classification.cross_validation.run_cross_validation(classification_train_task: actableai.tasks.classification._AAIClassificationTrainTask, problem_type: str, explain_samples: bool, positive_label: Optional[str], presets: str, hyperparameters: dict, model_directory: str, target: str, features: list, run_model: bool, df_train: pandas.core.frame.DataFrame, df_test: pandas.core.frame.DataFrame, kfolds: int, cross_validation_max_concurrency: int, drop_duplicates: bool, run_debiasing: bool, biased_groups: list, debiased_features: list, residuals_hyperparameters: Optional[dict], num_gpus: int, eval_metric: str, time_limit: Optional[int], drop_unique: bool, drop_useless_features: bool, feature_prune: bool, feature_prune_time_limit: Optional[float], tabpfn_model_directory: Optional[str], num_trials: int, infer_limit: float, infer_limit_batch_size: int) → Tuple[actableai.classification.cross_validation.AverageEnsembleClassifier, list, dict, List[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Runs a cross validation for a classification task.

Parameters
  • classification_train_task – The classification task to run cross validation on.
  • problem_type (str) – The problem type. Can be either ‘binary’ or ‘multiclass’.
  • explain_samples (bool) – Whether to compute explanations for the samples.
  • positive_label (Optional[str]) – The positive label. Only used if problem_type is ‘binary’.
  • presets (str) – The presets to use for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html#autogluon.tabular.TabularPredictor.fit for more information.
  • hyperparameters – The hyperparameters to use for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html#autogluon.tabular.TabularPredictor.fit for more information.
  • model_directory – The directory to store the models.
  • target – The target column.
  • features – The features columns used for training/prediction.
  • run_model – If True, classification models run predictions on unseen values.
  • df_train – The input dataframe.
  • df_test – Testing data.
  • kfolds – The number of folds to use for cross validation.
  • cross_validation_max_concurrency – The maximum number of concurrent processes to use for cross validation.
  • drop_duplicates – Whether to drop duplicates.
  • run_debiasing – Whether to run debiasing.
  • biased_groups – The groups introducing bias.
  • debiased_features – The features to debias.
  • residuals_hyperparameters – The hyperparameters to use for the debiasing model.
  • num_gpus (int) – The number of GPUs to use.
  • eval_metric – Metric to be optimized for.
  • feature_prune – Whether to prune features.
  • feature_prune_time_limit – The time limit for feature pruning, in seconds.
  • tabpfn_model_directory – The TabPFN model directory.
  • num_trials – The number of trials for hyperparameter optimization.
  • infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or 20 rows / second throughput.
  • infer_limit_batch_size – The number of rows passed at once to be predicted when calculating per-row speed. This matters because infer_limit_batch_size=1 (online inference) is highly suboptimal: various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.
Returns

Result of the cross validation.
  • AverageEnsembleClassifier: The average ensemble classifier.
  • list: The feature importances.
  • dict: The evaluation metrics.
  • list: Probabilities of the predicted classes.
  • pd.DataFrame: The training dataframe.
  • pd.DataFrame: The test dataframe.

Return type

Tuple
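
To make the role of kfolds concrete, the following sketch shows one generic way to split a training dataframe into folds. It is an illustration only and does not reproduce how run_cross_validation partitions data internally; the helper kfold_indices and its seed argument are hypothetical.

```python
import numpy as np
import pandas as pd


def kfold_indices(n_rows: int, kfolds: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; each row is used once for validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    for val_idx in np.array_split(shuffled, kfolds):
        # Everything outside the validation fold is used for training.
        train_idx = np.setdiff1d(shuffled, val_idx)
        yield train_idx, val_idx


df_train = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
folds = list(kfold_indices(len(df_train), kfolds=5))

for train_idx, val_idx in folds:
    fold_train = df_train.iloc[train_idx]
    fold_val = df_train.iloc[val_idx]
    # A model would be fitted on fold_train and evaluated on fold_val here.
```

Each fold's model is trained on the rows outside its validation fold, which is what allows the per-fold predictors to be combined afterwards.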

actableai.classification.model module

class actableai.classification.model.ClassificationInference(model: actableai.models.inference.ModelType)

Bases: actableai.models.autogluon.base.AAIAutogluonTabularInference[actableai.classification.model.ClassificationModel, actableai.classification.model.ClassificationMetadata]

class actableai.classification.model.ClassificationMetadata(*, features: List[str], feature_parameters: Dict[str, Any], problem_type: Literal['binary', 'multiclass'], prediction_target: str, is_explainer_available: bool, intervened_column: Optional[str] = None, discrete_treatment: Optional[str] = None, class_labels: List[str])

Bases: actableai.models.autogluon.base.AAIAutogluonTabularMetadata

class_labels: List[str]
problem_type: Literal['binary', 'multiclass']
class actableai.classification.model.ClassificationModel(autogluon_model, df_training: pandas.core.frame.DataFrame, explanation_model=None, intervention_model=None)

Bases: actableai.models.autogluon.base.AAIAutogluonTabularModel

predict_from_proba(df_proba: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
set_positive_label_index(positive_label_index: int) → None
set_probability_threshold(probability_threshold: float) → None
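
The three methods above are undocumented here, so the following is only a guess at how a positive label index and a probability threshold are typically combined in binary classification. The function predict_from_proba_sketch is hypothetical, handles the two-class case only, and returns a Series for brevity, whereas the documented method returns a DataFrame.

```python
import pandas as pd


def predict_from_proba_sketch(df_proba, positive_label_index=1, probability_threshold=0.5):
    """Hypothetical helper: a row is positive when its probability reaches the threshold."""
    positive = df_proba.columns[positive_label_index]
    negative = df_proba.columns[1 - positive_label_index]
    is_positive = df_proba[positive] >= probability_threshold
    return is_positive.map({True: positive, False: negative})


df_proba = pd.DataFrame({"no": [0.8, 0.4, 0.3], "yes": [0.2, 0.6, 0.7]})
default_labels = predict_from_proba_sketch(df_proba)
strict_labels = predict_from_proba_sketch(df_proba, probability_threshold=0.65)
```

Raising the threshold from 0.5 to 0.65 flips the second row from "yes" to "no", which is the usual trade of recall for precision on the positive class.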

actableai.classification.roc_curve_cross_validation module

actableai.classification.roc_curve_cross_validation.cross_validation_curve(cross_val_auc_curves: Dict, x: str = 'False Positive Rate', y: str = 'True Positive Rate', negative_label: bool = True) → Dict

Computes the combined curves for ROC and Precision-Recall curves when using cross-validation.

Parameters
cross_val_auc_curves – A dictionary containing the ROC curves for each classifier.
Returns
A dictionary containing the combined ROC curves for each classifier.
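
One common way to combine per-fold ROC curves is to interpolate each curve onto a shared grid of x values and average the y values. The sketch below shows that technique under stated assumptions: the function mean_curve_sketch, the n_points argument and the list-of-dicts input layout are hypothetical, and this is not confirmed to be what cross_validation_curve does. It requires the x values of each curve to be sorted in ascending order.

```python
import numpy as np


def mean_curve_sketch(fold_curves, x="False Positive Rate", y="True Positive Rate", n_points=101):
    """Average several curves after interpolating them onto a common x grid."""
    grid = np.linspace(0.0, 1.0, n_points)
    interpolated = [np.interp(grid, curve[x], curve[y]) for curve in fold_curves]
    return {x: grid, y: np.mean(interpolated, axis=0)}


fold_curves = [
    {"False Positive Rate": [0.0, 0.5, 1.0], "True Positive Rate": [0.0, 0.8, 1.0]},
    {"False Positive Rate": [0.0, 0.5, 1.0], "True Positive Rate": [0.0, 0.6, 1.0]},
]
combined = mean_curve_sketch(fold_curves, n_points=3)
```

With these two toy folds, the combined true positive rate at a false positive rate of 0.5 is the mean of 0.8 and 0.6.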

actableai.classification.utils module

actableai.classification.utils.leaderboard_cross_val(cross_val_leaderboard: List[pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame

Creates a leaderboard from a list of cross validation results.

Parameters
cross_val_leaderboard – List of cross validation results.
Returns
Leaderboard.
Return type
pd.DataFrame
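
A plausible reading of "creates a leaderboard from a list of cross validation results" is to stack the per-fold leaderboards and aggregate per model. The sketch below assumes each fold's dataframe has "model" and "score_val" columns and averages the score; the function leaderboard_sketch and the choice of the mean are assumptions, not the documented behaviour.

```python
import pandas as pd


def leaderboard_sketch(cross_val_leaderboard):
    """Average each model's validation score over folds, best model first."""
    stacked = pd.concat(cross_val_leaderboard, ignore_index=True)
    return (
        stacked.groupby("model", as_index=False)["score_val"]
        .mean()
        .sort_values("score_val", ascending=False)
        .reset_index(drop=True)
    )


fold_1 = pd.DataFrame({"model": ["A", "B"], "score_val": [0.80, 0.90]})
fold_2 = pd.DataFrame({"model": ["A", "B"], "score_val": [0.70, 0.95]})
leaderboard = leaderboard_sketch([fold_1, fold_2])
```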
actableai.classification.utils.split_validation_by_datetime(df_train: pandas.core.frame.DataFrame, datetime_column: str, validation_ratio: float = 0.2) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
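
This function has no description here. Judging only from its name and signature, it likely holds out the most recent rows for validation; the sketch below shows that idea and should be read as an assumption. The function split_by_datetime_sketch and its rounding rule are hypothetical.

```python
import pandas as pd


def split_by_datetime_sketch(df_train, datetime_column, validation_ratio=0.2):
    """Keep the oldest rows for training and the newest rows for validation."""
    ordered = df_train.sort_values(datetime_column)
    n_validation = int(round(len(ordered) * validation_ratio))
    split_at = len(ordered) - n_validation
    return ordered.iloc[:split_at], ordered.iloc[split_at:]


df = pd.DataFrame(
    {
        # Dates are deliberately stored newest-first to show that sorting matters.
        "when": pd.date_range("2023-01-01", periods=10, freq="D")[::-1],
        "value": range(10),
    }
)
train_part, validation_part = split_by_datetime_sketch(df, "when")
```

Splitting on time rather than at random avoids validating on rows that are older than some of the training rows.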

Module contents