actableai.classification package¶

Subpackages¶

Submodules¶

actableai.classification.config module¶

actableai.classification.cross_validation module¶

class actableai.classification.cross_validation.AverageEnsembleClassifier(predictors)¶

Bases: object

predict(X) → pandas.core.series.Series¶

Predicts the class for each sample in X.

Parameters: X – DataFrame with features.

Returns: Predicted class for each sample.
Return type: pd.Series

predict_proba(X, *args, **kwargs)¶

Predicts, for each predictor in the ensemble, the probability of each class for each sample.

Parameters: X – DataFrame with features.
Returns: List with one array of class probabilities per predictor.
Return type: List[np.ndarray]

unpersist_models()¶

Unpersists all models in the ensemble.
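To illustrate how the per-predictor output of predict_proba could be turned into one prediction per sample, the sketch below averages the probability arrays and picks the most likely class. It is an assumption about what an average ensemble does, not the library's implementation; `average_probabilities`, `predict_from_average`, and the sample labels are hypothetical.

```python
from typing import List

import numpy as np
import pandas as pd


def average_probabilities(probas: List[np.ndarray]) -> np.ndarray:
    """Average per-predictor arrays of shape (n_samples, n_classes)."""
    return np.mean(np.stack(probas, axis=0), axis=0)


def predict_from_average(probas: List[np.ndarray], class_labels: List[str]) -> pd.Series:
    """Pick, for each sample, the class with the highest averaged probability."""
    mean_proba = average_probabilities(probas)
    winners = mean_proba.argmax(axis=1)
    return pd.Series([class_labels[i] for i in winners])


# Hypothetical output of two fold predictors for two samples and two classes.
fold_probas = [
    np.array([[0.9, 0.1], [0.4, 0.6]]),
    np.array([[0.7, 0.3], [0.2, 0.8]]),
]
predictions = predict_from_average(fold_probas, ["no", "yes"])
```

Averaging probabilities rather than voting on hard labels keeps the confidence of each predictor in the final decision.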

actableai.classification.cross_validation.run_cross_validation(classification_train_task: actableai.tasks.classification._AAIClassificationTrainTask, problem_type: str, explain_samples: bool, positive_label: Optional[str], presets: str, hyperparameters: dict, model_directory: str, target: str, features: list, run_model: bool, df_train: pandas.core.frame.DataFrame, df_test: pandas.core.frame.DataFrame, kfolds: int, cross_validation_max_concurrency: int, drop_duplicates: bool, run_debiasing: bool, biased_groups: list, debiased_features: list, residuals_hyperparameters: Optional[dict], num_gpus: int, eval_metric: str, time_limit: Optional[int], drop_unique: bool, drop_useless_features: bool, feature_prune: bool, feature_prune_time_limit: Optional[float], tabpfn_model_directory: Optional[str], num_trials: int, infer_limit: float, infer_limit_batch_size: int) → Tuple[actableai.classification.cross_validation.AverageEnsembleClassifier, list, dict, List[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]¶

Runs a cross validation for a classification task.

Parameters

classification_train_task – The classification task to run cross validation on.
problem_type (str) – The problem type. Can be either ‘binary’ or ‘multiclass’.
explain_samples (bool) – Whether to compute explanations for the samples.
positive_label (Optional[str]) – The positive label. Only used if problem_type is ‘binary’.
presets (str) – The presets to use for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html#autogluon.tabular.TabularPredictor.fit for more information.
hyperparameters – The hyperparameters to use for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html#autogluon.tabular.TabularPredictor.fit for more information.
model_directory – The directory to store the models.
target – The target column.
features – The features columns used for training/prediction.
run_model – If True, classification models run predictions on unseen values.
df_train – The input dataframe.
df_test – Testing data.
kfolds – The number of folds to use for cross validation.
cross_validation_max_concurrency – The maximum number of concurrent processes to use for cross validation.
drop_duplicates – Whether to drop duplicates.
run_debiasing – Whether to run debiasing.
biased_groups – The groups introducing bias.
debiased_features – The features to debias.
residuals_hyperparameters – The hyperparameters to use for the debiasing model.
num_gpus (int) – The number of GPUs to use.
eval_metric – Metric to be optimized for.
feature_prune – Whether to prune features.
feature_prune_time_limit – The time limit for feature pruning, in seconds.
tabpfn_model_directory – TabPFN Model Directory.
num_trials – The number of trials for hyperparameter optimization.
infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or a throughput of 20 rows per second.
infer_limit_batch_size – The number of rows passed at once to be predicted when calculating per-row speed. This matters because infer_limit_batch_size=1 (online inference) is highly suboptimal: various operations have a fixed overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.
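The relationship between infer_limit and throughput described above is plain arithmetic, sketched here. The helper names are hypothetical, and the batch budget is simply the per-row limit multiplied by the batch size, not a value reported by the library.

```python
def rows_per_second(infer_limit: float) -> float:
    """Throughput implied by a per-row time limit given in seconds."""
    return 1.0 / infer_limit


def batch_time_budget(infer_limit: float, infer_limit_batch_size: int) -> float:
    """Total seconds allowed for one batch under the per-row limit."""
    return infer_limit * infer_limit_batch_size


throughput = rows_per_second(0.05)            # about 20 rows per second
bulk_budget = batch_time_budget(0.05, 10000)  # about 500 seconds per batch
```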

Returns

Result of the cross validation.

AverageEnsembleClassifier: The average ensemble classifier.
list: The feature importances.
dict: The evaluation metrics.
list: Probabilities of the predicted classes.
pd.DataFrame: The training dataframe.
pd.DataFrame: The test dataframe.

Return type

Tuple
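For readers unfamiliar with k-fold cross validation, the sketch below shows how a training dataframe can be partitioned into kfolds train/validation pairs. It uses only NumPy and pandas and makes no claim about how run_cross_validation builds its folds; `kfold_indices` and the toy dataframe are hypothetical.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def kfold_indices(n_rows: int, kfolds: int, seed: int = 0) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Return (train_idx, val_idx) pairs; each row is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_rows), kfolds)
    splits = []
    for i in range(kfolds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(kfolds) if j != i])
        splits.append((train_idx, val_idx))
    return splits


df_train = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
splits = kfold_indices(len(df_train), kfolds=5)

fold_sizes = []
for train_idx, val_idx in splits:
    fold_train = df_train.iloc[train_idx]
    fold_val = df_train.iloc[val_idx]
    # A real run would fit one predictor on fold_train and score it on fold_val.
    fold_sizes.append((len(fold_train), len(fold_val)))
```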

actableai.classification.model module¶

class actableai.classification.model.ClassificationInference(model: actableai.models.inference.ModelType)¶

Bases: actableai.models.autogluon.base.AAIAutogluonTabularInference[actableai.classification.model.ClassificationModel, actableai.classification.model.ClassificationMetadata]

class actableai.classification.model.ClassificationMetadata(*, features: List[str], feature_parameters: Dict[str, Any], problem_type: Literal['binary', 'multiclass'], prediction_target: str, is_explainer_available: bool, intervened_column: Optional[str] = None, discrete_treatment: Optional[str] = None, class_labels: List[str])¶

Bases: actableai.models.autogluon.base.AAIAutogluonTabularMetadata

class_labels: List[str]¶

problem_type: Literal['binary', 'multiclass']¶

class actableai.classification.model.ClassificationModel(autogluon_model, df_training: pandas.core.frame.DataFrame, explanation_model=None, intervention_model=None)¶

Bases: actableai.models.autogluon.base.AAIAutogluonTabularModel

predict_from_proba(df_proba: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame¶

set_positive_label_index(positive_label_index: int) → None¶

set_probability_threshold(probability_threshold: float) → None¶
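The three methods above are undocumented. One plausible reading, sketched below for the binary case, is that the positive class is predicted whenever its probability reaches the threshold. This is an assumption, not the library's confirmed behaviour; `labels_from_proba` is a hypothetical helper that returns a Series instead of a DataFrame.

```python
import numpy as np
import pandas as pd


def labels_from_proba(
    df_proba: pd.DataFrame,
    positive_label_index: int,
    probability_threshold: float,
) -> pd.Series:
    """Binary case only: one probability column per class label."""
    columns = list(df_proba.columns)
    positive = columns[positive_label_index]
    negative = columns[1 - positive_label_index]
    # Predict the positive class when its probability reaches the threshold.
    is_positive = df_proba[positive] >= probability_threshold
    return pd.Series(np.where(is_positive, positive, negative), index=df_proba.index)


df_proba = pd.DataFrame({"no": [0.8, 0.3, 0.55], "yes": [0.2, 0.7, 0.45]})
labels = labels_from_proba(df_proba, positive_label_index=1, probability_threshold=0.4)
```

With a threshold of 0.4 the third sample is labelled positive even though its positive probability is below 0.5, which is the usual reason for exposing a threshold at all.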

actableai.classification.roc_curve_cross_validation module¶

actableai.classification.roc_curve_cross_validation.cross_validation_curve(cross_val_auc_curves: Dict, x: str = 'False Positive Rate', y: str = 'True Positive Rate', negative_label: bool = True) → Dict¶

Computes the combined ROC and Precision-Recall curves when using cross-validation.

Parameters: cross_val_auc_curves – A dictionary containing the ROC curves for each classifier.
Returns: A dictionary containing the combined ROC curves for each classifier.
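A common way to combine per-fold ROC curves is to interpolate each curve onto a shared False Positive Rate grid and average the True Positive Rates. The sketch below shows that technique; it is not necessarily how cross_validation_curve combines curves, and `mean_roc_curve` and its list-of-tuples input are hypothetical (the structure of cross_val_auc_curves is not documented here).

```python
from typing import List, Tuple

import numpy as np


def mean_roc_curve(
    fold_curves: List[Tuple[np.ndarray, np.ndarray]],
    grid_size: int = 101,
) -> Tuple[np.ndarray, np.ndarray]:
    """Average (fpr, tpr) curves on a common FPR grid.

    Each fpr array must be sorted in non-decreasing order.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    interpolated = [np.interp(grid, fpr, tpr) for fpr, tpr in fold_curves]
    return grid, np.mean(interpolated, axis=0)


# Hypothetical ROC curves from two folds.
fold_curves = [
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0])),
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.6, 1.0])),
]
mean_fpr, mean_tpr = mean_roc_curve(fold_curves, grid_size=3)
```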

actableai.classification.utils module¶

actableai.classification.utils.leaderboard_cross_val(cross_val_leaderboard: List[pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame¶

Creates a leaderboard from a list of cross validation results.

Parameters: cross_val_leaderboard – List of cross validation results.
Returns: Leaderboard.
Return type: pd.DataFrame
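As an illustration of merging per-fold leaderboards, the sketch below stacks them and averages the numeric columns per model. The aggregation rule and the column names `model` and `score_val` are assumptions, and `combine_leaderboards` is a hypothetical stand-in, not the body of leaderboard_cross_val.

```python
from typing import List

import pandas as pd


def combine_leaderboards(leaderboards: List[pd.DataFrame]) -> pd.DataFrame:
    """Average numeric columns per model across folds, best score first."""
    stacked = pd.concat(leaderboards, ignore_index=True)
    return (
        stacked.groupby("model", as_index=False)
        .mean(numeric_only=True)
        .sort_values("score_val", ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical leaderboards from two folds.
fold_leaderboards = [
    pd.DataFrame({"model": ["A", "B"], "score_val": [0.8, 0.6]}),
    pd.DataFrame({"model": ["B", "A"], "score_val": [0.7, 0.9]}),
]
combined = combine_leaderboards(fold_leaderboards)
```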

actableai.classification.utils.split_validation_by_datetime(df_train: pandas.core.frame.DataFrame, datetime_column: str, validation_ratio: float = 0.2) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]¶
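This function has no description here. Judging by its name and signature, it presumably holds out the most recent rows as validation data; the sketch below shows that idea under this assumption. `split_by_datetime` is a hypothetical helper, and the rounding rule is a guess.

```python
from typing import Tuple

import pandas as pd


def split_by_datetime(
    df: pd.DataFrame,
    datetime_column: str,
    validation_ratio: float = 0.2,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Use the latest `validation_ratio` share of rows as validation data."""
    ordered = df.sort_values(datetime_column)
    n_val = int(round(len(ordered) * validation_ratio))
    split_at = len(ordered) - n_val
    return ordered.iloc[:split_at], ordered.iloc[split_at:]


# Ten daily rows in shuffled order.
df = pd.DataFrame(
    {"date": pd.date_range("2023-01-01", periods=10, freq="D"), "y": range(10)}
).sample(frac=1, random_state=0)
train_part, val_part = split_by_datetime(df, "date", validation_ratio=0.2)
```

Splitting by time instead of at random avoids validating on rows that precede the training data.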
