actableai.tasks package¶

Subpackages¶

actableai.tasks.tests package

Submodules¶

actableai.tasks.association_rules module¶

class actableai.tasks.association_rules.AAIAssociationRulesTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, group_by: List[str], items: str, frequent_method: str = 'fpgrowth', min_support: float = 0.5, association_metric: str = 'confidence', min_association_metric: float = 0.5, graph_top_k: int = 10) → Dict¶

Generate association rules from a dataframe.

Parameters

df – Input dataframe.
group_by – List of columns to group by. (e.g. order_id or customer_id)
items – Column name of items. (e.g. product_id or product_name)
frequent_method – Frequent itemset mining method to use. Available options are [“fpgrowth”, “fpmax”, “apriori”]
min_support – Minimum support threshold for itemsets generation.
association_metric – Association metric used for association rules generation. Available options are [“support”, “confidence”, “lift”, “leverage”, “conviction”]
min_association_metric – Minimum value for significance of association.
graph_top_k – Maximum number of nodes to display on association graph.
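
The metric names accepted by association_metric have standard definitions in terms of itemset support. The sketch below computes them for a single rule in plain Python. It illustrates those definitions only and is not the task's implementation; `rule_metrics` and the sample baskets are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Compute standard association metrics for the rule antecedent -> consequent."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(antecedent | consequent)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        # Conviction is unbounded when the rule never fails.
        "conviction": math.inf if confidence == 1 else (1 - s_c) / (1 - confidence),
    }


# Hypothetical baskets, one per group (for example one per order_id).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
metrics = rule_metrics(baskets, frozenset({"bread"}), frozenset({"milk"}))
```

Here the rule bread -> milk has a confidence of about 0.67, so it would be expected to clear a min_association_metric of 0.5 when association_metric is “confidence”.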

Examples

>>> import pandas as pd
>>> from actableai.tasks.association_rules import AAIAssociationRulesTask
>>> df = pd.read_csv("path/to/data.csv")
>>> result = AAIAssociationRulesTask().run(
...     df,
...     group_by=["order_id", "customer_id"],
...     items="product_id",
... )
>>> result["data"]["rules"]

Returns

Dictionary containing the results of the task.

”status”: “SUCCESS” if the task successfully ran else “FAILURE”
”data”: Dictionary containing the data of the task.
- ”rules”: List of association rules.
- ”frequent_itemset”: Frequent itemsets.
- ”df_list”: List of associated items for each group_by.
- ”graph”: Association graph.
- ”association_metric”: Association metric used for association
  rules generation.
- ”association_rules_chord”: Association rules chord diagram.
”validations”: List of validations on the data,
non-empty if the data presents a problem for the task
”runtime”: Time taken to run the task.

Return type

Dict
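
Because every task returns a dictionary with “status”, “data” and “validations” keys, callers may want to check the status before reading the data. A minimal sketch, assuming only the documented keys; `unwrap_task_result` and the sample dictionary are hypothetical and not part of the package.

```python
from typing import Any, Dict


def unwrap_task_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return result["data"], or raise if the task reported a failure."""
    if result.get("status") != "SUCCESS":
        # "validations" is documented as non-empty when the data is problematic.
        raise RuntimeError(f"Task failed; validations: {result.get('validations', [])}")
    return result["data"]


# Hypothetical result shaped like the documented return value.
example = {"status": "SUCCESS", "data": {"rules": []}, "validations": [], "runtime": 0.1}
data = unwrap_task_result(example)
```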

actableai.tasks.autogluon module¶

class actableai.tasks.autogluon.AAIAutogluonTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITunableTask, abc.ABC

static get_available_models(problem_type: str, explain_samples: bool, gpu: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False) → List[actableai.models.autogluon.params.base.Model]¶

Get list of available models for the given problem type.

Parameters

problem_type – The type of the problem (‘regression’ or ‘quantile’)
explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
gpu – Whether a GPU is available. If False, ‘CPU’ is used, otherwise ‘GPU’ is used. Used to filter out models that can only run on a GPU when no GPU is available.
ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used
tabpfn_enabled – Boolean indicating if TabPFN model should be used
causal_inference – Whether causal inference is used

Returns

List of available models
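
The GPU filtering described for the gpu parameter can be pictured as follows. This is a sketch under stated assumptions: `ModelInfo`, its `gpu_only` flag and the model names are hypothetical stand-ins, not the package's Model type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical stand-in for a model entry; not the library's Model type.
    name: str
    gpu_only: bool = False


def filter_models(candidates: List[ModelInfo], gpu: bool) -> List[ModelInfo]:
    """Keep every model when a GPU is available, else drop GPU-only ones."""
    return [m for m in candidates if gpu or not m.gpu_only]


candidates = [ModelInfo("tree_model"), ModelInfo("large_nn", gpu_only=True)]
cpu_models = filter_models(candidates, gpu=False)
gpu_models = filter_models(candidates, gpu=True)
```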

classmethod get_base_hyperparameters_space(num_class: int, dataset_len: int, problem_type: str, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False) → actableai.parameters.models.OptionsSpace[Parameters]¶

Return the hyperparameters space of the task.

Parameters

df – DataFrame containing the features
num_class – The number of classes in the target column (‘-1’ can be used for regression which does not use classes)
dataset_len – The length of the dataset
problem_type – The type of the problem (‘regression’/’quantile’/’multiclass’/’binary’)
device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
tabpfn_enabled – Boolean indicating if TabPFN model should be used.
causal_inference – Boolean indicating if causal inference is being performed.

Returns

default: Default models and settings.
options: Display name and hyperparameters of the available models.

actableai.tasks.base module¶

class actableai.tasks.base.AAITask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: abc.ABC

Base abstract class to represent an Actable AI Task

classmethod get_parameters() → actableai.parameters.parameters.Parameters¶

abstract run(*args, **kwargs)¶: Abstract method called to run the task

static run_with_ray_remote(task: actableai.tasks.TaskType) → Callable¶

Method to run a specific task with ray remote (used as a decorator)

Parameters: task – The task type that will be run
Returns: The decorator
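
A method that takes an argument and returns a decorator follows the decorator-factory pattern. The sketch below shows that pattern in plain Python without Ray; `run_with_label` is a hypothetical name, and the wrapper calls the function locally where a real implementation would presumably dispatch it remotely.

```python
import functools
from typing import Any, Callable


def run_with_label(task: str) -> Callable:
    """Decorator factory: takes a task label and returns the actual decorator."""

    def decorator(function: Callable) -> Callable:
        @functools.wraps(function)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Assumption: a real implementation would dispatch remotely here.
            # This sketch calls the function locally and tags the result.
            result = function(*args, **kwargs)
            return {"task": task, "result": result}

        return wrapper

    return decorator


@run_with_label("clustering")
def run(x: int) -> int:
    return x * 2


tagged = run(3)
```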

class actableai.tasks.base.AAITunableTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask, abc.ABC

Base abstract class to represent a Tunable Actable AI Task.

abstract static get_hyperparameters_space(*args, **kwargs) → actableai.parameters.models.OptionsSpace[Parameters]¶

Return the hyperparameters space of the task.

Returns: Hyperparameters space represented as a ModelSpace.
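
An abstract static method forces each concrete subclass to supply its own hyperparameters space. A minimal sketch of that contract using the standard abc module; `TunableTask`, `ToyTask` and the dictionary layout are hypothetical, and a plain dict stands in for the OptionsSpace object.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class TunableTask(ABC):
    # Hypothetical stand-in for a tunable task base class.
    @staticmethod
    @abstractmethod
    def get_hyperparameters_space(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        """Return the hyperparameters space of the task."""


class ToyTask(TunableTask):
    @staticmethod
    def get_hyperparameters_space(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        # A plain dict stands in for the library's OptionsSpace object.
        return {"default": ["model_a"], "options": {"model_a": {"depth": [2, 4]}}}


space = ToyTask.get_hyperparameters_space()
```

Instantiating `TunableTask` directly raises TypeError, since its abstract method has no implementation.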

actableai.tasks.bayesian_regression module¶

class actableai.tasks.bayesian_regression.AAIBayesianRegressionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, features: List[str], target: str, priors: Optional[Dict] = None, prediction_quantile_low: int = 5, prediction_quantile_high: int = 95, trials: int = 1, polynomial_degree: int = 1, validation_split: int = 20, pdf_steps: int = 100, predict_steps: int = 100, normalize: bool = False) → Dict¶

A task to run a Bayesian Regression on features with respect to target

Parameters

df – DataFrame containing the features and target. Additional columns are ignored.
features – List of features/columns to use for the Bayesian Regression
target – Target column for the Bayesian Regression
priors – Prior probability distribution of features.
prediction_quantile_low – Quantile for lowest point on prediction.
prediction_quantile_high – Quantile for highest point on prediction.
trials – Number of trials for tuning, best model is used for prediction.
polynomial_degree – Maximum degree used to generate polynomial features and cross-interaction features. Higher values mean better results but use more memory.
validation_split – Percentage of the data used for validation.
pdf_steps – Number of steps for probability density function.
predict_steps – Number of predicted values.
normalize – If the generated features should be normalized, useful for big polynomial degrees.
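
To see why a higher polynomial_degree uses more memory, the sketch below expands one row into all monomials up to a given degree. It is an illustration of polynomial feature generation in general, not the task's own code; `polynomial_features` and the `*`-joined naming scheme are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import Dict, List


def polynomial_features(row: Dict[str, float], degree: int) -> Dict[str, float]:
    """Expand one row into all monomials of its features up to `degree`."""
    names: List[str] = sorted(row)
    expanded: Dict[str, float] = {}
    for d in range(1, degree + 1):
        # Each combination with replacement is one monomial, e.g. ("a", "b") -> a*b.
        for combo in combinations_with_replacement(names, d):
            expanded["*".join(combo)] = prod(row[name] for name in combo)
    return expanded


features = polynomial_features({"a": 2.0, "b": 3.0}, degree=2)
```

Two input features already yield five columns at degree 2, and the count grows combinatorially with the degree, which is also where normalize becomes useful.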

Examples

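>>> import pandas as pd
>>> from actableai.tasks.bayesian_regression import AAIBayesianRegressionTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIBayesianRegressionTask().run(
...     df,
...     ["feature1", "feature2", "feature3"],
...     "target"
... )
>>> result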

Raises

ValueError – Categorical features are not raised to any exponent. Raised if a prior refers to an exponentiated value of a categorical feature.

Returns

Dictionary containing the results of the task.

”status”: “SUCCESS” if the task successfully ran else “FAILURE”
”messenger”: Message returned with the task
”data”: Dictionary containing the data of the task
- ”rules”: List of association rules
- ”frequent_itemset”: Frequent itemsets
- ”df_list”: List of associated items for each group_by
- ”graph”: Association graph in dot format
- ”association_metric”: Association metric used for association
  rules generation
- ”association_rules_chord”: Association rules chord diagram
- ”coeffs”: Coefficients of the Regression model,
- ”intercept”: Intercept of the Regression model,
- ”sigma”: Sigmas of the Regression model,
- ”best_config”: Best usable model,
- ”evaluation”: r2 and MSE metrics of the trained model
”validations”: List of validations on the data,
non-empty if the data presents a problem for the task
”runtime”: Time taken to run the task

Return type

Dict

actableai.tasks.causal_discovery module¶

class actableai.tasks.causal_discovery.AAICausalDiscoveryTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

run(algo: str, payload: actableai.causal.discover.algorithms.payloads.CausalDiscoveryPayload, progress_callback: Optional[Callable] = None) → Dict¶

Run a causal discovery algorithm.

Parameters

algo (str) – The name of the algorithm to run. Must be one of “deci”, “notears”, “direct-lingam” or “pc”.
payload (CausalDiscoveryPayload) – The payload to use for the algorithm. Use actableai.causal.discovery.algorithms.deci.DeciPayload for “deci”, actableai.causal.discovery.algorithms.notears.NotearsPayload for “notears”, actableai.causal.discovery.algorithms.direct_lingam.DirectLiNGAMPayload for “direct-lingam” and actableai.causal.discovery.algorithms.pc.PCPayload for “pc”.
progress_callback (Union[Callable, None], optional) – A callback to use for progress reporting. Defaults to None.

Returns

The causal graph produced by the algorithm.

Return type

CausalGraph

Raises

ValueError – If the algorithm is not supported.
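
The pairing between algo names and payload types can be checked before calling run. A sketch under stated assumptions: the classes below are empty stand-ins that only reuse the documented payload names, and `check_algo` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, Type


@dataclass
class Payload:
    # Hypothetical stand-in for CausalDiscoveryPayload.
    data: dict = field(default_factory=dict)


class DeciPayload(Payload):
    pass


class NotearsPayload(Payload):
    pass


class DirectLiNGAMPayload(Payload):
    pass


class PCPayload(Payload):
    pass


PAYLOAD_TYPES: Dict[str, Type[Payload]] = {
    "deci": DeciPayload,
    "notears": NotearsPayload,
    "direct-lingam": DirectLiNGAMPayload,
    "pc": PCPayload,
}


def check_algo(algo: str, payload: Payload) -> None:
    """Raise if the algorithm is unknown or the payload type does not match it."""
    if algo not in PAYLOAD_TYPES:
        raise ValueError(f"Unsupported algorithm: {algo!r}")
    if not isinstance(payload, PAYLOAD_TYPES[algo]):
        raise TypeError(f"{algo!r} expects a {PAYLOAD_TYPES[algo].__name__}")


check_algo("pc", PCPayload())
```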

actableai.tasks.causal_inference module¶

class actableai.tasks.causal_inference.AAICausalInferenceTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

static get_estimator_parameters(Y: numpy.ndarray, T: numpy.ndarray, has_categorical_treatment: bool, has_binary_outcome: bool, is_single_binary_treatment: bool, common_causes_and_effect_modifiers: List) → pydantic.generics.OptionsParameter[Parameters]¶

TODO: Finalise Documentation

Parameters

Y – (n × d_y) matrix or vector of length n. Outcomes for each sample
T – (n × d_t) matrix or vector of length n. Treatments for each sample
has_categorical_treatment – Whether the treatment is categorical.
has_binary_outcome – Whether the outcome is binary.
is_single_binary_treatment – Whether there is only one treatment, and it is binary.
common_causes_and_effect_modifiers – List of common causes and effect modifiers.

Returns

The parameter options space of the estimator

classmethod get_hyperparameters(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, device: str, dataset_len: Union[int, None, str] = 'auto', num_class: Union[int, None, str] = 'auto', has_categorical_treatment: Union[bool, None, str] = 'auto', is_single_binary_outcome: Union[bool, None, str] = 'auto', effect_modifiers: Optional[List] = [], common_causes: Optional[List] = [], log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, positive_outcome_value=None) → actableai.parameters.parameters.Parameters¶

Get hyperparameters of the outcomes model and treatments model (for Double ML estimators)

Parameters

pd_table – Dataset for the causal analysis
treatments – treatment variable(s)
outcomes – outcome variables
effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
common_causes – list of common causes (W). Defaults to [].
device – The device to use (‘cpu’ or ‘gpu’)
dataset_len – The length of the dataset (Optional)
num_class – The number of classes for the outcome (Optional)
has_categorical_treatment – Whether the treatment is categorical (Optional)
is_single_binary_outcome – Whether the outcome is binary (Optional)
log_treatment – flag to indicate whether log transform is to be applied to treatment
log_outcome – flag to indicate whether log transform is to be applied to outcome
positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.

Returns

Parameters of the outcomes model and treatments model
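
The conversion described for positive_outcome_value amounts to a simple binarization of the outcome column. A minimal sketch on a plain list; `binarize_outcome` is a hypothetical helper, not a function of the package.

```python
from typing import Any, List, Optional


def binarize_outcome(values: List[Any], positive_outcome_value: Optional[Any]) -> List[Any]:
    """Map the outcome to 1 where it equals positive_outcome_value, else 0."""
    if positive_outcome_value is None:
        # No conversion requested: leave the outcome untouched.
        return list(values)
    return [1 if v == positive_outcome_value else 0 for v in values]


outcomes = binarize_outcome(["yes", "no", "yes", "maybe"], positive_outcome_value="yes")
```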

static get_hyperparameters_t(has_categorical_treatment: bool, pd_table: pandas.core.frame.DataFrame, device: str, label_t: str, dataset_len: Union[int, None, str] = 'auto', num_class: Union[int, None, str] = 'auto')¶

Get hyperparameters for treatments model

Parameters

has_categorical_treatment – Whether the treatment is categorical.
pd_table – Pandas DataFrame containing the data
device – Device to use (‘cpu’ or ‘gpu’)
label_t – Label of the target column in pd_table
dataset_len – The length of the dataset (Optional)
num_class – The number of classes for the outcome (Optional)

Returns

hyperparameters_space_t: The hyperparameters space.
task_t: The regression/classification task.

static get_hyperparameters_y(is_single_binary_outcome: bool, pd_table: pandas.core.frame.DataFrame, device: str, dataset_len: Union[int, None, str] = 'auto')¶

Get hyperparameters for outcomes model

Parameters

is_single_binary_outcome – Whether the outcome is binary.
pd_table – Pandas DataFrame containing the data
device – Device to use (‘cpu’ or ‘gpu’)
dataset_len – The length of the dataset (Optional)

Returns

hyperparameters_space_y: The hyperparameters space.
task_y: The regression/classification task.

classmethod get_parameters(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, Y: Union[numpy.ndarray, None, str] = 'auto', T: Union[numpy.ndarray, None, str] = 'auto', has_categorical_treatment: Union[bool, None, str] = 'auto', is_single_binary_outcome: Union[bool, None, str] = 'auto', is_single_binary_treatment: Union[bool, None, str] = 'auto', effect_modifiers: Optional[List] = [], common_causes: Optional[List] = [], log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, positive_outcome_value=None) → actableai.parameters.parameters.Parameters¶

Get parameters of the estimator

Parameters

pd_table – Dataset for the causal analysis
treatments – treatment variable(s)
outcomes – outcome variables
Y – (n × d_y) matrix or vector of length n. Outcomes for each sample (Optional)
T – (n × d_t) matrix or vector of length n. Treatments for each sample (Optional)
has_categorical_treatment – Whether the treatment is categorical (Optional)
is_single_binary_outcome – Whether the outcome is binary (Optional)
is_single_binary_treatment – Whether there is only one treatment, and it is binary (Optional)
effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
common_causes – list of common causes (W). Defaults to [].
log_treatment – flag to indicate whether log transform is to be applied to treatment
log_outcome – flag to indicate whether log transform is to be applied to outcome
positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.

Returns

Parameters of the estimator

run(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, effect_modifiers: Optional[List] = None, common_causes: Optional[List] = None, instrumental_variables: Optional[List] = None, controls: Optional[dict] = None, positive_outcome_value=None, target_units: Optional[str] = 'ate', alpha: Optional[float] = 0.05, tree_max_depth: Optional[int] = 3, log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, model_directory: Optional[Union[str, pathlib.Path]] = None, ag_presets: str = 'medium_quality_faster_train', model_params: Optional[List] = None, rscorer: Optional[List] = None, feature_importance: bool = False, seed: int = 123, num_gpus: Union[int, str] = 0, drop_unique: bool = True, drop_useless_features: bool = False, parameters_estimator: Optional[dict] = {}, num_trials: int = 1)¶

Causal analysis task

Parameters

pd_table – Dataset for the causal analysis
treatments – treatment variable(s)
outcomes – outcome variables
effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
common_causes – list of common causes (W). Defaults to [].
instrumental_variables – list of instrumental variables (Z). Defaults to [].
controls – dictionary of control treatment values. Keys are categorical treatment names
positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.
target_units – Target units used for calculating the effect. Possible values are “ate”, “att”, “atc”. Defaults to “ate”
alpha – Significance level of effect confidence interval (from 0.01 to 0.99). Defaults to 0.05
tree_max_depth – Maximum depth of CATE function’s tree interpreter. Defaults to 3.
log_treatment – flag to indicate whether log transform is to be applied to treatment
log_outcome – flag to indicate whether log transform is to be applied to outcome
model_directory – Where the model should be stored, if None stored in the /tmp folder
ag_presets – Presets for Autogluon Models
model_params – List of model Parameters for the AAICausalEstimator
rscorer – tune rscorer object
feature_importance – Whether feature importances are computed
seed – Random numpy seed
num_gpus – Number of GPUs for the TabularPredictors. Can be set to an int or “auto”
drop_unique – Whether to drop columns with only unique values as preprocessing step
drop_useless_features – Whether to drop columns with only unique values at fit time
parameters_estimator –
Options for the estimator to be used. This needs to be defined in the key ‘estimator’. For DoubleML estimators, the following keys can also be specified:

’hyperparameters_t’: AutoGluon hyperparameters for the tabular
predictor model(s) trained on the treatments

’hyperparameters_y’: AutoGluon hyperparameters for the tabular
predictor model(s) trained on the outcomes
num_trials – The number of trials for hyperparameter optimization
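
To make the role of alpha concrete, the sketch below computes a difference in means with a normal-approximation (1 - alpha) confidence interval. It assumes a randomized binary treatment and is only an illustration of the significance level; the task relies on its own estimators, and `ate_confidence_interval` and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, variance
from typing import List, Tuple


def ate_confidence_interval(
    treated: List[float], control: List[float], alpha: float = 0.05
) -> Tuple[float, float, float]:
    """Difference in means with a normal-approximation (1 - alpha) interval."""
    effect = mean(treated) - mean(control)
    # Standard error of the difference between two independent sample means.
    std_err = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    # Two-sided critical value: alpha=0.05 gives roughly 1.96.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return effect, effect - z * std_err, effect + z * std_err


effect, low, high = ate_confidence_interval([5, 6, 7, 8], [3, 4, 5, 6], alpha=0.05)
```

A smaller alpha gives a wider interval, since it demands a higher confidence level.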

Examples

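>>> import pandas as pd
>>> from actableai.tasks.causal_inference import AAICausalInferenceTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAICausalInferenceTask().run(
...    df,
...    treatments=["feature1", "feature2"],
...    outcomes=["feature3", "feature4"],
...    effect_modifiers=["feature5", "feature6"]
... )
>>> result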

Returns

dictionary of estimation results

”status”: “SUCCESS” if the task successfully ran else “FAILURE”
”messenger”: Custom message from the task
”runtime”: Execution time from the task
”data”: Dictionary containing data from the task
- ””:
- ”effect”:
- ”controls”: Transformed controls
- ”causal_graph_dot”: Feature graph in dot format
- ”tree_interpreter_dot”: Tree interpreter in dot format
- ”refutation_results”: # TODO
- ”T_res”: Treatment residuals
- ”Y_res”: Outcome residuals
- ”X”: Common causes values
- ”model_t_scores”: Scores for the treatment model
- ”model_y_scores”: Scores for the outcome model
- ”model_t_feature_importances”: Feature importances for treatment model
- ”model_y_feature_importances”: Feature importances for outcome model
”validations”: List of validations on the data,
non-empty if the data presents a problem for the task

Return type

Dict

exception actableai.tasks.causal_inference.LogCategoricalOutcomeNotAllowed¶: Bases: ValueError

exception actableai.tasks.causal_inference.LogCategoricalTreatmentNotAllowed¶: Bases: ValueError

actableai.tasks.causal_inference.convert_categorical_to_numeric(df, columns)¶

actableai.tasks.classification module¶

class actableai.tasks.classification.AAIClassificationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.autogluon.AAIAutogluonTask

AAIClassificationTask class for classification

Parameters: AAIAutogluonTask – Base Class for every AutoGluon task

classmethod compute_problem_type(df: pandas.core.frame.DataFrame, target: str, num_class: Optional[int] = None) → str¶

Determine the problem type (‘multiclass’ or ‘binary’), using the values in the target column

Parameters

df – The dataset for which the problem type is inferred
target – Name of the target column in df
num_class – The number of classes. If None, it will be computed using the target column in df

Returns

String representation of the problem type (‘multiclass’ or ‘binary’)

Return type

str
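
The inference from target values to problem type can be sketched on a plain sequence. Two assumptions are made here and are not confirmed by this page: missing values (None) are not counted as a class, and fewer than two classes is rejected at this step. The function names mirror the documented methods, but the bodies are illustrative only.

```python
from typing import Any, Optional, Sequence


def get_num_class(target_values: Sequence[Any]) -> int:
    # Assumption: missing values (None) mark rows to predict, not a class.
    return len({v for v in target_values if v is not None})


def compute_problem_type(
    target_values: Sequence[Any], num_class: Optional[int] = None
) -> str:
    """Return 'binary' for two classes and 'multiclass' for more."""
    if num_class is None:
        num_class = get_num_class(target_values)
    if num_class < 2:
        # Assumption: rejected here; the page only documents this check for run().
        raise ValueError("The target needs at least 2 unique values.")
    return "binary" if num_class == 2 else "multiclass"


problem_type = compute_problem_type(["a", "b", None, "a"])
```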

classmethod get_hyperparameters_space(num_class: int, dataset_len: int, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False, name: str = 'classification_model_space', display_name: str = 'Classification Model Space', description: str = 'The space of available and default classification models and parameters.') → actableai.parameters.models.OptionsSpace[Parameters]¶

Return the hyperparameters space of the task.

Parameters

num_class – Number of classes in the target column.
device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
tabpfn_enabled – Boolean indicating if TabPFN model should be used.
causal_inference – Boolean indicating if causal inference is being performed.
name – Name of the output model space.
display_name – Display name of the output model space.
description – Description of the output model space.

Returns

Hyperparameters space represented as a ModelSpace.

static get_num_class(df: pandas.core.frame.DataFrame, target: str) → int¶

Determine the number of classes of the target.

Parameters

df – The dataset for which the problem type is inferred
target – Name of the target column in df

Returns

An integer representing the number of classes of the target column.

run(df: pandas.core.frame.DataFrame, target: str, features: Optional[List[str]] = None, biased_groups: Optional[List[str]] = None, debiased_features: Optional[List[str]] = None, validation_ratio: float = 0.2, positive_label: Optional[str] = None, explain_samples: bool = False, model_directory: Optional[str] = None, presets: str = 'medium_quality_faster_train', hyperparameters: Optional[Dict] = None, train_task_params: Optional[Dict] = None, kfolds: int = 1, cross_validation_max_concurrency: int = 1, residuals_hyperparameters: Optional[Dict] = None, drop_duplicates: bool = True, num_gpus: float = 0, eval_metric: str = 'accuracy', time_limit: Optional[int] = None, drop_unique: bool = True, drop_useless_features: bool = True, split_by_datetime: bool = False, datetime_column: Optional[str] = None, ag_automm_enabled=False, refit_full=False, feature_prune=True, feature_prune_time_limit: Optional[float] = None, intervention_run_params: Optional[Dict] = None, run_pdp: bool = True, run_ice: bool = True, pdp_ice_grid_resolution: Optional[int] = 100, pdp_ice_n_samples: Optional[int] = 100, tabpfn_model_directory: Optional[str] = None, num_trials: int = 1, infer_limit: float = 60, infer_limit_batch_size: int = 100) → Dict¶

Run this classification task and return results.

Parameters

df – Input DataFrame
target – Target column in df. If there are empty values in this column, predictions will be generated for these rows.
features – A list of features to be used for prediction. If None, all columns except target are used as features. Defaults to None.
biased_groups – A list of columns of groups that should be protected from biases (e.g. gender, race, age). Defaults to None.
debiased_features – A list of proxy features that need to be debiased for protection of sensitive groups. Defaults to None.
validation_ratio – The ratio to randomly split data for training and validation. Defaults to 0.2.
positive_label – If target contains only 2 different values, pick the positive label by setting positive_label to one of them. Defaults to None.
explain_samples – If true, explanations for predictions in test and validation will be generated. It takes significantly longer time to run. Defaults to False.
model_directory – Directory to output the model after training. Defaults to None.
presets – Autogluon’s presets for training model. More details at https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit. Defaults to “medium_quality_faster_train”.
hyperparameters – Autogluon’s hyperparameters. Defaults to None.
train_task_params – ?. Defaults to None.
kfolds – Number of folds for cross-validation. Defaults to 1.
cross_validation_max_concurrency – Maximum number of Ray actors used for cross validation (each actor executes one split). Defaults to 1.
residuals_hyperparameters – Autogluon’s hyperparameters used in the final model of counterfactual predictions. Defaults to None.
drop_duplicates – Whether duplicate values should be dropped before training. Defaults to True.
num_gpus – Number of gpus used for training. Defaults to 0.
eval_metric – Metric to be optimized for. Possible values include ‘accuracy’, ‘balanced_accuracy’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_weighted’, ‘roc_auc’, ‘roc_auc_ovo_macro’, ‘average_precision’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_weighted’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_weighted’, ‘log_loss’, ‘pac_score’. Defaults to “accuracy”.
time_limit – Time limit of training (in seconds)
drop_unique – Whether to drop columns with only unique values as preprocessing step.
drop_useless_features – Whether to drop columns with only unique values at fit time.
split_by_datetime – Whether the training/validation has to be split based on a datetime column.
datetime_column – If split_by_datetime, the column that will split training and validation, else, the parameter is ignored.
ag_automm_enabled – Whether to use the AutoGluon multimodal model on text columns. This feature makes text classification much more accurate by using text models. It is heavy on resources and requires a GPU.
refit_full – Whether, at the end of classification, a second task is launched to refit a new model on the whole dataset. This makes accuracy much better but divides the training time in half (half for the first task, the other half for refitting).
feature_prune – Whether feature pruning is enabled. This option improves results but extends the training time. If there is no time left for feature pruning after training, this step is skipped.
intervention_run_params – Parameters for running an intervention task. Check actableai/tasks/intervention.py for more details.
run_pdp – Run Partial Dependency to get Partial Dependency Plot (PDP)
run_ice – Run Individual Conditional Expectation (ICE)
pdp_ice_grid_resolution – Maximum resolution of the grid to use for computation of the PDP and/or ICE
pdp_ice_n_samples – The number of rows to sample in df_train. If None, no sampling is performed.
tabpfn_model_directory – TabPFN Model Directory.
num_trials – The number of trials for hyperparameter optimization
infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or 20 rows / second throughput.
infer_limit_batch_size – The amount of rows passed at once to be predicted when calculating per-row speed. This is very important because infer_limit_batch_size=1 (online-inference) is highly suboptimal as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.
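
The interaction between validation_ratio, split_by_datetime and datetime_column can be pictured with a small split helper. This is a sketch, not the task's implementation: `split_rows` is hypothetical, and it assumes that a datetime split places the most recent rows in the validation set.

```python
import random
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def split_rows(
    rows: List[Row],
    validation_ratio: float = 0.2,
    datetime_column: Optional[str] = None,
    seed: int = 0,
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, validation) randomly or by a datetime column."""
    ordered = list(rows)
    if datetime_column is not None:
        # Assumption: the most recent rows form the validation set.
        ordered.sort(key=lambda r: r[datetime_column])
    else:
        random.Random(seed).shuffle(ordered)
    n_val = int(round(len(ordered) * validation_ratio))
    cut = len(ordered) - n_val
    return ordered[:cut], ordered[cut:]


# Hypothetical rows with an integer timestamp column "t", given in reverse order.
rows = [{"t": 9 - i} for i in range(10)]
train, validation = split_rows(rows, validation_ratio=0.2, datetime_column="t")
```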

Raises

Exception – If the target has fewer than 2 unique values.

Examples

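>>> import pandas as pd
>>> from actableai.tasks.classification import AAIClassificationTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIClassificationTask().run(df, "target", ["feature1", "feature2", "feature3"])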

Returns

Dictionary containing the results

"status": "SUCCESS" if the task ran successfully, else "FAILURE"
"messenger": Message returned with the task
"validations": List of validations on the data; non-empty if the data presents a problem for the task
"runtime": Execution time of the task
"data": Dictionary containing the data for the task
- "validation_table": Validation table
- "prediction_table": Prediction table
- "fields": Column names of the prediction table
- "predictData": Prediction table
- "predict_shaps": Shapley values for prediction table
- "validation_shaps": Shapley values for validation table
- "exdata": Validation table
- "evaluate": Evaluation metrics on validation set
- "importantFeatures": Feature importance on validation set
- "debiasing_charts": If debiasing is enabled, debiasing data to create charts
- "leaderboard": Leaderboard of the best model on validation
"model": AAIModel to redeploy the model

Return type

Dict
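
Because a task reports problems through the "status" and "validations" keys rather than only through exceptions, callers usually check them before reading "data". The sketch below shows one way to do that on a hand-built dictionary shaped like the description above; check_result is a hypothetical helper, not part of actableai, and real results carry more keys.

```python
from typing import Any, Dict, List


def check_result(result: Dict[str, Any]) -> List[Any]:
    """Return the validations list, raising if the task reported a failure."""
    validations = result.get("validations", [])
    if result.get("status") != "SUCCESS":
        raise RuntimeError(
            f"Task failed: {result.get('messenger', '')} (validations: {validations})"
        )
    return validations


# Hand-built stand-in for a task result, used only for illustration.
example_result = {
    "status": "SUCCESS",
    "messenger": "",
    "validations": [],
    "runtime": 12.3,
    "data": {"evaluate": {}, "importantFeatures": []},
}

print(check_result(example_result))
```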

actableai.tasks.clustering module¶

class actableai.tasks.clustering.AAIClusteringTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

Clustering Task

Parameters: AAITask – Base class for all tasks

classmethod get_parameters() → actableai.parameters.parameters.Parameters¶

run(df: pandas.core.frame.DataFrame, features: Optional[List[str]] = None, num_clusters: int = 2, drop_low_info: bool = False, explain_samples: bool = False, cluster_explain_max_depth=20, cluster_explain_min_impurity_decrease=0.001, cluster_explain_min_samples_leaf=0.001, cluster_explain_min_precision=0.8, max_train_samples: Optional[int] = None, parameters: Dict[str, Any] = None) → Dict¶

Runs a clustering analysis on df

Parameters

df – Input DataFrame
features – Features used in Input DataFrame. Defaults to None.
num_clusters – Number of different clusters assignable to each row.
drop_low_info – Whether the algorithm drops columns with only one unique value, or categorical columns whose values all differ across rows.
explain_samples – Whether the result contains a human-readable explanation of the clustering.
init – Initialization for weights of the DEC model.
pretrain_optimizer – Optimizer for the pretraining phase of the autoencoder.
update_interval – The interval to check the stopping criterion and update the cluster centers.
pretrain_epochs – Number of epochs for pretraining DEC.
alpha_k – The factor to control the penalty term of the number of clusters.
max_train_samples – Number of randomly selected rows to train the DEC.
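
The drop_low_info rule above can be read as two column filters. The sketch below applies that reading to plain Python data; low_info_columns is a hypothetical helper and the exact rule used by the library may differ.

```python
from typing import Any, Dict, List, Set


def low_info_columns(
    columns: Dict[str, List[Any]], categorical: Set[str]
) -> List[str]:
    """Names of columns that carry little information for clustering.

    A column is flagged when it has a single unique value, or when it is
    categorical and every row holds a different value.
    """
    flagged = []
    for name, values in columns.items():
        n_unique = len(set(values))
        if n_unique <= 1:
            flagged.append(name)
        elif name in categorical and n_unique == len(values):
            flagged.append(name)
    return flagged


table = {
    "country": ["FR", "FR", "FR"],       # constant
    "customer_id": ["a1", "b2", "c3"],   # categorical, all different
    "age": [31, 45, 27],                 # numeric, kept
}
print(low_info_columns(table, categorical={"country", "customer_id"}))
```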

Examples

>>> import pandas as pd
>>> from actableai.tasks.clustering import AAIClusteringTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIClusteringTask().run(
...     df,
...     ["feature1", "feature2", "feature3"]
... )
>>> result

Returns

Dictionary containing the result

"status": "SUCCESS" if the task ran successfully, else "FAILURE"
"messenger": Message returned with the task
"data": Dictionary containing the data for the clustering task
- "cluster_id": ID of the generated cluster
- "explanation": Explanation for the points for this cluster
- "encoded_value": Encoded value for centroid for this cluster
- "projected_value": Projected centroid for this cluster
- "projected_nearest_point": Nearest point for the centroid
"data_v2": Updated dictionary containing the data for the clustering task
- "clusters": Same dictionary as data
- "shap_values": Shapley values for clustering
"runtime": Time taken to run the task
"validations": List of validations on the data; non-empty if the data presents a problem for the task

Return type

Dict

actableai.tasks.correlation module¶

class actableai.tasks.correlation.AAICorrelationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

Correlation Task

Parameters: AAITask – Base Class for tasks

run(df: pandas.core.frame.DataFrame, target_column: str, target_value: Optional[str] = None, kde_steps: int = 100, lr_steps: int = 100, control_columns: Optional[List[str]] = None, control_values: Optional[List[str]] = None, correlation_threshold: float = 0.05, p_value: float = 0.05, use_bonferroni: bool = False, top_k: int = 20) → Dict¶

Runs a correlation analysis on Input DataFrame

Parameters

df – Input DataFrame
target_column – Target for correlation analysis
target_value – If target_column is categorical, target_value must be one of the values of target_column. Otherwise it should be None. Defaults to None.
kde_steps – Number of steps for kernel density graph. Defaults to 100.
lr_steps – Number of steps for linear regression graph. Defaults to 100.
control_columns – Control columns for decorrelations. Defaults to None.
control_values – Control values for decorrelations. control_values[i] must be a value from df[control_columns[i]]. Defaults to None.
correlation_threshold – Threshold for correlation validation. Values with an absolute value above this threshold are considered correlated. Defaults to 0.05.
p_value – p-value threshold for correlation validation. Defaults to 0.05.
use_bonferroni (bool, optional) – Whether to apply the Bonferroni correction. Defaults to False.
top_k – Limit for number of results returned. Only the best k correlated columns are returned. Defaults to 20.
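
One plausible reading of how correlation_threshold, p_value, use_bonferroni and top_k interact is sketched below on precomputed (correlation, p-value) pairs. select_correlated is a hypothetical helper and the statistics are made up; the library's actual filtering may differ in detail.

```python
from typing import Dict, List, Tuple


def select_correlated(
    stats: Dict[str, Tuple[float, float]],
    correlation_threshold: float = 0.05,
    p_value: float = 0.05,
    use_bonferroni: bool = False,
    top_k: int = 20,
) -> List[str]:
    """Filter (correlation, p-value) pairs and keep the top_k strongest."""
    # Bonferroni: divide the significance level by the number of tests.
    alpha = p_value / len(stats) if use_bonferroni and stats else p_value
    kept = [
        (name, corr)
        for name, (corr, p) in stats.items()
        if abs(corr) > correlation_threshold and p < alpha
    ]
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in kept[:top_k]]


stats = {"a": (0.80, 0.001), "b": (-0.40, 0.03), "c": (0.02, 0.50)}
print(select_correlated(stats))
print(select_correlated(stats, use_bonferroni=True))
```

With the Bonferroni correction enabled, the significance level shrinks as more columns are tested, so borderline columns such as "b" in this example are no longer reported.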

Examples

>>> import pandas as pd
>>> from actableai.tasks.correlation import AAICorrelationTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAICorrelationTask().run(
...     df,
...     target_column="target"
... )

Returns

Dictionary containing the results

"status": "SUCCESS" if the task ran successfully, else "FAILURE"
"messenger": Message returned with the task
"data": Dictionary containing the data for the correlation task
- "corrs": Correlation values for each feature
- "charts": Dictionary containing the charts for correlations
"runtime": Time taken to run the task
"validations": List of validations on the data; non-empty if the data presents a problem for the task

Return type

Dict

actableai.tasks.data_imputation module¶

class actableai.tasks.data_imputation.AAIDataImputationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

run(df, rules: Tuple[str, str] = ('', ''), impute_nulls: bool = True, override_column_types: Dict = {}) → Dict¶

Impute the DataFrame df

Parameters

df – DataFrame to impute
rules – Set of rules for imputation. Defaults to (“”, “”).
impute_nulls – Whether null, None and nan values should be imputed. Defaults to True.
override_column_types – Columns whose inferred type is overridden by a given type. Defaults to {}.

Examples

>>> import pandas as pd
>>> from actableai.tasks.data_imputation import AAIDataImputationTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIDataImputationTask().run(df)
>>> result

Returns

Dictionary of results

"status": "SUCCESS" if the task ran successfully, else "FAILURE"
"messenger": Message returned with the task
"validations": List of validations on the data; non-empty if the data presents a problem for the task
"runtime": Execution time of the task
"data": Dictionary containing the data for the task
- "columns": Columns of the new table
- "records": Records containing the new values after imputation

Return type

Dict

actableai.tasks.data_imputation.construct_rules(data)¶

actableai.tasks.direct_causal module¶

class actableai.tasks.direct_causal.AAIDirectCausalFeatureSelection(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

Search for direct causal features with DML.

run(df, target, features, max_concurrent_ci_tasks=4, positive_outcome_value=None, causal_inference_task_params=None, causal_inference_run_params=None)¶

This function performs causal feature selection on a given dataset.

Parameters

self (object) – The instance of the class.
df (pandas.DataFrame) – The dataframe containing the data.
target (str) – The target feature for which the causal inference is to be performed.
features (list) – List of features for which the causal inference is to be performed.
max_concurrent_ci_tasks (int, optional) – Maximum number of concurrent causal inference tasks. Defaults to 4.
dummy_prefix_sep (str, optional) – Prefix separator to be used while creating dummy variables. Defaults to “:::”.
positive_outcome_value (str, optional) – Positive outcome value.
causal_inference_task_params (dict, optional) – Causal inference task parameters. Defaults to None.
causal_inference_run_params (dict, optional) – Causal inference run parameters. Defaults to None.

Returns

A dictionary containing the status, data and validations of the function.

Return type

dict

actableai.tasks.forecast module¶

class actableai.tasks.forecast.AAIForecastTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITunableTask

Forecast (time series) Task

static get_hyperparameters_space(dataset_len: int) → actableai.parameters.models.OptionsSpace[Parameters]¶

Return the hyperparameters space of the task.

Parameters: dataset_len – Length of the dataset (number of rows, shape[0]).
Returns: Hyperparameters space represented as a ModelSpace.

run(df: pandas.core.frame.DataFrame, prediction_length: int, date_column: Optional[str] = None, predicted_columns: Optional[List[str]] = None, group_by: Optional[List[str]] = None, feature_columns: Optional[List[str]] = None, ray_tune_kwargs: Optional[Dict] = None, max_concurrent: int = 3, trials: int = 1, use_ray: bool = True, tune_samples: int = 20, refit_full: bool = True, verbose: int = 3, seed: int = 123, sampling_method: str = 'random', tuning_metric: str = 'mean_wQuantileLoss', seasonal_periods: Optional[List[int]] = None, hyperparameters: Dict = None) → Dict[str, Any]¶

Run time series forecasting task and return results.

Parameters

df – Input DataFrame.
prediction_length – Length of the prediction to forecast.
date_column – Column containing the date/datetime/time component of the time series.
predicted_columns – List of columns to forecast, if None all the columns will be selected.
group_by – List of columns to use to separate different time series/groups. This list is used by the groupby function of the pandas library.
feature_columns – List of columns containing extraneous features used to forecast. If one or more feature columns contain dynamic features (features that change over time), the dataset must also contain the feature values for the prediction_length future data points.
ray_tune_kwargs – Named parameters to pass to ray’s tune function.
max_concurrent – Maximum number of concurrent ray tasks.
trials – Number of trials for hyperparameter search.
use_ray – If True ray will be used for hyperparameter tuning.
tune_samples – Number of dataset samples to use when tuning.
refit_full – If True the final model will be fitted using all the data (including the validation set).
verbose – Verbose level.
seed – Random seed to use.
sampling_method – Method used when extracting the samples for the tuning [“random”, “last”].
tuning_metric – Metric to minimize when tuning.
seasonal_periods – List of seasonal periods (seasonality).
hyperparameters – Dictionary representing the hyperparameters to run the tuning search on.
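
The default tuning_metric, mean_wQuantileLoss, is a weighted quantile loss averaged over quantile levels. The sketch below follows the definition commonly used by GluonTS-style evaluators; that this matches the library's computation exactly is an assumption, and the function names and numbers are illustrative only.

```python
from typing import Dict, List


def weighted_quantile_loss(target: List[float], forecast: List[float], q: float) -> float:
    """Quantile (pinball) loss at level q, scaled by the sum of |target|."""
    loss = 2.0 * sum(
        abs((f - t) * ((1.0 if t <= f else 0.0) - q))
        for t, f in zip(target, forecast)
    )
    return loss / sum(abs(t) for t in target)


def mean_weighted_quantile_loss(
    target: List[float], forecasts: Dict[float, List[float]]
) -> float:
    """Average the weighted quantile loss over all forecast quantile levels."""
    losses = [weighted_quantile_loss(target, f, q) for q, f in forecasts.items()]
    return sum(losses) / len(losses)


# Made-up series: two observed values and forecasts at two quantile levels.
target = [10.0, 20.0]
forecasts = {0.5: [12.0, 18.0], 0.9: [15.0, 25.0]}
print(mean_weighted_quantile_loss(target, forecasts))
```

Lower values are better, which is consistent with tuning_metric being described as a metric to minimize.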

Returns

Dictionary containing the results.

Return type

Dict

actableai.tasks.intervention module¶

class actableai.tasks.intervention.AAIInterventionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, target: str, current_intervention_column: str, new_intervention_column: Optional[str] = None, expected_target: Optional[str] = None, common_causes: Optional[List[str]] = None, causal_cv: Optional[int] = None, causal_hyperparameters: Optional[Dict] = None, cate_alpha: Optional[float] = None, presets: Optional[str] = None, model_directory: Optional[str] = None, num_gpus: Optional[int] = 0, feature_importance: bool = True, drop_unique: bool = True, drop_useless_features: bool = True, only_fit: bool = False, tabpfn_model_directory: Optional[str] = None, cross_validation_hyperparameters: Optional[Dict] = None) → Dict¶

Run this intervention task and return the results.

Parameters

df – Input DataFrame
target – Column name of target variable
current_intervention_column – Column name of the current intervention
new_intervention_column – Column name of the new intervention
common_causes – List of common causes to be used for the intervention
causal_cv – Number of folds for causal cross validation
causal_hyperparameters – Hyperparameters for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html?highlight=tabularpredictor#autogluon.tabular.TabularPredictor
cate_alpha – Alpha for intervention effect. Ignored if df[target] is categorical
presets – Presets for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html?highlight=tabularpredictor#autogluon.tabular.TabularPredictor
model_directory – Model directory
num_gpus – Number of GPUs used by causal models
drop_unique – Whether the classification algorithm drops columns that only have a unique value across all rows at fit time
drop_useless_features – Whether the classification algorithm drops columns that only have a unique value across all rows at preprocessing time
tabpfn_model_directory – TabPFN Model Directory.
cross_validation_hyperparameters – Hyperparameters when running cross validation

Examples

>>> import pandas as pd
>>> from actableai.tasks.intervention import AAIInterventionTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAIInterventionTask().run(
...     df,
...     target="target_column",
...     current_intervention_column="intervention_column",
... )

Returns

Dictionary containing the following keys:

status: Status of the task
messenger: Message of the task
validations: Validations for the task's parameters
data: Dictionary containing the following keys:
- df: DataFrame with the intervention
- causal_graph_dot: Causal graph in dot format
- T_res: Residuals of the treatment
- Y_res: Residuals of the outcome
- X: Common causes
- model_t_scores: Model scores for the treatment
- model_y_scores: Model scores for the outcome
- intervention_plot: Data for plotting the intervention
runtime: Runtime of the task

Return type

Dict
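
The T_res and Y_res entries are residuals of the treatment and of the outcome. In a double machine learning setup with a linear final stage, regressing the outcome residuals on the treatment residuals gives an effect estimate. The sketch below shows that step on made-up numbers; it assumes a single continuous treatment and is not necessarily how the task derives its reported effect.

```python
from typing import List


def residual_effect(t_res: List[float], y_res: List[float]) -> float:
    """Least-squares slope of outcome residuals on treatment residuals (no intercept)."""
    denom = sum(t * t for t in t_res)
    if denom == 0:
        raise ValueError("treatment residuals are all zero")
    return sum(t * y for t, y in zip(t_res, y_res)) / denom


# Made-up residuals in which the outcome moves twice as much as the treatment.
t_res = [-1.0, 0.5, 1.5, -1.0]
y_res = [-2.0, 1.0, 3.0, -2.0]
print(residual_effect(t_res, y_res))
```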

actableai.tasks.ocr module¶

class actableai.tasks.ocr.AAIOCRTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

classmethod get_parameters() → actableai.parameters.parameters.Parameters¶

run(images: Iterable[PIL.Image.Image], parameters: Optional[Dict[str, Any]] = None)¶: Abstract method called to run the task

actableai.tasks.regression module¶

class actableai.tasks.regression.AAIRegressionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.autogluon.AAIAutogluonTask

Regression task.

static compute_problem_type(prediction_quantiles: Optional[List[int]]) → str¶

Determine the problem type (‘regression’ or ‘quantile’)

Parameters: prediction_quantiles – List of quantiles. (in percentage)
Returns: String representation of the problem type: ‘regression’ or ‘quantile’
Return type: problem_type
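
Based only on the description above, the selection logic can be pictured as follows. This is a sketch under the assumption that passing None is what selects plain regression; the function name is hypothetical and the handling of an empty list in the real method is not documented here.

```python
from typing import List, Optional


def compute_problem_type_sketch(prediction_quantiles: Optional[List[int]]) -> str:
    """Return 'quantile' when quantiles are requested, else 'regression'."""
    if prediction_quantiles is not None:
        return "quantile"
    return "regression"


print(compute_problem_type_sketch(None))
print(compute_problem_type_sketch([5, 50, 95]))
```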

classmethod get_hyperparameters_space(dataset_len: int, prediction_quantiles: Optional[List[float]] = None, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False, name: str = 'regression_model_space', display_name: str = 'Regression Model Space', description: str = 'The space of available and default regression models and parameters.') → actableai.parameters.models.OptionsSpace[Parameters]¶

Return the hyperparameters space of the task.

Parameters

dataset_len – Length of the dataset.
prediction_quantiles – List of quantiles (for regression task only), as a percentage
device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
tabpfn_enabled – Boolean indicating if TabPFN model should be used.
causal_inference – Boolean indicating if causal inference is being performed.
name – Name of the output model space.
display_name – Display name of the output model space.
description – Description of the output model space.

Returns

Hyperparameters space represented as a ModelSpace.

run(df: pandas.core.frame.DataFrame, target: str, features: Optional[List[str]] = None, biased_groups: Optional[List[str]] = None, debiased_features: Optional[List[str]] = None, eval_metric: str = 'r2', validation_ratio: float = 0.2, prediction_quantiles: Optional[List[float]] = None, explain_samples: bool = False, model_directory: Optional[str] = None, presets: str = 'medium_quality_faster_train', hyperparameters: Optional[dict] = None, train_task_params: Optional[dict] = None, kfolds: int = 1, cross_validation_max_concurrency: int = 1, residuals_hyperparameters: Optional[dict] = None, drop_duplicates: bool = True, return_residuals: bool = False, kde_steps: int = 10, num_gpus: Union[int, str] = 0, time_limit: Optional[int] = None, drop_unique: bool = True, drop_useless_features: bool = True, split_by_datetime: bool = False, datetime_column: Optional[str] = None, ag_automm_enabled: bool = False, refit_full: bool = False, feature_prune: bool = True, feature_prune_time_limit: Optional[float] = None, intervention_run_params: Optional[Dict] = None, causal_feature_selection: bool = False, causal_feature_selection_max_concurrent_tasks: int = 20, ci_for_causal_feature_selection_task_params: Optional[dict] = None, ci_for_causal_feature_selection_run_params: Optional[dict] = None, run_pdp: bool = True, run_ice: bool = True, pdp_ice_grid_resolution: Optional[int] = 100, pdp_ice_n_samples: Optional[int] = 100, num_trials: int = 1, infer_limit: float = 60, infer_limit_batch_size: int = 100) → Dict[str, Any]¶

Run this regression task and return results.

Parameters

df – Input data frame
target – Target column in df. If there are empty values in this column, predictions will be generated for these rows.
features – A list of features to be used for prediction. If None, all columns except target are used as features
biased_groups – A list of columns of groups that should be protected from biases (e.g. gender, race, age)
debiased_features – A list of proxy features that need to be debiased for protection of sensitive groups
eval_metric – Metric to be optimized during training. Possible values include ‘root_mean_squared_error’, ‘mean_squared_error’, ‘mean_absolute_error’, ‘median_absolute_error’, ‘r2’
validation_ratio – The ratio to randomly split data for training and validation
prediction_quantiles – List of quantiles. (in percentage)
explain_samples – If true, explanations for predictions in test and validation will be generated. This takes significantly longer to run.
model_directory – Destination to store trained model. If not set, a temporary folder will be created
presets – Autogluon’s presets for training model. See https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit.
hyperparameters – Autogluon’s hyperparameters for training model. See https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit.
train_task_params – Parameters for _AAITrainTask constructor.
kfolds – Number of folds for cross validation. If 1, train test split is used instead.
cross_validation_max_concurrency – Maximum number of Ray actors used for cross validation (each actor executes one split)
residuals_hyperparameters – Autogluon’s hyperparameters used in final model of counterfactual predictions
drop_duplicates – Whether duplicate values should be dropped before training.
return_residuals – Whether residual values should be returned in counterfactual prediction
kde_steps – Steps used to generate KDE plots with debiasing
num_gpus – Number of GPUs used in nuisance models in counterfactual prediction
time_limit – Time limit (in seconds) of training. None means no time limit
drop_unique – Whether to drop columns with only unique values as preprocessing step
drop_useless_features – Whether to drop columns with only unique values at fit time
split_by_datetime – Whether train/validation sets are split using datetime. Training uses the earliest data and validation the latest.
datetime_column – The specified datetime column if split_by_datetime is enabled
ag_automm_enabled – Whether to use autogluon multimodal model on text columns.
refit_full – Whether the model is refitted on the whole dataset (including the validation set) at the end of the task. The training time is split in half so that the second half can be used for refitting
feature_prune – Whether feature pruning is enabled. Can increase accuracy by removing harmful features for the model (features that are detrimental to the performance). If no training time left, this step is skipped
feature_prune_time_limit – Time limit for feature pruning.
intervention_run_params – Parameters for running an intervention task. Check actableai/tasks/intervention.py for more details.
causal_feature_selection – If True, search for direct causal features and use only these features for the prediction
causal_feature_selection_max_concurrent_tasks – Maximum number of concurrent tasks for selecting causal features
ci_for_causal_feature_selection_task_params – Parameters for the AAIDirectCausalFeatureSelection constructor
ci_for_causal_feature_selection_run_params – Keyword arguments for AAIDirectCausalFeatureSelection’s run
run_pdp – Compute partial dependence to produce a Partial Dependence Plot (PDP)
run_ice – Compute Individual Conditional Expectation (ICE)
pdp_ice_grid_resolution – Maximum resolution of the grid to use for computation of the PDP and/or ICE
pdp_ice_n_samples – The number of rows to sample in df_train. If None, no sampling is performed.
num_trials – The number of trials for hyperparameter optimization
infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or 20 rows / second throughput.
infer_limit_batch_size – The amount of rows passed at once to be predicted when calculating per-row speed. This is very important because infer_limit_batch_size=1 (online-inference) is highly suboptimal as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.
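
To make the split_by_datetime, datetime_column and validation_ratio options more concrete, the sketch below performs a chronological split on plain Python rows. It assumes the validation set holds the most recent rows, which is the usual convention; the helper is hypothetical and the library's own split may round or order differently.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_datetime(
    rows: List[Row], datetime_column: str, validation_ratio: float = 0.2
) -> Tuple[List[Row], List[Row]]:
    """Chronological split: earliest rows for training, latest for validation."""
    ordered = sorted(rows, key=lambda row: row[datetime_column])
    n_validation = max(1, round(len(ordered) * validation_ratio))
    return ordered[:-n_validation], ordered[-n_validation:]


# Five made-up rows, deliberately out of order.
rows = [{"day": date(2023, 1, d), "y": d} for d in (5, 1, 4, 2, 3)]
train, validation = split_by_datetime(rows, "day", validation_ratio=0.2)
print([r["y"] for r in train], [r["y"] for r in validation])
```

Unlike a random split, this keeps every validation row later in time than every training row, which avoids leaking future information into training.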

Examples

>>> import pandas as pd
>>> from actableai.tasks.regression import AAIRegressionTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAIRegressionTask().run(
...     df,
...     'target_column',
... )

Returns

Dictionary containing the results for this task

"status": "SUCCESS" if the task ran successfully, else "FAILURE"
"messenger": Message returned with the task
"validations": List of validations on the data; non-empty if the data presents a problem for the task
"runtime": Execution time of the task
"data": Dictionary containing the data for the task
- "validation_table": Validation table
- "prediction_table": Prediction table
- "predict_shaps": Shapley values for prediction table
- "evaluate": Evaluation metrics for the task
- "validation_shaps": Shapley values for the validation table
- "importantFeatures": Feature importance for the validation table
- "debiasing_charts": If debiasing is enabled, charts to display debiasing
- "leaderboard": Leaderboard of the best trained models
"model": AAIModel to redeploy the model

Return type

Dict

actableai.tasks.sentiment_analysis module¶

class actableai.tasks.sentiment_analysis.AAISentimentAnalysisTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

Sentiment Analysis Task

run(df: pandas.core.frame.DataFrame, target: str, batch_size: int = 32, rake_threshold=1.0) → Dict¶

Run a sentiment analysis on Input DataFrame

Parameters

df – Input DataFrame
target – Target for sentiment analysis
batch_size – Batch Size. Defaults to 32.
rake_threshold – Threshold for Rake scores used to extract keywords. Defaults to 1.0.

Examples

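>>> import pandas as pd
>>> from actableai.tasks.sentiment_analysis import AAISentimentAnalysisTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAISentimentAnalysisTask().run(df, "target")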

Returns: Dictionary of results
Return type: Dict
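
The batch_size parameter suggests that texts are processed in fixed-size groups. A minimal sketch of such batching is shown below; the batches helper is hypothetical and says nothing about how the task feeds its model internally.

```python
from typing import Iterator, List


def batches(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# Seventy made-up texts split with the default batch size.
reviews = [f"review {i}" for i in range(70)]
print([len(b) for b in batches(reviews)])
```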

actableai.tasks.text_extraction module¶

class actableai.tasks.text_extraction.AAITextExtractionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)¶

Bases: actableai.tasks.base.AAITask

classmethod get_parameters() → actableai.parameters.parameters.Parameters¶

run(df: pandas.core.frame.DataFrame, document_name_column: str, text_column: str, default_openai_api_key: str, openai_rate_limit_per_minute: float = None, parameters: Optional[Dict[str, Any]] = None) → Dict[str, Any]¶: Abstract method called to run the task

Module contents¶

class actableai.tasks.TaskType(value)¶

Bases: str, enum.Enum

Enum representing the different tasks available

ASSOCIATION_RULES = 'association_rules'¶

BAYESIAN_REGRESSION = 'bayesian_regression'¶

CAUSAL_DISCOVERY = 'causal_discovery'¶

CAUSAL_INFERENCE = 'causal_inference'¶

CLASSIFICATION = 'classification'¶

CLASSIFICATION_TRAIN = 'classification_train'¶

CLUSTERING = 'clustering'¶

CORRELATION = 'correlation'¶

DATA_IMPUTATION = 'data_imputation'¶

DEC_ANCHOR_CLUSTERING = 'dec_anchor_clustering'¶

DIRECT_CAUSAL_FEATURE_SELECTION = 'direct_causal_feature_selection'¶

FORECAST = 'forecast'¶

INTERVENTION = 'intervention'¶

OCR = 'ocr'¶

REGRESSION = 'regression'¶

REGRESSION_TRAIN = 'regression_train'¶

SENTIMENT_ANALYSIS = 'sentiment_analysis'¶

TEXT_EXTRACTION = 'text_extraction'¶
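
Because TaskType derives from both str and enum.Enum, its members can be looked up by value and compared directly with plain strings. The sketch below demonstrates this with a local stand-in enum that copies two of the members listed above; TaskTypeSketch is hypothetical and is used only so that the example runs without the package installed.

```python
from enum import Enum


class TaskTypeSketch(str, Enum):
    """Local stand-in with the same bases as TaskType and two of its members."""

    REGRESSION = "regression"
    FORECAST = "forecast"


# Lookup by value, e.g. when reading a task name from a config file.
task = TaskTypeSketch("forecast")
print(task is TaskTypeSketch.FORECAST)
# Members compare equal to their plain string values because of the str base.
print(task == "forecast")
print(task.value)
```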
