actableai.tasks package

Submodules

actableai.tasks.association_rules module

class actableai.tasks.association_rules.AAIAssociationRulesTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, group_by: List[str], items: str, frequent_method: str = 'fpgrowth', min_support: float = 0.5, association_metric: str = 'confidence', min_association_metric: float = 0.5, graph_top_k: int = 10) Dict

Generate association rules from a dataframe.

Parameters
  • df – Input dataframe.
  • group_by – List of columns to group by. (e.g. order_id or customer_id)
  • items – Column name of items. (e.g. product_id or product_name)
  • frequent_method – Frequent method to use. Available options are [“fpgrowth”, “fpmax”, “apriori”]
  • min_support – Minimum support threshold for itemsets generation.
  • association_metric – Association metric used for association rules generation. Available options are [“support”, “confidence”, “lift”, “leverage”, “conviction”]
  • min_association_metric – Minimum value for significance of association.
  • graph_top_k – Maximum number of nodes to display on association graph.

Examples

>>> import pandas as pd
>>> from actableai.tasks.association_rules import AAIAssociationRulesTask
>>> df = pd.read_csv("path/to/data.csv")
>>> result = AAIAssociationRulesTask().run(
...     df,
...     group_by=["order_id", "customer_id"],
...     items="product_id",
... )
>>> result["data"]["rules"]
Returns
Dictionary containing the results of the task.
  • “status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • “data”: Dictionary containing the data of the task.
    • “rules”: List of association rules.
    • “frequent_itemset”: Frequent itemsets.
    • “df_list”: List of associated items for each group_by.
    • “graph”: Association graph.
    • “association_metric”: Association metric used for association
      rules generation.
    • “association_rules_chord”: Association rules chord diagram.
  • “validations”: List of validations on the data,
    non-empty if the data presents a problem for the task
  • “runtime”: Time taken to run the task.
Return type
Dict
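
To make the min_support and association_metric parameters concrete, the following is a minimal standard-library sketch of how support and confidence are conventionally defined over grouped transactions. It does not call the task itself; the toy rows and the helper names (support, confidence) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (order_id, product_id) rows standing in for `df`.
rows = [
    (1, "bread"), (1, "butter"),
    (2, "bread"), (2, "butter"), (2, "jam"),
    (3, "bread"),
    (4, "jam"),
]

# Group items by the group_by key to form one transaction per order.
transactions = defaultdict(set)
for order_id, product in rows:
    transactions[order_id].add(product)
baskets = list(transactions.values())


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


# Keep only the item pairs whose support reaches the threshold.
min_support = 0.5
items = sorted({item for basket in baskets for item in basket})
frequent_pairs = [
    pair for pair in combinations(items, 2) if support(pair) >= min_support
]
```

With these rows, only the pair (bread, butter) reaches a support of 0.5, and the rule butter → bread has confidence 1.0 because every basket containing butter also contains bread.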

actableai.tasks.autogluon module

class actableai.tasks.autogluon.AAIAutogluonTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITunableTask, abc.ABC

static get_available_models(problem_type: str, explain_samples: bool, gpu: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False) List[actableai.models.autogluon.params.base.Model]

Get list of available models for the given problem type.

Parameters
  • problem_type – The type of the problem (‘regression’ or ‘quantile’)
  • explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
  • gpu – Whether a GPU is available. If False, ‘CPU’ is used, otherwise ‘GPU’ is used. Used to filter out models that can only run on a GPU when no GPU is available.
  • ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used
  • tabpfn_enabled – Boolean indicating if TabPFN model should be used
  • causal_inference – Whether causal inference is used
Returns

List of available models
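
The GPU filtering described for the gpu parameter can be pictured with a small sketch. The ModelInfo record, the candidate names, and the available_models helper are all hypothetical stand-ins; they are not the library's Model type or its actual model list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:  # hypothetical stand-in, not the library's Model type
    name: str
    gpu_only: bool = False


# Hypothetical candidates; the real list depends on problem_type and flags.
CANDIDATES = [
    ModelInfo("gbm"),
    ModelInfo("linear"),
    ModelInfo("multimodal", gpu_only=True),
]


def available_models(gpu: bool) -> list:
    """Keep GPU-only candidates only when a GPU is available."""
    return [m.name for m in CANDIDATES if gpu or not m.gpu_only]
```

Here available_models(False) drops the GPU-only entry, while available_models(True) keeps all three.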

classmethod get_base_hyperparameters_space(num_class: int, dataset_len: int, problem_type: str, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False) actableai.parameters.models.OptionsSpace[Parameters]

Return the hyperparameters space of the task.

Parameters
  • df – DataFrame containing the features
  • num_class – The number of classes in the target column (‘-1’ can be used for regression which does not use classes)
  • dataset_len – The length of the dataset
  • problem_type – The type of the problem (‘regression’/’quantile’/’multiclass’/’binary’)
  • device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
  • explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
  • ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
  • tabpfn_enabled – Boolean indicating if TabPFN model should be used.
  • causal_inference – Boolean indicating if causal inference is being performed.
Returns

  • default – Default models and settings.
  • options – Display name and hyperparameters of the available models.

actableai.tasks.base module

class actableai.tasks.base.AAITask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: abc.ABC

Base abstract class to represent an Actable AI Task.

classmethod get_parameters() actableai.parameters.parameters.Parameters
abstract run(*args, **kwargs)

Abstract method called to run the task
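
The abstract-base pattern used here can be sketched without the library. ToyTask and SumTask are hypothetical classes written for illustration; the result keys mirror the ones documented for the concrete tasks, but this is not the library's implementation.

```python
import time
from abc import ABC, abstractmethod


class ToyTask(ABC):  # hypothetical stand-in for an AAITask-like base class
    @abstractmethod
    def run(self, *args, **kwargs) -> dict:
        """Subclasses implement the actual work here."""


class SumTask(ToyTask):
    def run(self, values) -> dict:
        start = time.time()
        total = sum(values)
        # Return a result dictionary shaped like the documented task outputs.
        return {
            "status": "SUCCESS",
            "data": {"total": total},
            "validations": [],
            "runtime": time.time() - start,
        }


result = SumTask().run([1, 2, 3])
```

Instantiating ToyTask directly raises TypeError because run is abstract, which is what forces every concrete task to provide its own run.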

static run_with_ray_remote(task: actableai.tasks.TaskType) Callable

Method to run a specific task with ray remote (used as a decorator)

Parameters
task – The task type that will be run
Returns
The decorator
class actableai.tasks.base.AAITunableTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask, abc.ABC

Base abstract class to represent a Tunable Actable AI Task.

abstract static get_hyperparameters_space(*args, **kwargs) actableai.parameters.models.OptionsSpace[Parameters]

Return the hyperparameters space of the task.

Returns
Hyperparameters space represented as a ModelSpace.

actableai.tasks.bayesian_regression module

class actableai.tasks.bayesian_regression.AAIBayesianRegressionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, features: List[str], target: str, priors: Optional[Dict] = None, prediction_quantile_low: int = 5, prediction_quantile_high: int = 95, trials: int = 1, polynomial_degree: int = 1, validation_split: int = 20, pdf_steps: int = 100, predict_steps: int = 100, normalize: bool = False) Dict

A task to run a Bayesian Regression on features with respect to target.

Parameters
  • df – DataFrame containing the features and target. Additional columns are ignored.
  • features – List of features/columns to use for the Bayesian Regression
  • target – Target column for the Bayesian Regression
  • priors – Prior probability distribution of features.
  • prediction_quantile_low – Quantile for lowest point on prediction.
  • prediction_quantile_high – Quantile for highest point on prediction.
  • trials – Number of trials for tuning, best model is used for prediction.
  • polynomial_degree – Maximum degree used to generate polynomial and cross-interaction features. Higher values can give better results but use more memory.
  • validation_split – Percentage of the data used for validation.
  • pdf_steps – Number of steps for probability density function.
  • predict_steps – Number of predicted values.
  • normalize – If the generated features should be normalized, useful for big polynomial degrees.
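
To see why a higher polynomial_degree uses more memory, here is a rough sketch that enumerates the monomials a standard polynomial expansion would produce. It assumes the task's feature generation resembles a conventional polynomial expansion (all products of features up to the given total degree), which the description above suggests but does not confirm; polynomial_terms is a hypothetical helper.

```python
from itertools import combinations_with_replacement


def polynomial_terms(features, degree):
    """List every monomial of total degree 1..degree over `features`."""
    terms = []
    for d in range(1, degree + 1):
        # Each multiset of d features is one monomial, e.g. feature1*feature2.
        for combo in combinations_with_replacement(features, d):
            terms.append("*".join(combo))
    return terms


features = ["feature1", "feature2", "feature3"]
counts = {d: len(polynomial_terms(features, d)) for d in (1, 2, 3)}
# counts == {1: 3, 2: 9, 3: 19}
```

With three input features the number of generated columns grows from 3 to 9 to 19 as the degree goes from 1 to 3, which is also why normalize becomes useful for larger degrees.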

Examples

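>>> import pandas as pd
>>> from actableai.tasks.bayesian_regression import AAIBayesianRegressionTask
>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIBayesianRegressionTask().run(
...     df,
...     features=["feature1", "feature2", "feature3"],
...     target="target",
... )
>>> result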
Raises
ValueError – If a prior refers to an exponentiated value of a categorical feature. Categorical features are never raised to an exponent.
Returns
Dictionary containing the results of the task.
  • “status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • “messenger”: Message returned with the task
  • “data”: Dictionary containing the data of the task
    • “coeffs”: Coefficients of the Regression model
    • “intercept”: Intercept of the Regression model
    • “sigma”: Sigmas of the Regression model
    • “best_config”: Best usable model
    • “evaluation”: r2 and MSE metrics of the trained model
  • “validations”: List of validations on the data,
    non-empty if the data presents a problem for the task
  • “runtime”: Time taken to run the task
Return type
Dict

actableai.tasks.causal_discovery module

class actableai.tasks.causal_discovery.AAICausalDiscoveryTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

run(algo: str, payload: actableai.causal.discover.algorithms.payloads.CausalDiscoveryPayload, progress_callback: Optional[Callable] = None) Dict

Run a causal discovery algorithm.

Parameters
  • algo (str) – The name of the algorithm to run. Must be one of “deci”, “notears”, “direct-lingam” or “pc”.
  • payload (CausalDiscoveryPayload) – The payload to use for the algorithm. Use actableai.causal.discovery.algorithms.deci.DeciPayload for “deci”, actableai.causal.discovery.algorithms.notears.NotearsPayload for “notears”, actableai.causal.discovery.algorithms.direct_lingam.DirectLiNGAMPayload for “direct-lingam” and actableai.causal.discovery.algorithms.pc.PCPayload for “pc”.
  • progress_callback (Union[Callable, None], optional) – A callback to use for progress reporting. Defaults to None.
Returns

The causal graph produced by the algorithm.

Return type

CausalGraph

Raises

ValueError – If the algorithm is not supported.
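
The pairing between algo and its payload class, together with the ValueError for unsupported names, can be sketched as a simple lookup. The dictionary and the expected_payload helper are hypothetical; only the algorithm names and payload class names come from the parameter descriptions of this method.

```python
# Payload class names as listed in the `payload` parameter description.
PAYLOAD_FOR_ALGO = {
    "deci": "DeciPayload",
    "notears": "NotearsPayload",
    "direct-lingam": "DirectLiNGAMPayload",
    "pc": "PCPayload",
}


def expected_payload(algo: str) -> str:
    """Return the payload class name for `algo`, or raise ValueError."""
    try:
        return PAYLOAD_FOR_ALGO[algo]
    except KeyError:
        # Mirrors the documented behaviour for unsupported algorithms.
        raise ValueError(f"Unsupported algorithm: {algo!r}") from None
```

A caller could use such a check before building the payload, so that a misspelled algorithm name fails early rather than inside the task.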

actableai.tasks.causal_inference module

class actableai.tasks.causal_inference.AAICausalInferenceTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

static get_estimator_parameters(Y: numpy.ndarray, T: numpy.ndarray, has_categorical_treatment: bool, has_binary_outcome: bool, is_single_binary_treatment: bool, common_causes_and_effect_modifiers: List) pydantic.generics.OptionsParameter[Parameters]

TODO: Finalise Documentation

Parameters
  • Y – (n × d_y) matrix or vector of length n. Outcomes for each sample
  • T – (n × d_t) matrix or vector of length n. Treatments for each sample
  • has_categorical_treatment – Whether the treatment is categorical.
  • has_binary_outcome – Whether the outcome is binary.
  • is_single_binary_treatment – Whether there is only one treatment, and it is binary.
  • common_causes_and_effect_modifiers – List of common causes and effect modifiers.
Returns

The parameter options space of the estimator

classmethod get_hyperparameters(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, device: str, dataset_len: Union[int, None, str] = 'auto', num_class: Union[int, None, str] = 'auto', has_categorical_treatment: Union[bool, None, str] = 'auto', is_single_binary_outcome: Union[bool, None, str] = 'auto', effect_modifiers: Optional[List] = [], common_causes: Optional[List] = [], log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, positive_outcome_value=None) actableai.parameters.parameters.Parameters

Get hyperparameters of the outcomes model and treatments model (for Double ML estimators)

Parameters
  • pd_table – Dataset for the causal analysis
  • treatments – treatment variable(s)
  • outcomes – outcome variables
  • effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
  • common_causes – list of common causes (W). Defaults to [].
  • device – The device to use (‘cpu’ or ‘gpu’)
  • dataset_len – The length of the dataset (Optional)
  • num_class – The number of classes for the outcome (Optional)
  • has_categorical_treatment – Whether the treatment is categorical (Optional)
  • is_single_binary_outcome – Whether the outcome is binary (Optional)
  • log_treatment – flag to indicate whether log transform is to be applied to treatment
  • log_outcome – flag to indicate whether log transform is to be applied to outcome
  • positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.
Returns

Parameters of the outcomes model and treatments model

static get_hyperparameters_t(has_categorical_treatment: bool, pd_table: pandas.core.frame.DataFrame, device: str, label_t: str, dataset_len: Union[int, None, str] = 'auto', num_class: Union[int, None, str] = 'auto')

Get hyperparameters for treatments model

Parameters
  • has_categorical_treatment – Whether the treatment is categorical.
  • pd_table – Pandas DataFrame containing the data
  • device – Device to use (‘cpu’ or ‘gpu’)
  • label_t – Label of the target column in pd_table
  • dataset_len – The length of the dataset (Optional)
  • num_class – The number of classes for the outcome (Optional)
Returns

  • hyperparameters_space_t – The hyperparameters space.
  • task_t – The regression/classification task.

static get_hyperparameters_y(is_single_binary_outcome: bool, pd_table: pandas.core.frame.DataFrame, device: str, dataset_len: Union[int, None, str] = 'auto')

Get hyperparameters for outcomes model

Parameters
  • is_single_binary_outcome – Whether the outcome is binary.
  • pd_table – Pandas DataFrame containing the data
  • device – Device to use (‘cpu’ or ‘gpu’)
  • dataset_len – The length of the dataset (Optional)
Returns

  • hyperparameters_space_y – The hyperparameters space.
  • task_y – The regression/classification task.

classmethod get_parameters(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, Y: Union[numpy.ndarray, None, str] = 'auto', T: Union[numpy.ndarray, None, str] = 'auto', has_categorical_treatment: Union[bool, None, str] = 'auto', is_single_binary_outcome: Union[bool, None, str] = 'auto', is_single_binary_treatment: Union[bool, None, str] = 'auto', effect_modifiers: Optional[List] = [], common_causes: Optional[List] = [], log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, positive_outcome_value=None) actableai.parameters.parameters.Parameters

Get parameters of the estimator

Parameters
  • pd_table – Dataset for the causal analysis
  • treatments – treatment variable(s)
  • outcomes – outcome variables
  • Y – (n × d_y) matrix or vector of length n. Outcomes for each sample (Optional)
  • T – (n × d_t) matrix or vector of length n. Treatments for each sample (Optional)
  • has_categorical_treatment – Whether the treatment is categorical (Optional)
  • is_single_binary_outcome – Whether the outcome is binary (Optional)
  • is_single_binary_treatment – Whether there is only one treatment, and it is binary (Optional)
  • effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
  • common_causes – list of common causes (W). Defaults to [].
  • log_treatment – flag to indicate whether log transform is to be applied to treatment
  • log_outcome – flag to indicate whether log transform is to be applied to outcome
  • positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.
Returns

Parameters of the estimator

run(pd_table: pandas.core.frame.DataFrame, treatments: List, outcomes: List, effect_modifiers: Optional[List] = None, common_causes: Optional[List] = None, instrumental_variables: Optional[List] = None, controls: Optional[dict] = None, positive_outcome_value=None, target_units: Optional[str] = 'ate', alpha: Optional[float] = 0.05, tree_max_depth: Optional[int] = 3, log_treatment: Optional[bool] = False, log_outcome: Optional[bool] = False, model_directory: Optional[Union[str, pathlib.Path]] = None, ag_presets: str = 'medium_quality_faster_train', model_params: Optional[List] = None, rscorer: Optional[List] = None, feature_importance: bool = False, seed: int = 123, num_gpus: Union[int, str] = 0, drop_unique: bool = True, drop_useless_features: bool = False, parameters_estimator: Optional[dict] = {}, num_trials: int = 1)

Causal analysis task

Parameters
  • pd_table – Dataset for the causal analysis
  • treatments – treatment variable(s)
  • outcomes – outcome variables
  • effect_modifiers – list of effect modifiers (X) for CATE estimation. Defaults to [].
  • common_causes – list of common causes (W). Defaults to [].
  • instrumental_variables – list of instrumental variables (Z). Defaults to [].
  • controls – dictionary of control treatment values. Keys are categorical treatment names
  • positive_outcome_value – If not None, target is converted into 0, 1 where 1 is when original target is equal to positive_outcome_value else 0.
  • target_units – Target units used for calculating the effect. Possible values are “ate”, “att”, “atc”. Defaults to “ate”.
  • alpha – Significance level of effect confidence interval (from 0.01 to 0.99). Defaults to 0.05.
  • tree_max_depth – Maximum depth of CATE function’s tree interpreter. Defaults to 3.
  • log_treatment – flag to indicate whether log transform is to be applied to treatment
  • log_outcome – flag to indicate whether log transform is to be applied to outcome
  • model_directory – Where the model should be stored, if None stored in the /tmp folder
  • ag_presets – Presets for Autogluon Models
  • model_params – List of model Parameters for the AAICausalEstimator
  • rscorer – tune rscorer object
  • feature_importance – Whether feature importances are computed
  • seed – Random numpy seed
  • num_gpus – Number of GPUs for the TabularPredictors. Can be set to an int or “auto”
  • drop_unique – Whether to drop columns with only unique values as preprocessing step
  • drop_useless_features – Whether to drop columns with only unique values at fit time
  • parameters_estimator – Options for the estimator to be used. This needs to be defined in the key ‘estimator’. For DoubleML estimators, the following keys can also be specified:
    • ‘hyperparameters_t’: AutoGluon hyperparameters for the tabular predictor model(s) trained on the treatments
    • ‘hyperparameters_y’: AutoGluon hyperparameters for the tabular predictor model(s) trained on the outcomes
  • num_trials – The number of trials for hyperparameter optimization
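
Two of the parameters above reduce to simple arithmetic, sketched here under stated assumptions. binarize_outcome is a hypothetical helper that follows the description of positive_outcome_value, and the confidence level is the usual 1 - alpha reading of a significance level; neither is taken from the library's code.

```python
def binarize_outcome(values, positive_outcome_value):
    """Map each value to 1 if it equals positive_outcome_value, else 0."""
    if positive_outcome_value is None:
        # No conversion is requested; return the values unchanged.
        return list(values)
    return [1 if v == positive_outcome_value else 0 for v in values]


# Hypothetical outcome column.
outcomes = ["churned", "stayed", "churned", "stayed", "stayed"]
binary = binarize_outcome(outcomes, positive_outcome_value="churned")

# A significance level of 0.05 corresponds to a 95% confidence interval.
alpha = 0.05
confidence_level = 1 - alpha
```

With the values above, binary is [1, 0, 1, 0, 0], and the default alpha yields a 95% interval around the estimated effect.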

Examples

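>>> import pandas as pd
>>> from actableai.tasks.causal_inference import AAICausalInferenceTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAICausalInferenceTask().run(
...     df,
...     treatments=["feature1", "feature2"],
...     outcomes=["feature3", "feature4"],
...     effect_modifiers=["feature5", "feature6"],
... )
>>> result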
Returns
Dictionary of estimation results
  • “status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • “messenger”: Custom message from the task
  • “runtime”: Execution time from the task
  • “data”: Dictionary containing data from the task
    • “”:
    • “effect”:
    • “controls”: Transformed controls
    • “causal_graph_dot”: Feature graph in dot format
    • “tree_interpreter_dot”: Tree interpreter in dot format
    • “refutation_results”: # TODO
    • “T_res”: Treatment residuals
    • “Y_res”: Outcome residuals
    • “X”: Common causes values
    • “model_t_scores”: Scores for the treatment model
    • “model_y_scores”: Scores for the outcome model
    • “model_t_feature_importances”: Feature importances for treatment model
    • “model_y_feature_importances”: Feature importances for outcome model
  • “validations”: List of validations on the data,
    non-empty if the data presents a problem for the task
Return type
Dict
exception actableai.tasks.causal_inference.LogCategoricalOutcomeNotAllowed

Bases: ValueError

exception actableai.tasks.causal_inference.LogCategoricalTreatmentNotAllowed

Bases: ValueError

actableai.tasks.causal_inference.convert_categorical_to_numeric(df, columns)

actableai.tasks.classification module

class actableai.tasks.classification.AAIClassificationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.autogluon.AAIAutogluonTask

AAIClassificationTask class for classification

Parameters
AAIAutogluonTask – Base Class for every AutoGluon task
classmethod compute_problem_type(df: pandas.core.frame.DataFrame, target: str, num_class: Optional[int] = None) str

Determine the problem type (‘multiclass’ or ‘binary’), using the values in the target column

Parameters
  • df – The dataset for which the problem type is inferred
  • target – Name of the target column in df
  • num_class – The number of classes. If None, it will be computed using the target column in df
Returns

String representation of the problem type (‘multiclass’ or ‘binary’)

Return type

str
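
A rough sketch of how a problem type could be inferred from the target column follows. The rule used here (two distinct classes means ‘binary’, anything else means ‘multiclass’) is an assumption, and count_classes and problem_type are hypothetical helpers rather than the library's methods.

```python
def count_classes(target_values):
    """Number of distinct non-missing values in the target column."""
    return len({v for v in target_values if v is not None})


def problem_type(target_values, num_class=None):
    """Return 'binary' for two classes, 'multiclass' otherwise (assumed rule)."""
    if num_class is None:
        # Fall back to counting the classes present in the target column.
        num_class = count_classes(target_values)
    return "binary" if num_class == 2 else "multiclass"
```

For example, a target column holding only "yes" and "no" (plus missing values to be predicted) would be treated as binary, while a column with three or more labels would be treated as multiclass.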

classmethod get_hyperparameters_space(num_class: int, dataset_len: int, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False, name: str = 'classification_model_space', display_name: str = 'Classification Model Space', description: str = 'The space of available and default classification models and parameters.') actableai.parameters.models.OptionsSpace[Parameters]

Return the hyperparameters space of the task.

Parameters
  • num_class – Number of classes in the target column.
  • device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
  • explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
  • ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
  • tabpfn_enabled – Boolean indicating if TabPFN model should be used.
  • causal_inference – Boolean indicating if causal inference is being performed.
  • name – Name of the output model space.
  • display_name – Display name of the output model space.
  • description – Description of the output model space.
Returns

Hyperparameters space represented as a ModelSpace.

static get_num_class(df: pandas.core.frame.DataFrame, target: str) int

Determine the number of classes of the target.

Parameters
  • df – The dataset for which the problem type is inferred
  • target – Name of the target column in df
Returns

An integer representing the number of classes of the target column.

run(df: pandas.core.frame.DataFrame, target: str, features: Optional[List[str]] = None, biased_groups: Optional[List[str]] = None, debiased_features: Optional[List[str]] = None, validation_ratio: float = 0.2, positive_label: Optional[str] = None, explain_samples: bool = False, model_directory: Optional[str] = None, presets: str = 'medium_quality_faster_train', hyperparameters: Optional[Dict] = None, train_task_params: Optional[Dict] = None, kfolds: int = 1, cross_validation_max_concurrency: int = 1, residuals_hyperparameters: Optional[Dict] = None, drop_duplicates: bool = True, num_gpus: float = 0, eval_metric: str = 'accuracy', time_limit: Optional[int] = None, drop_unique: bool = True, drop_useless_features: bool = True, split_by_datetime: bool = False, datetime_column: Optional[str] = None, ag_automm_enabled=False, refit_full=False, feature_prune=True, feature_prune_time_limit: Optional[float] = None, intervention_run_params: Optional[Dict] = None, run_pdp: bool = True, run_ice: bool = True, pdp_ice_grid_resolution: Optional[int] = 100, pdp_ice_n_samples: Optional[int] = 100, tabpfn_model_directory: Optional[str] = None, num_trials: int = 1, infer_limit: float = 60, infer_limit_batch_size: int = 100) Dict

Run this classification task and return results.

Parameters
  • df – Input DataFrame
  • target – Target column in df. If there are empty values in this column, predictions will be generated for these rows.
  • features – A list of features to be used for prediction. If None, all columns except target are used as features. Defaults to None.
  • biased_groups – A list of columns of groups that should be protected from biases (e.g. gender, race, age). Defaults to None.
  • debiased_features – A list of proxy features that need to be debiased for protection of sensitive groups. Defaults to None.
  • validation_ratio – The ratio to randomly split data for training and validation. Defaults to 0.2.
  • positive_label – If target contains only 2 different values, pick the positive label by setting positive_label to one of them. Defaults to None.
  • explain_samples – If true, explanations for predictions in test and validation will be generated. It takes significantly longer time to run. Defaults to False.
  • model_directory – Directory to output the model after training. Defaults to None.
  • presets – Autogluon’s presets for training model. More details at https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit. Defaults to “medium_quality_faster_train”.
  • hyperparameters – Autogluon’s hyperparameters. Defaults to None.
  • train_task_params – ?. Defaults to None.
  • kfolds – Number of folds for cross-validation. Defaults to 1.
  • cross_validation_max_concurrency – Maximum number of Ray actors used for cross validation (each actor executes one split). Defaults to 1.
  • residuals_hyperparameters – Autogluon’s hyperparameters used in final model of counterfactual predictions. Defaults to None.
  • drop_duplicates – Whether duplicate values should be dropped before training. Defaults to True.
  • num_gpus – Number of gpus used for training. Defaults to 0.
  • eval_metric – Metric to be optimized for. Possible values include ‘accuracy’, ‘balanced_accuracy’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_weighted’, ‘roc_auc’, ‘roc_auc_ovo_macro’, ‘average_precision’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_weighted’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_weighted’, ‘log_loss’, ‘pac_score’. Defaults to “accuracy”.
  • time_limit – Time limit of training (in seconds)
  • drop_unique – Whether to drop columns with only unique values as preprocessing step.
  • drop_useless_features – Whether to drop columns with only unique values at fit time.
  • split_by_datetime – Whether the training/validation has to be split based on a datetime column.
  • datetime_column – If split_by_datetime, the column that will split training and validation, else, the parameter is ignored.
  • ag_automm_enabled – Whether to use the autogluon multimodal model on text columns. This feature makes text classification considerably more accurate by using text models, but it is resource-intensive and requires a GPU.
  • refit_full – Whether a second task is launched at the end of classification to refit a new model on the whole dataset. This improves accuracy but splits the training time in half (half for the first task, half for refitting).
  • feature_prune – Whether feature pruning is enabled. This option improves results but extends the training time. If no time is left for feature pruning after training, this step is skipped.
  • intervention_run_params – Parameters for running an intervention task. Check actableai/tasks/intervention.py for more details.
  • run_pdp – Run Partial Dependence to get the Partial Dependence Plot (PDP)
  • run_ice – Run Individual Conditional Expectation (ICE)
  • pdp_ice_grid_resolution – Maximum resolution of the grid to use for computation of the PDP and/or ICE
  • pdp_ice_n_samples – The number of rows to sample in df_train. If None, no sampling is performed.
  • tabpfn_model_directory – TabPFN Model Directory.
  • num_trials – The number of trials for hyperparameter optimization
  • infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or 20 rows / second throughput.
  • infer_limit_batch_size – The number of rows passed at once to be predicted when calculating per-row speed. This is very important because infer_limit_batch_size=1 (online-inference) is highly suboptimal as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.
Raises

Exception – If the target has fewer than 2 unique values.

Examples

>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIClassificationTask().run(
...     df, target="target", features=["feature1", "feature2", "feature3"]
... )
Returns
Dictionary containing the results
  • ”status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • ”messenger”: Message returned with the task
  • ”validations”: List of validations on the data.
    non-empty if the data presents a problem for the task
  • ”runtime”: Execution time of the task
  • ”data”: Dictionary containing the data for the task
    • ”validation_table”: Validation table
    • ”prediction_table”: Prediction table
    • ”fields”: Column names of the prediction table
    • ”predictData”: Prediction Table
    • ”predict_shaps”: Shapley values for prediction table
    • ”validation_shaps”: Shapley values for validation table
    • ”exdata”: Validation Table
    • ”evaluate”: Evaluation metrics on validation set
    • ”importantFeatures”: Feature importance on validation set
    • ”debiasing_charts”: If debiasing enabled, debiasing data to create charts
    • ”leaderboard”: Leaderboard of the best model on validation
  • ”model”: AAIModel to redeploy the model
Return type
Dict
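
The result envelope described above can be checked before its payload is read. The following is a minimal sketch that relies only on the documented keys ("status", "messenger", "validations", "data"). The helper name unwrap_result and the sample dictionary are hypothetical and are not part of actableai.

```python
from typing import Any, Dict


def unwrap_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return result["data"], or raise if the task reported a failure.

    Hypothetical helper: it relies only on the keys documented above.
    """
    if result.get("status") != "SUCCESS":
        # "validations" is documented as non-empty when the data is a problem.
        raise RuntimeError(
            f"task failed: {result.get('messenger')!r}, "
            f"validations={result.get('validations', [])!r}"
        )
    return result.get("data", {})


# Hand-written stand-in for a task result (not real output).
sample = {
    "status": "SUCCESS",
    "messenger": "",
    "validations": [],
    "runtime": 1.5,
    "data": {"fields": ["feature1", "target"]},
}
data = unwrap_result(sample)
```

The same check applies to the other tasks in this package that return "status" and "validations" keys.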

actableai.tasks.clustering module

class actableai.tasks.clustering.AAIClusteringTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

Clustering Task

Parameters
AAITask – Base class for every task
classmethod get_parameters() actableai.parameters.parameters.Parameters
run(df: pandas.core.frame.DataFrame, features: Optional[List[str]] = None, num_clusters: int = 2, drop_low_info: bool = False, explain_samples: bool = False, cluster_explain_max_depth=20, cluster_explain_min_impurity_decrease=0.001, cluster_explain_min_samples_leaf=0.001, cluster_explain_min_precision=0.8, max_train_samples: Optional[int] = None, parameters: Dict[str, Any] = None) Dict

Runs a clustering analysis on df

Parameters
  • df – Input DataFrame
  • features – Features used in Input DataFrame. Defaults to None.
  • num_clusters – Number of different clusters assignable to each row.
  • drop_low_info – Whether the algorithm drops columns with only one unique value or only different categorical values across all rows.
  • explain_samples – Whether the result contains a human-readable explanation of the clustering.
  • init – Initialization for weights of the DEC model.
  • pretrain_optimizer – Optimizer for the pretraining phase of the autoencoder.
  • update_interval – The interval to check the stopping criterion and update the cluster centers.
  • pretrain_epochs – Number of epochs for pretraining DEC.
  • alpha_k – The factor to control the penalty term of the number of clusters.
  • max_train_samples – Number of randomly selected rows to train the DEC.

Examples

>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIClusteringTask().run(
...     df,
...     ["feature1", "feature2", "feature3"]
... )
>>> result
Returns
Dictionary containing the result
  • ”status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • ”messenger”: Message returned with the task
  • ”data”: Dictionary containing the data for the clustering task
    • ”cluster_id”: ID of the generated cluster
    • ”explanation”: Explanation for the points for this cluster
    • ”encoded_value”: Encoded value for centroid for this cluster
    • ”projected_value”: Projected centroid for this cluster
    • ”projected_nearest_point”: Nearest point for the centroid
  • ”data_v2”: Updated dictionary containing the data for the clustering task
    • ”clusters”: Same dictionary as data
    • ”shap_values”: Shapley values for clustering
  • ”runtime”: Time taken to run the task
  • ”validations”: List of validations on the data,
    non-empty if the data presents a problem for the task
Return type
Dict

actableai.tasks.correlation module

class actableai.tasks.correlation.AAICorrelationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

Correlation Task

Parameters
AAITask – Base Class for tasks
run(df: pandas.core.frame.DataFrame, target_column: str, target_value: Optional[str] = None, kde_steps: int = 100, lr_steps: int = 100, control_columns: Optional[List[str]] = None, control_values: Optional[List[str]] = None, correlation_threshold: float = 0.05, p_value: float = 0.05, use_bonferroni: bool = False, top_k: int = 20) Dict

Runs a correlation analysis on Input DataFrame

Parameters
  • df – Input DataFrame
  • target_column – Target for correlation analysis
  • target_value – If target_column type is categorical, target_value must be one value of target_column. Else should be None. Defaults to None.
  • kde_steps – Number of steps for kernel density graph. Defaults to 100.
  • lr_steps – Number of steps for linear regression graph. Defaults to 100.
  • control_columns – Control columns for decorrelations. Defaults to None.
  • control_values – Control values for decorrelations. control_values[i] must be a value from df[control_columns[i]]. Defaults to None.
  • correlation_threshold – Threshold for correlation validation. Values with an absolute value above this threshold are considered correlated. Defaults to 0.05.
  • p_value – p-value threshold for correlation validation. Defaults to 0.05.
  • use_bonferroni (bool, optional) – Whether to apply the Bonferroni correction. Defaults to False.
  • top_k – Limit for number of results returned. Only the best k correlated columns are returned. Defaults to 20.

Examples

>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAICorrelationTask().run(
...     df,
...     "target"
... )
Returns
Dictionary containing the results
  • ”status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • ”messenger”: Message returned with the task
  • ”data”: Dictionary containing the data for the correlation task
    • ”corrs”: Correlation values for each feature
    • ”charts”: Dictionary containing the charts for correlations
  • ”runtime”: Time taken to run the task
  • ”validations”: List of validations on the data,
    non-empty if the data presents a problem for the task
Return type
Dict
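
To illustrate how correlation_threshold, p_value, use_bonferroni and top_k could interact, here is a minimal sketch. It assumes the standard Bonferroni correction (dividing the significance level by the number of tests). Whether AAICorrelationTask applies the correction exactly this way is not stated here, and the function name and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def select_correlated(
    stats: Dict[str, Tuple[float, float]],
    correlation_threshold: float = 0.05,
    p_value: float = 0.05,
    use_bonferroni: bool = False,
    top_k: int = 20,
) -> List[str]:
    """Keep columns whose |correlation| and p-value pass the thresholds.

    stats maps a column name to (correlation, p-value).
    """
    alpha = p_value
    if use_bonferroni and stats:
        # Standard Bonferroni correction: divide alpha by the number of tests.
        alpha = p_value / len(stats)
    kept = [
        (name, corr)
        for name, (corr, p) in stats.items()
        if abs(corr) > correlation_threshold and p < alpha
    ]
    # Strongest absolute correlation first, limited to top_k columns.
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in kept[:top_k]]


# Made-up numbers for illustration only.
stats = {"age": (0.60, 0.001), "height": (-0.30, 0.03), "noise": (0.01, 0.90)}
plain = select_correlated(stats)
corrected = select_correlated(stats, use_bonferroni=True)
```

With three tests, the corrected significance level is 0.05 / 3, so a column with a p-value of 0.03 passes the plain check but not the corrected one.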

actableai.tasks.data_imputation module

class actableai.tasks.data_imputation.AAIDataImputationTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

run(df, rules: Tuple[str, str] = ('', ''), impute_nulls: bool = True, override_column_types: Dict = {}) Dict

Impute the DataFrame df

Parameters
  • df – DataFrame to impute
  • rules – Set of rules for imputation. Defaults to (“”, “”).
  • impute_nulls – Whether null, None and nan values should be imputed. Defaults to True.
  • override_column_types – Columns overridden by a special type. Defaults to {}.

Examples

>>> df = pd.read_csv("path/to/dataframe")
>>> result = AAIDataImputationTask().run(df)
>>> result
Returns
Dictionary of results
  • ”status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • ”messenger”: Message returned with the task
  • ”validations”: List of validations on the data.
    non-empty if the data presents a problem for the task
  • ”runtime”: Execution time of the task
  • ”data”: Dictionary containing the data for the task
    • ”columns”: Columns of the new table
    • ”records”: Records containing the new values after imputation
Return type
Dict
actableai.tasks.data_imputation.construct_rules(data)

actableai.tasks.direct_causal module

class actableai.tasks.direct_causal.AAIDirectCausalFeatureSelection(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

Search for direct causal features with DML.

run(df, target, features, max_concurrent_ci_tasks=4, positive_outcome_value=None, causal_inference_task_params=None, causal_inference_run_params=None)

This function performs causal feature selection on a given dataset.

Parameters
  • self (object) – The instance of the class.
  • df (pandas.DataFrame) – The dataframe containing the data.
  • target (str) – The target feature for which the causal inference is to be performed.
  • features (list) – List of features for which the causal inference is to be performed.
  • max_concurrent_ci_tasks (int, optional) – Maximum number of concurrent causal inference tasks. Defaults to 4.
  • dummy_prefix_sep (str, optional) – Prefix separator to be used while creating dummy variables. Defaults to “:::”.
  • positive_outcome_value (str, optional) – Positive outcome value.
  • causal_inference_task_params (dict, optional) – Causal inference task parameters. Defaults to None.
  • causal_inference_run_params (dict, optional) – Causal inference run parameters. Defaults to None.
Returns

A dictionary containing the status, data and validations of the function.

Return type

dict

actableai.tasks.forecast module

class actableai.tasks.forecast.AAIForecastTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITunableTask

Forecast (time series) Task

static get_hyperparameters_space(dataset_len: int) actableai.parameters.models.OptionsSpace[Parameters]

Return the hyperparameters space of the task.

Parameters
dataset_len – Length of the dataset (shape[0]).
Returns
Hyperparameters space represented as a ModelSpace.
run(df: pandas.core.frame.DataFrame, prediction_length: int, date_column: Optional[str] = None, predicted_columns: Optional[List[str]] = None, group_by: Optional[List[str]] = None, feature_columns: Optional[List[str]] = None, ray_tune_kwargs: Optional[Dict] = None, max_concurrent: int = 3, trials: int = 1, use_ray: bool = True, tune_samples: int = 20, refit_full: bool = True, verbose: int = 3, seed: int = 123, sampling_method: str = 'random', tuning_metric: str = 'mean_wQuantileLoss', seasonal_periods: Optional[List[int]] = None, hyperparameters: Dict = None) Dict[str, Any]

Run time series forecasting task and return results.

Parameters
  • df – Input DataFrame.
  • prediction_length – Length of the prediction to forecast.
  • date_column – Column containing the date/datetime/time component of the time series.
  • predicted_columns – List of columns to forecast, if None all the columns will be selected.
  • group_by – List of columns to use to separate different time series/groups. This list is used by the groupby function of the pandas library.
  • feature_columns – List of columns containing extraneous features used to forecast. If one or more feature columns contain dynamic features (features that change over time) the dataset must contain prediction_length features data points in the future.
  • ray_tune_kwargs – Named parameters to pass to ray’s tune function.
  • max_concurrent – Maximum number of concurrent Ray tasks.
  • trials – Number of trials for hyperparameter search.
  • use_ray – If True ray will be used for hyperparameter tuning.
  • tune_samples – Number of dataset samples to use when tuning.
  • refit_full – If True the final model will be fitted using all the data (including the validation set).
  • verbose – Verbose level.
  • seed – Random seed to use.
  • sampling_method – Method used when extracting the samples for the tuning [“random”, “last”].
  • tuning_metric – Metric to minimize when tuning.
  • seasonal_periods – List of seasonal periods (seasonality).
  • hyperparameters – Dictionary representing the hyperparameters to run the tuning search on.
Returns

Dictionary containing the results.

Return type

Dict
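
The group_by parameter separates the input into one time series per group. The sketch below shows that idea with the standard library only. It illustrates the grouping concept and is not the task's implementation, which, per the parameter description above, relies on the pandas groupby function. The row layout and column names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def split_series(
    rows: List[Dict[str, Any]], group_by: List[str], date_column: str
) -> Dict[Tuple[Any, ...], List[Dict[str, Any]]]:
    """Group rows by the group_by columns and sort each group by date."""
    groups: Dict[Tuple[Any, ...], List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # One key per distinct combination of group_by values.
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)
    for series in groups.values():
        # ISO-formatted date strings sort chronologically.
        series.sort(key=lambda row: row[date_column])
    return dict(groups)


rows = [
    {"store": "A", "date": "2023-01-02", "sales": 12},
    {"store": "B", "date": "2023-01-01", "sales": 7},
    {"store": "A", "date": "2023-01-01", "sales": 10},
]
series = split_series(rows, group_by=["store"], date_column="date")
```

Each key of series then identifies one time series that would be forecast separately.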

actableai.tasks.intervention module

class actableai.tasks.intervention.AAIInterventionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

run(df: pandas.core.frame.DataFrame, target: str, current_intervention_column: str, new_intervention_column: Optional[str] = None, expected_target: Optional[str] = None, common_causes: Optional[List[str]] = None, causal_cv: Optional[int] = None, causal_hyperparameters: Optional[Dict] = None, cate_alpha: Optional[float] = None, presets: Optional[str] = None, model_directory: Optional[str] = None, num_gpus: Optional[int] = 0, feature_importance: bool = True, drop_unique: bool = True, drop_useless_features: bool = True, only_fit: bool = False, tabpfn_model_directory: Optional[str] = None, cross_validation_hyperparameters: Optional[Dict] = None) Dict

Run this intervention task and return the results.

Parameters
  • df – Input DataFrame
  • target – Column name of target variable
  • current_intervention_column – Column name of the current intervention
  • new_intervention_column – Column name of the new intervention
  • common_causes – List of common causes to be used for the intervention
  • causal_cv – Number of folds for causal cross validation
  • causal_hyperparameters – Hyperparameters for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html?highlight=tabularpredictor#autogluon.tabular.TabularPredictor
  • cate_alpha – Alpha for intervention effect. Ignored if df[target] is categorical
  • presets – Presets for AutoGluon. See https://auto.gluon.ai/stable/api/autogluon.task.html?highlight=tabularpredictor#autogluon.tabular.TabularPredictor
  • model_directory – Model directory
  • num_gpus – Number of GPUs used by causal models
  • drop_unique – Whether the classification algorithm drops columns that only have a unique value across all rows at fit time
  • drop_useless_features – Whether the classification algorithm drops columns that only have a unique value across all rows at preprocessing time
  • tabpfn_model_directory – TabPFN Model Directory.
  • cross_validation_hyperparameters – Hyperparameters when running cross validation

Examples

>>> import pandas as pd
>>> from actableai.tasks.intervention import AAIInterventionTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAIInterventionTask().run(
...     df,
...     'target_column',
...     'current_intervention_column',
... )
Returns
Dictionary containing the following keys:
  • status: Status of the task
  • messenger: Message of the task
  • validations: Validations for the tasks parameters
  • data: Dictionary containing the following keys:
    • df: DataFrame with the intervention
    • causal_graph_dot: Causal graph in dot format
    • T_res: Residuals of the treatment
    • Y_res: Residuals of the outcome
    • X: Common causes
    • model_t_scores: Model scores for the treatment
    • model_y_scores: Model scores for the outcome
    • intervention_plot: Data for plotting the intervention
  • runtime: Runtime of the task
Return type
Dict

actableai.tasks.ocr module

class actableai.tasks.ocr.AAIOCRTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

classmethod get_parameters() actableai.parameters.parameters.Parameters
run(images: Iterable[PIL.Image.Image], parameters: Optional[Dict[str, Any]] = None)

Abstract method called to run the task

actableai.tasks.regression module

class actableai.tasks.regression.AAIRegressionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.autogluon.AAIAutogluonTask

Regression task.

static compute_problem_type(prediction_quantiles: Optional[List[int]]) str

Determine the problem type (‘regression’ or ‘quantile’)

Parameters
prediction_quantiles – List of quantiles. (in percentage)
Returns
String representation of the problem type: ‘regression’ or ‘quantile’
Return type
problem_type
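
The implementation of this method is not shown here. As a sketch only, one plausible reading of the description is given below. The rule (quantile when quantiles are given, regression otherwise) is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional


def sketch_compute_problem_type(prediction_quantiles: Optional[List[int]]) -> str:
    """Return 'quantile' when quantiles are requested, else 'regression'.

    Assumption: a missing (None) list means plain regression.
    """
    if prediction_quantiles is None:
        return "regression"
    return "quantile"


plain_type = sketch_compute_problem_type(None)
quantile_type = sketch_compute_problem_type([5, 50, 95])
```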
classmethod get_hyperparameters_space(dataset_len: int, prediction_quantiles: Optional[List[float]] = None, device: str = 'cpu', explain_samples: bool = False, ag_automm_enabled: bool = False, tabpfn_enabled: bool = False, causal_inference: bool = False, name: str = 'regression_model_space', display_name: str = 'Regression Model Space', description: str = 'The space of available and default regression models and parameters.') actableai.parameters.models.OptionsSpace[Parameters]

Return the hyperparameters space of the task.

Parameters
  • dataset_len – Length of the dataset.
  • prediction_quantiles – List of quantiles (for regression task only), as a percentage
  • device – Which device is being used, can be one of ‘cpu’ or ‘gpu’.
  • explain_samples – Boolean indicating if explanations for predictions in test and validation will be generated.
  • ag_automm_enabled – Boolean indicating if AG_AUTOMM model should be used.
  • tabpfn_enabled – Boolean indicating if TabPFN model should be used.
  • causal_inference – Boolean indicating if causal inference is being performed.
  • name – Name of the output model space.
  • display_name – Display name of the output model space.
  • description – Description of the output model space.
Returns

Hyperparameters space represented as a ModelSpace.

run(df: pandas.core.frame.DataFrame, target: str, features: Optional[List[str]] = None, biased_groups: Optional[List[str]] = None, debiased_features: Optional[List[str]] = None, eval_metric: str = 'r2', validation_ratio: float = 0.2, prediction_quantiles: Optional[List[float]] = None, explain_samples: bool = False, model_directory: Optional[str] = None, presets: str = 'medium_quality_faster_train', hyperparameters: Optional[dict] = None, train_task_params: Optional[dict] = None, kfolds: int = 1, cross_validation_max_concurrency: int = 1, residuals_hyperparameters: Optional[dict] = None, drop_duplicates: bool = True, return_residuals: bool = False, kde_steps: int = 10, num_gpus: Union[int, str] = 0, time_limit: Optional[int] = None, drop_unique: bool = True, drop_useless_features: bool = True, split_by_datetime: bool = False, datetime_column: Optional[str] = None, ag_automm_enabled: bool = False, refit_full: bool = False, feature_prune: bool = True, feature_prune_time_limit: Optional[float] = None, intervention_run_params: Optional[Dict] = None, causal_feature_selection: bool = False, causal_feature_selection_max_concurrent_tasks: int = 20, ci_for_causal_feature_selection_task_params: Optional[dict] = None, ci_for_causal_feature_selection_run_params: Optional[dict] = None, run_pdp: bool = True, run_ice: bool = True, pdp_ice_grid_resolution: Optional[int] = 100, pdp_ice_n_samples: Optional[int] = 100, num_trials: int = 1, infer_limit: float = 60, infer_limit_batch_size: int = 100) Dict[str, Any]

Run this regression task and return results.

Parameters
  • df – Input data frame
  • target – Target column in df. If there are empty values in this column, predictions will be generated for these rows.
  • features – A list of features to be used for prediction. If None, all columns except target are used as features
  • biased_groups – A list of columns of groups that should be protected from biases (e.g. gender, race, age)
  • debiased_features – A list of proxy features that need to be debiased for protection of sensitive groups
  • eval_metric – Metric to be optimized during training. Possible values include ‘root_mean_squared_error’, ‘mean_squared_error’, ‘mean_absolute_error’, ‘median_absolute_error’, ‘r2’
  • validation_ratio – The ratio to randomly split data for training and validation
  • prediction_quantiles – List of quantiles. (in percentage)
  • explain_samples – If true, explanations for predictions in test and validation will be generated. It takes significantly longer time to run.
  • model_directory – Destination to store trained model. If not set, a temporary folder will be created
  • presets – Autogluon’s presets for training model. See https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit.
  • hyperparameters – Autogluon’s hyperparameters for training model. See https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit.
  • train_task_params – Parameters for _AAITrainTask constructor.
  • kfolds – Number of folds for cross validation. If 1, train test split is used instead.
  • cross_validation_max_concurrency – Maximum number of Ray actors used for cross validation (each actor executes one split)
  • residuals_hyperparameters – Autogluon’s hyperparameters used in final model of counterfactual predictions
  • drop_duplicates – Whether duplicate values should be dropped before training.
  • return_residuals – Whether residual values should be returned in counterfactual prediction
  • kde_steps – Steps used to generate KDE plots with debiasing
  • num_gpus – Number of GPUs used in nuisance models in counterfactual prediction
  • time_limit – Time limit (in seconds) of training. None means no time limit
  • drop_unique – Whether to drop columns with only unique values as a preprocessing step
  • drop_useless_features – Whether to drop columns with only unique values at fit time
  • split_by_datetime – Whether train/validation sets are split using datetime. Training uses the earliest data and validation the latest.
  • datetime_column – The specified datetime column if split_by_datetime is enabled
  • ag_automm_enabled – Whether to use autogluon multimodal model on text columns.
  • refit_full – Whether the model is refitted on the full dataset (including the validation set) at the end of the task. Training time is divided by 2 to leave the other half of the time for refitting
  • feature_prune – Whether feature pruning is enabled. Can increase accuracy by removing harmful features for the model (features that are detrimental to the performance). If no training time left, this step is skipped
  • feature_prune_time_limit – Time limit for feature pruning.
  • intervention_run_params – Parameters for running an intervention task. Check actableai/tasks/intervention.py for more details.
  • causal_feature_selection – If True, search for direct causal features and use only these features for the prediction
  • causal_feature_selection_max_concurrent_tasks – Maximum number of concurrent tasks for selecting causal features
  • ci_for_causal_feature_selection_task_params – Parameters for AAIDirectCausalFeatureSelection
  • ci_for_causal_feature_selection_run_params – Kwargs for AAIDirectCausalFeatureSelection’s run
  • run_pdp – Run Partial Dependence to get the Partial Dependence Plot (PDP)
  • run_ice – Run Individual Conditional Expectation (ICE)
  • pdp_ice_grid_resolution – Maximum resolution of the grid to use for computation of the PDP and/or ICE
  • pdp_ice_n_samples – The number of rows to sample in df_train. If None, no sampling is performed.
  • num_trials – The number of trials for hyperparameter optimization
  • infer_limit – The time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or 20 rows / second throughput.
  • infer_limit_batch_size – The number of rows passed at once to be predicted when calculating per-row speed. This is very important because infer_limit_batch_size=1 (online-inference) is highly suboptimal as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000. Must be an integer greater than 0.

Examples

>>> import pandas as pd
>>> from actableai.tasks.regression import AAIRegressionTask
>>> df = pd.read_csv("path/to/csv")
>>> result = AAIRegressionTask().run(
...     df,
...     'target_column',
... )
Returns
Dictionary containing the results for this task
  • ”status”: “SUCCESS” if the task successfully ran else “FAILURE”
  • ”messenger”: Message returned with the task
  • ”validations”: List of validations on the data.
    non-empty if the data presents a problem for the task
  • ”runtime”: Execution time of the task
  • ”data”: Dictionary containing the data for the task
    • ”validation_table”: Validation table
    • ”prediction_table”: Prediction table
    • ”predict_shaps”: Shapley values for prediction table
    • ”evaluate”: Evaluation metrics for the task
    • ”validation_shaps”: Shapley values for the validation table
    • ”importantFeatures”: Feature importance for the validation table
    • ”debiasing_charts”: If debiasing enabled, charts to display debiasing
    • ”leaderboard”: Leaderboard of the best trained models
  • ”model”: AAIModel to redeploy the model
Return type
Dict
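
The relationship between infer_limit and throughput described in the parameters above is simple arithmetic, sketched here. The helper names are hypothetical, and reading the batch budget as infer_limit multiplied by infer_limit_batch_size is an assumption based on the per-row definition of infer_limit.

```python
def rows_per_second(infer_limit: float) -> float:
    """Throughput implied by a per-row time budget given in seconds."""
    if infer_limit <= 0:
        raise ValueError("infer_limit must be positive")
    return 1.0 / infer_limit


def batch_budget_seconds(infer_limit: float, infer_limit_batch_size: int) -> float:
    """Total time allowed to predict one batch under the per-row budget."""
    if infer_limit_batch_size < 1:
        raise ValueError("infer_limit_batch_size must be an integer greater than 0")
    return infer_limit * infer_limit_batch_size


# 50 ms per row, as in the infer_limit description above.
throughput = rows_per_second(0.05)
# Time budget for one bulk batch of 10000 rows.
budget = batch_budget_seconds(0.05, 10000)
```

Because fixed overheads are spread over the whole batch, a larger infer_limit_batch_size makes the same per-row limit easier to meet.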

actableai.tasks.sentiment_analysis module

class actableai.tasks.sentiment_analysis.AAISentimentAnalysisTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

Sentiment Analysis Task

run(df: pandas.core.frame.DataFrame, target: str, batch_size: int = 32, rake_threshold=1.0) Dict

Run a sentiment analysis on Input DataFrame

Parameters
  • df – Input DataFrame
  • target – Target for sentiment analysis
  • batch_size – Batch Size. Defaults to 32.
  • rake_threshold – Threshold for Rake scores used to extract keywords. Defaults to 1.0.

Examples

>>> df = pd.read_csv("path/to/dataframe")
>>> AAISentimentAnalysisTask().run(df, "target")
Returns
Dictionary of results
Return type
Dict

actableai.tasks.text_extraction module

class actableai.tasks.text_extraction.AAITextExtractionTask(use_ray: bool = False, ray_params: Optional[dict] = None, optimize_memory_allocation: bool = False, collect_memory_usage: bool = False, optimize_memory_allocation_nrmse_threshold: float = 0.2, max_memory_offset: float = 0.1, optimize_gpu_memory_allocation: bool = False, collect_gpu_memory_usage: bool = False, optimize_gpu_memory_allocation_nrmse_threshold: float = 0.2, max_gpu_memory_offset: float = 0.1, resources_predictors_actor: Optional[ray.actor.ActorHandle] = None, cpu_percent_interval: float = 1.0, return_model: bool = True, upload_model: bool = False, s3_models_bucket: Optional[str] = None, s3_models_prefix: Optional[str] = None, seed=None)

Bases: actableai.tasks.base.AAITask

classmethod get_parameters() actableai.parameters.parameters.Parameters
run(df: pandas.core.frame.DataFrame, document_name_column: str, text_column: str, default_openai_api_key: str, openai_rate_limit_per_minute: float = None, parameters: Optional[Dict[str, Any]] = None) Dict[str, Any]

Abstract method called to run the task

Module contents

class actableai.tasks.TaskType(value)

Bases: str, enum.Enum

Enum representing the different tasks available

ASSOCIATION_RULES = 'association_rules'
BAYESIAN_REGRESSION = 'bayesian_regression'
CAUSAL_DISCOVERY = 'causal_discovery'
CAUSAL_INFERENCE = 'causal_inference'
CLASSIFICATION = 'classification'
CLASSIFICATION_TRAIN = 'classification_train'
CLUSTERING = 'clustering'
CORRELATION = 'correlation'
DATA_IMPUTATION = 'data_imputation'
DEC_ANCHOR_CLUSTERING = 'dec_anchor_clustering'
DIRECT_CAUSAL_FEATURE_SELECTION = 'direct_causal_feature_selection'
FORECAST = 'forecast'
INTERVENTION = 'intervention'
OCR = 'ocr'
REGRESSION = 'regression'
REGRESSION_TRAIN = 'regression_train'
SENTIMENT_ANALYSIS = 'sentiment_analysis'
TEXT_EXTRACTION = 'text_extraction'
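
Because TaskType derives from both str and enum.Enum, its members behave like plain strings. The sketch below demonstrates this with a hypothetical stand-in, DemoTaskType, that mirrors three of the members listed above, so it runs without actableai installed.

```python
from enum import Enum


class DemoTaskType(str, Enum):
    """Hypothetical stand-in that mirrors a few TaskType members."""

    CLASSIFICATION = "classification"
    FORECAST = "forecast"
    REGRESSION = "regression"


# Look up a member from its string value.
task = DemoTaskType("forecast")

# The str mixin makes members compare equal to plain strings.
is_forecast = task == "forecast"

# .value gives the raw string, for example for serialization.
raw = task.value
```

Looking up a value that is not defined, such as DemoTaskType("unknown"), raises ValueError.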