actableai.utils package

Submodules

actableai.utils.autogluon module

TODO write documentation

actableai.utils.autogluon.get_final_features(predictor, model_name)

TODO write documentation

actableai.utils.autogluon.transform_features(predictor, model_name, data)

TODO write documentation

actableai.utils.categorical_numerical_convert module

actableai.utils.categorical_numerical_convert.convert_categorical_to_num(df, inplace=False)

Convert categorical features in a dataframe to numerical values.

Parameters
  • df (pandas DataFrame) – The dataframe containing the categorical features.
  • inplace (bool, optional) – Whether to perform modifications to df in-place.

Returns
  • df (pandas DataFrame) – The modified DataFrame with categorical features converted to numerical values.
  • dict_label_encoders (dict) – A dictionary containing the fitted LabelEncoder object for each converted column (the categorical features).
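
A minimal usage sketch (column names are illustrative):

    import pandas as pd

    from actableai.utils.categorical_numerical_convert import convert_categorical_to_num

    # Hypothetical toy data with one categorical column.
    df = pd.DataFrame({"color": ["red", "blue", "red"], "price": [1.0, 2.0, 3.0]})

    # Encode the categorical columns as integers and keep the fitted LabelEncoders.
    df_num, label_encoders = convert_categorical_to_num(df)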

actableai.utils.categorical_numerical_convert.get_categorical_columns(df)
actableai.utils.categorical_numerical_convert.inverse_convert_categorical_to_num(df_new, d, feat_name=None)

Convert numerical values back to their original categorical values.

This function takes a DataFrame and a dictionary of fitted label encoders for each column, and converts the numerical values in the DataFrame back to their original categorical values. It can be used to reverse the effect of the convert_categorical_to_num function.

Parameters
  • df_new (pandas DataFrame) – The DataFrame containing the numerical values to be converted back to categorical.
  • d (dict) – A dictionary containing the fitted LabelEncoder object for each column.
  • feat_name (str, optional) – The name of a specific feature to convert. If None, all features in the DataFrame will be converted.

Returns
df_new (pandas DataFrame) – The modified DataFrame with numerical values converted back to categorical values.

Raises
ValueError – If the feature name (column) is not in the DataFrame.
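
Continuing the sketch above (assuming df_num and label_encoders from convert_categorical_to_num):

    from actableai.utils.categorical_numerical_convert import inverse_convert_categorical_to_num

    # Map the encoded integers back to their original categorical values;
    # pass feat_name="color" (hypothetical column name) to restrict the inverse to one column.
    df_back = inverse_convert_categorical_to_num(df_num, label_encoders)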

actableai.utils.dataset_generator module

class actableai.utils.dataset_generator.DatasetGenerator

Bases: object

classmethod generate(columns_parameters: List[dict], rows: int = 1000, output_path: Optional[Union[str, pathlib.Path]] = None, save_parameters_path: Optional[Union[str, pathlib.Path]] = None, random_state: Optional[int] = None) Optional[pandas.core.frame.DataFrame]

Generate a dataset. This function generates random data; the values carry no meaning.

Parameters
  • columns_parameters (List of parameters for columns, see below) –
  • rows (Number of rows to generate) –
  • output_path (Path where the dataset will be saved, if None the function returns a DataFrame) –
  • save_parameters_path (If not None will save the parameters used to generate this dataset to this path) –
  • random_state (Seed for the random generator, used to fix the result, default: None) –
Column parameters structure:

Common parameters:
{
    "name": <column_name>,    # Name of the column, default: col_<index>
    "type": <type>,           # Type of the column, choices: ["text", "number", "date"]
    "values": [<value_list>]  # List of values. If len(values) == rows those values are used
                              # in the given order; otherwise each row picks a random value
                              # from values. It always overrides other parameters.
                              # If values is set, type can be omitted.
}

Text column parameters:
{
    "type": "text",
    "n_categories": <n_categories>,           # Number of categories (number of unique strings), default: rows
    "range": (<min_range>, <max_range>),      # Range for generated word lengths, min included, max excluded, default: (5, 10)
    "word_range": (<min_range>, <max_range>)  # Range for the number of words to create, min included, max excluded, default: (1, 2)
}

Number column parameters:
{
    "type": "number",
    "float": <True or False>,             # default: True
    "range": (<min_range>, <max_range>)   # Min included, max excluded, default: (0, <rows>)
}

Date column parameters:
{
    "type": "date",
    "freq": <frequency>,  # See pandas date_range freq parameter, default: "D"
    "start": <date>,      # Start date, default: None
    "end": <date>         # End date, default: Today - <random_number_of_days>
    # At least one of these three parameters (freq, start, and end) must be None
}

Examples

Column parameters examples:

Text column containing yes or no:
{
    "values": ["yes", "no"]
}

Text column containing unique random strings of length 10:
{
    "type": "text",
    "range": (10, 11)
}

Number column with a random float between 0 and 1:
{
    "type": "number",
    "float": True,
    "range": (0, 1)
}

Number column containing either 10, 100 or 1000:
{
    "values": [10, 100, 1000]
}

Return type
Pandas DataFrame containing the generated dataset
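
Example: a minimal usage sketch based on the column parameter structures above (column names and values are illustrative):

    from actableai.utils.dataset_generator import DatasetGenerator

    # Hypothetical column parameters: a yes/no text column, a float column in [0, 1),
    # and a daily date column.
    columns_parameters = [
        {"name": "answer", "values": ["yes", "no"]},
        {"name": "score", "type": "number", "float": True, "range": (0, 1)},
        {"name": "date", "type": "date", "freq": "D"},
    ]

    # With output_path=None the generated DataFrame is returned directly.
    df = DatasetGenerator.generate(columns_parameters, rows=100, random_state=42)
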
classmethod generate_from_file(parameters_path: Union[str, pathlib.Path], output_path: Optional[Union[str, pathlib.Path]] = None) Optional[pandas.core.frame.DataFrame]

Generate dataset from a file containing the parameters

Parameters
  • parameters_path (The path to the parameters json file) –
  • output_path (Path where the dataset will be saved, if None the function returns a DataFrame) –
Return type
Pandas DataFrame containing the generated dataset
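
Example: a sketch of regenerating a dataset from saved parameters (file names and formats are illustrative; it assumes generate() was previously called with save_parameters_path):

    from actableai.utils.dataset_generator import DatasetGenerator

    # Save the generation parameters when first generating the dataset
    # (reusing columns_parameters from the previous example)...
    DatasetGenerator.generate(
        columns_parameters,
        rows=100,
        output_path="dataset.csv",
        save_parameters_path="parameters.json",
    )

    # ...then rebuild a dataset later from the saved parameters file.
    df = DatasetGenerator.generate_from_file("parameters.json")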

actableai.utils.dowhy module

actableai.utils.dowhy.causal_model_to_dot(causal_model)

actableai.utils.language module

actableai.utils.language.get_language_display_name(langcode: str) str

actableai.utils.multilabel_predictor module

class actableai.utils.multilabel_predictor.MultilabelPredictor(labels, path, problem_types=None, eval_metrics=None, consider_labels_correlation=True, **kwargs)

Bases: object

Tabular Predictor for predicting multiple columns in a table. Creates multiple TabularPredictor objects which you can also use individually. You can access the TabularPredictor for a particular label via: multilabel_predictor.get_predictor(label_i)

Parameters
  • labels (List[str]) – The ith element of this list is the column (i.e. label) predicted by the ith TabularPredictor stored in this object.
  • path (str) – Path to directory where models and intermediate outputs should be saved. If unspecified, a time-stamped folder called “AutogluonModels/ag-[TIMESTAMP]” will be created in the working directory to store all models. Note: To call fit() twice and save all results of each fit, you must specify different path locations or don’t specify path at all. Otherwise files from first fit() will be overwritten by second fit(). Caution: when predicting many labels, this directory may grow large as it needs to store many TabularPredictors.
  • problem_types (List[str]) – The ith element is the problem_type for the ith TabularPredictor stored in this object.
  • eval_metrics (List[str]) – The ith element is the eval_metric for the ith TabularPredictor stored in this object.
  • consider_labels_correlation (bool) – Whether the predictions of multiple labels should account for label correlations or predict each label independently of the others. If True, the ordering of labels may affect resulting accuracy as each label is predicted conditional on the previous labels appearing earlier in this list (i.e. in an auto-regressive fashion). Set to False if during inference you may want to individually use just the ith TabularPredictor without predicting all the other labels.
  • kwargs – Arguments passed into the initialization of each TabularPredictor.
Reference
https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-multilabel.html
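
Example: a minimal usage sketch (labels, path, and data are illustrative; train_data and test_data are assumed to be pandas DataFrames, with train_data containing all label columns):

    from actableai.utils.multilabel_predictor import MultilabelPredictor

    # Hypothetical setup: predict two label columns of train_data.
    predictor = MultilabelPredictor(labels=["label_a", "label_b"], path="multilabel_models")
    predictor.fit(train_data)

    # Predict all labels at once, or access the per-label TabularPredictor.
    predictions = predictor.predict(test_data)
    predictor_a = predictor.get_predictor("label_a")
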
evaluate(data, **kwargs)

Returns dict where each key is a label and the corresponding value is the evaluate() output for just that label.

Parameters
  • data (str or autogluon.tabular.TabularDataset or pd.DataFrame) – Data to evaluate predictions of all labels for; must contain all labels as columns. See documentation for TabularPredictor.evaluate().
  • kwargs – Arguments passed into the evaluate() call for each TabularPredictor (also passed into the predict() call).
fit(train_data, tuning_data=None, **kwargs)

Fits a separate TabularPredictor to predict each of the labels.

Parameters
  • train_data (str or autogluon.tabular.TabularDataset or pd.DataFrame) – See documentation for TabularPredictor.fit().
  • tuning_data (str or autogluon.tabular.TabularDataset or pd.DataFrame) – See documentation for TabularPredictor.fit().
  • kwargs – Arguments passed into the fit() call for each TabularPredictor.
get_predictor(label)

Returns TabularPredictor which is used to predict this label.

classmethod load(path)

Load MultilabelPredictor from disk path previously specified when creating this MultilabelPredictor.

multi_predictor_file = 'multilabel_predictor.pkl'
persist_models()

TODO write documentation

predict(data, **kwargs)

Returns DataFrame with label columns containing predictions for each label.

Parameters
  • data (str or autogluon.tabular.TabularDataset or pd.DataFrame) – Data to make predictions for. If label columns are present in this data, they will be ignored. See documentation for TabularPredictor.predict().
  • kwargs – Arguments passed into the predict() call for each TabularPredictor.
predict_proba(data, **kwargs)

Returns dict where each key is a label and the corresponding value is the predict_proba() output for just that label.

Parameters
  • data (str or autogluon.tabular.TabularDataset or pd.DataFrame) – Data to make predictions for. See documentation for TabularPredictor.predict() and TabularPredictor.predict_proba().
  • kwargs – Arguments passed into the predict_proba() call for each TabularPredictor (also passed into a predict() call).
save(path=None)

Save MultilabelPredictor to disk.

unpersist_models()

TODO write documentation

actableai.utils.openai module

actableai.utils.openai.num_tokens_from_messages(messages, model='gpt-3.5-turbo-0301')

Returns the number of tokens used by a list of messages.
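
A small usage sketch (messages follow the OpenAI chat format; the exact count depends on the model):

    from actableai.utils.openai import num_tokens_from_messages

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]

    # Token count for the default model ("gpt-3.5-turbo-0301").
    n_tokens = num_tokens_from_messages(messages)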

actableai.utils.pdp_ice module

actableai.utils.pdp_ice.get_pdp_and_ice(model, df_train, features='all', pdp=True, ice=True, grid_resolution=100, verbosity=0, n_samples=None)

Get Partial Dependence Plot (PDP) and/or Individual Conditional Expectation (ICE) for a given model and dataframe.

Parameters
  • model – The trained model from AAIRegressionTask() or AAIClassificationTask()
  • df_train (pandas DataFrame) – Dataset on which to compute the PDP/ICE
  • features (list or str, optional) – List of feature names/column numbers on which to compute PDP/ICE, or 'all' to use all columns. If only one feature is required, its name or column number should be in a list.
  • pdp (bool, optional) – Set to True to compute PDP
  • ice (bool, optional) – Set to True to compute ICE
  • grid_resolution (int, optional) – Number of points to sample in the grid and plot (x-axis values)
  • verbosity (int, optional) – 0 for no output, 1 for summary output, 2 for detailed output
  • n_samples (int, optional) – The number of rows to sample in df_train. If None, no sampling is performed.

Returns
A dictionary with keys as feature names and values as the computed PDP/ICE results.

If return_type='raw': a tuple of two numpy arrays, where the first array represents the feature values and the second array represents the model predictions.

If return_type='plot': a sklearn.inspection.PartialDependenceDisplay object containing the plot.
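
Example: a usage sketch (it assumes model is the trained predictor produced by an AAIRegressionTask or AAIClassificationTask run, df_train matches its training columns, and the feature name "age" is illustrative):

    from actableai.utils.pdp_ice import get_pdp_and_ice

    # Compute PDP and ICE for a single feature, sampling 1000 rows to keep it cheap.
    results = get_pdp_and_ice(
        model,
        df_train,
        features=["age"],
        pdp=True,
        ice=True,
        grid_resolution=50,
        n_samples=1000,
    )

    # One entry per requested feature.
    age_result = results["age"]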

actableai.utils.river module

class actableai.utils.river.MultiOutputPipeline(pipeline: river.compose.pipeline.Pipeline, metric_class: Type[ray.util.metrics.Metric])

Bases: object

Wrapper around a pipeline with a multi-output regressor or classifier

learn_one(x: dict, y, learn_unsupervised=False, **params)

Learn one data point and update metrics

predict_one(x: dict, learn_unsupervised=True)

Predict one data point, using the internal metrics to select the best output

class actableai.utils.river.MultiOutputRegressor(models: list)

Bases: river.base.regressor.Regressor, river.base.multi_output.MultiOutputMixin

Class representing a regressor with multiple outputs, one regressor per output

learn_one(x: dict, y: numbers.Number, **kwargs) river.base.regressor.Regressor

Learn one data point

predict_one(x: dict) List[numbers.Number]

Predict one data point

class actableai.utils.river.NRMSE

Bases: river.metrics.mse.RMSE

Normalized RMSE class (wrapper around river’s RMSE class)

get()

Return the current value of the metric.

revert(y_true, y_pred, sample_weight=1.0)

Revert the metric.

update(y_true, y_pred, sample_weight=1.0)

Update the metric.

actableai.utils.river.metrics_to_dict(metrics_object: river.metrics.base.Metrics) Dict[str, float]

Transform a river metrics object to a dictionary

Parameters
metrics_object – The metrics object containing the metrics
Return type
The metrics values as a dict
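
A small sketch (it relies on river's behavior where adding metrics together yields a Metrics container; the metric choices are illustrative):

    from river import metrics

    from actableai.utils.river import metrics_to_dict

    # Combining river metrics with "+" produces a river.metrics.base.Metrics container.
    tracked_metrics = metrics.MAE() + metrics.RMSE()
    tracked_metrics.update(y_true=3.0, y_pred=2.5)

    # Convert the container into a {metric_name: value} dictionary.
    metric_values = metrics_to_dict(tracked_metrics)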

actableai.utils.sanitize module

actableai.utils.sanitize.sanitize_timezone(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Sanitize TimeZone from DataFrame

Parameters
df – Original DataFrame
Returns
Sanitized Dataframe
Return type
pd.DataFrame

actableai.utils.sklearn module

actableai.utils.sklearn.sklearn_canonical_pipeline(df, clf)

actableai.utils.testing module

actableai.utils.testing.generate_date_range(np_rng=None, start_date=None, min_periods=10, max_periods=60, periods=None, freq=None)

TODO write documentation

actableai.utils.testing.generate_forecast_dataset(np_rng, prediction_length, n_groups=1, n_targets=1, freq=None, n_real_dynamic_features=0, n_cat_dynamic_features=0, n_real_static_features=0, n_cat_static_features=0, date_range_kwargs=None)

TODO write documentation

actableai.utils.testing.generate_forecast_df(np_rng, prediction_length, n_group_by=0, n_targets=1, freq=None, n_real_static_features=0, n_cat_static_features=0, n_real_dynamic_features=0, n_cat_dynamic_features=0, date_range_kwargs=None)

TODO write documentation

actableai.utils.testing.generate_random_date(np_rng=None, min_year=1900, min_month=1, min_day=1, max_year=2000, max_month=1, max_day=1, random_state=None)

TODO write documentation

actableai.utils.testing.init_ray(**kwargs)
actableai.utils.testing.unittest_autogluon_hyperparameters()
actableai.utils.testing.unittest_dml_parameters()
actableai.utils.testing.unittest_estimator_parameters()
actableai.utils.testing.unittest_hyperparameters()

actableai.utils.typing module

Module contents

actableai.utils.check_if_integer_feature(X: pandas.core.series.Series)
actableai.utils.custom_precision_recall_curve(y_true, probas_pred, *, pos_label=None, sample_weight=None)
actableai.utils.debiasing_feature_generator_args()
actableai.utils.debiasing_hyperparameters()
actableai.utils.explanation_hyperparameters()
actableai.utils.fast_categorical_hyperparameters()
actableai.utils.fill_na(df, fillna_dict=None, fill_median=True)
actableai.utils.get_all_subclasses(cls: Type[actableai.utils.ClassType]) List[Type[actableai.utils.ClassType]]
actableai.utils.get_type_special(X: pandas.core.series.Series) str
actableai.utils.get_type_special_no_ag(X: pandas.core.series.Series) str

From autogluon library TODO improve

actableai.utils.handle_boolean_features(df)
actableai.utils.handle_datetime_features(df)
actableai.utils.is_fitted(transformer)
actableai.utils.is_gpu_available()
actableai.utils.is_text_column(X, text_ratio=0.1)
actableai.utils.memory_efficient_hyperparameters(ag_automm_enabled: bool = False, tabpfn_enabled: bool = False)
actableai.utils.preprocess_dataset(df)
actableai.utils.quantile_regression_hyperparameters()
actableai.utils.random_directory(path='')

Create a random directory.