etiq.etiq_dataissues package

Submodules

etiq.etiq_dataissues.core module

etiq.etiq_dataissues.core.get_abstractdataset_issues(base_dataset: AbstractDataset, comparison_dataset: AbstractDataset, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]
etiq.etiq_dataissues.core.get_data_issues(base_dataset: DataFrame, comparison_dataset: DataFrame, categorical_features: List[str] | None = None, continuous_features: List[str] | None = None, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]

etiq.etiq_dataissues.data_issues module

class etiq.etiq_dataissues.data_issues.DataIssue(name: str, feature: str, value: str | float, segment: str)

Bases: object

A simple data class representing a data issue (e.g. “unknown_category”) that a pipeline found in the data.

feature: str
name: str
segment: str
value: str | float

class etiq.etiq_dataissues.data_issues.DataIssueBuilder

Bases: object

A builder for the DataIssue class.

add_feature_name(issue_name: str)
add_issue_name(issue_name: str)
add_segment(asegment_name: str)
add_value(value: str | float)
build() DataIssue

Builds and returns a DataIssue object.

Returns:

An issue built using data and parameters passed to the builder.

Return type:

DataIssue

reset()

Reset the builder.

Returns:

This instance of the builder with data reset

Return type:

DataIssueBuilder
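The builder interface above can be illustrated with a minimal, dependency-free sketch. The two classes below are stand-ins that only mirror the documented field and method names; they are not the library's implementation. The `None` defaults and the example values (`"colour"`, `"teal"`, `"all"`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DataIssue:
    """Stand-in mirroring the documented fields; not the library's class."""
    name: str
    feature: str
    value: Union[str, float]
    segment: str


class DataIssueBuilder:
    """Stand-in builder exposing the documented method names."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear every pending field and return this same builder instance,
        # matching the documented return type of reset().
        self._name = None
        self._feature = None
        self._value = None
        self._segment = None
        return self

    def add_issue_name(self, issue_name):
        self._name = issue_name

    def add_feature_name(self, feature_name):
        self._feature = feature_name

    def add_value(self, value):
        self._value = value

    def add_segment(self, segment_name):
        self._segment = segment_name

    def build(self):
        # Assemble a DataIssue from whatever has been set so far.
        return DataIssue(
            name=self._name,
            feature=self._feature,
            value=self._value,
            segment=self._segment,
        )


builder = DataIssueBuilder()
builder.add_issue_name("unknown_category")
builder.add_feature_name("colour")   # hypothetical feature name
builder.add_value("teal")            # hypothetical offending value
builder.add_segment("all")           # hypothetical segment label
issue = builder.build()
print(issue)
```

Calling `reset()` between issues lets one builder instance be reused, which is presumably why it returns the builder itself.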

etiq.etiq_dataissues.utils module

etiq.etiq_dataissues.utils.categorical_feature_missing(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) List[Any]
etiq.etiq_dataissues.utils.categorical_feature_unrecognized(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) List[Any]
etiq.etiq_dataissues.utils.continuous_feature_above_base_max(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) float
etiq.etiq_dataissues.utils.continuous_feature_below_base_min(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) float
etiq.etiq_dataissues.utils.guess_cont_cat_col(data=None, names_col=None)

Attempts to guess which columns hold categorical data and which hold continuous data.

Parameters:
  • data (numpy array or pandas DataFrame) – input data

  • names_col (numpy array of strings) – a list of all the column names in data

Returns:

A tuple of NumPy arrays. The first array holds the column names assumed to be categorical features; the second holds the column names assumed to be continuous features.

Test to implement: every column name is categorized as either continuous or categorical.
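The heuristic used by guess_cont_cat_col is not documented here, so the following is only a hypothetical sketch of one way such a guess could work, together with the check described in the note above (no column is left out of both groups). The function name `guess_cont_cat_sketch`, the `max_categories` threshold, and the use of a plain dict of column lists instead of a DataFrame or NumPy array are all assumptions. It also returns lists rather than NumPy arrays.

```python
from numbers import Number


def guess_cont_cat_sketch(columns, max_categories=10):
    """Split column names into (categorical, continuous) lists.

    Hypothetical rule: a column is continuous when all of its values are
    numeric and it has more than `max_categories` distinct values;
    every other column is treated as categorical.
    """
    categorical, continuous = [], []
    for name, values in columns.items():
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if numeric and len(set(values)) > max_categories:
            continuous.append(name)
        else:
            categorical.append(name)
    return categorical, continuous


columns = {
    "age": [23.5, 31.0, 47.2, 52.9],
    "colour": ["red", "blue", "red", "green"],
    "flag": [0, 1, 1, 0],
}
categorical, continuous = guess_cont_cat_sketch(columns, max_categories=3)

# The invariant from the note: each column lands in exactly one group.
assert set(categorical) | set(continuous) == set(columns)
assert not set(categorical) & set(continuous)
```

Because every column falls into the `else` branch unless it qualifies as continuous, the invariant holds by construction in this sketch.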

etiq.etiq_dataissues.utils.identical_values(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) bool
etiq.etiq_dataissues.utils.missing_features(base_dataset: DataFrame, comparison_dataset: DataFrame) List[str]
etiq.etiq_dataissues.utils.unknown_features(base_dataset: DataFrame, comparison_dataset: DataFrame) List[str]
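The behaviour of these helpers is not described above, so the sketch below only illustrates what their names and return types suggest. It assumes that "missing" means present in the base dataset but absent from the comparison dataset, and that "unknown" or "unrecognized" means the reverse. The `_sketch` functions are hypothetical stand-ins, and plain dicts of column lists replace DataFrames so the example runs without dependencies.

```python
def missing_features_sketch(base, comparison):
    # Columns present in the base dataset but absent from the comparison.
    return [name for name in base if name not in comparison]


def unknown_features_sketch(base, comparison):
    # Columns present in the comparison dataset but absent from the base.
    return [name for name in comparison if name not in base]


def categorical_feature_missing_sketch(base, comparison, feature):
    # Categories seen in the base column that never occur in the comparison.
    return sorted(set(base[feature]) - set(comparison[feature]))


def categorical_feature_unrecognized_sketch(base, comparison, feature):
    # Categories seen in the comparison column that the base never contained.
    return sorted(set(comparison[feature]) - set(base[feature]))


base = {"colour": ["red", "blue", "green"], "size": [1, 2, 3]}
comparison = {"colour": ["red", "teal"], "weight": [10, 20]}

print(missing_features_sketch(base, comparison))
print(unknown_features_sketch(base, comparison))
print(categorical_feature_missing_sketch(base, comparison, "colour"))
print(categorical_feature_unrecognized_sketch(base, comparison, "colour"))
```

Under these assumptions, each result could then be wrapped in a DataIssue, one per affected feature or category.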

Module contents

class etiq.etiq_dataissues.DataIssue(name: str, feature: str, value: str | float, segment: str)

Bases: object

A simple data class representing a data issue (e.g. “unknown_category”) that a pipeline found in the data.

feature: str
name: str
segment: str
value: str | float

class etiq.etiq_dataissues.DataIssueBuilder

Bases: object

A builder for the DataIssue class.

add_feature_name(issue_name: str)
add_issue_name(issue_name: str)
add_segment(asegment_name: str)
add_value(value: str | float)
build() DataIssue

Builds and returns a DataIssue object.

Returns:

An issue built using data and parameters passed to the builder.

Return type:

DataIssue

reset()

Reset the builder.

Returns:

This instance of the builder with data reset

Return type:

DataIssueBuilder

etiq.etiq_dataissues.get_abstractdataset_issues(base_dataset: AbstractDataset, comparison_dataset: AbstractDataset, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]
etiq.etiq_dataissues.get_data_issues(base_dataset: DataFrame, comparison_dataset: DataFrame, categorical_features: List[str] | None = None, continuous_features: List[str] | None = None, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]