etiq.etiq_dataissues package
Submodules
etiq.etiq_dataissues.core module
- etiq.etiq_dataissues.core.get_abstractdataset_issues(base_dataset: AbstractDataset, comparison_dataset: AbstractDataset, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]
- etiq.etiq_dataissues.core.get_data_issues(base_dataset: DataFrame, comparison_dataset: DataFrame, categorical_features: List[str] | None = None, continuous_features: List[str] | None = None, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]
etiq.etiq_dataissues.data_issues module
- class etiq.etiq_dataissues.data_issues.DataIssue(name: str, feature: str, value: str | float, segment: str)
Bases:
object
A simple data class to represent a data issue, e.g. “unknown_category”, found by a pipeline in the data.
- feature: str
- name: str
- segment: str
- value: str | float
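The four fields above can be illustrated with a small stand-in. This is a minimal sketch using only the standard library, not the etiq class itself: `DataIssueSketch` is a hypothetical name, the per-field comments are assumptions about what each field holds, and the example values (including the `"all"` segment) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DataIssueSketch:
    """Hypothetical stand-in mirroring the documented DataIssue fields."""

    name: str                 # assumed: kind of issue, e.g. "unknown_category"
    feature: str              # assumed: column the issue was found in
    value: Union[str, float]  # assumed: offending category or numeric value
    segment: str              # assumed: data segment the issue applies to


# Example values are illustrative only.
issue = DataIssueSketch(
    name="unknown_category",
    feature="colour",
    value="purple",
    segment="all",
)
print(issue)
```

Because `value` is typed `str | float`, the same record shape can presumably carry either a category label or a numeric value.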
- class etiq.etiq_dataissues.data_issues.DataIssueBuilder
Bases:
object
A builder for the DataIssue class.
- add_feature_name(issue_name: str)
- add_issue_name(issue_name: str)
- add_segment(asegment_name: str)
- add_value(value: str | float)
- build() DataIssue
Builds and returns a DataIssue object.
- Returns:
An issue built using data and parameters passed to the builder.
- Return type:
DataIssue
- reset()
Reset the builder.
- Returns:
This instance of the builder with data reset
- Return type:
DataIssueBuilder
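The builder interface listed above can be sketched as follows. This is a standard-library stand-in, not the etiq implementation: `IssueSketch` and `IssueBuilderSketch` are hypothetical names, and having the `add_*` methods return the builder for chaining is an assumption (the reference only states that `reset()` returns the builder instance). Parameter names are simplified for readability.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class IssueSketch:
    """Hypothetical record with the same four fields as DataIssue."""

    name: Optional[str]
    feature: Optional[str]
    value: Union[str, float, None]
    segment: Optional[str]


class IssueBuilderSketch:
    """Hypothetical builder mirroring the documented method names."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> "IssueBuilderSketch":
        # Clear all collected data and return this instance.
        self._name: Optional[str] = None
        self._feature: Optional[str] = None
        self._value: Union[str, float, None] = None
        self._segment: Optional[str] = None
        return self

    # Assumption: each add_* method returns self so calls can be chained.
    def add_issue_name(self, issue_name: str) -> "IssueBuilderSketch":
        self._name = issue_name
        return self

    def add_feature_name(self, feature_name: str) -> "IssueBuilderSketch":
        self._feature = feature_name
        return self

    def add_value(self, value: Union[str, float]) -> "IssueBuilderSketch":
        self._value = value
        return self

    def add_segment(self, segment_name: str) -> "IssueBuilderSketch":
        self._segment = segment_name
        return self

    def build(self) -> IssueSketch:
        # Assemble a record from whatever has been added so far.
        return IssueSketch(self._name, self._feature, self._value, self._segment)


builder = IssueBuilderSketch()
built = (
    builder.add_issue_name("unknown_category")
    .add_feature_name("colour")
    .add_value("purple")
    .add_segment("all")
    .build()
)
print(built)
```

The sketch performs no validation in `build()`; whether the real builder rejects incomplete issues is not stated in this reference.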
etiq.etiq_dataissues.utils module
- etiq.etiq_dataissues.utils.categorical_feature_missing(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) List[Any]
- etiq.etiq_dataissues.utils.categorical_feature_unrecognized(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) List[Any]
- etiq.etiq_dataissues.utils.continuous_feature_above_base_max(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) float
- etiq.etiq_dataissues.utils.continuous_feature_below_base_min(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) float
- etiq.etiq_dataissues.utils.guess_cont_cat_col(data=None, names_col=None)
Attempts to guess which columns hold categorical data and which hold continuous data.
- Parameters:
data (numpy array or pandas DataFrame) – input data
names_col (numpy array of strings) – a list of all the column names in data
- Returns:
A tuple of Numpy arrays. The first array has column names that we assume are categorical features. The second array has column names that we assume are continuous features.
Test to implement: every column name is categorized as either continuous or categorical.
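The check described in the note above (no column left unclassified) could be sketched like this. The heuristic shown is an assumption made for illustration, not the logic used by `guess_cont_cat_col`: `guess_cont_cat_col_sketch` is a hypothetical stand-in that takes only a DataFrame and treats numeric dtypes as continuous and everything else as categorical. `all_columns_classified` is likewise a hypothetical helper.

```python
import numpy as np
import pandas as pd


def guess_cont_cat_col_sketch(data: pd.DataFrame):
    """Hypothetical heuristic: numeric dtype means continuous, else categorical."""
    categorical, continuous = [], []
    for col in data.columns:
        if pd.api.types.is_numeric_dtype(data[col]):
            continuous.append(col)
        else:
            categorical.append(col)
    # Same return shape as documented: (categorical names, continuous names).
    return np.array(categorical), np.array(continuous)


def all_columns_classified(data: pd.DataFrame, categorical, continuous) -> bool:
    """True if every column lands in exactly one of the two groups."""
    cat, cont = set(categorical), set(continuous)
    return (cat | cont) == set(data.columns) and not (cat & cont)


df = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "income": [42000.0, 58000.5, 39000.0],
        "city": ["Leeds", "York", "Leeds"],
    }
)
cat_cols, cont_cols = guess_cont_cat_col_sketch(df)
print(cat_cols, cont_cols)
print(all_columns_classified(df, cat_cols, cont_cols))
```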
- etiq.etiq_dataissues.utils.identical_values(base_dataset: DataFrame, comparison_dataset: DataFrame, afeature: str) bool
- etiq.etiq_dataissues.utils.missing_features(base_dataset: DataFrame, comparison_dataset: DataFrame) List[str]
- etiq.etiq_dataissues.utils.unknown_features(base_dataset: DataFrame, comparison_dataset: DataFrame) List[str]
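The comparison helpers above are listed without descriptions. The sketch below shows one plausible reading of three of them using plain pandas. The semantics are assumptions inferred from the function names and return types, and the `*_sketch` functions are hypothetical stand-ins, not the etiq implementations.

```python
import pandas as pd


def missing_features_sketch(base: pd.DataFrame, comparison: pd.DataFrame):
    """Assumed meaning: columns in the base dataset absent from the comparison."""
    return [c for c in base.columns if c not in comparison.columns]


def unknown_features_sketch(base: pd.DataFrame, comparison: pd.DataFrame):
    """Assumed meaning: columns in the comparison dataset absent from the base."""
    return [c for c in comparison.columns if c not in base.columns]


def unrecognized_categories_sketch(
    base: pd.DataFrame, comparison: pd.DataFrame, feature: str
):
    """Assumed meaning: categories seen in the comparison but never in the base."""
    known = set(base[feature].dropna().unique())
    return [v for v in comparison[feature].dropna().unique() if v not in known]


base = pd.DataFrame({"colour": ["red", "blue"], "size": [1.0, 2.0]})
comparison = pd.DataFrame({"colour": ["red", "purple"], "weight": [3.0, 4.0]})

print(missing_features_sketch(base, comparison))
print(unknown_features_sketch(base, comparison))
print(unrecognized_categories_sketch(base, comparison, "colour"))
```

The continuous-range helpers return a `float`, but this reference does not say what that number represents, so they are left out of the sketch.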
Module contents
- class etiq.etiq_dataissues.DataIssue(name: str, feature: str, value: str | float, segment: str)
Bases:
object
A simple data class to represent a data issue, e.g. “unknown_category”, found by a pipeline in the data.
- feature: str
- name: str
- segment: str
- value: str | float
- class etiq.etiq_dataissues.DataIssueBuilder
Bases:
object
A builder for the DataIssue class.
- add_feature_name(issue_name: str)
- add_issue_name(issue_name: str)
- add_segment(asegment_name: str)
- add_value(value: str | float)
- build() DataIssue
Builds and returns a DataIssue object.
- Returns:
An issue built using data and parameters passed to the builder.
- Return type:
DataIssue
- reset()
Reset the builder.
- Returns:
This instance of the builder with data reset
- Return type:
DataIssueBuilder
- etiq.etiq_dataissues.get_abstractdataset_issues(base_dataset: AbstractDataset, comparison_dataset: AbstractDataset, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]
- etiq.etiq_dataissues.get_data_issues(base_dataset: DataFrame, comparison_dataset: DataFrame, categorical_features: List[str] | None = None, continuous_features: List[str] | None = None, search_for_missing_features: bool = True, search_for_unknown_features: bool = True, identical_feature_filter: List[str] | None = None, range_feature_filter: List[str] | None = None, missing_category_feature_filter: List[str] | None = None, unknown_category_feature_filter: List[str] | None = None) List[DataIssue]