Reference¶
dapla_pseudo package¶
Subpackages¶
dapla_pseudo.constants module¶
This module defines constants that are referenced throughout the codebase.
- class Env(*values)¶
Bases: str, Enum
Environment variable keys.
- PSEUDO_CLIENT_MAX_TOTAL_PARTITIONS = 'PSEUDO_CLIENT_MAX_TOTAL_PARTITIONS'¶
- PSEUDO_CLIENT_ROWS_PER_PARTITION = 'PSEUDO_CLIENT_ROWS_PER_PARTITION'¶
- PSEUDO_SERVICE_AUTH_TOKEN = 'PSEUDO_SERVICE_AUTH_TOKEN'¶
- PSEUDO_SERVICE_URL = 'PSEUDO_SERVICE_URL'¶
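The keys above name environment variables. A minimal sketch of how such a key enum might be used to read configuration is shown below. It defines a local stand-in `Env` (copying two of the documented members) so that it runs without the package installed. The helper `read_setting`, the placeholder URL and the default value `"10000"` are illustrative assumptions, not part of dapla_pseudo.

```python
import os
from enum import Enum
from typing import Optional


class Env(str, Enum):
    """Local stand-in copying two of the documented keys."""

    PSEUDO_SERVICE_URL = "PSEUDO_SERVICE_URL"
    PSEUDO_CLIENT_ROWS_PER_PARTITION = "PSEUDO_CLIENT_ROWS_PER_PARTITION"


def read_setting(key: Env, default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: look up an environment variable by its enum key."""
    return os.environ.get(key.value, default)


# Set one variable and make sure the other is unset, so the example is deterministic.
os.environ[Env.PSEUDO_SERVICE_URL.value] = "https://pseudo.example.invalid"
os.environ.pop(Env.PSEUDO_CLIENT_ROWS_PER_PARTITION.value, None)

service_url = read_setting(Env.PSEUDO_SERVICE_URL)
rows = read_setting(Env.PSEUDO_CLIENT_ROWS_PER_PARTITION, default="10000")
```

Environment variables are always strings, so a numeric setting such as the rows-per-partition value would still need to be converted by the caller.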
- class MapFailureStrategy(*values)¶
Bases: str, Enum
UnknownCharacterStrategy defines how encryption/decryption should handle non-alphabet characters.
- RETURN_NULL = 'RETURN_NULL'¶
- RETURN_ORIGINAL = 'RETURN_ORIGINAL'¶
- class PredefinedKeys(*values)¶
Bases: str, Enum
Names of ‘global keys’ that the Dapla Pseudo Service is familiar with.
- PAPIS_COMMON_KEY_1 = 'papis-common-key-1'¶
- SSB_COMMON_KEY_1 = 'ssb-common-key-1'¶
- SSB_COMMON_KEY_2 = 'ssb-common-key-2'¶
- class PseudoFunctionTypes(*values)¶
Bases: str, Enum
Names of well-known pseudo functions.
- DAEAD = 'daead'¶
- FF31 = 'ff31'¶
- MAP_SID = 'map-sid-ff31'¶
- REDACT = 'redact'¶
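Because the enums in this module derive from both str and Enum, a member can be looked up from its string value and compares equal to that plain string. The sketch below uses a local stand-in class that copies the documented members, so it does not depend on the package being installed.

```python
from enum import Enum


class PseudoFunctionTypes(str, Enum):
    """Local stand-in copying the documented members."""

    DAEAD = "daead"
    FF31 = "ff31"
    MAP_SID = "map-sid-ff31"
    REDACT = "redact"


# Look a member up from its string value.
func = PseudoFunctionTypes("map-sid-ff31")

# The str mixin makes a member compare equal to the plain string.
is_plain_string_equal = PseudoFunctionTypes.DAEAD == "daead"

# An unknown value raises ValueError.
try:
    PseudoFunctionTypes("not-a-function")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```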
dapla_pseudo.exceptions module¶
Common exceptions for the Dapla Pseudo package.
- exception ExtensionNotValidError(message)¶
Bases: Exception
Exception raised when a file extension is invalid.
- Parameters:
message (str)
- Return type:
None
- exception FileInvalidError(message)¶
Bases: Exception
Exception raised when a file is in an invalid state.
- Parameters:
message (str)
- Return type:
None
- exception MimetypeNotSupportedError(message)¶
Bases: Exception
Exception raised when a MIME type is invalid.
- Parameters:
message (str)
- Return type:
None
- exception NoFileExtensionError(message)¶
Bases: Exception
Exception raised when a file has no file extension.
- Parameters:
message (str)
- Return type:
None
dapla_pseudo.models module¶
The models module contains base classes used by other models.
- class APIModel(**data)¶
Bases: BaseModel
APIModel is a base class for models that are used for communicating with the Dapla Pseudo Service.
It provides configuration for converting between camelCase (required by the API) and snake_case (the idiomatic Python style used by this library). It also provides sensible defaults for converting a model to JSON.
- model_config: ClassVar[ConfigDict] = {'alias_generator': <function camelize>, 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- to_json()¶
Convert the model to JSON using camelCase aliases and only including assigned values.
- Return type:
str
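The camelCase conversion described for `to_json()` can be approximated with the standard library alone. The sketch below is not the library's implementation and does not use pydantic. The helpers `snake_to_camel` and `to_camel_json`, and the field names in the sample payload, are hypothetical.

```python
import json
from typing import Any, Dict


def snake_to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_camel_json(assigned: Dict[str, Any]) -> str:
    """Serialize only the values present in `assigned`, using camelCase keys."""
    return json.dumps({snake_to_camel(key): value for key, value in assigned.items()})


# Only fields that were explicitly assigned are passed in, so nothing else is emitted.
payload = to_camel_json({"field_name": "fnr", "key_id": "ssb-common-key-1"})
```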
dapla_pseudo.types module¶
Type declarations for dapla-toolbelt-pseudo.
dapla_pseudo.utils module¶
Utility functions for Dapla Pseudo.
- build_pseudo_field_request(pseudo_operation, mutable_df, rules, custom_keyset=None, target_custom_keyset=None, target_rules=None)¶
Builds a FieldRequest object.
- Return type:
list[PseudoFieldRequest|DepseudoFieldRequest|RepseudoFieldRequest]
- Parameters:
pseudo_operation (PseudoOperation)
mutable_df (MutableDataFrame)
rules (list[PseudoRule])
custom_keyset (PseudoKeyset | str | None)
target_custom_keyset (PseudoKeyset | str | None)
target_rules (list[PseudoRule] | None)
- convert_to_date(sid_snapshot_date=None)¶
Converts the SID version date to the ‘date’ type, if it is a string.
If None, simply passes the None through the function.
- Return type:
date|None
- Parameters:
sid_snapshot_date (date | str | None)
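A minimal sketch of the pass-through behaviour described for `convert_to_date` follows. The function name `convert_to_date_sketch` is hypothetical, and the sketch assumes that string input is an ISO 8601 date (YYYY-MM-DD); the format accepted by the real function is not stated here and may differ.

```python
from datetime import date
from typing import Optional, Union


def convert_to_date_sketch(
    sid_snapshot_date: Union[date, str, None] = None,
) -> Optional[date]:
    """Return a date for date or string input; pass None through unchanged."""
    if sid_snapshot_date is None:
        return None
    if isinstance(sid_snapshot_date, str):
        # Assumption: the string is ISO 8601 (YYYY-MM-DD).
        return date.fromisoformat(sid_snapshot_date)
    return sid_snapshot_date


from_string = convert_to_date_sketch("2023-08-29")
from_date = convert_to_date_sketch(date(2023, 8, 29))
from_none = convert_to_date_sketch()
```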
- encode_datadoc_variables(variables, indent=2)¶
Encode datadoc variables to a formatted JSON list.
- Return type:
str
- Parameters:
variables (list[Variable])
indent (int)
- find_multipart_obj(obj_name, multipart_files_tuple)¶
Find “multipart object” by name.
The requests library specifies multipart file arguments as file-tuples, such as (‘filename’, fileobj, ‘content_type’). This function searches a tuple of such file-tuples, ((file-tuple1), …, (file-tupleN)), and returns the fileobj of the first file-tuple whose filename matches the given name.
- Parameters:
obj_name (str) – The name of the object
multipart_files_tuple (set[Any]) – The multipart tuple
- Return type:
Any
- Returns:
The fileobject associated with the matched tuple
Example
```
multipart_tuple = (("filename1", fileobj1, "application/json"), ("filename2", fileobj2, "application/json"))
find_multipart_obj("filename2", multipart_tuple)  # -> fileobj2
```
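The search described above can be sketched in a few lines. This is an illustrative reimplementation under the name `find_multipart_obj_sketch`, not the library's code. Returning None when nothing matches is an assumption, since the behaviour for a missing name is not documented here.

```python
import io
from typing import Any, Iterable, Tuple


def find_multipart_obj_sketch(
    obj_name: str, multipart_files: Iterable[Tuple[str, Any, str]]
) -> Any:
    """Return the file object of the first file-tuple whose name matches."""
    for name, fileobj, _content_type in multipart_files:
        if name == obj_name:
            return fileobj
    # Assumption: no match yields None.
    return None


first = io.BytesIO(b"{}")
second = io.BytesIO(b"[]")
multipart_tuple = (
    ("filename1", first, "application/json"),
    ("filename2", second, "application/json"),
)
match = find_multipart_obj_sketch("filename2", multipart_tuple)
```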
- get_file_format_from_file_name(file_path)¶
Extracts the file format from a file path.
- Return type:
- Parameters:
file_path (str | Path)
- redact_field(request)¶
Perform the redact operation locally.
This is in order to avoid making unnecessary requests to the API.
- Return type:
tuple[str,list[str|None],RawPseudoMetadata]
- Parameters:
request (PseudoFieldRequest)
- running_asyncio_loop()¶
Returns the asyncio event loop if it exists.
- Return type:
AbstractEventLoop|None
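One common way to obtain "the running loop or None" with the standard library is to catch the RuntimeError raised by `asyncio.get_running_loop()`. The sketch below illustrates that pattern under the hypothetical name `running_loop_sketch`. Whether the library function is implemented this way is an assumption.

```python
import asyncio
from typing import Optional


def running_loop_sketch() -> Optional[asyncio.AbstractEventLoop]:
    """Return the running event loop, or None when called outside of one."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running.
        return None


async def _called_inside_loop() -> bool:
    return running_loop_sketch() is not None


outside = running_loop_sketch()  # called from synchronous code
inside = asyncio.run(_called_inside_loop())  # called from within a coroutine
```

A check like this lets synchronous entry points decide whether they can start their own event loop or must cooperate with one that is already running, for example in a notebook.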
dapla_pseudo.v1 package¶
dapla_pseudo.v1.client module¶
Module that implements a client abstraction that makes it easy to communicate with the Dapla Pseudo Service REST API.
- class PseudoClient(pseudo_service_url=None, auth_token=None, rows_per_partition=None, max_total_partitions=None)¶
Bases: object
Client for interacting with the Dapla Pseudo Service REST API.
- Parameters:
pseudo_service_url (str | None)
auth_token (str | None)
rows_per_partition (str | None)
max_total_partitions (str | None)
- async static is_json_parseable(response)¶
Check if response content is JSON parseable.
- Return type:
bool
- Parameters:
response (ClientResponse)
- async post_to_field_endpoint(path, timeout, pseudo_requests)¶
Post a request to the Pseudo Service field endpoint.
- Parameters:
path (str) – Full URL to the endpoint
timeout (int) – Request timeout
pseudo_requests (list[PseudoFieldRequest|DepseudoFieldRequest|RepseudoFieldRequest]) – Pseudo requests
- Returns:
A list of tuples of the form (field_name, data, metadata)
- Return type:
list[tuple[str, list[str], RawPseudoMetadata]]
dapla_pseudo.v1.depseudo module¶
Builder for submitting a depseudonymization request.
- class Depseudonymize¶
Bases: object
Starting point for depseudonymization of datasets.
This class should not be instantiated, only the static methods should be used.
-
dataset:
DataFrame¶
- static from_pandas(dataframe)¶
Initialize a depseudonymization request from a pandas DataFrame.
- Return type:
_Depseudonymizer
- Parameters:
dataframe (DataFrame)
- static from_polars(dataframe)¶
Initialize a depseudonymization request from a polars DataFrame.
- Return type:
_Depseudonymizer
- Parameters:
dataframe (DataFrame)
-
schema:
Series|Schema¶
dapla_pseudo.v1.pseudo module¶
Builder for submitting a pseudonymization request.
- class Pseudonymize¶
Bases: object
Starting point for pseudonymization of datasets.
This class should not be instantiated, only the static methods should be used.
-
dataset:
DataFrame¶
- static from_pandas(dataframe)¶
Initialize a pseudonymization request from a Pandas DataFrame.
- Parameters:
dataframe (DataFrame) – A Pandas DataFrame
- Returns:
An instance of the _Pseudonymizer class.
- Return type:
_Pseudonymizer
- static from_polars(dataframe)¶
Initialize a pseudonymization request from a Polars DataFrame.
- Parameters:
dataframe (DataFrame) – A Polars DataFrame
- Returns:
An instance of the _Pseudonymizer class.
- Return type:
_Pseudonymizer
-
schema:
Series|Schema¶
dapla_pseudo.v1.result module¶
Common API models for builder packages.
- class Result(pseudo_response, pseudo_operation=None, targeted_columns=None, user_provided_metadata=None, schema=None)¶
Bases: object
Result represents the result of a pseudonymization operation.
- Parameters:
pseudo_response (PseudoFieldResponse)
pseudo_operation (PseudoOperation | None)
targeted_columns (list[str] | None)
user_provided_metadata (Datadoc | None)
schema (Series | Schema | None)
- property datadoc: str¶
Returns the pseudonymization metadata as a formatted JSON string.
- Returns:
A JSON-formatted string representing the datadoc metadata.
- Return type:
str
- Raises:
ValueError – If the list of variables is malformed.
- property datadoc_model: dict[str, Any] | list[Any]¶
Returns the pseudonymization metadata as a dictionary.
- Returns:
A dictionary representing the datadoc metadata.
- Return type:
dict
- Raises:
ValueError – If the list of variables is malformed.
- property metadata: dict[str, Any]¶
Returns the aggregated metadata for all fields as a dictionary.
- Returns:
A dictionary containing the pseudonymization metadata, where the keys are field names and the values are corresponding pseudo field metadata. If no metadata is set, returns an empty dictionary.
- Return type:
Optional[dict[str, str]]
- property metadata_details: dict[str, Any]¶
Returns the pseudonymization metadata as a dictionary, for each field that has been processed.
- Returns:
A dictionary containing the pseudonymization metadata, where the keys are field names and the values are corresponding pseudo field metadata. If no metadata is set, returns an empty dictionary.
- Return type:
Optional[dict[str, str]]
- to_file(file_path, **kwargs)¶
Write pseudonymized data to a file, with the metadata being written to the same folder.
- Parameters:
file_path (str) – The path to the file to be written. If writing to a bucket, use the “gs://” prefix.
**kwargs (Any) – Additional keyword arguments to be passed to the Polars writer function if the input data is a DataFrame. The specific writer function depends on the format of the output file, e.g. write_csv() for CSV files.
- Raises:
ValueError – If the result is not of type Polars DataFrame or if the output file format does not match the input file format.
- Return type:
None
- to_pandas(**kwargs)¶
Output pseudonymized data as a Pandas DataFrame.
- Parameters:
**kwargs (Any) – Additional keyword arguments to be passed to the Pandas reader function if the input data is from a file. The specific reader function depends on the format of the input file, e.g. read_csv() for CSV files.
- Raises:
ValueError – If the result is not of type Polars DataFrame.
- Returns:
A Pandas DataFrame containing the pseudonymized data.
- Return type:
pd.DataFrame
- to_polars(**kwargs)¶
Output pseudonymized data as a Polars DataFrame.
- Parameters:
**kwargs (Any) – Additional keyword arguments to be passed to the Polars “from_dicts” function if the input data is from a file.
- Raises:
ValueError – If the result is not of type Polars DataFrame.
- Returns:
A Polars DataFrame containing the pseudonymized data.
- Return type:
pl.DataFrame
- aggregate_metrics(metadata)¶
Aggregates logs and metrics. Each unique metric is summarized.
- Return type:
dict[str,Any]
- Parameters:
metadata (dict[str, dict[str, list[Any]]])
dapla_pseudo.v1.supported_file_format module¶
Classes used to support reading of dataframes from file.
- class SupportedOutputFileFormat(*values)¶
Bases: Enum
SupportedOutputFileFormat contains the supported file formats when outputting the result to a file.
Note that this does NOT describe the valid file extensions of _input_ data when reading from a file.
- CSV = 'csv'¶
- JSON = 'json'¶
- PARQUET = 'parquet'¶
- XML = 'xml'¶
- ZIP = 'zip'¶
- read_to_pandas_df(supported_format, df_dataset, **kwargs)¶
Reads a file with a supported file format to a Pandas Dataframe.
- Return type:
DataFrame
- Parameters:
supported_format (SupportedOutputFileFormat)
df_dataset (BytesIO | Path)
kwargs (Any)
- read_to_polars_df(supported_format, df_dataset, **kwargs)¶
Reads a file with a supported file format to a Polars Dataframe.
- Return type:
DataFrame
- Parameters:
supported_format (SupportedOutputFileFormat)
df_dataset (BytesIO | Path)
kwargs (Any)
- write_from_df(df, supported_format, file_like, **kwargs)¶
Writes to a file with a supported file format from a Dataframe.
- Return type:
None
- Parameters:
df (DataFrame)
supported_format (SupportedOutputFileFormat)
file_like (BufferedWriter)
kwargs (Any)
- write_from_dicts(data, supported_format, file_like)¶
Writes data from a list of dicts to a file of the given format.
- Return type:
None
- Parameters:
data (list[dict[str, Any]])
supported_format (SupportedOutputFileFormat)
file_like (BufferedWriter)
dapla_pseudo.v1.validation module¶
Builder for submitting a validation request.
- class Validator¶
Bases: object
Starting point for validation of datasets.
This class should not be instantiated, only the static methods should be used.
- static from_file(file_path_str, **kwargs)¶
Initialize a validation request from a pandas dataframe read from file.
- Parameters:
file_path_str (str) – The path to the file to be read.
**kwargs (Any) – Additional keyword arguments to be passed to the file reader.
- Raises:
FileNotFoundError – If no file is found at the specified local path.
- Returns:
An instance of the _FieldSelector class.
- Return type:
_FieldSelector
Examples
    # Read from bucket
    from dapla_pseudo import Validator
    bucket_path = "gs://ssb-staging-dapla-felles-data-delt/felles/smoke-tests/fruits/data.parquet"
    field_selector = Validator.from_file(bucket_path)

    # Read from local filesystem
    from dapla_pseudo import Validator
    local_path = "some_file.csv"
    field_selector = Validator.from_file(local_path)
- static from_pandas(dataframe)¶
Initialize a validation request from a pandas DataFrame.
- Return type:
_FieldSelector
- Parameters:
dataframe (DataFrame)
- static from_polars(dataframe)¶
Initialize a validation request from a polars DataFrame.
- Return type:
_FieldSelector
- Parameters:
dataframe (DataFrame)