Reference¶

dapla_pseudo package¶

Subpackages¶

dapla_pseudo.constants module¶

This module defines constants that are referenced throughout the codebase.

class Env(*values)¶

Bases: str, Enum

Environment variable keys.

PSEUDO_CLIENT_MAX_TOTAL_PARTITIONS = 'PSEUDO_CLIENT_MAX_TOTAL_PARTITIONS'¶

PSEUDO_CLIENT_ROWS_PER_PARTITION = 'PSEUDO_CLIENT_ROWS_PER_PARTITION'¶

PSEUDO_SERVICE_AUTH_TOKEN = 'PSEUDO_SERVICE_AUTH_TOKEN'¶

PSEUDO_SERVICE_URL = 'PSEUDO_SERVICE_URL'¶
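Because the members are string-valued, they can be used to look up settings in os.environ. The following is a minimal sketch, not the package's own code: the Env class is a local stand-in that mirrors two of the documented keys so the snippet runs without the package installed, and read_setting and the example URL are hypothetical.

```python
import os
from enum import Enum
from typing import Optional


class Env(str, Enum):
    """Local stand-in mirroring two documented keys (not imported from dapla_pseudo)."""

    PSEUDO_SERVICE_URL = "PSEUDO_SERVICE_URL"
    PSEUDO_SERVICE_AUTH_TOKEN = "PSEUDO_SERVICE_AUTH_TOKEN"


def read_setting(key: Env, default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: look up an environment variable by its Env key."""
    return os.environ.get(key.value, default)


# Example value only; the real service URL is deployment-specific.
os.environ[Env.PSEUDO_SERVICE_URL.value] = "https://pseudo.example.invalid"
service_url = read_setting(Env.PSEUDO_SERVICE_URL)
```

Using key.value keeps the lookup explicit about which string is sent to the operating system.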

class MapFailureStrategy(*values)¶

Bases: str, Enum

MapFailureStrategy defines how a failed mapping should be handled: return null or return the original value.

RETURN_NULL = 'RETURN_NULL'¶

RETURN_ORIGINAL = 'RETURN_ORIGINAL'¶

class PredefinedKeys(*values)¶

Bases: str, Enum

Names of ‘global keys’ that the Dapla Pseudo Service is familiar with.

PAPIS_COMMON_KEY_1 = 'papis-common-key-1'¶

SSB_COMMON_KEY_1 = 'ssb-common-key-1'¶

SSB_COMMON_KEY_2 = 'ssb-common-key-2'¶

class PseudoFunctionTypes(*values)¶

Bases: str, Enum

Names of well known pseudo functions.

DAEAD = 'daead'¶

FF31 = 'ff31'¶

MAP_SID = 'map-sid-ff31'¶

REDACT = 'redact'¶

class PseudoOperation(*values)¶

Bases: str, Enum

Pseudo operation.

DEPSEUDONYMIZE = 'depseudonymize'¶

PSEUDONYMIZE = 'pseudonymize'¶

REPSEUDONYMIZE = 'repseudonymize'¶
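The enums in this module all derive from both str and Enum. As an illustration of what that base-class combination allows, here is a sketch that uses a local stand-in for PseudoOperation (same member names and values as documented above, but not imported from the package):

```python
from enum import Enum


class PseudoOperation(str, Enum):
    """Local stand-in mirroring the documented members."""

    DEPSEUDONYMIZE = "depseudonymize"
    PSEUDONYMIZE = "pseudonymize"
    REPSEUDONYMIZE = "repseudonymize"


# Look a member up from its string value, e.g. when parsing configuration.
operation = PseudoOperation("pseudonymize")

# The str mixin makes members compare equal to their plain string values.
is_same_as_string = operation == "pseudonymize"

# An unknown value raises ValueError instead of returning None.
try:
    PseudoOperation("anonymize")
    unknown_accepted = True
except ValueError:
    unknown_accepted = False
```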

class UnknownCharacterStrategy(*values)¶

Bases: str, Enum

UnknownCharacterStrategy defines how encryption/decryption should handle non-alphabet characters.

DELETE = 'delete'¶

FAIL = 'fail'¶

REDACT = 'redact'¶

SKIP = 'skip'¶

dapla_pseudo.exceptions module¶

Common exceptions for the Dapla Pseudo package.

exception ExtensionNotValidError(message)¶

Bases: Exception

Exception raised when a file extension is invalid.

Parameters:: message (str)
Return type:: None

exception FileInvalidError(message)¶

Bases: Exception

Exception raised when a file is in an invalid state.

Parameters:: message (str)
Return type:: None

exception MimetypeNotSupportedError(message)¶

Bases: Exception

Exception raised when a MIME type is not supported.

Parameters:: message (str)
Return type:: None

exception NoFileExtensionError(message)¶

Bases: Exception

Exception raised when a file has no file extension.

Parameters:: message (str)
Return type:: None

dapla_pseudo.models module¶

The models module contains base classes used by other models.

class APIModel(**data)¶

Bases: BaseModel

APIModel is a base class for models that are used for communicating with the Dapla Pseudo Service.

It provides configuration for serializing and converting between camelCase (required by the API) and snake_case (used idiomatically in this library). It also provides sensible defaults for converting a model to JSON.

model_config: ClassVar[ConfigDict] = {'alias_generator': <function camelize>, 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_json()¶

Convert the model to JSON using camelCase aliases and only including assigned values.

Return type:: str
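The camelCase/snake_case conversion can be pictured with a standard-library-only sketch. This is not how APIModel is implemented (it relies on pydantic's alias generator); the function names and field names below are hypothetical, and treating None as "not assigned" is a simplification.

```python
import json
from typing import Any, Dict


def camelize(name: str) -> str:
    """Convert snake_case to camelCase (simplified stand-in for an alias generator)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_camel_json(fields: Dict[str, Any]) -> str:
    """Serialize with camelCase keys, leaving out values that are None."""
    return json.dumps({camelize(k): v for k, v in fields.items() if v is not None})


# Hypothetical field names, chosen only to show the key conversion.
payload = to_camel_json({"field_name": "fnr", "target_keyset": None, "row_count": 3})
```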

dapla_pseudo.types module¶

Type declarations for dapla-toolbelt-pseudo.

dapla_pseudo.utils module¶

Utility functions for Dapla Pseudo.

build_pseudo_field_request(pseudo_operation, mutable_df, rules, custom_keyset=None, target_custom_keyset=None, target_rules=None)¶

Builds a FieldRequest object.

Return type:

list[PseudoFieldRequest | DepseudoFieldRequest | RepseudoFieldRequest]

Parameters:

pseudo_operation (PseudoOperation)
mutable_df (MutableDataFrame)
rules (list[PseudoRule])
custom_keyset (PseudoKeyset | str | None)
target_custom_keyset (PseudoKeyset | str | None)
target_rules (list[PseudoRule] | None)

convert_to_date(sid_snapshot_date=None)¶

Converts the SID version date to the ‘date’ type, if it is a string.

If the value is None, it is returned unchanged.

Return type:: date | None
Parameters:: sid_snapshot_date (date | str | None)
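The described behavior can be sketched as follows. This is an illustration rather than the package's implementation: convert_to_date_sketch is a hypothetical name, and the assumption that strings are in ISO 8601 format (YYYY-MM-DD) is not stated in this reference.

```python
from datetime import date
from typing import Optional, Union


def convert_to_date_sketch(
    sid_snapshot_date: Optional[Union[date, str]] = None,
) -> Optional[date]:
    """Return a date for ISO strings, and pass date/None inputs through."""
    if isinstance(sid_snapshot_date, str):
        # Assumption: the string is ISO 8601 (YYYY-MM-DD).
        return date.fromisoformat(sid_snapshot_date)
    return sid_snapshot_date


from_string = convert_to_date_sketch("2023-05-21")
passthrough = convert_to_date_sketch(date(2023, 5, 21))
nothing = convert_to_date_sketch(None)
```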

encode_datadoc_variables(variables, indent=2)¶

Encode datadoc variables to a formatted JSON list.

Return type:

str

Parameters:

variables (list[Variable])
indent (int)

find_multipart_obj(obj_name, multipart_files_tuple)¶

Find “multipart object” by name.

The requests library specifies multipart file arguments as file-tuples, such as ('filename', fileobj, 'content_type'). This function searches a tuple of such file-tuples ((file-tuple1), ..., (file-tupleN)) and returns the fileobj of the first file-tuple whose filename matches the given name.

Parameters:

obj_name (str) – The name of the object
multipart_files_tuple (set[Any]) – The multipart tuple

Return type:

Any

Returns:

The fileobject associated with the matched tuple

Example

```
multipart_tuple = (("filename1", fileobj1, "application/json"), ("filename2", fileobj2, "application/json"))

find_multipart_obj("filename2", multipart_tuple)  # -> fileobj2
```
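For a version that runs on its own, the lookup can be sketched as a simple linear search. find_multipart_obj_sketch is a hypothetical stand-in; in particular, returning None when nothing matches is an assumption, since this reference does not say what the real function does in that case.

```python
import io
from typing import Any, Iterable, Optional, Tuple


def find_multipart_obj_sketch(
    obj_name: str, multipart_files: Iterable[Tuple[str, Any, str]]
) -> Optional[Any]:
    """Return the file object of the first file-tuple named obj_name, else None."""
    for filename, fileobj, _content_type in multipart_files:
        if filename == obj_name:
            return fileobj
    return None  # Assumption: no match yields None.


fileobj1 = io.BytesIO(b'{"a": 1}')
fileobj2 = io.BytesIO(b'{"b": 2}')
multipart_tuple = (
    ("filename1", fileobj1, "application/json"),
    ("filename2", fileobj2, "application/json"),
)
found = find_multipart_obj_sketch("filename2", multipart_tuple)
```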

get_file_format_from_file_name(file_path)¶

Extracts the file format from a file path.

Return type:: SupportedOutputFileFormat
Parameters:: file_path (str | Path)
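One way to picture the extraction is to map the path's last extension onto an enum of formats. The sketch below assumes exactly that; OutputFormat is a local stand-in carrying the same values as SupportedOutputFileFormat, file_format_from_name is a hypothetical name, and raising ValueError is a choice made for the sketch (the real function may raise the package's own exceptions instead).

```python
from enum import Enum
from pathlib import Path
from typing import Union


class OutputFormat(Enum):
    """Local stand-in with the same values as SupportedOutputFileFormat."""

    CSV = "csv"
    JSON = "json"
    PARQUET = "parquet"
    XML = "xml"
    ZIP = "zip"


def file_format_from_name(file_path: Union[str, Path]) -> OutputFormat:
    """Map the last extension of a path to an OutputFormat member."""
    suffix = Path(file_path).suffix  # e.g. ".csv", or "" when there is none
    if not suffix:
        raise ValueError(f"{file_path} has no file extension")
    # Enum lookup by value raises ValueError for unsupported extensions.
    return OutputFormat(suffix[1:].lower())


csv_format = file_format_from_name("data/output.csv")
parquet_format = file_format_from_name(Path("data.parquet"))
```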

redact_field(request)¶

Perform the redact operation locally.

This is in order to avoid making unnecessary requests to the API.

Return type:: tuple[str, list[str | None], RawPseudoMetadata]
Parameters:: request (PseudoFieldRequest)

running_asyncio_loop()¶

Returns the asyncio event loop if it exists.

Return type:: AbstractEventLoop | None
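A common way to get this "loop or None" behavior with the standard library is to wrap asyncio.get_running_loop(). The sketch below assumes that approach; running_loop_sketch is a hypothetical name and may differ from the package's implementation.

```python
import asyncio
from typing import Optional


def running_loop_sketch() -> Optional[asyncio.AbstractEventLoop]:
    """Return the running event loop, or None when called outside one."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running.
        return None


async def inside_loop() -> bool:
    # Within a coroutine a loop is running, so the helper returns it.
    return running_loop_sketch() is asyncio.get_running_loop()


outside = running_loop_sketch()  # called from synchronous code, no loop running
inside = asyncio.run(inside_loop())
```

A check like this is useful for deciding whether asyncio.run() can be called directly or whether the code is already executing inside a loop, as it would be in a notebook.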

dapla_pseudo.v1 package¶

dapla_pseudo.v1.client module¶

Module that implements a client abstraction that makes it easy to communicate with the Dapla Pseudo Service REST API.

class PseudoClient(pseudo_service_url=None, auth_token=None, rows_per_partition=None, max_total_partitions=None)¶

Bases: object

Client for interacting with the Dapla Pseudo Service REST API.

Parameters:

pseudo_service_url (str | None)
auth_token (str | None)
rows_per_partition (str | None)
max_total_partitions (str | None)

async static is_json_parseable(response)¶

Check if response content is JSON parseable.

Return type:: bool
Parameters:: response (ClientResponse)

async post_to_field_endpoint(path, timeout, pseudo_requests)¶

Post a request to the Pseudo Service field endpoint.

Parameters:

path (str) – Full URL to the endpoint
timeout (int) – Request timeout
pseudo_requests (list[PseudoFieldRequest | DepseudoFieldRequest | RepseudoFieldRequest]) – Pseudo requests

Returns:

A list of tuples of (field_name, data, metadata)

Return type:

list[tuple[str, list[str], RawPseudoMetadata]]

dapla_pseudo.v1.depseudo module¶

Builder for submitting a depseudonymization request.

class Depseudonymize¶

Bases: object

Starting point for depseudonymization of datasets.

This class should not be instantiated, only the static methods should be used.

dataset: DataFrame | LazyFrame¶

static from_pandas(dataframe)¶

Initialize a depseudonymization request from a pandas DataFrame.

Return type:: _Depseudonymizer
Parameters:: dataframe (DataFrame)

static from_polars(dataframe)¶

Initialize a depseudonymization request from a polars DataFrame.

Return type:: _Depseudonymizer
Parameters:: dataframe (DataFrame | LazyFrame)

schema: Series | Schema¶

dapla_pseudo.v1.pseudo module¶

Builder for submitting a pseudonymization request.

class Pseudonymize¶

Bases: object

Starting point for pseudonymization of datasets.

This class should not be instantiated, only the static methods should be used.

dataset: DataFrame | LazyFrame¶

static from_pandas(dataframe)¶

Initialize a pseudonymization request from a Pandas DataFrame.

Parameters:: dataframe (DataFrame) – A Pandas DataFrame
Returns:: An instance of the _Pseudonymizer class.
Return type:: _Pseudonymizer

static from_polars(dataframe)¶

Initialize a pseudonymization request from a Polars DataFrame.

Parameters:: dataframe (DataFrame | LazyFrame) – A Polars DataFrame
Returns:: An instance of the _Pseudonymizer class.
Return type:: _Pseudonymizer

schema: Series | Schema¶

dapla_pseudo.v1.result module¶

Common API models for builder packages.

class Result(pseudo_response, pseudo_operation=None, targeted_columns=None, user_provided_metadata=None, schema=None)¶

Bases: object

Result represents the result of a pseudonymization operation.

Parameters:

pseudo_response (PseudoFieldResponse)
pseudo_operation (PseudoOperation | None)
targeted_columns (list[str] | None)
user_provided_metadata (Datadoc | None)
schema (Series | Schema | None)

property datadoc: str¶

Returns the pseudonymization metadata as a formatted json string.

Returns:: A JSON-formatted string representing the datadoc metadata.
Return type:: str
Raises:: ValueError – If the list of variables is malformed.

property datadoc_model: dict[str, Any] | list[Any]¶

Returns the pseudonymization metadata as a dictionary.

Returns:: A dictionary representing the datadoc metadata.
Return type:: dict
Raises:: ValueError – If the list of variables is malformed.

property metadata: dict[str, Any]¶

Returns the aggregated metadata for all fields as a dictionary.

Returns:: A dictionary containing the pseudonymization metadata, where the keys are field names and the values are corresponding pseudo field metadata. If no metadata is set, returns an empty dictionary.
Return type:: Optional[dict[str, str]]

property metadata_details: dict[str, Any]¶

Returns the pseudonymization metadata as a dictionary, for each field that has been processed.

Returns:: A dictionary containing the pseudonymization metadata, where the keys are field names and the values are corresponding pseudo field metadata. If no metadata is set, returns an empty dictionary.
Return type:: Optional[dict[str, str]]

to_file(file_path, **kwargs)¶

Write pseudonymized data to a file, with the metadata being written to the same folder.

Parameters:

file_path (str) – The path to the file to be written. If writing to a bucket, use the “gs://” prefix.
**kwargs (Any) – Additional keyword arguments to be passed to the Polars writer function if the input data is a DataFrame. The specific writer function depends on the format of the output file, e.g. write_csv() for CSV files.

Raises:

ValueError – If the result is not of type Polars DataFrame or if the output file format does not match the input file format.

Return type:

None

to_pandas(**kwargs)¶

Output pseudonymized data as a Pandas DataFrame.

Parameters:: **kwargs (Any) – Additional keyword arguments to be passed to the Pandas reader function if the input data is from a file. The specific reader function depends on the format of the input file, e.g. read_csv() for CSV files.
Raises:: ValueError – If the result is not of type Polars DataFrame.
Returns:: A Pandas DataFrame containing the pseudonymized data.
Return type:: pd.DataFrame

to_polars(**kwargs)¶

Output pseudonymized data as a Polars DataFrame.

Parameters:: **kwargs (Any) – Additional keyword arguments to be passed to the Polars “from_dicts” function if the input data is from a file.
Raises:: ValueError – If the result is not of type Polars DataFrame.
Returns:: A Polars DataFrame containing the pseudonymized data.
Return type:: pl.DataFrame

to_polars_lazy(**kwargs)¶

Output pseudonymized data as a Polars LazyFrame.

Parameters:: **kwargs (Any) – Additional keyword arguments to be passed to the Polars “from_dicts” function if the input data is from a file.
Raises:: ValueError – If the result is not of type Polars LazyFrame.
Returns:: A Polars LazyFrame containing the pseudonymized data.
Return type:: pl.LazyFrame

aggregate_metrics(metadata)¶

Aggregates logs and metrics. Each unique metric is summarized.

Return type:: dict[str, Any]
Parameters:: metadata (dict[str, dict[str, list[Any]]])

dapla_pseudo.v1.supported_file_format module¶

Classes used to support reading of dataframes from file.

class SupportedOutputFileFormat(*values)¶

Bases: Enum

SupportedOutputFileFormat contains the supported file formats when outputting the result to a file.

Note that this does NOT describe the valid file extensions of _input_ data when reading from a file.

CSV = 'csv'¶

JSON = 'json'¶

PARQUET = 'parquet'¶

XML = 'xml'¶

ZIP = 'zip'¶

read_to_pandas_df(supported_format, df_dataset, **kwargs)¶

Reads a file with a supported file format to a Pandas Dataframe.

Return type:

DataFrame

Parameters:

supported_format (SupportedOutputFileFormat)
df_dataset (BytesIO | Path)
kwargs (Any)

read_to_polars_df(supported_format, df_dataset, **kwargs)¶

Reads a file with a supported file format to a Polars Dataframe.

Return type:

DataFrame

Parameters:

supported_format (SupportedOutputFileFormat)
df_dataset (BytesIO | Path)
kwargs (Any)

write_from_df(df, supported_format, file_path, **kwargs)¶

Writes to a file with a supported file format from a Dataframe.

Return type:

None

Parameters:

df (DataFrame)
supported_format (SupportedOutputFileFormat)
file_path (str)
kwargs (Any)

write_from_dicts(data, supported_format, file_like)¶

Writes data from a list of dicts to a file of the given format.

Return type:

None

Parameters:

data (list[dict[str, Any]])
supported_format (SupportedOutputFileFormat)
file_like (BufferedWriter)

write_from_lazy_df(ldf, supported_format, file_path, **kwargs)¶

Writes to a file with a supported file format from a LazyFrame.

Return type:

None

Parameters:

ldf (LazyFrame)
supported_format (SupportedOutputFileFormat)
file_path (str)
kwargs (Any)

dapla_pseudo.v1.validation module¶

Builder for submitting a validation request.

class Validator¶

Bases: object

Starting point for validation of datasets.

This class should not be instantiated, only the static methods should be used.

static from_file(file_path_str, **kwargs)¶

Initialize a validation request from a pandas dataframe read from file.

Parameters:

file_path_str (str) – The path to the file to be read.
**kwargs (Any) – Additional keyword arguments to be passed to the file reader.

Raises:

FileNotFoundError – If no file is found at the specified local path.

Returns:

An instance of the _FieldSelector class.

Return type:

_FieldSelector

Examples

    # Read from bucket
    from dapla_pseudo import Validator
    bucket_path = "gs://ssb-staging-dapla-felles-data-delt/felles/smoke-tests/fruits/data.parquet"
    field_selector = Validator.from_file(bucket_path)

    # Read from local filesystem
    from dapla_pseudo import Validator
    local_path = "some_file.csv"
    field_selector = Validator.from_file(local_path)

static from_pandas(dataframe)¶

Initialize a validation request from a pandas DataFrame.

Return type:: _FieldSelector
Parameters:: dataframe (DataFrame)

static from_polars(dataframe)¶

Initialize a validation request from a polars DataFrame.

Return type:: _FieldSelector
Parameters:: dataframe (DataFrame)