SSB Coefficient Maker

Features

  • Arbitrary decimal precision support using mpmath for more accurate calculations

  • Validation system for detecting and handling invalid values (NaN, Inf, pd.NA)

  • Coefficient calculation based on formula definitions stored in a dataframe

  • Comprehensive error reporting with detailed diagnostics for debugging formulas

  • Support for mixed operations between DataFrames and Series

  • Configurable precision and error handling to suit different use cases

  • Flexible column naming in coefficient definition tables

Requirements

  • python >=3.10

  • click >=8.0.1

  • pandas >=2.2.3

  • numpy >=2.2.3

  • sympy >=1.13.3

  • mpmath >=1.3.0

  • pydantic >=2.10.6

Installation

You can install SSB Coefficient Maker via pip from PyPI:

pip install ssb-coefficient-maker

# or alternatively, if you're using poetry
poetry add ssb-coefficient-maker

Usage

Basic Formula Evaluation

FormulaEvaluator evaluates mathematical expressions given as strings. The names in an expression refer to the pandas DataFrames and Series supplied in a dictionary:

import pandas as pd
import numpy as np
from ssb_coefficient_maker import FormulaEvaluator

# Create some sample data
data = {
    'matrix_a': pd.DataFrame({
        'col1': [1.0, 2.0, 3.0],
        'col2': [4.0, 5.0, 6.0],
        'col3': [7.0, 8.0, 9.0],
    }),
    'vector_b': pd.Series([10.0, 20.0, 30.0])  # Note: length matches the number of columns in matrix_a
}

# Initialize the evaluator with default settings
evaluator = FormulaEvaluator(data)

# Evaluate a formula
result = evaluator.evaluate_formula('matrix_a * vector_b')
print(result)

This would produce output similar to:

     col1   col2   col3
0   10.0   80.0  210.0
1   20.0  100.0  240.0
2   30.0  120.0  270.0
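
The numbers in this output are consistent with positional, column-wise broadcasting: element j of vector_b scales column j of matrix_a. The sketch below reproduces the same table with plain pandas and NumPy. It only illustrates that arithmetic under this assumption and is not the package's implementation. pandas on its own would try to align the Series index (0, 1, 2) with the column labels rather than match by position, which is why the sketch converts the Series to a NumPy array first.

```python
import pandas as pd

matrix_a = pd.DataFrame({
    "col1": [1.0, 2.0, 3.0],
    "col2": [4.0, 5.0, 6.0],
    "col3": [7.0, 8.0, 9.0],
})
vector_b = pd.Series([10.0, 20.0, 30.0])

# Multiply by position: element j of vector_b scales column j of matrix_a.
# to_numpy() drops the Series index so pandas does not attempt label alignment.
broadcast_result = matrix_a * vector_b.to_numpy()
print(broadcast_result)
```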

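Computing Multiple Coefficients

CoefficientCalculator evaluates a table of formulas, one row per coefficient, and returns the results keyed by coefficient name: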

import pandas as pd
from ssb_coefficient_maker import CoefficientCalculator

# Create input data
data = {
    'input_matrix': pd.DataFrame({
        'A': [1.0, 2.0],
        'B': [3.0, 4.0]
    }),
    'adjustment': pd.Series([0.9, 1.1], index=['A', 'B'])  # Series with column names as index
}

# Define coefficient formulas
coef_map = pd.DataFrame({
    'coefficient_name': ['adjusted_matrix', 'squared_matrix'],
    'formula': ['input_matrix * adjustment', 'input_matrix * input_matrix']
})

# Create calculator with custom column names and safe settings
calculator = CoefficientCalculator(
    data,
    coef_map,
    result_name_col='coefficient_name',  # Specify which column contains result names
    formula_name_col='formula',          # Specify which column contains formulas
    adp_enabled=True,                    # Use arbitrary precision
    fill_invalid=True,                   # Replace invalid values with zeros
    verbose=True                         # Print detailed information during calculation
)

# Compute all coefficients
results = calculator.compute_coefficients()

# Access the results
adjusted = results['adjusted_matrix']
squared = results['squared_matrix']
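
To make the two formulas concrete, the sketch below computes the same products with plain pandas in float64. It assumes the package follows pandas label alignment for this case, so that each entry of adjustment scales the column with the matching name. With adp_enabled=True the package's own results would hold mpmath values rather than floats. The names adjusted_sketch and squared_sketch are illustrative and not part of the package API.

```python
import pandas as pd

input_matrix = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
adjustment = pd.Series([0.9, 1.1], index=["A", "B"])

# The Series index matches the column labels, so pandas scales
# column A by 0.9 and column B by 1.1.
adjusted_sketch = input_matrix * adjustment

# Element-wise square of every cell.
squared_sketch = input_matrix * input_matrix

print(adjusted_sketch)
print(squared_sketch)
```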

Handling Division by Zero

With fill_invalid=True, the evaluator replaces invalid results such as Inf and NaN with zeros:

import pandas as pd
from ssb_coefficient_maker import FormulaEvaluator

# Data with potential division by zero
data = {
    'numerator': pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]}),
    'denominator': pd.DataFrame({'A': [1.0, 0.0], 'B': [0.0, 2.0]})
}

# Safe evaluator that replaces Inf/NaN with zeros
safe_eval = FormulaEvaluator(data, fill_invalid=True)
result = safe_eval.evaluate_formula('numerator / denominator')
print(result)

Output:

     A    B
0  1.0  0.0
1  0.0  2.0
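
The effect of fill_invalid=True on this example can be reproduced by hand. The sketch below uses plain pandas and NumPy to divide the frames and replace non-finite results with zero. How the package performs this step internally is not shown here, so treat the sketch as an assumption about the behaviour rather than its implementation.

```python
import numpy as np
import pandas as pd

numerator = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
denominator = pd.DataFrame({"A": [1.0, 0.0], "B": [0.0, 2.0]})

# Float division by zero yields inf (or NaN for 0/0) instead of raising.
raw = numerator / denominator

# Keep finite cells and put 0.0 wherever the result is inf or NaN.
cleaned = raw.where(np.isfinite(raw), 0.0)
print(cleaned)
```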

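Working with High Precision

The decimal_precision argument sets the number of digits used for arbitrary-precision arithmetic, while adp_enabled=False falls back to standard float64: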

import pandas as pd
from ssb_coefficient_maker import FormulaEvaluator

# Create data with fractions that produce repeating decimals
data = {
    'numerator': pd.Series([1, 2, 1]),
    'denominator': pd.Series([3, 3, 7])
}

# Compare precision differences in division operations
print("Arbitrary precision result (50 digits):")
high_prec = FormulaEvaluator(data, decimal_precision=50)
print(high_prec.evaluate_formula('numerator / denominator'))

print("\nStandard precision result (float64):")
std_prec = FormulaEvaluator(data, adp_enabled=False)
print(std_prec.evaluate_formula('numerator / denominator'))

The results look similar to the following (the # lines are annotations, not printed text):

# Arbitrary precision result (50 digits):
# Each value is stored as an mpmath.mpf object with 50 digits of precision
0    0.33333333333333333333333333333333333333333333333333
1    0.66666666666666666666666666666666666666666666666667
2    0.14285714285714285714285714285714285714285714285714
dtype: object

# Standard precision result (float64):
# Each value is stored as a 64-bit floating point number with ~15-17 significant digits
0    0.3333333333333333
1    0.6666666666666666
2    0.14285714285714285
dtype: float64
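
The gap between the two results comes from the number of significant digits carried through the division. As a rough illustration that needs only the standard library, the sketch below uses Python's decimal module as a stand-in for mpmath (which the package itself uses, according to the feature list) and compares a 50-digit quotient with the float64 one. The helper divide_with_digits is hypothetical and exists only for this example.

```python
from decimal import Decimal, localcontext


def divide_with_digits(numerator: int, denominator: int, digits: int) -> Decimal:
    """Divide two integers using `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # the precision applies only inside this block
        return Decimal(numerator) / Decimal(denominator)


pairs = [(1, 3), (2, 3), (1, 7)]
for num, den in pairs:
    high = divide_with_digits(num, den, 50)
    low = num / den  # ordinary float64 division
    print(f"{num}/{den}: {high} vs {low}")
```

Using localcontext keeps the precision change local, so other decimal arithmetic in the same program is unaffected.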

Please see the Reference Guide for more detailed examples and advanced usage.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, SSB Coefficient Maker is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway’s SSB PyPI Template.