差分隐私 EBMs#

API 参考链接:DPExplainableBoostingClassifierDPExplainableBoostingRegressor

有关完整详细信息,请参阅参考论文 [1]。 链接

代码示例

以下代码将使用 adult income 数据集训练一个 DPEBM 分类器。提供的可视化将包含全局解释和局部解释。

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from interpret.privacy import DPExplainableBoostingClassifier
from interpret import show

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"
]
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

feature_types = ['continuous', 'nominal', 'continuous', 'nominal',
    'continuous', 'nominal', 'nominal', 'nominal', 'nominal', 'nominal',
    'continuous', 'continuous', 'continuous', 'nominal']

privacy_bounds = {"Age": (17, 90), "fnlwgt": (12285, 1484705), 
    "EducationNum": (1, 16), "CapitalGain": (0, 99999), 
    "CapitalLoss": (0, 4356), "HoursPerWeek": (1, 99)
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

dpebm = DPExplainableBoostingClassifier(random_state=None, epsilon=1.0, delta=1e-5, 
    feature_types=feature_types, privacy_bounds=privacy_bounds)
dpebm.fit(X_train, y_train)

auc = roc_auc_score(y_test, dpebm.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
AUC: 0.885
show(dpebm.explain_global())




show(dpebm.explain_local(X_test[:5], y_test[:5]), 0)




参考书目

[1] Harsha Nori, Rich Caruana, Zhiqi Bu, Judy Hanwen Shen, and Janardhan Kulkarni. Accuracy, Interpretability, and Differential Privacy via Explainable Boosting. In Proceedings of the 38th International Conference on Machine Learning, 8227-8237. 2021. 论文链接