Linear Model
API reference links: LogisticRegression, LinearRegression
See the backing repository for Linear Model here.
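Summary
Linear / logistic regression, where the relationship between the response and its explanatory variables is modeled with linear predictor functions. This is one of the foundational models in statistical modeling: it trains quickly and is highly interpretable, but its predictive performance varies from dataset to dataset. The implementation is a light wrapper around the linear / logistic regression exposed in scikit-learn.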
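How it Works
Christoph Molnar's "Interpretable Machine Learning" e-book [1] has an excellent overview of linear and logistic regression models, which can be found here and here respectively.
For implementation-specific details, scikit-learn's user guide [2] on linear and logistic regression models is solid and can be found here.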
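As a rough illustration of what a "linear predictor function" means for logistic regression, the sketch below fits scikit-learn's own `LogisticRegression` (not the interpret wrapper) on the breast cancer data and recomputes the predicted probabilities by hand as the sigmoid of `X @ w + b`. The scaling step, the `max_iter` value, and the helper name `sigmoid` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaling is only here to help the solver converge quickly (an assumption of this sketch).
X = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000).fit(X, y)


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Linear predictor: weighted sum of the features plus the intercept.
scores = X @ model.coef_[0] + model.intercept_[0]
manual_proba = sigmoid(scores)

# Compare against the positive-class probabilities reported by scikit-learn.
sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

Because the model is just a weighted sum passed through a fixed link function, each coefficient can be read directly as the change in log-odds per unit change of its feature.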
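Code Example
The following code trains a logistic regression on the breast cancer dataset. The visualizations provided cover both global and local explanations.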
# Render interpret visualizations inline.
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from interpret.glassbox import LogisticRegression
from interpret import show
# Fix the seed so the split and the fit are reproducible.
seed = 42
np.random.seed(seed)
# Load the data as a DataFrame and hold out 20% for testing.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)
# Fit the glassbox logistic regression.
lr = LogisticRegression(max_iter=3000, random_state=seed)
lr.fit(X_train, y_train)
# Score the held-out set with the positive-class probabilities.
auc = roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1])
print("AUC: {:.3f}".format(auc))
AUC: 0.998
/opt/hostedtoolcache/Python/3.9.21/x64/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.cn/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.cn/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
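The ConvergenceWarning above suggests either raising `max_iter` or scaling the data. A minimal sketch of the scaling route follows, assuming scikit-learn's own `LogisticRegression` inside a pipeline with `StandardScaler` rather than the interpret wrapper; whether the wrapper composes with pipelines in the same way is not covered here. The names `scaled_lr`, `n_iter`, and `scaled_auc` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

seed = 42
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed
)

# Standardize each feature to zero mean and unit variance before fitting,
# so that lbfgs works on a better-conditioned problem.
scaled_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=seed),
)
scaled_lr.fit(X_train, y_train)

# n_iter_ reports how many solver iterations were actually used.
n_iter = scaled_lr.named_steps["logisticregression"].n_iter_[0]
scaled_auc = roc_auc_score(y_test, scaled_lr.predict_proba(X_test)[:, 1])
print("lbfgs iterations used: {}".format(n_iter))
print("AUC: {:.3f}".format(scaled_auc))
```

One trade-off to keep in mind: after standardization the coefficients describe the effect of a one-standard-deviation change in each feature, not a one-unit change in the original measurement.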
show(lr.explain_global())
show(lr.explain_local(X_test[:5], y_test[:5]), 0)
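For a linear model, one common way to read a local explanation is as per-feature contributions (coefficient times feature value) that add up, together with the intercept, to the model's logit for that row. The sketch below computes this decomposition by hand with scikit-learn's `LogisticRegression`; it illustrates the idea and is not claimed to reproduce exactly what `explain_local` plots. The scaling step and the variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Standardize so contributions are comparable across features (assumption of this sketch).
X = StandardScaler().fit_transform(data.data)
y = data.target
model = LogisticRegression(max_iter=1000).fit(X, y)

# Decompose the logit of the first row into per-feature terms.
row = X[0]
contributions = model.coef_[0] * row
logit = contributions.sum() + model.intercept_[0]

# List the five features that move this prediction the most.
order = np.argsort(-np.abs(contributions))[:5]
for j in order:
    print("{:<25s} {:+.3f}".format(data.feature_names[j], contributions[j]))
```

Positive terms push the prediction toward class 1 and negative terms toward class 0, which is what makes this kind of model easy to inspect one row at a time.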
Further Resources
Bibliography
[1] Christoph Molnar. Interpretable Machine Learning. Lulu.com, 2020.
[2] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.