Kernel ridge regression for supervised classification

1.1 Introduction
In the previous chapter, several prominent supervised linear classifiers were developed. According to Theorem 1.1, they all satisfy the learning subspace property (LSP), and therefore the optimal solution for w has the following form:

$$\mathbf{w} = \sum_{n=1}^{N} a_n \mathbf{x}_n \tag{9.1}$$

for certain coefficients $\{a_n\}$. It follows that the original linear discriminant function

$$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \mathbf{x}^T\mathbf{w} + b$$

may now be re-expressed as

$$f(\mathbf{x}) = \sum_{n=1}^{N} (\mathbf{x}^T\mathbf{x}_n)\, a_n + b \tag{9.2}$$

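which has the appearance of a kernel-based formulation. This chapter will further demonstrate that a nonlinear discriminant function may be obtained by simply replacing the Euclidean inner product $\mathbf{x}^T\mathbf{x}_n$ by a nonlinear function $K(\mathbf{x}, \mathbf{x}_n)$, resulting in

$$f(\mathbf{x}) = \sum_{n=1}^{N} K(\mathbf{x}, \mathbf{x}_n)\, a_n + b \tag{9.3}$$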

By being allowed to choose a desirable nonlinear kernel function, we gain a great deal of design versatility to cope with various application scenarios.
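The kernel substitution described above can be illustrated with a minimal NumPy sketch. The training vectors, coefficients, and threshold below are hypothetical random values chosen only for illustration, and the second-order polynomial kernel is just one possible choice of nonlinear kernel. The sketch checks that, with the linear kernel, the kernel-form discriminant function agrees with the original form w^T x + b.

```python
import numpy as np

def linear_kernel(x, z):
    """Euclidean inner product x^T z."""
    return float(np.dot(x, z))

def poly2_kernel(x, z):
    """Second-order polynomial kernel (1 + x^T z)^2, one possible nonlinear choice."""
    return (1.0 + float(np.dot(x, z))) ** 2

def discriminant(x, X_train, a, b, kernel):
    """Kernel-form discriminant f(x) = sum_n a_n K(x, x_n) + b."""
    return sum(a_n * kernel(x, x_n) for a_n, x_n in zip(a, X_train)) + b

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 3))  # five hypothetical training vectors x_n
a = rng.normal(size=5)             # hypothetical coefficients a_n
b = 0.25                           # hypothetical threshold
x = rng.normal(size=3)             # a test vector

w = X_train.T @ a                  # LSP: w = sum_n a_n x_n
f_linear_direct = float(w @ x + b)
f_linear_kernel = discriminant(x, X_train, a, b, linear_kernel)
f_nonlinear = discriminant(x, X_train, a, b, poly2_kernel)

print(f_linear_direct, f_linear_kernel, f_nonlinear)
```

Only the `kernel` argument changes between the linear and the nonlinear discriminant functions; the coefficients and the summation structure stay the same.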

This chapter will explore the following topics related to kernel-based regressors for supervised classification.

(i) In Section 9.2, the linear classifier FDA is extended to its nonlinear counterpart, named kernel discriminant analysis (KDA). Such a kernel-based classifier is formally derived by optimizing a safety margin measured by the distance metric defined in the kernel-induced intrinsic space. The same result may also be obtained by using the kernel trick, i.e. by simply replacing the linear kernel in FDA with a nonlinear kernel.

(ii) In Section 9.3, we shall develop kernel ridge regression (KRR) learning models. Ridge regression is one of the classic techniques used to enhance the robustness of classifiers. The principle lies in the incorporation of a penalty term, parameterized by the ridge factor ρ, into the loss function in order to control the regressor coefficients. KRR models may be developed via one of the following two kernel-induced vector spaces.

• Intrinsic vector space. In this approach, the classic ridge regression techniques, which were originally applied to the original vector space, will be reapplied to the training feature vectors in the intrinsic space, whose dimension equals the intrinsic degree J.

• Empirical vector space. Using the LSP, we can convert the intrinsic-space optimizer into its empirical-space counterpart. This leads to a kernel-matrix-based learning model whose dimension equals the empirical degree N, i.e. the size of the training dataset.

This section also explores the intricate relationship and tradeoff between the intrinsic and empirical representations.

(iii) In Section 9.4, the perturbational discriminant analysis (PDA), which was originally applied to the original vector space, will be reapplied to the intrinsic space, leading to a kernel-based PDA. The section will also show that the three kernel-based classifiers (KRR, LS-SVM, and PDA) are basically equivalent; therefore, for convenience, the term KRR will be used to represent these three equivalent classifiers.

(iv) The focus of Section 9.5 is on the numerical properties and robustness of the KRR learning models. The robustness analysis is best conducted in the spectral space, because the effect of errors in variables can be isolated componentwise. The regularization effect can be quantitatively expressed by a regression ratio. This offers a simple explanation of how oversubscription of sensitive components can be effectively avoided.

(v) In Section 9.6, simulation studies are conducted on several real-world datasets. The performance comparison between KDA and KRR demonstrates that the robustness and performance of the classifiers can be improved significantly by incorporating a proper ridge parameter ρ into the KRR learning models.

(vi) Section 9.7 presents a new way of exploiting the WEC analysis to improve the learned model. The most alarming aspect of KRR's WEC is that the error $\epsilon_i$ is linearly proportional to the empirical weight $a_i$, implying that erroneous and untrustworthy training vectors unduly receive high empirical weights. This leads to a new pruned-KRR technique (PPDA). The basic principle behind PPDA lies in the removal of a small fraction of potentially harmful "anti-support" vectors from the training dataset. Intuitively, the performance should improve, since the learned model is then freed from the undue influence of the removed subset. In short, PPDA is an iterative algorithm that trims the training dataset to a selective subset. Experimental studies on an ECG dataset demonstrated that each iteration of PPDA can continue improving the prediction performance until the "surviving vectors" have been reduced to a small fraction of the original training size.

(vii) Section 9.8 extends the binary classification formulation to multi-class and, moreover, multi-label classification formulations.

(viii) Unsupervised learning criteria designed for PCA are unsuitable for supervised applications, whose objective is to discriminate among different classes. In this regard, the SNR criterion adopted by FDA is a much more relevant metric. In Section 9.9, a SODA subspace projection method offers a numerically efficient procedure to approximately maximize an SoSNR criterion. On the other hand, PCA's trace-norm optimizer can be naturally modified to permit the teacher's information to be incorporated into the optimizer, thus enhancing the discriminant capability. It can be shown that PCA, MD-FDA, PC-DCA, and DCA all have the same form of trace-norm formulation, and their optimal solutions may be obtained from the (principal or mixed) eigenvectors of their respective discriminant matrices.


1.2 Kernel-based discriminant analysis (KDA)

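There is a standard process for extending linear classification to kernel-based nonlinear classification. First, an intrinsic space can be induced from any given kernel function that meets the Mercer condition. Then most (if not all) linear classifiers, such as LSE or LDA, can be readily applied to the kernel-induced space, resulting in kernel-based nonlinear classifiers. Learning models of this type have been explored by numerous researchers [10, 63, 76, 179, 180, 181, 189, 226, 241], and they are commonly referred to as Fisher discriminant analysis with kernels. To highlight the contrast with its linear variant, i.e. FDA or LDA, we shall name the kernel-based variant kernel discriminant analysis (KDA).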

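With reference to Figure 9.1, the learning of the decision vector $\mathbf{u}$ can be guided by the following desired equality:

$$f(\mathbf{x}_i) = \mathbf{u}^T\boldsymbol{\phi}(\mathbf{x}_i) + b = y_i, \quad i = 1, \ldots, N. \tag{9.4}$$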

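Recall that the dimension of the intrinsic space is called the intrinsic degree J. A useful kernel is expected to be one endowed with an adequate degree. Therefore, we shall simply assume here that J > N. In this case, the number of constraints prescribed by Eq. (9.4) is lower than the number of unknowns. Hence we can make use of the algebraic formulation tailored to an under-determined system of equations, as discussed previously in Section 8.4.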

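For under-determined systems, the objective is to find the classifier with the maximum margin of separation, measured in the kernel-induced intrinsic space. More precisely, the separation margin is now characterized by the norm of the decision vector $\mathbf{u}$. As in Eq. (8.38), this leads to the following optimization formulation:

$$\min_{\mathbf{u},\, b} \; \tfrac{1}{2}\|\mathbf{u}\|^2 \quad \text{subject to } \epsilon_i = 0 \;\; \forall i = 1, \ldots, N, \tag{9.5}$$

where the errors are

$$\epsilon_i \equiv f(\mathbf{x}_i) - y_i = \mathbf{u}^T\boldsymbol{\phi}(\mathbf{x}_i) + b - y_i.$$

Here, a vector $\mathbf{x}$ in the original space is mapped to a new representative vector $\boldsymbol{\phi}(\mathbf{x})$ in its intrinsic space. The dimension of the intrinsic space is denoted by J, which may be finite or infinite.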

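On adopting $\{a_i, i = 1, \ldots, N\}$ as the Lagrange multipliers, the Lagrangian function becomes

$$L(\mathbf{u}, b, \mathbf{a}) = \tfrac{1}{2}\|\mathbf{u}\|^2 - \sum_{i=1}^{N} a_i \epsilon_i = \tfrac{1}{2}\|\mathbf{u}\|^2 - \sum_{i=1}^{N} a_i \left(\mathbf{u}^T\boldsymbol{\phi}(\mathbf{x}_i) + b - y_i\right) = \tfrac{1}{2}\|\mathbf{u}\|^2 - \mathbf{u}^T\boldsymbol{\Phi}\mathbf{a} + \mathbf{a}^T(\mathbf{y} - b\mathbf{e}), \tag{9.6}$$

where $\boldsymbol{\Phi} \equiv [\boldsymbol{\phi}(\mathbf{x}_1) \; \cdots \; \boldsymbol{\phi}(\mathbf{x}_N)]$ is the $J \times N$ matrix of training vectors in the intrinsic space, $\mathbf{y} \equiv [y_1 \; \cdots \; y_N]^T$, and $\mathbf{e} \equiv [1 \; \cdots \; 1]^T$.

The zero gradient of $L(\mathbf{u}, b, \mathbf{a})$ with respect to $\mathbf{u}$ results in the LSP condition:

$$\mathbf{u} = \boldsymbol{\Phi}\mathbf{a}. \tag{9.7}$$

By substituting Eq. (9.7) back into Eq. (9.6), we obtain

$$L(\mathbf{a}, b) = -\tfrac{1}{2}\mathbf{a}^T\mathbf{K}\mathbf{a} + \mathbf{a}^T(\mathbf{y} - b\mathbf{e}), \tag{9.8}$$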

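where we denote the kernel matrix as $\mathbf{K} = \boldsymbol{\Phi}^T\boldsymbol{\Phi} = \{K_{ij}\}$, whose $(i, j)$th element is $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$. The zero gradients of $L(\mathbf{a}, b)$ with respect to $b$ and $\mathbf{a}$ yield, respectively, the conditions

$$\mathbf{e}^T\mathbf{a} = 0$$

and

$$\mathbf{K}\mathbf{a} = \mathbf{y} - b\mathbf{e}.$$

Equivalently, in a more concise matrix form, we have

$$\begin{bmatrix} \mathbf{K} & \mathbf{e} \\ \mathbf{e}^T & 0 \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ b \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ 0 \end{bmatrix}, \tag{9.9}$$

which has the same appearance as Eq. (8.65). Note that, when the kernel matrix $\mathbf{K}$ is nonsingular, it again has a closed-form solution:

$$\mathbf{a} = \mathbf{K}^{-1}[\mathbf{y} - b\mathbf{e}], \quad \text{where } b = \frac{\mathbf{y}^T\mathbf{K}^{-1}\mathbf{e}}{\mathbf{e}^T\mathbf{K}^{-1}\mathbf{e}}. \tag{9.10}$$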

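Generically speaking, $\mathbf{K}$ will be nonsingular if $J \ge N$. Once $\mathbf{a}$ has been optimally solved, the discriminant function can be readily obtained as

$$f(\mathbf{x}) = \mathbf{a}^T\mathbf{k}(\mathbf{x}) + b,$$

where $\mathbf{k}(\mathbf{x}) \equiv [K(\mathbf{x}_1, \mathbf{x}) \; \cdots \; K(\mathbf{x}_N, \mathbf{x})]^T$ is the empirical feature vector.

In the literature, such a closed-form solution is referred to as the kernel discriminant analysis (KDA) solution.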
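As a hedged numerical illustration of the KDA solution, the sketch below solves the bordered linear system [K e; e^T 0][a; b] = [y; 0] directly and compares the result with the closed form a = K^{-1}(y - b e), b = (y^T K^{-1} e)/(e^T K^{-1} e). The Gaussian kernel, the random training data, and all function names are assumptions made for this example only.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def kda_block_solve(K, y):
    """Solve the (N+1) x (N+1) bordered system for (a, b)."""
    N = K.shape[0]
    e = np.ones(N)
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = K
    A[:N, N] = e
    A[N, :N] = e
    sol = np.linalg.solve(A, np.append(y, 0.0))
    return sol[:N], sol[N]

def kda_closed_form(K, y):
    """Closed form, valid when K is nonsingular."""
    e = np.ones(K.shape[0])
    Kinv_y = np.linalg.solve(K, y)
    Kinv_e = np.linalg.solve(K, e)
    b = (e @ Kinv_y) / (e @ Kinv_e)
    a = Kinv_y - b * Kinv_e
    return a, b

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                      # hypothetical training vectors
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])  # hypothetical teacher values
K = rbf_kernel_matrix(X)

a_blk, b_blk = kda_block_solve(K, y)
a_cf, b_cf = kda_closed_form(K, y)
print(np.max(np.abs(a_blk - a_cf)), abs(b_blk - b_cf))
```

Both routes avoid forming an explicit inverse; `np.linalg.solve` is used instead, which is usually the more stable choice.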

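Example 1.1 (XOR dataset) In the XOR dataset, the training vectors are

$$\mathbf{x}_1 = \begin{bmatrix} +1 \\ +1 \end{bmatrix}, \quad \mathbf{x}_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \quad \mathbf{x}_3 = \begin{bmatrix} +1 \\ -1 \end{bmatrix}, \quad \mathbf{x}_4 = \begin{bmatrix} -1 \\ +1 \end{bmatrix},$$

where $\mathbf{x}_1$ and $\mathbf{x}_2$ are positive while $\mathbf{x}_3$ and $\mathbf{x}_4$ are negative. If the kernel function is $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T\mathbf{x}')^2$, the kernel matrix for XOR is

$$\mathbf{K} = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}. \tag{9.11}$$

The closed-form solution

The optimal solution for $\mathbf{a}$ and $b$ can be obtained by solving Eq. (9.9), i.e.

$$\begin{bmatrix} 9 & 1 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 & 1 \\ 1 & 1 & 9 & 1 & 1 \\ 1 & 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ b \end{bmatrix} = \begin{bmatrix} +1 \\ +1 \\ -1 \\ -1 \\ 0 \end{bmatrix}, \tag{9.12}$$

and solving the linear equations gives

$$\mathbf{a} = \begin{bmatrix} \tfrac{1}{8} & \tfrac{1}{8} & -\tfrac{1}{8} & -\tfrac{1}{8} \end{bmatrix}^T \quad \text{and} \quad b = 0.$$

Note that the empirical feature vector $\mathbf{k}(\mathbf{x})$ was derived previously in Eq. (3.50). This leads to the following (nonlinear) decision boundary, where $\mathbf{x} = [x^{(1)} \; x^{(2)}]^T$:

$$f(\mathbf{x}) = \mathbf{a}^T\mathbf{k}(\mathbf{x}) + b = x^{(1)} x^{(2)} = 0.$$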
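The XOR example can be checked numerically. The sketch below assumes the usual XOR layout, with x1 = [+1, +1] and x2 = [-1, -1] as positive vectors and x3 = [+1, -1] and x4 = [-1, +1] as negative vectors, together with the second-order polynomial kernel K(x, x') = (1 + x^T x')^2. Under these assumptions it solves the bordered KDA system and evaluates the resulting discriminant function, which should reduce to the product of the two input coordinates.

```python
import numpy as np

# Assumed XOR training set: first two vectors positive, last two negative.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Second-order polynomial kernel matrix: 9 on the diagonal, 1 elsewhere.
K = (1.0 + X @ X.T) ** 2

# Bordered KDA system [K e; e^T 0][a; b] = [y; 0].
N = len(y)
e = np.ones(N)
A = np.block([[K, e[:, None]], [e[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(A, np.append(y, 0.0))
a, b = sol[:N], sol[N]

def f(x):
    """Discriminant f(x) = a^T k(x) + b with k_i(x) = (1 + x_i^T x)^2."""
    k = (1.0 + X @ np.asarray(x, dtype=float)) ** 2
    return float(a @ k + b)

print(a, b)
print(f([0.5, 2.0]), f([0.3, -0.7]))
```

Expanding the four squared terms by hand shows that the constant, linear, and pure quadratic parts cancel, leaving only the cross term, which is why the decision boundary consists of the two coordinate axes.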

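1.3 Kernel ridge regression (KRR) in supervised classification

This section extends the linear regression/regularization techniques introduced in Chapter 7 to kernel-based learning models, thereby extending linear classification to nonlinear classification.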

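1.3.1 KRR and LS-SVM models: the intrinsic-space approach

The principle of ridge regression lies in the incorporation of a penalty term into the loss function to control the regressor coefficients. Being adjustable via the ridge parameter ρ, the penalty term can effectively mitigate over-fitting or ill-posed problems and thus enhance the robustness of the learned classifier. Therefore, KRR provides a unifying treatment of both over-determined and under-determined systems. KRR is basically equivalent to the least-squares SVM (LS-SVM) proposed by Suykens and Vandewalle [262]. They share the same learning formulation, rooted in the ridge regression (RR) of Eq. (8.49).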

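KRR (LS-SVM) learning models

The learning objective is to find the decision vector $\mathbf{u}$ and the threshold $b$ such that

$$\min_{\mathbf{u},\, b} E(\mathbf{u}, b) = \min_{\mathbf{u},\, b} \left\{ \sum_{i=1}^{N} \epsilon_i^2 + \rho\|\mathbf{u}\|^2 \right\}, \tag{9.13}$$

where

$$\epsilon_i = \mathbf{u}^T\boldsymbol{\phi}(\mathbf{x}_i) + b - y_i \quad \forall i = 1, \ldots, N.$$

In matrix notation, with $\boldsymbol{\Phi}$ denoting the matrix of training vectors in the intrinsic space, $\mathbf{y}$ the teacher vector, and $\mathbf{e}$ the all-ones vector,

$$E(\mathbf{u}, b) = \|\boldsymbol{\Phi}^T\mathbf{u} + \mathbf{e}b - \mathbf{y}\|^2 + \rho\|\mathbf{u}\|^2. \tag{9.14}$$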

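Solution in intrinsic space

The zero-gradient point of $E(\mathbf{u}, b)$ with respect to $\mathbf{u}$ can be obtained as

$$\frac{\partial E(\mathbf{u}, b)}{\partial \mathbf{u}} = 2\boldsymbol{\Phi}(\boldsymbol{\Phi}^T\mathbf{u} + \mathbf{e}b - \mathbf{y}) + 2\rho\mathbf{u} = 0. \tag{9.15}$$

This leads to the following optimal decision vector:

$$\mathbf{u} = [\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \rho\mathbf{I}]^{-1}\boldsymbol{\Phi}[\mathbf{y} - b\mathbf{e}] = [\mathbf{S} + \rho\mathbf{I}]^{-1}\boldsymbol{\Phi}[\mathbf{y} - b\mathbf{e}]. \tag{9.16}$$

Zeroing the first-order gradient of $E(\mathbf{u}, b)$ with respect to $b$ leads to

$$\frac{\partial E(\mathbf{u}, b)}{\partial b} = 2\left(\mathbf{e}^T\boldsymbol{\Phi}^T\mathbf{u} + Nb - \mathbf{e}^T\mathbf{y}\right) = 0. \tag{9.17}$$

On combining Eqs. (9.15) and (9.17), the solution for KRR may be derived from a matrix system similar to Eq. (8.54), i.e.

$$\begin{bmatrix} \mathbf{S} + \rho\mathbf{I} & \boldsymbol{\Phi}\mathbf{e} \\ \mathbf{e}^T\boldsymbol{\Phi}^T & N \end{bmatrix} \begin{bmatrix} \mathbf{u} \\ b \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Phi}\mathbf{y} \\ \mathbf{e}^T\mathbf{y} \end{bmatrix}, \tag{9.18}$$

where $\mathbf{S} = \boldsymbol{\Phi}\boldsymbol{\Phi}^T$. This is the same system as that of the perturbational LSE classifier.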

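1.3.2 Kernelized learning models: the empirical-space approach

Learning subspace property

By Eq. (9.15), we have

$$\mathbf{u} = -\rho^{-1}\boldsymbol{\Phi}[\boldsymbol{\Phi}^T\mathbf{u} + \mathbf{e}b - \mathbf{y}].$$

Thus, there exists an N-dimensional vector $\mathbf{a}$ (which at this moment is still unknown)

such that

$$\mathbf{u} = \boldsymbol{\Phi}\mathbf{a}. \tag{9.19}$$

This establishes the validity of the LSP, i.e. $\mathbf{u} \in \text{span}[\boldsymbol{\Phi}]$. Knowledge of the LSP is instrumental for the actual solution for $\mathbf{u}$. More precisely, on plugging the LSP, Eq. (9.19), into Eq. (9.14), we obtain

$$E'(\mathbf{a}, b) = \|\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a} + \mathbf{e}b - \mathbf{y}\|^2 + \rho\,\mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a} = \|\mathbf{K}\mathbf{a} + \mathbf{e}b - \mathbf{y}\|^2 + \rho\,\mathbf{a}^T\mathbf{K}\mathbf{a}. \tag{9.20}$$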

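Roles of the Mercer condition

Assuming that the Mercer condition is not met and $\mathbf{K}$ is not positive semi-definite, the cost function in Eq. (9.20) has no lower bound. In other words, Eq. (9.20) does not constitute a meaningful minimization criterion. Assuming that the Mercer condition is met, $\mathbf{K}$ will be positive semi-definite, whereupon a legitimate optimal solution always exists.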
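The role of the Mercer condition can be illustrated with one small, hand-picked instance. The sketch below evaluates the empirical-space cost E'(a, b) = ||K a + e b - y||^2 + rho a^T K a along a fixed direction, once for a symmetric matrix with a negative eigenvalue and once for a positive semi-definite one. The 2 x 2 matrices, the teacher vector, and the value of rho are hypothetical and were chosen only so that the effect is easy to verify by hand; this is a single example, not a general proof.

```python
import numpy as np

def krr_cost(a, b, K, y, rho):
    """Empirical-space KRR cost ||K a + e b - y||^2 + rho * a^T K a."""
    e = np.ones(len(y))
    r = K @ a + b * e - y
    return float(r @ r + rho * (a @ K @ a))

y = np.array([1.0, -1.0])       # hypothetical teacher values
rho = 1.0                       # hypothetical ridge parameter
K_bad = np.diag([1.0, -0.5])    # symmetric but not positive semi-definite
K_good = np.diag([1.0, 0.5])    # positive semi-definite

direction = np.array([0.0, 1.0])   # eigenvector of the negative eigenvalue
scales = [1.0, 10.0, 100.0, 1000.0]

# Along this direction the cost for K_bad equals 2 - t - 0.25 t^2.
costs_bad = [krr_cost(t * direction, 0.0, K_bad, y, rho) for t in scales]
# For K_good both terms of the cost are non-negative.
costs_good = [krr_cost(t * direction, 0.0, K_good, y, rho) for t in scales]

print(costs_bad)
print(costs_good)
```

In this instance the cost for the indefinite matrix keeps decreasing as the scale grows, so no minimizer exists, whereas the cost for the positive semi-definite matrix stays non-negative.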

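Derivation of the optimal KRR solution

The zero-gradient point of $E'(\mathbf{a}, b)$ with respect to $\mathbf{a}$ leads to

$$[\mathbf{K} + \rho\mathbf{I}]\mathbf{a} = \mathbf{y} - b\mathbf{e}.$$

The zero-gradient point of $E'(\mathbf{a}, b)$ with respect to $b$ leads to

$$\mathbf{e}^T\mathbf{a} = 0.$$

On combining the two zero-gradient equations, we obtain a formulation just like Eq. (8.71) (except that $\mathbf{K}$ is now the kernel matrix induced by the chosen kernel function):

$$\begin{bmatrix} \mathbf{K} + \rho\mathbf{I} & \mathbf{e} \\ \mathbf{e}^T & 0 \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ b \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ 0 \end{bmatrix}. \tag{9.21}$$

Since $\mathbf{K} + \rho\mathbf{I}$ is always nonsingular if $\rho > 0$, the RR has a closed-form solution as follows:

$$\mathbf{a} = [\mathbf{K} + \rho\mathbf{I}]^{-1}[\mathbf{y} - b\mathbf{e}], \quad \text{where } b = \frac{\mathbf{y}^T[\mathbf{K} + \rho\mathbf{I}]^{-1}\mathbf{e}}{\mathbf{e}^T[\mathbf{K} + \rho\mathbf{I}]^{-1}\mathbf{e}}. \tag{9.22}$$

It follows that the discriminant function is

$$f(\mathbf{x}) = \mathbf{a}^T\mathbf{k}(\mathbf{x}) + b. \tag{9.23}$$

When $\rho \to 0$, Eq. (9.21) converges to the KDA formulation; see Eq. (9.9). Assuming further that $\mathbf{K}$ is invertible, the KRR solution in Eq. (9.22) converges to the KDA solution in Eq. (9.10).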

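The KRR learning model in the empirical space is summarized as follows.

Algorithm 9.1 (KRR learning algorithm in empirical space) Given a training dataset $[\mathcal{X}, \mathcal{Y}]$ and a specific kernel function $K(\mathbf{x}, \mathbf{y})$, the KRR regression algorithm contains the following steps.

• Compute the $N \times N$ symmetric kernel matrix:

$$\mathbf{K} = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & K(\mathbf{x}_1, \mathbf{x}_2) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ K(\mathbf{x}_2, \mathbf{x}_1) & K(\mathbf{x}_2, \mathbf{x}_2) & \cdots & K(\mathbf{x}_2, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & K(\mathbf{x}_N, \mathbf{x}_2) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}. \tag{9.24}$$

• Compute the empirical decision vector $\mathbf{a}$ and the threshold $b$ via Eq. (9.21):

$$\begin{bmatrix} \mathbf{a} \\ b \end{bmatrix} = \begin{bmatrix} \mathbf{K} + \rho\mathbf{I} & \mathbf{e} \\ \mathbf{e}^T & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{y} \\ 0 \end{bmatrix}. \tag{9.25}$$

• The discriminant function is

$$f(\mathbf{x}) = \mathbf{k}(\mathbf{x})^T\mathbf{a} + b = \sum_{i=1}^{N} a_i K(\mathbf{x}_i, \mathbf{x}) + b. \tag{9.26}$$

• The decision rule can be formed as follows:

$$\mathbf{x} \in \begin{cases} \mathcal{C}_+ & \text{if } f(\mathbf{x}) \ge 0 \\ \mathcal{C}_- & \text{if } f(\mathbf{x}) < 0. \end{cases} \tag{9.27}$$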
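The four steps of the algorithm above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the Gaussian kernel, the ridge value rho = 0.1, and the noisy XOR-like training clusters are all hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Matrix of K(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def krr_fit(X, y, rho, kernel):
    """Steps 1 and 2: kernel matrix, then solve [K + rho I, e; e^T, 0][a; b] = [y; 0]."""
    N = X.shape[0]
    K = kernel(X, X)
    e = np.ones((N, 1))
    A = np.block([[K + rho * np.eye(N), e], [e.T, np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(y, 0.0))
    return sol[:N], sol[N]

def krr_discriminant(X_new, X_train, a, b, kernel):
    """Step 3: f(x) = sum_i a_i K(x_i, x) + b for every row x of X_new."""
    return kernel(X_new, X_train) @ a + b

def krr_classify(X_new, X_train, a, b, kernel):
    """Step 4: label +1 if f(x) >= 0, otherwise -1."""
    f = krr_discriminant(X_new, X_train, a, b, kernel)
    return np.where(f >= 0.0, 1, -1)

# Hypothetical training data: ten noisy points around each XOR corner.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
X_train = np.repeat(centers, 10, axis=0) + 0.1 * rng.normal(size=(40, 2))
y_train = np.repeat(labels, 10)

a, b = krr_fit(X_train, y_train, rho=0.1, kernel=gaussian_kernel)
pred = krr_classify(centers, X_train, a, b, gaussian_kernel)
print(pred)
```

Passing the kernel as a function argument keeps the solver independent of the kernel choice, which mirrors the way the algorithm is stated for an arbitrary kernel function.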

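1.3.3 A proof of the equivalence of the two formulations

We have so far independently derived the optimal KRR solutions for both the intrinsic-space and the empirical-space formulations. Logically, the two approaches should lead to the same solution. However, as an alternative verification, we now provide an algebraic method that directly connects the two solutions.

For the kernel-matrix-based approach, via pre-multiplying the linear system solved in Eq. (9.25), i.e. Eq. (9.21), by $[\mathbf{e}^T \;\; -\rho]$ and noting that $\mathbf{K} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}$ and $\mathbf{u} = \boldsymbol{\Phi}\mathbf{a}$, we obtain

$$\mathbf{e}^T\mathbf{y} = \mathbf{e}^T\mathbf{K}\mathbf{a} + Nb = \mathbf{e}^T\boldsymbol{\Phi}^T\mathbf{u} + Nb. \tag{9.28}$$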

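By Woodbury's matrix identity [303], we have

$$[\mathbf{K} + \rho\mathbf{I}]^{-1} = \rho^{-1}\mathbf{I} - \rho^{-1}\boldsymbol{\Phi}^T[\rho\mathbf{I} + \mathbf{S}]^{-1}\boldsymbol{\Phi},$$

and it follows that

$$\boldsymbol{\Phi}[\mathbf{K} + \rho\mathbf{I}]^{-1} = \rho^{-1}\boldsymbol{\Phi} - \rho^{-1}\mathbf{S}[\rho\mathbf{I} + \mathbf{S}]^{-1}\boldsymbol{\Phi} = [\rho\mathbf{I} + \mathbf{S}]^{-1}\boldsymbol{\Phi}. \tag{9.29}$$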
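Both matrix identities used in this step can be sanity-checked numerically. The sketch below draws a random J x N matrix Phi (a hypothetical stand-in for the intrinsic data matrix), forms K = Phi^T Phi and S = Phi Phi^T, and compares the two sides of the Woodbury expansion of (K + rho I)^{-1} as well as the two sides of Phi (K + rho I)^{-1} = (rho I + S)^{-1} Phi. The sizes and the value of rho are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
J, N, rho = 3, 6, 0.5
Phi = rng.normal(size=(J, N))   # hypothetical J x N intrinsic data matrix

K = Phi.T @ Phi                 # N x N kernel matrix
S = Phi @ Phi.T                 # J x J scatter matrix

# Left-hand side: direct inverse of K + rho I.
direct_inv = np.linalg.inv(K + rho * np.eye(N))

# Right-hand side: Woodbury form, which needs only a J x J solve.
woodbury_inv = (np.eye(N)
                - Phi.T @ np.linalg.solve(S + rho * np.eye(J), Phi)) / rho

# Both sides of Phi (K + rho I)^{-1} = (rho I + S)^{-1} Phi.
push_lhs = Phi @ direct_inv
push_rhs = np.linalg.solve(S + rho * np.eye(J), Phi)

print(np.max(np.abs(direct_inv - woodbury_inv)))
print(np.max(np.abs(push_lhs - push_rhs)))
```

The Woodbury form replaces an N x N inversion with a J x J one, which is the practical reason for choosing between the intrinsic and empirical representations according to which of J and N is smaller.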

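From Eqs. (9.19) and (9.22), we obtain

$$\mathbf{u} = \boldsymbol{\Phi}\mathbf{a} = \boldsymbol{\Phi}[\mathbf{K} + \rho\mathbf{I}]^{-1}[\mathbf{y} - b\mathbf{e}] = [\rho\mathbf{I} + \mathbf{S}]^{-1}\boldsymbol{\Phi}[\mathbf{y} - b\mathbf{e}].$$

On pre-multiplying the above by $[\rho\mathbf{I} + \mathbf{S}]$, we arrive at

$$[\rho\mathbf{I} + \mathbf{S}]\mathbf{u} = \boldsymbol{\Phi}\mathbf{y} - \boldsymbol{\Phi}\mathbf{e}b. \tag{9.30}$$

By combining Eqs. (9.28) and (9.30), we obtain the following matrix system:

$$\begin{bmatrix} \mathbf{S} + \rho\mathbf{I} & \boldsymbol{\Phi}\mathbf{e} \\ \mathbf{e}^T\boldsymbol{\Phi}^T & N \end{bmatrix} \begin{bmatrix} \mathbf{u} \\ b \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Phi}\mathbf{y} \\ \mathbf{e}^T\mathbf{y} \end{bmatrix}, \tag{9.31}$$

which is the intrinsic-space system of Eq. (9.18) and is equivalent to the formulation in Eq. (8.54) derived for the original space.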
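The equivalence of the two formulations can also be checked numerically. The sketch below assumes two-dimensional inputs and the second-order polynomial kernel (1 + x^T z)^2, for which an explicit six-dimensional feature map is available. It solves the intrinsic-space system for (u, b) and the empirical-space system for (a, b), then compares u with Phi a. The data, the labels, the ridge value, and the helper name `poly2_features` are hypothetical.

```python
import numpy as np

def poly2_features(X):
    """Explicit feature map for (1 + x^T z)^2 with 2-D inputs; returns a J x N matrix."""
    x1, x2 = X[:, 0], X[:, 1]
    r2 = np.sqrt(2.0)
    return np.vstack([np.ones_like(x1), r2 * x1, r2 * x2,
                      x1 ** 2, r2 * x1 * x2, x2 ** 2])

rng = np.random.default_rng(2)
N, rho = 8, 0.3
X = rng.normal(size=(N, 2))                        # hypothetical training inputs
y = np.where(X[:, 0] * X[:, 1] >= 0.0, 1.0, -1.0)  # hypothetical labels
e = np.ones(N)

Phi = poly2_features(X)   # J x N, here J = 6
J = Phi.shape[0]
S = Phi @ Phi.T           # scatter matrix in the intrinsic space
K = Phi.T @ Phi           # kernel matrix, equal to (1 + X X^T)^2 elementwise

# Intrinsic space: [S + rho I, Phi e; e^T Phi^T, N][u; b] = [Phi y; e^T y].
Phi_e = Phi @ e
A_int = np.block([[S + rho * np.eye(J), Phi_e[:, None]],
                  [Phi_e[None, :], np.array([[float(N)]])]])
sol_int = np.linalg.solve(A_int, np.append(Phi @ y, e @ y))
u, b_int = sol_int[:J], sol_int[J]

# Empirical space: [K + rho I, e; e^T, 0][a; b] = [y; 0].
A_emp = np.block([[K + rho * np.eye(N), e[:, None]],
                  [e[None, :], np.zeros((1, 1))]])
sol_emp = np.linalg.solve(A_emp, np.append(y, 0.0))
a, b_emp = sol_emp[:N], sol_emp[N]

print(np.max(np.abs(u - Phi @ a)), abs(b_int - b_emp))
```

Here J = 6 is smaller than N = 8, so the intrinsic-space system is the smaller of the two; with a Gaussian kernel J is infinite and only the empirical-space route remains available.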
