Paper title: "Transfer Adaptation Learning: A Decade Survey"

Author: Lei Zhang

Paper link: http://cn.arxiv.org/pdf/1903.04687.pdf

In many practical settings, the source domain and the target domain exhibit:

  • distribution mismatch
  • domain shift
  • a violation of the independent and identically distributed (i.i.d.) assumption!

  • Transfer learning assumes that the source and target domains have different joint probability distributions:
  • \[P(X_{source}, Y_{source}) \neq P(X_{target}, Y_{target})\]
  • Domain adaptation assumes that the source and target domains have different marginal probability distributions but share the same conditional probability distribution:
  • \[P(X_{source}) \neq P(X_{target}), \quad P(Y_{source} | X_{source}) = P(Y_{target} | X_{target})\]

    Instance Re-weighting Adaptation

    When the training and test sets come from different distributions, this is usually called sample selection bias or covariate shift.

    Instance re-weighting methods aim to infer resampling weights directly, by matching the cross-domain feature distributions in a non-parametric way.

    Intuition-Based Re-weighting

    The weights are adjusted directly on the raw data.

    First proposed in the NLP community [1]; a representative method is the well-known TrAdaBoost [2].

    Kernel-Mapping-Based Re-weighting

    The raw data are mapped into a high-dimensional space (e.g., a reproducing kernel Hilbert space, RKHS), where the weights are then adjusted.

    The main idea is to re-weight the source samples so that the means of the source and target data match in the RKHS.

    Two non-parametric statistics are commonly used to measure the distribution discrepancy:

  • Kernel mean matching (KMM)
  • \[\begin{array}{l} {\min \limits_{\beta}\left\|E_{x^{\prime} \sim P_{r}^{\prime}}\left[\Phi\left(x^{\prime}\right)\right]-E_{x \sim P_{r}}[\beta(x) \Phi(x)]\right\|} \\ {\text {s.t.} \quad \beta(x) \geq 0, E_{x \sim P_{r}}[\beta(x)]=1} \end{array} \]

    Huang et al. [3] first proposed adjusting the weight coefficients \(\beta\) of the source samples so that the KMM objective between the weighted source samples and the target samples is minimized.
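    As a rough illustration of the KMM objective above, the following NumPy sketch estimates \(\beta\) by projected gradient descent rather than the quadratic program used in [3]; the function names, the RBF bandwidth, and the simple alternating projection onto the constraints are illustrative choices, not from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kmm_weights(Xs, Xt, gamma=1.0, lr=5.0, steps=1000):
    """Estimate KMM source weights beta by projected gradient descent.

    Minimizes || E_t[phi(x')] - E_s[beta(x) phi(x)] ||^2 in the RKHS,
    subject to beta >= 0 and mean(beta) = 1 (projection is approximate).
    """
    m, n = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, gamma)             # source-source kernel
    kappa = rbf_kernel(Xs, Xt, gamma).sum(1)  # source-target kernel row sums
    beta = np.ones(m)
    for _ in range(steps):
        # gradient of (1/m^2) b'Kb - (2/(mn)) b'kappa
        grad = 2.0 * K @ beta / m**2 - 2.0 * kappa / (m * n)
        beta = beta - lr * grad
        beta = np.clip(beta, 0.0, None)       # enforce beta >= 0
        beta = beta / beta.mean()             # enforce mean(beta) = 1
    return beta
```

    The resulting weighted source mean moves toward the target mean in feature space.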

  • Maximum mean discrepancy (MMD) [4] [5]
  • \[d_{\mathcal{H}}^{2}\left(\mathcal{D}_{s}, \mathcal{D}_{t}\right)=\left\|\frac{1}{M} \sum_{i=1}^{M} \phi\left(x_{i}^{s}\right)-\frac{1}{N} \sum_{j=1}^{N} \phi\left(x_{j}^{t}\right)\right\|_{\mathcal{H}}^{2} \]

    The weighted MMD [6] method additionally accounts for class weight bias.

    Representative methods include KMapWeighted [7], based on k-means clustering, and TJM [8], based on the MMD and the \(\ell_{2,1}\)-norm.
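    The empirical (biased) estimate of the squared MMD above expands into three kernel sums. A minimal NumPy sketch (the RBF bandwidth `gamma` is an arbitrary illustrative choice):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples.

    Expands || mean phi(x_s) - mean phi(x_t) ||_H^2 into kernel sums.
    """
    Kss = rbf_kernel(Xs, Xs, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()
```

    Two samples from the same distribution give a value near zero; a domain shift makes it grow.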

    Co-training-Based Re-weighting

    The main idea is to assume that the dataset can be characterized by two distinct views, and to let two classifiers learn independently, one from each view.

    Representative methods include CODA [9] and the GAN-based RAAN [10].

    Feature Adaptation

    Feature adaptation methods aim to find a common feature representation across multiple sources.

    Based on Feature Subspaces

    These methods assume that the data can be represented by low-dimensional linear subspaces, i.e., a low-dimensional Grassmann manifold is embedded in the high-dimensional data.

    PCA is typically used to construct this manifold, so that the source and target domains can be viewed as two points on it, connected by a geodesic flow.

    Manifold-based methods include SGF [11] and GFK [12].

    Subspace-alignment-based methods include SA [13], SDA [14], and GTH [15].
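    Of these, subspace alignment (SA [13]) has a particularly compact closed form: compute PCA bases Ps and Pt for the two domains, then align the source basis with M = Ps^T Pt. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def pca_basis(X, k):
    """Top-k PCA basis (d x k) of the row-sample matrix X."""
    Xc = X - X.mean(0)
    # right singular vectors of the centered data = principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def subspace_alignment(Xs, Xt, k):
    """Subspace alignment: project the source into the target PCA subspace.

    Learns M = Ps^T Pt so that the aligned source basis Ps @ M
    approximates the target basis Pt, then projects both domains.
    """
    Ps, Pt = pca_basis(Xs, k), pca_basis(Xt, k)
    M = Ps.T @ Pt                      # closed-form alignment matrix
    Zs = (Xs - Xs.mean(0)) @ Ps @ M    # aligned source features
    Zt = (Xt - Xt.mean(0)) @ Pt        # target features
    return Zs, Zt
```

    When the two domains coincide, M reduces to the identity and both projections agree exactly.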

    Based on Feature Transformation

    Feature transformation methods aim to learn a transformation or projection matrix that brings the source- and target-domain data closer under some distribution-distance criterion.

    These methods solve for the optimal projection matrix by reducing the marginal- and conditional-distribution discrepancies between domains.

    Representative methods include:

    TCA [16], based on the marginal-distribution MMD, and JDA [17], based on the conditional-distribution MMD;

    TSL [18], based on the Bregman divergence;

    methods based on the Hilbert-Schmidt Independence Criterion [19].
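    A simplified sketch of the TCA idea with a linear kernel: build the MMD coefficient matrix L and the centering matrix H on the stacked data, then take the leading eigenvectors of (KLK + μI)^{-1} KHK. This is an illustrative reduction of [16], not the paper's full formulation:

```python
import numpy as np

def tca(Xs, Xt, k=2, mu=1.0):
    """Transfer Component Analysis sketch with a linear kernel.

    Finds components that shrink the MMD between domains while
    preserving variance, via the generalized eigenproblem on
    (K L K + mu I)^{-1} K H K.
    """
    m, n = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    K = X @ X.T                                   # linear kernel
    # MMD coefficient matrix L: (1/m, ..., -1/n, ...) outer product
    e = np.concatenate([np.full(m, 1.0 / m), np.full(n, -1.0 / n)])
    L = np.outer(e, e)
    H = np.eye(m + n) - np.full((m + n, m + n), 1.0 / (m + n))  # centering
    A = np.linalg.solve(K @ L @ K + mu * np.eye(m + n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    idx = np.argsort(-vals.real)[:k]              # top-k transfer components
    W = vecs[:, idx].real
    Z = K @ W                                     # embedded source + target
    return Z[:m], Z[m:]
```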

    Based on Metric Learning

    These methods learn a good distance metric in the labeled source domain so that it can be applied to a related but different target domain.

    Representative methods include:

    RTML [20], based on first-order statistics;

    CORAL [21], based on second-order statistics.
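    CORAL [21] in particular reduces to two matrix square roots: whiten the source covariance, then re-color it with the target covariance. A NumPy sketch (the `eps` ridge term is an illustrative regularizer):

```python
import numpy as np

def coral(Xs, Xt, eps=1e-6):
    """CORAL: align the second-order statistics of source to target.

    Whitens the source covariance, then re-colors with the target
    covariance: Xs_new = Xs_c @ Cs^{-1/2} @ Ct^{1/2}.
    """
    def mat_power(C, p):
        # symmetric fractional matrix power via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** p) @ V.T

    Xs_c = Xs - Xs.mean(0)
    Xt_c = Xt - Xt.mean(0)
    Cs = np.cov(Xs_c, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt_c, rowvar=False) + eps * np.eye(Xt.shape[1])
    return Xs_c @ mat_power(Cs, -0.5) @ mat_power(Ct, 0.5)
```

    After the transform, the source covariance matches the target covariance up to the ridge term.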

    Based on Feature Augmentation

    These methods assume that the features fall into three types: common features, source-domain features, and target-domain features.

    Representative methods include:

  • EasyAdapt (EA) [22], based on zero padding
  • methods based on generative models [23]

    Based on Feature Reconstruction

    Representative methods include:

  • low-rank reconstruction [24]
  • sparse reconstruction [25]

    Based on Feature Encoding

    Representative methods include:

  • domain-shared dictionaries [26]
  • domain-specific dictionaries [27]

    Classifier Adaptation

    Classifier adaptation aims to learn a general classifier from the labeled data in the source domain together with a small amount of labeled data in the target domain.

    Based on Kernel Classifiers

    Representative methods include:

    the adaptive support vector machine (ASVM) [28];

    domain-transfer classifiers based on multiple kernel learning (MKL) [29].

    Based on Manifold Regularization

    Representative methods include ARTL [30], DMM [31], and MEDA [32].

    Based on Bayesian Classifiers

    A representative method is kernelized Bayesian transfer learning, KBTL [33].

    Deep Network Adaptation

    In 2014, Yosinski et al. [34] studied how transferable the features of different layers of a deep neural network are.

    Based on Marginal Distribution Alignment

    Representative methods include:

  • deep domain confusion, DDC [35]
  • the deep adaptation network, DAN [36]
  • the joint adaptation network, JAN [37], which also introduced the Joint MMD criterion

    Based on Conditional Distribution Alignment

    A representative method is the deep transfer network, DTN [38].

    Based on Autoencoders

    A representative method is the marginalized stacked denoising autoencoder, mSDA [39].
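    The appeal of mSDA [39] is that marginalizing over infinitely many feature-dropout corruptions yields a closed-form linear denoising mapping W = PQ^{-1}. A single-layer NumPy sketch of that formulation (names and the ridge term `reg` are illustrative):

```python
import numpy as np

def msda_layer(X, p=0.5, reg=1e-8):
    """One mSDA layer: closed-form marginalized denoising mapping.

    X is (d x n), features x samples. Returns the (d x (d+1)) mapping W
    that reconstructs X from corrupted copies (each feature dropped with
    probability p), with the corruption marginalized out in closed form.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])      # append a bias feature
    q = np.concatenate([np.full(d, 1.0 - p), [1.0]])  # survival probs
    S = Xb @ Xb.T                             # scatter matrix
    Q = S * np.outer(q, q)                    # E[x_tilde x_tilde^T], off-diag
    np.fill_diagonal(Q, q * np.diag(S))       # diagonal survives with prob q
    P = S[:d, :] * q                          # E[x x_tilde^T]
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T  # W = P Q^{-1}
    return W, np.tanh(W @ Xb)                 # mapping and hidden features
```

    With p = 0 (no corruption) the mapping reduces to ordinary least-squares reconstruction of X from itself.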

    Adversarial Adaptation

    These methods reduce the domain discrepancy through an adversarial objective (e.g., a domain discriminator).

    Based on Gradient Reversal

    Ganin et al. [40] first showed that domain adaptation can be achieved by adding a simple gradient reversal layer (GRL).
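    The contract of a GRL is simple: identity on the forward pass, negated (scaled) gradient on the backward pass, so the feature extractor is pushed to maximize the domain discriminator's loss. A framework-free sketch of that contract (in practice this is written as a custom autograd function, e.g. in PyTorch):

```python
import numpy as np

class GradientReversal:
    """Minimal gradient reversal layer (GRL) sketch.

    Forward pass is the identity; the backward pass multiplies the
    incoming gradient by -lam, reversing the training signal that
    flows from the domain discriminator into the feature extractor.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                        # identity in the forward pass

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed (negated, scaled) gradient
```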

    Based on Minimax Optimization

    Ajakan et al. [41] first combined a classification loss with an adversarial objective, proposing the DANN method.

    Other methods include:

    adversarial discriminative domain adaptation, ADDA [42];

    conditional adversarial domain adaptation, CDAN [43];

    maximum classifier discrepancy, MCD [44].

    Based on Generative Adversarial Networks

    Representative methods include:

    CyCADA [45];

    Duplex GAN [46].

    Benchmark Datasets

  • Office-31 (3DA)
  • Office+Caltech-10 (4DA)
  • MNIST+USPS
  • Multi-PIE
  • COIL-20
  • MSRC+VOC2007
  • IVLSC
  • Cross-dataset Testbed
  • Office-Home
  • ImageCLEF
  • PACS
  • J. Jiang and C. Zhai, Instance weighting for domain adaptation in nlp , in ACL, 2007, pp. 264–271. ↩︎

  • W. Dai, Q. Yang, G. R. Xue, and Y. Yu, Boosting for transfer learning , in ICML, 2007, pp. 193–200. ↩︎

  • J. Huang, A. Smola, A. Gretton, K. Borgwardt, and B. Scholkopf, Correcting sample selection bias by unlabeled data , in NIPS, 2007, pp. 1–8. ↩︎

  • A. Gretton, K. Borgwardt, M. Rasch, B. Schoelkopf, and A. Smola, A kernel method for the two-sample-problem , in NIPS, 2006. ↩︎

  • A. Gretton, K. Borgwardt, M. Rasch, B. Scholkopf, and A. Smola, A kernel two-sample test , Journal of Machine Learning Research, pp. 723–773, 2012 ↩︎

  • H. Yan, Y. Ding, P. Li, Q. Wang, Y. Xu, and W. Zuo, Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation , in CVPR, 2017, pp. 2272–2281 ↩︎

  • E. H. Zhong, W. Fan, J. Peng, K. Zhang, J. Ren, D. S. Turaga, and O. Verscheure, Cross domain distribution adaptation via kernel mapping , in ACM SIGKDD, 2009, pp. 1027–1036. ↩︎

  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, Transfer joint matching for unsupervised domain adaptation , in CVPR, 2014, pp. 1410–1417. ↩︎

  • M. Chen, K. Q. Weinberger, and J. C. Blitzer, Co-training for domain adaptation , in NIPS, 2011. ↩︎

  • Q. Chen, Y. Liu, Z. Wang, I. Wassell, and K. Chetty, Re-weighted adversarial adaptation network for unsupervised domain adaptation , in CVPR, 2018, pp. 7976–7985. ↩︎

  • R. Gopalan, R. Li, and R. Chellappa, Domain adaptation for object recognition: An unsupervised approach , in ICCV, 2011, pp. 999–1006 ↩︎

  • B. Gong, Y. Shi, F. Sha, and K. Grauman, Geodesic flow kernel for unsupervised domain adaptation , in CVPR, 2012, pp. 2066–2073 ↩︎

  • B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, Unsupervised visual domain adaptation using subspace alignment , in ICCV, 2013, pp. 2960–2967. ↩︎

  • B. Sun and K. Saenko, Subspace distribution alignment for unsupervised domain adaptation , in BMVC, 2015, pp. 24.1–24.10. ↩︎

  • J. Liu and L. Zhang, Optimal projection guided transfer hashing for image retrieval , in AAAI, 2018. ↩︎

  • S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, Domain adaptation via transfer component analysis , IEEE Trans. Neural Networks, vol. 22, no. 2, p. 199, 2011 ↩︎

  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, Transfer feature learning with joint distribution adaptation , in ICCV, 2014, pp. 2200–2207. ↩︎

  • S. Si, D. Tao, and B. Geng, Bregman divergence-based regularization for transfer subspace learning , IEEE Trans. Knowledge and Data Engineering, vol. 22, no. 7, pp. 929–942, 2010. ↩︎

  • A. Gretton, O. Bousquet, A. Smola, and B. Scholkopf, Measuring statistical dependence with hilbert-schmidt norms , in ALT, 2005. ↩︎

  • Z. Ding and Y. Fu, Robust transfer metric learning for image classification , IEEE Trans. Image Processing, vol. 26, no. 2, pp. 660–670, 2017. ↩︎

  • B. Sun, J. Feng, and K. Saenko, Return of frustratingly easy domain adaptation , in AAAI, 2016, pp. 153–171. ↩︎

  • H. Daume III, Frustratingly easy domain adaptation , in arXiv, 2009. ↩︎

  • R. Volpi, P. Morerio, S. Savarese, and V. Murino, Adversarial feature augmentation for unsupervised domain adaptation , in CVPR, 2018, pp. 5495–5504. ↩︎

  • I. H. Jhuo, D. Liu, D. T. Lee, and S. F. Chang, Robust visual domain adaptation with low-rank reconstruction , in CVPR, 2012, pp. 2168–2175. ↩︎

  • L. Zhang, W. Zuo, and D. Zhang, Lsdt: Latent sparse domain transfer learning for visual adaptation , IEEE Trans. Image Processing, vol. 25, no. 3, pp. 1177–1191, 2016. ↩︎

  • S. Shekhar, V. Patel, H. Nguyen, and R. Chellappa, Generalized domain-adaptive dictionaries , in CVPR, 2013, pp. 361–368. ↩︎

  • F. Zhu and L. Shao, Weakly-supervised cross-domain dictionary learning for visual recognition , International Journal of Computer Vision, vol. 109, no. 1-2, pp. 42–59, 2014. ↩︎

  • J. Yang, R. Yan, and A. G. Hauptmann, Cross-domain video concept detection using adaptive svms , in ACM MM, 2007, pp. 188–197. ↩︎

  • L. Duan, I. Tsang, D. Xu, and S. Maybank, Domain transfer svm for video concept detection , in CVPR, 2009 ↩︎

  • M. Long, J. Wang, G. Ding, S. Pan, and P. Yu, Adaptation regularization: a general framework for transfer learning , IEEE Trans. Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014. ↩︎

  • Y. Cao, M. Long, and J. Wang, Unsupervised domain adaptation with distribution matching machines , in AAAI, 2018 ↩︎

  • J. Wang, W. Feng, Y. Chen, H. Yu, M. Huang, and P. S. Yu, Visual domain adaptation with manifold embedded distribution alignment , in ACM MM, 2018. ↩︎

  • M. Gonen and A. Margolin, Kernelized bayesian transfer learning , in AAAI, 2014, pp. 1831–1839. ↩︎

  • J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks , in NIPS, 2014. ↩︎

  • E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, Deep domain confusion: Maximizing for domain invariance , arXiv, 2014 ↩︎

  • M. Long, Y. Cao, J. Wang, and M. I. Jordan, Learning transferable features with deep adaptation networks , in ICML, 2015, pp. 97–105. ↩︎

  • M. Long, H. Zhu, J. Wang, and M. Jordan, Deep transfer learning with joint adaptation networks , in ICML, 2017. ↩︎

  • X. Zhang, F. Yu, S. Wang, and S. Chang, Deep transfer network: Unsupervised domain adaptation , in arXiv, 2015. ↩︎

  • M. Chen, Z. Xu, K. Weinberger, and F. Sha, Marginalized denoising autoencoders for domain adaptation , in ICML, 2012 ↩︎

  • Y. Ganin and V. Lempitsky, Unsupervised domain adaptation by backpropagation , in arXiv, 2015. ↩︎

  • H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, and M. Marchand, Domain-adversarial neural network , in arXiv, 2015 ↩︎

  • E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, Adversarial discriminative domain adaptation , in CVPR, 2017, pp. 7167–7176 ↩︎

  • M. Long, Z. Cao, J. Wang, and M. I. Jordan, Conditional adversarial domain adaptation , in NIPS, 2018. ↩︎

  • K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, Maximum classifier discrepancy for unsupervised domain adaptation , in CVPR, 2018, pp. 3723–3732. ↩︎

  • J. Hoffman, E. Tzeng, T. Park, and J. Zhu, Cycada: Cycleconsistent adversarial domain adaptation , in ICML, 2018. ↩︎

  • L. Hu, M. Kan, S. Shan, and X. Chen, Duplex generative adversarial network for unsupervised domain adaptation , in CVPR, 2018, pp. 1498–1507. ↩︎
