New approaches for heterogeneous transfer learning
Date of Issue: 2015
School of Computer Engineering
Centre for Computational Intelligence
In many real-world problems, collecting labeled data is time-consuming and expensive. To alleviate this challenge, transfer learning (TL) techniques have been proposed in recent years; they adapt a model from a related task with ample labeled data to a task of interest with little or no additional human supervision. Most TL methods assume that the data from different domains share the same feature space and dimensionality. However, these assumptions may be violated in some real-world applications, such as text-based image classification, cross-language document classification, and cross-system recommendation. When the assumptions do not hold, new TL approaches that handle heterogeneous feature spaces are needed to solve the heterogeneous transfer learning (HTL) problem. In this thesis, three novel HTL approaches are proposed, each addressing a different setting.

Sparse Heterogeneous Feature Representation (SHFR) is developed to address the sparsity issue as well as the HTL multi-class classification problem. By formulating the learning of the feature transformation matrix as a compressed sensing problem, we learn a sparse feature transformation matrix that maps each feature from one domain to the other. Moreover, a generalized HTL error bound is derived under the multi-class setting. Specifically, to guarantee the reconstruction performance and enhance the prediction accuracy, we construct a sufficient number of binary classifiers based on the error-correcting output codes (ECOC) scheme. To further speed up the estimation of the transformation matrix, we also present an efficient batch-mode algorithm to solve the corresponding non-negative sparse recovery problem.

Hybrid Heterogeneous Transfer Learning (HHTL) is proposed to allow the corresponding instances across domains to be biased in either the source or target domain.
In other words, HHTL works well even when the cross-domain correspondences are not drawn independently and identically from the two domains. Our solution uses a deep learning approach to learn a robust feature mapping between cross-domain heterogeneous features, as well as a better feature representation for the mapped data.

Heterogeneous Transfer Learning through Active Correspondences construction (HTLA) is proposed to enable realistic correspondence construction and better distribution matching. HTLA is based on matrix completion with a regularization term on distribution matching. For the cross-language document classification task, constructing correspondences between documents in different languages is very time-consuming because it requires precise translation, yet existing HTL approaches assume these correspondences are given. To reduce the cost of constructing correspondences, we extend our proposed method to actively construct correspondences between domains.

Extensive experimental results on both synthetic and real-world classification tasks are presented to verify the effectiveness of the three proposed approaches against state-of-the-art baselines.
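
As a brief illustration of the ECOC scheme mentioned for SHFR, the following minimal Python sketch shows how a multi-class problem can be split into several binary classifiers and how a class is recovered from their outputs. It is not the thesis's implementation: the four class names, the 7-bit codebook, and the helper functions `hamming`, `min_distance`, and `decode` are all assumptions made for this example.

```python
# Illustrative ECOC decoding sketch. The class names and the codebook are
# assumptions for this example, not taken from the thesis.

# Each row is the codeword of one class. Each column defines one binary
# classifier: it is trained to output that column's bit for every class.
CODEBOOK = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook):
    """Smallest pairwise Hamming distance between any two codewords."""
    words = list(codebook.values())
    return min(
        hamming(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def decode(bits, codebook):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(codebook[label], bits))


# The codeword of class_3 with its sixth bit flipped, as if one of the
# seven binary classifiers had made a mistake.
noisy = (0, 0, 1, 1, 0, 1, 1)
print(min_distance(CODEBOOK))   # 4
print(decode(noisy, CODEBOOK))  # class_3
```

Because the minimum distance between codewords in this codebook is 4, any single wrong binary prediction still decodes to the correct class. This is the sense in which using more binary classifiers than strictly necessary can improve robustness.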