

Multi-task Learning


http://blog.csdn.net/demon7639/article/details/41804619?locationNum=1

Differences among several classification settings: multi-class classification, multi-label classification, multi-instance learning, and multi-task learning



Multi-class classification (multiclass classification)

Each sample belongs to one and only one of several classes; a sample cannot carry more than one class, and the classes are mutually exclusive.

Typical approaches:

One-vs-all (one-vs-rest):

Decompose the N-class problem into N binary classification problems and train N binary classifiers. For the i-th classifier, all samples belonging to class i are positive and all other samples are negative, so each binary classifier separates the samples of class i from the rest.

One-vs-one (all-vs-all):

Train N(N-1)/2 binary classifiers, one for each pair of classes (i, j).
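As a concrete illustration, here is a minimal sketch of both decompositions using scikit-learn's built-in wrappers; the data below is synthetic and the classifier choice is only an example.

```python
# Minimal sketch of one-vs-rest and one-vs-one decomposition with scikit-learn.
# Synthetic data; any binary base classifier could be substituted.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# One-vs-rest: trains 4 binary classifiers, "class i" vs. "all other classes".
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: trains 4*3/2 = 6 binary classifiers, one per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))  # 4 and 6
```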

Multi-label classification (multilabel classification)

Also called multi-label learning. Unlike multi-class classification, a sample may belong to several classes (labels) at once, and the labels may be correlated with one another.

Typical approaches

Problem transformation methods

The core of problem transformation methods is to "adapt the data to existing learning algorithms". The idea is to transform the multi-label training samples so that they fit existing learning algorithms, i.e., to reduce the multi-label learning problem to a learning problem that can already be solved.

Representative algorithms include the first-order method Binary Relevance, which reduces multi-label learning to binary classification; the second-order method Calibrated Label Ranking, which reduces it to label ranking; and the higher-order method Random k-Labelsets, which reduces it to multi-class classification.
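For instance, a hedged sketch of the Binary Relevance idea with scikit-learn: each label column gets its own independent binary classifier, so label correlations are ignored by construction. The data is synthetic.

```python
# Binary Relevance sketch: one independent binary classifier per label.
# Synthetic data for illustration; a real problem would supply its own X, Y.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=5, random_state=0)  # Y: (200, 5) of 0/1 labels

br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
Y_pred = br.predict(X)   # shape (200, 5): an independent 0/1 decision per label
print(Y_pred[:3])
```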

Algorithm adaptation methods

The core of algorithm adaptation methods is to "adapt existing single-label learning algorithms to multi-label data". The basic idea is to modify traditional machine learning methods so that they can handle multi-label problems directly.

Representative algorithms include the first-order method ML-kNN, which adapts the lazy-learning algorithm k-nearest neighbours to multi-label data; the second-order method Rank-SVM, which adapts the kernel-learning algorithm SVM to multi-label data; and the higher-order method LEAD, which adapts Bayesian-network learning to multi-label data.
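The sketch below is not the full ML-kNN algorithm (which estimates label posteriors with Bayesian smoothing); it only illustrates the adaptation idea, predicting each label by a vote among the k nearest neighbours' label sets. Data and threshold are illustrative assumptions.

```python
# Simplified, ML-kNN-flavoured adaptation sketch: predict each label by majority vote
# over the k nearest neighbours' label vectors. The real ML-kNN uses MAP estimation
# with smoothed prior/likelihood counts; this keeps only the "lazy learning" core.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_multilabel_predict(X_train, Y_train, X_test, k=5):
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)         # idx: (n_test, k) neighbour indices
    votes = Y_train[idx].mean(axis=1)      # fraction of neighbours carrying each label
    return (votes >= 0.5).astype(int)      # label is on if at least half the neighbours have it

# Tiny synthetic example
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))
Y_train = (rng.random((100, 3)) < 0.3).astype(int)   # 3 binary labels
X_test = rng.normal(size=(5, 10))
print(knn_multilabel_predict(X_train, Y_train, X_test))
```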

Multi-instance learning

In this setting, the training set consists of labelled bags, each of which contains a number of unlabelled instances. A bag is labelled positive if it contains at least one positive instance, and negative if all of its instances are negative. By learning from the training bags, the goal is a system that predicts the labels of previously unseen bags as accurately as possible.
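As a minimal illustration of the standard multi-instance assumption (a bag is positive iff at least one instance is positive), the sketch below scores instances with an instance-level model and labels the bag by the maximum instance score; the instance scorer here is just a placeholder, not any particular MIL algorithm.

```python
# Multi-instance prediction sketch under the standard MI assumption:
# a bag is positive if at least one of its instances looks positive.
# `instance_scorer` stands in for any instance-level model returning P(positive).
import numpy as np

def predict_bag(bag, instance_scorer, threshold=0.5):
    """bag: array (n_instances, n_features); returns 1 if the bag is predicted positive."""
    scores = instance_scorer(bag)          # one score per instance
    return int(scores.max() >= threshold)  # positive bag <=> some instance scores high

# Toy scorer: sigmoid of the first feature (placeholder for a trained classifier)
toy_scorer = lambda bag: 1.0 / (1.0 + np.exp(-bag[:, 0]))
positive_bag = np.array([[-2.0, 0.1], [3.0, -0.5]])   # contains one "positive-looking" instance
negative_bag = np.array([[-2.0, 0.1], [-1.5, 0.2]])
print(predict_bag(positive_bag, toy_scorer), predict_bag(negative_bag, toy_scorer))  # 1 0
```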

Multi-task learning

Multi-task learning (MTL) is a machine learning approach defined in contrast to single-task learning. In the standard setting, an algorithm learns one task at a time, i.e., the system produces a single real-valued output. A complex learning problem is first decomposed into (theoretically) independent sub-problems, each sub-problem is learned separately, and the results are then combined into a model of the original problem. Multi-task learning is instead a form of joint learning: several tasks are learned in parallel and their solutions influence one another.

The commonly used school data set makes for a simple comparison. It is a regression data set for predicting student exam scores, with 15,362 students from 139 secondary schools, and each school can be treated as one prediction task. Single-task learning either ignores any relation between the tasks and learns 139 separate regression functions, or simply pools the data from all 139 schools and learns a single regression function. Multi-task learning instead emphasizes the relations between the tasks: it jointly learns 139 different regression functions, one per task, accounting both for the differences between tasks and for what they have in common. This is one of the central ideas of multi-task learning.
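A hedged sketch of the two single-task baselines described above (one ridge regression per school vs. one pooled ridge regression over all schools), on synthetic stand-in data since the actual school data set is not included here; a joint multi-task variant is sketched later under "Regularized multi-task learning".

```python
# Two single-task baselines for per-school score prediction (synthetic stand-in data).
# Baseline 1: fit one regressor per school, ignoring relations between schools.
# Baseline 2: pool all schools into one data set and fit a single regressor.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_schools, n_per_school, d = 10, 50, 5          # small stand-in for the 139-school data set
true_w = rng.normal(size=d)                     # shared tendency across schools
Xs = [rng.normal(size=(n_per_school, d)) for _ in range(n_schools)]
ys = [X @ (true_w + 0.3 * rng.normal(size=d)) + 0.1 * rng.normal(size=n_per_school)
      for X in Xs]                              # each school deviates slightly from the shared weights

per_school_models = [Ridge(alpha=1.0).fit(X, y) for X, y in zip(Xs, ys)]   # 10 independent tasks
pooled_model = Ridge(alpha=1.0).fit(np.vstack(Xs), np.concatenate(ys))     # one model for everything
print(per_school_models[0].coef_.round(2), pooled_model.coef_.round(2))
```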

Early work on multi-task learning grew out of research on an important problem in machine learning, inductive bias. Learning can be seen as analysing empirical data related to a problem and inducing from it a model that captures the essence of that problem. The inductive bias guides how a learning algorithm searches the model space, so the quality of the resulting model depends directly on the bias, and a learning system with no inductive bias cannot learn effectively. Different learning algorithms (decision trees, neural networks, support vector machines, and so on) carry different inductive biases, and choosing an algorithm for a practical problem amounts to subjectively choosing an inductive bias. A natural question is whether the choice of inductive bias can itself be automated through learning, i.e., the "learning to learn" idea. Multi-task learning offers one feasible route: it uses the information contained in related tasks to provide a stronger inductive bias for the task of interest.

Typical approaches

Current multi-task learning methods fall roughly into two classes: those that share common parameters across tasks, and those that mine latent features shared by the tasks.

=================================================================================================

----------------------------------------------------------------Personal understanding below---------------------------------------------------------

A single task is an atomic task that cannot be divided further into sub-tasks, for example Chinese word segmentation.

My understanding of multi-task learning is roughly this: take event extraction, where sub-task 1 is trigger identification and sub-task 2 is relation (argument) extraction. We can solve event extraction with a pipeline that first identifies triggers and then classifies the relation between a trigger and its arguments, or with a joint method in which trigger identification and relation classification are performed simultaneously so that the two sub-tasks influence each other.

This understanding does not quite match the definitions above, though, and remains to be confirmed.

========================================================================================================

 

A Brief Introduction to Multi-task Learning


1. What is multi-task learning?

Multi-task learning is a machine learning approach defined in contrast to single-task learning. The commonly used school data set makes for a simple comparison: it is a regression data set for predicting student exam scores, with 15,362 students from 139 secondary schools, and each school can be treated as one prediction task. Single-task learning either ignores any relation between the tasks and learns 139 separate regression functions, or simply pools the data from all 139 schools and learns a single regression function. Multi-task learning instead emphasizes the relations between the tasks: it jointly learns 139 different regression functions, one per task, accounting both for the differences between tasks and for what they have in common. This is one of the central ideas of multi-task learning.

 

2. Advantages of multi-task learning

Multi-task learning has been studied for roughly twenty years. Although it has received less attention than single-task learning, it regularly produces solid results. What, then, are its advantages over single-task learning? As noted above, single-task learning ignores the relations between tasks, yet real-world learning tasks are often connected in many ways, for example multi-label image classification or face recognition, which can each be split into several sub-tasks. The strength of multi-task learning is that it can exploit the relations between these sub-tasks while still distinguishing their differences.

 

3. Multi-task learning methods

Current multi-task learning methods fall roughly into two classes: those that share common parameters across tasks and those that mine latent features shared by the tasks. Below is a brief overview of a few classic multi-task learning papers and the ideas behind them.

A. Regularized multi-task learning

This paper is well worth reading. It proposes a multi-task learning method based on regularized minimization and, taking SVMs as an example, derives a multi-task support vector machine, connecting multi-task learning to the classic single-task SVM and giving a detailed solution procedure; the experiments confirm the advantage of the multi-task SVM. The paper's key assumption is that the decision boundaries of all tasks share a common central boundary and are obtained from it by a task-specific shift, so the offset together with the central boundary determines each task's boundary.
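A minimal sketch of that assumption (every task's weight vector is a shared centre plus a task-specific offset, w_t = w_0 + v_t), implemented here with plain ridge regression on augmented features rather than the paper's SVM formulation. Treating this augmentation as equivalent to the shared-centre penalty is an assumption of the sketch, and the data is synthetic.

```python
# Shared-centre multi-task ridge sketch: w_t = w_0 + v_t, realized by feature augmentation.
# For a sample from task t, the augmented feature is [x, 0, ..., x in block t, ..., 0],
# so a single ridge fit learns w_0 (shared block) and v_1..v_T (task blocks) jointly,
# penalizing ||w_0||^2 + sum_t ||v_t||^2. mu trades off sharing vs. task specificity.
import numpy as np
from sklearn.linear_model import Ridge

def augment(X, task_ids, n_tasks, mu=1.0):
    n, d = X.shape
    Z = np.zeros((n, d * (1 + n_tasks)))
    Z[:, :d] = X                                    # shared block -> w_0
    for i, t in enumerate(task_ids):
        Z[i, d * (1 + t): d * (2 + t)] = mu * X[i]  # task block t -> v_t (scaled by mu)
    return Z

# Synthetic data: 3 related tasks
rng = np.random.default_rng(0)
d, n_tasks = 5, 3
X = rng.normal(size=(300, d))
task_ids = rng.integers(0, n_tasks, size=300)
w0 = rng.normal(size=d)
V = 0.3 * rng.normal(size=(n_tasks, d))
y = np.einsum("ij,ij->i", X, w0 + V[task_ids]) + 0.1 * rng.normal(size=300)

model = Ridge(alpha=1.0, fit_intercept=False).fit(augment(X, task_ids, n_tasks), y)
w0_hat = model.coef_[:d]                            # approximate shared centre
print(w0_hat.round(2), w0.round(2))
```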

 

B. Convex multi-task feature learning

This is another classic multi-task learning paper, and a typical example of models that mine features shared across tasks. It presents a framework for multi-task feature learning that has become a foundation for much later multi-task learning work.
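The paper's framework learns a low-dimensional set of features shared across tasks via an L2,1-type penalty. A loosely related, readily available stand-in is scikit-learn's MultiTaskLasso, which forces the tasks to select a common subset of input features; it is not the algorithm of the paper, just an illustration of joint feature sharing, and it assumes every task observes the same inputs. Data is synthetic.

```python
# Joint feature selection across tasks with an L2,1-style penalty (MultiTaskLasso).
# Not the algorithm of Argyriou et al.; only an illustration of the "shared features" idea.
# Assumes all tasks share the same design matrix X; Y has one column per task.
import numpy as np
from sklearn.linear_model import MultiTaskLasso, Lasso

rng = np.random.default_rng(0)
n, d, T = 200, 30, 4
X = rng.normal(size=(n, d))
W = np.zeros((d, T))
W[:5] = rng.normal(size=(5, T))                 # all tasks truly depend on the same 5 features
Y = X @ W + 0.1 * rng.normal(size=(n, T))

mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)       # rows of coef_ are zeroed jointly across tasks
stl = [Lasso(alpha=0.1).fit(X, Y[:, t]) for t in range(T)]   # independent per-task baselines

shared_support = np.flatnonzero(np.linalg.norm(mtl.coef_, axis=0))
print("features kept jointly:", shared_support)
print("per-task Lasso support sizes:", [np.flatnonzero(m.coef_).size for m in stl])
```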

 

C. Multitask sparsity via maximum entropy discrimination

This paper can be read as a fairly comprehensive survey. It discusses four settings in total: feature selection, kernel selection, adaptive pooling, and graphical model structure, and describes four multi-task learning methods in detail, making it a valuable reference.

This post is only a brief introduction to multi-task learning; for a deeper treatment, the papers above are recommended.

 

References

[1] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.

[2] T. Jebara. Multitask Sparsity via Maximum Entropy Discrimination. Journal of Machine Learning Research, 12:75-110, 2011.

[3] A. Argyriou, T. Evgeniou and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243-272, 2008.

MOE-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China (MultiMedia Computing Group)

=====================================================================================

Multi-task learning

From Wikipedia, the free encyclopedia

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately.[1][2][3]

In a widely cited 1997 paper, Rich Caruana gave the following characterization:

Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better.[3]

In the classification context, MTL aims to improve the performance of multiple classification tasks by learning them jointly. One example is a spam-filter, which can be treated as distinct but related classification tasks across different users. To make this more concrete, consider that different people have different distributions of features which distinguish spam emails from legitimate ones, for example an English speaker may find that all emails in Russian are spam, not so for Russian speakers. Yet there is a definite commonality in this classification task across users, for example one common feature might be text related to money transfer. Solving each user's spam classification problem jointly via MTL can let the solutions inform each other and improve performance.[4] Further examples of settings for MTL include multiclass classification and multi-label classification.[5]
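One way to make the "shared plus user-specific" intuition concrete is to give every email both global token features and a copy of those tokens keyed by user, hashed into one space, in the spirit of the hashing-trick approach cited above; the tokenization and data here are toy placeholders, not the actual method of that paper.

```python
# Per-user spam filtering with shared + user-specific hashed features.
# Each token appears once globally ("money") and once keyed by user ("alice|money"),
# so one jointly trained model can learn both what is spam for everyone and what is
# spam for a particular user. Toy data; only illustrates the joint-learning intuition.
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

def featurize(user, text):
    tokens = text.lower().split()
    return {t: 1 for t in tokens} | {f"{user}|{t}": 1 for t in tokens}

emails = [("alice", "transfer money now", 1),
          ("alice", "meeting notes attached", 0),
          ("boris", "cheap meds transfer money", 1),
          ("boris", "meeting notes attached", 0)]

hasher = FeatureHasher(n_features=2 ** 16)
X = hasher.transform(featurize(u, txt) for u, txt, _ in emails)
y = [label for _, _, label in emails]
clf = LogisticRegression().fit(X, y)
print(clf.predict(hasher.transform([featurize("alice", "money transfer")])))
```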

Multi-task learning works because regularization induced by requiring an algorithm to perform well on a related task can be superior to regularization that prevents overfitting by penalizing all complexity uniformly. One situation where MTL may be particularly helpful is if the tasks share significant commonalities and are generally slightly undersampled.[4] However, as discussed below, MTL has also been shown to be beneficial for learning unrelated tasks.[6]


Methods

Task grouping and overlap

Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a linear combination of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example with sparsity, overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases.[7] Task relatedness can be imposed a priori or learned from the data.[5][8]
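To illustrate the "linear combination of an underlying basis" picture, one can take per-task parameter vectors (however they were obtained) and sparse-code them over a learned dictionary; overlapping non-zero codes then hint at task groups. This uses scikit-learn's generic DictionaryLearning, not any specific MTL algorithm, and the task weights below are synthetic.

```python
# Task grouping sketch: represent each task's weight vector as a sparse combination
# of a small learned basis; tasks whose codes share non-zero atoms form a group.
# Generic dictionary learning, purely for illustration; the weights are synthetic.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
d = 20
basis = rng.normal(size=(3, d))                          # 3 latent basis vectors
# Tasks 0-3 built from atoms {0,1}, tasks 4-7 from atom {2}: two groups.
W = np.vstack([rng.random(2) @ basis[:2] for _ in range(4)] +
              [rng.random(1) @ basis[2:] for _ in range(4)])
W += 0.01 * rng.normal(size=W.shape)

dl = DictionaryLearning(n_components=3, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit_transform(W)                              # (8 tasks, 3 atoms) sparse codes
print(np.abs(codes).round(2))                            # shared non-zero atoms => same group
```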

Exploiting unrelated tasks

One can attempt learning a group of principal tasks using a group of auxiliary tasks, unrelated to the principal ones. In many applications, joint learning of unrelated tasks which use the same input data can be beneficial. The reason is that prior knowledge about task relatedness can lead to sparser and more informative representations for each task grouping, essentially by screening out idiosyncrasies of the data distribution. Novel methods which build on a prior multitask methodology by favoring a shared low-dimensional representation within each task grouping have been proposed. The programmer can impose a penalty on tasks from different groups which encourages the two representations to be orthogonal. Experiments on synthetic and real data have indicated that incorporating unrelated tasks can result in significant improvements over standard multi-task learning methods.[6]
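A minimal sketch of the penalty described here: if U holds the representation learned for the principal tasks and V the one for the auxiliary, unrelated tasks, the term below pushes the two subspaces toward orthogonality. Names and shapes are illustrative assumptions, not the notation of the cited paper.

```python
# Orthogonality penalty between the representation of principal tasks (U)
# and that of unrelated auxiliary tasks (V). Adding lam * orth_penalty(U, V)
# to a joint training loss discourages the two groups from sharing directions.
import numpy as np

def orth_penalty(U, V):
    """||U^T V||_F^2: zero exactly when the column spaces of U and V are orthogonal."""
    return np.sum((U.T @ V) ** 2)

def orth_penalty_grads(U, V):
    """Gradients of ||U^T V||_F^2 with respect to U and V, for a gradient-based trainer."""
    return 2 * V @ (V.T @ U), 2 * U @ (U.T @ V)

U = np.random.default_rng(0).normal(size=(10, 3))   # d x k_principal
V = np.random.default_rng(1).normal(size=(10, 2))   # d x k_auxiliary
print(orth_penalty(U, V))
```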

Transfer of knowledge

Related to multi-task learning is the concept of knowledge transfer. Whereas traditional multi-task learning implies that a shared representation is developed concurrently across tasks, transfer of knowledge implies a sequentially shared representation. Large scale machine learning projects such as the deep convolutional neural network GoogLeNet,[9] an image-based object classifier, can develop robust representations which may be useful to further algorithms learning related tasks. For example, the pre-trained model can be used as a feature extractor to perform pre-processing for another learning algorithm. Or the pre-trained model can be used to initialize a model with similar architecture which is then fine-tuned to learn a different classification task.[10]
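A short sketch of both reuse patterns mentioned above, using a small ResNet from torchvision as a stand-in for the GoogLeNet mentioned in the text (assumes torch and torchvision are installed; the 10-class head is illustrative).

```python
# Two ways to reuse a pre-trained image model for a new task (sketch; assumes torchvision).
import torch
import torch.nn as nn
from torchvision import models

# (a) Frozen feature extractor: keep the pre-trained weights fixed, train only a new head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                           # freeze the shared representation
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new 10-class head (trainable)

# (b) Fine-tuning: start from the pre-trained weights and keep training everything,
#     usually with a small learning rate so the shared representation is only nudged.
finetune = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
finetune.fc = nn.Linear(finetune.fc.in_features, 10)
optimizer = torch.optim.SGD(finetune.parameters(), lr=1e-3, momentum=0.9)

x = torch.randn(2, 3, 224, 224)                       # dummy batch of images
print(backbone(x).shape, finetune(x).shape)           # both: torch.Size([2, 10])
```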

Mathematics

Reproducing kernel Hilbert space of vector-valued functions (RKHSvv)

The MTL problem can be cast within the context of RKHSvv (a complete inner product space of vector-valued functions equipped with a reproducing kernel). In particular, recent focus has been on cases where task structure can be identified via a separable kernel, described below. The presentation here derives from Ciliberto et al., 2015.[5]

RKHSvv concepts

Suppose the training data set is $\mathcal{S}_t = \{(x_i^t, y_i^t)\}_{i=1}^{n_t}$, with $x_i^t \in \mathcal{X}$ and $y_i^t \in \mathcal{Y}$, where $t \in \{1, \dots, T\}$ indexes the task. Let $n = \sum_{t=1}^{T} n_t$. In this setting there is a consistent input and output space and the same loss function $\mathcal{L} : \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}_+$ for each task. This results in the regularized machine learning problem:

$$\min_{f \in \mathcal{H}} \; \sum_{t=1}^{T} \frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\big(y_i^t, f_t(x_i^t)\big) + \lambda \, \|f\|_{\mathcal{H}}^2 \qquad (1)$$

where $\mathcal{H}$ is a vector-valued reproducing kernel Hilbert space with functions $f : \mathcal{X} \rightarrow \mathcal{Y}^T$ having components $f_t : \mathcal{X} \rightarrow \mathcal{Y}$.

The reproducing kernel for the space $\mathcal{H}$ of functions $f : \mathcal{X} \rightarrow \mathbb{R}^T$ is a symmetric matrix-valued function $\Gamma : \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}^{T \times T}$, such that $\Gamma(\cdot, x)c \in \mathcal{H}$ and the following reproducing property holds:

$$\langle f(x), c \rangle_{\mathbb{R}^T} = \langle f, \Gamma(x, \cdot)c \rangle_{\mathcal{H}} \qquad (2)$$

The reproducing kernel gives rise to a representer theorem showing that any solution to equation (1) has the form:

$$f(x) = \sum_{t=1}^{T} \sum_{i=1}^{n_t} \Gamma(x, x_i^t)\, c_i^t \qquad (3)$$
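A compact numerical sketch of equations (1)-(3) for the special case treated next (separable kernel $\Gamma(x, x') = k(x, x')A$, square loss, and, as a simplifying assumption, all tasks observing the same inputs): the representer coefficients then solve the linear system $KCA + n\lambda C = Y$, which diagonalizes in the eigenbases of $K$ and $A$. The symbols and the shared-inputs assumption are mine, not part of the article.

```python
# Multi-task kernel ridge regression with a separable kernel Gamma(x, x') = k(x, x') * A.
# Simplifying assumptions (not from the article): square loss, all T tasks share the same
# n training inputs, Y is the n x T output matrix. The first-order condition of (1) under
# the representer form (3) then reads  K C A + n*lam*C = Y,  solved by diagonalizing K and A.
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_separable_mtl(X, Y, A, lam=0.1, gamma=0.5):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    sk, Uk = np.linalg.eigh(K)                  # K = Uk diag(sk) Uk^T
    sa, Ua = np.linalg.eigh(A)                  # A = Ua diag(sa) Ua^T
    Yt = Uk.T @ Y @ Ua
    Ct = Yt / (np.outer(sk, sa) + n * lam)      # elementwise solve in the eigenbases
    return Uk @ Ct @ Ua.T                       # coefficient matrix C, shape (n, T)

def predict(X_train, C, A, X_new, gamma=0.5):
    return rbf_kernel(X_new, X_train, gamma) @ C @ A   # rows: f_1..f_T evaluated at X_new

# Toy check with two closely related tasks
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
Y = np.hstack([np.sin(3 * X), np.sin(3 * X) + 0.1 * rng.normal(size=(60, 1))])
A = np.array([[1.0, 0.9], [0.9, 1.0]])          # strong positive coupling between the tasks
C = fit_separable_mtl(X, Y, A)
print(predict(X, C, A, np.array([[0.2]])))
```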

Separable kernels

The form of the kernel $\Gamma$ induces both the representation of the feature space and the structure of the output across tasks. A natural simplification is to choose a separable kernel, which factors into separate kernels on the input space $\mathcal{X}$ and on the tasks $\{1, \dots, T\}$. In this case the kernel relating scalar components $f_t$ and $f_s$ is given by $\gamma((x_i, t), (x_j, s)) = k(x_i, x_j)\, k_T(s, t) = k(x_i, x_j)\, A_{s,t}$. For vector-valued functions $f \in \mathcal{H}$ we can write $\Gamma(x_i, x_j) = k(x_i, x_j) A$, where $k$ is a scalar reproducing kernel and $A$ is a symmetric positive semi-definite $T \times T$ matrix. Henceforth denote by $S_+^T$ the set of symmetric positive semi-definite $T \times T$ matrices.

References

Romera-Paredes, B., Argyriou, A., Bianchi-Berthouze, N., & Pontil, M. (2012). Exploiting Unrelated Tasks in Multi-Task Learning. http://jmlr.csail.mit.edu/proceedings/papers/v22/romera12/romera12.pdf
Kumar, A., & Daume III, H. (2012). Learning Task Grouping and Overlap in Multi-Task Learning. http://icml.cc/2012/papers/690.pdf
Jawanpuria, P., & Saketha Nath, J. (2012). A Convex Feature Learning Formulation for Latent Task Structure Discovery. http://icml.cc/2012/papers/90.pdf
Szegedy, C., et al. (2014). Going Deeper with Convolutions. Computer Vision and Pattern Recognition (CVPR), 2015. arXiv:1409.4842, doi:10.1109/CVPR.2015.7298594.
Roig, Gemma. Deep Learning Overview (PDF).
Dinuzzo, Francesco (2011). Learning output kernels with block coordinate descent. Proceedings of the 28th International Conference on Machine Learning (ICML-11).
Jacob, Laurent (2009). Clustered multi-task learning: A convex formulation. Advances in Neural Information Processing Systems.
Attenberg, J., Weinberger, K., & Dasgupta, A. Collaborative Email-Spam Filtering with the Hashing-Trick. http://www.cse.wustl.edu/~kilian/papers/ceas2009-paper-11.pdf
Chappelle, O., Shivaswamy, P., & Vadrevu, S. Multi-Task Learning for Boosting with Application to Web Search Ranking. http://www.cse.wustl.edu/~kilian/papers/multiboost2010.pdf
Description of the RoboEarth Project.
Zhou, J., Chen, J., & Ye, J. (2012). MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University. http://www.public.asu.edu/~jye02/Software/MALSAR
Evgeniou, T., & Pontil, M. (2004). Regularized multi-task learning. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 109-117.
Evgeniou, T., Micchelli, C., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615.
Argyriou, A., Evgeniou, T., & Pontil, M. (2008). Convex multi-task feature learning. Machine Learning, 73, 243-272.
Chen, J., Zhou, J., & Ye, J. (2011). Integrating low-rank and group-sparse structures for robust multi-task learning. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Ji, S., & Ye, J. (2009). An accelerated gradient method for trace norm minimization. Proceedings of the 26th Annual International Conference on Machine Learning, pp. 457-464.
Ando, R., & Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6, 1817-1853.
Chen, J., Tang, L., Liu, J., & Ye, J. (2009). A convex formulation for learning shared structures from multiple tasks. Proceedings of the 26th Annual International Conference on Machine Learning, pp. 137-144.
Chen, J., Liu, J., & Ye, J. (2010). Learning incoherent sparse and low-rank patterns from multiple tasks. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1179-1188.
Jacob, L., Bach, F., & Vert, J. (2008). Clustered multi-task learning: A convex formulation. Advances in Neural Information Processing Systems.
Zhou, J., Chen, J., & Ye, J. (2011). Clustered multi-task learning via alternating structure optimization. Advances in Neural Information Processing Systems.

