Early work on multi-task learning grew out of research on an important problem in machine learning: inductive bias. Learning can be viewed as analyzing empirical data related to a problem and inducing from it a model that captures the problem's essential structure. The role of inductive bias is to guide the learning algorithm's search through the model space; the quality of the resulting model is directly affected by the inductive bias, and no learning system lacking an inductive bias can learn effectively. Different learning algorithms (e.g., decision trees, neural networks, support vector machines) embody different inductive biases, and when solving a practical problem one must manually decide which algorithm to use, which in effect amounts to subjectively choosing an inductive bias. A natural question is whether this choice can itself be automated through learning, i.e., the idea of "learning to learn". Multi-task learning offers a feasible path to realizing this idea: the useful information contained in related tasks provides a stronger inductive bias for learning the task of interest.
C. Multitask Sparsity via Maximum Entropy Discrimination
This paper can be regarded as a fairly comprehensive survey. It discusses four settings in total: feature selection, kernel selection, adaptive pooling, and graphical model structure. It also describes four multi-task learning methods in detail, making it a valuable reference.
[1] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
[2] T. Jebara. Multitask Sparsity via Maximum Entropy Discrimination. Journal of Machine Learning Research, 12:75-110, 2011.
[3] A. Argyriou, T. Evgeniou and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243-272, 2008.
Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately.[1][2][3]
In a widely cited 1997 paper, Rich Caruana gave the following characterization:
Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better.[3]
In the classification context, MTL aims to improve the performance of multiple classification tasks by learning them jointly. One example is spam filtering, which can be treated as distinct but related classification tasks across different users. To make this more concrete, consider that different people have different distributions of features which distinguish spam emails from legitimate ones; for example, an English speaker may find that all emails in Russian are spam, but this does not hold for Russian speakers. Yet there is a definite commonality in this classification task across users, for example one common feature might be text related to money transfer. Solving each user's spam classification problem jointly via MTL can let the solutions inform each other and improve performance.[4] Further examples of settings for MTL include multiclass classification and multi-label classification.[5]
Multi-task learning works because regularization induced by requiring an algorithm to perform well on a related task can be superior to regularization that prevents overfitting by penalizing all complexity uniformly. One situation where MTL may be particularly helpful is when the tasks share significant commonalities and are generally slightly undersampled.[4] However, as discussed below, MTL has also been shown to be beneficial for learning unrelated tasks.[6]
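As a concrete illustration of this regularization effect, the following is a minimal sketch of mean-regularized multi-task learning on synthetic data, in the spirit of regularized multi-task learning but not a reproduction of any cited method: each user's spam classifier gets its own weight vector, yet all weight vectors are pulled toward their shared mean, so undersampled users borrow statistical strength from the others. All data and hyperparameters here are made up.

```python
# Minimal sketch of mean-regularized multi-task learning on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
T, d, n_t = 5, 20, 15              # "users" (tasks), features, samples per user

# Related tasks: each true weight vector is a shared component plus a small
# user-specific deviation, mimicking related spam filters.
w_shared = rng.normal(size=d)
tasks = []
for _ in range(T):
    w_true = w_shared + 0.3 * rng.normal(size=d)
    X = rng.normal(size=(n_t, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_t)
    tasks.append((X, y))

def fit(lam_tie, steps=2000, lr=1e-3):
    """Gradient descent on sum_t ||X_t w_t - y_t||^2 + lam_tie * sum_t ||w_t - w_bar||^2."""
    W = np.zeros((T, d))
    for _ in range(steps):
        w_bar = W.mean(axis=0)
        grad = np.empty_like(W)
        for t, (X, y) in enumerate(tasks):
            # Since sum_t (w_t - w_bar) = 0, the gradient of the tying penalty
            # with respect to w_t reduces to 2 * lam_tie * (w_t - w_bar).
            grad[t] = 2 * X.T @ (X @ W[t] - y) + 2 * lam_tie * (W[t] - w_bar)
        W -= lr * grad
    return W

W_joint = fit(lam_tie=10.0)   # tasks share information through the mean w_bar
W_indep = fit(lam_tie=0.0)    # equivalent to fitting each user separately
```

On held-out data per user, W_joint will typically generalize better than W_indep when n_t is small relative to the number of features, which is exactly the undersampled regime described above.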
Methods
Task grouping and overlap
Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a linear combination of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example with sparsity, overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases.[7] Task relatedness can be imposed a priori or learned from the data.[5][8]
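As an illustration of the basis view described above, one can write each task's parameter vector as w_t = L s_t, where the columns of L are shared basis elements ("latent tasks") and s_t is a sparse code; tasks whose codes have overlapping nonzero entries then belong to the same group. The numpy sketch below only demonstrates this bookkeeping with made-up values; it is not a learning algorithm from the cited work.

```python
# Sketch of the shared-basis view of task grouping (all values illustrative).
import numpy as np

d, k, T = 6, 4, 5                                    # feature dim, basis size, tasks
L = np.random.default_rng(1).normal(size=(d, k))     # shared basis ("latent tasks")

# Sparse codes: which basis elements each task's parameter vector uses.
S = np.array([
    [1.0, 0.5, 0.0, 0.0],   # task 0 -- group A: supported on basis elements {0, 1}
    [0.8, 1.2, 0.0, 0.0],   # task 1 -- group A
    [0.0, 0.0, 1.0, 0.3],   # task 2 -- group B: supported on basis elements {2, 3}
    [0.0, 0.0, 0.7, 1.1],   # task 3 -- group B
    [0.4, 0.0, 0.6, 0.0],   # task 4 -- overlaps both groups
])

W = S @ L.T          # row t is the task parameter vector w_t = L @ s_t, shape (T, d)

# Tasks whose codes share nonzero coordinates lie in overlapping subspaces of the
# basis; counting shared support gives a crude relatedness measure between tasks.
support = (S != 0).astype(int)
overlap = support @ support.T        # overlap[s, t] = number of shared basis elements
print(overlap)
```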
Exploiting unrelated tasks
One can attempt learning a group of principal tasks using a group of auxiliary tasks, unrelated to the principal ones. In many applications, joint learning of unrelated tasks which use the same input data can be beneficial. The reason is that prior knowledge about task relatedness can lead to sparser and more informative representations for each task grouping, essentially by screening out idiosyncrasies of the data distribution. Novel methods which build on a prior multitask methodology by favoring a shared low-dimensional representation within each task grouping have been proposed. The programmer can impose a penalty on tasks from different groups which encourages the two representations to be orthogonal. Experiments on synthetic and real data have indicated that incorporating unrelated tasks can result in significant improvements over standard multi-task learning methods.[6]
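One simple way to realize the orthogonality penalty mentioned above is the squared Frobenius norm of the product of the two groups' representation matrices. The snippet below illustrates just that penalty term; U and V stand in (as an assumption, not code from the cited paper) for the learned low-dimensional projections of the principal and auxiliary task groups.

```python
# Illustrative orthogonality penalty between two task-group representations.
import numpy as np

rng = np.random.default_rng(2)
d, k1, k2 = 30, 5, 4
U = rng.normal(size=(d, k1))   # assumed projection for the principal task group
V = rng.normal(size=(d, k2))   # assumed projection for the auxiliary task group

# Penalty encouraging the two subspaces to be orthogonal: ||U^T V||_F^2 is zero
# exactly when every column of U is orthogonal to every column of V.
penalty = np.linalg.norm(U.T @ V, ord="fro") ** 2

# In training, this term would be added to the sum of the groups' task losses,
# weighted by a trade-off hyperparameter:
#   loss = loss_principal + loss_auxiliary + gamma * penalty
```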
Transfer of knowledge
Related to multi-task learning is the concept of knowledge transfer. Whereas traditional multi-task learning implies that a shared representation is developed concurrently across tasks, transfer of knowledge implies a sequentially shared representation. Large scale machine learning projects such as the deep convolutional neural network GoogLeNet,[9] an image-based object classifier, can develop robust representations which may be useful to further algorithms learning related tasks. For example, the pre-trained model can be used as a feature extractor to perform pre-processing for another learning algorithm. Or the pre-trained model can be used to initialize a model with similar architecture which is then fine-tuned to learn a different classification task.[10]
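A minimal sketch of both reuse patterns, assuming PyTorch and torchvision (>= 0.13 for the weights API) are available; ResNet-18 is used here purely as a stand-in for a pre-trained image classifier such as GoogLeNet, and the 10-class head is hypothetical.

```python
# Minimal sketch of sequential knowledge transfer with a pre-trained image model.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Option 1: use the network as a fixed feature extractor -- freeze its weights.
for p in backbone.parameters():
    p.requires_grad = False

# Replace the final classification layer with a new head for the target task
# (a hypothetical 10-class problem); only this head is trained.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

# Option 2: fine-tuning -- keep all weights trainable and update them with a
# smaller learning rate instead of freezing the backbone, e.g.:
# optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)
```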
Mathematics
Reproducing Hilbert space of vector-valued functions (RKHSvv)
The MTL problem can be cast within the context of RKHSvv (a complete inner product space of vector-valued functions equipped with a reproducing kernel). In particular, recent focus has been on cases where task structure can be identified via a separable kernel, described below. The presentation here derives from Ciliberto et al., 2015.[5]
RKHSvv concepts
Suppose the training data set is $\mathcal{S}_t=\{(x_i^t,y_i^t)\}_{i=1}^{n_t}$, with $x_i^t\in\mathcal{X}$ and $y_i^t\in\mathcal{Y}$, where $t\in\{1,\dots,T\}$ indexes the tasks. Let $n=\sum_{t=1}^{T}n_t$. In this setting there is a consistent input and output space and the same loss function $\mathcal{L}:\mathbb{R}\times\mathbb{R}\rightarrow\mathbb{R}_{+}$ for each task. This results in the regularized machine learning problem:
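A sketch of that regularized problem, following the standard formulation for this setting (the per-task averaging and the single regularization parameter $\lambda$ are assumptions; variants exist):

$$\min_{f\in\mathcal{H}}\;\sum_{t=1}^{T}\frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\big(y_i^t,\,f_t(x_i^t)\big)\;+\;\lambda\,\|f\|_{\mathcal{H}}^{2},$$

where $\mathcal{H}$ is a reproducing kernel Hilbert space of vector-valued functions $f:\mathcal{X}\rightarrow\mathbb{R}^{T}$ whose component $f_t$ is the predictor for task $t$, and $\lambda>0$ controls the trade-off between data fit and regularization.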