* overview
* study
* challenges
* question answering
* industry
* interesting papers
- document models
- ranking
- recommendations
- entity-centric search
- intent search
[overview]
Amit Singhal - "Constructing the Conversational Computer" - https://vimeo.com/84711332
Oren Etzioni - "The Future of Semantic Web Search" - http://tce.technion.ac.il/files/2013/06/OrenEtzioni.pptx + http://youtube.com/watch?v=qsK-yQR1cGs
Oren Etzioni - "Open Information Extraction as Web Scale" - http://acml2011.wikispaces.com/file/view/Oren%20Etzioni.pdf
Peter Meyers - "Beyond 10 Blue Links - The Future of Ranking" - http://slideshare.net/crumplezone/beyond-10-blue-links-the-future-of-ranking
Peter Mika - "Semantic Search On The Rise" - http://labs.yahoo.com/_c/uploads/SemTech2014-v2.pptx + http://youtube.com/watch?v=Dw2OhqvB0cE
"Making the Web Searchable" - http://slideshare.net/pmika1/sem-search-icwe
[study]
course by Chris Manning
http://youtube.com/watch?v=5L1qemKyUKA&index=75&list=PL6397E4B26D00A269
course by Victor Lavrenko
http://homepages.inf.ed.ac.uk/vlavrenk/tts.html
http://youtube.com/user/victorlavrenko/playlists?view=1&sort=dd
course by Mail.ru (in Russian)
https://youtube.com/playlist?list=PLrCZzMib1e9o_BlrSB5bFkLq8h2i4pQjz
https://youtube.com/playlist?list=PLrCZzMib1e9o7YIhOfJtD1EaneGOGkN-_
http://habrahabr.ru/company/mailru/blog/257119/
course by Yandex (in Russian)
https://compscicenter.ru/courses/information-retrieval/2016-autumn/
https://compsciclub.ru/courses/informationretrieval
course by Nikita Zhiltsov (in Russian)
http://nzhiltsov.github.io/IR-course/
introduction to ranking by Nikita Volkov (in Russian)
https://youtube.com/watch?v=GctrEpJinhI
https://youtube.com/watch?v=GZmXKBzIfkA
seminars at Yandex
http://youtube.com/playlist?list=PLJOzdkh8T5kqsYZSzcpKoS8dwf4uizFDk
http://youtube.com/playlist?list=PLJOzdkh8T5krfgXb4peSed78ODO7tUFHf
"Neural Text Embeddings for Information Retrieval" tutorial at WSDM 2017 -
https://slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017
"An Introduction to Information Retrieval" book by Manning, Raghavan, Schutze - http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf
"Search Engines. Information Retrieval in Practice" book by Croft, Metzler, Strohman - http://ciir.cs.umass.edu/irbook/
[challenges]
- Full text document retrieval, passage retrieval, question answering
- Web search, searching social media, distributed information retrieval, entity ranking
- Learning to rank combined with neural network based representation learning
- User and task modelling, personalized search, diversity
- Query formulation assistance, query recommendation, conversational search
- Multimedia retrieval
- Learning dense representations for long documents
- Dealing with rare queries and rare words
- Modelling text at different granularities (character, word, passage, document)
- Compositionality of vector representations
- Jointly modelling queries, documents, entities and other structured/knowledge data
[question answering]
"Some people work on picking the best answer passage, others focus on "simply" translating the question to a structured database query, some try to replicate the classic pipeline model and QANTA is "storing" the content using recursive neural networks and distributed representations."
"The state-of-the-art techniques in open question answering can be classified into two main classes, namely, information retrieval based and semantic parsing based.
Information retrieval methods first retrieve a broad set of candidate answers by querying the search API of knowledge base with a transformation of the question into a valid query and then use fine-grained detection heuristics to identify the exact answer.
Semantic parsing methods focus on the correct interpretation of the meaning of a question by a semantic parsing system. A correct interpretation converts a question into the exact database query that returns the correct answer."
(Fernando Pereira) "When Google gets a query like, “Where do the Giants play?” it has to know a lot of things: that the query involves sports, that a team “plays” at a home stadium, and so on. And it has to make choices — is this the baseball Giants or the football team? Does the user want to know where the team usually plays its games, i.e. the home stadium, or where it’s playing next week? Google uses signals and previous user behavior to nail the answer. “All that figuring out, all that inference, is stuff we do now that we were not doing a few years ago”.
approaches for question answering over knowledge graphs:
- entity embedding
- shallow parsing and retrieval
- query graph matching
- query semantic parsing
see sections "[question answering over knowledge graphs]" and "[question answering over texts]" of https://github.com/brylevkirill/notes/blob/master/Knowledge%20Representation%20and%20Reasoning.md
[industry]
http://time.com/google-now/
http://venturebeat.com/2015/11/30/the-4-things-google-believes-are-key-to-the-future-of-search/
http://thesempost.com/rankbrain-everything-we-know-about-googles-ai-algorithm/
http://techcrunch.com/2015/09/07/facebooks-messenger-and-the-challenge-to-googles-search-dominance/
https://medium.com/backchannel/how-google-search-dealt-with-mobile-33bc09852dc9
https://medium.com/backchannel/google-search-will-be-your-next-brain-5207c26e4523
http://thenextweb.com/apps/2015/02/26/vurb-is-a-mobile-search-engine-that-helps-you-get-things-done-without-jumping-between-apps/
http://www.ft.com/intl/cms/s/0/4f2f97ea-b8ec-11e4-b8e6-00144feab7de.html#slide0
https://quora.com/Why-is-machine-learning-used-heavily-for-Googles-ad-ranking-and-less-for-their-search-ranking
http://anand.typepad.com/datawocky/2008/05/are-human-experts-less-prone-to-catastrophic-errors-than-machine-learned-models.html
http://moz.com/blog/101-google-answer-boxes-a-journey-into-the-knowledge-graph
https://blogs.dropbox.com/tech/2015/03/firefly-instant-full-text-search-engine/
selected papers - https://dropbox.com/sh/pvpzyxfcpy39j8p/AACduJ-pVF9Lh-gn3_SExj1va
interesting papers (see below):
- ranking
- document models
- entity-centric search
- intent search
interesting papers (see https://github.com/brylevkirill/notes/blob/master/Knowledge%20Representation%20and%20Reasoning.md):
- question answering over knowledge bases
- question answering over texts
- information extraction and integration
[interesting papers]
Zhang et al. - "Neural Information Retrieval: A Literature Review" [https://arxiv.org/abs/1611.06792]
Nogueira, Cho - "End-to-End Goal-Driven Web Navigation" [http://arxiv.org/pdf/1602.02261]
"We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a website, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is required to have sophisticated high-level reasoning based on natural languages and efficient sequential decision-making capability to succeed. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we make WikiNav, a dataset constructed from the English Wikipedia. We extensively evaluate different variants of neural net based artificial agents on WikiNav and observe that the proposed goal-driven web navigation well reflects the advances in models, making it a suitable benchmark for evaluating future progress. Furthermore, we extend the WikiNav with questionanswer pairs from Jeopardy! and test the proposed agent based on recurrent neural networks against strong inverted index based search engines. The artificial agents trained on WikiNav outperforms the engined based approaches, demonstrating the capability of the proposed goal-driven navigation as a good proxy for measuring the progress in real-world tasks such as focused crawling and question-answering."
"In this work, we describe a large-scale goal-driven web navigation task and argue that it serves as a useful test bed for evaluating the capabilities of artificial agents on natural language understanding and planning. We release a software tool, called WebNav, that compiles a given website into a goal-driven web navigation task. As an example, we construct WikiNav from Wikipedia using WebNav. We extend WikiNav with Jeopardy! questions, thus creating WikiNav-Jeopardy. We evaluate various neural net based agents on WikiNav and WikiNav-Jeopardy. Our results show that more sophisticated agents have better performance, thus supporting our claim that this task is well suited to evaluate future progress in natural language understanding and planning. Furthermore, we show that our agent pretrained on WikiNav outperforms two strong inverted-index based search engines on the WikiNav-Jeopardy. These empirical results support our claim on the usefulness of the proposed task and agents in challenging applications such as focused crawling and question-answering."
-- Value Iteration Networks for this problem [https://arxiv.org/abs/1602.02867] - https://youtu.be/tXBHfbHHlKc?t=31m20s (Tamar)
[interesting papers - ranking]
Burges - "From RankNet to LambdaRank to LambdaMART: An Overview" [https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/] (overview of NDCG and ERR/pFound)
"LambdaMART is the boosted tree version of LambdaRank, which is based on RankNet. RankNet, LambdaRank, and LambdaMART have proven to be very successful algorithms for solving real world ranking problems: for example an ensemble of LambdaMART rankers won Track 1 of the 2010 Yahoo! Learning To Rank Challenge. The details of these algorithms are spread across several papers and reports, and so here we give a self-contained, detailed and complete description of them."
"Although here we will concentrate on ranking, it is straightforward to modify MART in general, and LambdaMART in particular, to solve a wide range of supervised learning problems (including maximizing information retrieval functions, like NDCG, which are not smooth functions of the model scores).
Burges, Shaked, Renshaw, Lazier, Deeds, Hamilton, Hullender - "Learning to Rank using Gradient Descent" [http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf]
"We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine."
"We have proposed a probabilistic cost for training systems to learn ranking functions using pairs of training examples. The approach can be used for any differentiable function; we explored using a neural network formulation, RankNet. RankNet is simple to train and gives excellent performance on a real world ranking problem with large amounts of data. Comparing the linear RankNet with other linear systems clearly demonstrates the benefit of using our pair-based cost function together with gradient descent; the two layer net gives further improvement. For future work it will be interesting to investigate extending the approach to using other machine learning methods for the ranking function; however evaluation speed and simplicity is a critical constraint for such systems."
-- http://videolectures.net/icml2015_burges_learning_to_rank/
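-- the pairwise probabilistic cost is simple enough to sketch directly (notation mine): the modeled probability that document i ranks above document j is a logistic function of the score difference, trained with cross-entropy against the labeled preference:
    import numpy as np
    def ranknet_cost(s_i, s_j, target):
        # target: 1.0 if doc i should rank above doc j, 0.0 if below, 0.5 if tied
        p = 1.0 / (1.0 + np.exp(-(s_i - s_j)))  # modeled P(i above j)
        eps = 1e-12
        return -(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))
    # the gradient w.r.t. s_i is just (p - target), so any differentiable scorer
    # can be plugged in; RankNet models the scorer with a neural network
    print(ranknet_cost(2.0, 1.0, target=1.0))  # small cost: pair already ordered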
Burges, Ragno, Le - "Learning to Rank with Nonsmooth Cost Functions" [https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions]
"The quality measures used in information retrieval are particularly difficult to optimize directly, since they depend on the model scores only through the sorted order of the documents returned for a given query. Thus, the derivatives of the cost with respect to the model parameters are either zero, or are undefined. In this paper, we propose a class of simple, flexible algorithms, called LambdaRank, which avoids these difficulties by working with implicit cost functions. We describe LambdaRank using neural network models, although the idea applies to any differentiable function class. We give necessary and sufficient conditions for the resulting implicit cost function to be convex, and we show that the general method has a simple mechanical interpretation. We demonstrate significantly improved accuracy, over a state-of-the-art ranking algorithm, on several datasets. We also show that LambdaRank provides a method for significantly speeding up the training phase of that ranking algorithm. Although this paper is directed towards ranking, the proposed method can be extended to any non-smooth and multivariate cost functions."
Severyn, Moschitti - "Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks" [http://disi.unitn.it/~severyn/papers/sigir-2015-long.pdf]
"Learning a similarity function between pairs of objects is at the core of learning to rank approaches. In information retrieval tasks we typically deal with query-document pairs, in question answering - question-answer pairs. However, before learning can take place, such pairs needs to be mapped from the original space of symbolic words into some feature space encoding various aspects of their relatedness, e.g. lexical, syntactic and semantic. Feature engineering is often a laborious task and may require external knowledge sources that are not always available or difficult to obtain. Recently, deep learning approaches have gained a lot of attention from the research community and industry for their ability to automatically learn optimal feature representation for a given task, while claiming state-of-the-art performance in many tasks in computer vision, speech recognition and natural language processing. In this paper, we present a convolutional neural network architecture for reranking pairs of short texts, where we learn the optimal representation of text pairs and a similarity function to relate them in a supervised way from the available training data. Our network takes only words in the input, thus requiring minimal preprocessing. In particular, we consider the task of reranking short text pairs where elements of the pair are sentences. We test our deep learning system on two popular retrieval tasks from TREC: Question Answering and Microblog Retrieval. Our model demonstrates strong performance on the first task beating previous state-of-the-art systems by about 3% absolute points in both MAP and MRR and shows comparable results on tweet reranking, while enjoying the benefits of no manual feature engineering and no additional syntactic parsers."
"In this paper, we propose a novel deep learning architecture for reranking short texts. It has the benefits of requiring no manual feature engineering or external resources, which may be expensive or not available. The model with the same architecture can be successfully applied to other domains and tasks. Our experimental findings show that our deep learning model: (i) greatly improves on the previous state-of-the-art systems and a recent deep learning approach in on answer sentence selection task showing a 3% absolute improvement in MAP and MRR; (ii) our system is able to improve even the best system runs from TREC Microblog 2012 challenge; (iii) is comparable to the syntactic reranker, while our system requires no external parsers or resources."
-- https://github.com/aseveryn/deep-qa
-- https://github.com/shashankg7/Keras-CNN-QA
Dehghani, Zamani, Severyn, Kamps, Croft - "Neural Ranking Models with Weak Supervision" [https://arxiv.org/abs/1704.08803]
"Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representation). We train our networks using tens of millions of training instances and evaluate it on two standard collections: a homogeneous news collection(Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks to learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models."
Borisov, Markov, de Rijke, Serdyukov - "A Neural Click Model for Web Search" [http://www2016.net/proceedings/proceedings/p531.pdf]
"Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about a set of events given some information about other events. But the structure of the dependencies between the events has to be set manually. Different click models use different hand-crafted sets of dependencies. We propose an alternative based on the idea of distributed representations: to represent the user’s information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. And user behavior is modeled as a sequence of vector states associated with a query session: the vector state is initialized with a query, and then iteratively updated based on information about interactions with the search engine results. This approach allows us to directly understand user browsing behavior from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models. We illustrate our approach using a set of neural click models. Our experimental results show that the neural click model that uses the same training data as traditional PGM-based click models, has better performance on the click prediction task (i.e., predicting user click on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns similar concepts to those used in traditional click models, and that it also learns other concepts that cannot be designed manually."
Vorobev, Lefortier, Gusev, Serdyukov - "Gathering Additional Feedback on Search Results by Multi-Armed Bandits with Respect to Production Ranking" [http://www.www2015.it/documents/proceedings/proceedings/p1177.pdf]
"Given a repeatedly issued query and a document with a not-yet-confirmed potential to satisfy the users’ needs, a search system should place this document on a high position in order to gather user feedback and obtain a more confident estimate of the document utility. On the other hand, the main objective of the search system is to maximize expected user satisfaction over a rather long period, what requires showing more relevant documents on average. The state-of-the-art approaches to solving this exploration-exploitation dilemma rely on strongly simplified settings making these approaches infeasible in practice. We improve the most flexible and pragmatic of them to handle some actual practical issues. The first one is utilizing prior information about queries and documents, the second is combining bandit-based learning approaches with a default production ranking algorithm. We show experimentally that our framework enables to significantly improve the ranking of a leading commercial search engine."
[interesting papers - document models]
Huang, He, Gao, Deng, Acero, Heck - "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data" [http://research.microsoft.com/apps/pubs/default.aspx?id=198202] (DSSM model)
"Latent semantic models, such as LSA, intend to map a query to its relevant documents at the semantic level where keyword-based matching often fails. In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them. The proposed deep structured semantic models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using the clickthrough data. To make our models applicable to large-scale Web search applications, we also use a technique called word hashing, which is shown to effectively scale up our semantic models to handle large vocabularies which are common in such tasks. The new models are evaluated on a Web document ranking task using a real-world data set. Results show that our best model significantly outperforms other latent semantic models, which were considered state-of-the-art in the performance prior to the work presented in this paper."
"We present and evaluate a series of new latent semantic models, notably those with deep architectures which we call the DSSM. The main contribution lies in our significant extension of the previous latent semantic models (e.g., LSA) in three key aspects. First, we make use of the clickthrough data to optimize the parameters of all versions of the models by directly targeting the goal of document ranking. Second, inspired by the deep learning framework recently shown to be highly successful in speech recognition, we extend the linear semantic models to their nonlinear counterparts using multiple hidden-representation layers. The deep architectures adopted have further enhanced the modeling capacity so that more sophisticated semantic structures in queries and documents can be captured and represented. Third, we use a letter n-gram based word hashing technique that proves instrumental in scaling up the training of the deep models so that very large vocabularies can be used in realistic web search. In our experiments, we show that the new
techniques pertaining to each of the above three aspects lead to significant performance improvement on the document ranking task. A combination of all three sets of new techniques has led to a new state-of-the-art semantic model that beats all the previously developed competing models with a significant margin."
"DSSM stands for Deep Structured Semantic Model, or more general, Deep Semantic Similarity Model. DSSM is a deep neural network modeling technique for representing text strings (sentences, queries, predicates, entity mentions, etc.) in a continuous semantic space and modeling semantic similarity between two text strings (e.g., Sent2Vec). DSSM has wide applications including information retrieval and web search ranking (Huang et al. 2013; Shen et al. 2014a,2014b), ad selection/relevance, contextual entity search and interestingness tasks (Gao et al. 2014a), question answering (Yih et al., 2014), knowledge inference (Yang et al., 2014), image captioning (Fang et al., 2014), and machine translation (Gao et al., 2014b) etc. DSSM can be used to develop latent semantic models that project entities of different types (e.g., queries and documents) into a common low-dimensional semantic space for a variety of machine learning tasks such as ranking and classification. For example, in web search ranking, the relevance of a document given a query can be readily computed as the distance between them in that space. With the latest GPUs from Nvidia, we are able to train our models on billions of words."
-- http://research.microsoft.com/en-us/projects/dssm/
-- http://research.microsoft.com/pubs/232372/CIKM14_tutorial_HeGaoDeng.pdf
-- sent2vec: http://research.microsoft.com/en-us/downloads/731572aa-98e4-4c50-b99d-ae3f0c9562b9/default.aspx
"Sent2vec maps a pair of short text strings (e.g., sentences or query-answer pairs) to a pair of feature vectors in a continuous, low-dimensional space where the semantic similarity between the text strings is computed as the cosine similarity between their vectors in that space. sent2vec performs the mapping using the Deep Structured Semantic Model (DSSM) or the DSSM with convolutional-pooling structure (CDSSM)."
-- https://youtu.be/x7B6RudUQLI?t=1h5m5s (Gulin, in Russian)
-- https://habrahabr.ru/company/yandex/blog/314222/ (in Russian)
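-- the letter-trigram word hashing step is simple enough to sketch directly (following the paper's description; tokenization details mine):
    def letter_trigrams(word):
        # '#good#' -> ['#go', 'goo', 'ood', 'od#']; collisions stay rare even for
        # very large vocabularies, which keeps the DSSM input layer compact
        w = "#" + word.lower() + "#"
        return [w[i:i + 3] for i in range(len(w) - 2)]
    def hash_text(text, vocab):
        # bag-of-trigrams vector, shared between queries and documents
        vec = [0] * len(vocab)
        for word in text.split():
            for tri in letter_trigrams(word):
                if tri in vocab:
                    vec[vocab[tri]] += 1
        return vec
    trigrams = sorted({t for w in ["deep", "structured", "semantic"] for t in letter_trigrams(w)})
    vocab = {t: i for i, t in enumerate(trigrams)}
    print(letter_trigrams("good"), sum(hash_text("deep semantic model", vocab)))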
Shen, He, Gao, Deng, Mesnil - "Learning Semantic Representations Using Convolutional Neural Networks for Web Search" [http://www.iro.umontreal.ca/~lisa/pointeurs/WWW2014.pdf] (C-DSSM model)
"This paper presents a series of new latent semantic models based on a convolutional neural network to learn low-dimensional semantic vectors for search queries and Web documents. By using the convolution-max pooling operation, local contextual information at the word n-gram level is modeled first. Then, salient local features in a word sequence are combined to form a global feature vector. Finally, the high-level semantic information of the word sequence is extracted to form a global vector representation. The proposed models are trained on click-through data by maximizing the conditional likelihood of clicked documents given a query, using stochastic gradient ascent. The new models are evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that our model significantly outperforms other semantic models, which were state-of-the-art in retrieval performance prior to this work."
"The work presented in this paper developed a novel learnable deep learning architecture based on the use of a CNN to extract both local contextual features (via the convolution layer) and global contextual features (via the max-pooling layer) from text. Then the higher layer(s) in the overall deep architecture makes effective use of the extracted context-sensitive features to perform semantic matching between documents and queries, both in the form of text, for Web search applications."
"Model local context at the convolutional layer: Capture the local context dependent word sense. Learn one embedding vector for each local context dependent word.
Model global context at the pooling layer: Aggregate local topics to form the global intent. Identify salient words/phrase at the max-pooling layer. Words that win the most active neurons at the max-pooling layers: Those are salient words containing clear intents/topics."
"NDCG@1 Results: BM25 (30.5), ULM (30.4), PLSA (30.5), BLTM (31.6), WTM (31.5), DSSM (32.7), CDSSM (34.8)"
-- sent2vec: http://research.microsoft.com/en-us/downloads/731572aa-98e4-4c50-b99d-ae3f0c9562b9/default.aspx
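-- a PyTorch sketch of the convolution-max-pooling pipeline described above (layer sizes illustrative; the real input is a word-hashed letter-trigram sequence): the convolution extracts local contextual features per sliding window, and max-pooling keeps the strongest activation per feature to form the global vector:
    import torch, torch.nn as nn
    class CDSSMEncoder(nn.Module):
        def __init__(self, trihash_dim=500, conv_dim=300, sem_dim=128):
            super().__init__()
            self.conv = nn.Conv1d(trihash_dim, conv_dim, kernel_size=3, padding=1)
            self.sem = nn.Linear(conv_dim, sem_dim)
        def forward(self, x):                    # x: (batch, trihash_dim, words)
            local = torch.tanh(self.conv(x))     # local features per word window
            pooled = local.max(dim=2).values     # salient feature across positions
            return torch.tanh(self.sem(pooled))  # final semantic vector
    enc = CDSSMEncoder()
    q, d = enc(torch.rand(1, 500, 6)), enc(torch.rand(1, 500, 40))
    print(torch.cosine_similarity(q, d))         # relevance = cosine in latent space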
Shen, He, Gao, Deng, Mesnil - "A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval" [http://www.msr-waypoint.com/pubs/226585/cikm2014_cdssm_final.pdf] (CLSM model)
"In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and are then aggregated to form a sentence-level feature vector. Finally, a non-linear transformation is applied to extract high-level semantic information to generate a continuous vector representation for the full text string. The proposed convolutional latent semantic model is trained on clickthrough data and is evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task while significantly outperforming previous state-of-the-art semantic models."
"In this paper, we have reported a novel deep learning architecture called the CLSM, motivated by the convolutional structure of the CNN, to extract both local contextual features at the word-n-gram level (via the convolutional layer) and global contextual features at the sentence-level (via the max-pooling layer) from text. The higher layer(s) in the overall deep architecture makes effective use of the extracted context-sensitive features to generate latent semantic vector representations which facilitates semantic matching between documents and queries for Web search applications. We have carried out extensive experimental studies of the proposed model whereby several state-of-the-art semantic models are compared and significant performance improvement on a large-scale real-world Web search data set is observed. Extended from our previous work on DSSM and C-DSSM models, the CLSM and its variations have also been demonstrated giving superior performance on a range of natural language processing tasks beyond information retrieval, including semantic parsing and question answering, entity search and online recommendation."
-- https://youtu.be/x7B6RudUQLI?t=1h33m39s (Gulin, in Russian)
-- https://github.com/airalcorn2/Deep-Semantic-Similarity-Model
Palangi, Deng, Shen, Gao, He, Chen, Song, Ward - "Semantic Modelling with Long Short-Term Memory for Information Retrieval" [http://arxiv.org/abs/1412.6629] (LSTM-DSSM model)
"In this paper we address the following problem in web document and information retrieval: How can we use long-term context information to gain better IR performance? Unlike common IR methods that use bag of words representation for queries and documents, we treat them as a sequence of words and use long short term memory to capture contextual dependencies. The resulting model, the LSTM version of the Deep-Structured Semantic Model, is a significant extension of the recent Recurrent-DSSM without the LSTM structure. Experimental evaluation on an IR task derived from the Bing web search demonstrates the ability of the proposed LSTM-DSSM in addressing both lexical mismatch and long-term context modelling issues, thereby, significantly outperforming the state of the art method of R-DSSM for web search."
--
"We focus on Web document retrieval and ranking problem. We want to address the following question: How important is the context information and how we can exploit it in favor of better performance? Performance is measured by Normalized Discounted Cumulative Gain. Using a word by word representation instead of the bag of words used in Deep Structured Semantic Modeling, and using Recurrent Neural Networks to capture context information, we show that Recurrent DSSM outperforms DSSM significantly. We use max-pooling for training of RDSSM which resolves the slow convergence problem of RDSSM. RNNs usually have a limited memory length, to address this problem we use Long Short Term Memory. We show that LSTM-DSSM outperforms RDSSM significantly. We further argue that the proposed methods can be used for modeling correlated topics in text data."
Palangi, Deng, Shen, Gao, He, Chen, Song, Ward - "Deep Sentence Embedding Using the Long Short Term Memory Network: Analysis and Application to Information Retrieval" [http://arxiv.org/abs/1502.06922] (LSTM-RNN model)
"This paper develops a model that addresses sentence embedding using recurrent neural networks with Long Short Term Memory cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model automatically attenuates the unimportant words and detects the salient keywords in the sentence. Furthermore, these detected keywords automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These keyword detection and topic allocation tasks enabled by the LSTM-RNN allow the network to perform web document retrieval, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform all existing state of the art methods."
"By performing a detailed analysis on the model, we showed that: 1) The proposed model is robust to noise, i.e., it mainly embeds keywords in the final semantic vector representing the whole sentence and 2) In the proposed model, each cell is usually allocated to keywords from a specific topic. These findings have been supported using extensive examples. As a sample application of the proposed sentence embedding method, we evaluated it on the important task of web document retrieval. We showed that, for this task, the proposed method outperforms all existing state of the art methods significantly."
"Encode the word one by one in the recurrent hidden layer. The hidden layer at the last word codes the semantics of the full sentence. Model is trained by a cosine similarity driven objective. Minimize sentence-level semantic matching loss."
Gao, Pantel, Gamon, He, Deng - "Modeling Interestingness with Deep Neural Networks" [http://research.microsoft.com/apps/pubs/default.aspx?id=226584]
"This paper presents a deep semantic similarity model, a special type of deep neural networks designed for text analysis, for recommending target documents to be of interest to a user based on a source document that she is reading. We observe, identify, and detect naturally occurring signals of interestingness in click transitions on the Web between source and target documents, which we collect from commercial Web browser logs. The DSSM is trained on millions of Web transitions, and maps source-target document pairs to feature vectors in a latent space in such a way that the distance between source documents and their corresponding interesting targets in that space is minimized. The effectiveness of the DSSM is demonstrated using two interestingness tasks: automatic highlighting and contextual entity search. The results on large-scale, real-world datasets show that the semantics of documents are important for modeling interestingness and that the DSSM leads to significant quality improvement on both tasks, outperforming not only the classic document models that do not use semantics but also state-of-the-art topic models."
-- https://youtube.com/watch?v=YXi66Zgd0D0 (Yih)
Mitra, Diaz, Craswell - "Learning to Match Using Local and Distributed Representations of Text for Web Search" [https://arxiv.org/abs/1610.08136] (Duet model)
"Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favorable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or ‘duet’ performs significantly better than either neural network individually on a Web page ranking task, and also significantly outperforms traditional baselines and other recently proposed models based on neural networks."
-- https://github.com/bmitra-msft/NDRM/blob/master/notebooks/Duet.ipynb
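-- a much-simplified PyTorch sketch of the duet (both sub-models in the paper are convolutional; plain dense layers are used here for brevity): the local model sees only an exact-match matrix between query and document terms, the distributed model sees learned embeddings, and a single summed score trains both jointly:
    import torch, torch.nn as nn
    class Duet(nn.Module):
        def __init__(self, qlen=10, dlen=100, emb=128):
            super().__init__()
            self.local = nn.Sequential(nn.Flatten(), nn.Linear(qlen * dlen, 64),
                                       nn.ReLU(), nn.Linear(64, 1))
            self.dist = nn.Sequential(nn.Linear(2 * emb, 64),
                                      nn.ReLU(), nn.Linear(64, 1))
        def forward(self, match_matrix, q_emb, d_emb):
            # match_matrix[b, i, j] = 1 where query term i equals document term j
            return self.local(match_matrix) + self.dist(torch.cat([q_emb, d_emb], dim=1))
    duet = Duet()
    score = duet(torch.zeros(1, 10, 100), torch.rand(1, 128), torch.rand(1, 128))
    print(score)  # one score; gradients flow into both sub-networks at once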
[interesting papers - entity-centric search]
Blanco, Ottaviano, Meij - "Fast and Space-Efficient Entity Linking in Queries" [http://labs.yahoo.com/publication/fast-and-space-efficient-entity-linking-in-queries/]
"Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for a O(k^2) implementation in the number of query terms. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times—at least two orders of magnitude faster than existing systems."
Huang, Heck, Ji - "Leveraging Deep Neural Networks and Knowledge Graphs for Entity Disambiguation" [http://arxiv.org/abs/1504.07678]
"Entity Disambiguation aims to link mentions of ambiguous entities to a knowledge base (e.g., Wikipedia). Modeling topical coherence is crucial for this task based on the assumption that information from the same semantic context tends to belong to the same topic. This paper presents a novel deep semantic relatedness model based on deep neural networks and semantic knowledge graphs to measure entity semantic relatedness for topical coherence modeling. The DSRM is directly trained on large-scale KGs and it maps heterogeneous types of knowledge of an entity from KGs to numerical feature vectors in a latent space such that the distance between two semantically-related entities is minimized. Compared with the state-of-the-art relatedness approach proposed by (Milne and Witten, 2008a), the DSRM obtains 19.4% and 24.5% reductions in entity disambiguation errors on two publicly available datasets respectively."
Gupta, Halevy, Wang, Whang, Wu - "Biperpedia: An Ontology for Search Applications" [http://research.google.com/pubs/pub41894.html]
"Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases. While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of attributes known to the search engine can enable it to more precisely answer queries from the long and heavy tail, extract a broader range of facts from the Web, and recover the semantics of tables on the Web. We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names. Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text. For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts. In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase."
Divvala, Farhadi, Guestrin - "Learning Everything about Anything: Webly-Supervised Visual Concept Learning" [http://allenai.org/content/publications/objectNgrams_cvpr14.pdf]
"Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models? In this paper, we introduce a fully-automated approach for learning extensive models for a wide range of variations (e.g. actions, interactions, attributes and beyond) within any concept. Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision in training the models. Our approach organizes the visual knowledge about a concept in a convenient and useful way, enabling a variety of applications across vision and NLP. Our online system has been queried by users to learn models for several interesting concepts including breakfast, Gandhi, beautiful, etc. To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes."
"We have presented a fully automated approach to discover a detailed vocabulary for any concept and train a full-fledged detection model for it. We have shown results for several concepts (including objects, scenes, events, actions and places) in this paper, and more concepts can be obtained by using our online system. Our approach enables several future applications and research directions:
Coreference resolution: A core problem in NLP is to determine when two textual mentions name the same entity. The biggest challenge here is the inability to reason about semantic knowledge. For example, the Stanford state-of-the-art system fails to link ‘Mohandas Gandhi’ to ‘Mahatma Gandhi’, and ‘Mrs. Gandhi’ to ‘Indira Gandhi’ in the following sentence: "Indira Gandhi was the third Indian prime minister. Mohandas Gandhi was the leader of Indian nationalism. Mrs. Gandhi was inspired by Mahatma Gandhi’s writings." Our method is capable of relating Mahatma Gandhi to Mohandas Gandhi and Indira Gandhi to Mrs Gandhi. We envision that the information provided by our method should provide useful semantic knowledge for coreference resolution.
Paraphrasing: Rewriting a textual phrase in other words while preserving its semantics is an active research area in NLP. Our method can be used to discover paraphrases. For example, we discover that a ‘grazing horse’ is semantically very similar to a ‘eating horse’. Our method can be used to produce a semantic similarity score for textual phrases.
Deeper image interpretation: Recent works have emphasized the importance of providing deeper interpretation for object detections rather than simply labeling them with bounding boxes. Our work corroborates this line of research by producing enhanced detections for any concept. For example, apart from an object bounding box (e.g., ‘horse’), it can provide object part boxes (e.g., ‘horse head’, ‘horse foot’, etc) and can also annotate the object action (e.g., ‘fighting’) or the object type (e.g., ‘jennet horse’). Since the ngram labels that we use correspond to real-world entities, it is also possible to directly link a detection to its corresponding wikipedia page to infer more details.
Understanding actions: Actions and interactions (e.g., ‘horse fighting’, ‘reining horse’) are too complex to be explained using simple primitives. Our method helps in discovering a comprehensive vocabulary that covers all (subtle) nuances of any action. For example, we have discovered over 150 different variations of the walking action including ‘ball walking’, ‘couple walking’, ‘frame walking’. Such an exhaustive vocabulary helps in generating fine-grained descriptions of images."
-- http://levan.cs.uw.edu
Pantel, Fuxman - "Jigs and Lures: Associating Web Queries with Structured Entities" [http://www.aclweb.org/anthology/P11-1009]
"We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision."
Koumenides, Shadbolt - "Ranking Methods for Entity-Oriented Semantic Web Search" [http://onlinelibrary.wiley.com/doi/10.1002/asi.23018/epdf]
"This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development."
Franz, Schultz, Sizov, Staab - "TripleRank: Ranking Semantic Web Data By Tensor Decomposition" [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.175.2919&rep=rep1&type=pdf]
"The Semantic Web fosters novel applications targeting a more efficient and satisfying exploitation of the data available on the web, e.g. faceted browsing of linked open data. Large amounts and high diversity of knowledge in the Semantic Web pose the challenging question of appropriate relevance ranking for producing fine-grained and rich descriptions of the available data, e.g. to guide the user along most promising knowledge aspects. Existing methods for graph-based authority ranking lack support for fine-grained latent coherence between resources and predicates (i.e. support for link semantics in the linked data model). In this paper, we present TripleRank, a novel approach for faceted authority ranking in the context of RDF knowledge bases. TripleRank captures the additional latent semantics of Semantic Web data by means of statistical methods in order to produce richer descriptions of the available data. We model the Semantic Web by a 3-dimensional tensor that enables the seamless representation of arbitrary semantic links. For the analysis of that model, we apply the PARAFAC decomposition, which can be seen as a multi-modal counterpart to Web authority ranking with HITS. The result are groupings of resources and predicates that characterize their authority and navigational (hub) properties with respect to identified topics. We have applied TripleRank to multiple data sets from the linked open data community and gathered encouraging feedback in a user evaluation where TripleRank results have been exploited in a faceted browsing scenario."
Zhiltsov, Agichtein - "Improving Entity Search over Linked Data by Modeling Latent Semantics" [http://researchgate.net/publication/260419630_Improving_entity_search_over_linked_data_by_modeling_latent_semantics]
"Entity ranking has become increasingly important, both for retrieving structured entities and for use in general web search applications. The most common format for linked data, RDF graphs, provide extensive semantic structure via predicate links. While the semantic information is potentially valuable for effective search, the resulting adjacency matrices are often sparse, which introduces challenges for representation and ranking. In this paper, we propose a principled and scalable approach for integrating of latent semantic information into a learning-to-rank model, by combining compact representation of semantic similarity, achieved by using a modified algorithm for tensor factorization, with explicit entity information. Our experiments show that the resulting ranking model scales well to the graphs with millions of entities, and outperforms the state-of-the-art baseline on realistic Yahoo! SemSearch Challenge data sets."
"In this paper, we presented a novel, principled, and scalable approach for incorporating structural and term-based evidence for entity ranking. In particular, we have introduced a scalable application of tensor factorization to entity search, and developed new and effective features for entity ranking. Our method outperforms the previous state of the art on a large-scale evaluation over a standard benchmark data set. We complemented our experimental results with thorough error analysis and discussion. In the future, we plan to explore extending the entity structure representation by incorporating term information into the latent space, because it will enable us to infer a distribution of latent factors for entities with limited link information. It could be done by enhancing the tensor structure with the entity-term matrix. Yet another prospective research direction is an application of the method in the entity list search scenario."
-- https://github.com/nzhiltsov/Ext-RESCAL + http://nzhiltsov.blogspot.ru/2014/10/ext-rescal-tensor-factorization.html
[interesting papers - intent search]
Sordoni, Bengio, Nie - "Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization" [http://www-etud.iro.umontreal.ca/~sordonia/pdf/aaai2014_sordoni.pdf]
"In web search, users queries are formulated using only few terms and term-matching retrieval functions could fail at retrieving relevant documents. Given a user query, the technique of query expansion consists in selecting related terms that could enhance the likelihood of retrieving relevant documents. Selecting such expansion terms is challenging and requires a computational framework capable of encoding complex semantic relationships. In this paper, we propose a novel method for learning, in a supervised way, semantic representations for words and phrases. By embedding queries and documents in special matrices, our model disposes of an increased representational power with respect to existing approaches adopting a vector representation. We show that our model produces high-quality query expansion terms. Our expansion increase IR measures beyond expansion from current word-embeddings models and well-established traditional QE methods."
"Overall, we believe that the potential of latent semantic model for encoding useful semantic relationship is real and should be fostered by enriching query and document representations. To this end, we proposed a new method called Quantum Entropy Minimization, an embedding model that allocates text sequences in a larger space than their component terms. This is automatically encoded in the notion of rank. Higher-rank objects encode broader semantic information while unit-rank objects bring only localized semantic content. Experimental results show that our model is useful in order to boost precision at top-ranks with respect to a state-of-the-art expansion model and a recently proposed semantic model. Particularly interesting was the ability of our model to find useful expansion terms for longer queries: we believe this is a direct consequence of the higher semantic resolution allocated by our model. There are many interesting directions for future research. One could find more reasonable approximations both to the scoring function and the representation capable of bringing further improvements. Finally, we argue that incorporating existing advanced gradient descent procedures, refined loss functions can certainly further increase the retrieval performance, well beyond traditional query expansion methods."
Sordoni, Bengio, Vahabi, Lioma, Simonsen, Nie - "A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion" [http://arxiv.org/abs/1507.02221]
"Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that it outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications."
"In this paper, we formulated a novel hierarchical neural network architecture and used it to produce query suggestions. Our model is context-aware and it can handle rare queries. It can be trained end-to-end on query sessions by simple optimization procedures. Our experiments show that the scores provided by our model help improving MRR for next-query ranking. Additionally, it is generative by definition. We showed with a user study that the synthetic generated queries are better than the compared methods. In future works, we aim to explicitly capture the usefulness of a suggestion by exploiting user clicks. This may be done without much effort as our architecture is flexible enough to allow joint training of other differentiable loss functions. Then, we plan to further study the synthetic generation by means of a large-scale automatic evaluation. Currently, the synthetic suggestions tend to be horizontal, i.e. the model prefers to add or remove terms from the context queries and rarely proposes orthogonal but related reformulations. Future efforts may be dedicated to diversify the generated suggestions to account for this effect. Finally, the interactions of the user with previous suggestions can also be leveraged to better capture the behaviour of the user and to make better suggestions accordingly. We are the most excited about possible future applications beyond query suggestion: auto-completion, next-word prediction and other NLP tasks such as Language Modelling may be fit as possible candidates."
-- https://github.com/sordonia/hred-qs
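-- a toy PyTorch sketch of the hierarchical encoder-decoder idea, with illustrative sizes and random data rather than the paper's configuration: a query-level GRU encodes each query, a session-level GRU tracks the sequence of query encodings, and a decoder conditioned on the session state predicts the next query word by word:

  import torch
  import torch.nn as nn

  V, E, H = 1000, 64, 128  # vocab, embedding and hidden sizes (assumed)

  class HRED(nn.Module):
      def __init__(self):
          super().__init__()
          self.embed = nn.Embedding(V, E)
          self.query_enc = nn.GRU(E, H, batch_first=True)    # words -> query vector
          self.session_enc = nn.GRU(H, H, batch_first=True)  # query vectors -> session state
          self.decoder = nn.GRU(E, H, batch_first=True)      # generates the next query
          self.out = nn.Linear(H, V)

      def forward(self, session, target):
          # session: (batch, n_queries, query_len) token ids; target: the next query
          b, n, l = session.shape
          words = self.embed(session.view(b * n, l))
          _, q = self.query_enc(words)            # final states, (1, b*n, H)
          q = q.squeeze(0).view(b, n, H)          # one vector per query
          _, s = self.session_enc(q)              # session state, (1, b, H)
          h, _ = self.decoder(self.embed(target[:, :-1]), s)  # teacher forcing
          return self.out(h)                      # next-word logits

  model = HRED()
  session = torch.randint(0, V, (2, 3, 5))  # 2 sessions of 3 five-token queries
  target = torch.randint(0, V, (2, 6))      # the query to predict next
  logits = model(session, target)
  loss = nn.functional.cross_entropy(logits.reshape(-1, V), target[:, 1:].reshape(-1))
  loss.backward()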
Lin, Pantel, Gamon, Kannan, Fuxman - "Active Objects: Actions for Entity-Centric Search" [http://research.microsoft.com/apps/pubs/default.aspx?id=161389]
"We introduce an entity-centric search experience, called Active Objects, in which entity-bearing queries are paired with actions that can be performed on the entities. For example, given a query for a specific flashlight, we aim to present actions such as reading reviews, watching demo videos, and finding the best price online. In an annotation study conducted over a random sample of user query sessions, we found that a large proportion of queries in query logs involve actions on entities, calling for an automatic approach to identifying relevant actions for entity-bearing queries. In this paper, we pose the problem of finding actions that can be performed on entities as the problem of probabilistic inference in a graphical model that captures how an entity bearing query is generated. We design models of increasing complexity that capture latent factors such as entity type and intended actions that determine how a user writes a query in a search box, and the URL that they click on. Given a large collection of real-world queries and clicks from a commercial search engine, the models are learned efficiently through maximum likelihood estimation using an EM algorithm. Given a new query, probabilistic inference enables recommendation of a set of pertinent actions and hosts. We propose an evaluation methodology for measuring the relevance of our recommended actions, and show empirical evidence of the quality and the diversity of the discovered actions."
"Search as an action broker: A promising future search scenario involves modeling the user intents (or “verbs”) underlying the queries and brokering the webpages that accomplish the intended actions. In this vision, the broker is aware of all entities and actions of interest to its users, understands the intent of the user, ranks all providers of actions, and provides direct actionable results through APIs with the providers."
Adar, Dontcheva, Laput - "CommandSpace: Modeling the Relationships Between Tasks, Descriptions and Features" [http://cond.org/commandspace.html]
"Users often describe what they want to accomplish with an application in a language that is very different from the application’s domain language. To address this gap between system and human language, we propose modeling an application’s domain language by mining a large corpus of Web documents about the application using deep learning techniques. A high dimensional vector space representation can model the relationships between user tasks, system commands, and natural language descriptions and supports mapping operations, such as identifying likely system commands given natural language queries and identifying user tasks given a trace of user operations. We demonstrate the feasibility of this approach with a system, CommandSpace, for the popular photo editing application Adobe Photoshop. We build and evaluate several applications enabled by our model showing the power and flexibility of this approach."
Chen, Rudnicky - "Dynamically Supporting Unexplored Domains in Conversational Interactions by Enriching Semantics with Neural Word Embeddings" [http://www.cs.cmu.edu/~yvchen/doc/SLT14_OpenDomain.pdf]
"Spoken language interfaces are being incorporated into various devices (e.g. smart-phones, smart TVs, etc). However, current technology typically limits conversational interactions to a few narrow predefined domains/topics. For example, dialogue systems for smartphone operation fail to respond when users ask for functions not supported by currently installed applications. We propose to dynamically add application-based domains according to users’ requests by using descriptions of applications as a retrieval cue to find relevant applications. The approach uses structured knowledge resources (e.g. Freebase, Wikipedia, FrameNet) to induce types of slots for generating semantic seeds, and enriches the semantics of spoken queries with neural word embeddings, where semantically related concepts can be additionally included for acquiring knowledge that does not exist in the predefined domains. The system can then retrieve relevant applications or dynamically suggest users install applications that support unexplored domains. We find that vendor descriptions provide a reliable source of information for this purpose."
Williams, Niraula, Dasigi, Lakshmiratan, Suarez, Reddy, Zweig - "Rapidly Scaling Dialog Systems with Interactive Learning" [http://www.uni-ulm.de/fileadmin/website_uni_ulm/allgemein/2015_iwsds/iwsds2015_submission_1.pdf]
"In personal assistant dialog systems, intent models are classifiers that identify the intent of a user utterance, such as to add a meeting to a calendar, or get the director of a stated movie. Rapidly adding intents is one of the main bottlenecks to scaling - adding functionality to - personal assistants. In this paper we show how interactive learning can be applied to the creation of statistical intent models. Interactive learning combines model definition, labeling, model building, active learning, model evaluation, and feature engineering in a way that allows a domain expert - who need not be a machine learning expert - to build classifiers. We apply interactive learning to build a handful of intent models in three different domains. In controlled lab experiments, we show that intent detectors can be built using interactive learning, and then improved in a novel end-to-end visualization tool. We then applied this method to a publicly deployed personal assistant - Microsoft Cortana - where a non-machine learning expert built an intent model in just over two hours, yielding excellent performance in the commercial service."
Melamud, Levy, Dagan - "A Simple Word Embedding Model for Lexical Substitution" [http://u.cs.biu.ac.il/~melamuo/publications/melamud_vsm15.pdf] (query reformulation)
"The lexical substitution task requires identifying meaning-preserving substitutes for a target word instance in a given sentential context. Since its introduction in SemEval-2007, various models addressed this challenge, mostly in an unsupervised setting. In this work we propose a simple model for lexical substitution, which is based on the popular skip-gram word embedding model. The novelty of our approach is in leveraging explicitly the context embeddings generated within the skip-gram model, which were so far considered only as an internal component of the learning process. Our model is efficient, very simple to implement, and at the same time achieves state-ofthe-art results on lexical substitution tasks in an unsupervised setting."
Sun, Zeng, Liu, Lu, Chen at Microsoft Research - "CubeSVD: A Novel Approach to Personalized Web Search" [http://research.microsoft.com/pubs/79497/p382.pdf]
"As the competition of Web search market increases, there is a high demand for personalized Web search to conduct retrieval incorporating Web users’ information needs. This paper focuses on utilizing clickthrough data to improve Web search. Since millions of searches are conducted everyday, a search engine accumulates a large volume of clickthrough data, which records who submits queries and which pages he/she clicks on. The clickthrough data is highly sparse and contains different types of objects (user, query and Web page), and the relationships among these objects are also very complicated. By performing analysis on these data, we attempt to discover Web users’ interests and the patterns that users locate information. In this paper, a novel approach CubeSVD is proposed to improve Web search. The clickthrough data is represented by a 3-order tensor, on which we perform 3-mode analysis using the higher-order singular value decomposition technique to automatically capture the latent factors that govern the relations among these multi-type objects: users, queries and Web pages. A tensor reconstructed based on the CubeSVD analysis reflects both the observed interactions among these objects and the implicit associations among them. Therefore, Web search activities can be carried out based on CubeSVD analysis. Experimental evaluations using a real-world data set collected from an MSN search engine show that CubeSVD achieves encouraging search results in comparison with some standard methods."
-- http://youtube.com/watch?v=VyiMW23OVNU (in russian)
-- Brice - "Applications of Multilinear Algebra To World Wide Web Search"
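-- a numpy sketch of the HOSVD step at the core of CubeSVD, with illustrative sizes and ranks: unfold the user x query x page click tensor along each mode, keep the top singular vectors per mode, and rebuild a smoothed tensor whose entries score every (user, query, page) triple, observed or not:

  import numpy as np

  rng = np.random.default_rng(0)
  T = (rng.random((20, 30, 40)) < 0.02).astype(float)  # sparse click tensor

  def mode_basis(T, mode, rank):
      # mode-n unfolding followed by a truncated SVD
      M = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
      U, _, _ = np.linalg.svd(M, full_matrices=False)
      return U[:, :rank]

  U = [mode_basis(T, m, 5) for m in range(3)]
  # core tensor: T multiplied by U_n^T along each mode n
  S = np.einsum('ijk,ia,jb,kc->abc', T, U[0], U[1], U[2])
  T_hat = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
  # T_hat now assigns a score to every (user, query, page) triple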
[interesting industry publications]
Google - "Personalization for Google Now: User Understanding and Application to Information Recommendation and Exploration" [http://dl.acm.org/citation.cfm?id=2959192]
"At the heart of any personalization application, such as Google Now, is a deep model for users. The understanding of users ranges from raw history to lower dimensional reductions like interest, locations, preferences, etc. We will discuss different representations of such user understanding. Going from understanding to application, we will talk about two broad applications recommendations of information and guided exploration - both in the context of Google Now. We will focus on such applications from an information retrieval perspective. Information recommendation then takes the form of biasing information retrieval, in response to a query or, in the limit, in a query-less application. Somewhere in between lies broad declaration of user intent, e.g., interest in food, and we will discuss how personalization and guided exploration play together to provide a valuable tool to the user. We will discuss valuable lessons learned along the way."
-- https://youtube.com/watch?v=X9Fsn1j1CE8 (Thakur)
Google - "Using concepts as contexts for query term substitutions" [http://gofishdigital.com/investigating-google-rankbrain-and-query-term-substitutions/] (RankBrain)
Facebook - "Recommending Items to More than a Billion People" [https://code.facebook.com/posts/861999383875667/recommending-items-to-more-than-a-billion-people/]
<brylevkirill (at) gmail.com>