-
Notifications
You must be signed in to change notification settings - Fork 2
/
sigmod.tex
167 lines (152 loc) · 6.35 KB
/
sigmod.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
% THIS IS SIGPROC-SP.TEX - VERSION 3.1
% WORKS WITH V3.2SP OF ACM_PROC_ARTICLE-SP.CLS
% APRIL 2009
%
% It is an example file showing how to use the 'acm_proc_article-sp.cls' V3.2SP
% LaTeX2e document class file for Conference Proceedings submissions.
% ----------------------------------------------------------------------------------------------------------------
% This .tex file (and associated .cls V3.2SP) *DOES NOT* produce:
% 1) The Permission Statement
% 2) The Conference (location) Info information
% 3) The Copyright Line with ACM data
% 4) Page numbering
% ---------------------------------------------------------------------------------------------------------------
% It is an example which *does* use the .bib file (from which the .bbl file
% is produced).
% REMEMBER HOWEVER: After having produced the .bbl file,
% and prior to final submission,
% you need to 'insert' your .bbl file into your source .tex file so as to provide
% ONE 'self-contained' source file.
%
% Questions regarding SIGS should be sent to
% Adrienne Griscti ---> griscti@acm.org
%
% Questions/suggestions regarding the guidelines, .tex and .cls files, etc. to
% Gerald Murray ---> murray@hq.acm.org
%
% For tracking purposes - this is V3.1SP - APRIL 2009
\documentclass{acm_proc_article-sp}
\newtheorem{algorithm}{Algorithm}
\usepackage{algpseudocode}
\usepackage{varwidth}
\usepackage{graphicx}
\usepackage{color}
\usepackage{url}
\graphicspath{{./images}}
%\usepackage{hyperref}
%\hypersetup{colorlinks=true}
\begin{document}
\title{Scalable Learning of Tree-Based Models on Sparsely Representable Data}
%\subtitle{[Extended Abstract]
%\titlenote{A full version of this paper is available as
%\textit{Author's Guide to Preparing ACM SIG Proceedings Using
%\LaTeX$2_\epsilon$\ and BibTeX} at
%\texttt{www.acm.org/eaddress.htm}}}
%
% You need the command \numberofauthors to handle the 'placement
% and alignment' of the authors beneath the title.
%
% For aesthetic reasons, we recommend 'three authors at a time'
% i.e. three 'name/affiliation blocks' be placed beneath the title.
%
% NOTE: You are NOT restricted in how many 'rows' of
% "name/affiliations" may appear. We just ask that you restrict
% the number of 'columns' to three.
%
% Because of the available 'opening page real-estate'
% we ask you to refrain from putting more than six authors
% (two rows with three columns) beneath the article title.
% More than six makes the first-page appear very cluttered indeed.
%
% Use the \alignauthor commands to handle the names
% and affiliations for an 'aesthetic maximum' of six authors.
% Add names, affiliations, addresses for
% the seventh etc. author(s) as the argument for the
% \additionalauthors command.
% These 'additional authors' will be output/set for you
% without further effort on your part as the last section in
% the body of your article BEFORE References or any Appendices.
\numberofauthors{3} % in this sample file, there are a *total*
% of EIGHT authors. SIX appear on the 'first-page' (for formatting
% reasons) and the remaining two appear in the \additionalauthors section.
%
\author{
\alignauthor
Fares Hedayati\\
\affaddr{Elance-oDesk}\\
\email{fares19@elance-odesk.com}
% 2nd. author
\alignauthor
Arnauld Joly\\
\affaddr{University of Li{e}ge}\\
\email{a.joly@ulg.ac.be}
% 3rd. author
\alignauthor Panagiotis Papadimitriou\\
\affaddr{Elance-oDesk}\\
\email{papadimitriou@elance-odesk.com}
}
\date{01 December 2014}
% Just remember to make sure that the TOTAL number of authors
% is the number that will appear on the first page PLUS the
% number that will appear in the \additionalauthors section.
\maketitle
\begin{abstract}
Many machine learning tasks such as text annotation usually require
training over very big datasets, e.g., millions of web documents,
that can be represented in a sparse input space. State-of-the-art
tree-based ensemble algorithms cannot scale to such datasets, since
they include operations whose running time is a function of the
input space size rather than a function of the non-zero input
elements.
% and implementations have focused on tasks with dense input space,
% while at most emulating random memory access on sparse input data.
In this paper, we propose an efficient splitting algorithm to
leverage input sparsity within decision tree
methods.
% We exploit the a-priori knowledge that each column has very few
% non-zero elements and show how learning time is significantly
% decreased.
Our algorithm improves training time over sparse datasets by more
than two orders of magnitude and it has been incorporated in the current
version of \emph{scikit-learn}\footnote{http://scikit-learn.org},
the most popular open source Python machine learning library.
\end{abstract}
% A category with the (minimum) three required fields
%\category{H.4}{Information Systems Applications}{Miscellaneous}
%A category including the fourth, optional field follows...
%\category{D.2.8}{Software Engineering}{Metrics}[complexity measures, performance measures]
\terms{Algorithms, Experimentation, Performance}
\keywords{machine learning, classification trees, regression trees, sparse data} % NOT required for Proceedings
\input{intro.tex}
\input{dense.tex}
\input{sparse.tex}
\input{experiment.tex}
\input{related.tex}
\section{Conclusion}
\label{sec:conclusion}
We proposed a method for building tree-based models with sparse input
support. Our method takes advantage of input sparsity by avoiding
sorting sample sets of a node along a feature unless they are nonzero
at that feature. This approach speeds up training substantially as
sorting is a a costly but an essential and ubiquitous component of
tree-based models.
\section{Acknowledgment} % use section* for acknowledgement
Arnaud Joly is a research fellow of the FNRS, Belgium. This work is
partially supported by PASCAL2 and the IUAP DYSCO, initiated by the
Belgian State, Science Policy Office.
%
% The following two commands are all you need in the
% initial runs of your .tex file to
% produce the bibliography for the citations in your paper.
\bibliographystyle{abbrv}
\bibliography{references} % sigproc.bib is the name of the Bibliography in this case
% You must have a proper ".bib" file
% and remember to run:
% latex bibtex latex latex
% to resolve all references
%
% ACM needs 'a single self-contained file'!
%
%APPENDICES are optional
\balancecolumns
\end{document}