%
% File acl2019.tex
%
%% Based on the style files for ACL 2018, NAACL 2018/19, which were
%% Based on the style files for ACL-2015, with some improvements
%%  taken from the NAACL-2016 style
%% Based on the style files for ACL-2014, which were, in turn,
%% based on ACL-2013, ACL-2012, ACL-2011, ACL-2010, ACL-IJCNLP-2009,
%% EACL-2009, IJCNLP-2008...
%% Based on the style files for EACL 2006 by 
%%e.agirre@ehu.es or Sergi.Balari@uab.es
%% and that of ACL 08 by Joakim Nivre and Noah Smith
\documentclass[11pt,a4paper]{article}
\usepackage[hyperref]{acl2019}
\usepackage{times}
\usepackage{latexsym}
\usepackage{graphicx}
\usepackage{url}
\aclfinalcopy % Uncomment this line for the final submission
%\def\aclpaperid{***} %  Enter the acl Paper ID here
%\setlength\titlebox{5cm}
% You can expand the titlebox if you need extra space
% to show all the authors. Please do not make the titlebox
% smaller than 5cm (the original size); we will check this
% in the camera-ready version and ask you to change it back.
\newcommand\BibTeX{B\textsc{ib}\TeX}
\title{CSCI662 HW1: Linear Classifiers}
\author{Divya Choudhary\\
  MS in Computer Science at University of Southern California/Los Angeles, CA, US \\
  \texttt{dchoudha@usc.edu} \\
%   And
%   Second Author \\
%   Affiliation / Address line 1 \\
%   Affiliation / Address line 2 \\
%   Affiliation / Address line 3 \\
%   \texttt{email@domain} \\
}
\date{09-30-2019}
\begin{document}
\maketitle
\begin{abstract}
  Text classification is one of the most common tasks in natural language processing. It underlies solutions to important practical problems such as spam vs.\ ham classification and information retrieval. The complexity of text data keeps increasing with the growth of digital platforms, new lingo and new writing styles. Classifying such complex text correctly therefore requires a deeper understanding of both the data and the models. Selecting the model for a particular task is a non-trivial and complex part of any machine learning project. This document, \textit{Comparative Study of Linear Text Classifiers}, provides a comparative study of multiple linear classifiers across different data sets. The report covers the general steps of building a machine learning algorithm, feature engineering, nuances of textual data, the dependence of models on data, and a comparison of the results of three linear classifiers: Naive Bayes, Perceptron and Logistic Regression.
\end{abstract}
\section{Problem Statement}
Text classification is an important field of text mining. In classical text classification, a class label is predicted and assigned to each piece of textual data, which can be either a document or a sentence. A text classification task mostly involves the following steps: data processing, feature extraction, model building and evaluation. There are four varied text data sets, \textit{authors, products, 4dim} and \textit{urdu}, that need to be classified. For this, efforts have been made to go slightly beyond the usual \emph{``it depends''} answer to \textbf{``which model to use for classification?''}. There are many widely used text classifiers, such as Decision Trees, SVM, Random Forest, Perceptron, Naive Bayes and Logistic Regression. This report concentrates on the last three models for classifying the above-mentioned data sets. It also compares the performance of models coded from scratch with that of the same models used directly from standard libraries like \textit{sklearn}. It is \textit{important to note} that the features have been kept constant across these models, so that the models are compared fairly on a fixed set of features.
\section{Data Description}
There are four different data sets: \textit{`authors', `products', `4dim'} and \textit{`urdu'}. Each data set consists of lines, each containing a text and a label separated by a tab. The data sets are of varying sizes, as shown in the plot below:
\begin{figure}[h!]
  \caption{All data sets used}
  \centering
  \includegraphics[width=0.5\textwidth]{all_data_sets.png}
\end{figure}
A brief description of the data:
\begin{itemize}
\item authors: Short lines of English poetry by either Emily Bronte (b) or William Shakespeare (s). Size: 3,665 lines.
\item products: English product reviews of widely varying length and kind that are either positive (pos) or negative (neg). Size: 32,592 lines.
\item 4dim: English reviews of variable length that are positive or negative and truthful or deceptive (pos.tru, pos.dec, neg.tru, neg.dec). Size: 1,560 lines.
\item urdu: Urdu BBC news articles either about Pakistan (pakistan) or the rest of the world (world). Size: 220 lines.
\end{itemize}
\textbf{For the sake of simplicity, this paper focuses on just one data set, `authors'.}
Machine learning models need to be tested on unseen data to gauge their true performance in the field. We also need a subset of the data as a development set for tuning model parameters. So, I divided each data set into three parts in an 80:10:10 ratio for \textit{training data, development data and test data} respectively.
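A minimal sketch of such an 80:10:10 split is given below. It assumes the examples are shuffled with a fixed seed before splitting; the function name \texttt{split\_80\_10\_10}, the seed and the toy data are hypothetical and are not taken from the code used for the reported results.

{\footnotesize
\begin{verbatim}
```python
import random

def split_80_10_10(examples, seed=0):
    """Shuffle, then split into train/dev/test."""
    items = list(examples)
    # Assumption: a fixed seed keeps the split
    # reproducible across runs.
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

# Hypothetical toy data: (text, label) pairs.
data = [("line %d" % i, "s" if i % 2 else "b")
        for i in range(100)]
train, dev, test = split_80_10_10(data)
print(len(train), len(dev), len(test))
```
\end{verbatim}
}

Integer arithmetic is used for the split points so that every example lands in exactly one of the three parts.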
\section{Feature Extraction}
To be able to make sense of the data, features are engineered for all of the data sets. Features are a concise way of describing the textual data numerically.
\subsection{Data Pre-Processing}
Pre-processing the data plays a vital role in understanding the data as well as in making the model more robust. Text data might contain erroneous characters, words or sometimes even sentences that don't make sense and should be removed for robustness. For example, characters like \texttt{'}, \texttt{;}, \texttt{()} and \texttt{\{\}} can be replaced by an empty space if they don't add any understanding to the corpus. Removing stop words from the vocabulary also helps in addressing the long tail issue of textual data.
\begin{figure}[h!]
  \caption{Distribution of word frequencies before pre-processing in `authors'}
  \centering
  \includegraphics[width=0.5\textwidth]{authors_before_preprocess.png}
\end{figure}
For the \textit{`authors'} data set, I looked at the histogram of word frequencies before pre-processing, and it very clearly had a long tail. This is in accordance with \textit{Zipf's law} for any corpus of natural language. As a pre-processing step, apart from removing extra spaces, stop words and punctuation, I addressed the long tail by removing the most frequent words that make up the top 10 percent of the total word frequency. This step ensures that the classification algorithm is not misguided by artifacts of the grammar of a particular language (in general, an English corpus contains more determiners, pronouns and prepositions than actual verbs). The post-processing distribution doesn't resemble a normal distribution exactly, but the long tail seen before pre-processing is much less pronounced.
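The frequency-based filtering step can be sketched as follows. The sketch assumes one particular reading of the rule: the most frequent words are removed until they jointly cover about 10\% of all token occurrences. The function name \texttt{frequent\_words}, the \texttt{mass} parameter and the toy lines are hypothetical.

{\footnotesize
\begin{verbatim}
```python
from collections import Counter

def frequent_words(tokenized_lines, mass=0.10):
    """Most frequent words that together cover
    about `mass` of all token occurrences."""
    counts = Counter(
        w for line in tokenized_lines for w in line)
    total = sum(counts.values())
    removed, covered = set(), 0
    # Walk from the most to the least frequent word.
    for word, c in counts.most_common():
        if covered >= mass * total:
            break
        removed.add(word)
        covered += c
    return removed

# Hypothetical toy corpus, already tokenized.
lines = [["the", "rose", "is", "red"],
         ["the", "sky", "is", "blue"],
         ["the", "night", "falls"]]
drop = frequent_words(lines)
filtered = [[w for w in line if w not in drop]
            for line in lines]
```
\end{verbatim}
}

On this toy corpus only the single most frequent word is dropped, because it alone already exceeds the 10\% threshold.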
\begin{figure}[h!]
  \caption{Distribution of word frequencies after pre-processing in `authors'}
  \centering
  \includegraphics[width=0.5\textwidth]{authors_post_process.png}
\end{figure}
\subsection{Feature Engineering}
Features are extracted from the textual data to capture the quirks of a data set. Domain expertise plays a vital role in feature identification. The features, when combined, should be able to explain the data set well. The pipeline input consists of a raw data set $D = [X_1, X_2, \ldots, X_n]$. I identified a few basic features for the `authors' data after processing each $X_i$. \textit{Note that, in the interest of time, the same features have been used across data sets}. These are:
\begin{itemize}
\item Length of text
\item Count of all words, Count of unique words
\item Average word length used, Maximum word length used
\item Uppercase word counts, Title word counts, Punctuation counts etc
\end{itemize}
These features are very basic. More advanced features, such as POS tag frequency counts, n-gram based features, term frequency-inverse document frequency (TF-IDF), or word and sentence embeddings, could have given even better results.
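The basic features listed above can be computed with a few lines of standard-library Python. This is a sketch under assumptions: words are obtained by splitting on whitespace, and the function name \texttt{basic\_features} and the dictionary keys are hypothetical.

{\footnotesize
\begin{verbatim}
```python
import string

def basic_features(text):
    """Surface features for one line of text."""
    # Assumption: whitespace tokenization.
    words = text.split()
    lengths = [len(w) for w in words] or [0]
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_unique": len(set(words)),
        "avg_word_len": sum(lengths) / len(lengths),
        "max_word_len": max(lengths),
        "n_upper": sum(w.isupper() for w in words),
        "n_title": sum(w.istitle() for w in words),
        "n_punct": sum(ch in string.punctuation
                       for ch in text),
    }

# Hypothetical example line.
feats = basic_features("The Sun rose, RED and bright!")
```
\end{verbatim}
}

Each line of a data set would be mapped to one such feature dictionary, which can then be turned into a fixed-length numeric vector for any of the models.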
\section{Modeling}
To better understand the model comparison and learnings in the sections below, here are a few \textbf{assumptions that should be kept in mind}:
\begin{itemize}
    \item Features were engineered only for one data set, `authors', and one model, Perceptron. The same features were then used across all data sets and models (except NB BOW and NB TF-IDF). The features are therefore not generalised, which could lead to poor accuracy.
    \item The intent of the exercise was to implement models from scratch and understand the impact of various features and pre-processing steps on the model.
    \item The goal was \textbf{not} to optimize the accuracy of the model but to understand why the accuracy is low and what can be done to boost it. I have identified enhancements and tried implementing them through pre-built functionality of nltk, sklearn etc.
\end{itemize}
\subsection{Perceptron}
The Perceptron is a linear classifier that learns from its mistakes and does not make the feature independence assumption of Naive Bayes. A perceptron starts with random feature weights and updates them based on every prediction error. Every time the model makes a mistake, the weight vector is shifted towards the optimal weight vector. There is no reward if the prediction is right. I implemented a \textit{batch} update perceptron model. The feature set used for the model was the same as mentioned above. With 1200 iterations, I got an accuracy of \textbf{81.3\%} on the `authors' data. Using the same feature set, the sklearn `Perceptron' model scored \textbf{78.2\%}. This indicates that my Perceptron implementation is on par with, or even better than, sklearn's implementation. A likely reason is that sklearn sets the maximum number of iterations to 1000 by default, giving an accuracy of 78.2\%, while I got 81.3\% with 1200 batch update iterations. Although this is test accuracy, the model might still be overfitting, given that the size of the training data is just 2968. It was observed that if the number of iterations is set really low or really high, the test accuracy of the model goes down because of underfitting and overfitting respectively (the highest accuracy is in the middle, as can be seen in Fig.~4).
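The batch update rule can be illustrated with a small pure-Python sketch. It is not the implementation used for the reported numbers: it assumes labels encoded as $+1$ and $-1$, zero initial weights instead of random ones, and a hypothetical learning rate \texttt{lr}; all function names are illustrative.

{\footnotesize
\begin{verbatim}
```python
def train_perceptron(X, y, epochs=10, lr=1.0):
    """Batch perceptron; labels must be +1 or -1."""
    # Assumption: zero (not random) initial weights.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        dw = [0.0] * len(w)
        db = 0.0
        mistakes = 0
        for xi, yi in zip(X, y):
            s = sum(wj * xj
                    for wj, xj in zip(w, xi)) + b
            if yi * s <= 0:  # misclassified example
                dw = [d + yi * xj
                      for d, xj in zip(dw, xi)]
                db += yi
                mistakes += 1
        if mistakes == 0:    # nothing left to fix
            break
        # One update per pass over the data (batch).
        w = [wj + lr * d for wj, d in zip(w, dw)]
        b += lr * db
    return w, b

def predict(w, b, x):
    s = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if s > 0 else -1

# Hypothetical, linearly separable toy data.
X = [[2.0, 1.0], [1.0, 3.0],
     [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
preds = [predict(w, b, x) for x in X]
```
\end{verbatim}
}

Correctly classified examples contribute nothing to the update, which matches the description that there is no reward when the prediction is right.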
\begin{figure}[h!]
  \caption{Change in test accuracy with number of iterations used for `authors' data}
  \centering
  \includegraphics[width=0.5\textwidth]{perceptron_num_iterations.png}
\end{figure}
The Perceptron underfits the data when the number of iterations is less than 300. Test accuracy increases gradually at first, shoots up to 82\% and then recedes again as the number of iterations increases. It's clear from the plot that the stable range of test accuracy lies around 76\%, and this range should be preferred to avoid overfitting or underfitting issues.
As the Perceptron model is able to fit the data really well, the `authors' data is likely linearly separable to a large extent. \textbf{Future work on perceptron:} study the difference in the linear separability of the data sets.
\subsection{Naive Bayes}
I first implemented the Naive Bayes model with the few basic features that I finalised above for the `authors' data set.
\textbf{First approach:} I used a version of the algorithm that supports numeric attributes and assumes that the values of each numeric attribute are normally distributed. This is a strong assumption, but I wanted to check the result. I calculated summary statistics (mean and standard deviation) of each numeric feature per class. A Gaussian function then estimates the probability of a given attribute value for a class, given the mean and standard deviation of that attribute estimated from the training data. With the probability of each attribute value given a class, the probabilities of all attribute values of a data instance can be combined into the probability of the entire instance belonging to that class. The class with the highest probability for the given data point is the prediction. This, of course, gave me a poor accuracy of \textbf{53\%} on the validation set. Running the `MultinomialNB' model from \textbf{sklearn} with the same feature set as my hand-coded model gave an accuracy of \textbf{59.8\%}. This slight difference was expected given the lack of efficient processing in my model. So, overall, my hand-coded model worked fairly well compared to the sklearn model with the same features.
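The Gaussian likelihood and the resulting decision rule of the first approach can be sketched as follows. Here $\mu_{jc}$ and $\sigma_{jc}$ denote the mean and standard deviation of feature $j$ within class $c$, and $P(c)$ is the class prior; this notation, and the inclusion of the prior as in standard Naive Bayes, are assumptions made for illustration:
\[
p(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
\exp\left(-\frac{(x_j-\mu_{jc})^{2}}{2\sigma_{jc}^{2}}\right)
\]
\[
\hat{c} = \arg\max_{c} \; P(c)\prod_{j} p(x_j \mid c)
\]
The product over $j$ is where the feature independence assumption enters: each feature contributes its own density, regardless of the values of the other features.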
\textbf{Second approach:} Next, I tried the Bag of Words approach to classify the data. In bag of words, a sentence is treated as a collection of words, and the probability of a sentence given a class is the product of the probabilities of each word given the class. This is plugged into Bayes' rule to get the most probable class for the set of words forming a sentence. I used `add 1' smoothing to handle unseen words in the data. The accuracy obtained with this model was \textbf{56.5\%}. The poor performance of the Bag of Words model is mainly due to poor pre-processing of the data. After investigating the incorrect classifications, I found the following \textbf{problems} that need to be addressed:
\begin{itemize}
    \item multiple phrases are joined by `-' and by characters other than a space, so they could not be split into words during pre-processing
    \item removing just the top 10\% of words from the corpus does not completely handle the long tail problem
    \item counting just the frequency of occurrence of words does not capture how important a word is for a sentence. A word that occurs frequently across sentences should have a lower weight in identifying a specific sentence
    \item \textbf{The Naive Bayes model works better} for large data sets like `products', where it can learn variations
\end{itemize} 
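For reference, the add-one smoothed estimate used in the second approach can be written as below, where $\mathrm{count}(w,c)$ is the number of times word $w$ occurs in training sentences of class $c$ and $|V|$ is the vocabulary size (notation introduced here for illustration). Working with sums of logarithms instead of products is an assumption, commonly made to avoid numerical underflow:
\[
P(w \mid c) = \frac{\mathrm{count}(w,c)+1}{\sum_{w'}\mathrm{count}(w',c)+|V|}
\]
\[
\hat{c} = \arg\max_{c}\Big[\log P(c)+\sum_{i}\log P(w_i \mid c)\Big]
\]
With this estimate, a word never seen with class $c$ still receives the small non-zero probability $1/(\sum_{w'}\mathrm{count}(w',c)+|V|)$, so a single unseen word cannot force the score of a class to zero.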
\textbf{Third approach:} Next, I wanted to use \textbf{better processing of the data}. This can be achieved very easily with functionality of the NLTK library such as its tokenizers. Words with very high frequencies were distorting the predictions although they did not uniquely identify a sentence. So, I also wanted to weight words by Term Frequency and Inverse Document Frequency (TF-IDF) when identifying a sentence. A document in our case is actually a sentence. \textit{I have implemented a TF-IDF Naive Bayes model too}. The \textbf{problem} was the computational cost of training: my TF-IDF implementation considered only the 200 most frequent words of the vocabulary, which is \textbf{not} reflective of the actual scenario, and hence its accuracy is low. To quickly check this hypothesis, I used the `sklearn' module to implement the model with all words considered for TF-IDF, and the accuracy achieved was \textbf{82.6\%}.
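One common form of the TF-IDF weight, given here as an illustrative assumption rather than the exact variant implemented, is
\[
\mathrm{tfidf}(w,d) = \mathrm{tf}(w,d)\cdot\log\frac{N}{\mathrm{df}(w)}
\]
where $\mathrm{tf}(w,d)$ is the number of times word $w$ occurs in sentence $d$, $N$ is the number of sentences and $\mathrm{df}(w)$ is the number of sentences containing $w$. A word that appears in almost every sentence has $\mathrm{df}(w)$ close to $N$ and therefore a weight close to zero. Library implementations such as sklearn's use a smoothed idf term by default, so their values differ slightly from this form.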
\subsection{Logistic Regression}
Unlike the Perceptron, Logistic Regression updates the feature weight vector using predicted probabilities. There are many possible hyper-planes that can separate the data, and the perceptron stops when it finds any one of them. The \textbf{hyper-plane found by the perceptron might not be the optimal} one. That's why Logistic Regression is often preferred: it minimizes the logistic loss and so settles on the hyper-plane that is optimal under that loss. It updates the weights with every prediction, as a reward or a punishment. I implemented the logistic regression model using the same set of features as created for `authors' for the Perceptron model. The accuracy of my logistic regression model was \textbf{74.8\%} with 1500 batch update iterations. I compared it with the accuracy obtained by using sklearn's `LogisticRegression', which was \textbf{78.2\%} with the same feature set and number of iterations.
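The logistic loss and a batch gradient update can be sketched as follows, assuming labels $y_i \in \{0,1\}$, feature vectors $x_i$, a weight vector $w$ and a learning rate $\eta$ (the learning rate and this notation are assumptions introduced for illustration):
\[
p_i = \frac{1}{1+\exp(-w^{\top}x_i)}
\]
\[
L(w) = -\sum_{i}\big[y_i\log p_i+(1-y_i)\log(1-p_i)\big]
\]
\[
w \leftarrow w-\eta\sum_{i}(p_i-y_i)\,x_i
\]
The update follows the negative gradient of $L(w)$. Each example contributes in proportion to $p_i - y_i$, so confidently correct predictions barely move the weights, whereas the perceptron updates only on outright mistakes.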
\begin{figure}[h!]
  \caption{Change in test accuracy with number of iterations in Logistic Regression used for `authors' data}
  \centering
  \includegraphics[width=0.5\textwidth]{Logisticregression_accuracy.png}
\end{figure}
The loss is really high at the beginning of training, gradually decreases as the epochs progress and then stabilises towards the end. Similar to the perceptron, the accuracy is low for a small number of epochs and increases as the number of epochs increases. As the number of epochs goes beyond the size of the data itself, the model is highly overfitted and the accuracy is not entirely trustworthy.
% \subsection{Accessibility}
% \label{ssec:accessibility}
% In an effort to accommodate people who are color-blind (as well as those printing
% to paper), grayscale readability for all accepted papers will be
% encouraged.  Color is not forbidden, but authors should ensure that
% tables and figures do not rely solely on color to convey critical
% distinctions. A simple criterion: All curves and points in your figures should be clearly distinguishable without color.
% Min: no longer used as of ACL 2018, following ACL exec's decision to
% remove this extra workflow that was not executed much.
\end{document}