\documentclass[letterpaper, 10 pt, conference]{ieeeconf} % Comment this line out
% if you need a4paper
%\documentclass[a4paper, 10pt, conference]{ieeeconf} % Use this line for a4
% paper
\IEEEoverridecommandlockouts % This command is only
% needed if you want to
% use the \thanks command
\overrideIEEEmargins
\usepackage{graphicx}
\usepackage{lipsum}
\usepackage{xcolor}
\usepackage{hyperref}
\graphicspath{ {images/} }
\title{%
MT-Adapted Datasheet for Datasets Template \\
}
\begin{document}
\maketitle
\thispagestyle{empty}
\pagestyle{empty}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Disclaimer}
\textcolor{blue}{{This datasheet is inspired by \cite{Datasheets4Datasets}, modified as proposed by \cite{costajussa:2020}, and is not filled out by the dataset creator. It is therefore strongly recommended to use it only if the creator has not provided a proper datasheet, or in combination with the creator's datasheet.
Writers must indicate their personal and contact details, as well as the date this datasheet was last reviewed, below. Please also remember to change the datasheet title to the name of the dataset in question.}}
\lipsum[1]
\\
\section{Motivation}
\textcolor{blue}{\subsection{Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Did they fund it themselves? If there is an associated grant, please provide the name of the grantor and the grant name and number.}}
\lipsum[1]
\textcolor{blue}{\subsection{For what purpose was the dataset created? Was there a specific task in mind? If so, please specify the result type (e.g., unit) to be expected.}}
\lipsum[1]
\textcolor{blue}{\subsection{Could any of these uses, or their results, interfere with human will or communicate a false reality?}}
\lipsum[1]
\textcolor{blue}{\subsection{How old is the file? Please provide the current date.}}
\lipsum[1]
\textcolor{blue}{\subsection{Has there been any monetary profit from the creation of this dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
You may remove this question if there are none.
\\
\section{Composition}
\textcolor{blue}{\subsection{Is there any synthetic data in the dataset? If so, in what percentage?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there multiple types of instances or is there just one type? Please specify the type(s), e.g., raw data, preprocessed, symbolic.}}
\lipsum[1]
\textcolor{blue}{\subsection{What do the instances (of each type, if appropriate) that comprise the dataset represent (e.g., documents, photos, people, countries)?}}
\lipsum[1]
\textcolor{blue}{\subsection{How many instances (of each type, if appropriate) are there in total?}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain all possible instances or is it just a sample of a larger set? For example, does the dataset differ from an original one because of preprocessing? If this dataset is a subset of another one, is the original dataset available?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there a label or a target associated with each of the instances? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{What is the format of the data (e.g., .json, .xml, .csv)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is any information missing from individual instances? If so, please provide a description, explaining why this information is missing (e.g. because it was unavailable). This does not include intentionally removed information, but might include, e.g. redacted text.}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any errors, sources of noise, or redundancies in the dataset? If so, please provide a description. Do not include missing information here.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there any verification guaranteeing that unfair biases are not institutionalized, both in the dataset itself and in the algorithms that could use it?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there recommended data splits, e.g. training, development/validation, testing? If so, please provide a description of these splits explaining the rationale behind them.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is the dataset self-contained, or does it link to or otherwise rely on external resources? e.g., websites, tweets, other datasets. If it links to or relies on external resources, a) Are there any guarantees that they will exist, and remain constant over time? b) Are there official archival versions of the complete dataset? i.e. including the external resources as they existed at the time the dataset was created. c) Are there any restrictions (e.g. licenses, fees) associated with any of the external resources that might apply to a future user? Please provide descriptions of all external resources and any restrictions associated with them, as well as links or other access points, if appropriate.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor-patient confidentiality, or data that includes the content of individuals' non-public communications)? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety? If so, please describe why.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset relate to people? If so, please specify a) Whether the dataset identifies subpopulations or not. b) Whether the dataset identifies individual people or not. c) Whether it contains information that could harm any individuals or violate their rights. d) Any other verified information on the topic that can be provided.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset cover included languages equally?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there any evidence that the data may be biased in some way, e.g., with respect to gender, ethics, or beliefs?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is the data made up of formal text, informal text, or both in equal proportion?}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the data intentionally contain incorrect language expressions? Does it contain slang terms? If so, please indicate which instances of the data contain them.}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
You may remove this question if there are none.
\\
\section{Collection Process}
\textcolor{blue}{\subsection{Where was the data collected? Please include as much detail as possible, e.g., country, city, community, entity.}}
\lipsum[1]
\textcolor{blue}{\subsection{If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any guarantees that the acquisition of the data did not violate any law or anyone's rights?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any guarantees that prove the data is reliable?}}
\lipsum[1]
\textcolor{blue}{\subsection{Did the collection process involve the participation of individual people? If so, please report any information available regarding the following questions: Was the data collected from people directly? Did all the parties involved give their explicit consent? Is there any mechanism available to revoke this consent in the future, if desired?}}
\lipsum[1]
\textcolor{blue}{\subsection{Has an analysis of the potential impact of the dataset and its use on data subjects been conducted (e.g., a data protection impact analysis)? If so, please provide a description of this analysis, including the outcomes, as well as a link or other access point to any supporting documentation.}}
\lipsum[1]
\textcolor{blue}{\subsection{Were any ethical review processes conducted?}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the data come from a single source or is it the result of a combination of data coming from different sources? In any case, please provide references.}}
\lipsum[1]
\textcolor{blue}{\subsection{If the same content were to be collected from a different source, would it be similar?}}
\lipsum[1]
\textcolor{blue}{\subsection{Please specify any other information regarding the collection process, e.g., who collected the data, whether they were compensated, and what mechanisms were used. Please include only verified information.}}
\lipsum[1]
\\
\section{Preprocessing/Cleaning/Labelling}
\textcolor{blue}{\subsection{Please specify any information regarding the preprocessing that you may know (e.g., the person who created the dataset has explained it) or be able to find (e.g., there is an informational site). Please include only verified information. For example, was any mechanism applied to obtain neutral language? Were all instances preprocessed the same way?}}
\lipsum[1]
\\
\section{Uses}
\textcolor{blue}{\subsection{Has the dataset been used already? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there a repository that links to any or all papers or systems that use this dataset? If so, please provide a link or any other access point.}}
\lipsum[1]
\textcolor{blue}{\subsection{What (other) tasks could the dataset be used for? Please include your own intentions, if any.}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there tasks for which the dataset should not be used? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments? For example, do the collection or preprocessing processes impact future uses?}}
You may remove this question if there are none.
\\
\section{Distribution}
\textcolor{blue}{\subsection{Please specify the source where you got the dataset from.}}
\lipsum[1]
\textcolor{blue}{\subsection{When was the dataset first released?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any restrictions regarding the distribution and/or usage of this data in any particular geographic regions?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is the dataset distributed under a copyright or other intellectual property (IP) license? And/or under applicable terms of use (ToU)? Please cite a verified source.}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments? For example, how has the data been distributed? Who has access to the dataset? When was the dataset first distributed? Are there any other regulations on the dataset?}}
You may remove this question if there are none.
\\
\section{Maintenance}
\textcolor{blue}{\subsection{Is there any verified manner of contacting the creator of the dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Specify any limitations there might be on contributing to the dataset. For example, can anyone contribute to it? Is contributing possible at all?}}
\lipsum[1]
\textcolor{blue}{\subsection{Have any errata been reported?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there any verified information on whether the dataset will be updated in any form in the future? Is someone in charge of checking whether any of the data has become irrelevant over time? If so, will it be removed or labeled somehow?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there any available log of previous changes made to the dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Could changes to current legislation end the right-of-use of the dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are any lifelong learning updates, such as vocabulary enrichment, applied automatically?}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments? For example, is there someone supporting/hosting/maintaining the dataset? If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances?}}
You may remove this question if there are none.
\\
\textcolor{blue}{\subsection{Please provide any other information that might be relevant.}}
You may remove this question if there is nothing else to add.
\medskip
\bibliographystyle{unsrt}
\bibliography{references}
\end{document}