doc: tr, fix typos

author: Martin Czygan <martin.czygan@gmail.com> 2021-09-27 22:05:21 +0200
committer: Martin Czygan <martin.czygan@gmail.com> 2021-09-27 22:05:21 +0200
commit: b0c0677ce81c5e98904898683b9c59ae37207404 (patch)
tree: b6e9877f19c9317d36a5bea530b0798611f4fa5e /docs
parent: b5a5938fb1ff93a4b673e2db7069072f6a17c52d (diff)
download: refcat-b0c0677ce81c5e98904898683b9c59ae37207404.tar.gz
refcat-b0c0677ce81c5e98904898683b9c59ae37207404.zip
2 files changed, 9 insertions, 9 deletions
diff --git a/docs/TR-20210808100000-IA-WDS-REFCAT/main.pdf b/docs/TR-20210808100000-IA-WDS-REFCAT/main.pdf
index 076b8f3..830f25f 100644
--- a/docs/TR-20210808100000-IA-WDS-REFCAT/main.pdf
+++ b/docs/TR-20210808100000-IA-WDS-REFCAT/main.pdf
diff --git a/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex b/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
index 35d73b1..0543612 100644
--- a/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
+++ b/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
@@ -98,13 +98,13 @@ reference entries, protocols or datasets. References can be extracted manually
 or through more automated methods, by accessing relevant metadata or structured
 data extraction from full text documents. Automated methods offer the benefits
 of scalability. The completeness of bibliographic metadata in references ranges
-from documents with one or more persistant identifiers to raw, potentially
+from documents with one or more persistent identifiers to raw, potentially
 unclean strings partially describing a scholarly artifact.
 
 \section{Related Work}
 
 Two typical problems in citation graph development are related to data
-aquisition and citation matching. Data acquisition itself can take different
+acquisition and citation matching. Data acquisition itself can take different
 forms: bibliographic metadata can contain explicit reference data as provided
 by publishers and aggregators; this data can be relatively consistent when
 looked at per source, but may vary in style and comprehensiveness when looked
@@ -365,7 +365,7 @@ which is implemented for \emph{release entity}\footnote{\href{https://guide.fatc
 domain dependent rule based verification, able to identify different versions
 of a publication, preprint-published pairs and documents, which are
 are similar by various metrics calculated over title and author fields. The fuzzy matching
-approach is applied on all reference documents without identifier (a title is
+approach is applied on all reference documents without any identifier (a title is
 currently required).
 
 We currently implement performance sensitive parts in the
@@ -383,7 +383,7 @@ GNU \emph{sort}~\citep{mcilroy1971research}.
 During a last processing step, we fuse reference matches and unmatched items
 into a single, indexable file. This step includes deduplication of different
 matching methods (e.g. prefer exact matches over fuzzy matches). This file is
-indexed into an search index and serves both matched and unmatched references
+indexed into a search index and serves both matched and unmatched references
 for the web application, allowing for further collection of feedback on match
 quality and possible improvements.
 
@@ -405,11 +405,11 @@ As other dataset in this field we expect this dataset to be iterated upon.
 
 \begin{itemize}
 	\item The fatcat catalog updates its metadata
-	      continously\footnote{A changelog can currenly be followed here:
+	      continuously\footnote{A changelog can currently be followed here:
 		      \href{https://fatcat.wiki/changelog}{https://fatcat.wiki/changelog}.} and web crawls are conducted
 	      regularly. Current processing pipelines cover raw reference snapshot
-	      creation and derivation of the graph structure, which allows to rerun
-	      processing based on updated data as it becomes available.
+	      creation and derivation of the graph structure, which allows to rerun the
+	      processing pipeline based on updated data as it becomes available.
 
 	\item Metadata extraction from PDFs depends on supervised machine learning
 	      models, which in turn depend on available training datasets. With additional crawls and
@@ -517,8 +517,8 @@ more easily (see~Table~\ref{table:matches}).
 		\caption{Table of match counts (top 25), reference provenance, match
 			status and match reason. Provenance currently can name the raw
 			origin (e.g. \emph{crossref}) or the method (e.g. \emph{fuzzy}). The match reason
-			identifier encode a specific rule in the domain dependent
-			verification process and are included for completeness - we do not
+			identifier encodes a specific rule in the domain dependent
+			verification process and is included for completeness - we do not
 			include the details of each rule in this report.}
 		\label{table:matches}
 	\end{center}