Merge branch 'bnewbold-report' into 'master'

feedback and tweaks on report

See merge request webgroup/refcat!4
author: Martin Czygan <martin@archive.org> 2021-08-19 16:13:22 +0000
committer: Martin Czygan <martin@archive.org> 2021-08-19 16:13:22 +0000
commit: 1726c78c4a37cd6da3738fc65f51dd174443fa7f (patch)
tree: a8fc2f5f7d7fe0cc5676b49f91f8976c59e94be4 /docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
parent: 240299fcc23f4a3dfe5097c62fbd0074986062f0 (diff)
parent: cb6d5f3b17a201d57b26bea11b2728300c370c2c (diff)
1 files changed, 11 insertions, 11 deletions
diff --git a/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex b/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
index 76f1456..a5536d8 100644
--- a/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
+++ b/docs/TR-20210808100000-IA-WDS-REFCAT/main.tex
@@ -18,7 +18,7 @@
 
 \begin{document}
 
-\title{Fatcat Reference Dataset}
+\title{REFCAT: The Fatcat Citation Graph}
 
 \author{Martin Czygan \\
 	\\
@@ -75,12 +75,12 @@ were first devised, living on in existing commercial knowledge bases today.
 Open alternatives were started such as the Open Citations Corpus (OCC) in 2010
 - the first version of which contained 6,325,178 individual
 references\citep{shotton2013publishing}. Other notable early projects
-include CiteSeerX\citep{wu2019citeseerx} and CitEc\citep{CitEc}. The last
+include CiteSeerX\citep{wu2019citeseerx} and CitEc\footnote{\url{https://citec.repec.org}}. The last
 decade has seen the emergence of more openly available, large scale
 citation projects like Microsoft Academic\citep{sinha2015overview} or the
-Initiative for Open Citations\citep{i4oc}\citep{shotton2018funders}. In 2021,
-according to \citep{hutchins2021tipping} over 1B citations are publicly
-available, marking a tipping point for this category of data.
+Initiative for Open Citations\footnote{\url{https://i4oc.org}}\citep{shotton2018funders}.
+In 2021, over one billion citations are publicly available, marking a ``tipping point''
+for this category of data\citep{hutchins2021tipping}.
 
 \section{Related Work}
 
@@ -117,16 +117,16 @@ citations is not expected to shrink in the future.
 We release the first version of the \emph{refcat} dataset in an format used
 internally for storage and to serve queries (and which we call \emph{biblioref}
 or \emph{bref} for short). The dataset includes metadata from fatcat, the
-Open Library Project and inbound links from the English Wikipedia. The fatcat
+Open Library project and inbound links from the English Wikipedia. The fatcat
 project itself aggregates data from variety of open data sources, such as
-Crossref\citep{crossref}, PubMed\citep{canese2013pubmed},
-DataCite\citep{brase2009datacite}, DOAJ\citep{doaj}, dblp\citep{ley2002dblp} and others,
+Crossref\footnote{\url{https://crossref.org}}, PubMed\footnote{\url{https://pubmed.ncbi.nlm.nih.gov/}},
+DataCite\footnote{\url{https://datacite.org}}, Directory of Open Access Jourals (DOAJ)\footnote{\url{https://doaj.org}}, dblp\citep{ley2002dblp} and others,
 as well as metadata generated from analysis of data preserved at the Internet
 Archive and active crawls of publication sites on the web.
 
 The dataset is
 integrated into the \href{https://fatcat.wiki}{fatcat website} and allows users
-to explore inbound and outbound references\cite{fatcatguidereferencegraph}.
+to explore inbound and outbound references\footnote{\url{https://guide.fatcat.wiki/reference_graph.html}}.
 
 The format records source and target (fatcat release and work) identifiers, a
 few attributes from the metadata (such as year or release stage) as well as
@@ -196,7 +196,7 @@ Table~\ref{table:fields}.
 			\toprule
 			\bf{Fields}                                                                                     & \bf{Percentage} \\
 			\midrule
-			\multicolumn{1}{l}{CN $\cdot$ RN $\cdot$ P $\cdot$ T $\cdot$  U $\cdot$  V $\cdot$ Y}           & 14\%            \\
+			\multicolumn{1}{l}{CN $\cdot$ CRN $\cdot$ P $\cdot$ T $\cdot$  U $\cdot$  V $\cdot$ Y}          & 14\%            \\
 			\multicolumn{1}{l}{\textbf{DOI}}                                                                & 14\%            \\
 			\multicolumn{1}{l}{CN $\cdot$ CRN $\cdot$ IS $\cdot$ P $\cdot$ T $\cdot$ U $\cdot$ V $\cdot$ Y} & 5\%             \\
 			\multicolumn{1}{l}{CN $\cdot$ CRN $\cdot$ \textbf{DOI} $\cdot$ U $\cdot$ V $\cdot$ Y}           & 4\%             \\
@@ -225,7 +225,7 @@ our target schema or perform
 additional operations such as deduplication or fusion of matched and unmatched references.
 
 The key derivation can be exact (via an identifier like DOI, PMID, etc) or
-based on a value normalization, like slugifying a title string. For identifier
+based on a value normalization, like ``slugifying'' a title string. For identifier
 based matches we can generate the target schema directly.  For fuzzy matching
 candidates, we pass possible match pairs through a verification procedure,
 which is implemented for \emph{release entity}\footnote{\url{https://guide.fatcat.wiki/entity_release.html}.} pairs. This procedure is a