author Martin Czygan <martin.czygan@gmail.com> 2021-09-28 10:40:35 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-09-28 10:40:35 +0200
commit f75a3cd2683509cb0a090f669a911cb3155532bc (patch)
tree 365254809d3b490569ba81439b08618a546abd69 /extra/mag/README.md
parent 6b15378b95392a7b387c39d4ef126ad9a71ee4bb (diff)
extra: turn mag reference table to doi-to-doi mapping
Diffstat (limited to 'extra/mag/README.md')
-rw-r--r-- extra/mag/README.md | 15
1 files changed, 15 insertions, 0 deletions
diff --git a/extra/mag/README.md b/extra/mag/README.md
new file mode 100644
index 0000000..8f8a7fb
--- /dev/null
+++ b/extra/mag/README.md
@@ -0,0 +1,15 @@
+# Notes on MAG
+
+Using: https://archive.org/details/mag-2021-06-07
+
+* 260M entities
+* 96.7M with DOI
+
+In order to generate a doi-to-doi version, we need to:
+
+* create a mapping from id-to-doi
+* apply the mapping to the PaperReferences file
+
+```sh
+$ time unpigz -c /magna/data/mag-2021-06-07/Papers.txt.gz | cut -f1,3 | awk '$2 != ""' | mkocidb -o /sandcrawler-db/tmp-refcat/mag_id_doi.db
+```
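
The README stops after the first step (building the id-to-doi mapping). The second step, applying the mapping to the PaperReferences file, is not shown in this commit. As a rough illustration only, and not the method the repository uses, the join can be sketched with a plain awk lookup table. The sample rows and file names below are made up, and the sketch assumes both inputs are tab-separated: the mapping as `PaperId<TAB>DOI` and the reference pairs as `PaperId<TAB>PaperReferenceId`.

```sh
#!/bin/sh
# Sketch: turn (paper id, referenced paper id) pairs into (doi, doi) pairs.
# All sample data and file names here are hypothetical.
set -eu

tmp=$(mktemp -d)

# id-to-doi mapping (PaperId <TAB> DOI); paper 3 has no DOI, so it is absent.
printf '1\t10.1000/a\n2\t10.1000/b\n4\t10.1000/d\n' > "$tmp/id_doi.tsv"

# Reference pairs (PaperId <TAB> PaperReferenceId), the assumed layout.
printf '1\t2\n1\t3\n2\t4\n3\t1\n' > "$tmp/refs.tsv"

# While reading the first file (NR == FNR), load the mapping into an array.
# For the second file, print a pair only if both ids resolve to a DOI.
result=$(awk -F'\t' -v OFS='\t' '
    NR == FNR { doi[$1] = $2; next }
    ($1 in doi) && ($2 in doi) { print doi[$1], doi[$2] }
' "$tmp/id_doi.tsv" "$tmp/refs.tsv")

rm -rf "$tmp"

printf '%s\n' "$result"
```

Pairs where either side lacks a DOI are dropped. The sketch holds the whole mapping in memory, which is fine for a few rows but may be impractical for the 96.7M DOI entries mentioned above; an on-disk lookup would be the more likely choice at that scale.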