Diffstat (limited to 'extra/mag/README.md')
-rw-r--r-- extra/mag/README.md | 15
1 files changed, 15 insertions, 0 deletions
diff --git a/extra/mag/README.md b/extra/mag/README.md
new file mode 100644
index 0000000..8f8a7fb
--- /dev/null
+++ b/extra/mag/README.md
@@ -0,0 +1,15 @@
+# Notes on MAG
+
+Using: https://archive.org/details/mag-2021-06-07
+
+* 260M entities
+* 96.7M with DOI
+
+To generate a DOI-to-DOI version of the paper references, we need to:
+
+* create a mapping from MAG paper ID to DOI
+* apply the mapping to the `PaperReferences` file
+
+```sh
+$ time unpigz -c /magna/data/mag-2021-06-07/Papers.txt.gz | cut -f1,3 | awk '$2 != ""' | mkocidb -o /sandcrawler-db/tmp-refcat/mag_id_doi.db
+```
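
The second step (applying the mapping) is not shown in the README. A minimal sketch of the idea with plain `awk`, under stated assumptions: `mag_id_doi.tsv` and `PaperReferences.tsv` are hypothetical, tab-separated sample files created inline, the references file is assumed to hold (citing ID, cited ID) pairs, and the DOIs are made up. This illustrates the join only, not necessarily how the project performs it.

```sh
# Sketch: rewrite (PaperId, PaperReferenceId) pairs as (DOI, DOI) pairs.
# Assumption: both input files below are hypothetical samples.
set -eu
tmp=$(mktemp -d)

# id-to-doi mapping: MAG paper ID, DOI (rows without a DOI already removed)
printf '1\t10.1000/a\n2\t10.1000/b\n4\t10.1000/d\n' > "$tmp/mag_id_doi.tsv"

# references: citing paper ID, cited paper ID
printf '1\t2\n1\t3\n4\t1\n' > "$tmp/PaperReferences.tsv"

# First pass (NR == FNR) loads the mapping into an in-memory array;
# second pass prints a row only when both IDs resolve to a DOI.
result=$(awk -F'\t' -v OFS='\t' '
    NR == FNR { doi[$1] = $2; next }
    ($1 in doi) && ($2 in doi) { print doi[$1], doi[$2] }
' "$tmp/mag_id_doi.tsv" "$tmp/PaperReferences.tsv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The pair `1 -> 3` is dropped because paper 3 has no DOI in the sample mapping. With 96.7M DOIs, an in-memory `awk` array is likely to need many gigabytes, so a disk-backed lookup such as the database built above is probably the better fit at full scale.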