author Martin Czygan <martin.czygan@gmail.com> 2021-07-22 04:12:51 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-07-22 04:12:51 +0200
commit a67f6863c165f18f1a333309c316157312505b20
tree 1bfee604454698b1cd500cfa1a7f7cdb11418dbb /python/notes
parent 8d56b68dfc635f3011207ec5c00f08799efc4aab
start mag notes
Diffstat (limited to 'python/notes')
-rw-r--r-- python/notes/mag_notes.md | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/python/notes/mag_notes.md b/python/notes/mag_notes.md
new file mode 100644
index 0000000..6341676
--- /dev/null
+++ b/python/notes/mag_notes.md
@@ -0,0 +1,21 @@
+# MAG 2020 Notes
+
+* /magna/data/mag-2020-06-25
+* 1637615789 rows in PaperReferences.txt.gz
+
+```
+$ unpigz -c PaperReferences.txt.gz| pv -l | wc -l
+1637615789
+```
+
+* 238M rows in the papers table (238938563)
+* only 3516356 rows (about 1.5%) with a DOI?
+
+```
+$ zstdcat -T0 Papers.txt.zst | pv -l | LC_ALL=C cut -f3 | LC_ALL=C grep -v ^$ > mag_doi_list.txt
+ 238M 0:06:12 [ 641k/s]
+
+$ wc -l mag_doi_list.txt
+3516356 mag_doi_list.txt
+```
+
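
The DOI count in the notes comes from a `cut -f3 | grep -v ^$ | wc -l` style pipeline. A minimal Python sketch of the same check is given here, assuming (as the shell command does) that the DOI sits in the third tab-separated column of `Papers.txt`. The function name `count_nonempty_column` and the sample rows are hypothetical and not part of the commit.

```python
import io


def count_nonempty_column(lines, column=3, sep="\t"):
    """Return (total_rows, rows_with_value) for a 1-based column index.

    Roughly mirrors `cut -f3 | grep -v '^$' | wc -l`, except that rows
    with fewer than `column` fields count as empty here (cut would print
    a line that contains no delimiter at all unchanged).
    """
    total = 0
    nonempty = 0
    for line in lines:
        total += 1
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= column and fields[column - 1] != "":
            nonempty += 1
    return total, nonempty


# Made-up rows for illustration only; real input would be streamed
# from the decompressed Papers.txt file.
sample = io.StringIO(
    "1\t100\t10.1000/a\tJournal\n"
    "2\t200\t\tJournal\n"
    "3\t300\t10.1000/b\tConference\n"
)
total, with_doi = count_nonempty_column(sample)
print(total, with_doi, f"{with_doi / total:.1%}")

# Share of rows with a DOI, using the two counts recorded in the notes.
print(f"{3516356 / 238938563:.2%}")
```

Reporting the total alongside the non-empty count makes the coverage ratio visible in one pass, which is the figure the "only 3516356" question is really about.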