-rw-r--r-- python/notes/mag_notes.md | 21
1 file changed, 21 insertions, 0 deletions
diff --git a/python/notes/mag_notes.md b/python/notes/mag_notes.md
new file mode 100644
index 0000000..6341676
--- /dev/null
+++ b/python/notes/mag_notes.md
@@ -0,0 +1,21 @@
+# MAG 2020 Notes
+
+* /magna/data/mag-2020-06-25
+* 1637615789
+
+```
+$ unpigz -c PaperReferences.txt.gz| pv -l | wc -l
+1637615789
+```
+
+* 238M rows in the papers table (238938563)
+* only 3516356 DOI?
+
+```
+$ zstdcat -T0 Papers.txt.zst | pv -l | LC_ALL=C cut -f3 | LC_ALL=C grep -v ^$ > mag_doi_list.txt
+ 238M 0:06:12 [ 641k/s]
+
+$ wc -l mag_doi_list.txt
+3516356 mag_doi_list.txt
+```
+