author | Martin Czygan <martin.czygan@gmail.com> | 2021-07-22 04:12:51 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-07-22 04:12:51 +0200 |
commit | a67f6863c165f18f1a333309c316157312505b20 | |
tree | 1bfee604454698b1cd500cfa1a7f7cdb11418dbb /python | |
parent | 8d56b68dfc635f3011207ec5c00f08799efc4aab | |
download | refcat-a67f6863c165f18f1a333309c316157312505b20.tar.gz refcat-a67f6863c165f18f1a333309c316157312505b20.zip |
start mag notes
Diffstat (limited to 'python')
-rw-r--r-- | python/notes/mag_notes.md | 21 |
1 files changed, 21 insertions, 0 deletions
```diff
diff --git a/python/notes/mag_notes.md b/python/notes/mag_notes.md
new file mode 100644
index 0000000..6341676
--- /dev/null
+++ b/python/notes/mag_notes.md
@@ -0,0 +1,21 @@
+# MAG 2020 Notes
+
+* /magna/data/mag-2020-06-25
+* 1637615789
+
+```
+$ unpigz -c PaperReferences.txt.gz| pv -l | wc -l
+1637615789
+```
+
+* 238M rows in the papers table (238938563)
+* only 3516356 DOI?
+
+```
+$ zstdcat -T0 Papers.txt.zst | pv -l | LC_ALL=C cut -f3 | LC_ALL=C grep -v ^$ > mag_doi_list.txt
+ 238M 0:06:12 [ 641k/s]
+
+$ wc -l mag_doi_list.txt
+3516356 mag_doi_list.txt
+```
+
```
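The row count from the first pipeline in these notes can also be reproduced from Python using only the standard library. This is a sketch, not part of the commit: `count_lines_gz`, `count_newlines` and `read_chunks` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary assumption. It counts newline bytes, which is what `wc -l` reports, and reads the decompressed stream in fixed-size blocks so memory stays bounded even for a file with over 1.6 billion lines. Being single-threaded Python, it will probably be slower than the `unpigz | wc -l` pipeline.

```python
import gzip
import sys
from typing import BinaryIO, Iterable, Iterator


def read_chunks(f: BinaryIO, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield fixed-size blocks of bytes until EOF, keeping memory bounded."""
    while True:
        chunk = f.read(size)
        if not chunk:
            return
        yield chunk


def count_newlines(chunks: Iterable[bytes]) -> int:
    """Count newline bytes across all chunks; `wc -l` reports the same number."""
    return sum(chunk.count(b"\n") for chunk in chunks)


def count_lines_gz(path: str) -> int:
    """Rough equivalent of `unpigz -c PATH | wc -l` (hypothetical helper)."""
    # gzip.open also handles multi-member files such as those written by pigz.
    with gzip.open(path, "rb") as f:
        return count_newlines(read_chunks(f))


if __name__ == "__main__":
    # Print "COUNT PATH" for each file given on the command line, like wc -l.
    for path in sys.argv[1:]:
        print(count_lines_gz(path), path)
```

Usage, with a hypothetical script name: `python count_lines.py PaperReferences.txt.gz`.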
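The DOI extraction step (`cut -f3 | grep -v ^$`) can likewise be sketched as a filter over tab-separated lines. This assumes, as the pipeline in the notes does, that the third column of `Papers.txt` holds the DOI; `extract_dois` and `DOI_FIELD` are hypothetical names. Rather than depend on a Zstandard library, the sketch reads the already-decompressed text from standard input. One small difference from `cut`: a line containing no tab at all is skipped here, whereas `cut -f3` without `-s` would print it unchanged.

```python
import sys
from typing import Iterable, Iterator

# Zero-based index of the third column, i.e. what `cut -f3` selects.
# Assumption: this is the DOI column, as in the notes' pipeline.
DOI_FIELD = 2


def extract_dois(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty third-column values from tab-separated lines.

    Mirrors `cut -f3 | grep -v ^$` for lines that contain at least one tab.
    """
    for line in lines:
        # Split only as far as needed; the remaining columns stay unsplit.
        fields = line.rstrip("\n").split("\t", DOI_FIELD + 1)
        if len(fields) > DOI_FIELD and fields[DOI_FIELD]:
            yield fields[DOI_FIELD]


if __name__ == "__main__":
    # Pass "-" to read the decompressed TSV from stdin.
    if sys.argv[1:] == ["-"]:
        count = 0
        for doi in extract_dois(sys.stdin):
            sys.stdout.write(doi + "\n")
            count += 1
        print(count, "non-empty DOI values", file=sys.stderr)
```

Usage, with a hypothetical script name: `zstdcat -T0 Papers.txt.zst | python extract_dois.py - > mag_doi_list.txt`. With the counts recorded in the notes, 3516356 DOIs out of 238938563 paper rows is about 1.5%.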