From a67f6863c165f18f1a333309c316157312505b20 Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Thu, 22 Jul 2021 04:12:51 +0200
Subject: start mag notes

---
 python/notes/mag_notes.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 python/notes/mag_notes.md

(limited to 'python/notes')

diff --git a/python/notes/mag_notes.md b/python/notes/mag_notes.md
new file mode 100644
index 0000000..6341676
--- /dev/null
+++ b/python/notes/mag_notes.md
@@ -0,0 +1,21 @@
+# MAG 2020 Notes
+
+* /magna/data/mag-2020-06-25
+* 1637615789
+
+```
+$ unpigz -c PaperReferences.txt.gz| pv -l | wc -l
+1637615789
+```
+
+* 238M rows in the papers table (238938563)
+* only 3516356 DOI?
+
+```
+$ zstdcat -T0 Papers.txt.zst | pv -l | LC_ALL=C cut -f3 | LC_ALL=C grep -v ^$ > mag_doi_list.txt
+ 238M 0:06:12 [ 641k/s]
+
+$ wc -l mag_doi_list.txt
+3516356 mag_doi_list.txt
+```
+
--
cgit v1.2.3
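
Note on the DOI count: the two figures in the notes (3516356 of 238938563 rows) put the share of rows with a value in the third column at roughly 1.5%, which is why the notes question it. The `cut -f3 | grep -v ^$` step can be sketched in Python to cross-check the number on a small sample. This is a minimal sketch, not part of the commit. The function name `count_nonempty_field` and the sample rows are hypothetical, and it assumes, as the notes do, that the third tab-separated column of `Papers.txt` holds the DOI. One difference from the pipeline: `cut` without `-s` passes lines containing no tab through unchanged, whereas this sketch counts such rows as empty.

```python
import io


def count_nonempty_field(lines, field_index=2, sep="\t"):
    """Return (total_rows, rows_with_value) for one column of a TSV stream.

    field_index is zero-based, so the default 2 corresponds to `cut -f3`.
    """
    total = 0
    with_value = 0
    for line in lines:
        total += 1
        fields = line.rstrip("\n").split(sep)
        # Rows with too few columns are treated as having an empty value.
        if len(fields) > field_index and fields[field_index] != "":
            with_value += 1
    return total, with_value


# Made-up sample rows for illustration only (not MAG data).
sample = io.StringIO(
    "1\t100\t10.1000/abc\tJournal\n"
    "2\t200\t\tJournal\n"
    "3\t300\t10.1000/xyz\tRepository\n"
)
total, with_doi = count_nonempty_field(sample)
print(f"{with_doi} of {total} rows have a DOI ({with_doi / total:.2%})")
```

Since the function accepts any iterable of lines, it could presumably be fed `sys.stdin` behind `zstdcat -T0 Papers.txt.zst`, although at 238M rows the `cut`/`grep` pipeline with `LC_ALL=C` is likely to be much faster.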