From e8484d7a78cbbe2905f45002a7f15adbbfcb86ef Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Tue, 28 Sep 2021 17:16:06 +0200
Subject: mag: update notes

---
 extra/mag/README.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

(limited to 'extra/mag')

diff --git a/extra/mag/README.md b/extra/mag/README.md
index cd4ec70..1c9b063 100644
--- a/extra/mag/README.md
+++ b/extra/mag/README.md
@@ -79,3 +79,34 @@ Creating lowercase, unique sorted version:
 ```
 $ time zstdcat -T0 doi_refs.tsv.zst| tr '[[:upper:]]' '[[:lower:]]' | LC_ALL=C sort -u -T /sandcrawler-db/tmp-refcat/ -S50% > doi_refs_lower_sorted.tsv.zst
 ```
+
+## Synopsis
+
+* OCI
+* MAG
+* refcat
+
+
+refcat:
+
+```
+$ zstdcat -T0 /magna/refcat/2021-07-28/BrefDOIOnly/date-2021-07-28.tsv.zst| pv -l | wc -l
+1.52G 0:09:30 [2.66M/s] [ <=> ]
+1516746047
+```
+
+slight filtering:
+
+```
+zstdcat -T0 /magna/refcat/2021-07-28/BrefDOIOnly/date-2021-07-28.tsv.zst| pv -l | LC_ALL=C grep -c ^1
+1482827332
+```
+
+
+oci:
+
+```
+$ zstdcat -T0 /magna/refcat/2021-07-28/COCIDOIOnly/date-2021-07-28.tsv.zst| pv -l | wc -l
+1.09G 0:07:12 [2.53M/s]
+1094394799
+```
-- 
cgit v1.2.3