From 4815943600cfeb7ad4a50f48a21b59df4c369b7c Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Thu, 27 Aug 2020 17:26:33 +0200
Subject: README: add performance data point

---
 README.md                      | 16 ++++++++++++++++
 projects/grobid_refs/README.md |  6 ++++++
 2 files changed, 22 insertions(+)
 create mode 100644 projects/grobid_refs/README.md

diff --git a/README.md b/README.md
index 2a8721e..7c6468d 100644
--- a/README.md
+++ b/README.md
@@ -26,3 +26,19 @@ specific code using the fatcat openapi client.
 ## Matching approaches
 
 ![](static/approach.png)
+
+## Performance data point
+
+Candidate generation via elasticsearch, 40 parallel queries, sustained speed at
+about 17857 queries per hour, that is around 5 queries/s.
+
+```
+$ time cat ~/data/researchgate/x04 | \
+    parallel -j40 --pipe -N 1 ./fatcatx_rg_unmatched.py - \
+    > ~/data/researchgate/x04_results.ndj
+...
+real    3409m16.442s
+user    29177m5.516s
+sys     4927m3.277s
+```
+
diff --git a/projects/grobid_refs/README.md b/projects/grobid_refs/README.md
new file mode 100644
index 0000000..13ca3fc
--- /dev/null
+++ b/projects/grobid_refs/README.md
@@ -0,0 +1,6 @@
+# Grobid refs
+
+References extracted from [grobid](https://grobid.readthedocs.io).
+
+Example grobid output: [grobid.tei.xml](grobid.tei.xml).
+
-- 
cgit v1.2.3
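
Reviewer note: the throughput figures in the README hunk can be cross-checked with a few lines of arithmetic. The sketch below is not part of the commit. It takes the quoted 17857 queries per hour and the `time` output at face value, and the helper name `parse_time` is hypothetical. The total query count and the average number of busy cores are derived here and are not stated in the patch.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` duration such as '3409m16.442s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unexpected duration format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the README hunk.
QUERIES_PER_HOUR = 17857
real = parse_time("3409m16.442s")
user = parse_time("29177m5.516s")
sys_time = parse_time("4927m3.277s")

# Derived values (assumptions of this sketch, not stated in the patch).
queries_per_second = QUERIES_PER_HOUR / 3600
implied_total = QUERIES_PER_HOUR * real / 3600
busy_cores = (user + sys_time) / real

print(f"{queries_per_second:.2f} queries/s")
print(f"~{implied_total:,.0f} queries implied over {real / 3600:.1f} h")
print(f"~{busy_cores:.1f} cores busy on average")
```

The hourly rate works out to roughly 4.96 queries/s, which agrees with the "around 5 queries/s" wording. The ratio of CPU time to wall-clock time suggests that the 40 parallel workers spent most of their time waiting on elasticsearch rather than computing.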