From f4442f600b3f66704063ac91ce2769fa250751c9 Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Fri, 25 Jun 2021 16:34:13 +0200
Subject: docs: add stats

---
 python/notes/version_4.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/python/notes/version_4.md b/python/notes/version_4.md
index b669a58..97811e7 100644
--- a/python/notes/version_4.md
+++ b/python/notes/version_4.md
@@ -850,3 +850,36 @@ igyewr6er5epfozhk7dyfqa5tu igyewr6er5epfozhk7dyfqa5tu exact doi
 
 * 740,248,530 unique edges
 
+----
+
+# Stats
+
+```
+553414112 exact doi
+75738037 strong jaccardauthors
+66257136 exact pmid
+19646986 strong slugtitleauthormatch
+17202451 strong tokenizedauthors
+3730080 exact arxiv
+2798816 exact titleauthormatch
+482811 strong versioneddoi
+303336 strong pmiddoipair
+279212 exact isbn
+240405 exact workid
+52678 strong customieeearxiv
+43797 strong dataciterelatedid
+29027 strong arxivversion
+27150 exact pmcid
+1652 strong figshareversion
+832 strong titleartifact
+10 strong custombsiundated
+2 strong custombsisubdoc
+```
+
+* total unique edges: 740248530
+* matches by id: 623707690
+* matches through title/author (fuzzy) matching: 116540840
+* scholarly resources:
+* linked open library titles:
+* URLs extracted from corpus:
+* sample ratio IA/URL from corpus:
--
cgit v1.2.3
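
The three filled-in totals in the stats can be cross-checked against the per-reason counts. The following is a minimal sketch in Python, not part of the patch. The names `COUNTS`, `ID_REASONS` and `summarize` are hypothetical. The set of reasons counted as "matches by id" (doi, pmid, arxiv, isbn, pmcid) is an assumption: the notes do not state the grouping, but this choice reproduces the reported figure.

```python
# Per-reason edge counts copied from the stats listing: (status, reason) -> count.
COUNTS = {
    ("exact", "doi"): 553414112,
    ("strong", "jaccardauthors"): 75738037,
    ("exact", "pmid"): 66257136,
    ("strong", "slugtitleauthormatch"): 19646986,
    ("strong", "tokenizedauthors"): 17202451,
    ("exact", "arxiv"): 3730080,
    ("exact", "titleauthormatch"): 2798816,
    ("strong", "versioneddoi"): 482811,
    ("strong", "pmiddoipair"): 303336,
    ("exact", "isbn"): 279212,
    ("exact", "workid"): 240405,
    ("strong", "customieeearxiv"): 52678,
    ("strong", "dataciterelatedid"): 43797,
    ("strong", "arxivversion"): 29027,
    ("exact", "pmcid"): 27150,
    ("strong", "figshareversion"): 1652,
    ("strong", "titleartifact"): 832,
    ("strong", "custombsiundated"): 10,
    ("strong", "custombsisubdoc"): 2,
}

# Assumption: reasons treated as identifier-based matches (not stated in the notes).
ID_REASONS = frozenset({"doi", "pmid", "arxiv", "isbn", "pmcid"})


def summarize(counts, id_reasons=ID_REASONS):
    """Return total edges, identifier-based matches, and the remainder (fuzzy)."""
    total = sum(counts.values())
    by_id = sum(n for (_, reason), n in counts.items() if reason in id_reasons)
    return {"total": total, "by_id": by_id, "fuzzy": total - by_id}


summary = summarize(COUNTS)
print(summary)
```

Under that grouping, the fuzzy figure is simply everything that is not an identifier match, which includes the `exact titleauthormatch` and `exact workid` rows; identifier matches then make up about 84% of all edges.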