From 7f81b1f2f1067b1835a4882f3ec8cf39fd9fb611 Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Wed, 9 Dec 2020 23:03:58 +0100
Subject: update stats

---
 README.md                 | 53 ++++++++++++++++++++++++++++++-----------------
 notes/2020_11_testruns.md | 29 +++++++++++++++-----------
 2 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/README.md b/README.md
index 1c41f4e..fbe144e 100644
--- a/README.md
+++ b/README.md
@@ -147,25 +147,40 @@ Notes on cadd28a version clustering (nysiis) and verification.
 * 665447 verification pairs
 
 ```
-    176 Miss.APPENDIX
-     25 Miss.ARXIV_VERSION
-  12082 Miss.BLACKLISTED
-      5 Miss.BLACKLISTED_FRAGMENT
-  46733 Miss.BOOK_CHAPTER
-   1567 Miss.COMPONENT
-  47691 Miss.CONTRIB_INTERSECTION_EMPTY
-  30806 Miss.DATASET_DOI
-      1 Miss.NUM_DIFF
- 157718 Miss.RELEASE_TYPE
-  16263 Miss.SHORT_TITLE
-   6013 Miss.SUBTITLE
-     57 Miss.TITLE_FILENAME
- 148755 Miss.YEAR
-     93 OK.ARXIV_VERSION
-  88294 OK.DUMMY
-    110 OK.PREPRINT_PUBLISHED
-  15818 OK.SLUG_TITLE_AUTHOR_MATCH
-  93240 OK.TITLE_AUTHOR_MATCH
+3578378 OK.TITLE_AUTHOR_MATCH
+2989618 Miss.CONTRIB_INTERSECTION_EMPTY
+2731528 OK.SLUG_TITLE_AUTHOR_MATCH
+2654787 Miss.YEAR
+2434532 OK.WORK_ID
+2050468 OK.DUMMY
+1619330 Miss.SHARED_DOI_PREFIX
+1145571 Miss.BOOK_CHAPTER
+1023925 Miss.DATASET_DOI
+ 934075 OK.DATACITE_RELATED_ID
+ 868951 OK.DATACITE_VERSION
+ 704154 OK.FIGSHARE_VERSION
+ 682784 Miss.RELEASE_TYPE
+ 607117 OK.TOKENIZED_AUTHORS
+ 298928 OK.PREPRINT_PUBLISHED
+ 270658 Miss.SUBTITLE
+ 227537 Miss.SHORT_TITLE
+ 196402 Miss.COMPONENT
+ 163158 Miss.CUSTOM_PREFIX_10_5860_CHOICE_REVIEW
+ 122614 Miss.CUSTOM_PREFIX_10_7916
+  79687 OK.CUSTOM_IEEE_ARXIV
+  69648 OK.PMID_DOI_PAIR
+  46649 Miss.CUSTOM_PREFIX_10_14288
+  38598 OK.CUSTOM_BSI_UNDATED
+  15465 OK.DOI
+  13393 Miss.CUSTOM_IOP_MA_PATTERN
+  10378 Miss.CONTAINER
+   3045 Miss.BLACKLISTED
+   2504 Miss.BLACKLISTED_FRAGMENT
+   1574 Miss.TITLE_FILENAME
+   1273 Miss.APPENDIX
+    104 Miss.NUM_DIFF
+      4 OK.ARXIV_VERSION
+
 ```
 
 #### Cases
diff --git a/notes/2020_11_testruns.md b/notes/2020_11_testruns.md
index e801df3..3ee4340 100644
--- a/notes/2020_11_testruns.md
+++ b/notes/2020_11_testruns.md
@@ -58,24 +58,28 @@ The cluster size distribution is:
 Preliminary case distribution:
 
 ```
-4017207 Miss.CONTRIB_INTERSECTION_EMPTY
-3795537 OK.TITLE_AUTHOR_MATCH
-3233073 OK.DUMMY
-2898149 OK.SLUG_TITLE_AUTHOR_MATCH
-2450884 Miss.YEAR
-1402770 OK.ARXIV_VERSION
+3578378 OK.TITLE_AUTHOR_MATCH
+2989618 Miss.CONTRIB_INTERSECTION_EMPTY
+2731528 OK.SLUG_TITLE_AUTHOR_MATCH
+2654787 Miss.YEAR
+2434532 OK.WORK_ID
+2050468 OK.DUMMY
+1619330 Miss.SHARED_DOI_PREFIX
 1145571 Miss.BOOK_CHAPTER
 1023925 Miss.DATASET_DOI
  934075 OK.DATACITE_RELATED_ID
  868951 OK.DATACITE_VERSION
- 771091 OK.TOKENIZED_AUTHORS
- 724727 OK.PREPRINT_PUBLISHED
  704154 OK.FIGSHARE_VERSION
  682784 Miss.RELEASE_TYPE
- 273969 Miss.SUBTITLE
- 227564 Miss.SHORT_TITLE
+ 607117 OK.TOKENIZED_AUTHORS
+ 298928 OK.PREPRINT_PUBLISHED
+ 270658 Miss.SUBTITLE
+ 227537 Miss.SHORT_TITLE
  196402 Miss.COMPONENT
- 102990 OK.CUSTOM_IEEE_ARXIV
+ 163158 Miss.CUSTOM_PREFIX_10_5860_CHOICE_REVIEW
+ 122614 Miss.CUSTOM_PREFIX_10_7916
+  79687 OK.CUSTOM_IEEE_ARXIV
+  69648 OK.PMID_DOI_PAIR
   46649 Miss.CUSTOM_PREFIX_10_14288
   38598 OK.CUSTOM_BSI_UNDATED
   15465 OK.DOI
@@ -83,9 +87,10 @@ Preliminary case distribution:
   10378 Miss.CONTAINER
    3045 Miss.BLACKLISTED
    2504 Miss.BLACKLISTED_FRAGMENT
-   1605 Miss.TITLE_FILENAME
+   1574 Miss.TITLE_FILENAME
    1273 Miss.APPENDIX
     104 Miss.NUM_DIFF
+      4 OK.ARXIV_VERSION
 ```
 
 ## Case Mining
-- 
cgit v1.2.3