author Martin Czygan <martin.czygan@gmail.com> 2020-12-09 23:03:58 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2020-12-09 23:03:58 +0100
commit 7f81b1f2f1067b1835a4882f3ec8cf39fd9fb611
tree b02ea1da9a07c318f763c7f8c49b499efff6492e
parent 9d707a0203ac3aaf17e266a0f5a934b5f9e2dbbf
update stats
-rw-r--r-- README.md 53
-rw-r--r-- notes/2020_11_testruns.md 29
2 files changed, 51 insertions, 31 deletions
diff --git a/README.md b/README.md
index 1c41f4e..fbe144e 100644
--- a/README.md
+++ b/README.md
@@ -147,25 +147,40 @@ Notes on cadd28a version clustering (nysiis) and verification.
* 665447 verification pairs
```
- 176 Miss.APPENDIX
- 25 Miss.ARXIV_VERSION
- 12082 Miss.BLACKLISTED
- 5 Miss.BLACKLISTED_FRAGMENT
- 46733 Miss.BOOK_CHAPTER
- 1567 Miss.COMPONENT
- 47691 Miss.CONTRIB_INTERSECTION_EMPTY
- 30806 Miss.DATASET_DOI
- 1 Miss.NUM_DIFF
- 157718 Miss.RELEASE_TYPE
- 16263 Miss.SHORT_TITLE
- 6013 Miss.SUBTITLE
- 57 Miss.TITLE_FILENAME
- 148755 Miss.YEAR
- 93 OK.ARXIV_VERSION
- 88294 OK.DUMMY
- 110 OK.PREPRINT_PUBLISHED
- 15818 OK.SLUG_TITLE_AUTHOR_MATCH
- 93240 OK.TITLE_AUTHOR_MATCH
+3578378 OK.TITLE_AUTHOR_MATCH
+2989618 Miss.CONTRIB_INTERSECTION_EMPTY
+2731528 OK.SLUG_TITLE_AUTHOR_MATCH
+2654787 Miss.YEAR
+2434532 OK.WORK_ID
+2050468 OK.DUMMY
+1619330 Miss.SHARED_DOI_PREFIX
+1145571 Miss.BOOK_CHAPTER
+1023925 Miss.DATASET_DOI
+ 934075 OK.DATACITE_RELATED_ID
+ 868951 OK.DATACITE_VERSION
+ 704154 OK.FIGSHARE_VERSION
+ 682784 Miss.RELEASE_TYPE
+ 607117 OK.TOKENIZED_AUTHORS
+ 298928 OK.PREPRINT_PUBLISHED
+ 270658 Miss.SUBTITLE
+ 227537 Miss.SHORT_TITLE
+ 196402 Miss.COMPONENT
+ 163158 Miss.CUSTOM_PREFIX_10_5860_CHOICE_REVIEW
+ 122614 Miss.CUSTOM_PREFIX_10_7916
+ 79687 OK.CUSTOM_IEEE_ARXIV
+ 69648 OK.PMID_DOI_PAIR
+ 46649 Miss.CUSTOM_PREFIX_10_14288
+ 38598 OK.CUSTOM_BSI_UNDATED
+ 15465 OK.DOI
+ 13393 Miss.CUSTOM_IOP_MA_PATTERN
+ 10378 Miss.CONTAINER
+ 3045 Miss.BLACKLISTED
+ 2504 Miss.BLACKLISTED_FRAGMENT
+ 1574 Miss.TITLE_FILENAME
+ 1273 Miss.APPENDIX
+ 104 Miss.NUM_DIFF
+ 4 OK.ARXIV_VERSION
+
```
#### Cases
diff --git a/notes/2020_11_testruns.md b/notes/2020_11_testruns.md
index e801df3..3ee4340 100644
--- a/notes/2020_11_testruns.md
+++ b/notes/2020_11_testruns.md
@@ -58,24 +58,28 @@ The cluster size distribution is:
Preliminary case distribution:
```
-4017207 Miss.CONTRIB_INTERSECTION_EMPTY
-3795537 OK.TITLE_AUTHOR_MATCH
-3233073 OK.DUMMY
-2898149 OK.SLUG_TITLE_AUTHOR_MATCH
-2450884 Miss.YEAR
-1402770 OK.ARXIV_VERSION
+3578378 OK.TITLE_AUTHOR_MATCH
+2989618 Miss.CONTRIB_INTERSECTION_EMPTY
+2731528 OK.SLUG_TITLE_AUTHOR_MATCH
+2654787 Miss.YEAR
+2434532 OK.WORK_ID
+2050468 OK.DUMMY
+1619330 Miss.SHARED_DOI_PREFIX
1145571 Miss.BOOK_CHAPTER
1023925 Miss.DATASET_DOI
934075 OK.DATACITE_RELATED_ID
868951 OK.DATACITE_VERSION
- 771091 OK.TOKENIZED_AUTHORS
- 724727 OK.PREPRINT_PUBLISHED
704154 OK.FIGSHARE_VERSION
682784 Miss.RELEASE_TYPE
- 273969 Miss.SUBTITLE
- 227564 Miss.SHORT_TITLE
+ 607117 OK.TOKENIZED_AUTHORS
+ 298928 OK.PREPRINT_PUBLISHED
+ 270658 Miss.SUBTITLE
+ 227537 Miss.SHORT_TITLE
196402 Miss.COMPONENT
- 102990 OK.CUSTOM_IEEE_ARXIV
+ 163158 Miss.CUSTOM_PREFIX_10_5860_CHOICE_REVIEW
+ 122614 Miss.CUSTOM_PREFIX_10_7916
+ 79687 OK.CUSTOM_IEEE_ARXIV
+ 69648 OK.PMID_DOI_PAIR
46649 Miss.CUSTOM_PREFIX_10_14288
38598 OK.CUSTOM_BSI_UNDATED
15465 OK.DOI
@@ -83,9 +87,10 @@ Preliminary case distribution:
10378 Miss.CONTAINER
3045 Miss.BLACKLISTED
2504 Miss.BLACKLISTED_FRAGMENT
- 1605 Miss.TITLE_FILENAME
+ 1574 Miss.TITLE_FILENAME
1273 Miss.APPENDIX
104 Miss.NUM_DIFF
+ 4 OK.ARXIV_VERSION
```
## Case Mining