author Martin Czygan <martin.czygan@gmail.com> 2020-11-19 11:23:53 +0100
committer Martin Czygan <martin.czygan@gmail.com> 2020-11-19 11:23:53 +0100
commit bf44318a9cf3c28c464e8d9f94e6819bdf305368
tree 05e32f1bfe107fd5a9f00830e8cd9b7da8697e03 /README.md
parent c4d403f7f55ec0a9fee476dee637b8b44b7b7596
update verification case list
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 12
1 files changed, 10 insertions, 2 deletions
diff --git a/README.md b/README.md
index 342a03a..424876d 100644
--- a/README.md
+++ b/README.md
@@ -136,7 +136,7 @@ Notes on cadd28a version clustering (nysiis) and verification.
93240 OK.TITLE_AUTHOR_MATCH
```
-Cases
+#### Cases
* common title, "Books by Our Readers", https://fatcat.wiki/release/4uv5jsy5vnhdvnxvzmucqlksvq, https://fatcat.wiki/release/4uv5jsy5vnhdvnxvzmucqlksvq
* common title, "The Future of Imprisonment"
@@ -145,6 +145,7 @@ Cases
* common title, "ASMS News" (also different year)
* common title, "AMERICAN INSTITUTE OF INSTRUCTION"
* common title, "Contents lists"
+* common title, "Submissions"
* same, except DOI, but maybe the same item, after all? https://fatcat.wiki/release/kxgsbh66v5bwhobcaiuh4i7dwy, https://fatcat.wiki/release/thl7o44z3jgk3njdypixwrdbve
Authors may be messy:
@@ -153,7 +154,14 @@ Authors may be messy:
https://fatcat.wiki/release/2kpa6ynwjzhtbbokqyxcl25gmm,
https://fatcat.wiki/release/o4dh7w7nqvdknm4j336yrom4wy - may need to tokenize authors
-Possible improvements:
+A DOI prefix (10.1210, The Endocrine Society) may choose to include the same
+document in different publications:
+
+* https://fatcat.wiki/release/52lwj4ip3nbdbgrgk4uwolbjt4
+* https://fatcat.wiki/release/6tbrmc3pq5axzf3yhqayq256a4
+* https://fatcat.wiki/release/457lzlw7czeo7aspcyttccvyrq
+
+#### Possible fixes
* [ ] when title and authors match, check the year, and maybe the doi prefix; doi with the same prefix may not be duplicates
* [x] detect arxiv versions directly
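
The open item above (check the year and possibly the DOI prefix once title and authors match) could look roughly like the following. This is a minimal sketch, not the fuzzycat implementation: the function names `doi_prefix` and `check_pair`, the dict layout of a release, and the status strings are all assumptions made for illustration.

```python
from typing import Optional


def doi_prefix(doi: Optional[str]) -> Optional[str]:
    """Return the registrant prefix of a DOI (the part before the first slash)."""
    if not doi or "/" not in doi:
        return None
    return doi.split("/", 1)[0].strip().lower()


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace; a stand-in for real normalization."""
    return " ".join(s.lower().split())


def check_pair(a: dict, b: dict) -> str:
    """Compare two release dicts (hypothetical layout: title, authors, year, doi).

    Returns a hypothetical status string.
    """
    if normalize(a["title"]) != normalize(b["title"]):
        return "DIFFERENT_TITLE"
    if sorted(map(normalize, a["authors"])) != sorted(map(normalize, b["authors"])):
        return "DIFFERENT_AUTHORS"

    # Title and authors match: a differing year suggests different items.
    year_a, year_b = a.get("year"), b.get("year")
    if year_a is not None and year_b is not None and year_a != year_b:
        return "DIFFERENT_YEAR"

    # Two distinct DOIs under the same prefix may not be duplicates,
    # so flag the pair instead of accepting it.
    doi_a, doi_b = a.get("doi"), b.get("doi")
    if doi_a and doi_b and doi_a.lower() != doi_b.lower():
        prefix_a, prefix_b = doi_prefix(doi_a), doi_prefix(doi_b)
        if prefix_a is not None and prefix_a == prefix_b:
            return "AMBIGUOUS_SAME_DOI_PREFIX"

    return "OK_TITLE_AUTHOR_MATCH"


if __name__ == "__main__":
    # The DOI suffixes below are made up; only the prefix 10.1210 comes from the notes.
    left = {"title": "Submissions", "authors": ["A. Author"], "year": 2001,
            "doi": "10.1210/example.1"}
    right = {"title": "Submissions", "authors": ["A. Author"], "year": 2001,
             "doi": "10.1210/example.2"}
    print(check_pair(left, right))
```

Returning a separate "ambiguous" status rather than a hard mismatch leaves room for cases like the Endocrine Society one, where the same document may appear in different publications under one prefix.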