path: root/notes/todo_manual_review.md
author Martin Czygan <martin@archive.org> 2021-09-21 18:34:31 +0000
committer Martin Czygan <martin@archive.org> 2021-09-21 18:34:31 +0000
commit a37404f30b2c1afa0b46ee30a5b59d7312c119d0
tree 8f38d396909049971559a5ab574561960d6d5f22 /notes/todo_manual_review.md
parent c587a084defe54103aa147b7ab91542a11a548b1
parent 5fa61d89320af880d5bf6b3231f6478887cfb6a6
Merge branch 'wip-martin-review-cleanup' into 'master'
review notes and some cleanup

See merge request webgroup/fuzzycat!7
Diffstat (limited to 'notes/todo_manual_review.md')
-rw-r--r-- notes/todo_manual_review.md | 94
1 files changed, 94 insertions, 0 deletions
diff --git a/notes/todo_manual_review.md b/notes/todo_manual_review.md
new file mode 100644
index 0000000..b3474f8
--- /dev/null
+++ b/notes/todo_manual_review.md
@@ -0,0 +1,94 @@
+# TODO
+
+## Case mining
+
+* 2805572 undecided items
+
+## Examples
+
+* [x] https://fatcat.wiki/release/73pcaauzwbalvi7aqhv6vopxl4 https://fatcat.wiki/release/xp3oxb7tqbgaxdzkzbchfkcjn4
+
+> "reference-entry", "entry" - vs other type, e.g. article
+
+* [x] https://fatcat.wiki/release/63g4ukdxajcqhdytqla6du3t3u https://fatcat.wiki/release/rz72bzfevzeofdeb342c6z45qu
+
+This example comes from datacite, the original metadata:
+
+* [https://api.datacite.org/dois/10.14288/1.0151581](https://api.datacite.org/dois/10.14288/1.0151581)
+
+Metadata similarly off on:
+
+* [https://commons.datacite.org/doi.org/10.14288/1.0011045?query=%2210.14288%22](https://commons.datacite.org/doi.org/10.14288/1.0011045?query=%2210.14288%22)
+* [https://api.datacite.org/application/vnd.datacite.datacite+json/10.14288/1.0011045](https://api.datacite.org/application/vnd.datacite.datacite+json/10.14288/1.0011045)
+
+Picture categorized as article. Added custom rule as workaround.
+
+* [ ] https://fatcat.wiki/release/fwghjz4q7bdulismftuvagmgfu https://fatcat.wiki/release/jwbn7qohu5ggtc5okm4m7s5vja
+
+This seems to be a rerun or repackaging of a Science article:
+
+* https://stke.sciencemag.org/content/2006/316/tw466
+* https://science.sciencemag.org/content/310/5756/1865.6/tab-pdf
+
+STKE "fulltext" link does not lead anywhere; discontinued.
+
+* [x] https://fatcat.wiki/release/hhyyhosajjflpkufecx26gncwe https://fatcat.wiki/release/yxqwe4ns5vbntjzcse5igkgxk4
+
+> book vs article-journal
+
+* [x] https://fatcat.wiki/release/ij3yuoh6lrh3tkrv5o7gfk6yyi https://fatcat.wiki/release/tur236mqljdfdnlzbbnks2sily
+
+> preprint and IEEE published article
+
+* [x] https://fatcat.wiki/release/neznj5fb4nf3tdqnotnbe34b6e https://fatcat.wiki/release/gcqdvvjiq5bphl7lpc4invi4vy
+
+> a standard document; DOI and DOIu -- which means "undated" (as per URL) --
+> https://landingpage.bsigroup.com/LandingPage/Undated?UPI=000000000030281171
+
+* [ ] https://fatcat.wiki/release/fmi7hmpb3beotnj5kfyjjkolcy https://fatcat.wiki/release/isihxweh6ffxxhhrw2fthqymfa
+
+> Interestingly, the same item: the DOI and URL differ, but the image ID seems to be the same.
+
+* [x] https://fatcat.wiki/release/he334wpbobegxhptpkvvrufioq https://fatcat.wiki/release/td3ouhgtzbbe7ctevfnldqkoba
+
+> datacite version
+
+* [ ] https://fatcat.wiki/release/5zybwzmlsjexri6c3ma6tczf7q https://fatcat.wiki/release/35gerfmlirelfh3af6qug2oz4q
+* [x] https://fatcat.wiki/release/rnso2swxzvfonemgzrth3arumi https://fatcat.wiki/release/caxa7qbfqvg3bkgz4nwvapgnvi
+
+> too common title
+
+* [ ] https://fatcat.wiki/release/tfhflmc2gnfrncsv2pm2b4oraq https://fatcat.wiki/release/gp7cnryj5bczhao6oor5sbjaoe Status.AMBIGUOUS OK.DUMMY
+
+> Two items, datacite, but both version 1; one leads to an inaccessible item
+
+* [ ] https://fatcat.wiki/release/s4kjrs3g5ndlvixz2fgpydeuja https://fatcat.wiki/release/jn25jn44vzbc3nsubabl2wndsa Status.AMBIGUOUS OK.DUMMY
+* [ ] https://fatcat.wiki/release/5xbugnniynea3k3pllzrb4lfeu https://fatcat.wiki/release/e52xw23ec5cxzi6mkyfyxifvhu Status.AMBIGUOUS OK.DUMMY
+
+* [ ] https://fatcat.wiki/release/6udxu4cnk5egrcxtfrrqt3jcli https://fatcat.wiki/release/ett4oyembjfahhe3iwoc44dnja Status.AMBIGUOUS OK.DUMMY
+
+> todo: distinguish by page
+
+* [ ] https://fatcat.wiki/release/ehu6pdvzvvcmdoyq4l2yf4vciu https://fatcat.wiki/release/2omou6ehgjccbe6yjvr4wgnsha Status.AMBIGUOUS OK.DUMMY
+
+Blacklist fragment.
+
+* [ ] https://fatcat.wiki/release/zkqujozrx5cnjitmglclt6heqq https://fatcat.wiki/release/urr2gs4dsbbwdl7asgyqnwwtxy Status.AMBIGUOUS OK.DUMMY
+
+Blacklist fragment.
+
+* [ ] https://fatcat.wiki/release/yy2wzuaxhba7jht72mcjhxuaju https://fatcat.wiki/release/5b3lb2ebmrdp5nzxvohefmadre Status.AMBIGUOUS OK.DUMMY
+
+> Meeting abstract (ma) versus document.
+
+* [ ] https://fatcat.wiki/release/iwtrxnov2repzlgoi2at2md6tm https://fatcat.wiki/release/s5hm65waingwjmgf3plu76hzu4 Status.AMBIGUOUS OK.DUMMY
+
+> 10.1126 issue with moved items?
+
+* [ ] https://fatcat.wiki/release/b6wfpvotwrecdbygyn27kmihne https://fatcat.wiki/release/3vflegbxtrg4fknx4zyq3rf4im Status.AMBIGUOUS OK.DUMMY
+
+> The same content, but hard to separate.
+
+* [ ] https://fatcat.wiki/release/zlywxoy7cfexvaatziqp4ip5m4 https://fatcat.wiki/release/phqelg6oc5hs5dehhgmodcnh5u Status.AMBIGUOUS OK.DUMMY
+
+> one item contains more metadata, but the physical entity seems to be the same; 0058904_001 vs 0058904