author | Martin Czygan <martin@archive.org> | 2021-09-21 18:34:31 +0000 |
---|---|---|
committer | Martin Czygan <martin@archive.org> | 2021-09-21 18:34:31 +0000 |
commit | a37404f30b2c1afa0b46ee30a5b59d7312c119d0 (patch) | |
tree | 8f38d396909049971559a5ab574561960d6d5f22 /notes/todo_manual_review.md | |
parent | c587a084defe54103aa147b7ab91542a11a548b1 (diff) | |
parent | 5fa61d89320af880d5bf6b3231f6478887cfb6a6 (diff) | |
Merge branch 'wip-martin-review-cleanup' into 'master'
review notes and some cleanup
See merge request webgroup/fuzzycat!7
Diffstat (limited to 'notes/todo_manual_review.md')
-rw-r--r-- | notes/todo_manual_review.md | 94 |
1 files changed, 94 insertions, 0 deletions
diff --git a/notes/todo_manual_review.md b/notes/todo_manual_review.md
new file mode 100644
index 0000000..b3474f8
--- /dev/null
+++ b/notes/todo_manual_review.md
@@ -0,0 +1,94 @@
+# TODO
+
+## Case mining
+
+* 2805572 undecided items
+
+## Examples
+
+* [x] https://fatcat.wiki/release/73pcaauzwbalvi7aqhv6vopxl4 https://fatcat.wiki/release/xp3oxb7tqbgaxdzkzbchfkcjn4
+
+> "reference-entry", "entry" - vs other type, e.g. article
+
+* [x] https://fatcat.wiki/release/63g4ukdxajcqhdytqla6du3t3u https://fatcat.wiki/release/rz72bzfevzeofdeb342c6z45qu
+
+This example comes from datacite, the original md:
+
+* [https://api.datacite.org/dois/10.14288/1.0151581](https://api.datacite.org/dois/10.14288/1.0151581)
+
+Metadata similarly off on:
+
+* [https://commons.datacite.org/doi.org/10.14288/1.0011045?query=%2210.14288%22](https://commons.datacite.org/doi.org/10.14288/1.0011045?query=%2210.14288%22)
+* [https://api.datacite.org/application/vnd.datacite.datacite+json/10.14288/1.0011045](https://api.datacite.org/application/vnd.datacite.datacite+json/10.14288/1.0011045)
+
+Picture categorized as article. Added custom rule as workaround.
+
+* [ ] https://fatcat.wiki/release/fwghjz4q7bdulismftuvagmgfu https://fatcat.wiki/release/jwbn7qohu5ggtc5okm4m7s5vja
+
+This seems to be a rerun or repackage of a science article:
+
+* https://stke.sciencemag.org/content/2006/316/tw466
+* https://science.sciencemag.org/content/310/5756/1865.6/tab-pdf
+
+STKE "fulltext" link does not lead anywhere; discontinued.
+
+* [x] https://fatcat.wiki/release/hhyyhosajjflpkufecx26gncwe https://fatcat.wiki/release/yxqwe4ns5vbntjzcse5igkgxk4
+
+> book vs article-journal
+
+* [x] https://fatcat.wiki/release/ij3yuoh6lrh3tkrv5o7gfk6yyi https://fatcat.wiki/release/tur236mqljdfdnlzbbnks2sily
+
+> preprint and IEEE published article
+
+* [x] https://fatcat.wiki/release/neznj5fb4nf3tdqnotnbe34b6e https://fatcat.wiki/release/gcqdvvjiq5bphl7lpc4invi4vy
+
+> a standard document; DOI and DOIu -- which means "undated" (as per URL) --
+> https://landingpage.bsigroup.com/LandingPage/Undated?UPI=000000000030281171
+
+* [ ] https://fatcat.wiki/release/fmi7hmpb3beotnj5kfyjjkolcy https://fatcat.wiki/release/isihxweh6ffxxhhrw2fthqymfa
+
+> Interestingly, the same item, altough different doi and URL, but image ID seems to be the same.
+
+* [x] https://fatcat.wiki/release/he334wpbobegxhptpkvvrufioq https://fatcat.wiki/release/td3ouhgtzbbe7ctevfnldqkoba
+
+> datacite version
+
+* [ ] https://fatcat.wiki/release/5zybwzmlsjexri6c3ma6tczf7q https://fatcat.wiki/release/35gerfmlirelfh3af6qug2oz4q
+* [x] https://fatcat.wiki/release/rnso2swxzvfonemgzrth3arumi https://fatcat.wiki/release/caxa7qbfqvg3bkgz4nwvapgnvi
+
+> too common title
+
+* [ ] https://fatcat.wiki/release/tfhflmc2gnfrncsv2pm2b4oraq https://fatcat.wiki/release/gp7cnryj5bczhao6oor5sbjaoe Status.AMBIGUOUS OK.DUMMY
+
+> Two items, datacite, but both version 1; one lead to an inaccessible item
+
+* [ ] https://fatcat.wiki/release/s4kjrs3g5ndlvixz2fgpydeuja https://fatcat.wiki/release/jn25jn44vzbc3nsubabl2wndsa Status.AMBIGUOUS OK.DUMMY
+* [ ] https://fatcat.wiki/release/5xbugnniynea3k3pllzrb4lfeu https://fatcat.wiki/release/e52xw23ec5cxzi6mkyfyxifvhu Status.AMBIGUOUS OK.DUMMY
+
+* [ ] https://fatcat.wiki/release/6udxu4cnk5egrcxtfrrqt3jcli https://fatcat.wiki/release/ett4oyembjfahhe3iwoc44dnja Status.AMBIGUOUS OK.DUMMY
+
+> todo: distinguish by page
+
+* [ ] https://fatcat.wiki/release/ehu6pdvzvvcmdoyq4l2yf4vciu https://fatcat.wiki/release/2omou6ehgjccbe6yjvr4wgnsha Status.AMBIGUOUS OK.DUMMY
+
+Blacklist fragment.
+
+* [ ] https://fatcat.wiki/release/zkqujozrx5cnjitmglclt6heqq https://fatcat.wiki/release/urr2gs4dsbbwdl7asgyqnwwtxy Status.AMBIGUOUS OK.DUMMY
+
+Blacklist fragment.
+
+* [ ] https://fatcat.wiki/release/yy2wzuaxhba7jht72mcjhxuaju https://fatcat.wiki/release/5b3lb2ebmrdp5nzxvohefmadre Status.AMBIGUOUS OK.DUMMY
+
+> Meeting abstract (ma) versus document.
+
+* [ ] https://fatcat.wiki/release/iwtrxnov2repzlgoi2at2md6tm https://fatcat.wiki/release/s5hm65waingwjmgf3plu76hzu4 Status.AMBIGUOUS OK.DUMMY
+
+> 10.1126 issue with moved items?
+
+* [ ] https://fatcat.wiki/release/b6wfpvotwrecdbygyn27kmihne https://fatcat.wiki/release/3vflegbxtrg4fknx4zyq3rf4im Status.AMBIGUOUS OK.DUMMY
+
+> The same content, but hard to separate.
+
+* [ ] https://fatcat.wiki/release/zlywxoy7cfexvaatziqp4ip5m4 https://fatcat.wiki/release/phqelg6oc5hs5dehhgmodcnh5u Status.AMBIGUOUS OK.DUMMY
+
+> one item contains more md, but the physical entity seems to be the same; 0058904_001 vs 0058904
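
Two of the review notes in this diff describe rule-like checks: treating "reference-entry" or "entry" records as distinct from other types such as articles, and the open todo to distinguish ambiguous pairs by page. A minimal sketch of how such checks might look is given below. It is not fuzzycat's actual code: the `Status` enum here is a local stand-in with only two values, the `compare` and `first_page` helpers are hypothetical, and the record fields `release_type` and `pages` are assumed to be plain dictionary keys.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    """Hypothetical verdicts, loosely modelled on the labels in the notes."""
    DIFFERENT = "different"
    AMBIGUOUS = "ambiguous"


# Release types the notes treat as "entry-like" (assumed set).
ENTRY_TYPES = {"entry", "reference-entry"}


def first_page(pages: Optional[str]) -> Optional[str]:
    """Return the first page of a range such as '1865-1866', or None."""
    if not pages:
        return None
    # Normalize an en dash to a hyphen before splitting the range.
    head = pages.replace("\u2013", "-").split("-")[0].strip()
    return head or None


def compare(a: dict, b: dict) -> Tuple[Status, str]:
    """Apply two illustrative rules to a pair of release records."""
    type_a, type_b = a.get("release_type"), b.get("release_type")
    # Rule 1: an entry-like record versus any other type is a mismatch.
    if type_a and type_b and (type_a in ENTRY_TYPES) != (type_b in ENTRY_TYPES):
        return Status.DIFFERENT, "release_type"
    # Rule 2: differing first pages suggest different items.
    page_a, page_b = first_page(a.get("pages")), first_page(b.get("pages"))
    if page_a and page_b and page_a != page_b:
        return Status.DIFFERENT, "pages"
    # No rule fired: leave the pair for manual review.
    return Status.AMBIGUOUS, "no_rule"
```

Returning a reason string next to the verdict keeps each decision traceable during manual review, in the spirit of the `Status.AMBIGUOUS OK.DUMMY` labels attached to the undecided pairs.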