path: root/fuzzycat/verify.py
Commit message (Author, Age, Files, Lines)
* turn "match_release_fuzzy" into a class (Martin Czygan, 2021-11-16, 1 file, -4/+3)
  The goal of this refactoring was to make the matching process a bit more configurable by using a class and a cascade of queries. For a limited test set, `FuzzyReleaseMatcher.match` works the same as `match_release_fuzzy`.
* start larger refactoring: remove cluster (Martin Czygan, 2021-09-24, 1 file, -8/+8)
  background: verifying hundreds of millions of documents turned out to be a bit slow; anecdata: running clustering and verification over 1.8B inputs took over 50h; cf. the Go port (skate) required about 2-4h for those operations. Also: with Go we do not need the extra GNU parallel wrapping.
  In any case, we aim for fuzzycat refactoring to provide:
  * better, more configurable verification and small scale matching
  * removal of batch clustering code (and improve refcat docs)
  * a place for a bit more generic, similarity based utils
  The most important piece in fuzzycat is a CSV file containing hand picked test examples for verification - and the code that is able to fulfill that test suite. We want to make this part more robust.
* style: apply formatting (Martin Czygan, 2021-09-21, 1 file, -5/+4)
* DOI clean/normalize helper; and use in verification etc (Bryan Newbold, 2021-07-01, 1 file, -3/+3)
* verify: page count parsing and comparison improvements (Bryan Newbold, 2021-07-01, 1 file, -1/+3)
* workaround for a case found in refs: (Martin Czygan, 2021-02-15, 1 file, -0/+6)
  * https://fatcat.wiki/release/2n7pyugxenb73gope52bn6m2ru
  * https://fatcat.wiki/release/p4bettvcszgn5d3zls5ogdjk4u
  Refs: Niaudet P. Steroid-sensitive idiopathic nephrotic syndrome in children. Pediatric Nephrology. 5th ed. Philadelphia: Lippincott Williams & Wilkins, 2004; pp 543–556.
  Doc:
  * https://fatcat.wiki/release/lc3d5q62zfa2rjyk2m7nr346nm, T-lymphocyte activation in steroid-sensitive nephrotic syndrome in childhood, by T J Neuhaus, V Shah, R E Callard, T M Barratt
* format docs (Martin Czygan, 2021-01-09, 1 file, -1/+2)
* case: translation in title (Martin Czygan, 2021-01-08, 1 file, -0/+10)
* add cases (Martin Czygan, 2021-01-04, 1 file, -3/+5)
* tweak year a bit more, add case (Martin Czygan, 2020-12-24, 1 file, -1/+1)
* tweak comparison, add test (Martin Czygan, 2020-12-24, 1 file, -1/+1)
* matching: fix import (Martin Czygan, 2020-12-21, 1 file, -2/+3)
* add verify_release_entities wrapper (Martin Czygan, 2020-12-19, 1 file, -1/+6)
* rename reason: dummy to unknown (Martin Czygan, 2020-12-18, 1 file, -2/+2)
* update notes (Martin Czygan, 2020-12-17, 1 file, -0/+2)
* remove obsolete assert (Martin Czygan, 2020-12-17, 1 file, -3/+0)
* check doi existence (Martin Czygan, 2020-12-17, 1 file, -1/+3)
* update stats (Martin Czygan, 2020-12-17, 1 file, -41/+40)
* update docs (Martin Czygan, 2020-12-17, 1 file, -8/+3)
* add flags (Martin Czygan, 2020-12-17, 1 file, -1/+1)
* single item verification (Martin Czygan, 2020-12-15, 1 file, -48/+50)
* update docs (Martin Czygan, 2020-12-14, 1 file, -0/+7)
* verify: move out some code to utils (Martin Czygan, 2020-12-14, 1 file, -16/+6)
* update docs (Martin Czygan, 2020-12-12, 1 file, -0/+1)
* update docs (Martin Czygan, 2020-12-12, 1 file, -15/+28)
* get rid of magic strings (Martin Czygan, 2020-12-12, 1 file, -9/+12)
* update readme (Martin Czygan, 2020-12-12, 1 file, -45/+56)
* fix imports (Martin Czygan, 2020-12-12, 1 file, -1/+1)
* move helper function into method (Martin Czygan, 2020-12-12, 1 file, -4/+3)
* add type hint (Martin Czygan, 2020-12-12, 1 file, -1/+2)
* get rid of 'ok' and 'miss' (Martin Czygan, 2020-12-12, 1 file, -52/+52)
* add generic doi version case (Martin Czygan, 2020-12-11, 1 file, -17/+16)
* 158 cases (Martin Czygan, 2020-12-10, 1 file, -0/+2)
* add cases (Martin Czygan, 2020-12-10, 1 file, -0/+22)
* add versioned doi pattern (Martin Czygan, 2020-12-10, 1 file, -1/+12)
* pmid doi pair case (Martin Czygan, 2020-12-10, 1 file, -1/+1)
* separate static data (Martin Czygan, 2020-12-10, 1 file, -3225/+8)
* add a few more dummy cases (Martin Czygan, 2020-12-09, 1 file, -0/+9)
* add subdoc case (Martin Czygan, 2020-12-09, 1 file, -1/+9)
* update verify.csv (Martin Czygan, 2020-12-09, 1 file, -1/+2)
* add two more cases (Martin Czygan, 2020-12-09, 1 file, -1/+24)
* add another case (Martin Czygan, 2020-12-09, 1 file, -0/+14)
* another case (Martin Czygan, 2020-12-09, 1 file, -0/+18)
* publications over N years apart are most likely different (Martin Czygan, 2020-12-04, 1 file, -2/+11)
  N=40 is hardcoded for now, but should probably be a parameter.
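The year-gap rule from the commit above can be illustrated with a minimal sketch. This is not the code from `verify.py`; the names `Status`, `check_year_gap` and `max_gap` are hypothetical, and the sketch assumes that a missing year yields no decision and that "over N years" means strictly more than N. It shows the threshold as a parameter with the default of 40 mentioned in the commit message.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical status values, chosen for this sketch only.
    DIFFERENT = "different"
    UNKNOWN = "unknown"


def check_year_gap(year_a: Optional[int],
                   year_b: Optional[int],
                   max_gap: int = 40) -> Status:
    """Flag two publications as different if their release years
    are more than max_gap years apart (assumption: strictly more)."""
    # Without both years the rule cannot decide anything.
    if year_a is None or year_b is None:
        return Status.UNKNOWN
    if abs(year_a - year_b) > max_gap:
        return Status.DIFFERENT
    # A small gap alone says nothing; later checks would have to decide.
    return Status.UNKNOWN
```

Passing `max_gap` explicitly, for example `check_year_gap(1950, 2000, max_gap=30)`, is one way to turn the hardcoded value into a parameter as the commit message suggests.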
* update cases; ok.work_id (Martin Czygan, 2020-12-04, 1 file, -0/+3)
* case: ignore choice review (Martin Czygan, 2020-12-04, 1 file, -0/+10)
* add case (Martin Czygan, 2020-12-03, 1 file, -0/+12)
* add iop case (Martin Czygan, 2020-12-02, 1 file, -2/+12)
* add cases (Martin Czygan, 2020-12-02, 1 file, -1/+11)
* add case (Martin Czygan, 2020-12-02, 1 file, -0/+7)