path: root/python
Commit message | Author | Date | Files | Lines
* fix GROBID null/short abstract additions | Bryan Newbold | 2019-01-28 | 1 | -1/+2
* batch size as a general import param | Bryan Newbold | 2019-01-28 | 1 | -13/+4
* add missing bezerk-mode flag to GROBID import | Bryan Newbold | 2019-01-28 | 1 | -3/+8
* enforce title len>1 for release imports | Bryan Newbold | 2019-01-28 | 2 | -1/+8
* fix typo in crossref importer | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* drop creators with no display name at all | Bryan Newbold | 2019-01-28 | 1 | -3/+3
* make ORCID importer skip no-names, not assert | Bryan Newbold | 2019-01-28 | 1 | -1/+2
* more ES index fixes | Bryan Newbold | 2019-01-28 | 3 | -3/+4
* vastly improve entity_to_dict() speed | Bryan Newbold | 2019-01-28 | 1 | -1/+9
* add filesets and webcaptures to dumps | Bryan Newbold | 2019-01-28 | 1 | -1/+2
* fatcat -> fatcat_release ES index | Bryan Newbold | 2019-01-28 | 3 | -20/+21
* transform and import fixes/tweaks | Bryan Newbold | 2019-01-25 | 5 | -22/+92
* update journal meta import/transform | Bryan Newbold | 2019-01-25 | 6 | -154/+226
* grobid import extra metadata tweaks | Bryan Newbold | 2019-01-24 | 1 | -6/+7
* refactor _get_editgroup => get_editgroup_id | Bryan Newbold | 2019-01-24 | 2 | -5/+6
* refactor make_rel_url | Bryan Newbold | 2019-01-24 | 3 | -29/+66
* tweak crossref import, and update tests | Bryan Newbold | 2019-01-24 | 4 | -29/+74
* empty fields test | Bryan Newbold | 2019-01-24 | 1 | -0/+13
* allow importing contrib/refs lists | Bryan Newbold | 2019-01-24 | 3 | -9/+25
    The motivation here isn't really to support these gigantic lists on principle, but to be able to ingest large corpuses without having to decide whether to filter out or crop such lists.
* notes on refactoring container 'extra' | Bryan Newbold | 2019-01-24 | 1 | -0/+79
* importer bugfixes | Bryan Newbold | 2019-01-23 | 3 | -8/+14
* more import script fixes | Bryan Newbold | 2019-01-23 | 1 | -1/+4
* start changes to release ES schema | Bryan Newbold | 2019-01-23 | 4 | -119/+195
* bunch of crossref import tweaks (need tests) | Bryan Newbold | 2019-01-23 | 1 | -50/+43
* try to fix any_abstract | Bryan Newbold | 2019-01-23 | 1 | -1/+1
* clean() checks if it returns null-length string | Bryan Newbold | 2019-01-23 | 1 | -1/+5
* update importer script | Bryan Newbold | 2019-01-23 | 1 | -33/+24
* matched importer: bezerk mode to skip file updates | Bryan Newbold | 2019-01-23 | 1 | -11/+5
* ensure crossref importer doesn't create empty editgroups | Bryan Newbold | 2019-01-23 | 1 | -0/+2
* ftfy all over (needs Pipfile.lock) | Bryan Newbold | 2019-01-23 | 8 | -39/+75
* add missing date | Bryan Newbold | 2019-01-23 | 1 | -1/+1
* more tests; fix some importer behavior | Bryan Newbold | 2019-01-23 | 7 | -50/+111
* specific test for desc/extra in editgroups | Bryan Newbold | 2019-01-23 | 1 | -2/+26
* improve changelog tests | Bryan Newbold | 2019-01-23 | 6 | -12/+15
* refactor remaining importers | Bryan Newbold | 2019-01-22 | 13 | -356/+324
* refactored crossref importer to new style | Bryan Newbold | 2019-01-22 | 5 | -118/+198
* new importer API interfaces | Bryan Newbold | 2019-01-22 | 2 | -0/+181
* crossref importer updates | Bryan Newbold | 2019-01-22 | 4 | -22/+82
* pubmed+datacite tokens; no journal,grobid,matched tokens | Bryan Newbold | 2019-01-22 | 2 | -5/+4
* fix issn -> journal-metadata rename | Bryan Newbold | 2019-01-22 | 1 | -1/+1
* more per-entity tests | Bryan Newbold | 2019-01-22 | 7 | -58/+312
* remove coden and abbrev from python tools | Bryan Newbold | 2019-01-21 | 2 | -8/+0
* include filesets and webcaptures in exports | Bryan Newbold | 2019-01-18 | 1 | -1/+1
* basic tests for filesets and webcaptures | Bryan Newbold | 2019-01-18 | 2 | -0/+160
* fix typo in elastic transform code | Bryan Newbold | 2019-01-18 | 1 | -1/+1
* update import README with times | Bryan Newbold | 2019-01-18 | 1 | -2/+3
* more 'true' -> True query param fixes | Bryan Newbold | 2019-01-18 | 4 | -4/+4
* state in elasticsearch (and deleted/redirects) | Bryan Newbold | 2019-01-18 | 1 | -2/+8
* don't need string bools in query flags any more | Bryan Newbold | 2019-01-17 | 1 | -8/+8
* issn => journal_metadata in several places | Bryan Newbold | 2019-01-17 | 6 | -42/+42