path: root/python/fatcat_tools/importers
Commit message | Author | Age | Files | Lines
* editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2
* arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3
* matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7
* handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14
* fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1
* mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9
* minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2
* update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10
* arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8
* arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165
* early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1
* add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21
* fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3
* importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200
* new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237
* bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 3 | -5/+3
* better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58
* crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2
* don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3
* add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4
* yet another required field bug | Bryan Newbold | 2019-01-29 | 1 | -4/+5
* fix null name for container (required) | Bryan Newbold | 2019-01-29 | 1 | -1/+5
* tweaks to GROBID metadata import | Bryan Newbold | 2019-01-29 | 1 | -3/+2
* crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9
    - refs: article-title not title; save unstructured; authors not author
    - save 'language' field (already an ISO code)
* fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 2 | -12/+12
* fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6
* pass through kwargs (fixes bezerk imports) | Bryan Newbold | 2019-01-29 | 5 | -5/+10
* ensure raw_name is not stub | Bryan Newbold | 2019-01-29 | 1 | -1/+4
* ensure abstracts aren't stubs | Bryan Newbold | 2019-01-29 | 1 | -2/+3
* add stub parse_record() to make pylint happy | Bryan Newbold | 2019-01-28 | 1 | -0/+4
* fix title length checks in crossref | Bryan Newbold | 2019-01-28 | 1 | -2/+2
* fix rel/url order swap | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* don't allow empty or single-character clean strings | Bryan Newbold | 2019-01-28 | 1 | -1/+1
* filter short/stub original_title | Bryan Newbold | 2019-01-28 | 1 | -3/+7
* many fixes in GROBID importer | Bryan Newbold | 2019-01-28 | 1 | -14/+10
* fix GROBID null/short abstract additions | Bryan Newbold | 2019-01-28 | 1 | -1/+2
* enforce title len>1 for release imports | Bryan Newbold | 2019-01-28 | 2 | -1/+8
* drop creators with no display name at all | Bryan Newbold | 2019-01-28 | 1 | -3/+3
* make ORCID importer skip no-names, not assert | Bryan Newbold | 2019-01-28 | 1 | -1/+2
* transform and import fixes/tweaks | Bryan Newbold | 2019-01-25 | 2 | -4/+10
* update journal meta import/transform | Bryan Newbold | 2019-01-25 | 1 | -104/+39
* grobid import extra metadata tweaks | Bryan Newbold | 2019-01-24 | 1 | -6/+7
* refactor _get_editgroup => get_editgroup_id | Bryan Newbold | 2019-01-24 | 2 | -5/+6
* refactor make_rel_url | Bryan Newbold | 2019-01-24 | 3 | -29/+66
* tweak crossref import, and update tests | Bryan Newbold | 2019-01-24 | 1 | -11/+27
* allow importing contrib/refs lists | Bryan Newbold | 2019-01-24 | 1 | -5/+13
    The motivation here isn't really to support these gigantic lists on principle, but to be able to ingest large corpuses without having to decide whether to filter out or crop such lists.
* notes on refactoring container 'extra' | Bryan Newbold | 2019-01-24 | 1 | -0/+79
* importer bugfixes | Bryan Newbold | 2019-01-23 | 3 | -8/+14
* bunch of crossref import tweaks (need tests) | Bryan Newbold | 2019-01-23 | 1 | -50/+43
* clean() checks if it returns null-length string | Bryan Newbold | 2019-01-23 | 1 | -1/+5