path: root/chocula
Commit message  Author  Age  Files  Lines
* util: parse ISSN format with extra spaces  Bryan Newbold  2020-09-13  1  -0/+2
|
* update vanished journal importer for 2020-09-03 dataset  Bryan Newbold  2020-09-13  2  -30/+18
|
* do not create hathitrust-only journal rows  Bryan Newbold  2020-09-02  1  -1/+2
|
* hathitrust KBART-style importer  Bryan Newbold  2020-09-02  4  -2/+106
|
* include pkp_pln as a kbart directory in summarization/export/etc  Bryan Newbold  2020-08-31  1  -1/+1
|
* fmt  Bryan Newbold  2020-08-31  3  -12/+29
|
* add support for PKP PLN (KBART-like)  Bryan Newbold  2020-08-31  3  -1/+57
|
* fatcat export improvements  Bryan Newbold  2020-08-03  1  -9/+28
|
* more blocked URLs and domains  Bryan Newbold  2020-08-03  1  -0/+29
|
* directories: all extra metadata in top-level dict  Bryan Newbold  2020-08-03  4  -13/+9
|     Had been using slug-specific sub-objects, but this was too confusing.
* sim: some flag fields as boolean  Bryan Newbold  2020-08-03  1  -2/+12
|
* doaj bug: wasn't setting extra directory metadata  Bryan Newbold  2020-08-03  1  -9/+8
|
* remove trailing whitespace from comment  Bryan Newbold  2020-06-25  1  -7/+7
|
* add MAG importer; reorder directory class listing  Bryan Newbold  2020-06-23  2  -10/+73
|
* block some meta strings  Bryan Newbold  2020-06-23  1  -0/+3
|
* skip umi.com in addition to www.umi.com  Bryan Newbold  2020-06-23  1  -0/+1
|
* road: proper language parsing  Bryan Newbold  2020-06-23  1  -2/+6
|
* ensure lang is len()==2; prep for original_name column  Bryan Newbold  2020-06-23  1  -0/+5
|
* make fmt  Bryan Newbold  2020-06-23  1  -34/+39
|
* tests and fixes for parse_lang(), parse_country()  Bryan Newbold  2020-06-23  1  -19/+78
|     These were basically entirely broken. Oof!
* block/skip more homepage patterns  Bryan Newbold  2020-06-23  1  -0/+9
|
* fix langs inclusion in summarization; remove unused/duplicate fields  Bryan Newbold  2020-06-23  1  -2/+2
|
* strip control characters from titles (issn_meta)  Bryan Newbold  2020-06-23  1  -0/+4
|
* fix issn_meta country detection  Bryan Newbold  2020-06-23  1  -5/+8
|
* improve lang parsing  Bryan Newbold  2020-06-23  5  -7/+11
|
* issn_meta: mainTitle can be an array  Bryan Newbold  2020-06-23  1  -1/+4
|
* set is_active flag based on directories  Bryan Newbold  2020-06-23  1  -0/+5
|
* sources, ISSN-L test mappings, __init__ for recent importers  Bryan Newbold  2020-06-23  1  -0/+12
|
* ZDB homepage (FIZE) scrape importer  Bryan Newbold  2020-06-23  1  -0/+34
|
* australian ERA journal list importer  Bryan Newbold  2020-06-23  1  -0/+54
|
* vanished journal metadata importer  Bryan Newbold  2020-06-23  2  -0/+113
|
* ISSN portal metadata directory importer  Bryan Newbold  2020-06-23  1  -0/+61
|
* AWOL directory importer  Bryan Newbold  2020-06-23  1  -0/+76
|
* filter out more meta/index URL hosts  Bryan Newbold  2020-06-23  1  -1/+15
|
* Revert "EZB color not a good proxy for OA status"  Bryan Newbold  2020-06-23  1  -0/+2
|     I think this actually is Ok in the context of identifying longtail journals.
|     We don't set the `is_oa` flag in release metadata based on this chocula flag.
|     This reverts commit 9ba5b2e307c7f61f60304ba104bf3cc8424b7163.
* new manual homepage source  Bryan Newbold  2020-06-23  2  -0/+49
|
* be more careful with sherpa/romeo color summarization  Bryan Newbold  2020-06-22  1  -3/+4
|
* EZB color not a good proxy for OA status  Bryan Newbold  2020-06-22  1  -2/+0
|
* additional small flake8 fixes  Bryan Newbold  2020-06-22  2  -2/+2
|
* flake8 cleanups  Bryan Newbold  2020-06-22  7  -17/+9
|
* norwegian: fixes from bugs flake8 helped find  Bryan Newbold  2020-06-22  1  -3/+2
|
* fmt (black)  Bryan Newbold  2020-06-22  21  -613/+766
|
* remove un-necessary list() in iteration  Bryan Newbold  2020-06-22  1  -1/+1
|
* additional OJS platform names  Bryan Newbold  2020-06-11  1  -0/+2
|
* use and pass-through 'platform' extra metadata  Bryan Newbold  2020-06-11  1  -4/+7
|
* scielo metadata import  Bryan Newbold  2020-06-03  2  -1/+50
|
* update sources; fix as_of in DOAJ  Bryan Newbold  2020-06-02  1  -1/+1
|
* fixes for KBART import  Bryan Newbold  2020-06-02  2  -8/+16
|
* add KBART parsing/importing  Bryan Newbold  2020-06-02  4  -63/+169
|
* refactor main commands  Bryan Newbold  2020-06-02  1  -59/+77
|