path: root/chocula
Commit message | Author | Age | Files | Lines
* fatcat export improvements | Bryan Newbold | 2020-08-03 | 1 | -9/+28
* more blocked URLs and domains | Bryan Newbold | 2020-08-03 | 1 | -0/+29
* directories: all extra metadata in top-level dict | Bryan Newbold | 2020-08-03 | 4 | -13/+9
    Had been using slug-specific sub-objects, but this was too confusing.
* sim: some flag fields as boolean | Bryan Newbold | 2020-08-03 | 1 | -2/+12
* doaj bug: wasn't setting extra directory metadata | Bryan Newbold | 2020-08-03 | 1 | -9/+8
* remove trailing whitespace from comment | Bryan Newbold | 2020-06-25 | 1 | -7/+7
* add MAG importer; reorder directory class listing | Bryan Newbold | 2020-06-23 | 2 | -10/+73
* block some meta strings | Bryan Newbold | 2020-06-23 | 1 | -0/+3
* skip umi.com in addition to www.umi.com | Bryan Newbold | 2020-06-23 | 1 | -0/+1
* road: proper language parsing | Bryan Newbold | 2020-06-23 | 1 | -2/+6
* ensure lang is len()==2; prep for original_name column | Bryan Newbold | 2020-06-23 | 1 | -0/+5
* make fmt | Bryan Newbold | 2020-06-23 | 1 | -34/+39
* tests and fixes for parse_lang(), parse_country() | Bryan Newbold | 2020-06-23 | 1 | -19/+78
    These were basically entirely broken. Oof!
* block/skip more homepage patterns | Bryan Newbold | 2020-06-23 | 1 | -0/+9
* fix langs inclusion in summarization; remove unused/duplicate fields | Bryan Newbold | 2020-06-23 | 1 | -2/+2
* strip control characters from titles (issn_meta) | Bryan Newbold | 2020-06-23 | 1 | -0/+4
* fix issn_meta country detection | Bryan Newbold | 2020-06-23 | 1 | -5/+8
* improve lang parsing | Bryan Newbold | 2020-06-23 | 5 | -7/+11
* issn_meta: mainTitle can be an array | Bryan Newbold | 2020-06-23 | 1 | -1/+4
* set is_active flag based on directories | Bryan Newbold | 2020-06-23 | 1 | -0/+5
* sources, ISSN-L test mappings, __init__ for recent importers | Bryan Newbold | 2020-06-23 | 1 | -0/+12
* ZDB homepage (FIZE) scrape importer | Bryan Newbold | 2020-06-23 | 1 | -0/+34
* australian ERA journal list importer | Bryan Newbold | 2020-06-23 | 1 | -0/+54
* vanished journal metadata importer | Bryan Newbold | 2020-06-23 | 2 | -0/+113
* ISSN portal metadata directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+61
* AWOL directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+76
* filter out more meta/index URL hosts | Bryan Newbold | 2020-06-23 | 1 | -1/+15
* Revert "EZB color not a good proxy for OA status" | Bryan Newbold | 2020-06-23 | 1 | -0/+2
    I think this actually is Ok in the context of identifying longtail journals.
    We don't set the `is_oa` flag in release metadata based on this chocula flag.
    This reverts commit 9ba5b2e307c7f61f60304ba104bf3cc8424b7163.
* new manual homepage source | Bryan Newbold | 2020-06-23 | 2 | -0/+49
* be more careful with sherpa/romeo color summarization | Bryan Newbold | 2020-06-22 | 1 | -3/+4
* EZB color not a good proxy for OA status | Bryan Newbold | 2020-06-22 | 1 | -2/+0
* additional small flake8 fixes | Bryan Newbold | 2020-06-22 | 2 | -2/+2
* flake8 cleanups | Bryan Newbold | 2020-06-22 | 7 | -17/+9
* norwegian: fixes from bugs flake8 helped find | Bryan Newbold | 2020-06-22 | 1 | -3/+2
* fmt (black) | Bryan Newbold | 2020-06-22 | 21 | -613/+766
* remove un-necessary list() in iteration | Bryan Newbold | 2020-06-22 | 1 | -1/+1
* additional OJS platform names | Bryan Newbold | 2020-06-11 | 1 | -0/+2
* use and pass-through 'platform' extra metadata | Bryan Newbold | 2020-06-11 | 1 | -4/+7
* scielo metadata import | Bryan Newbold | 2020-06-03 | 2 | -1/+50
* update sources; fix as_of in DOAJ | Bryan Newbold | 2020-06-02 | 1 | -1/+1
* fixes for KBART import | Bryan Newbold | 2020-06-02 | 2 | -8/+16
* add KBART parsing/importing | Bryan Newbold | 2020-06-02 | 4 | -63/+169
* refactor main commands | Bryan Newbold | 2020-06-02 | 1 | -59/+77
* warn, but don't skip mangled SIM years | Bryan Newbold | 2020-06-02 | 1 | -3/+2
* fix tests and type annotations | Bryan Newbold | 2020-06-01 | 2 | -23/+22
* 'everything' at least partially working | Bryan Newbold | 2020-06-01 | 6 | -118/+308
* update code to work with new config structure | Bryan Newbold | 2020-05-07 | 13 | -18/+18
* nice simple hack for config loading | Bryan Newbold | 2020-05-07 | 1 | -26/+17
* start a Makefile | Bryan Newbold | 2020-05-07 | 17 | -578/+1016
    Move all "index" functions into classes, each in a separate file.
    Add lots of type annotations.
    Use dataclass objects to hold database rows. This aspect will need further refactoring to remove "extra" usage, probably by adding database rows to align with DatabaseInfo more closely.
* rename chocula.database | Bryan Newbold | 2020-05-06 | 2 | -1/+1