path: root/chocula
Commit message | Author | Age | Files | Lines
* database: work around annoying ISSN-L column issue | Bryan Newbold | 2022-07-29 | 1 | -1/+1
* more publisher_type pattern matching | Bryan Newbold | 2022-07-21 | 2 | -7/+12
* more homepage domains to ignore (and resort) | Bryan Newbold | 2022-07-21 | 1 | -28/+33
* in fatcat exports, skip 'UNKNOWN_TITLE' | Bryan Newbold | 2021-11-30 | 1 | -0/+5
* handle homepage check with no status (skip, etc) | Bryan Newbold | 2021-11-30 | 1 | -1/+1
* make fmt | Bryan Newbold | 2021-11-30 | 2 | -26/+26
* simplify homepage URL handling code a bit | Bryan Newbold | 2021-11-30 | 1 | -12/+14
* improve homepage URL filtering | Bryan Newbold | 2021-11-30 | 1 | -14/+28
* more HomepageUrl filtering | Bryan Newbold | 2021-11-24 | 1 | -0/+3
* codespell fix minor typos (there are some more in actual code | Bryan Newbold | 2021-11-24 | 3 | -5/+5
* add openalex directory source | Bryan Newbold | 2021-11-22 | 2 | -0/+70
* make fmt | Bryan Newbold | 2021-04-23 | 8 | -11/+28
* doaj: updates for new file format; removed some fields/metadata | Bryan Newbold | 2021-04-23 | 1 | -55/+43
* SIM: cap maximum year of coverage | Bryan Newbold | 2020-12-07 | 1 | -0/+3
* database support for scholarsportal and cariniana preservation holdings | Bryan Newbold | 2020-10-08 | 3 | -1/+72
* vanished_inactive: more tolerant handling of unicode BOM | Bryan Newbold | 2020-10-08 | 1 | -1/+2
* util: parse ISSN format with extra spaces | Bryan Newbold | 2020-09-13 | 1 | -0/+2
* update vanished journal importer for 2020-09-03 dataset | Bryan Newbold | 2020-09-13 | 2 | -30/+18
* do not create hathitrust-only journal rows | Bryan Newbold | 2020-09-02 | 1 | -1/+2
* hathitrust KBART-style importer | Bryan Newbold | 2020-09-02 | 4 | -2/+106
* include pkp_pln as a kbart directory in summarization/export/etc | Bryan Newbold | 2020-08-31 | 1 | -1/+1
* fmt | Bryan Newbold | 2020-08-31 | 3 | -12/+29
* add support for PKP PLN (KBART-like) | Bryan Newbold | 2020-08-31 | 3 | -1/+57
* fatcat export improvements | Bryan Newbold | 2020-08-03 | 1 | -9/+28
* more blocked URLs and domains | Bryan Newbold | 2020-08-03 | 1 | -0/+29
* directories: all extra metadata in top-level dict | Bryan Newbold | 2020-08-03 | 4 | -13/+9
* sim: some flag fields as boolean | Bryan Newbold | 2020-08-03 | 1 | -2/+12
* doaj bug: wasn't setting extra directory metadata | Bryan Newbold | 2020-08-03 | 1 | -9/+8
* remove trailing whitespace from comment | Bryan Newbold | 2020-06-25 | 1 | -7/+7
* add MAG importer; reorder directory class listing | Bryan Newbold | 2020-06-23 | 2 | -10/+73
* block some meta strings | Bryan Newbold | 2020-06-23 | 1 | -0/+3
* skip umi.com in addition to www.umi.com | Bryan Newbold | 2020-06-23 | 1 | -0/+1
* road: proper language parsing | Bryan Newbold | 2020-06-23 | 1 | -2/+6
* ensure lang is len()==2; prep for original_name column | Bryan Newbold | 2020-06-23 | 1 | -0/+5
* make fmt | Bryan Newbold | 2020-06-23 | 1 | -34/+39
* tests and fixes for parse_lang(), parse_country() | Bryan Newbold | 2020-06-23 | 1 | -19/+78
* block/skip more homepage patterns | Bryan Newbold | 2020-06-23 | 1 | -0/+9
* fix langs inclusion in summarization; remove unused/duplicate fields | Bryan Newbold | 2020-06-23 | 1 | -2/+2
* strip control characters from titles (issn_meta) | Bryan Newbold | 2020-06-23 | 1 | -0/+4
* fix issn_meta country detection | Bryan Newbold | 2020-06-23 | 1 | -5/+8
* improve lang parsing | Bryan Newbold | 2020-06-23 | 5 | -7/+11
* issn_meta: mainTitle can be an array | Bryan Newbold | 2020-06-23 | 1 | -1/+4
* set is_active flag based on directories | Bryan Newbold | 2020-06-23 | 1 | -0/+5
* sources, ISSN-L test mappings, __init__ for recent importers | Bryan Newbold | 2020-06-23 | 1 | -0/+12
* ZDB homepage (FIZE) scrape importer | Bryan Newbold | 2020-06-23 | 1 | -0/+34
* australian ERA journal list importer | Bryan Newbold | 2020-06-23 | 1 | -0/+54
* vanished journal metadata importer | Bryan Newbold | 2020-06-23 | 2 | -0/+113
* ISSN portal metadata directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+61
* AWOL directory importer | Bryan Newbold | 2020-06-23 | 1 | -0/+76
* filter out more meta/index URL hosts | Bryan Newbold | 2020-06-23 | 1 | -1/+15