  Commit message                                          Author         Age         Files  Lines
* move old scripts into subdirectory                      Bryan Newbold  2019-12-23  3      -0/+0
|
* update chocula usage of argparse                        Bryan Newbold  2019-12-23  1      -14/+22
|
* update norwegian CSV importer schema                    Bryan Newbold  2019-12-23  1      -2/+4
|
* update chocula input data files                         Bryan Newbold  2019-12-23  3      -38/+35
|   Including updating fetch script, README links, and chocula.py path references.
* use newer fatcat contianer dump                         Bryan Newbold  2019-09-06  2      -1/+3
|
* filter out bad ISSN{e,p}                                Bryan Newbold  2019-09-06  1      -0/+5
|   Unfortunately a few hundred of these got pushed into fatcat already; will probably fix with a new fixer bot tool.
* last name/publisher cleanups                            Bryan Newbold  2019-09-03  1      -2/+6
|
* update TODO                                             Bryan Newbold  2019-09-03  1      -1/+10
|
* don't include doaj.org or NCBI homepage URLs            Bryan Newbold  2019-09-03  1      -0/+4
|
* improve fatcat_export metadata quality                  Bryan Newbold  2019-09-03  1      -3/+12
|
* fix SZCEPANSKI typo                                     Bryan Newbold  2019-09-03  1      -2/+2
|
* improve export_fatcat                                   Bryan Newbold  2019-08-28  1      -5/+22
|
* python script to fix fatcat ISSN-Ls                     Bryan Newbold  2019-08-27  1      -0/+75
|
* hand-coded corrections to invalid fatcat ISSN-Ls        Bryan Newbold  2019-08-27  1      -88/+88
|
* current invalid fatcat ISSN-Ls                          Bryan Newbold  2019-08-27  1      -0/+118
|   AKA, list of fatcat containers with an ISSN-L that isn't a valid ISSN (based on checksum)
* only fatcat_export 'valid' (syntax) ISSN-Ls             Bryan Newbold  2019-08-27  1      -1/+1
|
* include Szczepanski in everything command (oops)        Bryan Newbold  2019-08-27  1      -0/+1
|
* updated crossref title file; ISSN-L file link           Bryan Newbold  2019-08-27  3      -3/+3
|
* update IA_CRAWL_FILE                                    Bryan Newbold  2019-07-31  1      -1/+1
|
* commit TODO list                                        Bryan Newbold  2019-07-31  1      -0/+37
|
* update fetch.sh with url_status files                   Bryan Newbold  2019-07-31  1      -0/+3
|
* webarchive_urls separate from regular URLs              Bryan Newbold  2019-07-31  1      -1/+21
|
* don't return 'error' for bad CDX lookups                Bryan Newbold  2019-07-31  1      -1/+3
|
* add 'export_fatcat'                                     Bryan Newbold  2019-07-31  1      -1/+51
|
* README update                                           Bryan Newbold  2019-07-31  1      -21/+35
|
* more check_issn_urls corner-cases                       Bryan Newbold  2019-07-31  1      -1/+5
|
* handle 'ttp://' URL prefix corner case                  Bryan Newbold  2019-07-31  1      -0/+2
|
* broader top-level gitignore                             Bryan Newbold  2019-07-31  1      -0/+25
|
* remove python 3.5 constraint                            Bryan Newbold  2019-07-31  2      -6/+4
|
* pipenv: datasette                                       Bryan Newbold  2019-07-31  2      -1/+145
|
* add wikidata SPARQL query                               Bryan Newbold  2019-07-31  1      -0/+35
|
* sqlite-notebook template for basic chocula stats        Bryan Newbold  2019-07-31  2      -0/+186
|
* iterate on homepage url import/stats                    Bryan Newbold  2019-07-31  2      -21/+43
|
* more issn URL checker fixes                             Bryan Newbold  2019-07-31  2      -11/+27
|
* major improvements to ISSN URL checker                  Bryan Newbold  2019-07-30  1      -20/+121
|
* import vanilla ISSN url checker script                  Bryan Newbold  2019-07-30  1      -0/+52
|
* chocula: sherpa_color in summary; cleanups              Bryan Newbold  2019-07-30  3      -6/+12
|
* chocula: openapc                                        Bryan Newbold  2019-07-30  1      -1/+31
|
* chocula: json export                                    Bryan Newbold  2019-07-30  1      -0/+17
|
* chocula: fix wikidata_qid inclusion                     Bryan Newbold  2019-07-30  1      -2/+3
|
* chocula: fix wikidata_qid inclusion                     Bryan Newbold  2019-07-30  2      -1/+3
|
* chocula: better ISSN-L handling                         Bryan Newbold  2019-07-30  4      -24/+41
|
* chocula: updated fetches, new ISSN-L and DOAJ files     Bryan Newbold  2019-07-30  2      -7/+10
|
* chocula: wikidata indexing                              Bryan Newbold  2019-07-30  1      -4/+48
|
* chocula: crude publisher type bucketing; field cleanup  Bryan Newbold  2019-07-30  2      -40/+194
|
* shorter/simpler table names                             Bryan Newbold  2019-07-26  2      -9/+17
|
* chocula: more host/domain fixes                         Bryan Newbold  2019-07-26  1      -3/+8
|
* GOLD OA parsing                                         Bryan Newbold  2019-07-26  1      -40/+54
|
* chocula: fix domain parsing                             Bryan Newbold  2019-07-26  1      -10/+47
|
* pipenv: pytest for journal_metadata                     Bryan Newbold  2019-07-26  2      -4/+83
|