Commit message | Author | Age | Files | Lines
* update URL crawl status snapshot | Bryan Newbold | 2019-12-26 | 2 | -5/+2
* add check to container stat fetch to ensure valid JSON returned | Bryan Newbold | 2019-12-26 | 1 | -1/+1
* add stats and URL crawl status files | Bryan Newbold | 2019-12-24 | 2 | -2/+6
* count chocula logo (yay) | Bryan Newbold | 2019-12-24 | 1 | -0/+0
* example queries to run on sqlite | Bryan Newbold | 2019-12-24 | 2 | -0/+64
* update README with better directions | Bryan Newbold | 2019-12-24 | 2 | -16/+48
* move old scripts into subdirectory | Bryan Newbold | 2019-12-23 | 3 | -0/+0
* update chocula usage of argparse | Bryan Newbold | 2019-12-23 | 1 | -14/+22
* update norwegian CSV importer schema | Bryan Newbold | 2019-12-23 | 1 | -2/+4
* update chocula input data files | Bryan Newbold | 2019-12-23 | 3 | -38/+35
    Including updating fetch script, README links, and chocula.py path references.
* use newer fatcat container dump | Bryan Newbold | 2019-09-06 | 2 | -1/+3
* filter out bad ISSN{e,p} | Bryan Newbold | 2019-09-06 | 1 | -0/+5
    Unfortunately a few hundred of these got pushed into fatcat already; will probably fix with a new fixer bot tool.
* last name/publisher cleanups | Bryan Newbold | 2019-09-03 | 1 | -2/+6
* update TODO | Bryan Newbold | 2019-09-03 | 1 | -1/+10
* don't include doaj.org or NCBI homepage URLs | Bryan Newbold | 2019-09-03 | 1 | -0/+4
* improve fatcat_export metadata quality | Bryan Newbold | 2019-09-03 | 1 | -3/+12
* fix SZCEPANSKI typo | Bryan Newbold | 2019-09-03 | 1 | -2/+2
* improve export_fatcat | Bryan Newbold | 2019-08-28 | 1 | -5/+22
* python script to fix fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+75
* hand-coded corrections to invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -88/+88
* current invalid fatcat ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -0/+118
    AKA, list of fatcat containers with an ISSN-L that isn't a valid ISSN (based on checksum)
* only fatcat_export 'valid' (syntax) ISSN-Ls | Bryan Newbold | 2019-08-27 | 1 | -1/+1
* include Szczepanski in everything command (oops) | Bryan Newbold | 2019-08-27 | 1 | -0/+1
* updated crossref title file; ISSN-L file link | Bryan Newbold | 2019-08-27 | 3 | -3/+3
* update IA_CRAWL_FILE | Bryan Newbold | 2019-07-31 | 1 | -1/+1
* commit TODO list | Bryan Newbold | 2019-07-31 | 1 | -0/+37
* update fetch.sh with url_status files | Bryan Newbold | 2019-07-31 | 1 | -0/+3
* webarchive_urls separate from regular URLs | Bryan Newbold | 2019-07-31 | 1 | -1/+21
* don't return 'error' for bad CDX lookups | Bryan Newbold | 2019-07-31 | 1 | -1/+3
* add 'export_fatcat' | Bryan Newbold | 2019-07-31 | 1 | -1/+51
* README update | Bryan Newbold | 2019-07-31 | 1 | -21/+35
* more check_issn_urls corner-cases | Bryan Newbold | 2019-07-31 | 1 | -1/+5
* handle 'ttp://' URL prefix corner case | Bryan Newbold | 2019-07-31 | 1 | -0/+2
* broader top-level gitignore | Bryan Newbold | 2019-07-31 | 1 | -0/+25
* remove python 3.5 constraint | Bryan Newbold | 2019-07-31 | 2 | -6/+4
* pipenv: datasette | Bryan Newbold | 2019-07-31 | 2 | -1/+145
* add wikidata SPARQL query | Bryan Newbold | 2019-07-31 | 1 | -0/+35
* sqlite-notebook template for basic chocula stats | Bryan Newbold | 2019-07-31 | 2 | -0/+186
* iterate on homepage url import/stats | Bryan Newbold | 2019-07-31 | 2 | -21/+43
* more issn URL checker fixes | Bryan Newbold | 2019-07-31 | 2 | -11/+27
* major improvements to ISSN URL checker | Bryan Newbold | 2019-07-30 | 1 | -20/+121
* import vanilla ISSN url checker script | Bryan Newbold | 2019-07-30 | 1 | -0/+52
* chocula: sherpa_color in summary; cleanups | Bryan Newbold | 2019-07-30 | 3 | -6/+12
* chocula: openapc | Bryan Newbold | 2019-07-30 | 1 | -1/+31
* chocula: json export | Bryan Newbold | 2019-07-30 | 1 | -0/+17
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 1 | -2/+3
* chocula: fix wikidata_qid inclusion | Bryan Newbold | 2019-07-30 | 2 | -1/+3
* chocula: better ISSN-L handling | Bryan Newbold | 2019-07-30 | 4 | -24/+41
* chocula: updated fetches, new ISSN-L and DOAJ files | Bryan Newbold | 2019-07-30 | 2 | -7/+10
* chocula: wikidata indexing | Bryan Newbold | 2019-07-30 | 1 | -4/+48