path: root/extra
Commit message [author, date, files changed, lines -removed/+added]
* fix some more isiarticles (with :80 in URL) [Bryan Newbold, 2022-04-20, 2 files, -0/+18]
* bulk edits: docs on initial dataset/fileset ingest [Bryan Newbold, 2022-04-20, 1 file, -0/+22]
* cleanups: isiarticles [Bryan Newbold, 2022-04-20, 3 files, -0/+49]
* stats: just as unpaywall bulk ingest starting [Bryan Newbold, 2022-04-19, 1 file, -0/+1]
* dump/export helper Makefile [Bryan Newbold, 2022-04-18, 1 file, -0/+93]
* container status: add simple prod single-command script [Bryan Newbold, 2022-04-08, 1 file, -0/+20]
* 2022-03-21 fatcat stats [Bryan Newbold, 2022-03-22, 2 files, -0/+48]
* document recent bulk metadata edits/imports [Bryan Newbold, 2022-03-22, 3 files, -0/+62]
* Merge branch 'bnewbold-container-web' into 'master' [bnewbold, 2022-03-10, 1 file, -0/+6]
|\
| * container ES schema: more aliases [Bryan Newbold, 2022-02-09, 1 file, -0/+6]
* | sql dumps: use 'custom' mode instead of 'tar' [Bryan Newbold, 2022-02-23, 1 file, -1/+5]
|/
* bulk cleanups: NCI chem entries; IRs with container_id; PLOS non-articles [Bryan Newbold, 2022-02-09, 4 files, -0/+330]
* bulk metadata edit log [Bryan Newbold, 2022-02-04, 3 files, -0/+223]
* commit updated stats [Bryan Newbold, 2022-01-26, 2 files, -0/+47]
* docker focal: update base image for focal/py38 [Bryan Newbold, 2022-01-26, 1 file, -36/+11]
* container counts update process README [Bryan Newbold, 2022-01-21, 1 file, -0/+41]
* update stats [Bryan Newbold, 2022-01-12, 3 files, -0/+49]
* ES: update README for v05-era indices [Bryan Newbold, 2022-01-12, 1 file, -15/+15]
* ES schema: fix typo in container issns alias [Bryan Newbold, 2022-01-12, 1 file, -1/+1]
* another file_meta update [Bryan Newbold, 2021-12-06, 1 file, -0/+60]
* ES container schema: add 'sim_pubid' and `ia_sim_collection` fields [Bryan Newbold, 2021-12-03, 1 file, -0/+2]
* SQL snashots/exports: updated prod commands [Bryan Newbold, 2021-12-03, 1 file, -13/+15]
* file_meta cleanup update [Bryan Newbold, 2021-12-01, 1 file, -0/+75]
* initial 'far-future' release date updates [Bryan Newbold, 2021-11-30, 1 file, -0/+212]
* chocula update notes [Bryan Newbold, 2021-11-30, 1 file, -0/+61]
* container ISSN-L dedupe notes [Bryan Newbold, 2021-11-30, 1 file, -0/+198]
* add stats (before re-indexing), and rename old files for consistency [Bryan Newbold, 2021-11-30, 6 files, -0/+47]
* cleanups: springer 'page-one' sample PDFs [Bryan Newbold, 2021-11-29, 2 files, -0/+129]
* cleanups: truncated wayback PDFs from common crawl [Bryan Newbold, 2021-11-29, 2 files, -0/+292]
* update to truncated wayback timestamp issue [Bryan Newbold, 2021-11-29, 1 file, -0/+24]
* update to file short wayback timestamp cleanup [Bryan Newbold, 2021-11-29, 2 files, -1/+30]
* commit old 2021-11-11 stats file [Bryan Newbold, 2021-11-29, 1 file, -0/+1]
* clean up extra/ folder a bit [Bryan Newbold, 2021-11-29, 11 files, -24/+0]
* move notes/bulk_edits/ to extra/bulk_edits/ [Bryan Newbold, 2021-11-29, 23 files, -0/+1743]
* move 'cleanups' directory from notes to extra/ [Bryan Newbold, 2021-11-29, 11 files, -0/+1306]
* codespell fixes to various other docs [Bryan Newbold, 2021-11-24, 3 files, -4/+4]
* content_scope: include in file ES schema and transform [Bryan Newbold, 2021-11-17, 1 file, -0/+1]
* ISSN-L dupes check: output all matches [Bryan Newbold, 2021-11-17, 1 file, -1/+1]
* sitemap generation improvements [Bryan Newbold, 2021-11-10, 2 files, -1/+2]
* elasticsearch schema changes [Bryan Newbold, 2021-10-13, 2 files, -3/+13]
* update stats [Bryan Newbold, 2021-10-11, 3 files, -0/+48]
* sql_dumps: set collection at upload time [Bryan Newbold, 2021-09-02, 1 file, -2/+5]
* prod stats snapshot [Bryan Newbold, 2021-08-06, 4 files, -0/+47]
* stats snapshot (2021-06-23) [Bryan Newbold, 2021-06-23, 2 files, -0/+47]
* SQL dumps: more pigz (vs. gzip) for speed [Bryan Newbold, 2021-06-17, 1 file, -2/+2]
* fatcat_ref ES schema: more doc_values; source_year not source_release_year [Bryan Newbold, 2021-06-17, 1 file, -5/+2]
* update dblp pre-import notes and pipenv python version (3.8) [Bryan Newbold, 2021-06-03, 2 files, -6/+11]
* elasticsearch ref schema: 6 shards, not 12 [Bryan Newbold, 2021-05-18, 1 file, -1/+1]
* fix 'colected' typos [Bryan Newbold, 2021-04-13, 1 file, -1/+1]
* update elasticsearch bootstrap indexing notes [Bryan Newbold, 2021-04-09, 1 file, -8/+16]