path: root/extra
Commit message | Author | Age | Files | Lines
* more chocula progress | Bryan Newbold | 2019-07-14 | 2 | -61/+183
|
* EZB and szczepanski indexers | Bryan Newbold | 2019-07-11 | 1 | -45/+146
|
* chocula early work | Bryan Newbold | 2019-07-10 | 4 | -0/+1009
|   (non-functional)
* more fixup notes (from QA server) | Bryan Newbold | 2019-06-27 | 1 | -5/+46
|
* finish fixup_longtail_issnl_unique; but not going to run it | Bryan Newbold | 2019-06-27 | 1 | -4/+3
|
* initial work on longtail_issnl_unique.py | Bryan Newbold | 2019-06-24 | 1 | -0/+192
|
* stats.json update after releases v03 cut-over | Bryan Newbold | 2019-06-06 | 1 | -0/+1
|
* elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16
|
* QA checks (for hash, extid duplication) | Bryan Newbold | 2019-06-04 | 4 | -0/+82
|
* recent prod table sizes; 380 GBytes or so total | Bryan Newbold | 2019-06-04 | 1 | -0/+233
|
* dump_release_extid.sql changes for new schema | Bryan Newbold | 2019-06-03 | 1 | -1/+1
|
* move export README info to sql_dumps doc | Bryan Newbold | 2019-06-03 | 1 | -1/+29
|
* fix parse_merge_metadata.py merge_spans() | Bryan Newbold | 2019-05-30 | 1 | -4/+8
|
* better KBART merging | Bryan Newbold | 2019-05-30 | 1 | -4/+5
|
* initial code to handle multiple KBART spans better | Bryan Newbold | 2019-05-30 | 1 | -2/+64
|
* add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11
|
* add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1
|
* also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1
|
* count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1
|
* commit SQL table stats scripts | Bryan Newbold | 2019-05-21 | 2 | -0/+36
|
* include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1
|   Intent is to allow fast creator search/lookup
* elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -1/+6
|
* start tracking stats | Bryan Newbold | 2019-05-07 | 2 | -0/+2
|
* IA collection page embed example description | Bryan Newbold | 2019-05-07 | 1 | -0/+45
|   This code has some issues, but is worth committing
* old fileset and webcapture example entities | Bryan Newbold | 2019-04-30 | 2 | -0/+146
|
* no-derive metadata and SQL dump uploads (to petabox) | Bryan Newbold | 2019-04-30 | 1 | -0/+2
|
* faster elasticsearch imports | Bryan Newbold | 2019-04-30 | 1 | -1/+1
|
* more bots to bootstrap | Bryan Newbold | 2019-04-24 | 1 | -0/+15
|
* update sql dump README | Bryan Newbold | 2019-04-24 | 1 | -9/+12
|
* fix wild elastic schema typo | Bryan Newbold | 2019-04-12 | 1 | -1/+1
|
* record webcaptures added as demos | Bryan Newbold | 2019-03-19 | 1 | -0/+45
|
* new importer: wayback_static | Bryan Newbold | 2019-03-19 | 1 | -203/+0
|
* update enrich examples demo script | Bryan Newbold | 2019-03-19 | 1 | -49/+63
|
* initial wayback-to-webcapture helper | Bryan Newbold | 2019-03-19 | 1 | -0/+203
|
* more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
|
* elastic schema indentation | Bryan Newbold | 2019-03-06 | 1 | -6/+6
|
* gitignore SQL identifier dumps | Bryan Newbold | 2019-02-22 | 1 | -0/+1
|
* include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
|
* update ISSN-L file | Bryan Newbold | 2019-02-20 | 2 | -2/+6
|
* robust-ify bootstrap bots script | Bryan Newbold | 2019-02-05 | 1 | -0/+7
|
* start of README files for item uploads | Bryan Newbold | 2019-02-05 | 3 | -0/+26
|
* use pigz over gzip in more places | Bryan Newbold | 2019-02-05 | 2 | -7/+15
|
* update dump and sort commands | Bryan Newbold | 2019-02-01 | 2 | -7/+17
|   Pipeline sorts are *so* starved and slow; they only get a few MByte of RAM by default!
* update to newer ISSN-L mapping | Bryan Newbold | 2019-01-29 | 2 | -2/+2
|
* helper to delete 'builtin' example entities | Bryan Newbold | 2019-01-29 | 1 | -0/+73
|   Idea is to clear these before "real" metadata import.
* minor typo in esbulk container import | Bryan Newbold | 2019-01-28 | 1 | -1/+1
|
* more ES index name updates | Bryan Newbold | 2019-01-28 | 1 | -2/+3
|
* add filesets and webcaptures to dumps | Bryan Newbold | 2019-01-28 | 4 | -1/+33
|
* transform and import fixes/tweaks | Bryan Newbold | 2019-01-25 | 3 | -8/+122
|
* improved journal metadata munger | Bryan Newbold | 2019-01-25 | 2 | -100/+325
|