Commit message | Author | Date | Files | Lines (-/+)
---|---|---|---|---
export raw affiliation strings for analysis | Bryan Newbold | 2019-10-03 | 1 | -0/+17
docker-compose: kafka 2.0, and -dev topic names | Bryan Newbold | 2019-09-20 | 1 | -3/+2
document release publish process (tag: v0.3.1) | Bryan Newbold | 2019-09-18 | 1 | -0/+48
create new collection just for fatcat exports | Bryan Newbold | 2019-09-09 | 1 | -1/+1
update more rust library name refs | Bryan Newbold | 2019-09-05 | 1 | -4/+4
update all other mentions of python client lib | Bryan Newbold | 2019-09-05 | 3 | -9/+9
sql_dumps: typo | Bryan Newbold | 2019-07-14 | 1 | -1/+1
more fixup notes (from QA server) | Bryan Newbold | 2019-06-27 | 1 | -5/+46
finish fixup_longtail_issnl_unique; but not going to run it | Bryan Newbold | 2019-06-27 | 1 | -4/+3
initial work on longtail_issnl_unique.py | Bryan Newbold | 2019-06-24 | 1 | -0/+192
stats.json update after releases v03 cut-over | Bryan Newbold | 2019-06-06 | 1 | -0/+1
elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16
QA checks (for hash, extid duplication) | Bryan Newbold | 2019-06-04 | 4 | -0/+82
recent prod table sizes; 380 GBytes or so total | Bryan Newbold | 2019-06-04 | 1 | -0/+233
dump_release_extid.sql changes for new schema | Bryan Newbold | 2019-06-03 | 1 | -1/+1
move export README info to sql_dumps doc | Bryan Newbold | 2019-06-03 | 1 | -1/+29
fix parse_merge_metadata.py merge_spans() | Bryan Newbold | 2019-05-30 | 1 | -4/+8
better KBART merging | Bryan Newbold | 2019-05-30 | 1 | -4/+5
initial code to handle multiple KBART spans better | Bryan Newbold | 2019-05-30 | 1 | -2/+64
add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11
add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1
also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1
count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1
commit SQL table stats scripts | Bryan Newbold | 2019-05-21 | 2 | -0/+36
include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1
Intent is to allow fast creator search/lookup | | | |
elastic release schema update | Bryan Newbold | 2019-05-20 | 1 | -1/+6
start tracking stats | Bryan Newbold | 2019-05-07 | 2 | -0/+2
IA collection page embed example description | Bryan Newbold | 2019-05-07 | 1 | -0/+45
This code has some issues, but is worth committing | | | |
old fileset and webcapture example entities | Bryan Newbold | 2019-04-30 | 2 | -0/+146
no-derive metadata and SQL dump uploads (to petabox) | Bryan Newbold | 2019-04-30 | 1 | -0/+2
faster elasticsearch imports | Bryan Newbold | 2019-04-30 | 1 | -1/+1
more bots to bootstrap | Bryan Newbold | 2019-04-24 | 1 | -0/+15
update sql dump README | Bryan Newbold | 2019-04-24 | 1 | -9/+12
fix wild elastic schema typo | Bryan Newbold | 2019-04-12 | 1 | -1/+1
record webcaptures added as demos | Bryan Newbold | 2019-03-19 | 1 | -0/+45
new importer: wayback_static | Bryan Newbold | 2019-03-19 | 1 | -203/+0
update enrich examples demo script | Bryan Newbold | 2019-03-19 | 1 | -49/+63
initial wayback-to-webcapture helper | Bryan Newbold | 2019-03-19 | 1 | -0/+203
more integration of transform refactor | Bryan Newbold | 2019-03-11 | 1 | -2/+2
elastic schema indentation | Bryan Newbold | 2019-03-06 | 1 | -6/+6
gitignore SQL identifier dumps | Bryan Newbold | 2019-02-22 | 1 | -0/+1
include container_id in release ES schema | Bryan Newbold | 2019-02-22 | 1 | -0/+1
update ISSN-L file | Bryan Newbold | 2019-02-20 | 2 | -2/+6
robust-ify bootstrap bots script | Bryan Newbold | 2019-02-05 | 1 | -0/+7
start of README files for item uploads | Bryan Newbold | 2019-02-05 | 3 | -0/+26
use pigz over gzip in more places | Bryan Newbold | 2019-02-05 | 2 | -7/+15
update dump and sort commands | Bryan Newbold | 2019-02-01 | 2 | -7/+17
Pipeline sorts are *so* starved and slow; they only get a few MByte of RAM by default! | | | |
update to newer ISSN-L mapping | Bryan Newbold | 2019-01-29 | 2 | -2/+2
helper to delete 'builtin' example entities | Bryan Newbold | 2019-01-29 | 1 | -0/+73
Idea is to clear these before "real" metadata import. | | | |
minor typo in esbulk container import | Bryan Newbold | 2019-01-28 | 1 | -1/+1