Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | release schema: do doc_value on DOIs | Bryan Newbold | 2020-02-13 | 1 | -1/+1 |
| | | | | | | Because DOIs are pseudo-structured (prefix, and often structure within the publisher-controlled area), I suspect we will in fact be wanting to do analytics over these strings. | ||||
* | ES release: actually do want doc_values for work_id | Bryan Newbold | 2020-02-05 | 1 | -1/+1 |
| | | | | Eg, for fast "unique count" | ||||
* | fix axiv/arxiv typo in release schema | Bryan Newbold | 2020-02-04 | 1 | -1/+1 |
| | |||||
* | ES release schema: fix typo | Bryan Newbold | 2020-01-31 | 1 | -1/+1 |
| | |||||
* | fix json typos in changelog schema | Bryan Newbold | 2020-01-30 | 1 | -2/+2 |
| | |||||
* | add upper-case work-around from kibana map join | Bryan Newbold | 2020-01-30 | 1 | -0/+1 |
| | |||||
* | JSON typo in release mapping | Bryan Newbold | 2020-01-30 | 1 | -1/+0 |
| | |||||
* | ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 4 | -66/+115 |
| | | | | But not applying asciifolding; don't see any need to do so? | ||||
* | tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1 |
| | |||||
* | elastic schema fixes | Bryan Newbold | 2020-01-29 | 2 | -7/+7 |
| | |||||
* | add country to v03b release schema | Bryan Newbold | 2020-01-29 | 1 | -0/+1 |
| | |||||
* | update ES docs and proposal | Bryan Newbold | 2020-01-29 | 1 | -0/+2 |
| | |||||
* | actually implement changelog transform | Bryan Newbold | 2020-01-29 | 1 | -1/+10 |
| | |||||
* | ES release schema updates | Bryan Newbold | 2020-01-29 | 1 | -23/+46 |
| | |||||
* | container ES schema changes | Bryan Newbold | 2020-01-29 | 1 | -13/+20 |
| | |||||
* | first implementation of ES file schema | Bryan Newbold | 2020-01-29 | 1 | -0/+46 |
| | | | | | Includes a trivial test and transform, but not any workers or doc updates. | ||||
* | stats: remove internal PG table sizes from old dumps | Bryan Newbold | 2020-01-19 | 2 | -292/+0 |
| | | | | For ease of reading and comparison | ||||
* | update stats and table sizes | Bryan Newbold | 2020-01-19 | 4 | -0/+96 |
| | |||||
* | sql table size script: shorter output | Bryan Newbold | 2020-01-15 | 1 | -0/+1 |
| | | | | This skips postgres-internal tables in size output | ||||
* | 2019-01-07 status update | Bryan Newbold | 2020-01-07 | 2 | -0/+36 |
| | |||||
* | DB loads take a long time now | Bryan Newbold | 2019-12-21 | 1 | -1/+1 |
| | |||||
* | add 2019-12-20 stats | Bryan Newbold | 2019-12-20 | 2 | -0/+148 |
| | |||||
* | add kafka-pixy to docker-compose file | Bryan Newbold | 2019-12-10 | 1 | -0/+8 |
| | |||||
* | tweaks to docker-compose image | Bryan Newbold | 2019-12-10 | 1 | -0/+5 |
| | | | | | don't start kafka image until zookeeper is running; set very liberal "watermarks" for elasticsearch disk monitoring | ||||
* | increase max.message.bytes in container | Martin Czygan | 2019-12-05 | 1 | -0/+1 |
| | | | | | While working on datacite, some messages were larger than the default of 1000012 bytes. | ||||
* | export raw affiliation strings for analysis | Bryan Newbold | 2019-10-03 | 1 | -0/+17 |
| | |||||
* | docker-compose: kafka 2.0, and -dev topic names | Bryan Newbold | 2019-09-20 | 1 | -3/+2 |
| | |||||
* | document release publish process (tag: v0.3.1) | Bryan Newbold | 2019-09-18 | 1 | -0/+48 |
| | |||||
* | create new collection just for fatcat exports | Bryan Newbold | 2019-09-09 | 1 | -1/+1 |
| | |||||
* | update more rust library name refs | Bryan Newbold | 2019-09-05 | 1 | -4/+4 |
| | |||||
* | update all other mentions of python client lib | Bryan Newbold | 2019-09-05 | 3 | -9/+9 |
| | |||||
* | sql_dumps: typo | Bryan Newbold | 2019-07-14 | 1 | -1/+1 |
| | |||||
* | more fixup notes (from QA server) | Bryan Newbold | 2019-06-27 | 1 | -5/+46 |
| | |||||
* | finish fixup_longtail_issnl_unique; but not going to run it | Bryan Newbold | 2019-06-27 | 1 | -4/+3 |
| | |||||
* | initial work on longtail_issnl_unique.py | Bryan Newbold | 2019-06-24 | 1 | -0/+192 |
| | |||||
* | stats.json update after releases v03 cut-over | Bryan Newbold | 2019-06-06 | 1 | -0/+1 |
| | |||||
* | elasticsearch index alias howto | Bryan Newbold | 2019-06-06 | 1 | -1/+16 |
| | |||||
* | QA checks (for hash, extid duplication) | Bryan Newbold | 2019-06-04 | 4 | -0/+82 |
| | |||||
* | recent prod table sizes; 380 GBytes or so total | Bryan Newbold | 2019-06-04 | 1 | -0/+233 |
| | |||||
* | dump_release_extid.sql changes for new schema | Bryan Newbold | 2019-06-03 | 1 | -1/+1 |
| | |||||
* | move export README info to sql_dumps doc | Bryan Newbold | 2019-06-03 | 1 | -1/+29 |
| | |||||
* | fix parse_merge_metadata.py merge_spans() | Bryan Newbold | 2019-05-30 | 1 | -4/+8 |
| | |||||
* | better KBART merging | Bryan Newbold | 2019-05-30 | 1 | -4/+5 |
| | |||||
* | initial code to handle multiple KBART spans better | Bryan Newbold | 2019-05-30 | 1 | -2/+64 |
| | |||||
* | add work-in-progress elastic index notes | Bryan Newbold | 2019-05-30 | 1 | -0/+11 |
| | |||||
* | add 'superceded' release extra flag to elastic schema | Bryan Newbold | 2019-05-23 | 1 | -0/+1 |
| | |||||
* | also track work_id in release elasticsearch table | Bryan Newbold | 2019-05-22 | 1 | -0/+1 |
| | |||||
* | count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+1 |
| | |||||
* | commit SQL table stats scripts | Bryan Newbold | 2019-05-21 | 2 | -0/+36 |
| | |||||
* | include creator_ids in release elastic schema | Bryan Newbold | 2019-05-20 | 1 | -0/+1 |
| | | | | Intent is to allow fast creator search/lookup |