| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | SQL dump timing note | Bryan Newbold | 2021-03-10 | 1 | -0/+3 | 
| | | |||||
| * | sql dump recent timing note | Bryan Newbold | 2021-03-08 | 1 | -1/+2 | 
| | | |||||
| * | elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+3 | 
| | | |||||
| * | Merge branch 'bnewbold-ci-cleanups' into 'master' | bnewbold | 2021-01-05 | 1 | -5/+11 | 
| |\ | | | | | | | | | Gitlab CI and docker base image cleanups See merge request webgroup/fatcat!94 | ||||
| | * | docker xenial: use get-pipenv.py to install pipenv et al | Bryan Newbold | 2020-12-22 | 1 | -5/+6 | 
| | | | |||||
| | * | docker xenial: switch to rust 1.43.0 | Bryan Newbold | 2020-12-22 | 1 | -1/+1 | 
| | | | |||||
| | * | docker xenial: include python3.8 | Bryan Newbold | 2020-12-22 | 1 | -1/+6 | 
| | | | |||||
| * | | update stats (post DOAJ and dblp imports) | Bryan Newbold | 2020-12-29 | 2 | -0/+47 | 
| | | | |||||
| * | | DOAJ import notes, and SQL/stats update | Bryan Newbold | 2020-12-23 | 4 | -0/+94 | 
| |/ | |||||
| * | dblp: polish HTML scrape/extract pipeline | Bryan Newbold | 2020-12-17 | 3 | -3/+16 | 
| | | |||||
| * | dblp: script and notes on container metadata generation | Bryan Newbold | 2020-12-17 | 4 | -0/+134 | 
| | | |||||
| * | Merge pull request #65 from ibnesayeed/patch-1 | bnewbold | 2020-12-17 | 1 | -1/+1 | 
| |\ | | | | | Improve status counting efficiency | ||||
| | * | Improve status counting efficiency | Sawood Alam | 2020-12-17 | 1 | -1/+1 | 
| | | | | | | | When the input is large with a small number of unique items to be counted then counting as we go would be linear and more efficient approach than sorting and unique counting. | ||||
| * | | Revert "docker xenial base image: include python3.8" | Bryan Newbold | 2020-12-11 | 1 | -6/+1 | 
| | | | | | | | | | This reverts commit 91628426678a635f26cf992dbd5caedb4a3ae24b. | ||||
| * | | docker xenial base image: include python3.8 | Bryan Newbold | 2020-12-11 | 1 | -1/+6 | 
| | | | |||||
| * | | docker: how to push to dockerhub | Bryan Newbold | 2020-12-11 | 1 | -0/+4 | 
| |/ | |||||
| * | update database/table stats | Bryan Newbold | 2020-10-12 | 2 | -0/+48 | 
| | | |||||
| * | update stats snapshot | Bryan Newbold | 2020-09-03 | 2 | -0/+47 | 
| | | |||||
| * | sitemap fixes from testing | Bryan Newbold | 2020-08-19 | 3 | -4/+15 | 
| | | |||||
| * | iterate on sitemap generation | Bryan Newbold | 2020-08-19 | 6 | -7/+119 | 
| | | |||||
| * | initial sitemap.xml notes/template | Bryan Newbold | 2020-08-19 | 2 | -0/+29 | 
| | | |||||
| * | include releases_by_work in ident tarball | Bryan Newbold | 2020-08-04 | 1 | -1/+2 | 
| | | |||||
| * | update SQL dump docs with group-by-work command (by default) | Bryan Newbold | 2020-08-04 | 1 | -1/+1 | 
| | | |||||
| * | WIP: sorted release ident dumps | Bryan Newbold | 2020-08-04 | 1 | -0/+16 | 
| | | |||||
| * | update table/database size stats | Bryan Newbold | 2020-07-22 | 2 | -0/+48 | 
| | | |||||
| * | commit example of an elasticsearch SQL query | Bryan Newbold | 2020-07-01 | 1 | -0/+8 | 
| | | |||||
| * | commit old README about bulk downloads | Bryan Newbold | 2020-07-01 | 1 | -0/+40 | 
| | | |||||
| * | ES schema: add best_url to file schema | Bryan Newbold | 2020-06-04 | 1 | -0/+1 | 
| | | | | | | | | | | This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile. | ||||
| * | sql: really don't double-dump requests | Bryan Newbold | 2020-05-26 | 1 | -1/+0 | 
| | | | | | | | I guess we were dumping 3 times originally; already had an earlier commit that removed one row from this README (that I copypaste to CLI every time) | ||||
| * | 2020-05-26 prod database size and stats | Bryan Newbold | 2020-05-26 | 2 | -0/+48 | 
| | | |||||
| * | update prod stats | Bryan Newbold | 2020-04-17 | 7 | -0/+149 | 
| | | |||||
| * | Add missing packages to Dockerfile and CI file | Bryan Newbold | 2020-04-16 | 1 | -1/+1 | 
| | | |||||
| * | test-base Dockerfile | Bryan Newbold | 2020-04-16 | 2 | -0/+51 | 
| | | | | | Used to create bnewbold/fatcat-test-base image | ||||
| * | update bulk export instructions | Bryan Newbold | 2020-04-07 | 1 | -4/+2 | 
| | | | | | | - don't do expanded and regular release dumps - default to sqldump_public for item name (as that is common-case) | ||||
| * | sql_dumps: stop doing redundant release dumps | Bryan Newbold | 2020-04-01 | 1 | -1/+3 | 
| | | |||||
| * | bulk exports README different from SQL README | Bryan Newbold | 2020-03-17 | 1 | -1/+1 | 
| | | |||||
| * | ES README: really need to limit to 1k esbulk batches | Bryan Newbold | 2020-02-26 | 1 | -3/+3 | 
| | | |||||
| * | Merge branch 'bnewbold-elastic-v03b' | Bryan Newbold | 2020-02-26 | 5 | -61/+203 | 
| |\ | |||||
| | * | update ES transform README | Bryan Newbold | 2020-02-26 | 1 | -2/+3 | 
| | | | | | | | | | | | - smaller batch sizes to prevent esbulk errors - file transform/index | ||||
| | * | ES container last tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+4 | 
| | | | |||||
| | * | ES release: last minor tweaks | Bryan Newbold | 2020-02-26 | 1 | -3/+5 | 
| | | | |||||
| | * | release schema: do doc_value on DOIs | Bryan Newbold | 2020-02-13 | 1 | -1/+1 | 
| | | | | | | | | | | | | | Because DOIs are pseudo-structured (prefix, and often structure within the publisher-controlled area), I suspect we will in fact be wanting to do analytics over these strings. | ||||
| | * | ES release: actually do want doc_values for work_id | Bryan Newbold | 2020-02-05 | 1 | -1/+1 | 
| | | | | | | | | | Eg, for fast "unique count" | ||||
| | * | fix axiv/arxiv typo in release schema | Bryan Newbold | 2020-02-04 | 1 | -1/+1 | 
| | | | |||||
| | * | ES release schema: fix typo | Bryan Newbold | 2020-01-31 | 1 | -1/+1 | 
| | | | |||||
| | * | fix json typos in changelog schema | Bryan Newbold | 2020-01-30 | 1 | -2/+2 | 
| | | | |||||
| | * | add upper-case work-around from kibana map join | Bryan Newbold | 2020-01-30 | 1 | -0/+1 | 
| | | | |||||
| | * | JSON typo in release mapping | Bryan Newbold | 2020-01-30 | 1 | -1/+0 | 
| | | | |||||
| | * | ES schemas: make keywords case-insensitive by default | Bryan Newbold | 2020-01-30 | 4 | -66/+115 | 
| | | | | | | | | | But not applying asciifolding; don't see any need to do so? | ||||
| | * | tweak file ES archive.org domain tracking | Bryan Newbold | 2020-01-30 | 1 | -0/+1 | 
| | | | |||||
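
The "Improve status counting efficiency" commit in the log above argues that, for a large input with few unique items, counting as you go is linear and cheaper than sorting and then counting unique runs. The one-line change itself is not shown in this log, so the following is only an illustrative Python sketch of the two approaches, not the code from that commit. The function names and the sample status values are hypothetical.

```python
from collections import Counter
from itertools import groupby


def count_streaming(items):
    """Count as we go: a single pass over the input.

    Memory grows with the number of unique items, not the input size.
    """
    counts = Counter()
    for item in items:
        counts[item] += 1
    return dict(counts)


def count_sorted(items):
    """Sort first, then count runs of equal items (similar to `sort | uniq -c`).

    Sorting costs O(n log n) and needs the whole input available at once.
    """
    return {key: sum(1 for _ in group) for key, group in groupby(sorted(items))}


# Hypothetical status values, for illustration only.
statuses = ["success", "no-capture", "success", "wayback-error", "success"]

# Both approaches produce the same counts; only the cost differs.
assert count_streaming(statuses) == count_sorted(statuses)
```

Both functions return the same mapping. The streaming version avoids the sort entirely, which is where the savings come from when the input is large and the set of distinct values is small.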
