Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | cleanups: springer 'page-one' sample PDFs | Bryan Newbold | 2021-11-29 | 2 | -0/+129 |
| | |||||
* | cleanups: truncated wayback PDFs from common crawl | Bryan Newbold | 2021-11-29 | 2 | -0/+292 |
| | |||||
* | update to truncated wayback timestamp issue | Bryan Newbold | 2021-11-29 | 1 | -0/+24 |
| | |||||
* | update to file short wayback timestamp cleanup | Bryan Newbold | 2021-11-29 | 2 | -1/+30 |
| | |||||
* | commit old 2021-11-11 stats file | Bryan Newbold | 2021-11-29 | 1 | -0/+1 |
| | |||||
* | clean up extra/ folder a bit | Bryan Newbold | 2021-11-29 | 11 | -24/+0 |
| | |||||
* | move notes/bulk_edits/ to extra/bulk_edits/ | Bryan Newbold | 2021-11-29 | 23 | -0/+1743 |
| | |||||
* | move 'cleanups' directory from notes to extra/ | Bryan Newbold | 2021-11-29 | 11 | -0/+1306 |
| | |||||
* | codespell fixes to various other docs | Bryan Newbold | 2021-11-24 | 3 | -4/+4 |
| | |||||
* | content_scope: include in file ES schema and transform | Bryan Newbold | 2021-11-17 | 1 | -0/+1 |
| | |||||
* | ISSN-L dupes check: output all matches | Bryan Newbold | 2021-11-17 | 1 | -1/+1 |
| | |||||
* | sitemap generation improvements | Bryan Newbold | 2021-11-10 | 2 | -1/+2 |
| | |||||
* | elasticsearch schema changes | Bryan Newbold | 2021-10-13 | 2 | -3/+13 |
| | |||||
* | update stats | Bryan Newbold | 2021-10-11 | 3 | -0/+48 |
| | |||||
* | sql_dumps: set collection at upload time | Bryan Newbold | 2021-09-02 | 1 | -2/+5 |
| | |||||
* | prod stats snapshot | Bryan Newbold | 2021-08-06 | 4 | -0/+47 |
| | |||||
* | stats snapshot (2021-06-23) | Bryan Newbold | 2021-06-23 | 2 | -0/+47 |
| | |||||
* | SQL dumps: more pigz (vs. gzip) for speed | Bryan Newbold | 2021-06-17 | 1 | -2/+2 |
| | |||||
* | fatcat_ref ES schema: more doc_values; source_year not source_release_year | Bryan Newbold | 2021-06-17 | 1 | -5/+2 |
| | |||||
* | update dblp pre-import notes and pipenv python version (3.8) | Bryan Newbold | 2021-06-03 | 2 | -6/+11 |
| | |||||
* | elasticsearch ref schema: 6 shards, not 12 | Bryan Newbold | 2021-05-18 | 1 | -1/+1 |
| | |||||
* | fix 'colected' typos | Bryan Newbold | 2021-04-13 | 1 | -1/+1 |
| | | | | Thanks for the catch, Martin | ||||
* | update elasticsearch bootstrap indexing notes | Bryan Newbold | 2021-04-09 | 1 | -8/+16 |
| | |||||
* | ES: rename fatcat_ref.json to ref_schema.json for consistency; add to README | Bryan Newbold | 2021-04-08 | 2 | -1/+4 |
| | |||||
* | release ES schema: fix typo with shard/replica configuration | Bryan Newbold | 2021-04-08 | 1 | -1/+1 |
| | |||||
* | sitemaps: filter to releases with PDF fulltext (for now) | Bryan Newbold | 2021-04-07 | 1 | -0/+2 |
| | |||||
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -8/+9 |
| | | | | Includes transform code updates and partial test coverage. | ||||
* | release ES: add discipline field | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
| | |||||
* | ES schemas: add doc_index_ts to all mappings | Bryan Newbold | 2021-04-06 | 5 | -0/+9 |
| | |||||
* | elasticsearch schema, docs, docker: update from ES 6.x to ES 7.x | Bryan Newbold | 2021-04-06 | 7 | -125/+24 |
| | | | | | Including removing index document names (use '_doc' instead during transition) | ||||
* | add es draft schema for references | Martin Czygan | 2021-03-30 | 1 | -0/+106 |
| | |||||
* | SQL dump timing note | Bryan Newbold | 2021-03-10 | 1 | -0/+3 |
| | |||||
* | sql dump recent timing note | Bryan Newbold | 2021-03-08 | 1 | -1/+2 |
| | |||||
* | elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+3 |
| | |||||
* | Merge branch 'bnewbold-ci-cleanups' into 'master' | bnewbold | 2021-01-05 | 1 | -5/+11 |
|\ | | | | | | | | | Gitlab CI and docker base image cleanups. See merge request webgroup/fatcat!94 | ||||
| * | docker xenial: use get-pipenv.py to install pipenv et al | Bryan Newbold | 2020-12-22 | 1 | -5/+6 |
| | | |||||
| * | docker xenial: switch to rust 1.43.0 | Bryan Newbold | 2020-12-22 | 1 | -1/+1 |
| | | |||||
| * | docker xenial: include python3.8 | Bryan Newbold | 2020-12-22 | 1 | -1/+6 |
| | | |||||
* | | update stats (post DOAJ and dblp imports) | Bryan Newbold | 2020-12-29 | 2 | -0/+47 |
| | | |||||
* | | DOAJ import notes, and SQL/stats update | Bryan Newbold | 2020-12-23 | 4 | -0/+94 |
|/ | |||||
* | dblp: polish HTML scrape/extract pipeline | Bryan Newbold | 2020-12-17 | 3 | -3/+16 |
| | |||||
* | dblp: script and notes on container metadata generation | Bryan Newbold | 2020-12-17 | 4 | -0/+134 |
| | |||||
* | Merge pull request #65 from ibnesayeed/patch-1 | bnewbold | 2020-12-17 | 1 | -1/+1 |
|\ | | | | | Improve status counting efficiency | ||||
| * | Improve status counting efficiency | Sawood Alam | 2020-12-17 | 1 | -1/+1 |
| | | | | | | When the input is large but contains only a small number of unique items, counting as we go is linear and more efficient than sorting and then counting unique items. | ||||
* | | Revert "docker xenial base image: include python3.8" | Bryan Newbold | 2020-12-11 | 1 | -6/+1 |
| | | | | | | | | This reverts commit 91628426678a635f26cf992dbd5caedb4a3ae24b. | ||||
* | | docker xenial base image: include python3.8 | Bryan Newbold | 2020-12-11 | 1 | -1/+6 |
| | | |||||
* | | docker: how to push to dockerhub | Bryan Newbold | 2020-12-11 | 1 | -0/+4 |
|/ | |||||
* | update database/table stats | Bryan Newbold | 2020-10-12 | 2 | -0/+48 |
| | |||||
* | update stats snapshot | Bryan Newbold | 2020-09-03 | 2 | -0/+47 |
| | |||||
* | sitemap fixes from testing | Bryan Newbold | 2020-08-19 | 3 | -4/+15 |
| |
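
The "Improve status counting efficiency" commit above contrasts counting items as they stream past with sorting first and then counting unique runs. The actual one-line change is not shown in this log, so the following is only a minimal Python sketch of the two approaches; the function names and the sample status values are hypothetical and not taken from the repository.

```python
from collections import Counter
from itertools import groupby


def count_streaming(statuses):
    """Count as we go: one pass, memory proportional to the number of unique items."""
    counts = Counter()
    for status in statuses:
        counts[status] += 1
    return dict(counts)


def count_by_sorting(statuses):
    """Sort, then count runs of equal items (the `sort | uniq -c` pattern)."""
    return {key: sum(1 for _ in group) for key, group in groupby(sorted(statuses))}


# Hypothetical sample input: many lines, few distinct status values.
sample = ["200", "404", "200", "301", "200", "404"]

streaming = count_streaming(sample)
sorted_counts = count_by_sorting(sample)
```

Both functions return the same counts. The streaming version avoids the O(n log n) sort, which is where the saving comes from when the input is large and the set of distinct values is small.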