fatcat - [no description]

	Commit message	Author	Age	Files	Lines
*	include releases_by_work in ident tarball	Bryan Newbold	2020-08-04	1	-1/+2
\|
*	update SQL dump docs with group-by-work command (by default)	Bryan Newbold	2020-08-04	1	-1/+1
\|
*	WIP: sorted release ident dumps	Bryan Newbold	2020-08-04	1	-0/+16
\|
*	update table/database size stats	Bryan Newbold	2020-07-22	2	-0/+48
\|
*	commit example of an elasticsearch SQL query	Bryan Newbold	2020-07-01	1	-0/+8
\|
*	commit old README about bulk downloads	Bryan Newbold	2020-07-01	1	-0/+40
\|
*	ES schema: add best_url to file schema	Bryan Newbold	2020-06-04	1	-0/+1
\|	This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyways so didn't seem worthwhile.
*	sql: really don't double-dump requests	Bryan Newbold	2020-05-26	1	-1/+0
\|	I guess we were dumping 3 times originally; already had an earlier commit that removed one row from this README (that I copypaste to CLI every time)
*	2020-05-26 prod database size and stats	Bryan Newbold	2020-05-26	2	-0/+48
\|
*	update prod stats	Bryan Newbold	2020-04-17	7	-0/+149
\|
*	Add missing packages to Dockerfile and CI file	Bryan Newbold	2020-04-16	1	-1/+1
\|
*	test-base Dockerfile	Bryan Newbold	2020-04-16	2	-0/+51
\|	Used to create bnewbold/fatcat-test-base image
*	update bulk export instructions	Bryan Newbold	2020-04-07	1	-4/+2
\|	- don't do expanded and regular release dumps
\|	- default to sqldump_public for item name (as that is common-case)
*	sql_dumps: stop doing redundant release dumps	Bryan Newbold	2020-04-01	1	-1/+3
\|
*	bulk exports README different from SQL README	Bryan Newbold	2020-03-17	1	-1/+1
\|
*	ES README: really need to limit to 1k esbulk batches	Bryan Newbold	2020-02-26	1	-3/+3
\|
*	Merge branch 'bnewbold-elastic-v03b'	Bryan Newbold	2020-02-26	5	-61/+203
\|\
\| *	update ES transform README	Bryan Newbold	2020-02-26	1	-2/+3
\| \|	- smaller batch sizes to prevent esbulk errors
\| \|	- file transform/index
\| *	ES container last tweaks	Bryan Newbold	2020-02-26	1	-3/+4
\| \|
\| *	ES release: last minor tweaks	Bryan Newbold	2020-02-26	1	-3/+5
\| \|
\| *	release schema: do doc_value on DOIs	Bryan Newbold	2020-02-13	1	-1/+1
\| \|	Because DOIs are pseudo-structured (prefix, and often structure within the publisher-controlled area), I suspect we will in fact be wanting to do analytics over these strings.
\| *	ES release: actually do want doc_values for work_id	Bryan Newbold	2020-02-05	1	-1/+1
\| \|	E.g., for fast "unique count"
\| *	fix axiv/arxiv typo in release schema	Bryan Newbold	2020-02-04	1	-1/+1
\| \|
\| *	ES release schema: fix typo	Bryan Newbold	2020-01-31	1	-1/+1
\| \|
\| *	fix json typos in changelog schema	Bryan Newbold	2020-01-30	1	-2/+2
\| \|
\| *	add upper-case work-around from kibana map join	Bryan Newbold	2020-01-30	1	-0/+1
\| \|
\| *	JSON typo in release mapping	Bryan Newbold	2020-01-30	1	-1/+0
\| \|
\| *	ES schemas: make keywords case-insensitive by default	Bryan Newbold	2020-01-30	4	-66/+115
\| \|	But not applying asciifolding; don't see any need to do so?
\| *	tweak file ES archive.org domain tracking	Bryan Newbold	2020-01-30	1	-0/+1
\| \|
\| *	elastic schema fixes	Bryan Newbold	2020-01-29	2	-7/+7
\| \|
\| *	add country to v03b release schema	Bryan Newbold	2020-01-29	1	-0/+1
\| \|
\| *	update ES docs and proposal	Bryan Newbold	2020-01-29	1	-0/+2
\| \|
\| *	actually implement changelog transform	Bryan Newbold	2020-01-29	1	-1/+10
\| \|
\| *	ES release schema updates	Bryan Newbold	2020-01-29	1	-23/+46
\| \|
\| *	container ES schema changes	Bryan Newbold	2020-01-29	1	-13/+20
\| \|
\| *	first implementation of ES file schema	Bryan Newbold	2020-01-29	1	-0/+46
\| \|	Includes a trivial test and transform, but not any workers or doc updates.
* \|	table size snapshots	Bryan Newbold	2020-02-19	2	-0/+47
\|/
*	stats: remove internal PG table sizes from old dumps	Bryan Newbold	2020-01-19	2	-292/+0
\|	For ease of reading and comparison
*	update stats and table sizes	Bryan Newbold	2020-01-19	4	-0/+96
\|
*	sql table size script: shorter output	Bryan Newbold	2020-01-15	1	-0/+1
\|	This skips postgres-internal tables in size output
*	2019-01-07 status update	Bryan Newbold	2020-01-07	2	-0/+36
\|
*	DB loads take a long time now	Bryan Newbold	2019-12-21	1	-1/+1
\|
*	add 2019-12-20 stats	Bryan Newbold	2019-12-20	2	-0/+148
\|
*	add kafka-pixy to docker-compose file	Bryan Newbold	2019-12-10	1	-0/+8
\|
*	tweaks to docker-compose image	Bryan Newbold	2019-12-10	1	-0/+5
\|	- don't start kafka image until zookeeper is running
\|	- set very liberal "watermarks" for elasticsearch disk monitoring
*	increase max.message.bytes in container	Martin Czygan	2019-12-05	1	-0/+1
\|	While working on datacite, some messages were larger than the default of 1000012 bytes.
*	export raw affiliation strings for analysis	Bryan Newbold	2019-10-03	1	-0/+17
\|
*	docker-compose: kafka 2.0, and -dev topic names	Bryan Newbold	2019-09-20	1	-3/+2
\|
*	document release publish process [v0.3.1]	Bryan Newbold	2019-09-18	1	-0/+48
\|
*	create new collection just for fatcat exports	Bryan Newbold	2019-09-09	1	-1/+1
\|