author: Bryan Newbold <bnewbold@robocracy.org> 2022-04-20 16:06:05 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2022-04-20 16:06:05 -0700
commit: 2280af9a1d0849c41950b44df18fe76e3b7c52c8
tree: 5e6831d3c4d8225f93b22aac52095c529561d8e4 /extra
parent: 3a8dada3267c56fd62b84201b4af96889e4103e6
bulk edits: docs on initial dataset/fileset ingest
Diffstat (limited to 'extra')
-rw-r--r-- extra/bulk_edits/2022-04-07_initial_datasets.md | 22
1 files changed, 22 insertions, 0 deletions
diff --git a/extra/bulk_edits/2022-04-07_initial_datasets.md b/extra/bulk_edits/2022-04-07_initial_datasets.md
new file mode 100644
index 00000000..90827a38
--- /dev/null
+++ b/extra/bulk_edits/2022-04-07_initial_datasets.md
@@ -0,0 +1,22 @@
+
+Importing fileset and file entities from initial sandcrawler ingests.
+
+Git commit: `ede98644a89afd15d903061e0998dbd08851df6d`
+
+Filesets:
+
+    export FATCAT_AUTH_SANDCRAWLER=[...]
+    cat /tmp/ingest_dataset_combined_results.2022-04-04.partial.json \
+        | ./fatcat_import.py ingest-fileset-results -
+    # editgroup_5l47i7bscvfmpf4ddytauoekea
+    # Counter({'total': 195, 'skip': 176, 'skip-hit': 160, 'insert': 19, 'skip-single-file': 14, 'skip-partial-file-info': 2, 'update': 0, 'exists': 0})
+
+    cat /srv/fatcat/datasets/ingest_dataset_combined_results.2022-04-04.partial.json \
+        | ./fatcat_import.py ingest-fileset-file-results -
+    # editgroup_i2k2ucon7nap3gui3z7amuiug4
+    # Counter({'total': 195, 'skip': 184, 'skip-hit': 160, 'skip-status': 24, 'insert': 11, 'update': 0, 'exists': 0})
+
+Tried running again, to ensure that there are not duplicate inserts, and that
+worked ('exists' instead of 'insert' counts).
+
+Finally!
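
The counters logged in the notes above can be sanity-checked with a little arithmetic. The sketch below is not part of the commit or of `fatcat_import.py`. It assumes, based only on the logged numbers, that `total` equals `skip + insert + update + exists` and that `skip` is the sum of the `skip-*` reason counts. The helper names `check_counts` and `looks_idempotent` are hypothetical. The re-run counters were not recorded in the notes, so none are reproduced here.

```python
from collections import Counter

# Counters copied verbatim from the two import runs logged in the notes.
fileset_run = Counter({'total': 195, 'skip': 176, 'skip-hit': 160, 'insert': 19,
                       'skip-single-file': 14, 'skip-partial-file-info': 2,
                       'update': 0, 'exists': 0})
file_run = Counter({'total': 195, 'skip': 184, 'skip-hit': 160, 'skip-status': 24,
                    'insert': 11, 'update': 0, 'exists': 0})

# Assumed top-level outcomes; inferred from the numbers, not from importer source.
OUTCOMES = ("skip", "insert", "update", "exists")


def check_counts(counts):
    """Return a list of inconsistencies; an empty list means the sums add up."""
    problems = []
    outcome_sum = sum(counts[key] for key in OUTCOMES)
    if outcome_sum != counts["total"]:
        problems.append(f"outcomes sum to {outcome_sum}, total is {counts['total']}")
    # Assumption: every 'skip-*' key is a reason that rolls up into 'skip'.
    reason_sum = sum(n for key, n in counts.items() if key.startswith("skip-"))
    if reason_sum != counts["skip"]:
        problems.append(f"skip reasons sum to {reason_sum}, skip is {counts['skip']}")
    return problems


def looks_idempotent(rerun_counts):
    """A repeat run should report 'exists' rather than new inserts or updates."""
    return rerun_counts["insert"] == 0 and rerun_counts["update"] == 0


if __name__ == "__main__":
    for name, counts in (("fileset", fileset_run), ("file", file_run)):
        print(name, check_counts(counts) or "consistent")
```

Passing the counter printed by a second run to `looks_idempotent` would express the duplicate-insert check described in the notes.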