We want to receive a continual stream of updates from both fatcat and SIM
scanning, index the updated content, and push it into Elasticsearch.

## Message Types

Scholar Update Request JSON:

- `key`: str
- `type`: str, one of:
    - `fatcat_work`
    - `sim_issue`
- `updated`: datetime, UTC, of the event that resulted in this request
- `work_ident`: str (`fatcat_work` requests only)
- `fatcat_changelog`: int (`fatcat_work` requests only)
- `sim_item`: str (`sim_issue` requests only)
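
A minimal sketch of the update request as a Python dataclass, to make the per-type fields concrete. The class name `UpdateRequest`, the choice to omit type-specific fields that do not apply, and the ISO 8601 encoding of `updated` are all assumptions rather than part of the schema above; the example values are placeholders, not real idents.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Values allowed in the `type` field
VALID_TYPES = ("fatcat_work", "sim_issue")


@dataclass
class UpdateRequest:
    """Hypothetical in-memory form of a Scholar Update Request message."""

    key: str
    type: str
    updated: datetime  # UTC time of the event that triggered this request
    work_ident: Optional[str] = None  # fatcat_work requests only
    fatcat_changelog: Optional[int] = None  # fatcat_work requests only
    sim_item: Optional[str] = None  # sim_issue requests only

    def __post_init__(self) -> None:
        # Reject unknown types and requests missing their type-specific field
        if self.type not in VALID_TYPES:
            raise ValueError(f"unknown request type: {self.type!r}")
        if self.type == "fatcat_work" and not self.work_ident:
            raise ValueError("fatcat_work requests need a work_ident")
        if self.type == "sim_issue" and not self.sim_item:
            raise ValueError("sim_issue requests need a sim_item")

    def to_json(self) -> str:
        d = asdict(self)
        # Assumption: datetimes are encoded as ISO 8601 strings
        d["updated"] = self.updated.isoformat()
        # Assumption: fields that do not apply to this type are omitted
        return json.dumps({k: v for k, v in d.items() if v is not None}, sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "UpdateRequest":
        d = json.loads(raw)
        d["updated"] = datetime.fromisoformat(d["updated"])
        return cls(**d)


# Example: build a request with placeholder values and round-trip it
req = UpdateRequest(
    key="example-key",
    type="fatcat_work",
    updated=datetime(2020, 1, 1, tzinfo=timezone.utc),
    work_ident="example-work-ident",
    fatcat_changelog=1,
)
round_tripped = UpdateRequest.from_json(req.to_json())
```

Validating in `__post_init__` means a malformed message fails at decode time in the consuming worker, instead of partway through fetching.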

"Heavy Intermediate" JSON (existing schema):

- `key`
- `fetched`: Optional[datetime], UTC, when this doc was collected

Scholar Fulltext ES JSON (existing schema)

## Kafka Topics

- `fatcat-ENV.work-ident-updates`
    - 6 partitions, long retention, key compaction
    - key: doc ident
- `scholar-ENV.sim-updates`
    - 6 partitions, long retention, key compaction
    - key: doc ident
- `scholar-ENV.update-docs`
    - 12 partitions, short retention (2 months?)
    - key: doc ident
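
The topic settings above could be captured as data roughly like this. It is a hypothetical sketch: `TopicSpec`, `topics_for`, and `compact` are illustrative names, "long retention" is assumed to mean unlimited (`retention.ms = -1`), and the tentative "2 months" is taken as 60 days. The config keys `cleanup.policy` and `retention.ms` are standard Kafka topic-level settings. The `compact` helper only simulates which records key compaction eventually keeps (the latest value per key, with a `None` value acting as a tombstone); it does not model timing or offsets.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class TopicSpec:
    """Hypothetical description of one topic from the list above."""

    name: str
    partitions: int
    retention_ms: int  # -1 means "retain forever" in Kafka
    compact: bool

    def kafka_config(self) -> Dict[str, str]:
        """Topic-level config entries, as strings (e.g. for an admin client)."""
        policy = "compact" if self.compact else "delete"
        return {"cleanup.policy": policy, "retention.ms": str(self.retention_ms)}


def topics_for(env: str) -> Tuple[TopicSpec, ...]:
    # Assumptions: "long retention" = unlimited, "2 months?" = 60 days
    return (
        TopicSpec(f"fatcat-{env}.work-ident-updates", 6, -1, True),
        TopicSpec(f"scholar-{env}.sim-updates", 6, -1, True),
        TopicSpec(f"scholar-{env}.update-docs", 12, 60 * DAY_MS, False),
    )


def compact(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Simulate which (key, value) pairs survive log compaction.

    Later values for a key replace earlier ones; a None value is a
    tombstone that removes the key entirely.
    """
    latest: Dict[str, str] = {}
    for key, value in records:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest
```

Because the request topics are compacted on the doc ident, a consumer replaying them from the beginning should mostly see only the latest request per document (once compaction has run), rather than every historical update.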

## Workers

- `scholar-fetch-docs-worker`
    - consumes fatcat and/or SIM update requests, individually
    - constructs the "heavy intermediate" document
    - publishes it to the `update-docs` topic
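
The fetch worker's per-message loop might look like the following sketch. It deliberately avoids any particular Kafka client: `requests` stands in for consumed message values, and `fetch_intermediate` and `publish` are hypothetical hooks for building the heavy intermediate and producing to the update-docs topic. Reusing the request `key` as the output key, and stamping `fetched` just before publishing, are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable

Doc = Dict[str, Any]


def run_fetch_worker(
    requests: Iterable[bytes],
    fetch_intermediate: Callable[[Doc], Doc],
    publish: Callable[[str, bytes], None],
) -> int:
    """Process update requests one at a time; return how many were published.

    `requests`, `fetch_intermediate`, and `publish` are hypothetical hooks,
    not a real Kafka client API.
    """
    count = 0
    for raw in requests:
        request = json.loads(raw)
        # Build the heavy intermediate for this single request
        doc = fetch_intermediate(request)
        doc["key"] = request["key"]
        # Record when this document was collected (the `fetched` field)
        doc["fetched"] = datetime.now(timezone.utc).isoformat()
        # Assumption: reuse the request key, so all topics are keyed by doc ident
        publish(request["key"], json.dumps(doc).encode("utf-8"))
        count += 1
    return count
```

If consumer offsets are committed only after `publish` succeeds, a crash causes requests to be reprocessed rather than lost (at-least-once delivery); re-publishing a document under the same key should be mostly harmless here.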

- `scholar-index-docs-worker`
    - consumes updated "heavy intermediate" documents, in batches
    - transforms them to the Elasticsearch schema
    - updates Elasticsearch
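
For the index worker, batching pairs naturally with the Elasticsearch `_bulk` API, which takes newline-delimited action/document pairs ending in a final newline. In this rough sketch, `batches` groups consumed documents and `bulk_body` builds one request body per batch. The `transform` callable stands in for the existing heavy-intermediate-to-ES conversion, using the document `key` as the Elasticsearch `_id` is an assumption, and the index name passed in is a placeholder.

```python
import json
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batches(items: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield lists of up to `size` items, preserving input order."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index: str, docs: List[Doc], transform: Callable[[Doc], Doc]) -> str:
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document becomes an `index` action whose `_id` is the doc key
    (an assumption), so re-indexing an updated doc overwrites the old one.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["key"]}}))
        # `transform` is a hypothetical hook for the ES schema conversion
        lines.append(json.dumps(transform(doc)))
    # The bulk API requires a newline after the last line
    return "\n".join(lines) + "\n"
```

Keying documents by `_id` makes each batch idempotent: replaying a batch after a failure rewrites the same documents instead of creating duplicates.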