ingest crawler:
- SPNv2 only
- remove most SPNv1/v2 path selection
- landing page + fulltext hops only (short recursion depth)
- use wayback client library instead of requests to fetch content
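
The "landing page + fulltext hops only" item can be sketched as a bounded hop loop. This is only an illustration under assumptions, not the crawler's actual code: `FetchResult`, `crawl`, `MAX_HOPS`, and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever the wayback client / SPNv2 path returns.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_HOPS = 2  # assumed value for "short recursion depth"; not from the notes


@dataclass
class FetchResult:
    """Hypothetical result of one fetch (stand-in for the real client's response)."""
    url: str
    content_type: str
    body: bytes
    next_url: Optional[str] = None  # fulltext link found on a landing page, if any


def crawl(
    url: str,
    fetch: Callable[[str], FetchResult],
    max_hops: int = MAX_HOPS,
) -> List[FetchResult]:
    """Fetch `url`, then follow at most `max_hops` further links toward fulltext.

    Stops early when a PDF is reached, when a page offers no next link,
    or when the next link was already visited.
    """
    hops: List[FetchResult] = []
    seen = set()
    current = url
    for _ in range(max_hops + 1):
        seen.add(current)
        result = fetch(current)
        hops.append(result)
        if result.content_type == "application/pdf":
            break  # reached fulltext
        if result.next_url is None or result.next_url in seen:
            break  # dead end or loop
        current = result.next_url
    return hops
```

Passing `fetch` in as a parameter keeps the hop logic independent of how content is retrieved, so the same loop works whether the fetch goes through a wayback client or a test stub. The caller decides success by checking whether the last hop is fulltext.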