index: sandcrawler
path: root/python/tests/test_ingest.py
Commit message | Author | Date | Files | Lines
make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -126/+147
more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 1 | -1/+1
start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 1 | -5/+5
make fmt | Bryan Newbold | 2021-10-26 | 1 | -55/+64
python: isort all imports | Bryan Newbold | 2021-10-26 | 1 | -3/+4
refactor and expand wall/block/cookie URL patterns | Bryan Newbold | 2021-09-03 | 1 | -0/+14
check for simple URL patterns that are usually paywalls or loginwalls | Bryan Newbold | 2020-08-11 | 1 | -0/+18
pdfextract support in ingest worker | Bryan Newbold | 2020-06-25 | 1 | -0/+7
ingest: add URL blocklist feature | Bryan Newbold | 2020-01-17 | 1 | -0/+17
clarify ingest result schema and semantics | Bryan Newbold | 2020-01-15 | 1 | -3/+19
add postgrest checks to test mocks | Bryan Newbold | 2020-01-14 | 1 | -1/+9
tests: don't use localhost as a responses mock host | Bryan Newbold | 2020-01-14 | 1 | -2/+2
refactor ingest to a loop, allowing multiple hops | Bryan Newbold | 2020-01-09 | 1 | -2/+9
add ingest test file | Bryan Newbold | 2020-01-09 | 1 | -0/+120