index : sandcrawler
Graph   Age         Files  Lines      Author         Commit message
*       2018-11-20      3  -0/+314    Bryan Newbold  initial work on kafka_grobid worker
*       2018-11-20      2  -0/+110    Bryan Newbold  kafka notes
*       2018-10-30      1  -2/+2      Bryan Newbold  fix typos in DumpGrobidXmlJob
*       2018-10-30      1  -1/+1      Bryan Newbold  one more lint ignore
*       2018-10-30      1  -1/+1      Bryan Newbold  squelch some more lint warnings
*       2018-10-30      1  -9/+10     Bryan Newbold  several bugs and lint issues in import_grobid_metadata
*       2018-10-30      1  -0/+24     Bryan Newbold  please support DumpGrobidXmlJob
*       2018-10-30      1  -0/+41     Bryan Newbold  quick and dirty GROBID XML dumper
*       2018-09-26      2  -7/+151    Bryan Newbold  some progress on a crude grobid metadata filter
*       2018-09-22      3  -0/+114    Bryan Newbold  longtail grobid metadata parse/filter WIP
*       2018-09-22      1  -0/+24     Bryan Newbold  please support for DumpGrobidMetaInsertableJob
*       2018-09-22      1  -0/+38     Bryan Newbold  new DumpGrobidMetaInsertableJob
*       2018-09-22      1  -1/+1      Bryan Newbold  match_filter_enrich: fix typo
*       2018-09-22      1  -2/+2      Bryan Newbold  fix sha1/doi_list confusion in filter_scored_matches
*       2018-09-20      1  -1/+1      Bryan Newbold  pylint can be insufferable
*       2018-09-18      1  -0/+3      Bryan Newbold  gitignore in python dir
*       2018-09-18      3  -24/+23    Bryan Newbold  pass more pylint
*       2018-09-17      1  -1/+1      Bryan Newbold  fix typo in python/README
*       2018-09-17      2  -2/+20     Bryan Newbold  more robust extraction code (against petabox failures)
*       2018-09-17      1  -2/+7      Bryan Newbold  filter_scored_matches: fix tests
*       2018-09-14      2  -0/+64     Bryan Newbold  match and enrich notes+script
*       2018-09-14      1  -0/+58     Bryan Newbold  add manifest sqlite3 -> JSON converter
*       2018-09-13      1  -0/+110    Bryan Newbold  filter_scored_matches.py
*       2018-09-13      1  -0/+24     Bryan Newbold  dumpfilemeta support in please
*       2018-09-13      1  -0/+36     Bryan Newbold  new simple file metadata dump script
*       2018-09-12      1  -0/+11     Bryan Newbold  TODO updates
*       2018-09-12      1  -1/+9      Bryan Newbold  insertable flag for match-crossref
*       2018-09-12      2  -0/+348    Bryan Newbold  hack scorejob variant with extra context joined in
*       2018-09-05      5  -12/+12    Bryan Newbold  blacklist -> denylist
*       2018-09-05     11  -154/+179  bnewbold       Merge branch 'ellen-none-refactor' into 'master'
|\
| *     2018-09-04      0  -0/+0      Ellen Spertus  Merge branch 'ellen-none-refactor' of git.archive.org:webgroup/sandcrawler in...
| |\
| | *   2018-09-04      1  -14/+10    Ellen Spertus  changed style of ScoreJobTest.bundle
| | *   2018-09-04      1  -2/+2      Ellen Spertus  minor style improvement
| | *   2018-08-28      1  -1/+4      Ellen Spertus  restored code I inadvertantly removed when merging
| | *   2018-08-28      2  -48/+62    Ellen Spertus  fixed scalastyle issues, including cyclomatic complexity
| | *   2018-08-28      4  -77/+85    Ellen Spertus  fixed tests after replacing NoSlug with None
| | *   2018-08-28      7  -37/+41    Ellen Spertus  replaced NoSlug with proper use of Option
| * |   2018-09-04      1  -14/+10    Ellen Spertus  changed style of ScoreJobTest.bundle
| * |   2018-09-04      1  -2/+2      Ellen Spertus  minor style improvement
| * |   2018-09-04      1  -1/+4      Ellen Spertus  restored code I inadvertantly removed when merging
| * |   2018-09-04      2  -48/+62    Ellen Spertus  fixed scalastyle issues, including cyclomatic complexity
| * |   2018-09-04      4  -77/+85    Ellen Spertus  fixed tests after replacing NoSlug with None
| * |   2018-09-04      7  -37/+41    Ellen Spertus  replaced NoSlug with proper use of Option
|/ /
* |     2018-09-04      1  -0/+22     Bryan Newbold  warning in python/README
* |     2018-08-31      1  -1/+1      Bryan Newbold  match crossref reducers=200
|/
*       2018-08-27      2  -1/+9      Bryan Newbold  make similarity score case-insensitive
*       2018-08-27      2  -1/+40     Bryan Newbold  basic crossref subtitle concatination support
*       2018-08-27      2  -2/+2      Bryan Newbold  more special characters to strip
*       2018-08-27      1  -0/+96     Bryan Newbold  add even more entries to slug blacklist
*       2018-08-27      1  -4/+1      Bryan Newbold  update python TODO