-rw-r--r-- | README.md | 19
-rw-r--r-- | notes/Abbreviations.md (renamed from notes/abbrev.md) | 0
-rw-r--r-- | notes/Todo.md (renamed from notes/plan.md) | 2
-rw-r--r-- | projects/grobid_refs/.gitignore | 2
-rw-r--r-- | projects/grobid_refs/README.md | 5
5 files changed, 26 insertions, 2 deletions
@@ -42,3 +42,22 @@ user 29177m5.516s
 sys 4927m3.277s
 ```
 
+## Data issues
+
+### A republised article
+
+There is "student BMJ" and "BMJ" - this (html) article (interview) has been
+first published on "sbmj" (Published 07 July 2011), then "bmj" (Published 10
+August 2011).
+
+> Notes; Originally published as: Student BMJ 2011;19:d3983
+
+* https://www.bmj.com/content/343/sbmj.d3983
+* https://www.bmj.com/content/343/bmj.d4964
+
+It is essentially the same text, same title, author, just different DOI and
+probably a different recorded date.
+
+Generic pattern "republication" duplicate:
+
+* metadata mostly same, except date and doi
diff --git a/notes/abbrev.md b/notes/Abbreviations.md
index 5106d5b..5106d5b 100644
--- a/notes/abbrev.md
+++ b/notes/Abbreviations.md
diff --git a/notes/plan.md b/notes/Todo.md
index 94c1297..2c548b0 100644
--- a/notes/plan.md
+++ b/notes/Todo.md
@@ -1,4 +1,4 @@
-# Plan
+# Todo
 
 ## Releases
 
diff --git a/projects/grobid_refs/.gitignore b/projects/grobid_refs/.gitignore
new file mode 100644
index 0000000..bd98a73
--- /dev/null
+++ b/projects/grobid_refs/.gitignore
@@ -0,0 +1,2 @@
+*.pdf
+
diff --git a/projects/grobid_refs/README.md b/projects/grobid_refs/README.md
index 13ca3fc..15eaae0 100644
--- a/projects/grobid_refs/README.md
+++ b/projects/grobid_refs/README.md
@@ -2,5 +2,8 @@
 
 References extracted from [grobid](https://grobid.readthedocs.io).
 
-Example grobid output: [grobid.tei.xml](grobid.tei.xml).
+Example grobid outputs:
+
+* [grobid.tei.xml](grobid.tei.xml), [pdf](http://dss.in.tum.de/files/brandt-research/me.pdf) -- here grobid does not extract many refs; GS looks ok
+* [](), [pdf](https://ia803202.us.archive.org/21/items/jstor-1064270/1064270.pdf)
 
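
The "republication" duplicate pattern noted in the README.md hunk (metadata mostly the same, except date and DOI) could be checked roughly as follows. This is only a sketch, not code from this repository: the record layout (`title`, `authors`, `doi`, `date`), the function names, and the title, author, and DOI values are hypothetical placeholders. Only the two publication dates are taken from the note.

```python
import re


def normalize(text):
    """Lowercase and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_republication_candidate(a, b):
    """Return True if two records share title and authors but differ in DOI.

    The date is deliberately not compared, since a republished article is
    expected to carry a different recorded date.
    """
    same_title = normalize(a["title"]) == normalize(b["title"])
    same_authors = (
        sorted(normalize(name) for name in a["authors"])
        == sorted(normalize(name) for name in b["authors"])
    )
    different_doi = a["doi"].lower() != b["doi"].lower()
    return same_title and same_authors and different_doi


# Hypothetical records: title, authors and DOIs are placeholders,
# only the dates come from the note in the README.
sbmj = {
    "title": "Example interview title",
    "authors": ["A. Author"],
    "doi": "10.0000/placeholder.sbmj",
    "date": "2011-07-07",
}
bmj = {
    "title": "Example Interview Title",
    "authors": ["A. Author"],
    "doi": "10.0000/placeholder.bmj",
    "date": "2011-08-10",
}

print(is_republication_candidate(sbmj, bmj))
```

Comparing normalized strings for exact equality is a strict choice. Since the note says "mostly same", a fuzzy title comparison may be a better fit in practice.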