re-sort README a bit

author: Bryan Newbold <bnewbold@archive.org> 2021-10-22 18:45:52 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2021-10-22 18:45:52 -0700
commit: 1c4d9e2595f4bdd1ebbd00f9d908772757fd0663 (patch)
tree: 47579ad2089080a0958f86512c77ad043283a1f5 /README.md
parent: 1f7431e4d1430b215b1bcad7af7d432c35dd129f (diff)
download: grobid_tei_xml-1c4d9e2595f4bdd1ebbd00f9d908772757fd0663.tar.gz
grobid_tei_xml-1c4d9e2595f4bdd1ebbd00f9d908772757fd0663.zip
1 files changed, 19 insertions, 5 deletions
diff --git a/README.md b/README.md
index dafc2ac..3ff0654 100644
--- a/README.md
+++ b/README.md
@@ -6,22 +6,31 @@ This is a simple python library for parsing the TEI-XML structured documents
 returned by [GROBID](https://github.com/kermitt2/grobid), a machine learning
 tool for extracting text and bibliographic metadata from research article PDFs.
 
-TEI-XML is a standard format, and there are other libraries to parse entire
+TEI-XML is a standard format, and there exist other libraries to parse entire
 documents and work with annotated text. This library is focused specifically on
 extracting "header" metadata from document (eg, title, authors, journal name,
 volume, issue), content in flattened text form (full abstract and body text as
 single strings, for things like search indexing), and structured citation
 metadata.
 
+
+## Quickstart
+
 `grobid_tei_xml` works with Python 3, using only the standard library. It does
 not talk to the GROBID HTTP API or read files off disk on its own, but see
-examples below.
+examples below. The library is packaged on [pypi.org](https://pypi.org).
+
+Install using `pip`, usually within a `virtualenv`:
 
-In the near future, it should be possible to install `grobid_tei_xml` from
-[pypi.org](https://pypi.org) using `pip`.
+    pip install grobid_tei_xml
 
+The main entry points are the function `process_document_xml(xml_text)` and
+`process_citations_xml(xml_text)`, which return python dataclass objects. The
+helper method `.to_dict()` can be useful for, eg, serializing these objects to
+JSON.
 
-## Use Examples
+
+## Usage Examples
 
 Read an XML file from disk, parse it, and print to stdout as JSON:
 
@@ -101,6 +110,11 @@ python object or, eg, JSON.
 
 [GROBID Documentation](https://grobid.readthedocs.io/en/latest/)
 
+[s2orc-doc2json](https://github.com/allenai/s2orc-doc2json): Python library
+from AI2 which includes a similar Python library for extracting both
+bibliographic metadata and (structured) full text from GROBID TEI-XML. Has nice
+features like resolving references to bibliography entry.
+
 [delb](https://github.com/funkyfuture/delb): more flexible/powerful interface
 to TEI-XML documents. would be a better tool for working with structured text
 (body, abstract, etc)
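
The README text added in this commit says the entry points return Python dataclass objects, and that a `.to_dict()` helper is useful for serializing them to JSON. The following is a minimal sketch of that pattern using only the standard library. `ExampleDocument` and its fields are hypothetical stand-ins for illustration, not the library's actual classes, and the real `.to_dict()` may behave differently (for example, in how it treats empty fields).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExampleDocument:
    """Hypothetical stand-in for a parsed-document dataclass (not the library's real class)."""

    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    journal: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a plain recursive conversion of the dataclass to a dict.
        return asdict(self)


# Build an object by hand; with the library this would instead come from
# parsing a TEI-XML string.
doc = ExampleDocument(title="An Example Article", authors=["A. Author", "B. Writer"])

# Dataclass instances are not directly JSON-serializable, so convert first.
as_json = json.dumps(doc.to_dict(), sort_keys=True)
print(as_json)
```

Going through a plain dict keeps the serialization step independent of the dataclass definitions, so the same output can be handed to `json`, a search indexer, or any other consumer that expects basic Python types.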