Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API reference to documentation site #174

Open
mocobeta opened this issue Dec 16, 2023 · 11 comments
Open

Add API reference to documentation site #174

mocobeta opened this issue Dec 16, 2023 · 11 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@mocobeta
Copy link
Contributor

mocobeta commented Dec 16, 2023

Now we have .pyi file (#167), we can generate API docs by pdoc.

For example, the following commands generate static HTML files in apidoc dir if you have tantivy module in your python path.

pip install pdoc
pdoc tantivy -o ./apidocs

Generated HTML looks very neat because PyO3 exposes comments in .rs files as docstrings of tantivy module, then pdoc collects all of them throughout the stubs and type hints!

tantivy_apidoc

Perhaps we can publish the API doc to https://tantivy-py.readthedocs.io/en/latest/.
I'm not familiar with the CI pipeline to publish the docs site. Is there any suggestions about that?

@cjrh
Copy link
Collaborator

cjrh commented Dec 28, 2023

The current workflow is I log into readthedocs on my own personal account and click "sync". I'm not happy with this, but I don't currently have enough permissions on this github repo to get the webhook working. Once the webhook is setup, docs deployment will be automatic.

@fmassot
Copy link

fmassot commented Dec 28, 2023

@cjrh do you know which permission you need?

@cjrh
Copy link
Collaborator

cjrh commented Dec 30, 2023

@fmassot I think either I need "write" access, or I can add you or another admin to the readthedocs site and you can generate the integration there. Long term that is probably best anyway, so that I am not the only one with access to the readthedocs site.

This is the page where the integration can be performed:

image

When I click "resync" it fails:

image

If you create a login on readthedocs (make sure to use your github OAuth login), I can add your account to the tantivy-py project on readthedocs, and then you will be able to successfully click that Resync webhook button, which should work because it'll use the permissions on your github OAuth token. Then I won't need any permissions change on tantivy-py.

Just let me know your login name on readthedocs and I'll add it to the project.

@cjrh
Copy link
Collaborator

cjrh commented Dec 30, 2023

@mocobeta This example at mkdocs shows integration of pdoc into a mkdocs build: https://github.com/mitmproxy/pdoc/tree/main/examples/mkdocs/

There is a comment that says:

mkdocstrings is a great alternative to pdoc if you are in the mkdocs ecosystem anyways.

Do you have personal experience that supports sticking with pdoc over mkdocstrings?

@cjrh
Copy link
Collaborator

cjrh commented Dec 31, 2023

I tried a few things, and it seems that mkdocstrings doesn't do a good job of pulling out the docstrings from the rust source. So I guess that means pdoc is better.

@mocobeta
Copy link
Contributor Author

I didn't know mkdocstrings, but it seems to parse the Python code with Griffle parser (https://mkdocstrings.github.io/griffe/docstrings/) and try to directly extract the docstring from the source instead of pulling it from __doc__. So unfortunatelly this wouldn't be a help for our purposes.

@pawamoy
Copy link

pawamoy commented Jan 12, 2024

Chiming in: it's actually possible to use dynamic analysis with Griffe, to use __doc__ and the like instead of reading sources. It's actually done by default for compiled modules which do not have Python sources 🙂 The edges might still be a bit rough around these use-cases, happy to get features requests / feedback 😄

@cjrh
Copy link
Collaborator

cjrh commented Jan 13, 2024

@pawamoy I guess the mkdocstrings is not using that feature then. It would likely be a high value improvement if you could help them to make it work with dynamic docstrings :)

@pawamoy
Copy link

pawamoy commented Jan 14, 2024

I've ran some experiments and identified a couple issues in Griffe that will be fixed in the next version. This will allow you to collect API docs from your Python module compiled from Rust. However there's another issue that I cannot fix on my side: classes like DocAddress and others say their module is tantivy, which is incorrect, as they are actually imported from tantivy.tantivy. I'm not familiar with Pyo3 but the following seems to work:

#[pyclass(frozen, module = "tantivy.tantivy")]

In other words: your compiled module ends up as tantivy/tantivy.cpython-blabla.so, so it is important that classes defined within it report their parent module as tantivy.tantivy, not just tantivy, even if in the end they are publicly exposed from there. Their __module__ attribute must be the canonical location, not the public one.

Also some other classes (Rust structs IIUC) don't even declare their parent module, like SchemaBuilder or Query. They must declare it for Griffe to correctly understand them.

My local diff
diff --git a/docs/reference.md b/docs/reference.md
index 8ca4294..d372b22 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -36,3 +36,16 @@ best_doc = searcher.doc(best_doc_address)
 Note: for integer search, the integer field should be indexed.
 
 For more possible query formats and possible query options, see [Tantivy Query Parser Docs.](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html)
+
+::: tantivy.Schema
+::: tantivy.SchemaBuilder
+::: tantivy.Facet
+::: tantivy.Document
+::: tantivy.Query
+::: tantivy.Order
+::: tantivy.DocAddress
+::: tantivy.SearchResult
+::: tantivy.Searcher
+::: tantivy.Index
+::: tantivy.Snippet
+::: tantivy.SnippetGenerator
diff --git a/mkdocs.yml b/mkdocs.yml
index 2640f63..5427b17 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -13,3 +13,24 @@ theme: readthedocs
 # - 'User Guide':
 #     - 'Writing your docs': 'writing-your-docs.md'
 #     - 'Styling your docs': 'styling-your-docs.md'
+
+plugins:
+- search
+- mkdocstrings:
+    handlers:
+      python:
+        options:
+          docstring_options:
+            ignore_init_summary: true
+          docstring_section_style: list
+          heading_level: 2
+          inherited_members: true
+          merge_init_into_class: true
+          separate_signature: true
+          show_root_heading: true
+          show_root_full_path: false
+          show_source: false
+          show_signature_annotations: true
+          signature_crossrefs: true
+          summary: true
+          filters: ["!^_"]
diff --git a/src/document.rs b/src/document.rs
index 8768463..b09e754 100644
--- a/src/document.rs
+++ b/src/document.rs
@@ -463,7 +463,7 @@ impl<'a> From<&'a Value> for BorrowedSerdeValue<'a> {
 ///     ...     {"unsigned": 1000, "signed": -5, "float": 0.4},
 ///     ...     schema,
 ///     ... )
-#[pyclass(module = "tantivy")]
+#[pyclass(module = "tantivy.tantivy")]
 #[derive(Clone, Default, PartialEq)]
 pub(crate) struct Document {
     pub(crate) field_values: BTreeMap<String, Vec<Value>>,
diff --git a/src/facet.rs b/src/facet.rs
index 2983fe2..74fb598 100644
--- a/src/facet.rs
+++ b/src/facet.rs
@@ -16,7 +16,7 @@ use tantivy::schema;
 /// implicitely imply that a document belonging to a facet also belongs to the
 /// ancestor of its facet. In the example above, /electronics/tv_and_video/
 /// and /electronics.
-#[pyclass(frozen, module = "tantivy")]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 #[derive(Clone, Deserialize, PartialEq, Serialize)]
 pub(crate) struct Facet {
     pub(crate) inner: schema::Facet,
diff --git a/src/index.rs b/src/index.rs
index 3eef34f..4636d24 100644
--- a/src/index.rs
+++ b/src/index.rs
@@ -27,7 +27,7 @@ const RELOAD_POLICY: &str = "commit";
 ///
 /// To create an IndexWriter first create an Index and call the writer() method
 /// on the index object.
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct IndexWriter {
     inner_index_writer: Option<tv::IndexWriter>,
     schema: tv::schema::Schema,
@@ -200,7 +200,7 @@ impl IndexWriter {
 ///
 /// If an index already exists it will be opened and reused. Raises OSError
 /// if there was a problem during the opening or creation of the index.
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct Index {
     pub(crate) index: tv::Index,
     reader: tv::IndexReader,
diff --git a/src/parser_error.rs b/src/parser_error.rs
index 724d17a..6421215 100644
--- a/src/parser_error.rs
+++ b/src/parser_error.rs
@@ -799,7 +799,7 @@ impl TryFrom<tv::query::QueryParserError> for UnknownTokenizerError {
 
 /// The query contains a range query with a phrase as one of the bounds. Only terms can be used as
 /// bounds.
-#[pyclass(frozen)]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 pub(crate) struct RangeMustNotHavePhraseError;
 
 #[pymethods]
diff --git a/src/query.rs b/src/query.rs
index 53da089..962d30d 100644
--- a/src/query.rs
+++ b/src/query.rs
@@ -3,7 +3,7 @@ use pyo3::{exceptions, prelude::*, types::PyAny};
 use tantivy as tv;
 
 /// Tantivy's Query
-#[pyclass(frozen)]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 pub(crate) struct Query {
     pub(crate) inner: Box<dyn tv::query::Query>,
 }
diff --git a/src/schema.rs b/src/schema.rs
index ba0c740..122a971 100644
--- a/src/schema.rs
+++ b/src/schema.rs
@@ -7,7 +7,7 @@ use tantivy as tv;
 ///
 /// The schema is very strict. To build the schema the `SchemaBuilder` class is
 /// provided.
-#[pyclass(frozen, module = "tantivy")]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 #[derive(Deserialize, PartialEq, Serialize)]
 pub(crate) struct Schema {
     pub(crate) inner: tv::schema::Schema,
diff --git a/src/schemabuilder.rs b/src/schemabuilder.rs
index c59e539..060c394 100644
--- a/src/schemabuilder.rs
+++ b/src/schemabuilder.rs
@@ -23,7 +23,7 @@ use tantivy::schema::{
 ///     >>> body = builder.add_text_field("body")
 ///
 ///     >>> schema = builder.build()
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 #[derive(Clone)]
 pub(crate) struct SchemaBuilder {
     pub(crate) builder: Arc<RwLock<Option<schema::SchemaBuilder>>>,
diff --git a/src/searcher.rs b/src/searcher.rs
index 932e5c9..c202bfd 100644
--- a/src/searcher.rs
+++ b/src/searcher.rs
@@ -9,7 +9,7 @@ use tantivy::collector::{Count, MultiCollector, TopDocs};
 /// Tantivy's Searcher class
 ///
 /// A Searcher is used to search the index given a prepared Query.
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct Searcher {
     pub(crate) inner: tv::Searcher,
 }
@@ -40,7 +40,7 @@ impl ToPyObject for Fruit {
     }
 }
 
-#[pyclass(frozen, module = "tantivy")]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 #[derive(Clone, Copy, Deserialize, PartialEq, Serialize)]
 /// Enum representing the direction in which something should be sorted.
 pub(crate) enum Order {
@@ -60,7 +60,7 @@ impl From<Order> for tv::Order {
     }
 }
 
-#[pyclass(frozen, module = "tantivy")]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 #[derive(Clone, Default, Deserialize, PartialEq, Serialize)]
 /// Object holding a results successful search.
 pub(crate) struct SearchResult {
@@ -269,7 +269,7 @@ impl Searcher {
 /// It consists in an id identifying its segment, and its segment-local DocId.
 /// The id used for the segment is actually an ordinal in the list of segment
 /// hold by a Searcher.
-#[pyclass(frozen, module = "tantivy")]
+#[pyclass(frozen, module = "tantivy.tantivy")]
 #[derive(Clone, Debug, Deserialize, PartialEq, Serialize)]
 pub(crate) struct DocAddress {
     pub(crate) segment_ord: tv::SegmentOrdinal,
diff --git a/src/snippet.rs b/src/snippet.rs
index e74b3ce..e5decc1 100644
--- a/src/snippet.rs
+++ b/src/snippet.rs
@@ -6,12 +6,12 @@ use tantivy as tv;
 ///
 /// The schema is very strict. To build the schema the `SchemaBuilder` class is
 /// provided.
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct Snippet {
     pub(crate) inner: tv::Snippet,
 }
 
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct Range {
     #[pyo3(get)]
     start: usize,
@@ -38,7 +38,7 @@ impl Snippet {
     }
 }
 
-#[pyclass]
+#[pyclass(module = "tantivy.tantivy")]
 pub(crate) struct SnippetGenerator {
     pub(crate) field_name: String,
     pub(crate) inner: tv::SnippetGenerator,

Then you can get something like this generated by mkdocstings:

tantivy

Note that the module has to be built and installed to build the docs, since Griffe actually imports it.

Feel free to mark as off-topic and move my comment in another issue 🙂

@cjrh
Copy link
Collaborator

cjrh commented Jan 15, 2024

That's good feedback, thank you for looking into this 😄 . This is the correct issue IMO. I'll try to find some time to fix up the module references next weekend.

@cjrh cjrh added the documentation Improvements or additions to documentation label Jan 20, 2024
@cjrh
Copy link
Collaborator

cjrh commented Jan 20, 2024

I added a PR here to fix only the module declarations: #190. After that is merged I will return to make a second PR using your example showing the mkdocstrings implementation.

@cjrh cjrh self-assigned this Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

4 participants