🔧 rework custom internals #200

Merged
merged 3 commits into from
Nov 17, 2023
2 changes: 1 addition & 1 deletion docs/extras/code_samples/custom_v1.txt
@@ -3,7 +3,7 @@ from mindee import Client, PredictResponse, product
# Init a new client
mindee_client = Client(api_key="my-api-key")

custom_endpoint = mindee_client.create_endpoint("field_test", "solution-eng-tests")
custom_endpoint = mindee_client.create_endpoint("my-endpoint", "my-account")

# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")
10 changes: 5 additions & 5 deletions docs/extras/guide/custom_v1.md
@@ -99,7 +99,7 @@ The **columns_to_line_items()** function can be called from the document and pag

It takes the following arguments:

* **anchor_names** (`List[str]`): a list of the names of possible anchor (field) candidates for the horizontal placement of a line. If all provided anchors are invalid, the `LineItemV1` won't be built.
* **anchor_names** (`List[str]`): a list of the names of possible anchor (field) candidates for the horizontal placement of a line. If all provided anchors are invalid, the `CustomLine` won't be built.
* **field_names** (`List[str]`): a list of fields to retrieve the values from
* **height_tolerance** (`float`): Optional, the height tolerance used to build the line. It helps when the height of a line can vary unexpectedly.

@@ -121,14 +121,14 @@ response.document.pages[0].prediction.columns_to_line_items(
)
```

It returns a list of [CustomLineV1](#CustomlineV1) objects.
It returns a list of [CustomLine](#CustomLine) objects.

## CustomlineV1
## CustomLine

`CustomlineV1` represents a line as it has been read from column fields. It has the following attributes:
`CustomLine` represents a line as it has been read from column fields. It has the following attributes:

* **row_number** (`int`): Number of a given line. Starts at 1.
* **fields** (`Dict[str, ListFieldValueV1]`[]): List of the fields associated with the line, indexed by their column name.
* **fields** (`Dict[str, ListFieldValue]`[]): List of the fields associated with the line, indexed by their column name.
* **bbox** (`BBox`): Simple bounding box of the current line representing the 4 minimum & maximum coordinates as `float` values.


8 changes: 4 additions & 4 deletions docs/parsing/custom.rst
@@ -4,26 +4,26 @@ Custom Fields

Classification
==============
.. autoclass:: mindee.parsing.custom.classification.ClassificationFieldV1
.. autoclass:: mindee.parsing.custom.classification.ClassificationField
:members:


Line Items
==========
.. autoclass:: mindee.parsing.custom.line_items.CustomLineV1
.. autoclass:: mindee.parsing.custom.line_items.CustomLine
:members:

Lists
=====

List Field
----------
.. autoclass:: mindee.parsing.custom.list.ListFieldV1
.. autoclass:: mindee.parsing.custom.list.ListField
:members:

List Field Value
----------------
.. autoclass:: mindee.parsing.custom.list.ListFieldValueV1
.. autoclass:: mindee.parsing.custom.list.ListFieldValue
:members:

String Dict
2 changes: 1 addition & 1 deletion mindee/error/__init__.py
@@ -1,6 +1,6 @@
from mindee.error.geometry_error import GeometryError
from mindee.error.mimetype_error import MimeTypeError
from mindee.error.mindee_error import MindeeClientError, MindeeError
from mindee.error.mindee_error import MindeeClientError, MindeeError, MindeeProductError
from mindee.error.mindee_http_error import (
MindeeHTTPClientError,
MindeeHTTPError,
4 changes: 4 additions & 0 deletions mindee/error/mindee_error.py
@@ -16,3 +16,7 @@ class MindeeApiError(MindeeError):

class MindeeSourceError(MindeeError):
"""An exception relating to document loading."""


class MindeeProductError(MindeeApiError):
"""An exception relating to the use of an incorrect product/version."""
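
Since `MindeeProductError` subclasses `MindeeApiError`, handlers written for the parent should also catch the new exception. The sketch below reproduces only the inheritance chain with stand-in classes; the base of `MindeeError` (`RuntimeError` here) and the `classify` helper are assumptions made for illustration.

```python
class MindeeError(RuntimeError):
    """Stand-in base exception (its real base class is an assumption)."""


class MindeeApiError(MindeeError):
    """Stand-in for the API-level exception."""


class MindeeProductError(MindeeApiError):
    """Stand-in for the exception added in this change."""


def classify(exc: Exception) -> str:
    """Hypothetical helper: report which handler catches `exc`.

    The most specific handler must come first, otherwise the
    MindeeApiError clause would shadow MindeeProductError.
    """
    try:
        raise exc
    except MindeeProductError:
        return "product"
    except MindeeApiError:
        return "api"
    except MindeeError:
        return "generic"
```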
2 changes: 1 addition & 1 deletion mindee/input/sources.py
@@ -45,7 +45,7 @@ class LocalInputSource:
filename: str
file_mimetype: str
input_type: InputType
filepath: Optional[str] = None
filepath: Optional[str]

def __init__(self, input_type: InputType):
self.input_type = input_type
4 changes: 3 additions & 1 deletion mindee/parsing/common/job.py
@@ -14,7 +14,7 @@ class Job:
"""ID of the job sent by the API in response to an enqueue request."""
issued_at: datetime
"""Timestamp of the request reception by the API."""
available_at: Optional[datetime] = None
available_at: Optional[datetime]
"""Timestamp of the request after it has been completed."""
status: str
"""Status of the request, as seen by the API."""
@@ -30,6 +30,8 @@ def __init__(self, json_response: dict) -> None:
self.issued_at = datetime.fromisoformat(json_response["issued_at"])
if json_response.get("available_at"):
self.available_at = datetime.fromisoformat(json_response["available_at"])
else:
self.available_at = None
self.id = json_response["id"]
self.status = json_response["status"]
if self.available_at:
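
The added `else` branch matters once the class-level `= None` default is dropped: a bare annotation does not create an attribute, so every path through `__init__` has to assign it. A minimal sketch of that behavior, using a hypothetical trimmed-down `JobSketch` rather than the real `Job` class:

```python
from datetime import datetime
from typing import Optional


class JobSketch:
    """Hypothetical, trimmed-down stand-in for Job."""

    # Annotation only: no class attribute named `available_at` is created.
    available_at: Optional[datetime]

    def __init__(self, json_response: dict) -> None:
        if json_response.get("available_at"):
            self.available_at = datetime.fromisoformat(json_response["available_at"])
        else:
            # Without this branch, reading self.available_at later
            # would raise AttributeError.
            self.available_at = None


pending = JobSketch({"available_at": None})
done = JobSketch({"available_at": "2023-11-17T10:00:00"})
```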
6 changes: 3 additions & 3 deletions mindee/parsing/custom/__init__.py
@@ -1,3 +1,3 @@
from mindee.parsing.custom.classification import ClassificationFieldV1
from mindee.parsing.custom.line_items import CustomLineV1, get_line_items
from mindee.parsing.custom.list import ListFieldV1, ListFieldValueV1
from mindee.parsing.custom.classification import ClassificationField
from mindee.parsing.custom.line_items import CustomLine, get_line_items
from mindee.parsing.custom.list import ListField, ListFieldValue
2 changes: 1 addition & 1 deletion mindee/parsing/custom/classification.py
@@ -1,7 +1,7 @@
from mindee.parsing.common.string_dict import StringDict


class ClassificationFieldV1:
class ClassificationField:
"""A classification field."""

value: str
40 changes: 19 additions & 21 deletions mindee/parsing/custom/line_items.py
@@ -4,10 +4,10 @@
from mindee.geometry.bbox import BBox, extend_bbox, get_bbox
from mindee.geometry.minmax import MinMax, get_min_max_y
from mindee.geometry.quadrilateral import get_bounding_box
from mindee.parsing.custom.list import ListFieldV1, ListFieldValueV1
from mindee.parsing.custom.list import ListField, ListFieldValue


def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListFieldV1]) -> str:
def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListField]) -> str:
"""
Find the anchor with the most rows, in the order specified by `anchors`.

@@ -23,12 +23,12 @@ def _find_best_anchor(anchors: Sequence[str], fields: Dict[str, ListFieldV1]) ->
return anchor


class CustomLineV1:
class CustomLine:
"""Represent a single line."""

row_number: int
"""Index of the row of a given line."""
fields: Dict[str, ListFieldValueV1]
fields: Dict[str, ListFieldValue]
"""Fields contained in the line."""
bbox: BBox
"""Simplified bounding box of the line."""
@@ -38,7 +38,7 @@ def __init__(self, row_number: int):
self.bbox = BBox(1, 1, 0, 0)
self.fields = {}

def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
def update_field(self, field_name: str, field_value: ListFieldValue) -> None:
"""
Updates a field value if it exists.

@@ -61,7 +61,7 @@ def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
merged_confidence = field_value.confidence
merged_polygon = get_bounding_box(field_value.polygon)

self.fields[field_name] = ListFieldValueV1(
self.fields[field_name] = ListFieldValue(
{
"content": merged_content,
"confidence": merged_confidence,
@@ -70,9 +70,7 @@ def update_field(self, field_name: str, field_value: ListFieldValueV1) -> None:
)


def is_box_in_line(
line: CustomLineV1, bbox: BBox, height_line_tolerance: float
) -> bool:
def is_box_in_line(line: CustomLine, bbox: BBox, height_line_tolerance: float) -> bool:
"""
Checks if the bbox fits inside the line.

@@ -86,25 +84,25 @@ def is_box_in_line(


def prepare(
anchor_name: str, fields: Dict[str, ListFieldV1], height_line_tolerance: float
) -> List[CustomLineV1]:
anchor_name: str, fields: Dict[str, ListField], height_line_tolerance: float
) -> List[CustomLine]:
"""
Prepares lines before filling them.

:param anchor_name: name of the anchor.
:param fields: fields to build lines from.
:param height_line_tolerance: line height tolerance for custom line reconstruction.
"""
lines_prepared: List[CustomLineV1] = []
lines_prepared: List[CustomLine] = []
try:
anchor_field: ListFieldV1 = fields[anchor_name]
anchor_field: ListField = fields[anchor_name]
except KeyError as exc:
raise MindeeError("No lines have been detected.") from exc

current_line_number: int = 1
current_line = CustomLineV1(current_line_number)
current_line = CustomLine(current_line_number)
if anchor_field and len(anchor_field.values) > 0:
current_value: ListFieldValueV1 = anchor_field.values[0]
current_value: ListFieldValue = anchor_field.values[0]
current_line.bbox = extend_bbox(
current_line.bbox,
current_value.polygon,
@@ -118,7 +116,7 @@ def prepare(
):
lines_prepared.append(current_line)
current_line_number += 1
current_line = CustomLineV1(current_line_number)
current_line = CustomLine(current_line_number)
current_line.bbox = extend_bbox(
current_line.bbox,
current_value.polygon,
@@ -140,26 +138,26 @@ def prepare(
def get_line_items(
anchors: Sequence[str],
field_names: Sequence[str],
fields: Dict[str, ListFieldV1],
fields: Dict[str, ListField],
height_line_tolerance: float = 0.01,
) -> List[CustomLineV1]:
) -> List[CustomLine]:
"""
Reconstruct line items from fields.

:anchors: Possible fields to use as an anchor
:columns: All fields which are columns
:fields: List of field names to reconstruct table with
"""
line_items: List[CustomLineV1] = []
fields_to_transform: Dict[str, ListFieldV1] = {}
line_items: List[CustomLine] = []
fields_to_transform: Dict[str, ListField] = {}
for field_name, field_value in fields.items():
if field_name in field_names:
fields_to_transform[field_name] = field_value
anchor = _find_best_anchor(anchors, fields_to_transform)
if not anchor:
print(Warning("Could not find an anchor!"))
return line_items
lines_prepared: List[CustomLineV1] = prepare(
lines_prepared: List[CustomLine] = prepare(
anchor, fields_to_transform, height_line_tolerance
)
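
The overall flow of `get_line_items` (pick an anchor column, then group values into lines using a height tolerance) can be approximated as below. This is a simplified sketch and not the library's implementation: values are plain `(text, y_min, y_max)` tuples instead of polygons, and `pick_anchor` and `group_rows` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Each value is (text, y_min, y_max) in relative page coordinates (0.0 to 1.0).
Value = Tuple[str, float, float]


def pick_anchor(anchors: Sequence[str], columns: Dict[str, List[Value]]) -> Optional[str]:
    """Return the candidate with the most values; earlier candidates win ties."""
    best: Optional[str] = None
    best_count = 0
    for name in anchors:
        count = len(columns.get(name, []))
        if count > best_count:
            best, best_count = name, count
    return best


def group_rows(values: List[Value], tolerance: float = 0.01) -> List[List[str]]:
    """Group values into rows by vertical position.

    A value joins the current row when its vertical span fits inside the
    row's span widened by `tolerance`; otherwise it starts a new row.
    """
    rows: List[List[str]] = []
    row_min = row_max = 0.0
    for text, y_min, y_max in sorted(values, key=lambda v: v[1]):
        if rows and y_min >= row_min - tolerance and y_max <= row_max + tolerance:
            rows[-1].append(text)
            row_min, row_max = min(row_min, y_min), max(row_max, y_max)
        else:
            rows.append([text])
            row_min, row_max = y_min, y_max
    return rows


columns = {
    "description": [("Widget", 0.10, 0.12), ("Gadget", 0.20, 0.22)],
    "total": [("10.00", 0.10, 0.12)],
}
anchor = pick_anchor(["total", "description"], columns)
rows = group_rows(columns["description"] + columns["total"])
```

Here `description` is chosen as the anchor because it has more values than `total`, and the two values sharing the same vertical span end up on the same row.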

22 changes: 9 additions & 13 deletions mindee/parsing/custom/list.py
@@ -4,33 +4,35 @@
from mindee.parsing.standard.base import FieldPositionMixin


class ListFieldValueV1(FieldPositionMixin):
class ListFieldValue(FieldPositionMixin):
"""A single value or word."""

content: str
"""The content text"""
confidence: float
"""Confidence score"""
page_id: Optional[int]

def __init__(self, raw_prediction: StringDict) -> None:
def __init__(
self, raw_prediction: StringDict, page_id: Optional[int] = None
) -> None:
self.content = raw_prediction["content"]
self.confidence = raw_prediction["confidence"]
self.page_id = page_id
self._set_position(raw_prediction)

def __str__(self) -> str:
return self.content


class ListFieldV1:
class ListField:
"""A list of values or words."""

confidence: float
"""Confidence score"""
reconstructed: bool
"""Whether the field was reconstructed from other fields."""
page_id: Optional[int]
"""The document page on which the information was found."""
values: List[ListFieldValueV1]
values: List[ListFieldValue]
"""List of word values"""

def __init__(
@@ -43,15 +45,9 @@ def __init__(
self.reconstructed = reconstructed

for value in raw_prediction["values"]:
self.values.append(ListFieldValueV1(value))
if "page_id" in value:
page_id = value["page_id"]

if page_id is None:
if "page_id" in raw_prediction:
self.page_id = raw_prediction["page_id"]
else:
self.page_id = page_id
self.values.append(ListFieldValue(value, page_id))
self.confidence = raw_prediction["confidence"]

@property
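
The effect of passing `page_id` down to each value can be pictured with the stand-ins below. `ValueSketch` and `ListSketch` are hypothetical and model only the constructor arguments visible in this diff; any per-value `page_id` present in the raw prediction is ignored here for brevity.

```python
from typing import Any, Dict, List, Optional


class ValueSketch:
    """Hypothetical stand-in for ListFieldValue."""

    def __init__(self, raw: Dict[str, Any], page_id: Optional[int] = None) -> None:
        self.content = raw["content"]
        self.page_id = page_id


class ListSketch:
    """Hypothetical stand-in for ListField."""

    def __init__(self, raw: Dict[str, Any], page_id: Optional[int] = None) -> None:
        # Every value receives the page ID known when the list is built.
        self.values: List[ValueSketch] = [
            ValueSketch(value, page_id) for value in raw["values"]
        ]


page_field = ListSketch({"values": [{"content": "a"}, {"content": "b"}]}, page_id=0)
doc_field = ListSketch({"values": [{"content": "c"}]})
```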
14 changes: 7 additions & 7 deletions mindee/product/custom/custom_v1_document.py
@@ -1,16 +1,16 @@
from typing import Dict, List

from mindee.parsing.common import Prediction, StringDict, clean_out_string
from mindee.parsing.custom import ClassificationFieldV1, ListFieldV1
from mindee.parsing.custom.line_items import CustomLineV1, get_line_items
from mindee.parsing.custom import ClassificationField, ListField
from mindee.parsing.custom.line_items import CustomLine, get_line_items


class CustomV1Document(Prediction):
"""Custom V1 document prediction results."""

fields: Dict[str, ListFieldV1]
fields: Dict[str, ListField]
"""Dictionary of all fields in the document"""
classifications: Dict[str, ClassificationFieldV1]
classifications: Dict[str, ClassificationField]
"""Dictionary of all classifications in the document"""

def __init__(self, raw_prediction: StringDict) -> None:
@@ -23,17 +23,17 @@ def __init__(self, raw_prediction: StringDict) -> None:
self.classifications = {}
for field_name, field_contents in raw_prediction.items():
if "value" in field_contents:
self.classifications[field_name] = ClassificationFieldV1(field_contents)
self.classifications[field_name] = ClassificationField(field_contents)
# Only value lists have the 'values' attribute.
elif "values" in field_contents:
self.fields[field_name] = ListFieldV1(field_contents)
self.fields[field_name] = ListField(field_contents)

def columns_to_line_items(
self,
anchor_names: List[str],
field_names: List[str],
height_tolerance: float = 0.01,
) -> List[CustomLineV1]:
) -> List[CustomLine]:
"""
Order column fields into line items.
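
The constructor in this file routes each raw entry by key: entries with `"value"` become classifications, entries with `"values"` become list fields. A minimal sketch of that dispatch on plain dictionaries, where `split_prediction` is a hypothetical helper and the field names are made up for the example:

```python
from typing import Any, Dict, List, Tuple


def split_prediction(
    raw_prediction: Dict[str, Dict[str, Any]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Separate classification entries from list-field entries."""
    classifications: Dict[str, str] = {}
    fields: Dict[str, List[str]] = {}
    for name, contents in raw_prediction.items():
        if "value" in contents:
            classifications[name] = contents["value"]
        # Only value lists have the 'values' key.
        elif "values" in contents:
            fields[name] = [value["content"] for value in contents["values"]]
    return classifications, fields


classifications, fields = split_prediction(
    {
        "doc_type": {"value": "invoice"},
        "total": {"values": [{"content": "10.00"}]},
        "unknown": {},
    }
)
```

Entries carrying neither key, such as `unknown` above, are silently skipped by this dispatch.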

10 changes: 5 additions & 5 deletions mindee/product/custom/custom_v1_page.py
@@ -1,14 +1,14 @@
from typing import Dict, List, Optional

from mindee.parsing.common import Prediction, StringDict, clean_out_string
from mindee.parsing.custom import ListFieldV1
from mindee.parsing.custom.line_items import CustomLineV1, get_line_items
from mindee.parsing.custom import ListField
from mindee.parsing.custom.line_items import CustomLine, get_line_items


class CustomV1Page(Prediction):
"""Custom V1 page prediction results."""

fields: Dict[str, ListFieldV1]
fields: Dict[str, ListField]
"""Dictionary of all fields in the document"""

def __init__(self, raw_prediction: StringDict, page_id: Optional[int]) -> None:
@@ -19,14 +19,14 @@ def __init__(self, raw_prediction: StringDict, page_id: Optional[int]) -> None:
"""
self.fields = {}
for field_name, field_contents in raw_prediction.items():
self.fields[field_name] = ListFieldV1(field_contents, page_id=page_id)
self.fields[field_name] = ListField(field_contents, page_id=page_id)

def columns_to_line_items(
self,
anchor_names: List[str],
field_names: List[str],
height_tolerance: float = 0.01,
) -> List[CustomLineV1]:
) -> List[CustomLine]:
"""
Order column fields into line items.
