OCR Implementation - Guidance #44
-
I was wondering if anyone could give me insight and tips on how I could implement OCR using the SDK. For more context, I am trying to read the word located above the user's finger, wherever they point on the document. I have already made an algorithm that crops out the word using OpenCV, with colour thresholding and some histogram analysis. I thought of building a pipeline that takes the cropped image and feeds it to the NN, but again I'm not sure it's possible. If anyone has ideas I would love to hear them, thanks!
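For anyone curious what the cropping step might look like: here is a minimal numpy-only sketch of thresholding dark pixels and taking their tight bounding box. A real pipeline would use OpenCV (`cv2.threshold`, `cv2.boundingRect`); the threshold value and the toy image below are placeholders, not the poster's actual algorithm.

```python
import numpy as np

def crop_word(gray, thresh=128):
    """Crop the tight bounding box of dark pixels (the word)
    from a grayscale image. Sketch only: assumes dark ink on a
    light background and a hand-picked threshold."""
    mask = gray < thresh              # dark pixels = ink
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                   # nothing below the threshold
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return gray[y0:y1, x0:x1]

# toy image: white background with a dark 2x3 "word"
img = np.full((10, 10), 255, dtype=np.uint8)
img[4:6, 3:6] = 0
print(crop_word(img).shape)  # (2, 3)
```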
Replies: 1 comment 3 replies
-
Hi @Excustic , I think you should train your own NN model for the OCR. If you want to train your own yolov8n and deploy it on the WE2, here is the tutorial. As for the crop function, you can use hx_lib_image_copy_helium for the Helium version.
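Before feeding a crop to a quantised tflite-micro model, it typically has to be resized to the model's input shape and converted to int8. A hedged sketch of that preprocessing, numpy-only; the 96x96 input size and the simple uint8-to-int8 shift are placeholder assumptions, so read the real scale/zero_point from your trained model's input tensor.

```python
import numpy as np

def preprocess(crop, size=(96, 96)):
    """Nearest-neighbour resize a grayscale crop and quantise it to
    int8 for a tflite-micro model input. Sketch only: assumes the
    common uint8 -> int8 mapping (subtract 128); check your model's
    actual input quantisation parameters."""
    h, w = crop.shape
    ys = np.arange(size[0]) * h // size[0]    # row indices to sample
    xs = np.arange(size[1]) * w // size[1]    # column indices to sample
    resized = crop[ys[:, None], xs]           # (size[0], size[1])
    return (resized.astype(np.int16) - 128).astype(np.int8)

crop = np.full((20, 60), 200, dtype=np.uint8)  # dummy word crop
x = preprocess(crop)
print(x.shape, x.dtype)  # (96, 96) int8
```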
Hi @Excustic ,
You can reference https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/docs/memory_management.md to see how the tensor arena memory is managed: the tail part of the tensor arena cannot be shared, while the other parts of the arena can be reused across different models. You can also modify OV5647_SUPPORT_BINNING here to disable the sub-sampling and binning, and set the crop position where you want. But be careful that each cropped area coul…
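The arena-sharing point above can be illustrated with a small back-of-the-envelope sketch: tail (persistent) allocations accumulate per model, while head (scratch) space is reused, so only the largest head requirement counts when sizing one shared arena. The byte figures and model names below are made up for illustration, not measurements from the WE2.

```python
# Conceptual sketch, NOT the tflite-micro API: sizing one shared
# tensor arena for two models with hypothetical memory plans.
ARENA_SIZE = 1024

# (persistent_tail_bytes, reusable_head_bytes) per model -- invented numbers
plans = {"detector": (200, 500), "ocr": (150, 600)}

# Tail allocations persist for each model's lifetime and cannot be
# shared, so they add up.
tail_used = sum(tail for tail, _ in plans.values())

# Head (scratch) space is reused between models, so only the largest
# requirement matters.
head_needed = max(head for _, head in plans.values())

fits = tail_used + head_needed <= ARENA_SIZE
print(tail_used, head_needed, fits)  # 350 600 True
```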