
Problems accelerating the inference stage #23

@haopo2005


Hi,
I'd like to run inference on an embedded device. Because computing resources there are limited, I have some questions:

I've tried to convert the floating-point computation into integer computation. That is, after parsing the cascade binary file, I convert the `luts` and `thresholds` arrays to integers:

```c
int i,j;
FILE* file;
file = fopen("jst_headcascade", "rb");
if(!file)
	return 0;

fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

// the arrays are [4096][1024]: refuse cascades that would overflow them
if(tdepth < 1 || tdepth > 10 || ntrees < 0 || ntrees > 4096)
{
	fclose(file);
	return 0;
}

for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

// convert lut and thr to int data
for(i=0;i<ntrees;i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0;j<(1<<tdepth);j++)
	{
		int_luts[i][j] = (int)(luts[i][j]*PERCISON);
	}
}
```

However, I've run into a problem in the `run_cascade` function: the index into the `vppixels` array goes out of range.
So I'd like to know the structure of the cascade file. Also, the arrays declared below are not fully used during inference:

```c
int32_t version = 3;

int tdepth;
int ntrees=0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];
```
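The fixed 4096 × 1024 arrays above take 16 MiB each for `tcodes` and `luts` (4096 · 1024 · 4 bytes), which is a lot for an embedded target. Below is a sketch of a loader that sizes everything from `tdepth` and `ntrees` instead, assuming the layout implied by the loading code and by `run_cascade` in this issue: a 16-byte header (`int32_t` version, 4 × `int8_t` bbox, `int32_t` tdepth, `int32_t` ntrees), then per tree `(1<<tdepth)-1` 4-byte node codes, `1<<tdepth` floats and one float threshold. Node codes are kept as raw bytes (4 `int8_t` per node), which is how `run_cascade` reads them. It parses from a memory buffer (for example the file read into RAM, or a const array compiled into the firmware) and, like the `fread` version, assumes the file's byte order and float format match the target. `compact_cascade`, `parse_cascade` and `free_cascade` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compact cascade, sized from tdepth and ntrees. */
typedef struct {
    int32_t version;
    int8_t bbox[4];     /* (r_min, r_max, c_min, c_max) */
    int32_t tdepth;
    int32_t ntrees;
    int8_t *tcodes;     /* ntrees * ((1<<tdepth)-1) nodes, 4 int8_t each */
    float *luts;        /* ntrees * (1<<tdepth) leaf outputs */
    float *thresholds;  /* one per tree */
} compact_cascade;

static void free_cascade(compact_cascade *cc)
{
    free(cc->tcodes);
    free(cc->luts);
    free(cc->thresholds);
    cc->tcodes = NULL;
    cc->luts = NULL;
    cc->thresholds = NULL;
}

/* Returns 0 on success, -1 if the buffer is too short or the header is implausible. */
static int parse_cascade(const uint8_t *buf, size_t len, compact_cascade *cc)
{
    size_t nnodes, nleaves, tree_bytes, pos = 16, i;

    assert(sizeof(float) == 4);
    cc->tcodes = NULL;
    cc->luts = NULL;
    cc->thresholds = NULL;
    if (len < 16)
        return -1;
    memcpy(&cc->version, buf, 4);
    memcpy(cc->bbox, buf + 4, 4);
    memcpy(&cc->tdepth, buf + 8, 4);
    memcpy(&cc->ntrees, buf + 12, 4);
    if (cc->tdepth < 1 || cc->tdepth > 16 || cc->ntrees < 0)
        return -1;

    nnodes = ((size_t)1 << cc->tdepth) - 1;
    nleaves = (size_t)1 << cc->tdepth;
    tree_bytes = 4 * nnodes + 4 * nleaves + 4;
    if ((len - 16) / tree_bytes < (size_t)cc->ntrees)
        return -1;

    /* +1 so that ntrees == 0 never turns into malloc(0). */
    cc->tcodes = malloc(4 * nnodes * (size_t)cc->ntrees + 1);
    cc->luts = malloc(sizeof(float) * (nleaves * (size_t)cc->ntrees + 1));
    cc->thresholds = malloc(sizeof(float) * ((size_t)cc->ntrees + 1));
    if (!cc->tcodes || !cc->luts || !cc->thresholds) {
        free_cascade(cc);
        return -1;
    }

    for (i = 0; i < (size_t)cc->ntrees; ++i) {
        memcpy(cc->tcodes + 4 * nnodes * i, buf + pos, 4 * nnodes);
        pos += 4 * nnodes;
        memcpy(cc->luts + nleaves * i, buf + pos, 4 * nleaves);
        pos += 4 * nleaves;
        memcpy(cc->thresholds + i, buf + pos, 4);
        pos += 4;
    }
    return 0;
}
```

With `tdepth` = 6 and 500 trees, for example, this allocates 500 · 63 · 4 + 500 · 64 · 4 + 500 · 4 bytes, about 256 KB, instead of the 32 MiB of the fixed arrays.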

I can't understand how the following code parses the cascade, especially `tcodes` and `lut`:
```c
offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	// locate this tree's node codes, leaf outputs and threshold
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	// descend from the root (idx = 1) to a leaf, one pixel comparison per level
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	*o = *o + lut[idx-(1<<tdepth)];

	// early rejection: stop as soon as the running sum drops to the threshold
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

// the window passed every stage; report its margin over the last threshold
*o = *o - thr;
```
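How the pointer arithmetic maps onto the file, as far as can be read from the code quoted here: `2*sizeof(float) + 2*sizeof(int)` skips the 16-byte header (assuming 4-byte `float` and `int`), and each tree occupies `offset` = 4(2^d - 1) + 4·2^d + 4 = 2^(d+3) bytes, e.g. 512 bytes for `tdepth` = 6. `tcodes` is an `int8_t` pointer, and because nodes are numbered from 1 (root = 1, children of node k are 2k and 2k+1), `tcodes = ptree - 4` makes `tcodes[4*idx + k]` byte k of node idx. Each 4-byte code is therefore four signed offsets (r1, c1, r2, c2); since `r` and `c` are scaled by 256, `(r + t*s)/256` is the centre plus t·s/256 pixels, i.e. at most about s/2 away. After `tdepth` comparisons `idx` is a leaf in [2^d, 2^(d+1)), so `lut[idx - (1<<tdepth)]` selects one of the 2^d outputs.

One plausible source of the out-of-range index (a guess, not confirmed in this issue): indexing the `int32_t tcodes[4096][1024]` array directly with `4*idx+k` reads whole packed words as offsets and runs past the node codes; reading through `(const int8_t *)&tcodes[i][0]` avoids that. The sketch below is a hypothetical integer-only `run_cascade_int` over flat per-tree arrays, with a window bounds check derived from the [-128, 127] offset range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical integer-only variant of run_cascade.
 *   tcodes: ntrees * ((1<<tdepth)-1) nodes, 4 int8_t offsets (r1, c1, r2, c2) each
 *   luts:   ntrees * (1<<tdepth) quantized leaf outputs
 *   thrs:   ntrees quantized stage thresholds
 * Returns +1 and the margin in *o if the window centred at (r, c) with
 * size s passes every stage, -1 otherwise. */
static int run_cascade_int(const int8_t *tcodes, const int32_t *luts,
                           const int32_t *thrs, int tdepth, int ntrees,
                           int32_t *o, int r, int c, int s,
                           const uint8_t *pixels, int nrows, int ncols, int ldim)
{
    const int nnodes = (1 << tdepth) - 1, nleaves = 1 << tdepth;
    int i, j, idx;

    assert(tdepth >= 1 && ntrees >= 1);

    /* 1/256-pixel fixed point, as in run_cascade. */
    r *= 256;
    c *= 256;

    /* Offsets lie in [-128, 127], so samples stay within s/2 of the centre:
     * reject windows whose extreme samples would fall outside the image. */
    if ((r + 128 * s) / 256 >= nrows || (r - 128 * s) / 256 < 0 ||
        (c + 128 * s) / 256 >= ncols || (c - 128 * s) / 256 < 0)
        return -1;

    *o = 0;
    for (i = 0; i < ntrees; ++i) {
        const int8_t *codes = tcodes + (size_t)4 * nnodes * i;

        idx = 1; /* root; children of node k are 2k and 2k+1 */
        for (j = 0; j < tdepth; ++j) {
            /* node idx occupies bytes 4*(idx-1)..4*(idx-1)+3, i.e. (ptree-4)[4*idx+k] */
            const int8_t *t = codes + 4 * (idx - 1);
            int p1 = pixels[(r + t[0] * s) / 256 * ldim + (c + t[1] * s) / 256];
            int p2 = pixels[(r + t[2] * s) / 256 * ldim + (c + t[3] * s) / 256];
            idx = 2 * idx + (p1 <= p2);
        }

        /* leaves are nodes 2^d .. 2^(d+1)-1 */
        *o += luts[(size_t)nleaves * i + (idx - nleaves)];
        if (*o <= thrs[i])
            return -1;
    }
    *o -= thrs[ntrees - 1];
    return +1;
}
```

Keeping the bounds check outside the tree loop means it runs once per window rather than once per comparison, which matters on a slow core.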

Any response would be helpful.
Thanks
