Problems about accelerating the speed of inference stage

Hi,
I'd like to run inference on an embedded device. Because its computing resources are limited, I have a few questions:

I've tried to convert the floating-point computation into integer computation. That is, after parsing the cascade binary file, I convert the `luts` and `thresholds` arrays to integers.

```c
int i, j;
FILE* file;

file = fopen("jst_headcascade", "rb");
if(!file)
	return 0;

// header: format version, bounding box, tree depth, number of trees
fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

// one record per tree: node codes, leaf values, threshold
for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

// convert lut and thr to int data (the cast truncates toward zero)
for(i=0; i<ntrees; i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0; j<(1<<tdepth); j++)
		int_luts[i][j] = (int)(luts[i][j]*PERCISON);
}
```
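A minimal sketch of the float-to-integer conversion described above, using round-to-nearest instead of truncation. The names `FIXED_SCALE` (standing in for `PERCISON`) and `to_fixed` are hypothetical and not part of the original code, and the scale of 2^16 is only an assumed example value.

```c
#include <stdint.h>

/* Hypothetical scale factor standing in for PERCISON (assumed: 2^16). */
#define FIXED_SCALE 65536

/* Convert a float to fixed point, rounding to the nearest integer step.
   Assumes |x| * FIXED_SCALE fits in an int32_t. */
static int32_t to_fixed(float x)
{
	float scaled = x * (float)FIXED_SCALE;
	return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

With rounding, each converted value is off by at most half a step, so a sum over `ntrees` leaf values can differ from the scaled float sum by up to `ntrees/2` steps. Comparisons that land very close to a threshold may therefore flip.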

However, I ran into a problem in the `run_cascade` function: the index into the `vppixels` array goes out of range.
So I'd like to understand the structure of the cascade file. The arrays declared below are not fully used during the inference stage.

```c
int32_t version = 3;

int tdepth;
int ntrees = 0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];
```
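Going by the `fread` calls in the loader, the file appears to be a fixed header followed by `ntrees` equally sized tree records. The sketch below computes those sizes. The helper names `header_bytes`, `tree_bytes`, and `tree_offset` are hypothetical, and the sketch assumes `sizeof(int)` and `sizeof(float)` match the machine that wrote the file.

```c
#include <stddef.h>
#include <stdint.h>

/* Header as read by the loader: version, bbox[4], tdepth, ntrees. */
static size_t header_bytes(void)
{
	return sizeof(int32_t) + 4*sizeof(int8_t) + 2*sizeof(int);
}

/* One tree record: (2^tdepth - 1) node codes, 2^tdepth leaf values, 1 threshold. */
static size_t tree_bytes(int tdepth)
{
	size_t nodes = ((size_t)1 << tdepth) - 1;
	size_t leaves = (size_t)1 << tdepth;
	return nodes*sizeof(int32_t) + leaves*sizeof(float) + sizeof(float);
}

/* Byte offset of tree i, counted from the start of the file. */
static size_t tree_offset(int tdepth, int i)
{
	return header_bytes() + (size_t)i*tree_bytes(tdepth);
}
```

For example, with a hypothetical `tdepth` of 6 and 4-byte `int` and `float`, each record is 63*4 + 64*4 + 4 = 512 bytes. In that case only the first 63 entries of each `tcodes` row and the first 64 entries of each `luts` row are filled, which would explain why the 1024-wide arrays are not fully used.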

I can't understand how the following code parses the cascade, especially `tcodes` and `lut`:
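```c
// size in bytes of one tree record: node codes, leaf values, threshold
offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
// skip the header: 8 bytes of version and bbox, then tdepth and ntrees
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	// idx starts at 1, so the base is moved back by one 4-byte node code
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	// descend from the root; each node compares two pixels
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	// add the value of the leaf that was reached
	*o = *o + lut[idx-(1<<tdepth)];

	// reject as soon as the running score is at or below this tree's threshold
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

// report the score relative to the last threshold
*o = *o - thr;
```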
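One possible reading of the inner loop: each 4-byte node code holds four signed 8-bit offsets (two row/column pairs), and `tcodes = ptree - 4` compensates for `idx` starting at 1. The sketch below restates that indexing with an explicit struct and adds a bounds check on every pixel read. The names `node_test`, `pixel_at`, and `eval_tree` are hypothetical, the clamping is an added assumption rather than something the original code does, and `r256`/`c256` are assumed to be the window centre already multiplied by 256 (the excerpt does not show where `r` and `c` are scaled).

```c
#include <stdint.h>

/* Hypothetical view of one 4-byte node code: two pixel offsets,
   each in units of s/256 pixels relative to the window centre. */
typedef struct { int8_t r1, c1, r2, c2; } node_test;

/* Read a pixel, clamping the coordinates so the index stays inside the image. */
static uint8_t pixel_at(const uint8_t* pixels, int nrows, int ncols, int ldim, int r, int c)
{
	if (r < 0) r = 0; else if (r >= nrows) r = nrows - 1;
	if (c < 0) c = 0; else if (c >= ncols) c = ncols - 1;
	return pixels[r*ldim + c];
}

/* Walk one tree of depth tdepth, with nodes[0] as the root.
   Returns the index of the reached leaf, in [0, 2^tdepth). */
static int eval_tree(const node_test* nodes, int tdepth,
                     const uint8_t* pixels, int nrows, int ncols, int ldim,
                     int r256, int c256, int s)
{
	int idx = 1;
	for (int j = 0; j < tdepth; ++j) {
		const node_test* n = &nodes[idx - 1]; /* same node as tcodes[4*idx + k] */
		uint8_t p1 = pixel_at(pixels, nrows, ncols, ldim,
		                      (r256 + n->r1*s)/256, (c256 + n->c1*s)/256);
		uint8_t p2 = pixel_at(pixels, nrows, ncols, ldim,
		                      (r256 + n->r2*s)/256, (c256 + n->c2*s)/256);
		idx = 2*idx + (p1 <= p2);
	}
	return idx - (1 << tdepth);
}
```

Because the offsets lie in [-128, 127], the sampled pixels stay within roughly `s/2` of the centre. An out-of-range index therefore suggests that the window was not checked against the image borders before the loop, or that `r` and `c` were not scaled as the code expects.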


Any response is helpful.
Thanks


Problems about accelerating the speed of inference stage #23
