Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When analyzing the class inheritance relationship, treat the members of the class as the parent class #1539

Open
zz-fz-john opened this issue Aug 30, 2024 · 8 comments

Comments

@zz-fz-john
Copy link

The class definition is like this

class AP_GPS_Backend
{
public:
 ...
protected:
    AP_HAL::UARTDriver *port;           ///< UART we are attached to
    AP_GPS &gps;                        ///< access to frontend (for parameters)
    AP_GPS::GPS_State &state;           ///< public state for this instance

    uint64_t _last_pps_time_us;
    JitterCorrection jitter_correction;
    uint32_t _last_itow_ms;
...

The definition of class in IR is:

%class.AP_GPS_Backend = type <{ ptr, ptr, ptr, ptr, i64, %class.JitterCorrection, i32, [4 x i8], i64, i32, i32, i32, i32, i16, [6 x i8] }>

In the generated class inheritance graph, the edges related to this class are:

	Node0x5596521120f0 [shape=record,shape=box,label="{AP_GPS_Backend}"];
	Node0x5596521120f0 -> Node0x55965218ba90[style=solid];
	Node0x55965218ba90 [shape=record,shape=box,label="{JitterCorrection}"];

The analysis result of SVF is that the parent class of AP_GPS_Backend is JitterCorrection, but in fact JitterCorrection is a member of AP_GPS_Backend.
The version of SVF that I using is svf-3.0,and I use the command "wpa -sfrander -dump-cha "to generate the CHA.

@yuleisui
Copy link
Collaborator

Would you be able to dig a bit and locate where the bug is?

@zz-fz-john
Copy link
Author

Would you be able to dig a bit and locate where the bug is?

I think it’s because in the process of analyzing class members, all class members that are structures are regarded as their parent classes.

For example the above code:

%class.AP_GPS_Backend = type <{ ptr, ptr, ptr, ptr, i64, %class.JitterCorrection, i32, [4 x i8], i64, i32, i32, i32, i32, i16, [6 x i8] }>

In fact, JitterCorrection is a member variable in AP_GPS_Backend, but svf mistakenly recognized it as the parent class of AP_GPS_Backend.

I read some code, and I think maybe it's because of the readInheritanceMetadataFromModule(const Module &M) function in CHGBuilder.cpp, which iterates through all the operands and treats the operands as his parent class.
here is the code:

void CHGBuilder::readInheritanceMetadataFromModule(const Module &M)
{
    for (Module::const_named_metadata_iterator mdit = M.named_metadata_begin(),
            mdeit = M.named_metadata_end(); mdit != mdeit; ++mdit)
    {
        const NamedMDNode *md = &*mdit;
        string mdname = md->getName().str();
        if (mdname.compare(0, 15, "__cxx_bases_of_") != 0)
            continue;
        string className = mdname.substr(15);
        for (NamedMDNode::const_op_iterator opit = md->op_begin(),
                opeit = md->op_end(); opit != opeit; ++opit)
        {
            const MDNode *N = *opit;
            const MDString* mdstr = SVFUtil::cast<MDString>(N->getOperand(0).get());
            string baseName = mdstr->getString().str();
            chg->addEdge(className, baseName, CHEdge::INHERITANCE);
        }
    }
}

But I have a little doubt: I queried the entire IR and did not find "_cxx_bases_of".

But I have one question: I queried the entire IR and did not find "_cxx_bases_of". Is the "_cxx_bases_of" prefix generated by dynamic analysis during processing? Or does this prefix always exist in the IR file? If it is the former, then I think there is a problem with the way readInheritanceMetadataFromModule is processed. The processing logic of this function should be in the second for loop. When it encounters the first parameter that is not a structure, it will stop connecting the class inheritance graph. If so The latter, then I can't determine the problem.

@yuleisui
Copy link
Collaborator

CHG should not rely on metadata. We will need to identify the pattern from llvm bc alone.

@zz-fz-john
Copy link
Author

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function* F)", and then the function "void CHGBuilder::buildCHGNodes(const Function* F)" will call " connectInheritEdgeViaCall"
the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

@zz-fz-john
Copy link
Author

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function* F)", and then the function "void CHGBuilder::buildCHGNodes(const Function* F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.

@yuleisui
Copy link
Collaborator

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function* F)", and then the function "void CHGBuilder::buildCHGNodes(const Function* F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.

That is a good finding. Could we distinguish these two cases by changing the above code to fix it?

@zz-fz-john
Copy link
Author

Unfortunately, in the above code, it should be difficult to distinguish between the two cases.
there is a case to show that the two case is same in IR.

class AP_GPS_MAV : public AP_GPS_Backend {
public:

    using AP_GPS_Backend::AP_GPS_Backend;

    bool read() override;

    static bool _detect(struct MAV_detect_state &state, uint8_t data);

    void handle_msg(const mavlink_message_t &msg) override;

    const char *name() const override { return "MAV"; }

private:
    bool _new_data;
    uint32_t first_week;
    JitterCorrection jitter{2000};
};
#endif

the correspond IR is:

; Function Attrs: noinline nounwind optnone
define internal fastcc noundef ptr @_ZN10AP_GPS_MAVCI214AP_GPS_BackendER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull returned align 8 dereferenceable(128) %0, ptr noundef nonnull align 8 dereferenceable(928) %1, ptr noundef nonnull align 8 dereferenceable(176) %2, ptr noundef %3) unnamed_addr #5 comdat align 2 !dbg !210192 {
  %5 = alloca ptr, align 4
  %6 = alloca ptr, align 4
  %7 = alloca ptr, align 4
  %8 = alloca ptr, align 4
  store ptr %0, ptr %5, align 4
  call void @llvm.dbg.declare(metadata ptr %5, metadata !210197, metadata !DIExpression()), !dbg !210199
  store ptr %1, ptr %6, align 4
  call void @llvm.dbg.declare(metadata ptr %6, metadata !210200, metadata !DIExpression()), !dbg !210199
  store ptr %2, ptr %7, align 4
  call void @llvm.dbg.declare(metadata ptr %7, metadata !210201, metadata !DIExpression()), !dbg !210199
  store ptr %3, ptr %8, align 4
  call void @llvm.dbg.declare(metadata ptr %8, metadata !210202, metadata !DIExpression()), !dbg !210199
  %9 = load ptr, ptr %5, align 4
  %10 = load ptr, ptr %6, align 4, !dbg !210203
  %11 = load ptr, ptr %7, align 4, !dbg !210203
  %12 = load ptr, ptr %8, align 4, !dbg !210203
  %13 = call fastcc noundef ptr @_ZN14AP_GPS_BackendC2ER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull align 8 dereferenceable(90) %9, ptr noundef nonnull align 8 dereferenceable(928) %10, ptr noundef nonnull align 8 dereferenceable(176) %11, ptr noundef %12), !dbg !210203
  store ptr getelementptr inbounds ({ [24 x ptr] }, ptr @_ZTV10AP_GPS_MAV, i32 0, inrange i32 0, i32 2), ptr %9, align 8, !dbg !210203
  %14 = getelementptr inbounds %class.AP_GPS_MAV, ptr %9, i32 0, i32 3, !dbg !210204
  %15 = call fastcc noundef ptr @_ZN16JitterCorrectionC2Ett(ptr noundef nonnull align 8 dereferenceable(28) %14, i16 noundef zeroext 2000, i16 noundef zeroext 100), !dbg !210205
  ret ptr %9, !dbg !210203
}

you could see that the call of two case is same,so I can not distinguish the two case by modifying above code

@yuleisui
Copy link
Collaborator

Any other way to distinguish them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants