Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sometimes Windows installation failed #14

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jiaweien
Copy link

On the openstack platform, create the virtual machine using the Windows image template processed by sysprep(I used the windows 2012 data center Chinese).
sometimes cloudbase-init reboot the Windows after sethostname when setup.exe is running, although the probability of that happening is very small.
At that time the SYSTEM\Setup\Status\SysprepStatus\GeneralizationState's value is already 7, but the setup of oobe still running, this causes the Windows installation to fail to continue with the exception, and the restart remains the same.
I apologize for not keeping the log file, but please refer to the time in log file of sysprep and cloudbase-init for details.

@ader1990
Copy link
Member

ader1990 commented Aug 13, 2019

Hello,

Can you open a bug report on https://bugs.launchpad.net/cloudbase-init and use the git review workflow to push a fix for the bug? Thank you. Let me know if you need help with using 'git review'.

This repo is just a mirror of https://opendev.org/x/cloudbase-init.git and the bug tracking/code patches are done using launchpad and review.opendev.org

@ader1990
Copy link
Member

@jiaweien have you tried using the registry keys in HKEY_LOCAL_MACHINE\SYSTEM\Setup\Status\ChildCompletion to achieve the same result? I think the process checking can be very brittle in case of newer Windows versions.

@jiaweien
Copy link
Author

jiaweien commented Aug 13, 2019 via email

@jiaweien
Copy link
Author

jiaweien commented Aug 13, 2019 via email

@jiaweien
Copy link
Author

jiaweien commented Nov 1, 2019

@ader1990 Someone told me recently that he had the same problem,I haven't come up with a better solution yet except process checking. So , could you review it?

@ader1990
Copy link
Member

@jiaweien I understand the issue here, but we need to find a more reliable way to fix this problem, as looking for processes with a hardcoded executable (like setup,exe) can bring to an infinite loop or failure if the admin installed some service that has setup.exe in the path.

If I make a patch using HKEY_LOCAL_MACHINE\SYSTEM\Setup\Status\ChildCompletion to test the issue, could you test it?

@jiaweien
Copy link
Author

@ader1990 Okay,I'll be test it later, and tell you the result when it's over.

@jiaweien
Copy link
Author

@ader1990 During this period, I tested Windows server 08 R2 and Windows server 2012 R2 several times(both are in chinese). I checked whether the registry value of SYSTEM\Setup\Status\SetupFinalTasks was 3 to determine whether the initialization had been completed.So far, there have been no initialization failures.But I'm not sure if I haven't tested enough or if I've fixed the problem.I also included my changes in the patches I submitted,I wonder if that's what you suggest.

@jiaweien
Copy link
Author

jiaweien commented Jan 2, 2020

@ader1990
test-code
cloudbase-init-log
Just now, I got this log from my test code.In this initialization, when the GeneralizationState value is 7, there are approximately 3 seconds or so after the SetupFinalTasks value is not 3. This should indicate that the initialization work can be determined by checking the status of SetupFinalTasks.

@ader1990
Copy link
Member

ader1990 commented Jan 3, 2020

On a Windows 10, this is what these registry keys contain:

Get-ChildItem -Path HKLM:\system\Setup\Status\


    Hive: HKEY_LOCAL_MACHINE\system\Setup\Status


Name                           Property
----                           --------
ChildCompletion                setup.exe       : 3
                               oobeldr.exe     : 3
                               SetupFinalTasks : 3
SysprepStatus                  GeneralizationState : 7
UnattendPasses                 oobeSystem : 2

I need to check if all Windows versions have these keys to make sure there is no regression when we check the child completion keys.

@ader1990
Copy link
Member

ader1990 commented Jan 3, 2020

@jiaweien Here is a suggestion to improve the code (needs refactor or a better placement of code though).

I would prefer putting the ChildCompletion check just before the plugin reboot as from my understanding, you get this error only if you run the SetHostname Plugin in the normal cloudbase-init run. The cloudbase-init installer configures the SetHostname Plugin to run during the Sysprep specialize step (which is the recommended way).

Your current code will hang forever if cloudbase-init is installed using the default installer from cloudbase.it . The reason is that during the unattend phase, the value of the subkeys from ChildCompletion are: SetupFinalTasks = 0 and setup.exe = 1.

    def wait_for_boot_completion(self):
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                "SYSTEM\\Setup\\Status\\SysprepStatus", 0,
                                winreg.KEY_READ) as key:
                while True:
                    gen_state = winreg.QueryValueEx(key,
                                                    "GeneralizationState")[0]
                    if gen_state == 7:
                        break
                    time.sleep(0.1)
                    LOG.info('Waiting for sysprep completion. '
                             'GeneralizationState: %d', gen_state)

            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                "SYSTEM\\Setup\\Status\\ChildCompletion", 0,
                                winreg.KEY_READ) as key:
                steps = 0
                while steps < 1000:
                    setup_state = winreg.QueryValueEx(key,
                                                      "setup.exe")[0]
                    setup_final_state = winreg.QueryValueEx(key,
                                                            "SetupFinalTasks")[0]
                    if setup_state == 3 and setup_final_state == 3:
                        LOG.info('Sysprep entered desired mode for service run.')
                        break
                    if setup_state == 1 and setup_final_state == 0:
                        LOG.info('Sysprep entered desired mode for unattended run.')
                        break
                    time.sleep(0.1)
                    LOG.info('Waiting for sysprep completion. '
                             'setup_state: %(setup_state)d and setup_final_state: %(setup_final_state)d', {"setup_state": setup_state, "setup_final_state": setup_final_state})
                    steps = steps + 1
        except WindowsError as ex:
            if ex.winerror == 2:
                LOG.debug('Sysprep data not found in the registry, '
                          'skipping sysprep completion check.')
            else:
                raise ex

@jiaweien
Copy link
Author

jiaweien commented Jan 7, 2020

@ader1990 Maybe make the cloudbase-init installer configures the SetHostname Plugin to run during the Sysprep specialize step is the best choice.
The purpose of my current code is to wait until the setup process and its child processes (which contain the parsing and execution of unattention.xml) are complete before starting the cloudbase-init task.
Also, I'm not sure if the code you posted will cause the previous error. For example, in my previous screenshot, when the SYSTEM\Setup\Status\SysprepStatus\GeneralizationState is 7, there are nearly 3 seconds left for the Setup to be truly complete. If cloudbase-init executes to sethostname when setup_state == 1 and setup_final_state == 0 are set, the previous error may be thrown.

@ader1990
Copy link
Member

ader1990 commented Jan 8, 2020

When using the cloudbase-init MSI installer, the installer has an option to sysprep the machine at the end of the installation. By default, it configures SetHostname Plugin to run during the Sysprep specialize step.

What my code does is to handle two scenarios:

  1. when cloudbase-init runs during specialize phase, it waits for the GeneralizationState=7, setup.exe = 1 and SetupFinalTasks = 0 (as the childcompletions are not ready at that moment). The reboot is performed by specialize step thorugh the special instruction of the RunSychronousComand - WillReboot https://github.com/cloudbase/cloudbase-init-installer/blob/master/CloudbaseInitSetup/Unattend.xml#L26
  2. when cloudbase-init runs at each boot as a service, it waits for GeneralizationState=7, setup.exe = 3 and SetupFinalTasks = 3 (as the childcompletions should be finished before performing other tasks like reboot, user scripts, etc).

@jiaweien
Copy link
Author

jiaweien commented Jan 9, 2020

I'm very sorry. I may have misunderstood you.I thought you wanted to use your code without refactoring.
My idea is to complete the hostname setting by modifying unattend.xml, adding sethostname task in unattend.xml, and configuring willreboot as request to complete the hostname update. Is that what you said about changing the sethostname process?
This way, users who want to customize unattend.xml (I used my own generated unattend.xml) will need to add the custom part to the unattend.xml in the cloudbase-init installer directory before proceeding to the last step, sysprep.
I tried it, and it worked.

@ader1990
Copy link
Member

ader1990 commented Mar 3, 2020

Hello @jiaweien,

Did you manage to solve this issue per my suggestion from above or did you use a special unattedn.xml? Do you still need some code change in Cloudbase-Init?

Thank you,
Adrian Vladu

@jiaweien
Copy link
Author

jiaweien commented Mar 16, 2020

@ader1990
I'm sorry. I've been busy lately.
I just tried to modify the architecture of the way what you said the last time , in the unattend.xml file before execution cloudbase-init added a step (my test scripts, not sethostname), it is can be done before the start of the cloubase-init (I reboot the machine in my test script, I saw the completion of sysprep log than cloudbase-init start logging time sooner).
My idea is that, would be replace my script with sethostname, setting the hostname and restarting it before the other steps in cloudbase-init are executed.
However, if this is true, it may be worth noting that sethostname may require a network to retrieve the hostname from the metadata.

@jiaweien
Copy link
Author

jiaweien commented Oct 21, 2020

Hello @ader1990 ,
A lot has been going on lately, so...Sorry for the late reply

You're right, my code will be hang forever when using the cloudbase-init MSI installer, the installer has an option to sysprep the machine at the end of the installation..
image
image

And I also tested your code, I have tested it many times and it always has completed the initialization of the virtual machine.
image
image

But I still wonder if these changes only reduce the likelihood of initialization failure, not prevent it altogether.Because if I choose to use the cloudbase-init service to initialize the server, will there still be GeneralizationState=7, setup.exe = 1 and SetupFinalTasks = 0 at some point-in-time, and then restart the server directly after sethostname plugin completes?(But that's just my suspicion, and it didn't happen during my tests).

> if setup_state == 1 and setup_final_state == 0:
>      LOG.info('Sysprep entered desired mode for unattended run.') 
>      break

Also, I am not sure if I can add a judgment on the existence of the "Cloudbase-init-unattend.log" file to determine whether Cloudbase-init is initiated by the service or by unattended(Because I see that when Cloudbase-init is initiated by unattended. it will log to "Cloudbase-init-unattended. log").
image
image

eg.

if GeneralizationState=7:
    while steps < 1000:
        if C:\Program Files\Cloudbase Solutions\Cloudbase-Init\log\cloudbase-init-unattend.log is exist:
            if setup.exe = 1 and SetupFinalTasks = 0:
                break
        else:
             if setup_state == 3 and setup_final_state == 3:
                break
        time.sleep(0.1)
        steps = steps + 1

Do you think this is better? Or that it may not be necessary at all. I'd like to hear your thoughts.

Thank you,
jiaweien

@ader1990
Copy link
Member

Hello @jiaweien ,

You can have a check of the config options that are set in the unattend conf, like https://github.com/cloudbase/cloudbase-init-installer/blob/master/CloudbaseInitSetup/Actions/ConfFileActions.js#L86, but that is a brittle and might change in the future. The best course of action would still be the registry keys check, as those keys are reliable source of identifying in what stage cloudbase-init is run. Still, checking that stop_service_on_exit is false can reliably tell that cloudbase-init is not running in normal mode, under a Windows service, but this change cannot be included in the upstream codebase, whereas the code I shared can and should be 100% reliable.

In this case, I will create a patch with those checks and would be glad if you could test the new MSI.

Thank you,
Adrian Vladu

@jiaweien
Copy link
Author

Hello @ader1990 ,

Yes, I also noticed allow_reboot is false in cloudbase-init-unattend.conf when I was testing earlier.
In my custom virtual machine image, I manually sysprep after cloudbase-init is installed(C:\windows\system32\sysprep.exe /generalize /oobe /shutdown), instead of selecting Cloudbase-init.msi to complete sysprep and initialize it to the virtual machine image template. This will allow me to have my own custom unattend step, and also not affect the execution of Cloudbase-init as a service.
So I'm not worried about Cloudbase-init running in unattended-mode, just not known if Cloudbase-init running in service mode can completely avoid this problem?

I am looking forward to your new Cloudbase-init.msi and I will test it again then.

Thank you
jiaweien

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants