vulkan: wait for fences on ErrorInjector device loss

When ErrorInjector injects a device loss, the loss is fake: commands are
still running on the device, and we need to wait for them to finish before
we start deallocating resources.
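To make the hazard concrete, here is a minimal C++ model. It assumes that in-flight GPU work can be stood in for by a host thread writing into a host-owned buffer. `InFlightWork` and `HandleFakeDeviceLoss` are hypothetical names, not Dawn code. The only point it illustrates is that a reported (fake) loss does not stop the work, so the wait has to come before the buffer is freed.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical model: a "GPU" thread keeps writing into a buffer owned by the
// host. An injected device loss is only a report; it does not stop the thread.
struct InFlightWork {
    std::vector<int> buffer = std::vector<int>(1024, 0);
    std::thread worker;

    void Submit() {
        worker = std::thread([this] {
            for (int& value : buffer) {
                value = 1;
                std::this_thread::sleep_for(std::chrono::microseconds(10));
            }
        });
    }

    // The "wait for idle" step: block until the in-flight work has finished.
    void WaitForIdle() {
        if (worker.joinable()) {
            worker.join();
        }
    }
};

// Handles a fake loss: wait first, then read and free the buffer. Returns the
// buffer's sum, which is only well-defined once the worker has been joined.
int HandleFakeDeviceLoss(InFlightWork& work) {
    work.WaitForIdle();  // must come before any deallocation
    assert(!work.worker.joinable());  // no work is still in flight

    int sum = 0;
    for (int v : work.buffer) {
        sum += v;
    }
    work.buffer.clear();
    work.buffer.shrink_to_fit();  // the "deallocate" step
    return sum;
}
```

Dropping the `WaitForIdle()` call would let the worker thread write into freed memory, which is the class of bug the frontend wait is meant to prevent.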

When handling the error in the frontend, call WaitForIdleForDestruction
so that all the commands are finished. In the Vulkan backend,
WaitForIdleForDestruction has a special code path that disallows error
injection on vkWaitForFences in this specific case, because injecting a
second error would make the problem appear again.
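The interplay between the frontend and the backend can be sketched as a toy model. This is an illustration under stated assumptions, not Dawn's implementation: `FakeDevice`, `ErrorInjector::InjectOrRun`, `RawWait`, and `WaitResult` are hypothetical stand-ins for `DeviceBase`, `INJECT_ERROR_OR_RUN`, `vkWaitForFences`, and `VkResult`. The injector is simplified to fail every call while enabled. The loop condition (retry while the wait times out) is also an assumption, because the diff excerpt is cut off before it.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are Dawn's real types.
enum class State { Alive, Disconnected };
enum class WaitResult { Success, Timeout, DeviceLost };

// Simplified injector: while enabled, every call routed through it
// reports a device loss instead of running.
struct ErrorInjector {
    bool enabled = false;
    int injectedCount = 0;

    template <typename F>
    WaitResult InjectOrRun(F&& call) {
        if (enabled) {
            ++injectedCount;
            return WaitResult::DeviceLost;
        }
        return call();
    }
};

struct FakeDevice {
    State state = State::Alive;
    ErrorInjector injector;
    int pendingCommands = 3;       // work still running after a fake loss
    bool assumedComplete = false;  // stand-in for AssumeCommandsComplete()

    // Stand-in for a fence wait: retires one command per call and reports
    // a timeout until nothing is pending.
    WaitResult RawWait() {
        if (pendingCommands > 0) {
            --pendingCommands;
        }
        return pendingCommands == 0 ? WaitResult::Success : WaitResult::Timeout;
    }

    WaitResult WaitForIdleForDestruction() {
        WaitResult result = WaitResult::Timeout;
        do {
            // Disconnected here means the loss came from the injector:
            // wait for real and never inject a second error.
            if (state == State::Disconnected) {
                result = RawWait();
                continue;  // in a do-while, continue jumps to the condition
            }
            result = injector.InjectOrRun([this] { return RawWait(); });
        } while (result == WaitResult::Timeout);
        return result;
    }

    // Mirrors the frontend order: mark Disconnected first, then wait if the
    // injector may have faked the loss, then treat all commands as done.
    void HandleDeviceLost() {
        state = State::Disconnected;
        if (injector.enabled) {
            (void)WaitForIdleForDestruction();  // errors ignored
            assert(pendingCommands == 0);       // the bypassed wait drained all work
        }
        assumedComplete = true;
    }
};
```

With the injector enabled, calling `WaitForIdleForDestruction()` on a live device returns `DeviceLost` at once while work is still pending. A subsequent `HandleDeviceLost()` then drains that work without a second injection. Setting `state` before waiting is what lets the backend tell the two calls apart.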

Bug: chromium:1244408

Change-Id: I710fccbb40b4b14d84f5787be5e002b469e6e2e3
Reviewed-on: https://dawn-review.googlesource.com/c/dawn/+/63101
Commit-Queue: Corentin Wallez <cwallez@chromium.org>
Reviewed-by: Stephen White <senorblanco@chromium.org>
Corentin Wallez 2021-09-02 17:23:18 +00:00 committed by Dawn LUCI CQ
parent c0f20fbc54
commit 71f2214e14
2 changed files with 20 additions and 1 deletion


@@ -28,6 +28,7 @@
#include "dawn_native/CreatePipelineAsyncTask.h"
#include "dawn_native/DynamicUploader.h"
#include "dawn_native/ErrorData.h"
#include "dawn_native/ErrorInjector.h"
#include "dawn_native/ErrorScope.h"
#include "dawn_native/ExternalTexture.h"
#include "dawn_native/Instance.h"
@@ -264,10 +265,18 @@ namespace dawn_native {
void DeviceBase::HandleError(InternalErrorType type, const char* message) {
if (type == InternalErrorType::DeviceLost) {
mState = State::Disconnected;
// If the ErrorInjector is enabled, then the device loss might be fake and the device
// still be executing commands. Force a wait for idle in this case, with State being
// Disconnected so we can detect this case in WaitForIdleForDestruction.
if (ErrorInjectorEnabled()) {
IgnoreErrors(WaitForIdleForDestruction());
}
// A real device lost happened. Set the state to disconnected as the device cannot be
// used. Also tags all commands as completed since the device stopped running.
AssumeCommandsComplete();
mState = State::Disconnected;
} else if (type == InternalErrorType::Internal) {
// If we receive an internal error, assume the backend can't recover and proceed with
// device destruction. We first wait for all previous commands to be completed so that


@@ -879,6 +879,16 @@ namespace dawn_native { namespace vulkan {
VkResult result = VkResult::WrapUnsafe(VK_TIMEOUT);
do {
// If WaitForIdleForDestruction is called while we are Disconnected, it means that
// the device lost came from the ErrorInjector and we need to wait without allowing
// any more error to be injected. This is because the device lost was "fake" and
// commands might still be running.
if (GetState() == State::Disconnected) {
result = VkResult::WrapUnsafe(
fn.WaitForFences(mVkDevice, 1, &*fence, true, UINT64_MAX));
continue;
}
result = VkResult::WrapUnsafe(
INJECT_ERROR_OR_RUN(fn.WaitForFences(mVkDevice, 1, &*fence, true, UINT64_MAX),
VK_ERROR_DEVICE_LOST));