Previously the Metal backend used a manual mutex system to make sure the
BufferSetSubData didn't have data races with reads from the GPU. Replace
this with a non-hacky version
- Make the Buffer objects allocated on the GPU
- Make SetSubData use a ResourceUploader that allocates a CPU buffer
and schedules a CPU->GPU copy.
- Have a list of pending commands and a finished command serial to
order operations and track when resource become unused.