Behind the Scenes: Understanding Video Object Segmentation (VOS)