The enhanced reframe model automatically detects active speakers, moving speakers, and specific objects that are mentioned, then tracks them or focuses on the right area of the frame.