RI: Small: Learning to Read, Ground, and Reason in Multimodal Text