This paper presents a neuro-robotics experiment on the developmental learning of goal-directed actions. The robot was trained, through iterative tutoring, to predict the visuo-proprioceptive flow associated with achieving a set of goal-directed behaviors. Learning was carried out with a dynamic neural network model characterized by multiple time-scale dynamics. The experimental results showed that functional hierarchical structures emerge through stages of development: behavior primitives are generated in earlier stages, and the sequences of primitives that achieve goals appear in later stages. It was also observed that motor imagery is generated at earlier stages than the corresponding actual behaviors. Our claim that manipulatable inner representations should emerge through sensory–motor interactions corresponds to Piaget’s constructivist view.