You should check out Toki Pona; it's an artificial language of about 120 words that uses particles to mark grammatical roles (much like Japanese) - and it has the consonant-vowel structure I propose as superior for our purposes.
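To make the consonant-vowel idea concrete, here is a minimal Python sketch of a checker that accepts only words built from strict consonant+vowel syllables. The sound inventory and the names `is_cv_word` and `syllables` are my own assumptions for illustration, not anything your speech software requires; note too that strict CV is a little narrower than what Toki Pona itself permits, since it has vowel-initial words such as "ala".

```python
import re

# Hypothetical sound inventory, chosen only for illustration.
CONSONANTS = "ptkmnslwj"
VOWELS = "aeiou"

# A word is one or more consonant+vowel pairs and nothing else.
CV_WORD = re.compile(rf"(?:[{CONSONANTS}][{VOWELS}])+")


def is_cv_word(word: str) -> bool:
    """Return True if the word consists solely of CV syllables."""
    return CV_WORD.fullmatch(word.lower()) is not None


def syllables(word: str) -> list[str]:
    """Split a strict-CV word into its two-letter syllables."""
    if not is_cv_word(word):
        raise ValueError(f"{word!r} is not a strict CV word")
    w = word.lower()
    return [w[i:i + 2] for i in range(0, len(w), 2)]


if __name__ == "__main__":
    for candidate in ["toki", "pona", "strong", "ala"]:
        print(candidate, is_cv_word(candidate))
```

A structure this regular means every word can be cut into syllables without a dictionary, which should keep both the synthesis and the recognition side simple.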
If 120 is too minimal for you, I suggest you look at "Swadesh lists"; google it*. They are wordlists of 100-1000 words for the concepts that are most common across human languages.
Regarding minimalism, however, I would personally be quite philosophical about what a vocabulary actually needs. Consider how many idiomatic phrases survive from our pre-scientific roots: despite knowing the Sun is a stationary body, we still speak of a "sunrise" - by the same logic midday could more accurately be called "fullface" (as it is we who are 'fully facing the Sun'). Consider too the words "dark" and "cold" - neither truly exists in its own right; they are the absence of light and heat respectively. They are default states, and therefore having words for them is somewhat redundant: saying there is 'no light' or 'no heat' already implies them, meaning we only need words for 'light' and 'heat' plus a negator. What I am proposing is that you endeavour to be economical in your word choice, for the sake of efficiency, practicality, and ease of programming.
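As a rough sketch of the 'root plus negator' idea in Python: only 'light' and 'heat' are stored as roots, and "dark" and "cold" are derived. The negator word "no", the English glosses and the function name `express` are placeholders I have made up for illustration; your real vocabulary would supply its own forms.

```python
# Hypothetical minimal lexicon: only positive roots are stored.
ROOTS = {"light", "heat"}

# Hypothetical negator particle.
NEGATOR = "no"

# English glosses mapped to (is_negated, root).
# "dark" and "cold" get no words of their own.
DERIVED = {
    "dark": (True, "light"),
    "cold": (True, "heat"),
    "bright": (False, "light"),
    "warm": (False, "heat"),
}


def express(concept: str) -> str:
    """Render a concept using only root words and the negator."""
    if concept in ROOTS:
        return concept
    if concept not in DERIVED:
        raise KeyError(f"no root mapping for {concept!r}")
    negated, root = DERIVED[concept]
    return f"{NEGATOR} {root}" if negated else root


if __name__ == "__main__":
    for word in ["light", "dark", "cold", "warm"]:
        print(word, "->", express(word))
```

Four English concepts collapse onto two stored roots here, and every further opposite pair costs one new root rather than two new words.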
As you have already affirmed, a large and complex language is beyond requirement for a simple robotic companion.
I would also not worry at all about how one writes a language, and would simply focus on the spoken aspect: contrary to popular belief, written language is merely a means of representing spoken language; much to my peers' despair, I have often declared that English could be just as well depicted by "Chinese" characters. Thus, unless you intend your robot to write (which I suspect you don't, as that is crazy difficult to implement xD; not least for a cat ;P) - don't get hung up on aesthetics.
You know the software available better than I do; if you tell me the capabilities and limitations of your speech-producing software, I can certainly help you construct a bespoke phonology.
*[EDIT] Here is the Wiktionary archive of Swadesh lists; choose a language and size and you're away - note that much of what humans regard as distinct concepts are things like natural phenomena (fire, wind, etc.); human anatomy (hands, heads; I believe this is a most important category for human-robot relations - something Toki Pona struggles to convey accurately); and 'person' (first, second, etc; I personally favour a speaker