To illustrate this problem, consider two sentences.
(1) John kicked the ball to Mary.

(2) John crutched the ball to Mary.

The syntactic structures are the same, and it would be easy to teach the syntax to a computer. Thus you could get the machine to understand the outcome of the event described in each sentence: Mary ends up with the ball, as a result of an action performed by John. But what if you asked the machine to describe how John got the ball to Mary? What if the computer's task was to describe John's action? It could probably do OK with sentence (1), because the action isn't a novel one, and having been programmed with the meaning of the word "kicked," all it would have to do is spit that meaning out. But sentence (2) contains a novel verb, one the machine is unlikely to have in its lexicon (unless the programmer has read the paper from which I stole the verb, and is trying to get one past me). You and I shouldn't have much trouble figuring out John's action in (2), even without context (and with a little context, such as a sentence preceding both (1) and (2) that read, "John was standing across the table from Mary, and the ball was on the table," figuring out (2) would be even easier for us). What would the computer need to describe John's action in (2)?
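The lexicon-lookup strategy, and the way it breaks on a novel verb, can be sketched in a few lines. Everything here is hypothetical and deliberately crude (the tiny lexicon, the stored "meanings"); it just makes concrete why "spit the meaning out" works for (1) and fails for (2):

```python
# Toy sketch (hypothetical): a lexicon-lookup "understander".
# It can only describe actions its programmer anticipated.

LEXICON = {
    "kicked": "struck with the foot, propelling it",
    "threw": "propelled through the air by hand",
}

def describe_action(verb):
    """Return the stored meaning of a verb, or admit defeat."""
    meaning = LEXICON.get(verb)
    if meaning is None:
        return f"unknown action: '{verb}' is not in the lexicon"
    return meaning

print(describe_action("kicked"))    # sentence (1): lookup succeeds
print(describe_action("crutched"))  # sentence (2): novel denominal verb, lookup fails
```

The point of the sketch is that nothing in the table helps with "crutched": there is no entry to retrieve, and no amount of syntax gets the machine from the noun "crutch" to the action John performed.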
The answer is that the computer would have to understand what sorts of actions are possible with a crutch -- the affordances of crutches -- and be able to reason about which of those actions would be effective in performing the action described in the sentence. As I said, I used to think that this required having a body (because affordances are organism-specific, and even body-specific), and that may still be the case. But whether or not it requires a body, you've got to have a whole heck of a lot of background knowledge and folk theories to get the affordances of a crutch for John, and to pick the relevant ones for a given context. For example, you'd have to know that the hard end of a crutch can be used to apply force to other solid objects (i.e., to push them). You'd have to know that balls are generally light enough to be pushed by a crutch. You'd have to know that some ways of applying force will work in some situations but not in others (e.g., John and Mary may be close to each other, or indoors, making it impractical, and even dangerous, to swing the crutch like a baseball bat to hit the ball over to Mary). And if the ball were far enough from John that he had to fully extend his arm and use the full length of the crutch to reach it, you'd have to know that the mechanics of the situation would probably require him to hold the bottom of the crutch and use the top (the part that goes under your arm when walking with a crutch) to hit the ball, while if the ball were closer, it might be easier to use the crutch the other way around. The list could go on and on. And the machine would have to know things like this for every novel verb it came across (and novel verbs, particularly denominal verbs like "crutched," are pretty common in everyday language).
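To see how much hand-coding even a crude version of this demands, here is a toy sketch of affordance-based reasoning. Every piece of knowledge in it -- the affordances, the context features, the feasibility rules -- is hypothetical and hand-written, which is exactly the problem: someone would have to write the equivalent for every object a novel verb might invoke:

```python
# Toy sketch (hypothetical): hand-coded affordances of a crutch,
# filtered by what the current context permits.

CRUTCH_AFFORDANCES = [
    # (candidate action, context features the action requires)
    ("push the ball with the hard tip", {"ball_reachable"}),
    ("swing like a baseball bat to hit the ball", {"open_space", "safe_to_swing"}),
    ("hold the bottom, strike with the top for extra reach", {"ball_far"}),
]

def feasible_actions(context):
    """Keep only the affordances whose requirements the context satisfies."""
    return [action for action, needs in CRUTCH_AFFORDANCES
            if needs <= context]  # set containment: all requirements present

# Standing across a table, indoors: no room to swing safely,
# and the ball is far enough to need the crutch's full length.
context = {"ball_reachable", "ball_far"}
for action in feasible_actions(context):
    print(action)
```

Even this caricature needs a separate affordance table per object and a separate feature set per situation; the hard part the post is pointing at -- where that knowledge comes from, and how the relevant bits get activated -- is precisely what the sketch takes for granted.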
So that's the problem I saw back then, and still see today. In order to write a program that can understand the meaning of sentences with which you and I would have no trouble, you basically have to program in most or all of the knowledge of at least a well-developed human child, if not an adult. And I don't see how that's really possible. It certainly doesn't seem to be possible today, since we don't have a firm understanding of how people reason about the mechanics of situations like the one in (2), or how they activate the relevant background knowledge (the paper linked above gives one potential answer, in the form of the "Indexical Hypothesis"). If a machine doesn't have that level of knowledge, every time it gets a novel verb, it's going to be lost.