Natural Language Understanding (NLU) works by exploiting people's ability to model reality when they talk or write.
People use language to think, so language reflects the human thought process. People think by modeling a problem and the situation in which it occurs, then asking and answering a set of questions to refine their understanding. This is a creative process, and there is only a very small chance that computers will ever master creative thinking.
But we can use what people say or write, together with the properties of language, to visualize what they are thinking about. That is, we can convert text into the models that people hold in their minds when they write or talk. These models are the actual understanding. They can be visualized when people talk about things that happen in real life (commonsense knowledge).
Commonsense knowledge is the life experience that everybody has. Having similar experiences helps people understand each other.
A visual reference can help a lot in the process of understanding, for both computers and people.
Understanding is usually an interactive process. When people don’t understand, they ask clarifying questions. So our NLU is a chatbot that can ask and answer questions and do some simple logical reasoning. It visualizes what it “understood” (in quotes, since it doesn’t have any consciousness). And a person can correct the bot on seeing that they were misunderstood.
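To make the loop concrete, here is a minimal sketch of the idea, not the actual system. It assumes a toy sentence pattern ("the X is on/in/under/near the Y"), and the names `ToyBot`, `SceneModel`, `tell`, `ask`, and `describe` are hypothetical, invented for illustration. A plain-text description stands in for the visualization.

```python
import re
from dataclasses import dataclass, field

# Assumed toy grammar: "<subject> is <relation> <object>", e.g. "The cat is on the table".
PATTERN = re.compile(
    r"^(?:the |a )?(\w+) is (on|in|under|near) (?:the |a )?(\w+)\.?$",
    re.IGNORECASE,
)


@dataclass
class SceneModel:
    """The bot's model of the situation: entity -> (relation, reference object)."""
    facts: dict = field(default_factory=dict)

    def describe(self):
        # Text stand-in for a visualization of what the bot "understood".
        return [f"{s} {r} {o}" for s, (r, o) in sorted(self.facts.items())]


class ToyBot:
    def __init__(self):
        self.scene = SceneModel()

    def tell(self, sentence):
        """Convert a sentence into the model, or ask a clarifying question."""
        match = PATTERN.match(sentence.strip())
        if match is None:
            return "I did not understand. What is where?"
        subject, relation, obj = (g.lower() for g in match.groups())
        # A later statement about the same subject overwrites the earlier one,
        # which is how a person corrects a misunderstanding.
        self.scene.facts[subject] = (relation, obj)
        return f"I understood: {subject} {relation} {obj}. Is that right?"

    def ask(self, entity):
        """Answer a 'where is X?' question from the model, or ask back."""
        fact = self.scene.facts.get(entity.lower())
        if fact is None:
            return f"I don't know. Where is the {entity}?"
        relation, obj = fact
        return f"The {entity.lower()} is {relation} the {obj}."


if __name__ == "__main__":
    bot = ToyBot()
    print(bot.tell("The cat is on the table"))
    print(bot.ask("dog"))                          # the bot asks a clarifying question
    print(bot.tell("The cat is under the table"))  # the person corrects the bot
    print(bot.scene.describe())
```

The person sees the bot's description after every statement, so a misunderstanding shows up immediately and can be fixed by restating the fact. A real system would replace the regular expression with proper language analysis and the text description with an actual picture.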