Natural Language Processing

What is natural language processing and what are its applications?


Natural language processing (NLP) brings together two seemingly distant disciplines: linguistics and artificial intelligence. This field of computer science, which transforms natural language into a formal language that computers can process, such as a programming language, is constantly evolving, and its applications keep growing.

NLP allows a machine to process natural language and generate answers automatically.

If you have ever asked Alexa or Siri for the time, you will have realised that you do not always have to ask the question in the same way. You can ask "what time is it?" or "can you tell me the time?" and in both cases receive an appropriate response. The same is true of Google's automatic translator, which detects the nuances between different words depending on the context. These examples, and many more, have something called natural language processing (NLP) behind them.

WHAT IS NATURAL LANGUAGE PROCESSING (NLP)?

According to IBM's definition, natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence, concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. The technology has advanced considerably thanks to machine learning, big data, the internet of things and neural networks.

Some of the most important applications focus on business intelligence, automatically analysing customer reactions through their comments on the internet or the questions they ask when seeking information. Chatbots are another application: although there is still much room for improvement, they streamline interaction with customers through chats or telephone answering services by offering quick, automatic answers in natural language.

Natural language processing has its roots in the 1950s, when Alan Turing published the paper "Computing Machinery and Intelligence" (1950), in which he proposed what is now known as the Turing Test. The test examines the ability of a machine to exhibit intelligent behaviour comparable to that of a human being. Since then, advances in the algorithms behind this technology have made the current progress possible.

The evolution of natural language processing and its algorithms.


HOW DOES NATURAL LANGUAGE PROCESSING WORK?

The first models of natural language analysis were symbolic: the rules of the language were encoded by hand. This made it possible to distinguish, for example, the tenses and conjugations of verbs and to extract the meaning of the root. The 1980s and 1990s saw the statistical revolution. Instead of writing sets of rules (and exceptions), NLP systems began to use statistical inference algorithms to analyse existing texts and search them for patterns.
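
As a rough illustration of the statistical approach, the sketch below learns from a handful of example sentences which word tends to follow which, instead of relying on hand-written rules. The corpus and the function names `train_bigrams` and `most_likely_next` are invented for this example; real systems learn from far larger text collections and use more elaborate models.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny illustrative corpus (an assumption, not real training data).
corpus = [
    "what time is it",
    "can you tell me the time",
    "what time is the meeting",
    "tell me the time please",
]
model = train_bigrams(corpus)

print(most_likely_next(model, "the"))     # most frequent word after "the"
print(most_likely_next(model, "banana"))  # unseen word, so no prediction
```

No grammar rule is written anywhere in the code: the pattern comes entirely from counting what appears in the example texts.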

The advantage of statistical models is that they cope better with unfamiliar words and with errors such as misspelled or accidentally omitted words. Most current systems combine symbolic and statistical models. In particular, natural language processing systems perform several types of analysis:

  • Morphological: focuses on distinguishing the different types of words (verbs, nouns, prepositions, etc.) and their variations (gender, number, tense, etc.).
  • Syntactical: separates sentences from each other and analyses their constituent parts (subject, verb, object) in order to determine how they are structured.
  • Semantic: analyses the meaning, not only of individual words, but also of the sentences of which they are part and of the discourse as a whole.
  • Pragmatic: extracts the intention of the text from its context, which makes it possible to account for factors such as irony, ambiguity or mood.
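
The first two levels in the list above can be sketched in a few lines. The example below is a toy under stated assumptions: the `LEXICON` dictionary and the suffix checks are hypothetical stand-ins for a trained tagger, and the semantic and pragmatic levels are left out because they cannot be meaningfully shown at this size.

```python
import re

# Hypothetical mini-lexicon; a real system would use a trained tagger.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "dogs": "NOUN", "cat": "NOUN",
    "chased": "VERB", "sleeps": "VERB",
}


def split_sentences(text):
    """Syntactic step (very rough): split after sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def analyse_word(token):
    """Morphological step: word class plus a guess at number or tense."""
    tag = LEXICON.get(token, "UNKNOWN")
    features = {}
    if tag == "NOUN":
        features["number"] = "plural" if token.endswith("s") else "singular"
    elif tag == "VERB":
        features["tense"] = "past" if token.endswith("ed") else "present"
    return token, tag, features


text = "The dogs chased a cat. The cat sleeps."
analysis = [[analyse_word(t) for t in tokenize(s)] for s in split_sentences(text)]

for sentence in analysis:
    print(sentence)
```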

APPLICATIONS OF NATURAL LANGUAGE PROCESSING (EXAMPLES)

Your word processor's spellchecker and your phone's autocorrect use natural language processing techniques, but the applications go much further:

 Document classification

The task of classifying large numbers of documents according to subject matter or style can be streamlined with NLP systems.
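
One common way to approach this is a naive Bayes classifier, which assigns a document to the subject whose training texts share most of its vocabulary. The sketch below is a minimal version with add-one smoothing; the four training sentences and the labels "energy" and "sport" are made up for illustration, and the article does not say which algorithm any particular product uses.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Collect per-label word counts, per-label document counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative training data (an assumption, not a real dataset).
training = [
    ("wind turbines generate clean energy", "energy"),
    ("solar panels and wind farms", "energy"),
    ("the team won the football match", "sport"),
    ("a late goal decided the match", "sport"),
]
model = train(training)

print(classify("wind energy prices", *model))
print(classify("football goal", *model))
```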

 Sentiment and opinion analysis

Comments on social networks about products and services are extremely important to companies, and NLP systems can extract relevant information from them, such as whether an opinion is favourable or unfavourable.
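
The simplest form of sentiment analysis counts opinion words from a predefined list. The sketch below assumes three small, hypothetical word sets (`POSITIVE`, `NEGATIVE`, `NEGATIONS`) and only flips the sign of the word directly after a negation; production systems typically rely on trained models and much larger lexicons.

```python
import re

# Hypothetical word lists for illustration; real lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "love", "fast", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow", "broken"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(comment):
    """Add +1 or -1 per opinion word, flipping the word right after a negation."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", comment.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value:
            score += -value if negate else value
        negate = False  # a negation only affects the next token
    return score


def label(comment):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in [
    "Great product, fast delivery",
    "The service was not good",
    "The parcel arrived on Monday",
]:
    print(comment, "->", label(comment))
```

A word list like this cannot detect irony or context, which is one reason the pragmatic level of analysis is hard.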

 Text comparison

NLP systems make it possible to find patterns in texts and detect matches between them, which facilitates plagiarism detection and quality control.
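
One simple way to detect matches between texts is to break each text into overlapping word sequences ("shingles") and measure how many they share, a measure known as Jaccard similarity. The sketch below assumes three-word shingles and invented sample sentences; real plagiarism detectors combine several such techniques.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Sample texts invented for the example.
original = "NLP systems make it possible to find patterns in texts"
suspect = "These systems make it possible to find patterns in documents"
unrelated = "The football match was decided by a late goal"

print(jaccard_similarity(original, suspect))    # high overlap
print(jaccard_similarity(original, unrelated))  # no overlap
```

A score close to 1.0 suggests that one text was largely copied from the other, while a score near 0.0 means they share almost no phrasing.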

 Document anonymisation

Through NLP systems, documents can be processed to identify and remove mentions of personal data, thus ensuring the privacy of individuals and institutions.
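
For data with a predictable shape, such as e-mail addresses and phone numbers, pattern matching already goes a long way. The sketch below uses two deliberately simplified regular expressions and an invented sample sentence; it leaves the person's name untouched, because recognising names generally requires a trained entity-recognition model rather than a fixed pattern.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def anonymise(text):
    """Replace e-mail addresses and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Contact Ana at ana.perez@example.com or +34 600 123 456."
print(anonymise(sample))
```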

NATURAL LANGUAGE PROCESSING TOOLS

Numerous companies offer software tools for applying natural language processing techniques. These tools are built with standard programming languages, especially Python, the most widely used for this purpose:

  • Natural Language Toolkit (NLTK): a Python library with a modular structure that supports NLP functions such as tagging and classification, among others.
  • MonkeyLearn: an NLP platform that provides models for text and sentiment analysis, topic classification and keyword extraction.
  • IBM Watson: a set of AI services hosted in IBM's cloud that offers NLP capabilities, enabling the identification and extraction of categories, sentiments, entities, etc.
  • Google Cloud Natural Language: a natural language API that provides several models for sentiment analysis, content classification and entity extraction, among others.
  • Amazon Comprehend: an NLP service integrated into the Amazon Web Services infrastructure for sentiment analysis, topic modelling and entity recognition, among others.
  • spaCy: an open-source library for NLP in Python. It is one of the more recent tools and is easy to use, having been designed to analyse large volumes of data.
  • Gensim: a specialised Python library focused on topic modelling, recognising similarities between texts and indexing documents.