Text Summarization using Python Programming

Python is a powerful programming language. It is quite easy to pick up due to its simplicity. However, just like anything worthwhile, it takes practice to become good at it.

Recently, Python has sky-rocketed in popularity due to its immense applications in AI development. You probably already know about OpenAI and their miraculous chatbot called ChatGPT.

It spun the world into an AI race. Well, you too can be a part of that race by learning to program in Python. Today, we are going to look at how you can use Python 3 to summarize text. This is a good exercise for grasping the basics of Python and text manipulation, and for seeing how AI models can help you achieve your programming objectives.

How to Summarize A Text Using Python

This process is simple, but it will take some time and basic technical knowledge of Python. For the purpose of this tutorial, we will be using a Google Colab notebook. The benefit of this is that you won’t need to worry about a lot of dependencies. Without further ado, let’s start.

What Is Text Summarization?

Text summarization is the process of creating a condensed version of a given text. The condensed and smaller version only provides the crucial details and cuts away anything that is not strictly required.

There are two types of summaries: extractive and abstractive. An extractive summary only uses sentences from the source material, without modifying them. An abstractive summary takes the liberty of paraphrasing the source material while shortening it. In this tutorial, we are going to check out the abstractive method.
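
To make the distinction concrete, here is a minimal sketch of the extractive approach, which the rest of this tutorial does not use. The function name extractive_summary and the word-frequency scoring are illustrative assumptions, not a standard algorithm from any library. The point to notice is that the result consists only of sentences copied from the input.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=1):
    """Return the highest-scoring sentences, unmodified and in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Count how often each lowercase word appears in the whole text.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best ones, then restore order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


sample = (
    "Python is a programming language. "
    "Python is popular because Python is simple. "
    "Many people enjoy hiking."
)
print(extractive_summary(sample, max_sentences=1))
# Python is popular because Python is simple.
```

An abstractive model such as Pegasus, by contrast, is free to produce wording that appears nowhere in the input.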

Installing the Required Python Libraries

Like most popular programming languages, Python has libraries for a wide variety of tasks. For text summarization, we are going to use the Pegasus model. Pegasus is a pretrained model for abstractive summarization, and it is available through a library called transformers.

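To install the transformers library, run the following command in a notebook code cell. The Pegasus tokenizer relies on the sentencepiece package, so it is installed at the same time.

!pip install transformers sentencepiece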

The transformers library also needs PyTorch to run the model. PyTorch is a Python framework for deep learning and neural networks, and it is a prerequisite here, so install it first if your environment does not already have it.

The following command installs the PyTorch 1.8 LTS build for CUDA 11.1:

!pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio===0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html

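If this pinned build is not available for your Python version, installing the current release with !pip install torch is the usual alternative. And that is all for installing the libraries.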
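
Before moving on, it can help to confirm that the packages are actually importable. The following is a small sketch using only the standard library. The helper name missing_packages is hypothetical, and the list of required modules is an assumption based on this tutorial (sentencepiece is included because the Pegasus tokenizer depends on it).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed for this tutorial; sentencepiece is used by the Pegasus tokenizer.
required = ["torch", "transformers", "sentencepiece"]
missing = missing_packages(required)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the matching install command before continuing.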

Importing and Loading the Pegasus Model

Now, we have to import and load the Pegasus model from the transformers library. The model does not read raw text. Instead, a tokenizer first breaks the text into tokens and maps them to numeric IDs. So, the first thing we need to do is to load the tokenizer.

To load the tokenizer, first run the following import.

from transformers import PegasusForConditionalGeneration, PegasusTokenizer

Run it, and then immediately after, write and run the following command.

tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")

Now, the tokenizer has been loaded.

However, we are not done yet. There is one more command to run, which loads the model itself.

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

After you have run all of these commands, you will see some files being downloaded. They are around 2 gigabytes in size, so it may take a few minutes before they are ready.

Now that this is done, let’s move on to the summarization.

Abstractive Summarization

The summarizing process works like this: we provide some text, tokenize it, and pass the tokens to the summarizer model. The output is another set of tokens, which must be decoded before we can read the summary.

Now, let’s do all of that to show you what happens.

So, first of all, we need to enter some text. You can do that by writing:

text = """Your sample text goes here.
"""

We used some text from the Wikipedia page about Python.

Then we pass this text to the tokenizer. The code for that is the following.

tokens = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")

This returns the encoded text: a dictionary-like object holding input_ids and attention_mask tensors.

The numbers in input_ids represent the tokens of the text.
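
The truncation and padding arguments are easier to picture with plain lists. The sketch below is a toy illustration, not the library’s implementation: the function pad_and_truncate, the token IDs, and the pad_id default of 0 are all made up for the example, and the end-of-sequence token that the real tokenizer adds is left out.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Toy version of truncation=True with padding="longest" for lists of token IDs."""
    # Truncation: cut every sequence down to at most max_length tokens.
    truncated = [ids[:max_length] for ids in batch]
    # Padding to "longest": extend shorter sequences to the longest one in the batch.
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs purely for illustration.
batch = [[11, 12, 13, 14, 15], [21, 22]]
encoded = pad_and_truncate(batch, max_length=4)
print(encoded["input_ids"])       # [[11, 12, 13, 14], [21, 22, 0, 0]]
print(encoded["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

With a single input text, as in this tutorial, padding has nothing to do. Truncation still matters, because any text beyond the model’s maximum input length is cut off before summarizing.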

To summarize these tokens we need to use the following code.

summary = model.generate(**tokens)

This generates the summary, but as token IDs rather than readable text. To look at those IDs, run this command.

summary[0]

The model returns a batch of results, with one sequence of token IDs per input text. Since we passed in a single text, index 0 selects its summary. Once you run this command, you should see a tensor of integer token IDs.

Now, we need to decode these tokens to see the actual summary. The code for that is:

tokenizer.decode(summary[0])

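The output is the summary as plain text. With the pegasus-xsum model, it is typically a single sentence. It may be wrapped in special tokens such as <pad> and </s>, which you can remove by calling tokenizer.decode(summary[0], skip_special_tokens=True) instead.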

This is a highly concise abstractive summary. And that is how you can use the Pegasus transformer in Python to create a summary.
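
The individual steps above can be wrapped into one reusable function. This is a sketch under a few assumptions: the helper names summarize and load_pegasus are our own, and skip_special_tokens=True is added so that markers such as <pad> and </s> do not show up in the result. Any extra keyword arguments are forwarded to model.generate, for example num_beams or max_length.

```python
def summarize(text, tokenizer, model, **generate_kwargs):
    """Tokenize text, generate summary token IDs, and decode them to a string."""
    tokens = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")
    summary_ids = model.generate(**tokens, **generate_kwargs)
    # generate() returns one row per input text; row 0 belongs to our single text.
    # skip_special_tokens=True drops markers such as <pad> and </s> from the output.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


def load_pegasus(model_name="google/pegasus-xsum"):
    """Load tokenizer and model; imported lazily because the download is large."""
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


# Usage in the notebook (commented out here because it downloads the model):
# tokenizer, model = load_pegasus()
# print(summarize(text, tokenizer, model, num_beams=4, max_length=60))
```

Keeping the tokenizer and model as parameters means they are loaded once and reused for every text you want to summarize.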

Alternate Method of Summarizing Text with Python

There is an easier way of summarizing texts with Python. However, it is a bit of a stretch to call it summarizing text with Python. That’s because this method does not involve using Python directly.

Instead, you use an online text summarizer that shortens text for you using Python in its backend. Such summarizing tools are extremely easy to use and they are quite flexible as well. Now, let’s see how you can create a summary with the help of such tools.

You need to start by looking for such tools online. You can do that by Googling “Summarizer” or any variation of that word. This will show you plenty of search results that contain tools for summarizing texts.

Then you need to pick one. We suggest one that has the following properties:

Is free or freemium
Has options for adjusting output length
Has multiple options for inputting text as well as downloading it

If you can find such a tool, then input your content into it. Here is an example to show you what that might look like. (For reference, the tool used in the screenshots below is Summarizer.org. You can try it out yourself if you want. It has all the qualities we listed above.)

Once you tweak the settings and start the process, your output should look like this.

Now, your summary is ready and you can use it however you like. So, that was the other method of summarizing texts with Python.

Conclusion

Those were two ways of summarizing text. One method was to use the Pegasus model to create an abstractive summary. The other was to rely on an online tool instead. Both ways have their benefits. However, if you are just looking for convenience, then the second method is the better choice. If you want to use the exact code as was used in this article, you can refer to this source.