
Using Marqo and GPT3 for topical news summarisation

I wanted to build a fun search application within minutes to show the ease and power of Marqo. I decided to build a news summarisation application that answers questions such as “What is happening in business today?” by synthesising an example news corpus (link).

The plan is to use Marqo’s search to provide useful context for a generation algorithm; here we use OpenAI’s GPT3 API (link). This approach is more formally called “retrieval-augmented generation” and helps with generation tasks that require specific knowledge the model has not seen during training, for example company-specific documents or news that is “in the future” relative to the model’s training data. In outline, we index the news documents in Marqo, search for context relevant to the question, and pass that context to GPT3 to generate the answer.

We can see the problem when we ask GPT3 on its own, “What is happening in business today?” It has no knowledge of current events and so generates a generic response:

Question: What is happening in business today?
Answer: There is a lot happening in business today. The economy is slowly recovering from the recession, and businesses are starting to invest again...

In fact, anyone following the financial markets knows that “the economy is slowly recovering” and “businesses are starting to invest again” are completely wrong!

To solve this, we first start the Marqo Docker container, which exposes the API on port 8882 that we’ll call from Python during this demo:

docker pull marqoai/marqo:0.0.6;
docker rm -f marqo;
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:0.0.6

Next, let’s look at our example news corpus, which contains BBC and Reuters news content from the 8th and 9th of November 2022. Each document has an “_id” that serves as the Marqo document identifier, the “date” the article was written, a “website” field giving the web domain, a “Title” for the headline, and a “Description” for the article body:

[
{
'_id': '2',
'date': '2022-11-09',
'website': 'www.bbc.co.uk',
'Title': 'COP27: Time to pay the climate bill - vulnerable nations',
'Description': 'Leaders of countries flooded or parched due to climate change are pleading at the COP27 summit...'
},...
]
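
Because the search later filters on the “date” field, it can help to check each document before indexing. The following is a minimal sketch, not part of the original demo: the helper name `validate_document` and the choice to require all five fields are assumptions made for illustration.

```python
from datetime import datetime

# Fields described in the text above; requiring all of them is an assumption.
REQUIRED_FIELDS = ('_id', 'date', 'website', 'Title', 'Description')


def validate_document(doc):
    """Return doc unchanged if it has the expected fields and an ISO date.

    Hypothetical helper: raises ValueError on a missing field or a date
    that is not in YYYY-MM-DD form.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"document is missing fields: {missing}")
    # strptime raises ValueError if the date string has a different format
    datetime.strptime(doc['date'], '%Y-%m-%d')
    return doc


example = {
    '_id': '2',
    'date': '2022-11-09',
    'website': 'www.bbc.co.uk',
    'Title': 'COP27: Time to pay the climate bill - vulnerable nations',
    'Description': 'Leaders of countries flooded or parched due to climate change...',
}
validate_document(example)
```

Keeping dates in one consistent string format matters here because the filter used later matches the “date” value exactly.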

We then index our news documents; the index manages both the lexical and the neural-embedding representations. By default, Marqo uses SBERT for neural text embedding and has complete OpenSearch lexical and metadata functionality natively.

import marqo

DOC_INDEX_NAME = 'news-index'
print('Establishing connection to marqo client.')
mq = marqo.Client(url='http://localhost:8882')
print('Indexing documents')
# MARQO_DOCUMENTS is the list of news document dictionaries shown above
mq.index(DOC_INDEX_NAME).add_documents(MARQO_DOCUMENTS)

Now that we have indexed our news documents, we can use Marqo’s Python search API to return relevant context for our GPT3 generation. The query “q” is the question itself, and we match news context against the “Title” and “Description” text. We also filter our documents for “today”, which was '2022-11-09'.

question = 'What is happening in business today?'
date = '2022-11-09'
results = mq.index(DOC_INDEX_NAME).search(
    q=question,
    searchable_attributes=['Title', 'Description'],
    filter_string=f"date:{date}",
    limit=5)
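
The search response is a dictionary whose 'hits' list holds the matching documents. As a rough sketch of how those hits could be turned into numbered “Source i) Title || Description” context lines, consider the helper below. The name `format_context` and the `max_chars` truncation length are assumptions for illustration, and the published example code may format the context differently.

```python
def format_context(results, max_chars=300):
    """Build a numbered context string from a Marqo-style search response.

    Hypothetical helper: `results` is assumed to be a dict with a 'hits'
    list of documents carrying 'Title' and 'Description' fields.
    """
    lines = []
    for i, hit in enumerate(results.get('hits', [])):
        title = hit.get('Title', '')
        body = hit.get('Description', '')
        # Truncate long article bodies so the prompt stays short
        if len(body) > max_chars:
            body = body[:max_chars] + '...'
        lines.append(f"Source {i}) {title} || {body}")
    return "\n".join(lines)


# Stand-in response with the same shape as a search result
fake_results = {
    'hits': [
        {'Title': 'Facebook-owner Meta to cut 11,000 staff',
         'Description': 'Meta has announced that it will cut 13% of its workforce.'},
    ]
}
print(format_context(fake_results))
```

Truncating each description is a design choice that keeps the prompt within the model's input limit; the right cut-off depends on how many sources are included.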

Next, we insert Marqo’s search results into the GPT3 prompt as context and try generating an answer again:

Background:
Source 0) M&S warns of 'gathering storm' as shoppers squeezed || Marks and Spencer has warned of a "gathering storm" of higher costs for retailers and pressure on household budgets as it reported a fall in profits for the first half of the year. The High Street giant said trading would become "more challenging" after it revealed its profits dropped by 24%. It said "all parts" of retail would be affected...
Source 1) Facebook-owner Meta to cut 11,000 staff || Meta, which owns Facebook, Instagram and WhatsApp, has announced that it will cut 13% of its workforce. The first mass lay-offs in the firm's history will result in 11,000 employees, from a worldwide headcount of 87,000, losing their jobs. Meta chief executive Mark Zuckerberg said the cuts were "the most difficult changes we've made in Meta's history". The news follows...
Source 2) Allianz beats quarterly profit expectations, posts rosier 2022 outlook || German insurer Allianz on Wednesday posted a better-than-expected 17% rise in third-quarter net profit, helped by strength at its property and casualty division, and gave a more optimistic full-year outlook....
Source 3) Georgia race goes to run-off as fight for US Senate neck-and-neck || Results are being declared in the US midterm elections, with control of Congress hanging in the balance. Republicans are likely to take control of the House of Representatives but the Senate fight is on a knife-edge. The race for the Senate seat in Georgia, which could determine the outcome, will not be decided until a runoff election on 6 December....
Source 4) Tesla stock hits 2-year low after Musk sells $4 bln worth of shares || Tesla Inc (TSLA.O) shares slid to their lowest level in nearly two years on Wednesday after Chief Executive Elon Musk sold $3.95 billion worth of shares in the electric-vehicle maker. The shares were down 6.1% at $179.66 in afternoon trading. Musk's latest share sale fueled jitters about the fallout of his Twitter buy on the world's most valuable automaker, analysts...
Question: What is happening in business today?
Answer: There are a few major stories happening in business today. Firstly, Marks and Spencer has warned of a "gathering storm" of higher costs for retailers and pressure on household budgets. Secondly, Facebook-owner Meta is cutting 11,000 staff in its first mass lay-offs. And finally, Tesla stock has hit a 2-year low after CEO Elon Musk sold $4 billion worth of shares.
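
The prompt shown above follows a simple “Background / Question / Answer” layout. A minimal sketch of assembling it is given below. The function name `build_prompt` is an assumption, and the call to the GPT3 API is deliberately left out, since it depends on the client library version and an API token.

```python
def build_prompt(context, question):
    """Assemble a retrieval-augmented prompt from context and a question.

    Hypothetical helper: the layout mirrors the example above, ending with
    'Answer:' so the model continues from that point.
    """
    return f"Background:\n{context}\n\nQuestion: {question}\nAnswer:"


context = "Source 0) Facebook-owner Meta to cut 11,000 staff || Meta has announced..."
prompt = build_prompt(context, 'What is happening in business today?')
print(prompt)
```

The resulting string is what would be sent to the generation model as its input.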

Success! Using Marqo to add relevant and temporally correct context means we can build a news summarisation application with ease. Instead of wrong and vague answers, we get factually grounded summaries based on retrieved facts such as:

  1. Marks and Spencer has warned of a “gathering storm” of higher costs for retailers
  2. Facebook-owner Meta is cutting 11,000 staff
  3. Tesla stock has hit a 2-year low after CEO Elon Musk sold $4 billion worth of shares

Full code: here (you’ll need a GPT3 API token)

Visit Marqo on GitHub: https://github.com/marqo-ai/marqo/tree/mainline/examples/GPT3NewsSummary

Iain Mackie
Investor @ Creator Fund