4 Tips for Using Watson Discovery

Pedro Matias de Araujo
May 5, 2022

Disclaimer: This is just my opinion, based on my experience with Watson Discovery.

I’ve been working with Watson Discovery for over a year now, so I felt motivated to write this article to share my experience and give some tips to anyone who wants to work with this tool.

What’s Watson Discovery?

Watson Discovery is an award-winning AI-powered intelligent search and text analytics platform. It eliminates data silos and recovers hidden information on corporate data. The platform uses innovative, market-leading natural language processing to uncover meaningful business insights from documents, webpages, and big data, which can cut research time by more than 75%. (source)

Some use cases

  • Commentaries
  • Blog posts
  • Documents

Basically, any static text can be fed into Watson Discovery to extract value from it and return the results in a well-structured form.

Like all AI tools, it’s not magic. It’s not enough to just throw things on the platform and expect everything to magically resolve itself.

What to know when starting to work with Watson Discovery

It's an AI tool, not a performance tool

In the early stages, I expected performance on par with Elasticsearch. Watson Discovery's response times were satisfactory, but after some time I understood that raw speed is not the focus of this tool. It analyzes thousands of files and runs complex queries, so it’s normal to wait a few seconds for an answer.

Expect a large volume of data

In my experience we only use PDFs, and after processing, the JSON returned is huge, often more than 1 MB per document. So if I ask for 50 documents in one request, we probably get 50~60 MB back from Watson Discovery. Memory overflows were common until we understood what was happening.
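One way to keep memory bounded is to pull results in small pages instead of one large request. The sketch below is only an illustration under stated assumptions, not the code we shipped: `fetch_page` is a hypothetical callable standing in for whatever search call you use (assumed to accept an offset and a count and to return a list of result dicts), and the page size of 5 is arbitrary.

```python
from typing import Callable, Dict, Iterator, List

Result = Dict[str, object]


def iter_results(
    fetch_page: Callable[[int, int], List[Result]],
    total: int,
    page_size: int = 5,
) -> Iterator[Result]:
    """Yield up to `total` results, holding only one page in memory at a time.

    `fetch_page(offset, count)` is a hypothetical wrapper around the search
    call; it must return at most `count` results starting at `offset`.
    """
    offset = 0
    while offset < total:
        count = min(page_size, total - offset)
        page = fetch_page(offset, count)
        if not page:
            break  # the backend has no more results
        yield from page
        offset += len(page)


def estimated_payload_mb(documents: int, mb_per_document: float) -> float:
    """Rough response size if every document weighs `mb_per_document` MB."""
    return documents * mb_per_document


if __name__ == "__main__":
    # Fake backend with 12 documents, used only to demonstrate the paging.
    fake_docs = [{"document_id": f"doc-{i}"} for i in range(12)]

    def fake_fetch(offset: int, count: int) -> List[Result]:
        return fake_docs[offset:offset + count]

    for doc in iter_results(fake_fetch, total=50, page_size=5):
        print(doc["document_id"])

    # 50 documents at about 1 MB each
    print(estimated_payload_mb(50, 1.0))
```

With pages of 5 documents at roughly 1 MB each, the process holds about 5 MB of results at a time rather than 50 MB or more.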

Beware of document updates

For security, we send some metadata along with the PDF file. This lets us filter file permissions directly inside Discovery, but it also means that if some metadata changes, you need to update it there. However, we found no option to update only the metadata, so we had to upload the same PDF again with the new metadata.
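Since every metadata change meant a full re-upload for us, it helps to re-upload only the files whose metadata actually changed. Here is a minimal sketch, assuming you keep your own record of a fingerprint of the metadata last sent for each file. The function names and the metadata fields are hypothetical and are not part of any Watson SDK.

```python
import hashlib
import json
from typing import List, Mapping


def metadata_fingerprint(metadata: Mapping[str, object]) -> str:
    """Stable hash of a metadata dict (key order does not matter)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_reupload(
    current: Mapping[str, Mapping[str, object]],
    last_sent: Mapping[str, str],
) -> List[str]:
    """Return file names whose metadata is new or changed since the last upload."""
    return sorted(
        name
        for name, metadata in current.items()
        if last_sent.get(name) != metadata_fingerprint(metadata)
    )


if __name__ == "__main__":
    # Fingerprints stored after the previous upload (hypothetical data).
    last_sent = {"report.pdf": metadata_fingerprint({"groups": ["hr"]})}
    current = {
        "report.pdf": {"groups": ["hr"]},      # unchanged, skip it
        "contract.pdf": {"groups": ["legal"]}  # never sent, upload it
    }
    print(needs_reupload(current, last_sent))
```

Only the files returned by `needs_reupload` have to go through processing again, which matters once each upload costs minutes.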

On average, a huge PDF file takes around 6~8 minutes to be processed by Watson. That is not a big number in absolute terms, but when you scale up and have thousands of files to process, it starts to be a problem.
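A back-of-the-envelope estimate shows how quickly this adds up. It assumes files are processed one after another, which is a simplifying assumption rather than a statement about how Watson schedules ingestion, and the batch of 1,000 files is just an example.

```python
def processing_hours(files: int, minutes_per_file: float) -> float:
    """Total hours if files are processed one at a time (an assumption)."""
    return files * minutes_per_file / 60


if __name__ == "__main__":
    files = 1000  # hypothetical batch size
    low = processing_hours(files, 6)
    high = processing_hours(files, 8)
    print(f"{files} files: between {low:.0f} and {high:.0f} hours")
```

Under those assumptions, 1,000 files take roughly 100 to 133 hours, which is more than four days of processing.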

Some weird things happen sometimes

There is a React library that takes the returned JSON and renders a page view. It works great with PDFs and saves a lot of development time, because that asset can be reused.

Sometimes, as with any AI tool, things don’t go as expected. For example, a passage is highlighted in one file, but in another document that is basically the same file with a few modifications, it is not.

Why? I don’t know.

Final thoughts

Watson Discovery is truly an amazing tool, but as with almost any tool, not everything is perfect. To get good results with Watson Discovery, it’s important to understand what it does and doesn’t do.

Like every tool out there, Watson Discovery is constantly being developed and improved. I believe that the more we work with something, the more opportunities we have to learn about it and help fix potential problems.

I think it’s always worth the experience.
