Text and Data Mining (TDM) is the use of computational techniques to extract insights and patterns from large volumes of information. Organisations use TDM to gain competitive advantage and improve efficiency by uncovering hidden trends and extracting facts and concepts from text and data.

CLA licences will soon include rights covering use of published content for TDM purposes. This does not cover the use of content in training or prompting Generative AI models. We already have over 120 participating publishers, and we plan to make these new rights available to all UK businesses and public sector organisations in early 2025.

What is text and data mining?

Text and data mining is the process of transforming unstructured content into a structured format to analyse, extract and identify meaningful information and insights. By using TDM, organisations can harness the power of vast volumes of information and data, capturing and revealing key concepts, trends, and hidden relationships. Organisations use TDM for market research, sentiment analysis, text classification and customer analysis.

This computational technique provides valuable information to organisations for studies and research and to aid decision-making.
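
As a rough illustration of turning unstructured content into a structured format, the sketch below counts terms across a few short texts. The sample reviews, the stop-word list and the function names (`tokenize`, `mine_terms`) are hypothetical and chosen only for this example. Real TDM pipelines are considerably more involved.

```python
import re
from collections import Counter

# Hypothetical sample reviews standing in for unstructured text.
REVIEWS = [
    "Delivery was fast and the support team was helpful.",
    "Slow delivery, but the product quality is great.",
    "Great product. Fast delivery!",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "and", "but", "is", "the", "was"}


def tokenize(text):
    """Lower-case the text, split it into alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def mine_terms(documents):
    """Return one structured record per document plus corpus-wide term counts."""
    records = []
    totals = Counter()
    for doc_id, text in enumerate(documents):
        counts = Counter(tokenize(text))
        records.append({"doc_id": doc_id, "terms": dict(counts)})
        totals.update(counts)
    return records, totals


records, totals = mine_terms(REVIEWS)

# Print the three most frequent terms across all documents.
print(totals.most_common(3))
```

Each entry in `records` is a structured row that could be loaded into a database or spreadsheet, while `totals` gives a simple view of recurring themes across the whole collection.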

TDM Licensing permissions

  • The right to download, extract from, and format, using computational technical means, the licensed content on the licensee’s computer servers (including cloud-based servers) to enable the use of licensed content for the permitted purposes.
  • The right to create one’s own digital copy from print publications for the purpose of text and data mining.
  • The right to create a central repository with retention of mined licensed content (for the duration of the term of the licence only) – subject to the licensee agreeing to industry-standard information security obligations.

TDM use cases

  1. Media evaluation
  2. Financial analysis
  3. Image identification
  4. Scientific discovery
  5. Anti-plagiarism

Enquire now

To enquire about the new TDM permissions, use the form below. Our specialist team will be happy to help.

TDM Licence FAQs

Text and data mining (TDM) is the automated process of extracting useful information and insights from large amounts of unstructured data, for the purposes of identifying trends, patterns and knowledge. This allows organisations to efficiently and cost-effectively gain insight from a wide range of data sources.
Unstructured data is data that is not actively managed by a database management system. When we think of data, we tend to think of quantitative information such as statistics, numbers and facts, but in reality unstructured data makes up 80–90% of the global data used by organisations. Unstructured data is less the quantitative content that comes to mind and more everything we see and use online. Text is the most common type, found in websites, Word documents, online articles, social media posts, reviews, video transcripts, e-books and so on. Other types include images, audio and video files. Most new data generated today is unstructured and is difficult to store and manage in a conventional database, which is why organisations need tools and processes such as text and data mining to manage, analyse and make use of it.
The most common sources of data for TDM include journal articles, books, datasets, images, social media posts and websites. TDM involves accessing and analysing this content, and then extracting and reproducing at least parts of these works.
The content used in the TDM process is protected by copyright by default. Copyright does not apply to accessing and analysing published content, but it does cover reproducing it. TDM goes beyond accessing and gathering information from datasets: it extracts and reproduces information, and it is this act of copying that is subject to copyright. In TDM, technology takes the place of a human viewing or reading something and then making a copy of extracts from it.
TDM is generally NOT permitted in the UK without a licence under existing UK copyright law (the Copyright, Designs and Patents Act 1988, or “CDPA”). The one exception is TDM for non-commercial research.