TDM Licence FAQs
What is text and data mining?
Text and data mining (TDM) is the automated process of extracting useful information and insights from large amounts of unstructured data, for the purposes of identifying trends, patterns and knowledge. This allows organisations to efficiently and cost-effectively gain insight from a wide range of data sources.
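As a rough illustration of what "identifying trends and patterns" can look like in practice, the minimal sketch below counts the most frequent terms across a few short texts. The sample texts, the stop-word list and the function name `term_frequencies` are hypothetical and chosen only for this example; real TDM pipelines are considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical sample texts standing in for a much larger body of content.
documents = [
    "Delivery was late and the packaging was damaged.",
    "Great product, but delivery was late again.",
    "Packaging was fine and delivery was quick.",
]

# Small, illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "but", "the", "was"}


def term_frequencies(texts):
    """Count how often each non-stop-word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


# The three most frequent terms hint at recurring themes in the texts.
top_terms = term_frequencies(documents).most_common(3)
print(top_terms)
```

Here the recurring terms ("delivery", "late", "packaging") surface without anyone reading each text individually, which is the kind of insight TDM aims to provide at scale.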
What is unstructured data and what do these datasets include?
Unstructured data is data that is not actively managed by a database management system. When we think of data, we tend to think of factual information such as statistics and numbers. In reality, unstructured data makes up 80–90% of the global data used by organisations, and it is less the quantitative content that comes to mind and more everything we see and use online. Text is the most common type of unstructured data, found in the form of websites, Word documents, online articles, social media posts, reviews, video transcripts, e-books and so on. Other types of unstructured data include images, audio and video files. Most new data generated today is unstructured, and it is difficult to store and manage in a conventional database. This is why organisations need tools and processes, such as text and data mining, to manage, analyse and make use of it.
How do TDM practices use published content?
The most common sources of data for TDM include journal articles, books, datasets, images, social media posts and websites. TDM involves accessing and analysing this content, and then extracting and reproducing these works, at least in part.
What aspects of text and data mining infringe copyright?
The content used in the TDM process is protected by copyright by default. Copyright does not apply to accessing and analysing published content, but it does cover reproducing it. TDM practices go beyond accessing and gathering information from datasets: they extract and reproduce information, and it is this act of copying that is subject to copyright. In TDM, technology takes the place of a human viewing or reading something and then making a copy of extracts of it.
Is text and data mining permitted in the UK?
TDM is generally NOT permitted in the UK without a licence under existing UK copyright law (the Copyright, Designs and Patents Act 1988, or “CDPA”). The one exception is TDM for non-commercial research.
What is the difference between text and data mining, and generative AI?
While there is overlap, TDM and generative AI training are two distinct activities. TDM is the process of turning unstructured data into structured data, and it is this structured data that generative AI models use for training. TDM can exist without generative AI, but generative AI would be a lot less effective without TDM. In terms of business practices, an organisation would carry out text and data mining to gain insight from published content, whereas it would use published content in a generative AI tool to generate new content.
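To make "turning unstructured data into structured data" more concrete, the short sketch below pulls named fields out of free text and stores them as records. The sample text, the pattern and the field names (`study`, `year`, `participants`) are invented for illustration and assume the text follows one fixed wording; real content is far less regular.

```python
import re

# Hypothetical free text, as it might appear in a published document.
raw_text = """
Study A, published 2019, enrolled 120 participants.
Study B, published 2021, enrolled 450 participants.
"""

# Illustrative pattern; it only matches the exact sentence shape used above.
PATTERN = re.compile(
    r"(?P<study>Study \w+), published (?P<year>\d{4}), "
    r"enrolled (?P<participants>\d+) participants"
)


def extract_records(text):
    """Turn each matching sentence into a structured record (a dict)."""
    records = []
    for match in PATTERN.finditer(text):
        records.append(
            {
                "study": match["study"],
                "year": int(match["year"]),
                "participants": int(match["participants"]),
            }
        )
    return records


records = extract_records(raw_text)
for record in records:
    print(record)
```

The resulting records can be sorted, filtered or loaded into a database, which is what makes structured data easier to analyse than the original prose.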