Dirty data within an IP Data Management System

By Donal O’Connell

Intellectual Property (IP) rights are valuable assets for any business, possibly among the most important that it possesses.  It is therefore imperative that the associated IP data is also treated with the respect that it deserves.

Data integrity means that data has a complete or whole structure. All characteristics of the data, including business rules, rules for how pieces of data relate to each other, dates, definitions and lineage, must be correct for the data to be complete.  This paper explores problems with data integrity within an IP data management system, and applies equally to systems that reside within a corporate environment or in private practice.

IP data that has integrity is identically maintained during any operation on the IP Management System, such as data entry, data transfer, storage or retrieval.  Put in simple business terms, IP data integrity is the assurance that the IP data is consistent, certified and can be reconciled.

Dirty data refers to the lack of data integrity to one degree or another. ‘Dirty data’ is a term used by information technology (IT) professionals when referring to inaccurate information or data, and this term ‘dirty data’ will be utilised throughout the paper.

The definition of ‘dirty data’

Dirty data can have a variety of meanings:-

  • Missing data
  • Incorrect data, wrongly entered into the tool
  • Incorrectly formatted data
  • Data entered into the wrong field on the IP Management System
  • Stale data, that was once correct but is now out of date
  • Missing links such as the relationship between the data in two or more fields
  • Duplicated data, where the data exists in more than one place

All of the above qualify as dirty data.  To summarise, dirty data can be missing, incorrect, out of date, lacking in basic formatting, incorrectly spelled or punctuated, entered into the wrong field or duplicated, all of which make the data misleading.
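Several of these categories can be detected automatically. The following is a minimal sketch in Python, not a definitive implementation. It assumes records are plain dictionaries, and the field names (`application_number`, `filing_date`, `owner`, `next_renewal_due`) and the ISO date format are hypothetical choices made for illustration rather than features of any particular IP Data Management System.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical mandatory fields, chosen only for this illustration.
REQUIRED_FIELDS = ("application_number", "filing_date", "owner")


def parse_iso_date(value):
    """Return a date for 'YYYY-MM-DD' text, or None if the text does not parse."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def find_issues(records, today):
    """Return a list of (record_index, issue) pairs for a list of dict records."""
    issues = []
    # Count how often each application number occurs, to spot duplicates.
    counts = Counter(
        r.get("application_number") for r in records if r.get("application_number")
    )
    for i, rec in enumerate(records):
        # Missing data: a mandatory field is absent or empty.
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                issues.append((i, f"missing:{field}"))
        # Incorrectly formatted data: a filing date that is present but unparseable.
        filing = rec.get("filing_date")
        if filing and parse_iso_date(filing) is None:
            issues.append((i, "bad_format:filing_date"))
        # Stale data: a renewal due date that already lies in the past.
        renewal = parse_iso_date(rec.get("next_renewal_due"))
        if renewal is not None and renewal < today:
            issues.append((i, "stale:next_renewal_due"))
        # Duplicated data: the same application number on more than one record.
        number = rec.get("application_number")
        if number and counts[number] > 1:
            issues.append((i, "duplicate:application_number"))
    return issues
```

Calling `find_issues(records, date.today())` returns one entry per problem found, which can then be grouped by category to see where the clean-up effort is most needed. Data entered into the wrong field and missing links between fields usually need rules specific to the system in question, so they are left out of this sketch.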

The root causes which lead to data becoming dirty

There are a number of possible root causes of dirty data:-

  • Migration errors
  • Data entry errors
  • System design errors
  • Synchronisation problems
  • Data reporting problems
  • Maintenance problems

Migration is where data is transferred into an IP Data Management System from another system, perhaps as a result of a system upgrade or of M&A activity in which data from an external IP Data Management System has been transferred and incorporated.  If the data is dirty before the migration, then it is likely to remain dirty after the migration, unless concrete steps have been taken to address the problem.

Data entry mistakes can be made by IP personnel within the organisation, by non-IP personnel within the organisation who are given access to the IP Data Management System and by external IP personnel who have been provided with access to the system.  A certain amount of human error is inevitable, but what is the solution when the mistakes are constantly occurring, the fix would make an auditor cringe and the person or persons making the errors are taking zero responsibility, while blaming it all on the system?

IP Data Management System design and implementation errors can lead to dirty data. However, good system design can, for example, greatly reduce data entry errors by focusing on such issues as catching exceptions, formatting, buffering and the way in which choices and selections are provided to the user.
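As an illustration of catching problems at the point of entry, the sketch below validates a single entry before it is stored. It is only a sketch under stated assumptions: the field names, the pick-list of statuses and the date format are hypothetical, and a real system would have its own.

```python
from datetime import datetime

# Hypothetical pick-list: offering fixed choices avoids free-text variants.
ALLOWED_STATUSES = {"drafting", "filed", "granted", "lapsed"}


def validate_entry(entry):
    """Return a cleaned copy of the entry, or raise ValueError listing every problem."""
    problems = []
    cleaned = {}

    # Normalise formatting so that ' ep1234567 ' and 'EP1234567' are stored alike.
    number = str(entry.get("application_number", "")).strip().upper()
    if not number:
        problems.append("application_number is mandatory")
    cleaned["application_number"] = number

    # Restrict the status to the controlled list of choices.
    status = str(entry.get("status", "")).strip().lower()
    if status not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    cleaned["status"] = status

    # Catch the exception from a badly formatted date instead of storing bad text.
    try:
        cleaned["filing_date"] = datetime.strptime(
            str(entry.get("filing_date", "")), "%Y-%m-%d"
        ).date()
    except ValueError:
        problems.append("filing_date must be in YYYY-MM-DD format")

    if problems:
        raise ValueError("; ".join(problems))
    return cleaned
```

Reporting every problem in one message, rather than stopping at the first, lets the person entering the data correct the whole entry in a single pass.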

Synchronisation, in this instance, means keeping an operation in the IP Data Management System in step with a corresponding operation in another system, to ensure overall data integrity.  Synchronisation challenges with other company systems can lead to problems with the data, as it is not uncommon for the corporate IP Data Management System to be linked electronically with other corporate systems, used for example by HR or Finance.  Combine this with systems belonging to an IP Renewals/Annuities Payment provider and possibly with the system belonging to an IP Agent network, and your synchronisation challenges can be even greater.

Creating reports using the data can itself present the problem of dirty data within the actual reports, if there are errors with the scripts or problems with the reporting functionality of the system.  It can also be due to lack of understanding of the data structure within the system.

The data within the IP data management system may not be properly maintained.  If data within the system is not updated on a regular basis, as it should be, this can lead to dirty data problems within the system.

I should add another dimension related to maintenance. When reviewing and cleaning data against the jurisdictional IP databases (USPTO, EPO, etc.), certain dirty data issues may be identified, and the information contained in the jurisdictional databases may be inconsistent with what a company’s IP Department or an IP Firm believes. For example, a maintenance payment may have been missed and the patent subsequently may not have been renewed.

More importantly, the patent may have been purchased but not reassigned, or assigned to a bank as a security interest, without the jurisdictional data having been updated.
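A comparison of this kind can be sketched as follows. The example assumes that both the internal records and the register data have already been exported into dictionaries keyed by patent number. How the register data is obtained is outside the scope of the sketch, and the field names `owner` and `status` are hypothetical.

```python
def reconcile(internal, register, fields=("owner", "status")):
    """Compare two {patent_number: record} dicts and return a list of discrepancies.

    Each discrepancy is a tuple: (patent_number, what_differs, our_value, their_value).
    """
    discrepancies = []
    for number in sorted(set(internal) | set(register)):
        ours = internal.get(number)
        theirs = register.get(number)
        if ours is None:
            # The register knows about a right that the internal system does not.
            discrepancies.append((number, "missing_internally", None, None))
        elif theirs is None:
            # The internal system holds a right that the register export lacks.
            discrepancies.append((number, "missing_in_register", None, None))
        else:
            # Both sides have the record: compare the chosen fields one by one.
            for field in fields:
                if ours.get(field) != theirs.get(field):
                    discrepancies.append(
                        (number, field, ours.get(field), theirs.get(field))
                    )
    return discrepancies
```

A differing `owner` would flag the purchased-but-not-reassigned case described above, while a differing `status` would flag a patent that has lapsed after a missed maintenance payment. Deciding which side is correct still requires a person to check.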

So, there are several causes of dirty data.

Where is the ‘dirty data’?

Dirty data can exist in the data fields associated with any of the key IP process areas such as IP creation, IP portfolio management and IP utilisation.

Problems can be linked to data fields used in the front end, for example in the patent creation process from inventor and invention report through to Patent Committee or Patent Board decisions.  Problems can also exist in the data fields used in the actual patenting process, from drafting to first filing and foreign filing, and on through prosecution to the granted patent.

Dirty data can also occur in the IP portfolio management process, in data fields used during management of the IP assets, and in the IP utilisation phase, in data fields used in license agreements and contracts.

Why is ‘dirty data’ an issue for IP?

If it exists, dirty data is a serious issue for any corporate IP Department or IP Agency, as it can lead to liability issues or a loss of rights.  For example, the ‘rules’ for the proper creation of patent families may not run, key dates may be missed or the wrong data may be sent to the IP Office.  Correspondence may be sent to the wrong person, or IP reports with incorrect data may be created and used in the decision making process.  IP data is ultimately used for IP management purposes and is relied upon for well informed decision making.  Dirty data may lead to the wrong decisions being made.

Why is ‘dirty data’ an issue outside of IP?

IP data is utilised not just by the IP department.  It matters to the wider business as far as technologies, products and services are concerned, as it forms an integral part of many legal agreements and contracts.  IP data is also increasingly being reported to, and utilised by, Senior Management within the Corporation, so ‘dirty data’ in IP can adversely impact activities and decision making outside of the corporate IP department.

Cleaning up the data

Firstly, some understanding is needed of how serious the problem with ‘dirty data’ is.  Questions to ask include how and why it has occurred and where it is happening.  If the challenge with dirty data is large, then what should be prioritised?  Only when all of these questions have been considered should the clean-up exercise be undertaken.  Cleaning the data may involve using dedicated IP Service Providers and/or developing some automatic scripts and tools.  It will almost certainly involve some manual hard work.

A three stage process is strongly recommended:-

  • Corrective actions to fix any problems
  • Understanding of the root cause
  • Preventative actions to stop problems repeating (processes, systems, education, checks)

‘Dirty data’ cannot be tackled in isolation

Data quality issues cannot be tackled in isolation.  Data quality is interlinked with the IP processes or ways of working which are adopted in the company, the IP systems and tools in use, various legal matters and of course the actual people involved.  Last but not least it involves management and leadership.

Best practices

A number of best practices exist to help address dirty data issues within an IP data management system:-

  • Control the data entry
  • Define mandatory and optional data fields properly
  • Assign rights and roles both for IP and non IP personnel with access to the system
  • Assign personal responsibility
  • Keep a change history
  • Design ‘intelligent’ data fields
  • Use tools to measure and clean the data on a regular basis
  • Make data management a living process
  • Measure, measure, measure!!!

The best approach is to make data quality management an on-going process and an integral part of IP management within the organisation.

To properly address dirty data problems within an IP Data Management System, it is important to adopt a recognised iterative four step problem solving process: ‘Plan, Do, Check, Act’.

The first step (Plan) is to thoroughly evaluate and analyse the problem, decide whether, where and how dirty data is a problem, and determine what needs to be done to rectify the situation. The second step (Do) involves making the necessary improvements, often on a small scale initially.  The third step (Check) involves checking the situation and comparing actual results versus planned results.  The final step (Act) is to analyse the differences to determine their causes.

When your dirty data challenge has been addressed, it is most important not to just forget the problem and move onto the next issue.  Metrics should be defined, agreed and implemented and regular data reports created so that you know precisely the situation with your data integrity going forward and so that you can react quickly if things go amiss again in the future.
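One simple metric of this kind is completeness: the share of records in which a given field is filled in. The sketch below is one possible way to calculate it, assuming records are plain dictionaries. The function names and the idea of a single alert threshold are illustrative assumptions, not a prescribed method.

```python
def completeness(records, fields):
    """Return {field: share of records with a non-empty value}, from 0.0 to 1.0."""
    total = len(records)
    if total == 0:
        # With no records there is nothing to measure; report zero for each field.
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for rec in records if rec.get(field)) / total
        for field in fields
    }


def fields_below(metrics, threshold):
    """Return the sorted names of fields whose completeness is under the threshold."""
    return sorted(name for name, share in metrics.items() if share < threshold)
```

Running `completeness` on a regular schedule and comparing the results with the agreed targets gives an early warning when data quality starts to slip again. For example, `fields_below(metrics, 0.95)` lists every field that is filled in on fewer than 95% of records.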

As stated at the beginning of this paper, intellectual property (IP) rights are valuable assets for any business, possibly among the most important it possesses.  It is therefore imperative that the associated IP data is also treated with the respect that it deserves, and that any dirty data challenges are tackled and resolved.

[This article originally appeared at Chawton Innovation Services.]
