Credit to Open Source Patent Analytics Project by Dr. Paul Oldham
Introduction
This article provides a quick overview of some of the main sources of free patent data. It is intended for quick reference and points to some free tools for accessing patent databases that you may not be familiar with.
This article is now a chapter in the WIPO Manual on Open Source Patent Analytics. You can read the chapter in electronic book format here and find all the materials including presentations at the WIPO Analytics GitHub homepage.
It goes without saying that getting access to patent data in the first place is fundamental to patent analysis. There are quite a few free services out there and we will highlight some of the important ones. Most free sources have particular strengths or weaknesses, such as the number of records that can be downloaded, the data fields that can be queried, the format the data comes back in, or how clean the data is in terms of the hours required to prepare it for analysis. We won’t go into all of the details but will provide some basic pointers.
The Databases
The Lens
Previously known as the Patent Lens, this is a well designed site with quite a few visualisation options and access to sequence data. It is possible to search the title, abstract, description and claims of patent documents and to create and share data in collections. In 2015 the ability to download up to 10,000 records at a time was added. When combined with interactive charts that allow the user to drill down into a results set, this has transformed the Lens into a very useful and innovative database and visualisation tool.
Patentscope
The WIPO Patentscope database provides access to Patent Cooperation Treaty data, including downloads of a selection of fields (up to 10,000 records), a very useful tool that expands searches by translating the search terms, and translation of results.
Sequence data can also be obtained from Patentscope. Note that this rapidly becomes gigabytes of data.
espacenet
Probably the best-known free patent database, provided by the European Patent Office.
LATIPAT
For readers in Latin America (or Spain & Portugal) LATIPAT is a very useful resource.
EPO Open Patent Services
Access patent data through the EPO Application Programming Interface (API) free of charge. Requires programming knowledge.
The developer portal allows you to test your API queries and is recommended.
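As a rough illustration of the programming involved, the sketch below builds (but does not send) the two requests a typical OPS session needs: one to exchange a consumer key and secret for an access token, and one to run a search. The base URL, the endpoint paths and the `q` parameter are assumptions based on OPS version 3.2 and should be checked against the developer portal. The function names are our own and are not part of any EPO library.

```python
import base64
import urllib.parse
import urllib.request

# Assumed base URL for OPS version 3.2; confirm it in the developer portal.
OPS_BASE = "https://ops.epo.org/3.2"


def build_token_request(consumer_key, consumer_secret):
    """Build (without sending) the request that swaps a key/secret for a token."""
    credentials = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        OPS_BASE + "/auth/accesstoken",  # assumed path
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_search_request(access_token, cql_query):
    """Build (without sending) a bibliographic search request for a query string."""
    query = urllib.parse.urlencode({"q": cql_query})
    return urllib.request.Request(
        OPS_BASE + "/rest-services/published-data/search?" + query,  # assumed path
        headers={
            "Authorization": "Bearer " + access_token,
            "Accept": "application/xml",
        },
    )
```

Either request can then be passed to `urllib.request.urlopen` once you have registered for credentials. Building the requests separately from sending them makes it easy to compare the URL with what the developer portal generates before spending any of your usage quota.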
USPTO Patents View
Patents View for free searches and the USPTO patent databases may be archaic, but you can download the entire US collection from the Google USPTO Bulk Download service.
It is a fantastic service, and an example to patent offices everywhere on freeing up patent data. If you have a good broadband connection and the hard drive space, it is quite good fun to suddenly have access to millions of patent records. The authors used the service to text mine the collection for millions of biological species names as reported here.
However, one important issue to note is that the boundaries between individual XML documents in the bulk files are not always well demarcated. This means that code that works for one bulk set of files may fail on another set. While it is possible to address this, be prepared to spend time working on it and/or seek assistance from a professional programmer. For an insight into these issues see this Stack Overflow discussion on parsing the data in R.
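One common way to approach the demarcation problem is to split a bulk file into individual documents before parsing, and to record the documents that fail rather than letting one bad record stop the whole run. The sketch below is in Python rather than R and assumes that each document in the file starts with its own XML declaration (`<?xml ...?>`) and is encoded as UTF-8. That assumption will not hold for every bulk set, so treat it as a starting point only. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# but not other processing instructions such as <?xml-stylesheet ...?>.
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def split_concatenated_xml(text):
    """Split text holding several XML documents into one string per document."""
    starts = [match.start() for match in XML_DECL.finditer(text)]
    if not starts:
        # No declarations found: treat the whole text as a single document.
        return [text] if text.strip() else []
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]


def parse_documents(text):
    """Parse each document; return (parsed roots, list of (index, error message))."""
    roots, failures = [], []
    for index, chunk in enumerate(split_concatenated_xml(text)):
        try:
            # Assumes UTF-8, matching the declaration in the sample files.
            roots.append(ET.fromstring(chunk.encode("utf-8")))
        except ET.ParseError as err:
            failures.append((index, str(err)))
    return roots, failures
```

For a real bulk file, which can run to hundreds of megabytes, you would read and split the file line by line rather than loading it into memory in one go. The list of failures gives you the positions of the records that need a closer look.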
Free Patents Online
Sign up for a free account for enhanced access and to save and download data. It has been around for quite a while now and, while the download options are limited, we rather like it.
DEPATISnet
We are not covering national databases. However, the patent database of the German Patent and Trademark Office struck us as potentially very useful. It allows for searches in English and German and has extensive coverage of international patent data, including the China, EP, US and PCT collections. The coverage details are here. Worth experimenting with.
OECD Patent Databases
One that is more for patent statisticians. The OECD has invested a lot of effort in developing patent indicators and resources, including citations, the Harmonised Applicants' Names (HAN) database, and mapping through the REGPAT database, among other resources that are available free of charge.
Along the same lines, the US National Bureau of Economic Research (NBER) US Patent Citations Data File is an important resource.
EPO World Patent Statistical Database
The most important database for statistical use is the EPO World Patent Statistical Database (PATSTAT), which contains around 90 million records. PATSTAT is not free and costs 1250 Euro for a year (two editions) or 630 Euro for a single edition. The main barrier to using PATSTAT is the need to run and maintain a database of more than 200 gigabytes. However, there is also an online version of PATSTAT that is free for the first two months if you wish to try it by signing up for the trial (knowledge of SQL required).
For users seeking to load PATSTAT into a MySQL database, Simone Mainardi provides code for this on GitHub.
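To give a flavour of the SQL knowledge involved, the sketch below runs a simple counting query against a tiny in-memory SQLite table. The table and column names (`tls201_appln`, `appln_auth`, `appln_filing_year`) are assumed to mirror the PATSTAT applications table and should be checked against the data catalogue for your edition. The rows are made-up toy data, not real PATSTAT records, and SQLite stands in for MySQL only so that the example needs no database server.

```python
import sqlite3

# Toy stand-in for the PATSTAT applications table (names assumed, data invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_filing_year INTEGER)"
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?)",
    [(1, "US", 2014), (2, "US", 2014), (3, "EP", 2014), (4, "EP", 2013)],
)


def count_by_authority(connection, year):
    """Count applications per filing authority for one filing year."""
    return connection.execute(
        "SELECT appln_auth, COUNT(*) AS n "
        "FROM tls201_appln "
        "WHERE appln_filing_year = ? "
        "GROUP BY appln_auth "
        "ORDER BY n DESC, appln_auth",
        (year,),
    ).fetchall()
```

Calling `count_by_authority(conn, 2014)` on the toy data returns one `(authority, count)` pair per office. The same `SELECT ... GROUP BY` pattern is the kind of query you would adapt for a full installation or the online version.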
Other data sources
A number of companies provide access to patent data, typically with tiered access depending on your needs and budget. Examples include Thomson Innovation, Questel Orbit, STN, and PatBase. We will not be focusing on these services but we will look at the use of data tools to work with data from services such as Thomson Innovation.
For more information on free and commercial data providers try the excellent Patent Information User Group and its list of Patent Databases from Tom Wolff and Robert Austin.
Also worth mentioning is the Landon IP Intellogist blog, which maintains Search System Reports.