Dell Business Intelligence Project Using USPTO Data: Episode 1

Dell Business Intelligence Project Using USPTO Data: Episode 1

Author: Robert Pound, Roger Lopez, Umesh Sunnapu

Links: Enterprise Solutions, Business Intelligence and Appliance Solutions

Episode 1 | Project Overview

Welcome to the first episode of the Dell Business Intelligence Project Using US Patent Office (USPTO) Project. You might wonder, what is the Dell Business Intelligence Project Using USPTO Data project? This project was created by the Database Solutions Engineering & Business Intelligence Product Group teams. The project goal is to show the step-by-step process of building a data warehouse and business intelligence solution. In this first episode, we describe what we want to achieve and examine the dataset we want to use for the project. Google and the USPTO have worked together to make the USPTO data publicly available. Our goal is to show you how to extract meaningful information from this data.

Business Intelligence
Figure 1: Google and the USPTO Data

Project Objective

This project shows a Dell end-to-end business intelligence solution. The main objective of this solution is to download, capture, analyze, and find correlations with the publically available USPTO data. We accomplish this goal using an implementation of the Dell Quickstart Data Warehouse to import and load the data into an SQL database. We then implemented Dell Toad products to analyze the data. The project starts off by looking at the following information:
  • Top 10 patent producing companies for a given year.
  • Top 10 patent producing cities for a given year.
  • Top patent categories for a given company.
  • Maximum time before a patent is awarded.
  • The histogram for patents inflow for a given year.
  • Top 10 patent authors.

Episode overview

This project is the collaboration of multiple Dell teams, over several months. The results of this collaboration are shown in the following episodes:

Episode 2 | Sample XML

Episode 2 covers the XML data format from the US patent office. The patent office must conform to several data standards with their XML documentation. This XML conformation makes the data more complicated than regular XML data. The episode explores how you can view sample patent data using Microsoft Excel to get a better idea of the collected data.[Link to Episode 2]

Episode 3 | Boomi: HTTP to Disk

Episode 3 is the introduction to Dell Boomi. The content covers downloading a single zip file, uncompressing in the cloud and storing the results on a local drive. [Link to Episode 3]

Episode 4 | Boomi: XML to CSV

Building on the process created in Episode 3, Episode 4 covers the details of mapping an XML file to a Comma Separated Value (CSV) file. The file downloaded in Episode 3 is used in this process. This episode includes detailed instructions on how to manually modify the file that still needs to be processed. [Link to Episode 4]

Episode 5 | Boomi: HTTP to CSV

Episode 5 combines the information covered in Episodes 3 and 4 by showing how to download a zip file and translate the contents of the uncompressed XML file to a CSV file without manual intervention. [Link to Episode 5]

Episode 6 | Dell Quickstart Data Warehouse Appliance

This episode describes what steps are taken to set up the data warehouse appliance to store and analyze data. [Link to Episode 6]

Episode 7 | SQL creation/Boomi: CSV to SQL

Running in parallel to Episodes 4 and 5, Episode 7 uses the CSV output from the previous episodes as input to the process. The episode covers the Microsoft SQL database schema residing on the Dell Quickstart Data Warehouse Appliance. In addition, it covers the Boomi process created for connecting to and importing data into the database. [Link to Episode 7]

Episode 8 | Boomi: HTTP to SQL

Episode 8 combines the full Boomi process from end-to -end for one data file from the US patent office. This episode combines the knowledge of episodes 3 through 7. [Link to Episode 8]

Episode 9 | SQL: De-duplication

When data is translated from XML, there are many duplicate rows produced. Episode 9 covers how this duplication was overcome in SQL, once the data was loaded into the database. [Link to Episode 9]

Episode 10 | Boomi: One year

Episode 10 combines all information gathered from previous episodes to collect one year of US patent data and store it into the database. [Link to Episode 10]

Episode 11 | Analyzing Data using Toad BI Suite

Episode 11 details the steps of how the Toad BI products are set up and then dives into the details of how analysis was performed on the US patent office data and what insights were gained. [Link to Episode 11]

Article ID: SLN310362

Last Date Modified: 09/10/2018 08:54 AM

Rate this article

Easy to understand
Was this article helpful?
Yes No
Send us feedback
Comments cannot contain these special characters: <>()\
Sorry, our feedback system is currently down. Please try again later.

Thank you for your feedback.