Robert Pound, Roger Lopez, Umesh Sunnapu
Links: Enterprise Solutions
, Business Intelligence and Appliance Solutions
Episode 1 | Project Overview
Welcome to the first episode of the Dell Business Intelligence Project Using US Patent Office (USPTO) Project. You might wonder, what is the Dell Business Intelligence Project Using USPTO Data project? This project was created by the Database Solutions Engineering & Business Intelligence Product Group teams. The project goal is to show the step-by-step process of building a data warehouse and business intelligence solution. In this first episode, we describe what we want to achieve and examine the dataset we want to use for the project. Google and the USPTO have worked together to make the USPTO data
publicly available. Our goal is to show you how to extract meaningful information from this data.
Google and the USPTO Data
This project shows a Dell end-to-end business intelligence solution. The main objective of this solution is to download, capture, analyze, and find correlations with the publically available USPTO data. We accomplish this goal using an implementation of the Dell Quickstart Data Warehouse to import and load the data into an SQL database. We then implemented Dell Toad products to analyze the data. The project starts off by looking at the following information:
- Top 10 patent producing companies for a given year.
- Top 10 patent producing cities for a given year.
- Top patent categories for a given company.
- Maximum time before a patent is awarded.
- The histogram for patents inflow for a given year.
- Top 10 patent authors.
This project is the collaboration of multiple Dell teams, over several months. The results of this collaboration are shown in the following episodes:
Episode 2 | Sample XML
Episode 2 covers the XML data format from the US patent office. The patent office must conform to several data standards with their XML documentation. This XML conformation makes the data more complicated than regular XML data. The episode explores how you can view sample patent data using Microsoft Excel to get a better idea of the collected data.[Link to Episode 2]
Episode 3 | Boomi: HTTP to Disk
Episode 3 is the introduction to Dell Boomi. The content covers downloading a single zip file, uncompressing in the cloud and storing the results on a local drive. [Link to Episode 3
Episode 4 | Boomi: XML to CSV
Building on the process created in Episode 3, Episode 4 covers the details of mapping an XML file to a Comma Separated Value (CSV) file. The file downloaded in Episode 3 is used in this process. This episode includes detailed instructions on how to manually modify the file that still needs to be processed. [Link to Episode 4]
Episode 5 | Boomi: HTTP to CSV
Episode 5 combines the information covered in Episodes 3 and 4 by showing how to download a zip file and translate the contents of the uncompressed XML file to a CSV file without manual intervention. [Link to Episode 5
Episode 6 | Dell Quickstart Data Warehouse Appliance
This episode describes what steps are taken to set up the data warehouse appliance to store and analyze data. [Link to Episode 6
Episode 7 | SQL creation/Boomi: CSV to SQL
Running in parallel to Episodes 4 and 5, Episode 7 uses the CSV output from the previous episodes as input to the process. The episode covers the Microsoft SQL database schema residing on the Dell Quickstart Data Warehouse Appliance. In addition, it covers the Boomi process created for connecting to and importing data into the database. [Link to Episode 7
Episode 8 | Boomi: HTTP to SQL
Episode 8 combines the full Boomi process from end-to -end for one data file from the US patent office. This episode combines the knowledge of episodes 3 through 7. [Link to Episode 8
Episode 9 | SQL: De-duplication
When data is translated from XML, there are many duplicate rows produced. Episode 9 covers how this duplication was overcome in SQL, once the data was loaded into the database. [Link to Episode 9]
Episode 10 | Boomi: One year
Episode 10 combines all information gathered from previous episodes to collect one year of US patent data and store it into the database. [Link to Episode 10
Episode 11 | Analyzing Data using Toad BI Suite
Episode 11 details the steps of how the Toad BI products are set up and then dives into the details of how analysis was performed on the US patent office data and what insights were gained. [Link to Episode 11