Riitta Katila - Research

Search Depth and Search Scope

Description

This page provides software for calculating the depth and scope of a firm's technical knowledge over time, as expressed in the citation patterns of its patents. These measures were created because firms can differ in how intensively they reuse existing knowledge ("search depth"), just as they can differ in how widely they explore new knowledge ("search scope"). The depth measure describes how deeply a firm reuses its existing knowledge; the scope measure describes how widely it explores new knowledge (see Katila, 2000; Katila & Ahuja, 2002 for details). Depth is measured by counting how often each citation in the current patents has occurred before, and scope by counting how many of the current citations have never occurred before. In other words, the two measures capture how much the firm exploits existing knowledge versus explores new knowledge in its innovation search. Although these measures were developed for patent data, they can also be used to measure exploitation and exploration more generally. For details, see:

Katila, R. 2000. In Search of Innovation: Search Determinants of New Product Introductions. Doctoral dissertation, University of Texas at Austin.
Katila, R., & Ahuja, G. 2002. Something Old, Something New: A Longitudinal Study of Search Behavior and New Product Introductions. Academy of Management Journal, 45(6): 1183-1194.
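The counting rules in the description can be written compactly. In the sketch below, C_t (the list of citations, with repeats, in the firm's year-t patents) and n_t(c) (how many times citation c occurred in the firm's earlier patents) are notation introduced here for illustration. The worked example uses earlier citations A, B, A, C and current citations A, D, C, E.

```latex
\mathrm{depth}_t = \sum_{c \in C_t} n_t(c), \qquad
\mathrm{scope}_t = \sum_{c \in C_t} \mathbf{1}\!\left[\, n_t(c) = 0 \,\right]

% Worked example: n(A) = 2, n(D) = 0, n(C) = 1, n(E) = 0
\mathrm{depth}_t = 2 + 0 + 1 + 0 = 3, \qquad
\mathrm{scope}_t = 0 + 1 + 0 + 1 = 2
```

Dividing both counts by the number of current citations (4 here) would give per-citation values of 0.75 and 0.5. Whether depthscope reports raw counts or such ratios is not stated on this page, so check depthscope.c before relying on either form.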

The software is distributed under the GNU General Public License (GPL) and can be downloaded and used freely for research (see "License" below). If you download the program and find it useful, I would appreciate hearing from you (rkatila @ stanford.edu).

Implementation

The software is implemented as a standalone ANSI C program ("depthscope") so that very large data sets can be processed efficiently, which would be difficult in a general-purpose tool such as Matlab. The data are read into an internal tree representation that has a small memory footprint and is fast to process. For example, calculating the depth and scope measures for a set of nearly 300,000 patents with over 2 million citations requires less than 220 MB of memory and less than two minutes on a Pentium 4. The code was developed on Linux. No libraries other than the standard input/output library are used, so the code should be easy to install and modify across platforms. It should compile and run without modification at least on the various Unix/Linux platforms, on Mac OS X (in Terminal), and on Microsoft Windows (under Cygwin).
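The page does not describe depthscope's tree layout, so the following is only a minimal sketch of the general idea, assuming a plain binary search tree with one node per distinct citation ID and a running occurrence count. The names struct node, tree_add, tree_count and tree_free are hypothetical, not taken from depthscope.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: one node per distinct citation ID with its running
   count. This illustrates the idea only; it is not depthscope's structure. */
struct node {
    long id;              /* citation ID, e.g. a cited patent number */
    unsigned long count;  /* how many times this ID has been recorded */
    struct node *left;
    struct node *right;
};

/* Record one occurrence of `id` in the tree rooted at *root.
   Returns 0 on success, -1 if memory runs out. */
int tree_add(struct node **root, long id)
{
    struct node **link;
    struct node *n;

    assert(root != NULL);
    link = root;
    while (*link != NULL) {
        if (id == (*link)->id) {
            (*link)->count++;     /* repeat: bump the existing node */
            return 0;
        }
        link = (id < (*link)->id) ? &(*link)->left : &(*link)->right;
    }
    n = malloc(sizeof *n);        /* first occurrence: new leaf */
    if (n == NULL)
        return -1;
    n->id = id;
    n->count = 1;
    n->left = NULL;
    n->right = NULL;
    *link = n;
    return 0;
}

/* Number of recorded occurrences of `id`; 0 means never seen before. */
unsigned long tree_count(const struct node *root, long id)
{
    while (root != NULL) {
        if (id == root->id)
            return root->count;
        root = (id < root->id) ? root->left : root->right;
    }
    return 0;
}

/* Release every node (children first, then the node itself). */
void tree_free(struct node *root)
{
    if (root != NULL) {
        tree_free(root->left);
        tree_free(root->right);
        free(root);
    }
}
```

Filling the tree with the historical citations and then calling tree_count for each current citation gives the per-citation counts that depth sums and scope tests for zero. A plain binary search tree degrades to a linked list if IDs arrive in sorted order; a balanced tree or a hash table avoids that worst case.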

Installing and Running

If you are using Mac OS X, open the Terminal application; it gives you a Unix shell window in which depthscope can be run.
If you are using Microsoft Windows, first download and install Cygwin, a Linux-like environment for Windows. Make sure you include the gcc compiler (in the "select packages" menu, select "devel" and then "gcc: C compiler upgrade helper"). Then open Cygwin to get a terminal window.

Create a directory where you want depthscope to be installed. Download the following files to that directory:

In the terminal window, cd to that directory and compile the program with "gcc -o depthscope depthscope.c". This produces an executable file called "depthscope".

Run the test data with "./depthscope example-inputdata testresults". The program should warn that two patents were ignored (as shown in "example-erroroutput") and create an output file "testresults", which should be identical to "example-results".

Applying Depthscope to Your Own Data

You can apply depthscope to your own dataset by (1) creating a file of historical data in the same format as "example-inputdata", (2) changing a few compile-time constants in depthscope.c and recompiling it, and (3) running the recompiled program with your dataset as input.

The file format is described at the beginning of the "example-inputdata" file. Your dataset should be clean and in a consistent format. The program does, however, check that each patent entry occurs only once and that each patent falls within the sample years FIRSTYEAR to LASTYEAR (compile-time constants defined near the beginning of the program). Patents that fail these checks are ignored, and a message is printed to standard output.
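The two input checks can be sketched as a filter over in-memory records. This is a minimal illustration under several assumptions: struct patent, filter_patents, the message wording and the FIRSTYEAR/LASTYEAR values 1980 and 1995 are all hypothetical placeholders, and the "first accepted entry wins" rule for duplicates is a guess rather than depthscope's documented behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sample years; depthscope defines its own FIRSTYEAR and
   LASTYEAR near the top of depthscope.c. */
#ifndef FIRSTYEAR
#define FIRSTYEAR 1980
#endif
#ifndef LASTYEAR
#define LASTYEAR 1995
#endif

/* Hypothetical minimal patent record (citations omitted). */
struct patent {
    long id;
    int year;
};

/* Drop patents outside FIRSTYEAR..LASTYEAR and repeated IDs (the first
   accepted entry wins), printing a message for each. Survivors are
   compacted to the front of p; returns how many were kept. */
size_t filter_patents(struct patent *p, size_t n)
{
    size_t i, j, kept = 0;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int duplicate = 0;

        if (p[i].year < FIRSTYEAR || p[i].year > LASTYEAR) {
            printf("Ignoring patent %ld: year %d outside %d-%d\n",
                   p[i].id, p[i].year, FIRSTYEAR, LASTYEAR);
            continue;
        }
        for (j = 0; j < kept; j++)
            if (p[j].id == p[i].id) {
                duplicate = 1;
                break;
            }
        if (duplicate) {
            printf("Ignoring patent %ld: duplicate entry\n", p[i].id);
            continue;
        }
        p[kept++] = p[i];         /* accepted: move to the front */
    }
    return kept;
}
```

The duplicate check here is a linear scan over the accepted records, which is fine for a sketch; at the scale quoted under "Implementation" a tree or hash lookup would be the natural replacement.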

Data are read into an internal tree format, and the depth and scope measures are calculated from the patent IDs and references in the previous NPREYEARS years (NPREYEARS is a compile-time constant). The output file lists the number of patents and the calculated depth and scope measures for each year.
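One reading of the NPREYEARS window is that, for a focal year t, "before" means citations made in years t-NPREYEARS through t-1. The sketch below applies that reading to a flat list of citations. struct cite, window_measures and the default NPREYEARS value of 5 are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>

#ifndef NPREYEARS
#define NPREYEARS 5   /* hypothetical placeholder; match your study design */
#endif

/* Hypothetical flat layout: one row per citation, tagged with the year of
   the citing patent. This is not depthscope's input format. */
struct cite {
    int year;    /* year of the citing patent */
    long ref;    /* cited reference, e.g. a cited patent number */
};

/* Depth and scope for the citations made in `year`, where "before" means
   the window year-NPREYEARS .. year-1. */
void window_measures(const struct cite *c, size_t n, int year,
                     unsigned long *depth, unsigned long *scope)
{
    size_t i, j;

    assert(depth != NULL && scope != NULL);
    *depth = 0;
    *scope = 0;
    for (i = 0; i < n; i++) {
        unsigned long seen = 0;

        if (c[i].year != year)
            continue;                 /* only citations from the focal year */
        for (j = 0; j < n; j++)
            if (c[j].year >= year - NPREYEARS && c[j].year < year
                && c[j].ref == c[i].ref)
                seen++;               /* nonunique count inside the window */
        *depth += seen;
        if (seen == 0)
            (*scope)++;
    }
}
```

Calling window_measures once for each year from FIRSTYEAR to LASTYEAR would give one depth and scope pair per year, matching the per-year layout of the output file. Citations older than the window, such as a 1984 citation when the focal year is 1995 and NPREYEARS is 5, do not count toward depth.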

If the compile-time constant DEBUG is set to TRUE, debugging output is written to standard output. For each year, it includes a list of all patent IDs, all references (unique), and all patent IDs and references (nonunique) in the previous NPREYEARS years. The debug output is useful for understanding how the calculations are done and for checking that they are done correctly for new data. An example debug output file for the example-inputdata run can be downloaded as "example-debugoutput".
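The debug output distinguishes unique from nonunique reference lists. A minimal sketch of one way to derive the unique list from a nonunique one (sort, then squeeze out repeats) and print it only when a DEBUG constant is set is shown below. unique_refs, cmp_long and the DEBUG default of 0 are hypothetical; depthscope itself is switched with TRUE/FALSE, and its actual debug format is the one in example-debugoutput.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef DEBUG
#define DEBUG 0   /* hypothetical default; set to 1 to print the list */
#endif

/* Three-way comparison for qsort that cannot overflow. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Sort refs[0..n) in place and squeeze out repeats; returns the number of
   unique references left at the front of the array. */
size_t unique_refs(long *refs, size_t n)
{
    size_t i, k = 0;

    if (n == 0)
        return 0;                 /* qsort needs a valid pointer */
    assert(refs != NULL);
    qsort(refs, n, sizeof *refs, cmp_long);
    for (i = 0; i < n; i++)
        if (k == 0 || refs[i] != refs[k - 1])
            refs[k++] = refs[i];  /* keep first copy of each value */
    if (DEBUG) {
        printf("unique references:");
        for (i = 0; i < k; i++)
            printf(" %ld", refs[i]);
        printf("\n");
    }
    return k;
}
```

Using an ordinary if (DEBUG) rather than #if keeps the debug branch compiled and type-checked in every build, while an optimizing compiler can still drop it when DEBUG is 0.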

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.