"Facets of the Technical Information Problem," Charles P. Bourne & Douglas C. Engelbart, SRI Journal, Vol. 2, No. 1, 1958. Reprinted in The Magazine of DATAMATION, September/October 1958 (AUGMENT,133180,).
FACETS OF THE TECHNICAL INFORMATION PROBLEM
by CHARLES P. BOURNE and DOUGLAS C. ENGELBART
Technology, so adept in solving problems of man and his environment, must be directed to solving a gargantuan problem of its own creation. A mass of technical information has been accumulated and at a rate that has far outstripped means for making it available to those working in science and engineering. But first the many concepts that must be considered in fashioning such a system and the needs to be served by it must be appraised. The complexities in any approach to an integrated information system are suggested by the following questions.
RECENT world events have catapulted the problem of the presently unmanageable mass of technical information from one that should be solved to one that must be solved. The question is receiving serious and thoughtful consideration in many places in government, industry, and in the scientific and technical community.
One of the most obvious characteristics of the situation is its complexity. A solution to the problem must serve a diversity of users ranging from academic scientists engaged in fundamental investigations to industrial and governmental executives faced with management decisions that must be based on technical considerations. The solution must accommodate an almost overwhelming quantity of technical and scientific information publicly available in many forms through many kinds of media and in many languages.
Some students of the problem, including men with many years' experience in various aspects of information handling, have viewed this complexity and concluded that the problem cannot be solved in its entirety. These authorities have recommended a piecemeal attack on components of the problem.
Stanford Research Institute believes that the techniques of systems analysis coupled with an understanding of the potentials of machines permit a powerful approach to the solution of this many-faceted problem. In fact, it may very well be that only by grappling with the problem as a single, integrated system can a realistic and lasting solution be attained.
However, to deal with the information system as a whole it is necessary first to define its complexities in as great detail as possible. As an aid to the preliminary mapping of the system, a study group at SRI polled a portion of the Institute's own professional staff of engineers and scientists for questions they believe must be answered before an effective system can be designed. A representative list of the questions raised in this fashion is given in this article.
The list is impressive, but obviously not exhaustive. It does confirm the multiplicity of points of view that must be appreciated before this problem can be attacked.
Many of the questions require simple factual answers (see Data Needed About Information Sources and Services p. 5). They can be answered by straightforward techniques of counting, surveying, sampling, and estimating. A few of the answers are already available, but the fact that most questions of this type cannot be answered from available sources emphasizes the pressing need for a much better quantitative assessment of the size and nature of the information problem before a rational attempt to solve it can be undertaken.
Another group of questions involves essentially matters of national and scientific policy that ultimately must be answered arbitrarily. Data and analysis can give guidance to the answers but the ultimate decision will be based on judgment of relative needs and relative values.
Questions Relating to Policy
What are the specific aims of the program?
Will the system start with only new information? Or will it process back literature, and, if so, how far back?
Will the Service process requests from allied countries? To what extent? Will it coordinate with the Soviet Union?
Can part of the operations be done abroad? What about translation?
Will an international classification, indexing, or retrieval system be adopted or promoted?
Will the system be designed to serve the brilliant, the sophisticated, as well as the more unsophisticated?
Will the Service be financially self-supporting?
Will big business have any better access than small businesses or individuals?
Could a private citizen or scholar afford to use the Service?
How will prices be established for the Service?
What is the range of subject matter to be included?
Will classified information be included?
Will safeguards be established to insure that classified information is kept under proper control?
What type of information should be included? Books (texts, tables)? Technical and trade journals? Conference proceedings and papers presented but not published? Industrial and government interim and final project reports, etc.? Operation and instruction manuals? Patents? Manufacturers' catalogs? Newspapers and general magazines?
Who will be responsible for selecting the material to be included?
What protection will be provided users who want their queries to remain confidential?
Should service be provided outside the technical community? To congressmen? Executives? Businessmen? High-school students?
Who will control the policy in the matter of designing, establishing, and/or operating the Service? An appointed committee, such as for the NACA? A civil servant? A political appointee? A committee elected by scientific organizations?
Would it be feasible to establish legal authority to speed up the standardization and coordination of existing facilities (such as the F.C.C.)?
Who is competent to design, establish, and/or operate the System? Would this be a civil-service organization?
Could the objectives of the Service be achieved by expanding existing government agencies (e.g., Bureau of Standards, the Library of Congress, Armed Services Technical Information Agency)?
If the Service were not directed by some existing government agency, would it not be best handled by some university?
Would it be economically feasible for any sort of commercial enterprise or non-profit corporation organized by the professional community, or by private industry, to establish and run a Service which would assure continued social and technical progress?
If we must look to the federal government for support, what residual responsibilities remain with the professional societies? Should private groups continue to sponsor special collections?
What economic and political limiting factors exist with respect to the freedom one would have in utilizing or changing those organizations already active in the documentation field, and whose existence could be over-shadowed by a national Service?
What about copyrights? Would royalties be forthcoming to the owner of the copyright if the Service distributes the material? What will be the impact on the technical publishing industry?
Should the Service act as a publisher for collections of papers (reprints) in very new and special fields?
How will the priority schedules be fixed for the Service?
How soon could the Service be initiated? With an immediate manual system? With an ultimate mechanized system?
What factors will determine the location? Can strategic dispersal considerations influence the location without adversely affecting efficiency?
Is the proposed Service simply an attempt to copy Russia?
Might not an interim solution be to translate and distribute the exhaustive Russian abstracts, thus leaving our interim energies free for other uses?
Might it not be better to reduce the amount of literature produced rather than go to the tremendous expense of providing super-service for all of it? Can a quality filter be applied to this output?
Why not allocate federal money to support more direct interchange between working scientists? Perhaps more meetings, special conventions, seminars, etc., would be more economical than better literature processing? Couldn't the money be better spent on education to achieve a given increase in scientific effectiveness?
Could a substantial portion of the information problem be solved by teaching the users more about present-day documentation techniques?
Questions Requiring Research
Some of the questions posed to the study group will require considerable study and research to produce valid answers. The research will be in many fields -- in the social as well as in the natural sciences. Some of the study must be quite profound -- even theoretical. Some will be more straightforward. Many of these questions must be answered before the policy decisions implied in the previous group can be made with confidence.
Can we separate apparent need, influenced by present concepts and experience, from real need? Lack of awareness of the potentialities of recently developed methods (or methods not yet developed) can easily result in an unimaginative formulation of the possibilities and opportunities for advantageously using recorded information.
How will users' habits and needs evolve as a good System becomes available?
How are the information needs of a user affected by his age, educational level, profession, type of position held, etc.?
What are the characteristic information needs of the basic (academic) scientist? The applied researcher? The engineer? The decision maker? Are they all equally critical or is the "applier" of knowledge the one with the biggest problem?
What is the role of information retrieval, storage, etc., in the decision-making process of the research worker, engineer, scholar, administrator, etc.?
How much use do the scientist and engineer make of the facilities that are presently available?
By what processes do the scientist and engineer keep abreast of the advances in the art now? What is the relative importance of each of these processes?
How many scientists and engineers have a definite program of "keeping up with the literature"? How much time would they "like to spend"? What keeps them from spending more time?
How much of the literature that would, with reasonably high probability, be useful to a scientist or engineer, is caught by him now by his own regular surveillance of the literature? How far out of his way will the average user go to be sure that he hasn't missed some possible information ... considering the usual distracting pressures on him, his familiarity with the sources, etc.?
How many pages of literature in various categories relative to the level and interest-area of the user can we expect him to scan or search for his different information needs?
What are the relative merits of the different types of reference information services with regard to the user and his needs, desires, habits, and limitations?
What are the relative importances of the users' various informational needs? On one hand, he needs to know the newsy items such as who is working on what, what his current attack is, who disagrees with whom and basically why, etc.; and on the other hand, he also needs to be able to study in detail the carefully written treatises that may have bearing on his work. Can these different kinds of needs be met by a single system?
What are the special information requirements for different specialty fields?
Does the user, when he goes outside his special field for supporting information, want information in a different form or at a different level than that which he seeks in his own field? For instance, would he be looking more for "cook book" techniques or for survey-type information?
How valuable would broad, multi-disciplinary searches be if they could be conducted effectively? How great is the problem of differences in nomenclature between fields?
What type of questions now go unanswered at the libraries?
Isn't the main problem of information retrieval one of identification -- since people so seldom express satisfactorily their needs to the documentalist?
What are the major limitations in the various methods presently used in classifying and indexing scientific literature?
Is the problem that the information now is just not available at all, or is it that it is just hard to find?
Why aren't the existing services that process technical information satisfactory?
How many places does a user of each discipline have to look for index listings of a given special interest?
How can the processing of recorded information be planned so that it can be effective in spite of human limitations, or of limitations in numbers of human beings?
How much is missed by technical people leaning too heavily on librarians?
What relative gain in efficiency could be achieved by integration, merging, or better managing of existing documentation services?
What increase in efficiency of the scientist or engineer would result from improving the accessibility of recorded information?
What are the probable net benefits, short and long range, of an effective information Service to military, industrial, commercial, scholarly, government groups?
Can dollar costs be derived for reasonably well-proven delays and duplications, and can the total national loss rate due to this problem be realistically estimated? Can it be determined that the expense of delay and duplication now is greater than that of establishing and operating an information service?
What is the lack of an information Service costing government agencies?
Can the savings in Federal money now spent on other information programs be diverted to a national information Service?
What are the relative costs and characteristics of different reproduction techniques that might be applicable to some of the dissemination and massive processing problems of an information service?
What are the techniques and costs involved in keeping up and in using large mailing lists in taking care of distribution of journals, etc.?
What are the relative costs of providing the information in microform as against making original-size photo copies?
Of the currently-operating abstracting services, how many are operating merely to satisfy an obligation of a professional society that would rather have somebody else do the abstracting?
What services does the Russian All-Union Institute really provide? What is the reaction of a Russian scientist to this information center?
How important is it to know what the rest of the world is doing?
Are any projects or areas of work reported almost exclusively in foreign literature?
What is the expected rate of growth of the system?
What are the potential information processing capabilities of existing mechanical devices?
What are the theoretical capabilities of existing or anticipated machine components which might be applied to the information processing problem?
How often will the system presumably be searched? How definitive will the search have to be? What volume of information should a search produce? How fast should the system respond?
Characteristics of the Information Service
As increasing data become available it will become possible to consider some of the last group of questions -- those dealing with the desired or necessary operating characteristics of a comprehensive technical-information processing system. Certainly, the first system implemented would be of an interim nature using existing resources, which unfortunately employ largely manual techniques. However, ultimately it is inevitable, in view of the impressive advances made almost daily in information processing techniques, that a highly mechanized system will be possible.
How soon can an interim system be functioning?
How much can be done just by concentrating on abstract distribution and better dissemination techniques?
Would it be feasible for the abstracting publications to use a standard format and type font, such that mats (or something similar) could easily be distributed to other interested publishers, thus saving printing expenses?
What technical societies could cooperate to publish a single journal instead of numerous splinter journals?
What about the scale of the Service? Does it have to be a big system or nothing?
Does "having a large information Service" necessarily mean the physical collection of all activities at one central location?
Would a group of smaller centers, for specific fields, be of greater utility and more tractable?
Would a collection of special libraries be more useful?
What can a national service provide that is different than what is now available? Is this to be an entirely new type of service, a real advance in the state of the art, or is it to be just more and better of the same thing?
Will the system have a finite capacity? One system might work well with a few million entries, but be hopeless with a hundred million.
As the System grows in size, will it be possible to make changes easily in the classification scheme and bring the old coding into the new scheme?
If a private consultant, with "need to know" established, were to work on a government project, how would he locate and procure pertinent classified material?
Will financial filtering of requests by a uniform fee structure be desirable or effective, or would it be necessary to make the fee structure non-uniform so that there is essentially some "priority" given?
What means can be used to pry loose useful information that customarily doesn't get into the published technical information channels?
Will the service include a positive program to declassify material under security restrictions?
What is an acceptable delay in getting information entered into this system?
Will all material in the subject fields be included or will there be an editor or a censor?
Will an attempt be made to standardize the form of the material before it gets into the center? Does the material have to be on standard-size sheets or forms?
What happens when the system becomes overloaded? Should service to users just be late, or should the service just be less complete?
How can we protect against freezing the specifications until enough systems work has been done to make clear what would be optimal?
Will the policy makers make sure that the final methods chosen for a retrieval system are not influenced too heavily by the requirement of compatibility with past systems?
Will abstracting be done? What kind? Descriptive? Critical? Informative? How can we get good-quality abstracts? Should the Service use volunteer abstractors directly or a staff of full-time abstractors? Or should it allow the various technical societies to organize their own volunteer abstracting services?
Will any effort be made to review old documents, and to remove or recode when necessary?
Is a standard (or artificial) vocabulary necessary? How much work will be required to design and institute such a vocabulary?
What techniques and devices can reasonably be developed and applied for facilitating such immediate requirements as printing, reproducing, storing, microfilming, billing, communicating, etc.?
What kind of a data-processing system will the Service need just to keep track of its operation?
Would the information Service keep a collection of the original documents?
What special precautions must be taken to store primary records? Would a duplicate file and collection be maintained to prevent disruption of service due to fires, or other catastrophes? How much would this cost?
What is the useful life of various forms of records? In use? In storage?
What will the information Service physically provide in response to information requests?
Will the output be in English, or a code that must be translated?
Will microform copies be acceptable to the users? If not, what improvements need be made in order to gain user acceptance?
Will the information Service output be in such a form that the researcher can determine which of the documents are in a locally accessible collection?
Will the system give answers (e.g., "yes," "no," "5,000 tons in 1945," etc.) as well as references?
Why not periodically publish inventories of research in progress, to indicate what research projects are currently being undertaken in each specialty field, thus helping to eliminate duplication?
Will there be a "special communication network" in which workers in the various specialized fields can easily circulate working papers or "think pieces"? A central agency could maintain printing, listing (in appropriate subject-interest categories), and mailing facilities for this sort of service.
Will the information Service be able to retain a file of questions to be asked of all new input material, thus providing up-to-the-minute data for standing questions?
Will it be possible to stimulate more writing of "review-the-literature" papers by qualified people in the various fields, in order to provide guides for other workers?
Can a partial search be made? (For example, can 1/10 of the file be searched and the results checked to determine if further searching is justified?)
Could the information Service operate on a "just search 1/2 the file for me; I don't need a comprehensive search" basis?
What kind of communications network will be needed for the operation of the interim information Service? Will it be accessible to anyone by telephone or other direct device, such that the searcher can interrogate the file directly and at will?
Would the Service be available for browsing?
What technical-manpower drain would the proposed information Service program have on other high-priority scientific programs?
What professional and educational background is needed for the personnel to operate the Service?
Could university science students be used part time and during summers to help with the various processing tasks, as a means of alleviating the shortage of people with adequate technical backgrounds?
Will there be special training for abstractors and translators or for documentation and information specialists, etc.?
How much research is needed? What research budget is reasonable?
If an information Service were established, how soon could present partial services by government agencies be terminated and funds diverted to the Service? Could some special activities in industrial libraries be eliminated?
These questions, by the very nature of their origin, are random and fragmentary. Even the full list from which they have been selected is far from comprehensive. However, we have found them a helpful stimulus as well as a disciplinary aid in viewing the technical-information problem in its broadest dimensions. We hope that others interested in this problem will be similarly served.
CHARLES P. BOURNE and DOUGLAS C. ENGELBART are research engineers at Stanford Research Institute's computer laboratory. Mr. Bourne gained his first electronics experience in USN schools from 1950 to 1951. From 1952 to 1953 he served as instructor of various aspects of guided missile operation and maintenance with Convair Guided Missile Division and as adult education instructor in electronics at Chaffey Junior College. After receiving his BS degree from the University of California in 1957, he was employed as a research engineer at SRI where he has been engaged in research on mechanization of information retrieval and logical design.
Dr. Engelbart received his BS degree in electrical engineering at Oregon State College in 1948, MEE in 1953, and PhD in 1955 at the University of California. His theses were concerned with design and programming of drum-type computers and special gas-discharge tubes for use in computers. He has worked as professor of electrical engineering at the University of California, as electrical engineer at Ames Aeronautical Laboratories, and as consultant. In October 1957 he joined the SRI staff. Information retrieval is one of his specialties.