The database-journal hybrid beast

Bourne, P. (2005). Will a biological database be different from a biological journal? PLoS Comput Biol, 1(3):179-181.

doi: 10.1371/journal.pcbi.0010034

As text mining tools make it easier to annotate database entries (such as DNA, RNA or protein data) with information derived from the literature, the database and the journal may become, as Bourne suggests, harder to distinguish.

The author notes that digital technologies are primarily a means of distributing traditional scholarly documents. Far from becoming the richly hyperlinked database-document hybrids envisaged by some, online articles are digital surrogates for their print brethren. Even with tools that make it easier to link databases to articles, barriers such as publisher gateways, copyright issues, established work practices and the effort required tend to preclude Bourne's proposed database-article hybrid beast.

This image of the database-article hybrid was published back in 2005. Has much changed? Most top biology journals still look like traditional print journals online. Separate supplementary materials sections still abound. Articles do contain some hyperlinks to databases and even to data repositories, though these links are few in number and unsophisticated. Databases link back to original articles, though these links tend to take the form of a reference list or bibliography relevant to a biological sequence or database entry.

Old habits seem hard to break. Though the twin worlds of data and literature should be brought closer together, the concept remains an aspiration rather than a reality. And at the rate papers are being published, perhaps we will never get round to Bourne's sensible transformation.

Some criticisms of Popper’s World 3

Notturno, Mark Amadeus (2000). The meaning of World 3, or why Wittgenstein walked out. In Science and the open society: the future of Karl Popper's philosophy. Central European University Press.


  • On the purpose of Popper's 3 Worlds epistemology:

“It is, in fact, proposed to solve two problems at once: the problem of objective knowledge on the one hand, and the problem of the relationship between the body and the mind on the other.” (p. 144)

  • Some philosophers argue that Popper's World 3 reifies knowledge and so contradicts his own theory that scientific knowledge is fallible. However, World 3 is quite different from the Platonic realm of ideas in that it evolves: it is objective / autonomous AND fallible.
  • Notturno argues that most criticisms of Popper’s epistemology implicitly accept empiricist dogma.

“How is objective knowledge possible - given that the only things that exist are those that can be known through the senses.” (p. 146)

  • If one accepts empiricist dogma, then one simply cannot accept Popper's epistemology, because it includes immaterial entities. World 3 cannot exist because we cannot prove it exists by inspecting it through our senses.
  • The 3 Worlds model contravenes the principle of parsimony, that theories should be simple. A good point Notturno makes is that Popper did not start with World 3: he was led to incorporate World 3 into his epistemology as a solution to the problem of objective knowledge. Solutions without World 3 are weak or implausible.
  • Another problem: philosophers are uncomfortable with Popper's epistemology because it argues that the immaterial can affect the material. It extends the Cartesian problem of dualism and causality (how body and mind interact) to create a supplementary set of interactions between World 3 and World 2.
  • Popper's 3 Worlds model is a solution to a philosophical problem, but Wittgenstein would say that there are no genuine philosophical problems, so the epistemology is garbage.

Seven sins and their information behaviour equivalents

Much information behaviour research is based on the assumption that users are bright, interested, eager information seekers. Resources are appraised with a partiality established by the task at hand, to the exclusion of the usual vagaries of the human spirit.

Imagine a psychology of the information seeker grounded in the seven deadly sins, those weaknesses of Man that religion and society as a whole have long sought to limit or abrogate.

  • Lust : Much digital information is pornographic. Male users haplessly gravitate towards sexual content. The Internet is awash with titillation. The information revolution has been marked by a vast expansion in the distribution of degrading sexual material.
  • Gluttony : Users gorge themselves on information. Hard drives overflow with useless data. Nothing can be deleted. Users want more / faster / better technology, regardless of whether it is needed. Information overload is the active user over-consuming information. The user does not want to stop receiving information.
  • Greed : Users want the information other people have. Digital piracy is the urge to get information without working for it. It is information theft driven by greed. Information hoarding is likewise a manifestation of greedy impulses.
  • Sloth : Users are inherently lazy. Quick and easy information retrieval is a happy substitute for understanding, for deeper comprehension. The digital revolution has been paralleled by greater physical inactivity and by a demand for information delivered with minimal effort on the part of the user.
  • Wrath : Violent video games indulge angry, aggressive fantasies motivated by human wrath. Technology breeds frustration, and the de-humanising aspects of the information revolution have done little to promote peace, calm, placidity.
  • Envy : Users want the information their neighbours possess. Intellectual property laws aim to guard information products from envious and exploitative parties. Users are driven by a sense that somewhere out there, there is someone with more information than they have, with vital information they must ‘get at’.
  • Pride : Personalisation and customisation of information services appeal to the user's sense of vanity. Virtual identities re-create users in more flattering aspects. Blogs are expressions of the ego, of the sense that the world must be interested in what I have to say.

Get your hands off my mouse

F. Murray (forthcoming). The Oncomouse that roared: hybrid exchange strategies as a source of productive tension at the boundary of overlapping institutions.

Full text from the author’s site

Patents held by researchers working primarily in the academic sector are a relatively new development in biomedicine. Murray tells the story of a murine tumour model generated by breeding genetically altered mice (the titular ‘oncomouse’), and of how patenting practices developed in response to this invention.

The oncomouse was the first mammal to be patented. As a consequence, biologists found that in order to use the mouse, they had to adopt new practices and relinquish certain rights over derivative findings, habits quite out of keeping with their traditional approaches to sharing mouse models.

What is interesting about the story is the response of biologists. Quiet disobedience, informal exchanges of technology and experience, and the spread of patenting among academics (both to control technology rights and as a defence against the commercial restrictions they might face if they failed to patent) all stemmed from the original oncomouse patent.

Most major universities have technology transfer departments or resident experts seeking to slap patents onto promising new technologies. For example, a quick search of the European Patent Database reveals over a thousand patents whose holders are linked with University College London. Many of these patents are for biomedical applications.

It is clear that a tension exists between e-science approaches to biology, with their attendant ‘open’ principles of data sharing, and the push by universities and government to commercialise academic research and realise its economic value.

Thesis aims and objectives for June 2009

  • Aim 1 : To show that information is an important conceptual entity in biological theories
  • Aim 2 : To develop an epistemology aligned with the e-science agenda in biology
  • Aim 3 : To explore the validity of the ‘data imperative’ in e-science
  • Aim 4 : To investigate how new knowledge is integrated into taxonomic tools for biology
  • Aim 5 : To see whether e-science approaches support the growth of knowledge

  • Objective 1 : Apply an information history approach to map the development of the information perspective in biology
  • Objective 2 : Create an argument to support the application of Popper’s 3 Worlds epistemological model to biology and e-science in general
  • Objective 3 : Use bibliometric methods to test the claim that data re-use creates new value from old research data
  • Objective 4 : Develop a methodology to demonstrate how new knowledge entities are incorporated into the Gene Ontology
  • Objective 5 : Test the thesis that e-science and informatic approaches to problems in biology risk limiting the growth of knowledge

  • What is information in biology?
  • Why is epistemology relevant to information problems in biology?
  • Which is more important: data or knowledge?
  • How are new concepts absorbed into the biological knowledge corpus?
  • Can a machine be a biologist?

Are Premiership footballers more important than scientists?

I am reading an interesting book by Nicholas (Nick) Maxwell called ‘From knowledge to wisdom’.

Maxwell argues for a shift in academic inquiry from the acquisition of knowledge to the acquisition of wisdom, wisdom being “…the capacity to realize what is of value in life, for oneself and others.”

For example, science has produced great technological developments in the last century, yet the human species is still plagued by war, hunger, poverty, inequality, disease and social malaise. If all academic inquiry were grounded in aim-oriented empiricism, in seeking to realize what is of value in life (to be happy, to love, to co-operate, to be fair), more might be done to solve these seemingly intractable problems.

For me, the appeal of ‘From knowledge to wisdom’ lies in its challenge to scientists to answer the question: what is the purpose of research? If it is to make the world a better place, the present system of academic inquiry is woefully ill-suited to achieving this. If it is to acquire knowledge for knowledge's sake, so that it might be applied to solve hunger, or inequality, or deficiencies in the well-being of our species, then where is the evidence that science is indeed leading us to realize satisfaction in life?

Lousy jobs, alienation, disempowerment, ill-health, financial inequalities, low culture: we might all be vaccinated and have flatscreen televisions, but does the man in the street believe a scientist is more or less important to his life than a Premiership footballer?

For more information on Maxwell’s thesis, see www.knowledgetowisdom.org

Scientific practices or scientific products?

Wouters, P., Vann, K., Scharnhorst, A., Ratto, M., Hellsten, I., Fry, J., and Beaulieu, A. (2008). Messy shapes of knowledge - STS explores informatization, new media, and academic work. In Hackett, E. J., Amsterdamska, O., Lynch, M., and Wajcman, J., editors, The Handbook of Science and Technology Studies, pages 319-352. MIT Press.

Full text

I know that one cannot entirely separate the study of the practice of producing scientific knowledge from the study of the scientific product itself. Yet a conflation persists in the information science and science studies literature: the conflation of scientific practices with scientific products.

The error is even made explicit in document titles. ‘Messy shapes of knowledge’ suggests one is interested in the nature of knowledge itself, in its characteristics, tendencies, colourings.

Instead, the aforementioned title precedes an essay on the practice of scientific knowledge generation. How are new technologies changing the way science is done? What are the implications of new technologies - such as the Internet - for science as work and for scientific knowledge as a labour product?

The scientist is the object of study. He/she is typed, inspected, pondered. The knowledge product is treated as so obvious as not to warrant attention. If scientific knowledge is considered at all, it is only in:

“[...] the way empirical materials and facts are combined to produce a plausible story or vision of the future [...].”

In the study of science and technology, knowledge as an objective entity is often denigrated. To appeal to an objective knowledge is to ‘reify’ but a version of all we might know. Every individual’s knowledge is primary and thus, no knowledge is primary. A shared scientific consensus is oppressive, stifling, institutional.

I resist the demotion of objective scientific knowledge. By objective I mean in Popper's World 3 sense, as a knowledge external to the mind of Man. I put scientific knowledge on a plinth. I am interested in products over processes.

What is information-centred research, or ICR?

Thelwall, M., Wouters, P., and Fry, J. (2008). Information-centered research for large-scale analyses of new information sources. Journal of the American Society for Information Science and Technology, 59(9):1523-1527.

doi: 10.1002/asi.20829

What is information-centred research (ICR)?

“Information-centered research (ICR) is an e-research methodology that focuses on a new information source by (a) developing generic research tools that can be applied across a number of problem areas, and (b) identifying relevant research problems (Thelwall & Wouters, 2005).”

ICR is

“[...] generic tools for analyzing new information sources across problems.”

  • Contra domain analysis, ICR is not domain-specific
  • Contra Anomalous State of Knowledge theory, ICR is not concerned with specific problem-situations
  • Contra cognitive theories, ICR ignores user-context issues in favour of supplying potentially interesting information sources (there is no information encounter-problem to solve)

ICR is about information supply, finding ways of channelling information sources to users who are going to find those sources relevant. It is generic information supply, without recourse to analysing specific domain- or situation-dependent information problems.

E-science / e-research is creating a generic information infrastructure to support lots of different academic research areas. ICR would complement this infrastructure with research into, and supply of, the types of information sources that such an infrastructure should be delivering to scientists to support whatever problem area they are working on.

I don’t see the value of inventing a new term like this. Isn’t ICR basically about being a librarian?

Form over content in information science

Information science has become enamoured of form over content.

The message has been subordinated to how the message is communicated.

What there is to know is more interesting a question than what an individual person knows.

I am watching your face as you talk but I am not listening to what you are saying.

Am I interested in scholarly communication?

Scholarly communication as defined by Christine Borgman:

“By scholarly communication we mean the study of how scholars in any field (e.g., physical, biological, social, and behavioral sciences, humanities, technology) use and disseminate information through formal and informal channels. The study of scholarly communication includes the growth of scholarly information, the relationships among subject areas and disciplines, the information needs and uses of individual user groups, and the relationships among formal and informal methods of communication [...].”

Taken from: Borgman, C. L. (1990). Editor’s introduction. In C. L. Borgman (Ed.), Scholarly Communication and Bibliometrics, (pp. 10-27). Newbury Park, CA: Sage Publications.

I think that my thesis will investigate the use and dissemination of knowledge rather than information. In order to make this distinction, I will need to define what I mean by ‘information’ and ‘knowledge’ in the context of the biological sciences.

I consider information in biology to be statements and sets of statements. For example:

Calcium/calmodulin-dependent protein kinase ID (CAMK1D) is a gene encoding a member of the Ca2+/calmodulin-dependent protein kinase 1 subfamily of serine/threonine kinases.

This is a piece of information in biology in the form of a statement. Unannotated gene sequences relating to CAMK1D would be data. Gene expression levels captured by an instrument in a laboratory would be data. The statement above is information: it tells us something about CAMK1D.

Of a higher order than this statement is biological knowledge about the gene. Knowledge about the gene might be:

  • Grouping genes into families is important
  • The Ca2+/calmodulin-dependent protein kinase 1 subfamily of serine/threonine kinases share biological characteristics
  • We can make predictions based on the fact that CAMK1D is in this family

These are more than chunks of information. I think that these are biological theories, and that they can be logically related to other biological theories. Stating that a gene is part of a gene family draws on this knowledge about genes and molecular biology. As a statement it can be translated into different languages. The components of the statement are logically related according to the biological knowledge corpus.

I am only partly interested in how biological knowledge is used and disseminated. I want to find out how biological knowledge grows, and how knowledge in one discipline relates to knowledge in another discipline. I am interested more in what is being communicated - objective knowledge - rather than how it is communicated, or how users as people process or believe knowledge.

Unlike modern information theories such as ASK or cognitive models of information behaviour, a knowledge-grounded approach would, I believe, have to incorporate novelties such as knowledge objects being right or wrong and people being mistaken or not understanding concepts, and would have to accommodate the notions of mendacity and deception.