Monday, February 5, 2024

Pre-registration in molecular biology

A few years back, perhaps in pre-pandy times, I was on a faculty development panel in which I was one of two presenters. I was of course there to present on how to use Twitter to build your brand (sigh, I’m lame), and a more senior faculty member (I think a neuroscientist) was there to talk about pre-registration in lab work. He was very kind and wise-seeming, and explained how he had been pre-registering the experiments in his lab for a while, and how it had transformed their work.

What is pre-registration? It’s probably most familiar to you in the form of clinical studies, where there was a notorious selection bias in which results got reported. Like, does drinking coffee cause flatulence? One would have to do a randomized controlled trial to check. But if people did, say, 100 clinical trials and only reported the ones with a “positive” result, then even if coffee did nothing, you would expect to see around 5 trials with p < 0.05 showing that coffee causes flatulence, and none of the contradictory results. So now you have to pre-register a trial, meaning that you have to say, I am going to do this trial with this power and what not, and then you are obligated to report the outcome, no matter what the outcome is. A great idea!
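To make that arithmetic concrete, here’s a quick simulation (purely illustrative; the trial sizes and the assumption that coffee does exactly nothing are made up for the sake of the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_trials = 100    # hypothetical number of independent coffee trials
n_per_arm = 50    # hypothetical participants per arm
alpha = 0.05

false_positives = 0
for _ in range(n_trials):
    # Coffee truly has no effect: both arms are drawn from the same distribution.
    coffee = rng.normal(loc=0.0, scale=1.0, size=n_per_arm)
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_arm)
    _, p = stats.ttest_ind(coffee, control)
    if p < alpha:
        false_positives += 1

print(f"{false_positives} of {n_trials} null trials came out 'significant' at p < {alpha}")
```

Run it and you’ll typically get around 5 “significant” trials out of 100, even though there is nothing there. Report only those 5 and coffee looks guilty as charged; pre-registration forces the other 95 into the record.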

But here was someone advocating for pre-registration much closer to home, in our day-to-day lab work. I remember being vehemently and vocally opposed. Sure, clinical trials are one thing, with a clearly stated hypothesis and major resources devoted to a single experiment. But in my line of work, where we are constantly trying new experiments and checking out new avenues of work, where there are tons of false leads and new directions? How could that possibly work without gumming up the works in needless bureaucracy? To all of this, the senior faculty member just patiently and calmly responded “Sure, I hear you, just think about it”.

Ever since, I have kept coming back to that moment, and it has come to have a major effect on how I approach our science—and especially our reporting of it. The key take-home point is: if you did an experiment to answer a question, and you don’t have any reason to exclude it based on the experiment itself, then you have to report the results. Repeat: unless there is an independent basis for the exclusion of a result, you have to report the results. Or, to put it another way: if you would have included the data had the result come out the other way, you have to report it.

Selective reporting of data is a strange issue in molecular biology in that almost everyone agrees it is wrong, and yet the overall culture of the field leans towards selective reporting in so many ways. Here is an example from our own work. In a recent paper, we were trying to confirm the knockdown of a particular protein. We were able to show a convincing knockdown by RNA FISH, but we also wanted to show that the protein levels went down. We did a bunch of westerns, but the results came out ambiguously: sometimes we saw an effect and sometimes not (there are reasons why that could be the case, but we didn’t confirm them because the experiments to do so would have been very difficult). The standard thing to do here would be to not report the western results. But there was no reason to exclude the experiment other than being annoyed with the results. So we reported it.

But again, the cultural standard in molecular biology is often not to report such ambiguous results. I saw this mindset a lot early in my career, back when RNA FISH was considered cool and people wanted our help to add some RNA FISH to their paper to spice it up. There were several times when people came to us with data in support of a, shall we say… “fanciful” hypothesis, and then we would do the RNA FISH, which would basically show the hypothesis was wrong. At which point, the would-be collaborator would beg off, saying that given the “ambiguous” nature of the RNA FISH results, “perhaps we should save the data for the next paper” (which of course never materialized). After enough of these moments, I started asking potential collaborators what stage of their paper they were at, and if they were close to the end, whether they really wanted us to do this experiment. At least one time, when faced with this choice, the person said, uhhh, let’s not!

There have also been many times when we’ve tried following up on work where we are pretty sure there has been a lot of selective reporting of positive results. Let’s just say that that is an unpleasant realization to make.

I want to emphasize that I don’t think people are being malicious or fraudulent in their work. I think the vast majority of scientists are honest and are not trying to do something wrong. I just think that science would benefit from more transparent reporting of results, because it is sometimes the data that doesn’t fit the narrative that leads to something new in the future. I also don’t necessarily think we need to formally pre-register our work, although it might be an interesting experiment to try. We should just try to shift our culture a bit towards transparent reporting.

One potential challenge in doing science this way is that our stories are a lot less likely to be “perfect”. There will almost always be some bits of conflicting evidence, and given our adversarial peer review system, there is seemingly a lot of pressure to keep these conflicting results out. Or is there? We have been doing this for quite a while, and I would say that our experience has been largely fine, in the sense that reviewers don’t mind as long as you are transparent about it. I say “largely” because there have definitely been cases in which reviewers point out some issue that we were transparent about and reject our paper because of it. So at least in my experience, adopting this more transparent reporting of results is not entirely without consequence. All I can say is that if we do decide to make this cultural shift, we also have to be more tolerant of imperfections in the “story” when we put our reviewer hats on.

By the way, I think a lot of people tend to think of selective reporting as a problem of experimental science. Not at all the case! The same goes for every analysis of, e.g., some large-scale dataset: if you checked for some signal in the data, you have to report the result, regardless of whether it came out the way you wanted. If anything, it’s even more of an issue in computational work, where many hypotheses can be tested on the same data in (relatively) rapid fashion.
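Just to illustrate the point (a made-up toy example, not any real dataset of ours): screen a pile of pure-noise “genes” against an outcome they have nothing to do with, and a handful will look significant anyway, which is exactly why every test you ran belongs in the report, not just the hits.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_samples, n_genes = 40, 1000                        # made-up dataset dimensions
outcome = rng.normal(size=n_samples)                 # the phenotype we are "explaining"
expression = rng.normal(size=(n_samples, n_genes))   # pure noise, no real signal anywhere

# Test every gene for correlation with the outcome and collect the p-values
pvals = np.array([stats.pearsonr(expression[:, g], outcome)[1] for g in range(n_genes)])

print(f"{np.sum(pvals < 0.05)} of {n_genes} noise genes correlate with the outcome at p < 0.05")
```

You’ll get roughly 50 nominal hits out of nothing at all. Multiple testing correction helps, but only if all the tests you actually ran make it into the analysis (and the paper) in the first place.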

There is also a bit of a gray area in terms of what to do about false leads. Sometimes, you have an idea that goes in a new direction that has nothing to do with the story of the paper. I don’t know what to do in this case. Certainly, science would be in some ways better for having these results out there, since there was probably (hopefully?) some basis for the experiment or analysis in the first place. But it may just serve to distract from the main thread of the paper, making it harder to follow. I don’t know how best to balance these competing and important principles, but I think it’s an important discussion for us to have.

I’m very curious how people will respond to this discussion. Ultimately, there is no form or checklist that can solve the issues we have in science. Pre-registration sounds like a bureaucratic solution, but in the end, it’s just a call for careful, honest thought about the work we do. I’m sure some people reading this will have a strongly negative reaction, much like I did at first. All I’m saying is “Sure, I hear you, just think about it.” 🙂

Tuesday, September 26, 2023

“Refusing the call” and presenting a scientific story

When scientists present in an informal setting where questions are expected, I always have an internal bet with myself as to how long it will take before some daring person asks the first question, after which everyone else joins in and the questions rapidly start pouring out. This usually happens around the 10 minute mark. The phenomenon has gotten me wondering what it means for how best to structure a scientific talk.


I think this “dam breaking” phenomenon can best be thought of in terms of “refusal of the call”, which is a critical part of the classic hero’s journey in the theory of storytelling. The hero is typically leading some sort of humdrum existence until suddenly there is a “call to adventure”. Think Luke Skywalker in Star Wars (Episode IV, of course) when Obi Wan proposes that he go on an adventure to save the galaxy, only for Luke to say “Awww, I hate the empire, but what can I do about it?”. (Related point, Mark Hamill sucks.) Shortly afterwards, the hero will typically “refuse the call” to adventure, whether from fear, lack of confidence, or perhaps just common sense. This refusal involves some sort of rejection of the premise of the proposed adventure, which then needs to be overcome.


I think that’s exactly what’s going on in a scientific talk. As Nancy Duarte says, in a presentation, your audience is the hero. You are Obi Wan, presenting the call to adventure (an exciting new idea). And, almost immediately afterward, your audience (the hero) is going to refuse the call, meaning they are going to challenge your premise. In the context of a scientific talk, I think that’s where you have to present some sort of data. Like, I’ve presented you with this cool idea, here’s some preliminary result that gives it some credibility. Then the hero will follow the guide a little further on the adventure.


The mistake I sometimes see in scientific talks is that the speaker lets this tension go on for too long. They introduce an idea and then expound on it for a while, without providing the relief of a bit of data while the audience is refusing the call. The danger is that the longer the audience’s mind runs with its internal criticism, the more it will forever dominate their destiny. Instead, spoon-feed it to them slowly. Present an idea. Within a minute, say to the audience “You may be wondering about X. Well, here is Y proof.” If you are pacing at their rate of questioning, perhaps a little faster, then they will feel very satisfied.


For instance:

“You may think drug resistance in cancer is caused by genetic mutations and selection. However, what if it is non-genetic in origin? We did sequencing and found no mutations…”


Friday, July 16, 2021

Confusion and credentials in presenting your work

Just listened to a great Planet Money episode in which Dr. Cecelia Conrad describes how she dealt with some horrible racist students in her class who were essentially questioning her credentials. She got the advice from a senior professor to be less clear in her intro class.


That story reminded me of some advice I got from my postdoc advisor about giving talks: "You don't want everything to be clear. You should have at least some part of it that is confusing." This advice has really stuck with me through the years, and I have continued to puzzle over it. Like, it should all be clear, no? I always felt like the measure of success for a presentation should on some level be a monotonically increasing function of its clarity.

But… for a while before the pandemic, I was doing this QR code thing to get feedback after my talks on both degree of clarity and degree of inspiration, and I have to say I feel like I noticed some slight anti-correlation: when I gave a super clear talk, it was seemingly less inspiring, but when I got lower marks for clarity, it was somehow more inspiring. Huh.

Nancy Duarte makes the point that in any presentation, the audience is the hero, and you as the presenter are more like Yoda, the sage who leads the audience on their heroic adventure. Perhaps it is not for nothing that Yoda speaks in wise-seeming syntactically mixed-up babble. Perhaps you have to assert credentials and intellectual dominance at some point in order to inspire your audience? Thoughts on how best to accomplish that goal?

Friday, July 31, 2020

Alternative hypotheses and the Gautham Transform

As I have mentioned several times, having Gautham in the lab really changed how I think about science. In particular, I learned a lot about how to take a more critical approach to science. I think this has made me a far better and more rigorous scientist, and I want to impart those lessons to all members of the lab.

The most important thing I learned from Gautham was to consider alternative hypotheses. I know this sounds like a “duh” (it’s the stuff of RCR meetings, “expected outcomes and potential pitfalls” sections of grants, and boring classes on how to do science), but I think it sounds trite only because we so rarely see how powerful it is in practice. Considering alternatives was one of Gautham’s favorite pastimes, and it really exemplified his scientific aesthetic (indeed, he was well known for demonstrating alternative hypotheses for carrier multiplication, I believe). There were many, many times Gautham proposed alternative hypotheses in our lab, and it was always illuminating. Indeed, one of the main points of his second paper from the lab was that one could explain “fluctuations between states” by simple population dynamics without any state switching—a whole paper’s worth of alternative hypothesis!

Why do we generally fail to consider alternative hypotheses? One reason is that it’s scary and not fun. Generally, the hypothesis you want to be true is the fun one, and it is scary to contemplate the idea that something fun might turn out to be something boring. (Gautham and I used to joke that the “Gautham Transform” was taking something seemingly interesting and showing that it was actually boring.) The truth of it, though, is that most things are boring. Sure, in biology there are a lot more surprises than in, say, physics, but there are still far fewer interesting things than are generally claimed. I think we would all do better to come in with a stronger prior belief that most findings have a boring explanation, and a critical implementation of that belief is to propose alternative hypotheses. Keep in mind also that when we are trained, we are typically presented with a list of facts with no alternatives. This manner of pedagogy leaves most of us with very little appreciation for all the wrong turns that comprise science as it’s being made, as opposed to the little diagrams in the textbooks.

The other reason we fail to consider alternatives is that it’s a lot of work. It is always going to be harder to spend time actively thinking of ways to show that your pet theory is incorrect, and in my experience, coming up with plausible alternative hypotheses takes real effort. Usually, this difficulty manifests as a proclamation of “there’s just no other way it could be!” Thing is… there’s ALWAYS an alternative hypothesis. All models are wrong. You may get to a point where you just get tired, or the alternatives seem too outlandish, but there’s always another alternative to exclude. I remember as we were wrapping up our transcriptional-scaling-with-cell-size manuscript, we got this cool result suggesting that transcription was cut in half upon DNA replication (a decrease in burst frequency). I was really into this idea, and Gautham was like, that’s really weird, there must be some other explanation. I was like, I can’t think of one, and I remember him saying “Well, it’s hard, but there has to be something, what you’re proposing is really weird”. So… I spent a couple of days thinking about it, and then, voila, an alternative! (The alternative was a global decrease in transcription in S-phase, which Olivia eliminated with a clever experiment measuring transcription from a late-replicating gene.) Point is, it’s hard but necessary work.

(Note: I’m wondering about ways to actively encourage people to consider alternatives on a more regular basis. One suggestion was to stop, say, group meeting somewhere in the middle and just explicitly ask everyone to think of alternatives for a few minutes, then check in. Another option (HT Ben Emert) is to have a lab buddy whose job is to work with you to challenge hypotheses. Anybody have other thoughts?)

So when do you stop making alternatives? I think that’s largely a matter of taste. At some point, you have to stand by a model you propose, exclude as many plausible alternatives as you can, and then acknowledge that there are other possible explanations for what you see that you just didn’t think of. Progress continues, excluding one alternative at a time…

“Hipster” overlay journals

Been thinking a lot about overlay journals and their implications these days. For those who don’t know, an overlay journal is sort of like a “meta-journal” in that it doesn’t formally publish its own papers. Rather, it provides links to other preprints/papers that it thinks are interesting. On some level, the idea is that the true value of a journal is to serve as a filter for what someone thinks is science worth reading so that you don’t have to read every single paper. An overlay journal provides that filter function without the need for the rest of the (costly) trappings of a journal, like peer review and, uhh, color figures ;). 

There is one very interesting aspect of an overlay journal that I don’t think has been discussed very much: in contrast with regular journals, they are fundamentally non-exclusive, meaning that ANY overlay journal can in principle “publish” ANY paper. What this non-exclusivity means is that there is no jockeying between journals to publish the “obviously important” papers, which have a perhaps slightly elevated chance of actually being important. You know, like “we sequenced 10x more single cells than the last paper in a fancy journal” kind of papers. If you run an overlay journal, you never have to gaze longingly at those “high impact” papers—if you want to publish it, just add it to your overlay!

What are the consequences of non-exclusivity? Primarily, I think it would serve to diminish the value of “obviously important” papers. Everyone can identify them based on authors and number of genomes sequenced or whatever, so there’s really not that much value in including them per se. It would be like saying “Here’s my playlist, it’s like a copy of the Billboard Top 40”. Nobody’s going to look to your overlay journal for that kind of stuff (which you can readily get from CNS or Twitter). Rather, the real value would be in making lists of papers that are awesome but might otherwise be overlooked—essentially a hipster playlist. As an editor, your cachet would come from your ability to identify these new, cool papers and make Michael Cera-esque mixtapes out of them. Can leave the Hot 100 to Casey Kasem/Spotify algorithms.

Measuring the importance of an overlay journal would also be interesting. Clearly, impact factor is not a useful metric, since anybody can make their impact factor as high as they want by including highly cited papers. A far more sensible metric would probably be the number of followers the journal has.

Another interesting aspect of an overlay journal is that it can be retrospective. You could include old papers as well, highlighting old gems that may have been forgotten.

Of course, an interesting question is whether there is any difference between an overlay journal and someone’s Twitter feed. Not sure, actually…

Also, thoughts on existing journals that have hipster qualities to them? I vote Current Biology, my lab votes eLife.

Friday, July 17, 2020

My favorite "high yield" guides to telling better stories

Guest post by Eric Sanford


In medical school, we usually have five lectures’ worth of new material to memorize each day. Since we can’t simply remember it all, we are always seeking “high yield” resources (a term used so often by med students that it quickly becomes a joke): those concise one- or two-pagers that somehow contain 95 percent of what we need to know for our exams. My quest to find the highest yield resources has continued in full force since becoming a PhD student.


A major goal of mine has been to improve my scientific communication skills (you know, writing, public speaking, figure-making… i.e. those extremely important skills that most of us scientists are pretty bad at), and I’ve come across a few very high yield resources as I’ve worked on this. Here are my favorites so far:


Resonate, by Nancy Duarte:

  • The best talks are inspiring, but “be more inspiring” is not easy advice to follow.

  • This book teaches you how to turn your content into a story that inspires an audience.

  • I received extremely positive feedback and a lot of audience questions the first time I gave a talk where I tried to follow the suggestions of this book.

  • This was both the most fun and the most useful of all my recommendations.


The Visual Display of Quantitative Information, by Edward Tufte:

  • Tufte is probably the most famous “data visualization” guru, and I think this book, his first book, is his best one. (I’ve flipped through the sequels and would also recommend the chapter on color from “Envisioning Information.”)

  • This book provides a useful framework for designing graphics that convey information in ways that are easy (easier?) for readers to understand. Some pointers include removing clutter, repeating designs in “small multiples”, labeling important elements directly, and using space consistently when composing multiple elements in the same figure (a couple of these are illustrated in the short sketch below).
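As a toy illustration of a couple of those pointers (small multiples, direct labels, minimal clutter), here is a short matplotlib sketch; the genes and data are entirely made up, and this is just my own hypothetical example rather than anything from the book:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]   # hypothetical genes, made-up data below
time = np.arange(10)

# One small panel per gene, with shared axes so the panels are directly comparable
fig, axes = plt.subplots(1, len(genes), figsize=(8, 2), sharex=True, sharey=True)
for ax, gene in zip(axes, genes):
    ax.plot(time, np.cumsum(rng.normal(size=time.size)), color="black", linewidth=1)
    ax.set_title(gene, fontsize=9)              # label each panel directly
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)      # strip chart junk
axes[0].set_ylabel("expression (a.u.)")
fig.supxlabel("time (h)")
fig.tight_layout()
fig.savefig("small_multiples.pdf")   # vector output you can touch up in Illustrator
```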


The Elements of Style, by Strunk and White, pages 18-25


Words to Avoid When Writing, by Arjun Raj


Raj Lab basic Adobe Illustrator (CC) guide, by Connie Jiang


There are many other great resources out there that are also worth going through if you have the time (Style: Lessons in Clarity and Grace by Bizup and Williams is another excellent writing guide), but for me the ones above had the highest amount-learned-per-minute-of-concentration-invested.






Wednesday, August 21, 2019

I <3 Adobe Illustrator (for scientific figure-making) and I hope that you will too

Guest post by Connie Jiang

As has been covered somewhat extensively (see here, here, and here), we are a lab that really appreciates the flexibility and ease with which one can use Illustrator to compile and annotate hard-coded graphical data elements to create figures. Using Illustrator to set things like font size, marker color, and line weighting is often far more intuitive and time-efficient than trying to do so programmatically. Furthermore, Illustrator makes it easy to re-arrange/re-align graphics and create beautiful vector schematics, with far more flexibility than hard-coded options or PowerPoint.

So why don’t more people use Illustrator?

For one, it’s not cheap. We are lucky to have access to relatively inexpensive licenses through Penn. If expense is your issue, I’ve heard good things about Inkscape and Gimp, but unfortunately I have minimal experience with these and this document will not discuss them. Furthermore, as powerful and flexible as Illustrator is, its interface can be overwhelming. Faced with the activation energy and cognitive burden of having to learn how to do even basic things (drawing an arrow, placing and reshaping a text box without distorting the text it contains), maybe it’s unsurprising that so many people continue to use PowerPoint, a piece of software that most people in our lab first began experimenting with prior to 8th grade [AR editor’s note: uhhh… not everyone]. 

Recently, I decided to try to compile a doc with the express purpose of decreasing that activation energy of learning to use Illustrator to accomplish tasks that we do in the lab setting. Feel free to skip to the bottom if you’d just like to get to that link, but here were the main goals of this document:
  1. Compile a checklist to run through for each figure before submission. This is a set of guidelines and standards we aim to adhere to in lab to maintain quality and consistency of figures.
  2. Give a basic but thorough rundown of essentially everything in Illustrator that you need to begin to construct a scientific figure. Furthermore, impart the Illustrator “lingo” necessary to empower people to search for more specific queries.
  3. Answer some of what I feel are the most frequently asked questions. Due to my love of science-art and general artistic/design experimentation, I’ve spent a lot of time in Illustrator, so people in lab will sometimes come to me with questions. These are questions like: “my figure has too many points and is slowing my Illustrator down: how can I fix it?” and “what’s the difference between linked and embedded images?”. Additionally, there are cool features that I feel like every scientist should be able to take advantage of, like “why are layers super awesome?” and “how can I select everything with similar appearance attributes?”.
Finally, a disclaimer: This document will (hopefully) give you the tools and language to use Illustrator as you see fit. It does not give any design guidance or impart aesthetic sense (aside from heavily encouraging you to not use Myriad Pro). Make good judgments~

Full Raj lab basic Illustrator guide can be found here.