
Programmers vs. Scientists on Coding

edited July 2011 in Technology
Just read this great short tidbit about how programmers see software versus how scientists see software.

http://www.johndcook.com/blog/2011/07/21/software-exoskeletons/

It's not something I ever thought about, not having worked with scientists, but it makes total sense. A scientist just wants to get their results. They can open a Python shell, import numpy and scipy, connect to the database, and have the results they need spit out with just a few more lines. That will make them very happy.

But copying and pasting those lines into a text editor is nowhere close to actually making a piece of software that anyone can reuse. There are only about a thousand things that need to be added to make it an actual usable program. Validating the input data alone is going to be a ton of work, depending on the complexity.
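To make the contrast concrete, here's a minimal sketch (all names hypothetical, using the standard library in place of numpy so it stands alone): the one-liner a scientist would type in a shell, next to the validation a reusable tool would need before doing the same math.

```python
import math
import statistics

def quick_mean(readings):
    """The throwaway, Python-shell version: works only when input is clean."""
    return statistics.mean(readings)

def robust_mean(readings):
    """A reusable tool has to validate before it ever does the science."""
    if readings is None:
        raise ValueError("no readings supplied")
    values = []
    for r in readings:
        try:
            v = float(r)
        except (TypeError, ValueError):
            raise ValueError(f"non-numeric reading: {r!r}")
        if math.isnan(v):
            raise ValueError("readings contain NaN")
        values.append(v)
    if not values:
        raise ValueError("empty input")
    return statistics.mean(values)
```

Both compute the same mean; the difference is the thousand small things, and this only covers one of them.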

It's kinda weird when you think about it. You would think that programmers would be the ones who wouldn't mind having such an unfinished program that it would only work when actual programmers are operating it by hand. In reality, programmers are the laziest and they want the software to work on its own without any people touching it. Scientists just want the results as fast as possible regardless of any other factor.

Comments

  • edited July 2011
    A scientist just wants to get their results. They can open a Python shell, import numpy and scipy, connect to the database, and have the results they need spit out with just a few more lines. That will make them very happy.
    Using Matlab, or writing scripts, is considered programming in the scientific community.

    I actually wrote a Python module for biologists to easily write up some primers/probes (genetic forensic stuff) in a Python interpreter, then run a bunch of calculations over them. They didn't want a spit-and-polished GUI or anything that would be desired for production use. They just wanted to be able to copy genetic sequences, paste them into something, make some modifications, and get results.

    Here is an example of a programmer (me) writing reusable code (a module) for scientists (specifically forensics specialists) to do their own "programming" in an interpreter and get results.

    EDIT: oh yeah I forgot to add the conclusion. After I wrote the software, the scientists decided they would simply order pre-made primer/probe sets instead of building them themselves. The software got used a few times to finish up a paper. If they ever start building primer/probe sets again, my software will have been long forgotten (even if it is still useful for the purpose). They might even ask someone else to write basically the same confirmatory code all over again.
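The post doesn't show the module's actual calculations, but a sketch of the kind of interpreter-friendly helpers it describes might look like this (the Wallace rule, Tm = 2(A+T) + 4(G+C), is a standard back-of-the-envelope melting-temperature estimate for short primers; the function names are invented):

```python
# Hypothetical sketch of the module described above: paste a sequence into
# the interpreter, get quick primer statistics back. The real module's
# calculations are not shown in the post.

def gc_content(seq):
    """Fraction of G/C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough melting temperature (deg C): Tm = 2(A+T) + 4(G+C).
    Only meaningful for short oligos (roughly under 14 bases)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

In a session, that's just `wallace_tm("atgcatgc")` per candidate primer, which matches the copy-paste-modify workflow the scientists wanted.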
    Post edited by Byron on
  • A scientist just wants to get their results. They can open a Python shell, import numpy and scipy, connect to the database, and have the results they need spit out with just a few more lines. That will make them very happy.
    Using Matlab, or writing scripts, is considered programming in the scientific community.
    This is pretty much my experience from working with a few scientists. I hadn't really thought about it, though. I'd also postulate that this isn't exclusive to programming in science. I think that scientific work takes so much time and money that they want to get on with it without worrying about the extra bits.
  • edited July 2011
    A scientist just wants to get their results. They can open a Python shell, import numpy and scipy, connect to the database, and have the results they need spit out with just a few more lines. That will make them very happy.
    Using Matlab, or writing scripts, is considered programming in the scientific community.
    No it isn't. At least not where I'm from. What is considered programming is writing 13 MB of mixed C, C++, and Fortran source code without a single line of comments. Furthermore, it is considered programming to take the aforementioned program, write some more C, C++, and Fortran code around it (as well as modifying the original code in undocumented ways), and publish that!

    Writing a ton of scripts just to get crappy software to work for your specific problem is a necessity.

    The two main reasons this culture of undocumented code and barely functional programs exists are: 1) There is no academic merit in producing readable, maintainable code. You are not going to get more grant money, and your boss will not be happy when you tell him that you spent three weeks longer than necessary on an analysis because you wanted to produce a better tool. 2) There is a disincentive to publish your tools, in that you'd be making it easier for other people to publish research in your own area, something you'd rather do yourself.

    The underlying combining factor here is the way science funding is structured. You don't get funding for producing good tools.

    I could do a whole show about this.
    Post edited by Dr. Timo on
  • The underlying combining factor here is the way science funding is structured. You don't get funding for producing good tools.
    But if you were smart, you could turn your tools into products, sell them for cash, and get funding that way. No thinking outside the box I see.
  • Scientists just want the results as fast as possible regardless of any other factor.
    Yup.

    The thing is, almost all scientists wind up having to learn specialized software that fits exactly what they're doing. Thus, because we're learning something new anyhow, it doesn't really matter what we're learning. We just need to know that we're learning how to use the tool that gets us what we want.
  • edited July 2011
    Scientists just want the results as fast as possible regardless of any other factor.
    Yup.

    The thing is, almost all scientists wind up having to learn specialized software that fits exactly what they're doing. Thus, because we're learning something new anyhow, it doesn't really matter what we're learning. We just need to know that we're learning how to use the tool that gets us what we want.
    See, I think you're all looking at this the wrong way. Sure, if you do it your way you'll get some results very quickly, but that's just your psychological need for instant gratification. If you did things our way you might not see your first result for months. But after all that waiting you'll have more results per unit time than you can handle since the entire thing will be nearly completely automated. You can just sit on your ass and read the results all day without having to do shit. We programmers like to wait and get the second marshmallow.
    Post edited by Apreche on
  • In a perfect world.....
  • In a perfect world.....
    So the world where programmers are in charge is a perfect world? I am agree.
  • So the world where programmers are in charge is a perfect world? I am agree.
    You realize it's programmers writing the specialized pieces of software that someone like Pete uses, right?
  • edited July 2011
    Sure, if you do it your way you'll get some results very quickly, but that's just your psychological need for instant gratification.
    It's not really a need for instant gratification per se.

    Let's say you, as a scientist, secure some grant money to do some project - tracking whales or something. Nobody makes the software you need, so you hire a dude and work with him to get him to build you the software that you need.

    You hired a guy to make a custom tool. Then, you learn how to use that tool and use the hell out of it to do the research you need to. This is no different than a factory having a specialty piece of equipment fabricated for them.

    This scheme doesn't really include time or money to continue to develop the software beyond its initial use. Because what we need is so very specialized and project-dependent, it doesn't make sense to invest the extra time or money to develop the tool more fully - instead, we build what we need when we need it and use it until we don't need it.

    A lot of companies pay programmers lots and lots of money to develop software for scientific applications. It's a lucrative field, because scientists by and large don't have the time or the inclination to troubleshoot software - we just need the thing to work.

    Ultimately, this works out in the favor of the programmer, who has a guaranteed source of income from scientists. And it works out for the scientists because it allows us to focus all of our energy on science and not building the tools that we use.

    There's a functional limit to the DIY mentality. Yes, you could do it all yourself - but it'd take so much of your time that you wouldn't get done the thing you want to get done in enough time to matter. That's why we work in teams.
    Post edited by TheWhaleShark on
  • Well, the point of what the article is saying is that you are building incomplete tools that require a specialist to operate them. If you cared, you could get a tool that operates itself.

    Take frontrowcrew.com for example. Back in the day Rym would have to put ID3 tags on the file, upload it to libsyn, copy and paste the URLs back and forth. It was a huge pain, but it got the job done quickly. That's the scientist way. Nowadays you upload the MP3 and it is automatically tagged, uploaded, twittered, bitly'd, and forum'd. Soon it will also be facebook'd. All of that without any human interaction beyond pressing the go button. Sure, it took some time to code that up, but it has already paid off in terms of time savings and aggravation.
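The real frontrowcrew.com pipeline isn't shown here, but the idea, every publishing step automated behind one go button with new targets like Facebook slotting in as extra steps, can be sketched like this (all names invented):

```python
# Hypothetical sketch of an automated publish pipeline: each step is a
# plain function, and adding a new target means appending one more step.

def tag_mp3(episode):
    episode["tagged"] = True          # stand-in for writing ID3 tags
    return episode

def upload(episode):
    episode["url"] = f"https://example.com/{episode['slug']}.mp3"
    return episode

def announce(episode):
    episode["announced"] = True       # stand-in for twitter/bitly/forum posts
    return episode

# A facebook step would just be appended here later.
PIPELINE = [tag_mp3, upload, announce]

def publish(episode):
    """Press the go button: run every step with no human in the loop."""
    for step in PIPELINE:
        episode = step(episode)
    return episode
```

The old manual process was the same steps done by hand; the payoff is that the list runs itself every single episode.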

    Also keep in mind that some of the biggest things in the world were accidental side projects. Twitter was an accidental byproduct of fucking ODEO. Remember that? Django was an accidental byproduct of a Kansas newspaper. This is what happens when you let programmers do what they do. You start out with a treasure map pointing towards X. But along the way you end up finding all sorts of other treasures if you would just stop and dig for a little while. Often those treasures are bigger than the one you were originally after. It's not Aladdin's Cave of Wonders. You are allowed to take all the treasure.

    Imagine two people have to cross over 100 pits, Pitfall style. The first guy makes a grappling hook in about a minute and starts crossing pits immediately. The other guy sits down and starts building a hook-shot, Zelda style. By the time the hook-shot is done, the first guy is already across half of the pits. His arm is getting tired and his speed at crossing pits is slowing down. Still, he has a huge lead! About a minute later the hook shot guy passes him and then finishes crossing all 100 pits. He then goes back to the beginning and does the whole thing again before the other guy even finishes once. That's the power of letting computers and robots do all your work. It's actually even more powerful with computers since you can buy more computers and have the work being done multiple times simultaneously, depending on the nature of it.
  • edited July 2011
    Well, the point of what the article is saying is that you are building incomplete tools that require a specialist to operate them. If you cared, you could get a tool that operates itself.
    This depends entirely on the application in question. Some of the software I use is thoroughly developed by professional coders and rolled out worldwide to multiple testing platforms. They get input from users and make changes in future builds. What you want happens all the time in the scientific world. Applied Maths makes a robust piece of software for the bioinformatics field, and they're constantly developing this tool.

    Heavily customized jobs also happen frequently. I get the impression that the author of the article is talking about the latter situation because he is unfamiliar with the former.
    you are building incomplete tools that require a specialist to operate them
    This is not what the article is saying. From your perspective, the tool is incomplete because you could build additional functionality into it. From the scientist perspective, the tool is complete because it has the functionality we care about.
    If you cared, you could get a tool that operates itself.
    Caring has very little to do with it. Money is the bigger consideration, and everything is a cost-benefit scenario.

    I could buy an automated 96-well DNA extractor for $100,000, and the kits to use it for about $1500 for 300 reactions. That adds $5 to the cost of processing a single sample, but might make up for the cost in saved labor. Of course, each 96-well plate is only usable once, so if I use less than a full plate, I'm basically paying more per sample for the extraction.

    Manual DNA extraction via spin columns costs me ~$250 for 100 reactions - half the cost of the automated kit. It also maintains its cost per reaction no matter the throughput, because each tube is individualized. And it involves using a centrifuge, which is something we already have.

    So when I'm deciding what tool I want to use, I have to do a cost-benefit analysis: does it make sense for me to invest my resources in this shiny thing if I'm only going to use 30% of its capacity? Instead, I'll put forth the minimum resources required to get the job done and direct my savings elsewhere.
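Plugging in the dollar figures from the posts above makes the break-even point easy to see (the assumption that a partial run still consumes a full single-use 96-well plate is mine):

```python
# Figures from the posts above; the single-use-plate assumption is mine.
AUTO_KIT_COST = 1500.0                    # $ per automated kit
AUTO_KIT_REACTIONS = 300                  # reactions per kit -> $5/reaction
MANUAL_COST_PER_REACTION = 250.0 / 100    # spin columns: $2.50/reaction, flat
PLATE_SIZE = 96

AUTO_COST_PER_REACTION = AUTO_KIT_COST / AUTO_KIT_REACTIONS

def auto_cost_per_sample(n_samples):
    """A partial run still burns a whole single-use 96-well plate."""
    plates = -(-n_samples // PLATE_SIZE)  # ceiling division
    return plates * PLATE_SIZE * AUTO_COST_PER_REACTION / n_samples

# Full plate: $5/sample. At ~30% utilization (29 samples) it's ~$16.55
# per sample, versus a flat $2.50/sample for the manual spin columns.
```

Below full utilization, the "better" tool is the more expensive one, which is the whole point of the cost-benefit call.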

    I understand your perspective - you are a man who builds tools. Your perspective is not the only one, though. As someone who needs to put tools to use, I can tell you that sometimes I don't need a tool that does everything. Sometimes I just need a goddamn screwdriver.
    Post edited by TheWhaleShark on
  • I understand your perspective - you are a man who builds tools. Your perspective is not the only one, though. As someone who needs to put tools to use, I can tell you that sometimes I don't need a tool that does everything. Sometimes I just need a goddamn screwdriver.
    Yes, you obviously just use a screwdriver right now. The most efficient path is to actually start working right away with the screwdriver and then trade off and start using the power screwdriver once I've finished preparing it. The problem is you have low ambition and a narrow view. You are only interested in getting your results and finishing your job. Along the way you could also get a bonus prize.

    This bonus prize mentality is very prevalent in technology. Take Amazon for example. They started out selling books. To sell books they needed serious hosting. They made serious hosting for themselves. Now Amazon Web Services is a gigantic successful side project from a company that started out selling books. Also, almost every single tech company collects vast amounts of data merely as a side-effect of doing business and having a database. They make lots of money reselling that data to interested third parties, even though their primary business might be movies or some such.

    Lots of great scientific discoveries happened by accident when someone was working on something unrelated. I have a feeling that if the mentality described in this article is for real, lots of potentially great software that could have existed never came to fruition.
  • Lots of great scientific discoveries happened by accident when someone was working on something unrelated.
    Ehhh...not so much. A lot of discoveries happened because we were trying to discover them. History has this funny way of exalting particular scientists and telling nice stories about genius discoveries and "breakthroughs" - they serve to encourage people to do better.

    But they're fairy tales at best. Breakthroughs and single geniuses don't actually exist in the way we think they do - they just happen to be convenient milestones. What usually happens is that you'll be doing a series of experiments, get some small unexpected result, and dig deeper. You intentionally investigate the anomalies, explain them, modify your original hypothesis, retest, and draw a new conclusion.

    tl;dr: "Eureka" moments don't happen.
    You are only interested in getting your results and finishing your job.
    Yes, we are interested in setting goals and taking discrete steps to get to those goals. The problem is that your "bonus prize" concept is fundamentally inconsistent with how scientific investigations work.

    I have a question that I'm investigating. I don't know what the answer is. I need a tool to answer the question. How am I supposed to build the tool to account for subsequent investigations when I don't yet know what those investigations will be? This is why science progresses the way it does; we draw conclusions and then figure out what those conclusions mean before proceeding.

    Yes, we have a narrow view because that view is fundamental to proper science. You ask a specific question and design an experiment to control very specific variables so you can draw a very specific conclusion. This is why scientists are endlessly frustrated at the non-scientist; you literally do not understand how specific the inquiry process is.
  • edited July 2011
    Sure, it took some time to code that up, but it has already paid off in terms of time savings and aggravation.
    So I think the scientific community sucks at sharing code or taking the time to research if anyone else has already written the code. The exception is when a research paper explicitly says code exists and is available to do something.

    If research scientists treated code as they do actual research (i.e. reading up on what's out there and confirming it or building off of it), I believe efficiency could be increased. However, doing so in a way that doesn't bog down one scientist from his or her work does require a better understanding of computer science; not just programming, but computer science.

    Think of it: the scientist needs code for a short time for a particular use. In order to benefit other scientists, this one needs to understand software engineering well enough to make a tool that suits his or her particular use and could be used by others in the future with maybe a slight modification. Modular programming is a good example of a concept that would need to be understood and utilized to make this code sharing realistic. These are things that tend to differentiate a programmer from a computer scientist.
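A minimal sketch of that modular idea: keep the shareable science in importable functions and the one-off usage behind a `__main__` guard, so another scientist can import the useful part without running your particular analysis (the function here is just a stand-in):

```python
# hypothetical file: shared_stats.py
# The reusable part: any scientist can `import shared_stats` and call this.

def normalize(values):
    """Scale a list of numbers so they sum to 1 -- the shareable piece."""
    total = sum(values)
    return [v / total for v in values]

if __name__ == "__main__":
    # This author's particular, disposable use of the module. Another
    # scientist importing the file never runs this block.
    print(normalize([2.0, 3.0, 5.0]))
```

That's the "slight modification" case: the next user swaps out only the bottom block, not the science.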

    The good news is that computer science (and not just programming) is becoming more and more ingrained in the education of everybody in the sciences and will continue to be so as computer programming becomes more ubiquitous.
    Post edited by Byron on
  • edited July 2011
    How am I supposed to build the tool to account for subsequent investigations when I don't yet know what those investigations will be?
    Very easily, by not worrying about subsequent investigations. You build a perfect tool for the current investigation. But you build it in a proper way. Software is incredibly flexible when designed properly. If you build a tool for your current investigation you will never be able to use that tool ever again in the future. If I build the tool it will work for your current investigation and also any other investigations that are even remotely similar, even though we don't yet know what those investigations are. It will be coded in such a way that it will be trivial to modify or extend, so we can reuse it later and other people can also reuse it.
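As a sketch of the difference being argued (all names illustrative, not from the thread): the one-off version bakes in this study's data source and threshold, while the reusable version takes them as parameters, so a remotely similar future investigation needs zero changes to the tool.

```python
# One-off: this study's data and cutoff are baked in; useless next year.
def analyze_one_off():
    lengths = [38.2, 41.5, 44.0]            # stand-in for this study's file
    return [x for x in lengths if x > 40.0]

# Reusable: the same logic, with the study-specific parts injected.
def analyze(source, keep):
    return [x for x in source if keep(x)]

# Today's investigation...
big_whales = analyze([38.2, 41.5, 44.0], lambda length: length > 40.0)
# ...and a remotely similar future one, with no changes to the tool:
short_seqs = analyze(["ATG", "ATGCATGC"], lambda seq: len(seq) < 5)
```

Same effort to write either version; only the second one survives past the current grant.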

    See how many different systems you can run Linux on, including toasters? You think Linus was thinking about playing Doom on smart phones when he wrote Linux in the first place? Hell no. He just wrote it in such a way that it was incredibly flexible and reusable. That's why the same software can be in your home router, your television, your phone, and the server hosting this forum.

    At work, people ask all the time for the software to do something new. I will often, but obviously not always, say that it already does that. Nobody ever asked for that feature before. Nobody ever used that feature before. The functionality just already exists as a byproduct of the software being well designed. If you write your software properly now, it will also be the software you need later, even though you don't know what you will need later.
    Post edited by Apreche on
  • edited July 2011
    You build a perfect tool for the current investigation. But you build it in a proper way. Software is incredibly flexible when designed properly. If you build a tool for your current investigation you will never be able to use that tool ever again in the future. If I build the tool it will work for your current investigation and also any other investigations that are even remotely similar, even though we don't yet know what those investigations are.
    Well, sure, we do that all the time. We have an automated assay platform that does a specific kind of test, and can do it in multiple settings. Shit, that's common. I don't understand how the author thinks we don't use stuff like this.
    If you write your software properly now, it will also be the software you need later, even though you don't know what you will need later.
    Sure, but we often need someone to develop that functionality. Maybe the software can do it, but it's not currently optimized to do so. So we pay someone to optimize it for that new function. Again, this is a ridiculously common occurrence in science.

    I think that blog poster probably doesn't work closely enough with scientists in the field to actually understand what they're doing. This is a very very common issue between scientists and software developers - which is why some really big companies have their own programmers.
    Post edited by TheWhaleShark on
  • edited July 2011
    I will say, it's pretty difficult for people new to the industry, like me, to figure out how to write a good and expandable program from the ground up. I had to rewrite the last program I wrote about 4 times to get it where I wanted it. But I did it at home, and to my bosses the rewrites were invisible. They got what they wanted on the time frame they asked for, but I put way more man-hours into it than they realized, because I would get to a point and say to myself, "this isn't going to be able to be reused if I keep walking down this path." But hey, learning experience, right?
    Post edited by dsf on
    I will say, it's pretty difficult for people new to the industry, like me, to figure out how to write a good and expandable program from the ground up. I had to rewrite the last program I wrote about 4 times to get it where I wanted it. But I did it at home, and to my bosses the rewrites were invisible. They got what they wanted on the time frame they asked for, but I put way more man-hours into it than they realized, because I would get to a point and say to myself, "this isn't going to be able to be reused if I keep walking down this path." But hey, learning experience, right?
    That's the way you get some skills. Eventually when you have the skills you can demand the mad moneys.
  • I think this could be Programmers vs. Non-Programmers on Coding. I write Python scripts all the time, and use scripts I've written myself on a daily basis. I have scripts to write HTML for blog posts, and I've written two CMSes from scratch to make two different websites with different needs.

    If a programmer looked at my scripts they'd be horrified! If anyone else tried to use them, they wouldn't have a clue what to do! Am I interested in making a GUI? Nope! Do I want it to work with other programs? Nope! In each case I just copy and paste the results, or upload the resulting batch of HTML files, and I'm happy. I don't want results as fast as possible, but I do like to be in full control of what results I get, and the simplest way for me to get those results is to be in control of the entire process. That means the scripts have to be simple enough for me to write and understand everything that's going on, which means the last thing I want is a programmer getting their hands on them and making them "better".
  • Programmers vs. Non-Programmers on Coding
    Anybody who codes is a programmer. It is the Computer Scientists (or more specifically Software Engineers) vs non-(the thing on the left side).
  • Okay, I was meaning professional programmers on one side, and everyone else who programs on the other, but sure.
  • edited July 2011
    Okay, I was meaning professional programmers on one side, and everyone else who programs on the other, but sure.
    I would still argue that many professional programmers are pretty shitty using the metrics that Scott is using.

    While I studied CS at uni, we were frequently told, "If you want to learn how to program, get an associate's or go to trade school." Many professional programmers did just that.
    Post edited by Byron on
  • edited July 2011
    The underlying combining factor here is the way science funding is structured. You don't get funding for producing good tools.
    But if you were smart, you could turn your tools into products, sell them for cash, and get funding that way. No thinking outside the box I see.
    There are so many problems with that, I'll list a few:

    1) Transparency and peer review. It is highly questionable to claim that a closed source chunk of bits produced the results you just published and that no one can look at the source to verify the analysis. Opening up the source makes it pretty much impossible to monetize a program, since, as a scientist, you don't want to / can't monetize on any service associated with the software.

    2) You get funding to do research. If you do computational tools instead, then a) you won't get any more funding, and b) income from the tools is either illegal or doesn't belong to you.

    3) The theory and the results are cool but the bits in between are booooooooring. Most scientific computations are trivial and optimal implementations are known. Apart from problems associated with scale and complexity that grows with scale there is not that much pure CS innovation going on.

    4) The theory and the results are cool only in your own opinion. Most scientific computations do not apply to anything even remotely marketable.

    Scientific computing serves science first and foremost; it cannot easily be made to serve other purposes. Until science funding changes so that scientists actually benefit from creating extensible, maintainable, and transparent code, scientific programs will not be any of those things.
    Post edited by Dr. Timo on
  • 1) Transparency and peer review. It is highly questionable to claim that a closed source chunk of bits produced the results you just published and that no one can look at the source to verify the analysis. Opening up the source makes it pretty much impossible to monetize a program, since, as a scientist, you don't want to / can't monetize on any service associated with the software.
    You can make money with open source software. Worst case you can follow the Nessus business model. Basically you put two identical zip files on your site. One of them you label as the enterprise version and charge money for it. You would be surprised how many people pay.
    2) You get funding to do research. If you do computational tools instead then a) you won't get any more funding, b) income from the tools is either illegal or doesn't belong to you.
    That is a serious problem that I have no answer for. All I can say is that you do computational tools in addition to research, not instead of it.

    3 and 4 I got nothing.

    The only other thing I can add is that at the very least you could contribute your code to an existing open source project like scipy or something. Then you'll at least be forced to get it into good enough shape that they will accept your patch, and somebody elsewhere might reuse it. Better yet, you can save all the time in the world if you find an open source project that already has what you need.
  • edited July 2011
    you can save all the time in the world if you find an open source project that already has what you need.
    I've posed this to scientists before. The answer is usually that they have already learned how to make the tools they need themselves. The time it would take to learn to use someone else's code would be longer than writing it from scratch. Although that seems foreign and sounds almost fallacious, I can't really make a value call on that judgment for other people. Being trained in CS, I don't have that problem, so open source solutions are super good. I always had to sit down and show the software I wrote for scientists to them, which was very quick to do, but otherwise they wouldn't have used it.
    The only other thing I can add is that at the very least you could contribute your code to an existing open source project like scipy or something.
    Again, this takes more understanding of computer science or software engineering to plan out the software or make it look nice. Most technical/scientific people without a CS background do not have the skills to do this nor the time to learn those skills. And again, I think this is changing as universities broaden science/tech programs to include some amount of CS.
    Post edited by Byron on
  • Again, this takes more understanding of computer science or software engineering to plan out the software or make it look nice. Most technical/scientific people without a CS background do not have the skills to do this nor the time to learn those skills. And again, I think this is changing as universities broaden science/tech programs to include some amount of CS.
    Again, this goes back to technology needing to be the fifth subject in all schools. Everyone needs to know it.
  • Timo, you do physics right? I've noticed that the sciences closer to physics use lower level coding systems, and the farther ones tend to gravitate towards Matlab and simple scripts.
    1) Transparency and peer review. It is highly questionable to claim that a closed source chunk of bits produced the results you just published and that no one can look at the source to verify the analysis. Opening up the source makes it pretty much impossible to monetize a program, since, as a scientist, you don't want to / can't monetize on any service associated with the software.
    You can make money with open source software. Worst case you can follow the Nessus business model. Basically you put two identical zip files on your site. One of them you label as the enterprise version and charge money for it. You would be surprised how many people pay.
    The problem with this idea is: why would the scientist care? If there is really money to be made off of either a program or a device a scientist makes to do his real work, that tends to get passed up the food chain to whomever "owns" the scientist, and they do something with it.
    To give an example, I was working in a biophysics lab and they needed a tool to do image analysis. They needed to measure the size of some vesicles, pipet sizes, the amount of the vesicle in the pipet, and often there's a bead that needs to be tracked. They had code that did some of this in Matlab, and the code I had to write didn't really need to be much more complicated. So I wrote this code and it's generating data. After that point, what would be worth the effort of publishing this in any meaningful way? Most of the background code is common-knowledge algorithms, and most of the front end is very application-specific. I could sit down and write a GUI to make the code idiot-usable, but all of the people working with it know what's going on. They can read code, and I can just leave notes in the code so they can optimize it for a specific run. Any further development of this tool is time better spent making more tools.
  • The problem with this idea is what does the scientist care? If there is really money to be made of of either a program, device a scientists makes to do his real work, that tends to get passed up the food chain to whomever "owns" the scientist and they do something with it.
    If I "owned" a scientist, I would definitely want to extract as many revenue producing products from them as possible.