I’d like to share a recent article of mine about open data and code for a shared future of humanity:
https://royalsocietypublishing.org/doi/10.1098/rspb.2024.1515
Thanks @hikinghack for linking me with this community!
Dear @AsymptoticAspiration, first of all, apologies: I have only read the abstract of your paper so far, and I will try to read all of it as soon as possible. As a first impression, I felt a bit alarmed by the suggestion to store loads and loads of data and code as opposed to papers. I am worried that, unless a major revolution occurs in the way data and code are stored so that they don’t hurt the environment as much, we may fill the world with datacenters of which 99% will be minutiae, trivial data, and useless code. On the other hand, if we allow an evolutionary sort of approach (or, let’s say, a historical one, in the sense of how science worked until right before the data revolution), where the best papers and datasets stay and are studied, then we’ll have more space on the planet for things like homes for people and the environment the planet needs to survive.
But, as I have just said, this is just a first impression from your abstract, and I look forward to chatting with you here! Welcome, dear friend!!
Best Wishes,
Haris
Hello @haris, thank you for your interest in this! While I agree with you that there are environmental costs to storing data and code, I think in ecology the vast majority of cases involve quite small repository sizes (in terms of storage costs relative to other things, e.g., ChatGPT, consumer data, etc.). In some cases, datasets can be gigantic (e.g., global climate projections), but often those datasets are quite valuable. I think these conversations about what to save are important to have, and there is some discussion of this in the paper. With that said, the paper does not advocate for storing data and code “as opposed to papers”, but in addition to papers. IMO, papers are tiny, and any storage costs there are negligible compared to the hoards of useless stuff on the internet. Unfortunately, I don’t think your evolutionary/historical approach is how things have worked out, because data and code haven’t historically been archived. So for those more pivotal papers, we are reliant on taking folks’ word for the outcome, and we aren’t able to ‘study’ or reanalyze things in different ways because the original data and code are not available. These research products were incredibly expensive to create, but they are lost if they are not archived, and that loss can often be more wasteful than the cost of storing those products… because others have to go out and collect more data / recreate code to ask similar or related questions… just my two cents. But, again, I fully agree that we have to have an honest conversation about the environmental costs of data storage and ask ourselves whether the advantages of the data we are collecting and storing outweigh the costs.