Soil Chromatography to evaluate soil state

We are working with a community of campesino farmers to evaluate soil fertility, and we found this biodynamic Pfeiffer circular soil chromatography test.
It is really simple and low cost.
Is anyone familiar with it? @dornawcox @gbathree @jarancio
There is also a book about it, but in Spanish.
There are a few interesting studies about the “calibration” of the chromatograms with traditional quantifiable methods
I’m gathering a few resources on this here.


I’ve seen that once before but never used it. Actually, it’s a good candidate for using machine learning - you have a great deal of visual information, fairly hard to interpret due to complexity, but if analyzed with enough context could tell you a lot.

I’d love to see the highlights from those studies - are there clear correlations? Could you post a little summary of what you find? I’d be super interested to know!

At Our-Sci we’ve been kicking the idea around of collaborating with existing testing labs, who already have a large amount of sample throughput, to correlate with simpler, alternative methods. This would be a case example to apply that strategy. Imagine if you could sit in a testing lab for 3 days which does 200 samples a day and calibrate this against actual results? That’s a very achievable goal.

In terms of the difficulty of using machine learning for such a task: it’s surprisingly well developed and easy to use in almost every available programming language (including JavaScript!), especially for consistent images like these. In fact, my guess is @spMohanty could give some advice on that.


Very interesting. I have just started doing a video course on deep learning and would love to find some non-toy applications to work on. From what I have learned so far, it does seem like deep learning, especially when re-training an existing image recognition net, could be very effective for something like the Pfeiffer circles.

Of course, for re-training, you would need quite a lot of samples and knowledge about the soil from those samples to begin with. Digging a bit, I found a reference to this paper in one of the papers in your Google Drive where they mention such a database.

The group in MCRC has collected around ten thousand soil samples from many parts of India. These soil samples have been analyzed by a sister company in the Murugappa group, E.I.D. Parry, working on soil and fertilizers. At the same time the MCRC group has prepared chromatograms for all the soil samples. These chromatograms have been scanned at IIT Madras, where we have also built a CBR system that stores cases as image features along with soil properties, into a system called InfoChrom.


Debra Solomon did a three-month residency on this methodology in our Open Wetlab recently. She now has a lot of experience with the technique. It has been exhibited too, under the label “Soil Portraits”.

She runs a lab in urban agriculture and you can reach her through that.


Hi all
Thanks for your comments.
I’m just starting to dig into this soil chromatography world and it seems really promising.
Some of the things that I have found @gbathree
All agree that soil chrom is a good candidate for image processing. There are a few papers that use Case Based Reasoning along with soil properties determined at commercial labs to retrieve soil sample properties. The databases used contain between 10000 and 20000 cases.
The reported correlations to lab results are pretty good. In this case 100% for EC, Ca, Mn, Mg, Cu, 94% for pH, 92% for OC, 99% for N and P, 97% for K, 94% for Fe, 93% for Zn, 89% for S rec
There are descriptions of the image processing algorithms but nothing “open”
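
Since nothing open seems to exist, here is a minimal sketch of what the retrieval step of a Case Based Reasoning system could look like. It is not the algorithm from those papers: the feature vectors, the Euclidean distance and the averaging over the nearest cases are all assumptions for illustration, and every number in the case base is made up.

```python
import math

# Hypothetical case base: each case pairs image features extracted from a
# chromatogram with lab-measured soil properties. All values are invented.
CASES = [
    {"features": [0.20, 0.55, 0.10], "lab": {"pH": 6.1, "OC": 0.8}},
    {"features": [0.25, 0.50, 0.15], "lab": {"pH": 6.4, "OC": 1.0}},
    {"features": [0.80, 0.10, 0.60], "lab": {"pH": 7.9, "OC": 0.3}},
    {"features": [0.75, 0.15, 0.65], "lab": {"pH": 8.1, "OC": 0.2}},
]


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_features, cases, k=2):
    """Return the k stored cases whose features are closest to the query."""
    ranked = sorted(cases, key=lambda c: distance(query_features, c["features"]))
    return ranked[:k]


def estimate(query_features, cases, k=2):
    """Estimate soil properties as the mean lab values of the k nearest cases."""
    nearest = retrieve(query_features, cases, k)
    properties = nearest[0]["lab"].keys()
    return {p: sum(c["lab"][p] for c in nearest) / len(nearest) for p in properties}


# A new chromatogram whose features resemble the first two cases.
print(estimate([0.22, 0.52, 0.12], CASES))
```

The hard part in practice is deciding which image features to extract; the retrieval itself stays this simple even with thousands of cases.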

Collaborating with existing testing labs would be a good opportunity to create the large databases needed. Another approach could be to start sharing chroms and results in a more or less standardized way…?
We already have a few chrom tests along with lab results for some parameters. I think we will try to advance on the image processing track, at least to learn about it. We could use some help on that @kaspar
@pieter thanks. I will contact Debra Solomon to learn more about her work, ‘Soil Portraits’.

I will keep you updated on our progress.


Sheesh, that paper is kind of wonky to read, but I think I got it. It shows the soil chromatograms relate to actual nutrient values (N, P, K, Zn, …) based on rough category (high, medium, low) most of the time. So when they say 100% for EC, it means the chromatogram is 100% accurate in being in the same category (high, medium, low) as the lab sample. So it’s not very quantitative, but it shows that there’s information there for sure.
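
To make that reading concrete, here is a small sketch of how a "percent correct by category" figure can be computed. The cut-off values, the potassium numbers and the chromatogram categories are all hypothetical; the paper's actual thresholds are not reproduced here.

```python
def to_category(value, low_cut, high_cut):
    """Bin a lab value into low / medium / high using two assumed cut-offs."""
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "medium"


def category_agreement(predicted, reference):
    """Percentage of samples where both methods land in the same category."""
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return 100.0 * matches / len(reference)


# Made-up lab potassium values (ppm) and made-up cut-offs.
lab_k = [80, 150, 300, 120, 40]
lab_categories = [to_category(v, low_cut=100, high_cut=250) for v in lab_k]

# Made-up categories read off the matching chromatograms.
chrom_categories = ["low", "medium", "high", "low", "low"]

print(category_agreement(chrom_categories, lab_categories))
```

Note that two samples as different as 101 and 249 ppm would both count as "medium" here, which is why a high agreement score says little about quantitative accuracy.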

Also, crazy variation between labs - sheesh! How can we correlate anything to anything when the calibration standard is such a mess!

I’m super into this. @nanocastro What’s the simplest, most accurate (quantitative) thing we could all measure at home with little upfront cost to create our own database? I think if we can show proof of concept on a few parameters that’d be awesome. It seems there’s folks capable and interested in doing the analytical component… just need some data (?)


I wouldn’t say I am capable yet, but am interested.

Thank you for starting this thread and posting the resources. I do not have experience with the technique but am excited to follow the discussion. I will reach out to invite comment from some soil health colleagues.

I think a generalized software supported indicator evaluation and feedback process would be very valuable to streamline the process of validation and calibration of experimental measurements in relationship to established protocols.

Deep and diverse common libraries to draw upon seem like an important building block.

Thanks @gbathree for the more careful reading.
Next week I’m going to talk with the soil lab guy (Leandro) to ask him about what kind of data he already has and about the simplest and most accurate thing that we could all measure to start our own database…

When I was looking for methods to test N in soil I found this one. It is a colorimetric method that uses reagent-grade enzymes (instead of cadmium) for the reduction of nitrate to nitrite. They also developed (along with the Pearce group at MTU) an open source colorimeter. Maybe this is a candidate test, even though it is not going to be so cheap here (6 dollars per sample).

De nada @dornawcox and thanks for joining the conversation.

I will get back with news next week


Finally some news. Today I went to the soil lab and this is what they told me…
The tests that they are doing along with the chromatographies are:
-Salinity: conductometric method
-Organic matter: by carbon oxidation (Walkley-Smolik)
-Total respiration: Alef method
-Total Nitrogen: Kjeldahl
-Extractable phosphorus: carbonic extraction, 1:10 ratio
-Potassium: exchangeable with ammonium acetate pH 7 1N
-Texture: Bouyoucos method
-Calcareous content: calcimeter method with HCl
None of these tests can be done in the field; they all require a lab.
They expect to have around 100 samples by November. We can try to do some machine learning with this small database. What do you think?

In order to populate a database more easily, Leandro (the soil lab guy) was really interested in the NEMI on-field enzymatic method. We are going to see if it’s possible to replicate the colorimeter and implement this test as well.
That is all for the moment
Keep you updated


My initial intuition is that 100 samples is far from enough. We need a training and validation set so it’s just 50 samples to re-train the net. I think something on the order of 1000 would be needed.
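
As a quick illustration of that arithmetic, here is a sketch of a shuffled split of 100 sample IDs into training and validation halves. The 50/50 fraction matches the numbers above, and the function name and seed are just placeholders.

```python
import random


def split_samples(sample_ids, validation_fraction=0.5, seed=0):
    """Shuffle the IDs reproducibly and split them into (train, validation)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_validation = int(round(len(ids) * validation_fraction))
    return ids[n_validation:], ids[:n_validation]


train, validation = split_samples(range(100))
print(len(train), len(validation))  # 50 50
```

With a smaller validation fraction (say 0.2) there would be 80 samples left for re-training, at the price of a noisier estimate of how well the net generalizes.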

Bear in mind I have just started on machine learning and don’t really know what I am talking about. It’s worth getting some additional advice from someone who does.

We used machine learning on some photosynthesis data, and I’m familiar with it though no expert. It’s improving all the time so numbers can change by a few times as improvements are made, but I tend to agree with Kaspar… 500 - 1000 is definitely safer, especially for image data.

Also, in terms of ease of use and proof of concept for free, you could consider using Watson. I’m told by a friend that you can use Watson’s API for free up to 1000 calls per day.

But as image processing is the bread and butter of deep learning frameworks, it may be easy to set up there also.

Thanks Kaspar and Greg for your insights
So we are kind of far from those numbers…I think that for the moment we are just going to gather some more data while we learn about machine learning

Hello Goshers,

I am interested in evaluating the quality or contamination of soil. I would like to gather data about this.
Is it easy to make such sensors? Would you guide me? I would like to gather data in my city, possibly build a local cartography, and later present this data with a sound interface to make an art performance.

warm greetings, Alexis

Hello Pieter, what is the visual data of the chromatograms like? Can I re-interpret this data as sound?

Hi everyone,

I came a bit late to the thread.

Much of deep learning is all about end-to-end learning, i.e. have as little (or none at all) domain specific information in the actual learning pipeline as possible.

And you guys already came to the conclusion that the availability of the dataset is actually the bottleneck.
The actual implementation details of setting up the networks and training them are rather trivial, and are a question of capacity building in the community. As my research is in that particular domain, I would be happy to help with capacity building around Deep Neural Networks etc. in the community. Apart from that, another important point to remember is that we should focus on what we specialise in, and delegate what we do not specialise in. So instead of trying to be experts in Machine Learning ourselves, we can try to delegate it to others who work on it day in and day out. On that front, I run a small Machine Learning for Open Science with Open Data platform called crowdAI, where we run Machine Learning challenges on interesting problems in Open Science, and our community of Machine Learning researchers tries to solve them for a small academic prize (usually a travel grant to visit a conference in Switzerland, or to visit the headquarters of a partnering organization like the UN, etc.)

In terms of steps forward, the major items that need to go in the roadmap are :

  • define a series of well defined problems
  • standardize the datasets to go along with those problems
  • define the evaluation metrics to be able to judge the progress on the problem over time.

While trying to deal with the items above sequentially, we will run into the very problems you also discussed. For example,

  • Agreed, we need data, but how much data?
    The answer to this question is not straightforward, and most academics refrain from putting down a number in writing. In the context of Image Classification (because of which Deep Learning became popular), the only academically written “rule-of-thumb” comes from this recent book, where the authors cite 5000 images per class, a number I personally disagree with from my experience on the problem, but I do understand why they would want to be on the safe side and cite a large enough number.
    To address this problem, the pipeline we use in research is to collect a “reasonable amount of data” (which in itself varies from problem to problem, or is just defined by the constraints/costs we have in the data collection procedure), then establish a proper mathematical metric to evaluate whether the model is actually learning (or simply picking up the inherent bias in the dataset we collected), and then keep iterating until we agree that the model is actually learning.

Another common pitfall is too-good results, which are very common in the deep learning era, when you use a really large and complex network with tens of millions of parameters on a really small dataset. The model ends up pretty much learning every single subtle bias in the dataset (in the case of an image dataset, imagine the subtle variations in lighting because all images of a particular class were collected on the same day with the same lighting, etc.) and gives a bloated, near-perfect result even on your validation set.
I have seen this recently more and more on spectral datasets, when some of my students would come up with unbelievable results on a small dataset of spectral fingerprints, and the reason is always that they threw a mammoth of a model at a pretty small dataset. This is also something @gbathree pointed out in one of his previous posts, where he says: “How can we correlate anything to anything when the calibration standard is such a mess!”.
These considerations are much more important in problems where we have no idea of what to expect and are suddenly getting unbelievable results.
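
One common guard against this kind of leakage is to split by collection session rather than by individual image, so that the validation images come from days the model never saw. Below is a minimal sketch of that idea; the function names, the session labels and the sample IDs are all hypothetical. It does not fix a dataset where class and session are fully confounded, it only keeps the validation score from being inflated by shared lighting or scanner conditions.

```python
def split_by_session(samples, validation_sessions):
    """Split (sample_id, session) pairs so no session ends up on both sides.

    `session` is any batch label that might carry a hidden bias, for example
    the day the chromatograms were photographed.
    """
    train, validation = [], []
    for sample_id, session in samples:
        if session in validation_sessions:
            validation.append(sample_id)
        else:
            train.append(sample_id)
    return train, validation


def shared_sessions(samples, train_ids, validation_ids):
    """Return the sessions present in both subsets (ideally an empty set)."""
    session_of = dict(samples)
    train_sessions = {session_of[i] for i in train_ids}
    held_out_sessions = {session_of[i] for i in validation_ids}
    return train_sessions & held_out_sessions


# Made-up example: six chromatograms photographed on three different days.
SAMPLES = [
    ("s1", "day-1"), ("s2", "day-1"),
    ("s3", "day-2"), ("s4", "day-2"),
    ("s5", "day-3"), ("s6", "day-3"),
]

train_ids, validation_ids = split_by_session(SAMPLES, {"day-3"})
print(train_ids, validation_ids)
print(shared_sessions(SAMPLES, train_ids, validation_ids))
```

If the validation score drops sharply when moving from a random split to a session-based split, that is a hint the model was learning the session rather than the soil.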

If you guys do manage to collect a small dataset, I would be happy to play with it and also help in defining a standard set of problems around it along with the corresponding evaluation metrics.



Dear soil-geeks,

we have also started a focused network and activities around Open Soil Research called “HUMUS.Sapiens” and we are still running our crowd-funding campaign, double backed through the Science Booster on wemakeit.

It would be great to use this opportunity and bring some people together. The first meeting is here in Schaffhausen, Switzerland on 4-6. May, followed by a technical dev-phase in Fribourg, Switzerland, 7-13. May.

we still need some backers… THANKS!


I’m glad this thread persists, thanks @spMohanty for restarting it! I have some possibly useful updates:

We are building a soil and food testing lab right now, and (fingers crossed) we will be running 1000 - 2000 samples from all over the US this year. On the soil samples, we’ll be running total organic carbon (via loss on ignition method), carbon mineralization (aka soil respiration, using a CO2 sensor), and minerals (from Na to U at ppm levels). We’ll be keeping 30g of all of our soil samples to collaborate with other labs, and I would love to try to tackle some of the questions discussed here.

We are intentionally not measuring soil properties which change a lot (like the N P K kind of stuff) because the project we’re running this for isn’t interested in that, but someone could certainly take our left-over samples and run those things if they wished.

Perhaps that’d be a good project for Humus sapiens? I dunno, but anyway throwing out there what we’re up to.


Hi all
Thanks @spMohanty for your insights. It is going to take some time before we can gather the small database here, but we are already on the road.
Last year we organized a small demonstration of soil chromatography in two different campesino communities. They showed a lot of interest in advancing this kind of research / methods for agroecological management of soils and biofertilizer production. In line with this interest, and because we are trying to gather and connect some people around OSCH, we will be running an open hardware workshop with some goshers this Thursday at the local Agronomy university where, hopefully, we will find more interested people to advance this kind of soil research.
Unfortunately I will not make it to GOAT or Humus sapiens but I will follow from a distance. I think they are really interesting initiatives and the outcomes can be very helpful in our context.

Please check out my research I am currently doing on Soil Chromatography. Been working on this for a number of years now. Let me know how I can help.
