Statistical Analysis Report
| Customer | Innovation Centre for Organic Farming, Tove Mariegaard Pedersen |
| Customer ID | DA00204-24 |
| Project | Markens motor (2024). |
| Sample Type | Soil |
| Number of samples | 30 samples |
| Type of data | Marker gene sequencing of bacteria (16S) and fungi (ITS) |
The Project
This report describes the biodiversity of the field bacterial and fungal microbiome in 30 samples collected across 30 organic fields in Denmark in 2024. In this report we analyse the biodiversity measures for fungi and bacteria in the soil of the fields. Analysis of single taxa and overall composition (beta-diversity) is presented in the additional R3 reports.
Using the marker-gene (amplicon) sequencing data, also called metabarcoding, generated for two markers across the fields (ITS for fungi and 16S rRNA for bacteria), we calculate measures of biodiversity. In microbiome bioinformatics these measures are termed ‘alpha diversity’, and we calculate two of them: observed (richness) and Shannon. The measures are introduced in Report 2.
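As a minimal illustration of the two measures named above, the following Python sketch computes observed richness and the Shannon index for a single sample. The function names and toy counts are hypothetical, and the use of the natural logarithm is an assumption; the definitions actually used in the project are those given in Report 2.

```python
import math

def observed_richness(counts):
    """Number of taxa with a non-zero count in the sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    Assumption: natural logarithm; other bases only rescale the value.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts for four taxa in two samples.
even_sample = [25, 25, 25, 25]    # all taxa equally abundant
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, observed_richness(sample), round(shannon_index(sample), 3))
```

Both toy samples have the same observed richness, but the even sample has the higher Shannon value, which is why the two measures are reported side by side.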
We use the amplicon data specifically for the biodiversity analyses because it has some advantages over shotgun sequencing data for diversity estimation. First, amplicon data is more sensitive to less abundant taxa because the method is targeted: we use DNA primers to fish for DNA from the type of organism we want to detect (here fungi or bacteria). Second, because we estimate diversity from all detected organisms, independent of their presence in a reference database, we can include many more lesser-known organisms. The resulting measures better reflect the true diversity and not just the “diversity of well-known organisms”. In this report, all evaluations are performed for both measures, for both bacteria and fungi.
For each field, one sample was collected to represent the field, based on 16 sub-samples taken in a W-pattern across the field.
We split many analyses into two groups by JB value, based on both analyses of this dataset and prior analyses of data from 2021-2022, where we saw a strong association between JB and microbiome profiles. We split by JB group because we find that the effect of JB overshadows associations that may exist between the microbiome and other variables of interest.
The JB groups are JB1_JB2 (JB1 and JB2; 16 fields) and JB5_JB6_JB7 (JB5, JB6 and JB7; 14 fields).
A special focus of this project is the association of the microbiome with biodynamic farming, with grazing versus mowing, and with annual crops. The aim is to evaluate how the microbiome of the fields associates with other field parameters, covering both agricultural practices and soil indicators of nutrients, type and structure.
We begin with an overview of the metadata variables of the project.
Practical notes
In “Report 3”, biostatistical analyses are performed and the results presented, building on the data generated and evaluated in the four preceding reports (Report 1: Sequencing and data processing report, and Report 2: Microbiome profiling report, each issued for fungi and for bacteria). Through biostatistical analysis we relate the microbiome biodiversity to the key variables.
Note on preprocessing for biodiversity analysis
As part of our biodiversity analysis pipeline, we have introduced a key preprocessing step: taxa that are detected in only one sample are removed prior to the calculation of alpha diversity metrics. This filtering step helps reduce the influence of extremely rare or potentially spurious taxa, placing greater emphasis on the more consistently detected members of the microbial community. In doing so, we aim to enhance the robustness and reproducibility of the diversity measures, while maintaining a stronger focus on ecologically relevant and common taxa.
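The filtering step described above can be sketched as follows in Python. This is an illustration only: the function name, the dictionary layout of the count table and the toy counts are hypothetical and do not reproduce the project pipeline.

```python
def drop_single_sample_taxa(count_table, min_samples=2):
    """Keep taxa detected (count > 0) in at least `min_samples` samples.

    `count_table` maps a taxon name to its list of per-sample counts.
    With the default of 2, taxa seen in only one sample are removed.
    """
    kept = {}
    for taxon, counts in count_table.items():
        n_detected = sum(1 for c in counts if c > 0)
        if n_detected >= min_samples:
            kept[taxon] = counts
    return kept

# Hypothetical counts for three taxa across four samples.
toy_counts = {
    "taxon_A": [10, 0, 3, 7],  # detected in 3 samples, kept
    "taxon_B": [0, 0, 5, 0],   # detected in 1 sample, removed
    "taxon_C": [1, 2, 0, 0],   # detected in 2 samples, kept
}

filtered = drop_single_sample_taxa(toy_counts)
print(sorted(filtered))
```

Alpha diversity would then be calculated on the filtered table, so a taxon seen in a single field no longer contributes to that field's richness.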
We have included all variables in the collected metadata that show some variation across the fields. Variables with the same value for all fields (for example, all organic) or with only very few deviations (90% or more identical values) are excluded.
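One way to express this selection rule is sketched below in Python. The function name, the handling of missing values and the strict 90% cut-off are assumptions made for illustration; the exact rule applied in the project may differ.

```python
from collections import Counter

def has_enough_variation(values, max_share=0.9):
    """True if the most common non-missing value covers less than `max_share`.

    Assumption: missing values are given as None and ignored.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return False
    top_count = Counter(observed).most_common(1)[0][1]
    return top_count / len(observed) < max_share

# Hypothetical variables across 30 fields.
organic = [1] * 30                   # identical everywhere, excluded
biodynamic = [0] * 21 + [1] * 9      # 70% / 30% split, included
near_constant = [0] * 28 + [1] * 2   # about 93% identical, excluded

for name, values in [("organic", organic),
                     ("biodynamic", biodynamic),
                     ("near_constant", near_constant)]:
    print(name, has_enough_variation(values))
```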
Below are two overview tables: the first shows the variables by category with a short description, and the second shows summary statistics for each variable, allowing us to inspect the variation and the subgroups of fields that each variable represents.
| Category | Report_variable | Description |
|---|---|---|
| Geografi og vejrdata | GPS-koordinater | GPS-koordinater |
| | Geografisk placering | Geografisk placering |
| | Geografisk placering (gruppe) | 1=Vendsyssel (V), 2=Region Nord u. Vendsyssel (NJ), 3=Region Midtjylland (MJ), 4=Region Syddanmark u. Fyn (SJ), 5=Sjaelland (S), 6=Lolland, Falster, 7=Fyn (F), 8=Bornholm (B) |
| | Nedboer | Nedboer i alt april-september paa kommuneniveau |
| | Toerkeindeks | Gn.snit toerkeindeks april-september paa kommuneniveau |
| | Middeltemperatur | Gn.snit middeltemperatur april-september paa kommuneniveau |
| Vurdering af marken | JB | Vurderet JB-nr. |
| | Regnorme | Mange regnorm: 1 Faa/ingen regnorm: 0 |
| | Kold jord | Kold jord: 1 Ikke kold jord: 0 |
| | Jordtemperatur | |
| | Kompakt jord | Er jorden kompakt: 1 Ikke kompakt: 0 |
| | Veldraenet | Er marken veldraenet: 1 Ikke veldraenet: 0 |
| | Holde paa vand | Marken kan holde paa vand i toerre perioder 1=ja, 0=nej |
| | Nedmuldning af halm | Nedmuldning af halm seneste 3 aar: 1 Ingen nedmuldning af halm: 0 |
| | Kloevergraes | Kloevergraes i 3 aar: 1 En-aarige afgroeder 3 aar: 0 |
| | Afgraesset | Er marken afgraesset hvert aar de senest 4 aar 1=ja, 0=nej |
| | Slaet | Er der taget slaet paa graesmarkerne de sidste tre aar og ikke afgraesset 1=ja, 0=nej |
| | Afgroede | Afgroede saesonen op til proeveudtagning |
| | Afgroede (gruppe) | K: Kornafgroede uden efterafgroeder KE:Kornafgroede med efterafgroede B: Hesteboenne, aert, lupin som hovedafgroede eller i blanding med korn KL: Kloevergraes G: Groensager F: Froeproduktion (graes, spinat) O: Oliefroe (raps med og uden efterafgroede) M: Majs med eller uden efterafgroede |
| | Ploejefri dyrkning | Ploejefri dyrkning: 1 Traditionel ploejning: 0 |
| | Conservation Agriculture | Conservation Agriculture: 1 Ikke Conservation Agriculture: 0 |
| | Aar_sidste_ploejning | Aar sidste ploejning (inkl. oeko bedrifter med kloevergraes) |
| | Rt | Rt |
| | Fosfor | Fosfor (mg/100g) |
| | Kalium | Kalium (mg/100g) |
| | Magnesium | Magnesium (mg/100g) |
| | Kobber | Kobber (mg/kg) |
| | Organisk stof | Organisk stof (%) |
| | Organisk stof (factor) | Organisk stof L:lav, M:middelhoej H:hoej - ift. lerindhold “Hvad gemmer sig bag tallene” |
| | Ler | Ler (%) |
| | Kvaelstof | Kvaelstof (%) |
| Driftsform | Organic | oekologisk bedrift: 1 Ikke oekologisk bedrift: 0 |
| | Organic (years) | Antal aar siden omlaegning til oekologisk produktion |
| | Biodynamisk | Biodynamisk bedrift: 1 |
| | Biodynamisk (years) | Antal aar siden omlaegning til biodynamisk produktion |
| | Husdyrbrug | Husdyrbrug: 1 Uden husdyr: 0 |
| Goedningstildeling og kalkning | Husdyrgoedning | Husdyrgoedning er anvendt det seneste aar Ja: 1, Nej: 0 |
| | Handelsgoedning | Handelsgoedning er anvendt det seneste aar Ja: 1, Nej: 0 |
| | Vinasse | Vinasse er anvendt det seneste aar Ja: 1, Nej: 0 |
| | Gips | Gips er anvendt det seneste aar Ja: 1, Nej: 0 |
| | Afgasset goedning | Afgasset goedning er anvendt det seneste aar Ja: 1, Nej: 0 |
| | Jordforbedringsmidler | Anvendes der jordforbedringsmidler (kompost, praeparater mv.) |
| | Kalket | Er marken kalket de seneste 3 aar: 1 Ikke kalket de seneste 3 aar: 0 |
Table 1: Overview of metadata variables. The overview is kept in Danish because the data was collected in a Danish-language table. In the remainder of the report, the variables are translated into English, the language of the report.
The key variables assessed in this report are summarized with summary statistics across the 30 samples in the table below.
| Variable | NotNA | Mean | Median | PropNA |
|---|---|---|---|---|
| year | 30 | 2024 | 2024 | 0 |
| JB_groups | 30 | |||
| … JB1_JB2 | 16 | 53% | ||
| … JB5_JB6_JB7 | 14 | 47% | ||
| JB_value | 30 | 3.5 | 2 | 0 |
| Rainfall | 0 | | | 1 |
| Average_drought_index | 0 | | | 1 |
| Average_temp. | 0 | | | 1 |
| soil_tmp | 30 | 12 | 12 | 0 |
| field_keep_water | 30 | |||
| … 0 | 9 | 30% | ||
| … 1 | 21 | 70% | ||
| Clovergrass_within_3_years | 30 | |||
| … 0 | 10 | 33% | ||
| … 1 | 20 | 67% | ||
| Grazed | 30 | |||
| … 0 | 19 | 63% | ||
| … 1 | 11 | 37% | ||
| Harvested | 30 | |||
| … 0 | 22 | 73% | ||
| … 1 | 8 | 27% | ||
| Years_since_plowing | 30 | 3.5 | 2.5 | 0 |
| Rt | 30 | 5.9 | 5.8 | 0 |
| Phosphorus | 30 | 3.4 | 3.2 | 0 |
| Potassium | 30 | 8.5 | 7.2 | 0 |
| Magnesium | 30 | 7.8 | 7.5 | 0 |
| Cobber | 30 | 2.2 | 1.7 | 0 |
| Organic_material_perc | 30 | 4.1 | 3.5 | 0 |
| Organic_material_factor | 30 | |||
| … H | 7 | 23% | ||
| … M | 23 | 77% | ||
| Clay_perc | 30 | 8 | 6.2 | 0 |
| Nitrogen_perc | 30 | 0.17 | 0.16 | 0 |
| Years_since_turning_organic | 30 | 17 | 11 | 0 |
| Biodynamic_farm | 30 | |||
| … 0 | 21 | 70% | ||
| … 1 | 9 | 30% | ||
| Years_since_turning_biodynamic | 30 | 4.2 | 0 | 0 |
| Livestock_manure | 30 | |||
| … 0 | 11 | 37% | ||
| … 1 | 19 | 63% | ||
| Degassed.fertilizer | 30 | |||
| … 0 | 25 | 83% | ||
| … 1 | 5 | 17% | ||
| Crop_detail | 30 | |||
| … groenkorn vinterrug foraarssaaet med kl graes udlaeg | 1 | 3% | ||
| … Groenkorn Vinterrug m. kl. graes udlaeg | 1 | 3% | ||
| … groentsager | 1 | 3% | ||
| … Helsaed Vaarbyg/aert med undersaaet froegraes | 1 | 3% | ||
| … kl. graes | 3 | 10% | ||
| … kl. graes afgraesning | 5 | 17% | ||
| … Kl. graes afgraesning | 1 | 3% | ||
| … kl. graes afgraesning (varig?) | 1 | 3% | ||
| … Kl. graes slaet | 3 | 10% | ||
| … kl. graes slaet/afpudsning | 1 | 3% | ||
| … Kl.graes afgraesning og slaet | 1 | 3% | ||
| … kl.graes slaet | 2 | 7% | ||
| … kl.graes slaet supleret med afgraesning | 1 | 3% | ||
| … soedkirsebaer siden 2015 med kl.graes imellem | 1 | 3% | ||
| … Vaarbyg med kl. graes efterafgroede | 1 | 3% | ||
| … Vaarbyg med udlaeg af kl.graes | 1 | 3% | ||
| … vedv. kl.graes afgraesning | 1 | 3% | ||
| … vinterhvede | 1 | 3% | ||
| … Vinterraps | 1 | 3% | ||
| … Vinterrug hybrid | 1 | 3% | ||
| … Vinterspelt | 1 | 3% | ||
| Crop_category | 30 | |||
| … G | 1 | 3% | ||
| … K | 3 | 10% | ||
| … KE | 5 | 17% | ||
| … KL | 20 | 67% | ||
| … O | 1 | 3% |
Table 2: Summary statistics of the key variables selected for evaluation in relation to the fields' microbiome profiles in 2024.
This section evaluates the geographic location of the fields included in the project. We use the collected coordinates to show the location of the fields on a map, and color the data points by each of the four biodiversity measures to evaluate whether there are regional patterns.
Part-conclusion for section
Figure 1: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by Shannon biodiversity for the fungi community.
Figure 2: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by observed biodiversity for the fungi community.
Figure 3: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by Shannon biodiversity for the bacterial community.
Figure 4: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by observed biodiversity for the bacterial community.
We first show illustrations of the association between diversity and each variable using box plots for grouping variables and dot plots with a linear regression line for the continuous variables. These plots allow us to evaluate any association patterns and review findings from the following statistical analyses.
Figure 7: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 8: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 9: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 10: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 11: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 12: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 13: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 14: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 15: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 16: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 17: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 18: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 19: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 20: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 21: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 22: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 23: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 24: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 25: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 26: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 27: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 28: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 29: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 30: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 31: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 32: Illustration of the alpha diversity levels across the levels or values of the environmental variable.
Figure 39: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 40: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 41: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 42: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 43: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 44: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 45: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 46: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 47: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 48: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 49: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 50: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 51: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 52: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
Figure 53: Illustration of the fungal diversity levels across the levels or values of the environmental variable.
Figure 54: Illustration of the bacterial diversity levels across the levels or values of the environmental variable.
An analysis of variance (ANOVA) model was used to evaluate whether the mean diversity differed significantly between the levels of each grouping variable, and robust linear regression was used to assess the relationship with each continuous variable.
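To make the ANOVA columns reported in the tables (Df, Sum.Sq, F.value) concrete, the sketch below computes them for a one-way layout in plain Python. It is an illustration under stated assumptions: the function name and the toy values are hypothetical, the P value (which requires the F distribution) is not computed, and the robust regression is not covered.

```python
def one_way_anova(groups):
    """Return (df_between, ss_between, f_value) for a list of numeric groups.

    Assumes at least two groups and some within-group variation.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, ss_between, f_value

# Hypothetical diversity values for fields without and with a practice.
without_practice = [1.0, 2.0, 3.0]
with_practice = [2.0, 3.0, 4.0]

df, ss, f = one_way_anova([without_practice, with_practice])
print(df, ss, f)
```

A large F value means that the group means differ by more than the spread within the groups would suggest.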
| Variable | Observed Df | Observed Sum.Sq | Observed F.value | Observed P | Shannon Df | Shannon Sum.Sq | Shannon F.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| field_keep_water | 1 | 1062036.668 | 6.794 | 2.07e-02 | 1 | 0.149 | 3.391 | 8.68e-02 |
| Grazed | 1 | 343610.592 | 1.655 | 2.19e-01 | 1 | 0 | 0.007 | 9.35e-01 |
| Harvested | 1 | 1368.037 | 0.006 | 9.40e-01 | 1 | 0.032 | 0.62 | 4.44e-01 |
| Clovergrass_within_3_years | 1 | 193614.556 | 0.887 | 3.62e-01 | 1 | 0.043 | 0.825 | 3.79e-01 |
| Organic_material_factor | 1 | 46066.021 | 0.201 | 6.61e-01 | 1 | 0 | 0 | 9.97e-01 |
| Biodynamic_farm | 1 | 588397.756 | 3.094 | 1.00e-01 | 1 | 0.219 | 5.606 | 3.28e-02 |
| Livestock_manure | 1 | 430569.334 | 2.138 | 1.66e-01 | 1 | 0.081 | 1.656 | 2.19e-01 |
| Degassed.fertilizer | 1 | 142062.228 | 0.64 | 4.37e-01 | 1 | 0.109 | 2.312 | 1.51e-01 |
| Crop_detail | 12 | 2149131.437 | 0.488 | 8.39e-01 | 12 | 0.41 | 0.287 | 9.50e-01 |
| Crop_category | 2 | 235659.006 | 0.508 | 6.13e-01 | 2 | 0.062 | 0.576 | 5.76e-01 |
Table 3: Results from ANOVA analysis across JB1 and JB2 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Df | Observed Sum.Sq | Observed F.value | Observed P | Shannon Df | Shannon Sum.Sq | Shannon F.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| field_keep_water | 1 | 133521.44 | 0.762 | 4.00e-01 | 1 | 0.008 | 0.186 | 6.74e-01 |
| Grazed | 1 | 67680.857 | 0.374 | 5.52e-01 | 1 | 0.012 | 0.286 | 6.02e-01 |
| Harvested | 1 | 2976.19 | 0.016 | 9.01e-01 | 1 | 0 | 0.002 | 9.64e-01 |
| Clovergrass_within_3_years | 1 | 160864.268 | 0.93 | 3.54e-01 | 1 | 0.006 | 0.14 | 7.15e-01 |
| Organic_material_factor | 1 | 601.145 | 0.003 | 9.56e-01 | 1 | 0.009 | 0.225 | 6.44e-01 |
| Biodynamic_farm | 1 | 278393.207 | 1.705 | 2.16e-01 | 1 | 0.115 | 3.485 | 8.66e-02 |
| Livestock_manure | 1 | 16918.007 | 0.091 | 7.68e-01 | 1 | 0.013 | 0.317 | 5.84e-01 |
| Degassed.fertilizer | NA | NA | NA | NA | NA | NA | NA | NA |
| Crop_detail | 11 | 1854678.357 | 0.881 | 6.44e-01 | 11 | 0.354 | 0.416 | 8.64e-01 |
| Crop_category | 4 | 350213.468 | 0.418 | 7.92e-01 | 4 | 0.021 | 0.095 | 9.82e-01 |
Table 4: Results from ANOVA analysis across JB5, JB6 and JB7 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Estimate | Observed SE | Observed t.value | Observed P | Shannon Estimate | Shannon SE | Shannon t.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| JB_value | 64.344 | 447.485 | 0.144 | 8.88e-01 | 0.066 | 0.292 | 0.227 | 8.24e-01 |
| Years_since_plowing | -12.454 | 23.929 | -0.52 | 6.11e-01 | -0.011 | 0.01 | -1.094 | 2.92e-01 |
| Rt | 966.359 | 10315.276 | 0.094 | 9.27e-01 | 0.474 | 0.185 | 2.555 | 2.29e-02 |
| Phosphorus | -35.849 | 116.354 | -0.308 | 7.63e-01 | -0.008 | 0.045 | -0.189 | 8.53e-01 |
| Potassium | 45.95 | 33.349 | 1.378 | 1.90e-01 | 0.009 | 0.014 | 0.673 | 5.12e-01 |
| Magnesium | 53.281 | 65.244 | 0.817 | 4.28e-01 | 0.017 | 0.026 | 0.663 | 5.18e-01 |
| Cobber | -176.45 | 148.67 | -1.187 | 2.55e-01 | -0.078 | 0.216 | -0.363 | 7.22e-01 |
| Organic_material_perc | -1.082 | 57.521 | -0.019 | 9.85e-01 | 0.007 | 0.019 | 0.359 | 7.25e-01 |
| Clay_perc | 94.469 | 202.538 | 0.466 | 6.48e-01 | 0.022 | 0.057 | 0.384 | 7.07e-01 |
| Nitrogen_perc | 762.062 | 2992.194 | 0.255 | 8.03e-01 | 0.354 | 1.038 | 0.341 | 7.38e-01 |
| Years_since_turning_organic | -5.378 | 18.881 | -0.285 | 7.80e-01 | -0.001 | 0.007 | -0.168 | 8.69e-01 |
| Years_since_turning_biodynamic | -19.234 | 23.165 | -0.83 | 4.20e-01 | -0.015 | 0.004 | -3.663 | 2.56e-03 |
Table 5: Results from robust linear regression analysis across JB1 and JB2 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Estimate | Observed SE | Observed t.value | Observed P | Shannon Estimate | Shannon SE | Shannon t.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| JB_value | NA | NA | NA | NA | NA | NA | NA | NA |
| Years_since_plowing | 10.408 | 37.199 | 0.28 | 7.84e-01 | -0.002 | 0.018 | -0.139 | 8.91e-01 |
| Rt | 926.427 | 344.691 | 2.688 | 1.98e-02 | 0.396 | 0.189 | 2.097 | 5.79e-02 |
| Phosphorus | 246.931 | 103.089 | 2.395 | 3.38e-02 | -0.022 | 0.439 | -0.05 | 9.61e-01 |
| Potassium | -10.381 | 24.858 | -0.418 | 6.84e-01 | 0.001 | 0.013 | 0.039 | 9.69e-01 |
| Magnesium | 50.146 | 39.349 | 1.274 | 2.27e-01 | 0.008 | 0.021 | 0.366 | 7.20e-01 |
| Cobber | -376.296 | 178.831 | -2.104 | 5.71e-02 | -0.225 | 0.058 | -3.907 | 2.08e-03 |
| Organic_material_perc | 209.468 | 170.552 | 1.228 | 2.43e-01 | 0.055 | 0.084 | 0.659 | 5.23e-01 |
| Clay_perc | -4.368 | 78.282 | -0.056 | 9.56e-01 | -0.012 | 0.017 | -0.718 | 4.87e-01 |
| Nitrogen_perc | 2993.932 | 3763.152 | 0.796 | 4.42e-01 | 0.393 | 1.601 | 0.246 | 8.10e-01 |
| Years_since_turning_organic | -30.594 | 18.445 | -1.659 | 1.23e-01 | -0.009 | 0.012 | -0.745 | 4.71e-01 |
| Years_since_turning_biodynamic | -36.284 | 41.779 | -0.868 | 4.02e-01 | -0.027 | 0.017 | -1.648 | 1.25e-01 |
Table 6: Results from robust linear regression analysis across JB5, JB6 and JB7 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Df | Observed Sum.Sq | Observed F.value | Observed P | Shannon Df | Shannon Sum.Sq | Shannon F.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| field_keep_water | 1 | 728.62 | 0.117 | 7.38e-01 | 1 | 0.004 | 0.057 | 8.14e-01 |
| Grazed | 1 | 256.392 | 0.041 | 8.43e-01 | 1 | 0.148 | 2.376 | 1.46e-01 |
| Harvested | 1 | 25071.704 | 5.574 | 3.33e-02 | 1 | 0.004 | 0.059 | 8.12e-01 |
| Clovergrass_within_3_years | 1 | 8637.556 | 1.523 | 2.37e-01 | 1 | 0.284 | 5.433 | 3.52e-02 |
| Organic_material_factor | 1 | 13035.021 | 2.433 | 1.41e-01 | 1 | 0.015 | 0.205 | 6.58e-01 |
| Biodynamic_farm | 1 | 9563.41 | 1.706 | 2.13e-01 | 1 | 0.013 | 0.174 | 6.83e-01 |
| Livestock_manure | 1 | 1853.858 | 0.301 | 5.92e-01 | 1 | 0.038 | 0.536 | 4.76e-01 |
| Degassed.fertilizer | 1 | 330.137 | 0.053 | 8.22e-01 | 1 | 0.055 | 0.806 | 3.85e-01 |
| Crop_detail | 12 | 72587.937 | 1.174 | 5.08e-01 | 12 | 0.896 | 1.848 | 3.37e-01 |
| Crop_category | 2 | 25054.006 | 2.586 | 1.13e-01 | 2 | 0.291 | 2.598 | 1.12e-01 |
Table 7: Results from ANOVA analysis across JB1 and JB2 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Df | Observed Sum.Sq | Observed F.value | Observed P | Shannon Df | Shannon Sum.Sq | Shannon F.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| field_keep_water | 1 | 4906.714 | 0.462 | 5.10e-01 | 1 | 0.018 | 0.346 | 5.67e-01 |
| Grazed | 1 | 12636.006 | 1.266 | 2.83e-01 | 1 | 0.001 | 0.019 | 8.93e-01 |
| Harvested | 1 | 9966.964 | 0.977 | 3.43e-01 | 1 | 0.104 | 2.337 | 1.52e-01 |
| Clovergrass_within_3_years | 1 | 7160.914 | 0.686 | 4.24e-01 | 1 | 0.038 | 0.761 | 4.00e-01 |
| Organic_material_factor | 1 | 11892.502 | 1.184 | 2.98e-01 | 1 | 0.042 | 0.838 | 3.78e-01 |
| Biodynamic_farm | 1 | 10047.114 | 0.985 | 3.41e-01 | 1 | 0 | 0.005 | 9.44e-01 |
| Livestock_manure | 1 | 12034.314 | 1.199 | 2.95e-01 | 1 | 0.007 | 0.131 | 7.23e-01 |
| Degassed.fertilizer | NA | NA | NA | NA | NA | NA | NA | NA |
| Crop_detail | 11 | 131075.048 | 17.412 | 5.55e-02 | 11 | 0.612 | 4.17 | 2.09e-01 |
| Crop_category | 4 | 71489.214 | 2.639 | 1.04e-01 | 4 | 0.296 | 1.941 | 1.88e-01 |
Table 8: Results from ANOVA analysis across JB5, JB6 and JB7 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Estimate | Observed SE | Observed t.value | Observed P | Shannon Estimate | Shannon SE | Shannon t.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| JB_value | NA | NA | NA | NA | NA | NA | NA | NA |
| Years_since_plowing | 1.757 | 3.356 | 0.524 | 6.09e-01 | 0.023 | 0.011 | 2.167 | 4.79e-02 |
| Rt | 93.271 | 92.363 | 1.01 | 3.30e-01 | 0.114 | 0.917 | 0.125 | 9.03e-01 |
| Phosphorus | 0.633 | 15.611 | 0.041 | 9.68e-01 | 0.013 | 0.063 | 0.205 | 8.41e-01 |
| Potassium | -0.505 | 5.313 | -0.095 | 9.26e-01 | -0.003 | 0.037 | -0.088 | 9.31e-01 |
| Magnesium | 48.021 | 4.52 | 10.625 | 4.39e-08 | 0.023 | 0.022 | 1.084 | 2.97e-01 |
| Cobber | 4.353 | 15.975 | 0.272 | 7.89e-01 | 0.131 | 0.076 | 1.716 | 1.08e-01 |
| Organic_material_perc | 7.52 | 15.871 | 0.474 | 6.43e-01 | 0.004 | 0.02 | 0.181 | 8.59e-01 |
| Clay_perc | 14.668 | 38.558 | 0.38 | 7.09e-01 | -0.044 | 0.12 | -0.363 | 7.22e-01 |
| Nitrogen_perc | 495.038 | 270.182 | 1.832 | 8.83e-02 | 0.788 | 1.006 | 0.783 | 4.46e-01 |
| Years_since_turning_organic | -3.135 | 25.221 | -0.124 | 9.03e-01 | 0.003 | 0.007 | 0.416 | 6.84e-01 |
| Years_since_turning_biodynamic | -2.759 | 2.113 | -1.306 | 2.13e-01 | -0.011 | 0.016 | -0.695 | 4.98e-01 |
Table 9: Results from robust linear regression analysis across JB1 and JB2 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
| Variable | Observed Estimate | Observed SE | Observed t.value | Observed P | Shannon Estimate | Shannon SE | Shannon t.value | Shannon P |
|---|---|---|---|---|---|---|---|---|
| Years_since_plowing | 1.803 | 5.86 | 0.308 | 7.64e-01 | 0.004 | 0.044 | 0.099 | 9.23e-01 |
| Rt | 18.553 | 88.88 | 0.209 | 8.38e-01 | 0.153 | 0.236 | 0.649 | 5.29e-01 |
| Phosphorus | 4.484 | 14.65 | 0.306 | 7.65e-01 | 0.141 | 0.047 | 2.994 | 1.12e-02 |
| Potassium | 7.165 | 8.29 | 0.864 | 4.04e-01 | -0.027 | 0.021 | -1.294 | 2.20e-01 |
| Magnesium | 3.795 | 10.768 | 0.352 | 7.31e-01 | 0.005 | 0.083 | 0.065 | 9.49e-01 |
| Cobber | -7.238 | 20.533 | -0.352 | 7.31e-01 | 0.019 | 0.12 | 0.155 | 8.80e-01 |
| Organic_material_perc | -71.036 | 37.675 | -1.885 | 8.38e-02 | -0.1 | 0.293 | -0.341 | 7.39e-01 |
| Clay_perc | -1.32 | 7.374 | -0.179 | 8.61e-01 | -0.011 | 0.022 | -0.5 | 6.26e-01 |
| Nitrogen_perc | -971.972 | 922.677 | -1.053 | 3.13e-01 | -1.009 | 5.364 | -0.188 | 8.54e-01 |
| Years_since_turning_organic | -0.441 | 1.681 | -0.262 | 7.97e-01 | 0.002 | 0.041 | 0.058 | 9.55e-01 |
| Years_since_turning_biodynamic | 2.033 | 6.871 | 0.296 | 7.72e-01 | 0 | 0.021 | -0.001 | 9.99e-01 |
Table 10: Results from robust linear regression analysis across JB5, JB6 and JB7 fields. The table shows the obtained statistical values for each of the environmental variables (rows) and the two diversity measures, Observed and Shannon (columns).
The rapid advancement of machine learning technologies has provided new tools for understanding complex agricultural systems. One such method, the Random Forest algorithm, is particularly effective in identifying key variables that influence continuous outcome variables. In this section, we apply the Random Forest technique to identify the variables most important for shaping soil diversity.
Random Forest is an ensemble learning method that constructs multiple decision trees during training and merges their outputs to improve accuracy and control overfitting. Each tree in the forest considers a random subset of the variables, allowing the algorithm to identify which factors consistently play a crucial role in determining the outcome. This capability makes Random Forest especially useful in scenarios with a large number of potential influencing factors and complex, non-linear relationships, such as those found in agricultural ecosystems.
By utilizing this method, we can pinpoint the most significant factors that affect soil microbiome diversity.
We perform three RF tests for each diversity measure to assess the importance of the predictors.
1) Using main variables on climate and location.
The higher the value of the mean decrease accuracy or mean decrease Gini score, the higher the importance of the variable in the model.
%IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.
IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.
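The permutation idea behind %IncMSE can be sketched in a few lines of Python. This is a simplified illustration, not the randomForest procedure: randomForest works on out-of-bag samples tree by tree and can scale the result, whereas the sketch shuffles one variable and re-scores a fixed model. The function names, the toy model and the toy data are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_increase_in_mse(predict, rows, y, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one column of `rows`."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with only the chosen column replaced.
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats

def toy_model(row):
    """Hypothetical fitted model: the outcome depends on 'rt' only."""
    return 2.0 * row["rt"]

rows = [{"rt": float(i), "noise": float((i * 7) % 5)} for i in range(1, 9)]
y = [toy_model(r) for r in rows]

imp_rt = permutation_increase_in_mse(toy_model, rows, y, "rt")
imp_noise = permutation_increase_in_mse(toy_model, rows, y, "noise")
print("rt:", imp_rt, "noise:", imp_noise)
```

Shuffling a variable the model relies on degrades the predictions, while shuffling an irrelevant variable leaves the error unchanged.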
Prediction of diversity
We now examine whether the variables can predict the value of the outcome measure, quantified as the percentage of variance explained by the tested variables.
##
## Call:
## randomForest(formula = Shannon ~ ., data = meta_asv_count_sub, ntree = 500, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 0.08099892
## % Var explained: -34.61
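The "% Var explained" value in the output above is negative. As we understand the randomForest output, this quantity is a pseudo R-squared of the form 100 × (1 − MSE / Var(y)), computed from out-of-bag predictions, so a negative value indicates that the predictions are less accurate than simply predicting the mean diversity for every field. The Python sketch below illustrates the arithmetic on toy numbers; the function name and data are hypothetical, and the use of the population variance is an assumption.

```python
def pct_var_explained(y_true, y_pred):
    """100 * (1 - MSE / Var(y)), using the population variance of y_true."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    return 100.0 * (1.0 - mse / var_y)

# Hypothetical outcome values and three sets of predictions.
y = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]        # reproduces y exactly
mean_only = [2.5, 2.5, 2.5, 2.5]      # always predicts the mean
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

for name, pred in [("perfect", perfect),
                   ("mean_only", mean_only),
                   ("reversed", reversed_pred)]:
    print(name, pct_var_explained(y, pred))
```

Predicting the mean gives 0%, and any model that does worse than the mean gives a negative percentage.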
2) Using all variables (excluding 2 with many missing values).
The higher the value of the mean decrease accuracy or mean decrease Gini score, the higher the importance of the variable in the model.
%IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.
IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.
Prediction of diversity
We now examine whether the variables can predict the value of the outcome measure, quantified as the percentage of variance explained by the tested variables.
##
## Call:
## randomForest(formula = Shannon ~ ., data = meta_asv_count_sub_clean, ntree = 500, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 0.06744636
## % Var explained: -12.09
We perform three RF tests for each diversity measure to assess the importance of the predictors.
1) Using main variables of interest.
The higher the value of the mean decrease accuracy or mean decrease Gini score, the higher the importance of the variable in the model.
%IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.
IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.
Prediction of diversity
Next, we assess how well the variables predict the outcome measure, reported as the percentage of variance explained by the tested variables.
##
## Call:
## randomForest(formula = Shannon ~ ., data = meta_asv_count_sub, ntree = 500, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 0.03872921
## % Var explained: 11.06
2) Using all variables (excluding 2 with many missing values).
The higher the %IncMSE or IncNodePurity score (the regression counterparts of mean decrease accuracy and mean decrease Gini), the higher the importance of the variable in the model.
%IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.
IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.
Prediction of diversity
Next, we assess how well the variables predict the outcome measure, reported as the percentage of variance explained by the tested variables.
##
## Call:
## randomForest(formula = Shannon ~ ., data = meta_asv_count_sub_clean, ntree = 500, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 0.03301169
## % Var explained: 24.19
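As a worked check of the numbers above, the relation % Var explained = 100 × (1 − MSE / variance of the outcome), which is how the randomForest package documents its pseudo-R², can be inverted to recover the outcome variance implied by each model. The helper name `implied_variance` is introduced here for illustration. The inputs are the "Mean of squared residuals" and "% Var explained" values printed in the four outputs above.

```r
# Invert  pct = 100 * (1 - mse / var_y)  to get the outcome variance implied
# by a printed "Mean of squared residuals" and "% Var explained" pair.
implied_variance <- function(mse, pct_var_explained) {
  mse / (1 - pct_var_explained / 100)
}

# First two outputs: main variables vs. all variables.
v1_main <- implied_variance(0.08099892, -34.61)
v1_all  <- implied_variance(0.06744636, -12.09)

# Last two outputs: main variables vs. all variables.
v2_main <- implied_variance(0.03872921, 11.06)
v2_all  <- implied_variance(0.03301169, 24.19)

print(round(c(v1_main, v1_all, v2_main, v2_all), 4))
```

Within each pair the implied variance agrees (about 0.060 for the first two outputs and about 0.044 for the last two). This suggests that both models in a pair were fitted to the same set of Shannon values, so their mean squared residuals can be compared directly.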
Table 11: Software used, including the packages of the R programming environment and their versions.
| Package | Version | Package | Version |
|---|---|---|---|
| OS | Ubuntu 20.04.4 LTS | colorspace | 2.1-0 |
| R | 4.3.3 | jpeg | 0.1-10 |
| countrycode | 1.6.0 | utf8 | 1.2.4 |
| splines | 4.3.3 | generics | 0.1.3 |
| bitops | 1.0-7 | robustbase | 0.99-3 |
| lifecycle | 1.0.4 | class | 7.3-22 |
| rstatix | 0.7.2 | S4Arrays | 1.2.1 |
| sf | 1.0-16 | pkgconfig | 2.0.3 |
| MASS | 7.3-60.0.1 | gtable | 0.3.5 |
| insight | 0.20.2 | hwriter | 1.3.2.1 |
| backports | 1.5.0 | pcaPP | 2.0-4 |
| magrittr | 2.0.3 | htmltools | 0.5.8.1 |
| sass | 0.4.9 | carData | 3.0-5 |
| rmarkdown | 2.27 | biomformat | 1.30.0 |
| jquerylib | 0.1.4 | png | 0.1-8 |
| yaml | 2.3.9 | rstudioapi | 0.16.0 |
| zip | 2.3.1 | tzdb | 0.4.0 |
| cowplot | 1.1.3 | reshape2 | 1.4.4 |
| DBI | 1.2.3 | coda | 0.19-4.1 |
| minqa | 1.2.7 | nlme | 3.1-165 |
| ade4 | 1.7-22 | curl | 5.2.1 |
| multcomp | 1.4-26 | nloptr | 2.1.1 |
| abind | 1.4-5 | proxy | 0.4-27 |
| zlibbioc | 1.48.2 | cachem | 1.1.0 |
| Rtsne | 0.17 | zoo | 1.8-12 |
| RCurl | 1.98-1.16 | rhdf5 | 2.46.1 |
| TH.data | 1.1-2 | sjlabelled | 1.2.0 |
| sandwich | 3.1-0 | KernSmooth | 2.23-24 |
| GenomeInfoDbData | 1.2.11 | parallel | 4.3.3 |
| units | 0.8-5 | s2 | 1.1.7 |
| svglite | 2.1.3 | pillar | 1.9.0 |
| codetools | 0.2-20 | vctrs | 0.6.5 |
| DelayedArray | 0.28.0 | ggpubr | 0.6.0 |
| xml2 | 1.3.6 | car | 3.1-2 |
| tidyselect | 1.2.1 | xtable | 1.8-4 |
| farver | 2.1.2 | cluster | 2.1.6 |
| geojsonsf | 2.0.3 | evaluate | 0.24.0 |
| multtest | 2.58.0 | mvtnorm | 1.2-5 |
| e1071 | 1.7-14 | cli | 3.6.3 |
| survival | 3.7-0 | compiler | 4.3.3 |
| iterators | 1.0.14 | rlang | 1.1.4 |
| systemfonts | 1.1.0 | crayon | 1.5.3 |
| foreach | 1.5.2 | ggsignif | 0.6.4 |
| tools | 4.3.3 | rrcov | 1.7-5 |
| ragg | 1.3.2 | labeling | 0.4.3 |
| glue | 1.8.0 | interp | 1.1-6 |
| mnormt | 2.1.1 | classInt | 0.4-10 |
| SparseArray | 1.2.4 | plyr | 1.8.9 |
| xfun | 0.46 | stringi | 1.8.4 |
| mgcv | 1.9-1 | viridisLite | 0.4.2 |
| withr | 3.0.0 | deldir | 2.0-4 |
| fastmap | 1.2.0 | munsell | 0.5.1 |
| latticeExtra | 0.6-30 | V8 | 4.4.2 |
| boot | 1.3-30 | hms | 1.1.3 |
| rhdf5filters | 1.14.1 | Rhdf5lib | 1.24.2 |
| fansi | 1.0.6 | highr | 0.11 |
| digest | 0.6.36 | broom | 1.0.6 |
| timechange | 0.3.0 | igraph | 2.0.3 |
| R6 | 2.5.1 | RcppParallel | 5.1.8 |
| estimability | 1.5.1 | bslib | 0.7.0 |
| textshaping | 0.4.0 | DEoptimR | 1.1-3 |
| wk | 0.9.2 | ape | 5.8 |