Customer: Innovation Centre for Organic Farming, Tove Mariegaard Pedersen
Customer ID: DA00204-24
Project: Markens motor (2024)
Sample type: Soil
Number of samples: 30
Type of data: Marker gene sequencing of bacteria (16S) and fungi (ITS)

Introduction to the biostatistical analysis

The Project

This report describes the biodiversity of the bacterial and fungal soil microbiome in 30 samples collected from 30 organic fields in Denmark in 2024. We analyse biodiversity measures for fungi and bacteria in the soil of the fields. Analyses of single taxa and overall composition (beta diversity) are presented in the additional R3 reports.

Marker-gene (amplicon) sequencing data, also called metabarcoding data, was generated for two markers across the fields: ITS for fungi and 16S rRNA for bacteria. From these data we calculate measures of biodiversity. In microbiome bioinformatics these measures are termed ‘alpha diversity’, and we calculate two of them: observed (richness) and Shannon. The measures are introduced in Report 2.
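As a rough illustration of the two alpha diversity measures, the sketch below computes observed richness (the number of taxa detected in a sample) and the Shannon index, H = -sum(p_i * ln(p_i)), from a vector of read counts. It is a minimal Python sketch, not the report's R pipeline: the function names, the use of the natural logarithm and the example counts are assumptions made for illustration only.

```python
import math

def observed_richness(counts):
    """Number of taxa with a non-zero read count in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts for five taxa in one soil sample (not project data).
sample_counts = [50, 30, 15, 5, 0]
richness = observed_richness(sample_counts)
shannon = shannon_index(sample_counts)
print(richness, round(shannon, 3))
```

Richness only counts how many taxa are present, while Shannon also rewards an even distribution of reads across taxa, which is why the report evaluates both.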

We use amplicon data for the biodiversity analyses because it has some advantages over shotgun sequencing data for diversity estimation. First, amplicon data is more sensitive to less abundant taxa because the method is targeted: DNA primers are used to fish for DNA from the type of organism we want to detect (here fungi or bacteria). Second, diversity is estimated from all detected organisms, independent of their presence in a reference database. This lets us include many more lesser-known organisms, so the measures better reflect the true diversity and not just the “diversity of well-known organisms”. In this report all evaluations are performed for both measures, for both bacteria and fungi.

For each field, one sample was collected to represent the field. Each sample was based on 16 sub-samples taken in a W-pattern across the field.

We split many analyses by JB value into two groups. This choice is based on analyses of this dataset and on prior analyses of data from 2021-2022, where we saw a strong association between JB and microbiome profiles. We find that the effect of JB overshadows associations that may exist between the microbiome and other variables of interest.

The JB groups are:

  • JB 1-2 (JB1_JB2, 16 fields)
  • JB 5-7 (JB5_JB6_JB7, 14 fields)

A special focus of this project is the association of the microbiome with biodynamic farming, grazing versus mowing, and annual crops. The aim is to evaluate how the microbiome of the fields associates with other field parameters, covering both agricultural practices and soil indicators of nutrients, type and structure.

We begin with an overview of the metadata variables of the project.

Practical notes

In “Report 3”, biostatistical analyses are performed and the results presented. The analyses build on the data generated and evaluated in four earlier reports, two for the fungal data and two for the bacterial data (Report 1: Sequencing and data processing report, Report 2: Microbiome profiling report).
Through biostatistical analysis we relate the microbiome biodiversity to the key variables.

Note on preprocessing for biodiversity analysis

As part of our biodiversity analysis pipeline, we have introduced a key preprocessing step: taxa that are detected in only one sample are removed prior to the calculation of alpha diversity metrics. This filtering step helps reduce the influence of extremely rare or potentially spurious taxa, placing greater emphasis on the more consistently detected members of the microbial community. In doing so, we aim to enhance the robustness and reproducibility of the diversity measures, while maintaining a stronger focus on ecologically relevant and common taxa.
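The filtering step described above can be sketched as follows. This is a minimal Python illustration, assuming a count table stored as a dictionary from taxon name to per-sample counts; the function name, data layout and example values are hypothetical and do not reproduce the report's actual pipeline.

```python
def drop_single_sample_taxa(count_table):
    """Keep only taxa detected (count > 0) in at least two samples.

    count_table maps a taxon name to a list of read counts, one per sample.
    """
    return {
        taxon: counts
        for taxon, counts in count_table.items()
        if sum(1 for c in counts if c > 0) >= 2
    }

# Hypothetical counts for three taxa across three samples (not project data).
count_table = {
    "taxon_A": [12, 0, 7],   # detected in two samples: kept
    "taxon_B": [0, 0, 3],    # detected in one sample only: removed
    "taxon_C": [5, 9, 1],    # detected in all samples: kept
}
filtered = drop_single_sample_taxa(count_table)
print(sorted(filtered))
```

Alpha diversity would then be calculated on the filtered table, so taxa seen in a single sample no longer contribute to richness or Shannon.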

Overview of variables in project

We have included all variables in the collected metadata that show some variation across the fields. A variable is excluded if it has the same value for all fields (for example, all organic) or only very few deviations (90% or more identical values).
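The selection rule for variables can be sketched as a simple check on the share of the most common value. This is a hedged Python illustration: the function name, the handling of missing values (ignored here) and the exact comparison against the 90% threshold are assumptions, and the example values only mimic the kind of variables described in this report.

```python
from collections import Counter

def has_enough_variation(values, max_share=0.9):
    """True if the most common non-missing value covers less than max_share."""
    observed = [v for v in values if v is not None]
    if not observed:
        return False  # nothing observed, so no variation can be assessed
    top_count = Counter(observed).most_common(1)[0][1]
    return top_count / len(observed) < max_share

# Illustrative metadata for 30 fields (values chosen for the example).
metadata = {
    "Organic": [1] * 30,                    # identical for all fields
    "Biodynamic_farm": [0] * 21 + [1] * 9,  # 70% / 30% split
}
kept = [name for name, values in metadata.items() if has_enough_variation(values)]
print(kept)
```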

Below are two overview tables. The first shows the variables by category with a short description. The second shows summary statistics for each variable, allowing us to inspect the variation and the subgroups of fields that each variable represents.

| Category | Report_variable | Description |
| Geography and weather data (Geografi og vejrdata) | GPS-koordinater | GPS coordinates |
| | Geografisk placering | Geographic location |
| | Geografisk placering (gruppe) | 1=Vendsyssel (V), 2=North Denmark Region excl. Vendsyssel (NJ), 3=Central Denmark Region (MJ), 4=Region of Southern Denmark excl. Funen (SJ), 5=Zealand (S), 6=Lolland, Falster, 7=Funen (F), 8=Bornholm (B) |
| | Nedboer | Total precipitation April-September at municipality level |
| | Toerkeindeks | Average drought index April-September at municipality level |
| | Middeltemperatur | Average mean temperature April-September at municipality level |
| Field assessment (Vurdering af marken) | JB | Assessed JB number |
| | Regnorme | Many earthworms: 1; few/no earthworms: 0 |
| | Kold jord | Cold soil: 1; not cold soil: 0 |
| | Jordtemperatur | Soil temperature (no further description given) |
| | Kompakt jord | Soil is compact: 1; not compact: 0 |
| | Veldraenet | Field is well drained: 1; not well drained: 0 |
| | Holde paa vand | Field can retain water in dry periods: 1=yes, 0=no |
| | Nedmuldning af halm | Straw incorporated in the last 3 years: 1; no straw incorporation: 0 |
| | Kloevergraes | Clover grass for 3 years: 1; annual crops for 3 years: 0 |
| | Afgraesset | Field grazed every year for the last 4 years: 1=yes, 0=no |
| | Slaet | Grass fields cut in the last three years and not grazed: 1=yes, 0=no |
| | Afgroede | Crop in the season leading up to sampling |
| | Afgroede (gruppe) | K: cereal crop without catch crops; KE: cereal crop with catch crop; B: faba bean, pea, lupin as main crop or mixed with cereal; KL: clover grass; G: vegetables; F: seed production (grass, spinach); O: oilseed (rapeseed with or without catch crop); M: maize with or without catch crop |
| | Ploejefri dyrkning | Plowless cultivation: 1; traditional plowing: 0 |
| | Conservation Agriculture | Conservation Agriculture: 1; not Conservation Agriculture: 0 |
| | Aar_sidste_ploejning | Year of last plowing (incl. organic farms with clover grass) |
| | Rt | Rt |
| | Fosfor | Phosphorus (mg/100g) |
| | Kalium | Potassium (mg/100g) |
| | Magnesium | Magnesium (mg/100g) |
| | Kobber | Copper (mg/kg) |
| | Organisk stof | Organic material (%) |
| | Organisk stof (factor) | Organic material L: low, M: medium-high, H: high, relative to clay content (“Hvad gemmer sig bag tallene”) |
| | Ler | Clay (%) |
| | Kvaelstof | Nitrogen (%) |
| Farm type (Driftsform) | Organic | Organic farm: 1; not an organic farm: 0 |
| | Organic (years) | Number of years since conversion to organic production |
| | Biodynamisk | Biodynamic farm: 1 |
| | Biodynamisk (years) | Number of years since conversion to biodynamic production |
| | Husdyrbrug | Livestock farm: 1; without livestock: 0 |
| Fertilization and liming (Goedningstildeling og kalkning) | Husdyrgoedning | Livestock manure applied in the last year: yes: 1, no: 0 |
| | Handelsgoedning | Commercial fertilizer applied in the last year: yes: 1, no: 0 |
| | Vinasse | Vinasse applied in the last year: yes: 1, no: 0 |
| | Gips | Gypsum applied in the last year: yes: 1, no: 0 |
| | Afgasset goedning | Degassed fertilizer applied in the last year: yes: 1, no: 0 |
| | Jordforbedringsmidler | Are soil amendments used (compost, preparations, etc.) |
| | Kalket | Field limed in the last 3 years: 1; not limed in the last 3 years: 0 |

Table 1: Overview of metadata variables. The variable names are kept in Danish because the data was collected in a Danish-language table; categories and descriptions are given in English. In the remainder of the report the variable names are translated to English, the report language.

The key variables assessed in this report are summarized across the 30 samples in the table below.

Summary Statistics
Variable NotNA Mean Median PropNA
year 30 2024 2024 0
JB_groups 30
… JB1_JB2 16 53%
… JB5_JB6_JB7 14 47%
JB_value 30 3.5 2 0
Rainfall 0 1
Average_drought_index 0 1
Average_temp. 0 1
soil_tmp 30 12 12 0
field_keep_water 30
… 0 9 30%
… 1 21 70%
Clovergrass_within_3_years 30
… 0 10 33%
… 1 20 67%
Grazed 30
… 0 19 63%
… 1 11 37%
Harvested 30
… 0 22 73%
… 1 8 27%
Years_since_plowing 30 3.5 2.5 0
Rt 30 5.9 5.8 0
Phosphorus 30 3.4 3.2 0
Potassium 30 8.5 7.2 0
Magnesium 30 7.8 7.5 0
Cobber 30 2.2 1.7 0
Organic_material_perc 30 4.1 3.5 0
Organic_material_factor 30
… H 7 23%
… M 23 77%
Clay_perc 30 8 6.2 0
Nitrogen_perc 30 0.17 0.16 0
Years_since_turning_organic 30 17 11 0
Biodynamic_farm 30
… 0 21 70%
… 1 9 30%
Years_since_turning_biodynamic 30 4.2 0 0
Livestock_manure 30
… 0 11 37%
… 1 19 63%
Degassed.fertilizer 30
… 0 25 83%
… 1 5 17%
Crop_detail 30
… groenkorn vinterrug foraarssaaet med kl graes udlaeg 1 3%
… Groenkorn Vinterrug m. kl. graes udlaeg 1 3%
… groentsager 1 3%
… Helsaed Vaarbyg/aert med undersaaet froegraes 1 3%
… kl. graes 3 10%
… kl. graes afgraesning 5 17%
… Kl. graes afgraesning 1 3%
… kl. graes afgraesning (varig?) 1 3%
… Kl. graes slaet 3 10%
… kl. graes slaet/afpudsning 1 3%
… Kl.graes afgraesning og slaet 1 3%
… kl.graes slaet 2 7%
… kl.graes slaet supleret med afgraesning 1 3%
… soedkirsebaer siden 2015 med kl.graes imellem 1 3%
… Vaarbyg med kl. graes efterafgroede 1 3%
… Vaarbyg med udlaeg af kl.graes 1 3%
… vedv. kl.graes afgraesning 1 3%
… vinterhvede 1 3%
… Vinterraps 1 3%
… Vinterrug hybrid 1 3%
… Vinterspelt 1 3%
Crop_category 30
… G 1 3%
… K 3 10%
… KE 5 17%
… KL 20 67%
… O 1 3%

Table 2: Summary statistics of the key variables selected for evaluation in relation to the fields' microbiome profiles in 2024.

Geography

This section evaluates the geographic location of the fields included in the project. We use the collected coordinates to show the location of the fields on a map, and color the data points by each of the four biodiversity measures to evaluate whether there are regional patterns.

Part-conclusion for section

Shannon fungi biodiversity

Figure 1: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by Shannon biodiversity for the fungi community.

Observed fungi biodiversity

Figure 2: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by observed biodiversity for the fungi community.

Shannon bacteria biodiversity

Figure 3: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by Shannon biodiversity for the bacterial community.

Observed bacteria biodiversity

Figure 4: Visualization of the geographic location of the samples. Using the coordinates of each field in the project, we show the locations on a map and color the samples by observed biodiversity for the bacterial community.

Association of biodiversity with variables of interest

Biodiversity in relation to each metadata variable

We first show illustrations of the association between diversity and each variable using box plots for grouping variables and dot plots with a linear regression line for the continuous variables. These plots allow us to evaluate any association patterns and review findings from the following statistical analyses.

Grouping variables

JB groups

Figure 7: Illustration of fungal diversity across the two JB groups.

Figure 8: Illustration of bacterial diversity across the two JB groups.


Field keep water

Figure 9: Illustration of fungal diversity across the levels of the variable “Field keep water”.

Figure 10: Illustration of bacterial diversity across the levels of the variable “Field keep water”.

Grazed

Figure 11: Illustration of fungal diversity across the levels of the variable “Grazed”.

Figure 12: Illustration of bacterial diversity across the levels of the variable “Grazed”.


Harvested

Figure 13: Illustration of fungal diversity across the levels of the variable “Harvested”.

Figure 14: Illustration of bacterial diversity across the levels of the variable “Harvested”.


Clovergrass within 3 years

Figure 15: Illustration of fungal diversity across the levels of the variable “Clovergrass within 3 years”.

Figure 16: Illustration of bacterial diversity across the levels of the variable “Clovergrass within 3 years”.


Livestock manure

Figure 17: Illustration of fungal diversity across the levels of the variable “Livestock manure”.

Figure 18: Illustration of bacterial diversity across the levels of the variable “Livestock manure”.


Organic material (factor)

Figure 19: Illustration of fungal diversity across the levels of the variable “Organic material (factor)”.

Figure 20: Illustration of bacterial diversity across the levels of the variable “Organic material (factor)”.


Biodynamic farm

Figure 21: Illustration of fungal diversity across the levels of the variable “Biodynamic farm”.

Figure 22: Illustration of bacterial diversity across the levels of the variable “Biodynamic farm”.


Degassed fertilizer

Figure 23: Illustration of fungal diversity across the levels of the variable “Degassed fertilizer”.

Figure 24: Illustration of bacterial diversity across the levels of the variable “Degassed fertilizer”.


Crop (category)

Figure 25: Illustration of fungal diversity across the levels of the variable “Crop (category)”.

Figure 26: Illustration of bacterial diversity across the levels of the variable “Crop (category)”.


Continuous variables

Years since plowing

Figure 27: Illustration of fungal diversity across the values of the variable “Years since plowing”.

Figure 28: Illustration of bacterial diversity across the values of the variable “Years since plowing”.


Years since turning biodynamic

Figure 29: Illustration of fungal diversity across the values of the variable “Years since turning biodynamic”.

Figure 30: Illustration of bacterial diversity across the values of the variable “Years since turning biodynamic”.


Years since turning organic

Figure 31: Illustration of fungal diversity across the values of the variable “Years since turning organic”.

Figure 32: Illustration of bacterial diversity across the values of the variable “Years since turning organic”.


Rt

Figure 39: Illustration of fungal diversity across the values of the variable “Rt”.

Figure 40: Illustration of bacterial diversity across the values of the variable “Rt”.


Phosphorus

Figure 41: Illustration of fungal diversity across the values of the variable “Phosphorus”.

Figure 42: Illustration of bacterial diversity across the values of the variable “Phosphorus”.


Potassium

Figure 43: Illustration of fungal diversity across the values of the variable “Potassium”.

Figure 44: Illustration of bacterial diversity across the values of the variable “Potassium”.


Magnesium

Figure 45: Illustration of fungal diversity across the values of the variable “Magnesium”.

Figure 46: Illustration of bacterial diversity across the values of the variable “Magnesium”.


Copper

Figure 47: Illustration of fungal diversity across the values of copper (variable “Cobber” in the tables).

Figure 48: Illustration of bacterial diversity across the values of copper (variable “Cobber” in the tables).


Organic material (perc)

Figure 49: Illustration of fungal diversity across the values of the variable “Organic material (perc)”.

Figure 50: Illustration of bacterial diversity across the values of the variable “Organic material (perc)”.


Clay (perc)

Figure 51: Illustration of fungal diversity across the values of the variable “Clay (perc)”.

Figure 52: Illustration of bacterial diversity across the values of the variable “Clay (perc)”.


Nitrogen (perc)

Figure 53: Illustration of fungal diversity across the values of the variable “Nitrogen (perc)”.

Figure 54: Illustration of bacterial diversity across the values of the variable “Nitrogen (perc)”.


Statistical assessment

An analysis of variance (ANOVA) model was used to evaluate whether the mean diversity differed significantly between the levels of each grouping variable, and robust linear regression was used to assess the relationship with each continuous variable.
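To make the Df, Sum.Sq and F.value columns of the ANOVA tables easier to read, the sketch below computes these quantities for a one-way ANOVA on a toy example. It is a minimal Python illustration with hypothetical diversity values and a hypothetical function name; the report's own analysis was run in R, and the P value (which requires the F distribution) is left out of this sketch.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups, each a list of numeric values."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    # Between-group sum of squares: variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (residual) sum of squares: variation inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {"Df": df_between, "Sum.Sq": ss_between,
            "Residual.Df": df_within, "F.value": f_value}

# Hypothetical diversity values for fields with level 0 and level 1 of a variable.
result = one_way_anova([[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result)
```

A large F value means the group means differ by more than the spread within the groups would suggest. With only 14 to 16 fields per JB group, single fields can have a large influence on these statistics.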

Bacterial biodiversity
Bacterial diversity in JB 1-2 (group variables)
Variable Observed.Df Observed.Sum.Sq Observed.F.value Observed.P Shannon.Df Shannon.Sum.Sq Shannon.F.value Shannon.P
field_keep_water 1 1062036.668 6.794 2.07e-02 1 0.149 3.391 8.68e-02
Grazed 1 343610.592 1.655 2.19e-01 1 0 0.007 9.35e-01
Harvested 1 1368.037 0.006 9.40e-01 1 0.032 0.62 4.44e-01
Clovergrass_within_3_years 1 193614.556 0.887 3.62e-01 1 0.043 0.825 3.79e-01
Organic_material_factor 1 46066.021 0.201 6.61e-01 1 0 0 9.97e-01
Biodynamic_farm 1 588397.756 3.094 1.00e-01 1 0.219 5.606 3.28e-02
Livestock_manure 1 430569.334 2.138 1.66e-01 1 0.081 1.656 2.19e-01
Degassed.fertilizer 1 142062.228 0.64 4.37e-01 1 0.109 2.312 1.51e-01
Crop_detail 12 2149131.437 0.488 8.39e-01 12 0.41 0.287 9.50e-01
Crop_category 2 235659.006 0.508 6.13e-01 2 0.062 0.576 5.76e-01

Table 3: Results from ANOVA analysis across JB1 and JB2 fields. The table shows results from ANOVA analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Bacterial diversity in JB 5-7 (group variables)
Variable Observed.Df Observed.Sum.Sq Observed.F.value Observed.P Shannon.Df Shannon.Sum.Sq Shannon.F.value Shannon.P
field_keep_water 1 133521.44 0.762 4.00e-01 1 0.008 0.186 6.74e-01
Grazed 1 67680.857 0.374 5.52e-01 1 0.012 0.286 6.02e-01
Harvested 1 2976.19 0.016 9.01e-01 1 0 0.002 9.64e-01
Clovergrass_within_3_years 1 160864.268 0.93 3.54e-01 1 0.006 0.14 7.15e-01
Organic_material_factor 1 601.145 0.003 9.56e-01 1 0.009 0.225 6.44e-01
Biodynamic_farm 1 278393.207 1.705 2.16e-01 1 0.115 3.485 8.66e-02
Livestock_manure 1 16918.007 0.091 7.68e-01 1 0.013 0.317 5.84e-01
Degassed.fertilizer NA NA NA NA NA NA NA NA
Crop_detail 11 1854678.357 0.881 6.44e-01 11 0.354 0.416 8.64e-01
Crop_category 4 350213.468 0.418 7.92e-01 4 0.021 0.095 9.82e-01

Table 4: Results from ANOVA analysis across JB5, JB6 and JB7 fields. The table shows results from ANOVA analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Bacterial diversity in JB 1-2 (continuous variables)
Variable Observed.Estimate Observed.SE Observed.t.value Observed.P Shannon.Estimate Shannon.SE Shannon.t.value Shannon.P
JB_value 64.344 447.485 0.144 8.88e-01 0.066 0.292 0.227 8.24e-01
Years_since_plowing -12.454 23.929 -0.52 6.11e-01 -0.011 0.01 -1.094 2.92e-01
Rt 966.359 10315.276 0.094 9.27e-01 0.474 0.185 2.555 2.29e-02
Phosphorus -35.849 116.354 -0.308 7.63e-01 -0.008 0.045 -0.189 8.53e-01
Potassium 45.95 33.349 1.378 1.90e-01 0.009 0.014 0.673 5.12e-01
Magnesium 53.281 65.244 0.817 4.28e-01 0.017 0.026 0.663 5.18e-01
Cobber -176.45 148.67 -1.187 2.55e-01 -0.078 0.216 -0.363 7.22e-01
Organic_material_perc -1.082 57.521 -0.019 9.85e-01 0.007 0.019 0.359 7.25e-01
Clay_perc 94.469 202.538 0.466 6.48e-01 0.022 0.057 0.384 7.07e-01
Nitrogen_perc 762.062 2992.194 0.255 8.03e-01 0.354 1.038 0.341 7.38e-01
Years_since_turning_organic -5.378 18.881 -0.285 7.80e-01 -0.001 0.007 -0.168 8.69e-01
Years_since_turning_biodynamic -19.234 23.165 -0.83 4.20e-01 -0.015 0.004 -3.663 2.56e-03

Table 5: Results from robust linear regression analysis across JB1 and JB2 fields. The table shows results from the robust regression analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Bacterial diversity in JB 5-7 (continuous variables)
Variable Observed.Estimate Observed.SE Observed.t.value Observed.P Shannon.Estimate Shannon.SE Shannon.t.value Shannon.P
JB_value NA NA NA NA NA NA NA NA
Years_since_plowing 10.408 37.199 0.28 7.84e-01 -0.002 0.018 -0.139 8.91e-01
Rt 926.427 344.691 2.688 1.98e-02 0.396 0.189 2.097 5.79e-02
Phosphorus 246.931 103.089 2.395 3.38e-02 -0.022 0.439 -0.05 9.61e-01
Potassium -10.381 24.858 -0.418 6.84e-01 0.001 0.013 0.039 9.69e-01
Magnesium 50.146 39.349 1.274 2.27e-01 0.008 0.021 0.366 7.20e-01
Cobber -376.296 178.831 -2.104 5.71e-02 -0.225 0.058 -3.907 2.08e-03
Organic_material_perc 209.468 170.552 1.228 2.43e-01 0.055 0.084 0.659 5.23e-01
Clay_perc -4.368 78.282 -0.056 9.56e-01 -0.012 0.017 -0.718 4.87e-01
Nitrogen_perc 2993.932 3763.152 0.796 4.42e-01 0.393 1.601 0.246 8.10e-01
Years_since_turning_organic -30.594 18.445 -1.659 1.23e-01 -0.009 0.012 -0.745 4.71e-01
Years_since_turning_biodynamic -36.284 41.779 -0.868 4.02e-01 -0.027 0.017 -1.648 1.25e-01

Table 6: Results from robust linear regression analysis across JB5, JB6 and JB7 fields. The table shows results from the robust regression analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Fungal biodiversity
Fungal diversity in JB 1-2 (group variables)
Variable Observed.Df Observed.Sum.Sq Observed.F.value Observed.P Shannon.Df Shannon.Sum.Sq Shannon.F.value Shannon.P
field_keep_water 1 728.62 0.117 7.38e-01 1 0.004 0.057 8.14e-01
Grazed 1 256.392 0.041 8.43e-01 1 0.148 2.376 1.46e-01
Harvested 1 25071.704 5.574 3.33e-02 1 0.004 0.059 8.12e-01
Clovergrass_within_3_years 1 8637.556 1.523 2.37e-01 1 0.284 5.433 3.52e-02
Organic_material_factor 1 13035.021 2.433 1.41e-01 1 0.015 0.205 6.58e-01
Biodynamic_farm 1 9563.41 1.706 2.13e-01 1 0.013 0.174 6.83e-01
Livestock_manure 1 1853.858 0.301 5.92e-01 1 0.038 0.536 4.76e-01
Degassed.fertilizer 1 330.137 0.053 8.22e-01 1 0.055 0.806 3.85e-01
Crop_detail 12 72587.937 1.174 5.08e-01 12 0.896 1.848 3.37e-01
Crop_category 2 25054.006 2.586 1.13e-01 2 0.291 2.598 1.12e-01

Table 7: Results from ANOVA analysis across JB1 and JB2 fields. The table shows results from ANOVA analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Fungal diversity in JB 5-7 (group variables)
Variable Observed.Df Observed.Sum.Sq Observed.F.value Observed.P Shannon.Df Shannon.Sum.Sq Shannon.F.value Shannon.P
field_keep_water 1 4906.714 0.462 5.10e-01 1 0.018 0.346 5.67e-01
Grazed 1 12636.006 1.266 2.83e-01 1 0.001 0.019 8.93e-01
Harvested 1 9966.964 0.977 3.43e-01 1 0.104 2.337 1.52e-01
Clovergrass_within_3_years 1 7160.914 0.686 4.24e-01 1 0.038 0.761 4.00e-01
Organic_material_factor 1 11892.502 1.184 2.98e-01 1 0.042 0.838 3.78e-01
Biodynamic_farm 1 10047.114 0.985 3.41e-01 1 0 0.005 9.44e-01
Livestock_manure 1 12034.314 1.199 2.95e-01 1 0.007 0.131 7.23e-01
Degassed.fertilizer NA NA NA NA NA NA NA NA
Crop_detail 11 131075.048 17.412 5.55e-02 11 0.612 4.17 2.09e-01
Crop_category 4 71489.214 2.639 1.04e-01 4 0.296 1.941 1.88e-01

Table 8: Results from ANOVA analysis across JB5, JB6 and JB7 fields. The table shows results from ANOVA analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Fungal diversity in JB 1-2 (continuous variables)
Variable Observed.Estimate Observed.SE Observed.t.value Observed.P Shannon.Estimate Shannon.SE Shannon.t.value Shannon.P
JB_value NA NA NA NA NA NA NA NA
Years_since_plowing 1.757 3.356 0.524 6.09e-01 0.023 0.011 2.167 4.79e-02
Rt 93.271 92.363 1.01 3.30e-01 0.114 0.917 0.125 9.03e-01
Phosphorus 0.633 15.611 0.041 9.68e-01 0.013 0.063 0.205 8.41e-01
Potassium -0.505 5.313 -0.095 9.26e-01 -0.003 0.037 -0.088 9.31e-01
Magnesium 48.021 4.52 10.625 4.39e-08 0.023 0.022 1.084 2.97e-01
Cobber 4.353 15.975 0.272 7.89e-01 0.131 0.076 1.716 1.08e-01
Organic_material_perc 7.52 15.871 0.474 6.43e-01 0.004 0.02 0.181 8.59e-01
Clay_perc 14.668 38.558 0.38 7.09e-01 -0.044 0.12 -0.363 7.22e-01
Nitrogen_perc 495.038 270.182 1.832 8.83e-02 0.788 1.006 0.783 4.46e-01
Years_since_turning_organic -3.135 25.221 -0.124 9.03e-01 0.003 0.007 0.416 6.84e-01
Years_since_turning_biodynamic -2.759 2.113 -1.306 2.13e-01 -0.011 0.016 -0.695 4.98e-01

Table 9: Results from robust linear regression analysis across JB1 and JB2 fields. The table shows results from the robust regression analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Fungal diversity in JB 5-7 (continuous variables)
Variable Observed.Estimate Observed.SE Observed.t.value Observed.P Shannon.Estimate Shannon.SE Shannon.t.value Shannon.P
Years_since_plowing 1.803 5.86 0.308 7.64e-01 0.004 0.044 0.099 9.23e-01
Rt 18.553 88.88 0.209 8.38e-01 0.153 0.236 0.649 5.29e-01
Phosphorus 4.484 14.65 0.306 7.65e-01 0.141 0.047 2.994 1.12e-02
Potassium 7.165 8.29 0.864 4.04e-01 -0.027 0.021 -1.294 2.20e-01
Magnesium 3.795 10.768 0.352 7.31e-01 0.005 0.083 0.065 9.49e-01
Cobber -7.238 20.533 -0.352 7.31e-01 0.019 0.12 0.155 8.80e-01
Organic_material_perc -71.036 37.675 -1.885 8.38e-02 -0.1 0.293 -0.341 7.39e-01
Clay_perc -1.32 7.374 -0.179 8.61e-01 -0.011 0.022 -0.5 6.26e-01
Nitrogen_perc -971.972 922.677 -1.053 3.13e-01 -1.009 5.364 -0.188 8.54e-01
Years_since_turning_organic -0.441 1.681 -0.262 7.97e-01 0.002 0.041 0.058 9.55e-01
Years_since_turning_biodynamic 2.033 6.871 0.296 7.72e-01 0 0.021 -0.001 9.99e-01

Table 10: Results from robust linear regression analysis across JB5, JB6 and JB7 fields. The table shows results from the robust regression analyses including only the samples from these fields. It lists the statistical values obtained for each environmental variable (rows) and the two biodiversity measures, observed and Shannon (column groups).

Detecting the most important variables using machine learning

The rapid advancement of machine learning technologies has provided new tools for understanding complex agricultural systems. One such method, the Random Forest algorithm, is particularly effective in identifying key variables that influence continuous outcome variables. In this section, we apply the Random Forest technique to identify the variables most important for shaping soil diversity.

Random Forest is an ensemble learning method that constructs multiple decision trees during training and merges their outputs to improve accuracy and control overfitting. Each tree in the forest considers a random subset of the variables, allowing the algorithm to identify which factors consistently play a crucial role in determining the outcome. This capability makes Random Forest especially useful in scenarios with a large number of potential influencing factors and complex, non-linear relationships, such as those found in agricultural ecosystems.

By utilizing this method, we can pinpoint the most significant factors that affect soil microbiome diversity.

Fungal Shannon diversity

We perform two Random Forest (RF) analyses for each diversity measure to assess the importance of the predictors.

  1. Using main variables on climate and location.
  2. Using all variables (excluding any with many missing values)

1) Using main variables on climate and location.

The higher the value of the two importance scores below (the regression counterparts of mean decrease accuracy and mean decrease Gini), the higher the importance of the variable in the model.

  • %IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.

  • IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.
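The permutation idea behind %IncMSE can be sketched in a few lines. This is a simplified Python illustration, not the randomForest implementation: a fixed toy prediction function stands in for a fitted forest, the error is computed on all rows rather than on out-of-bag samples, and no scaling is applied. All names and data in the block are hypothetical.

```python
import random

def mean_squared_error(predict, rows, targets):
    """Mean squared error of a prediction function over all rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)

def permutation_importance(predict, rows, targets, column, rng):
    """Increase in MSE after shuffling the values of one predictor column."""
    baseline = mean_squared_error(predict, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mean_squared_error(predict, permuted, targets) - baseline

# Toy data: column 0 drives the outcome, column 1 is pure noise.
rng = random.Random(1)
rows = [[float(i), rng.random()] for i in range(30)]
targets = [2.0 * r[0] for r in rows]

def toy_model(row):
    """Stand-in for a fitted model; it only uses column 0."""
    return 2.0 * row[0]

importance_informative = permutation_importance(toy_model, rows, targets, 0, rng)
importance_noise = permutation_importance(toy_model, rows, targets, 1, rng)
print(importance_informative > importance_noise)
```

Shuffling a predictor that the model relies on breaks its link to the outcome and raises the error, while shuffling an irrelevant predictor leaves the error unchanged.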

Prediction of diversity

We now assess whether the variables can predict the outcome measure, expressed as the percentage of variance explained by the tested variables.

## 
## Call:
##  randomForest(formula = Shannon ~ ., data = meta_asv_count_sub,      ntree = 500, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.08099892
##                     % Var explained: -34.61
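The “% Var explained” value in the printout can be read as 100 * (1 - MSE / variance of the outcome), where the MSE is the mean of squared residuals on out-of-bag samples. To our understanding this is how the randomForest package reports it for regression; the helper functions below are a hypothetical Python sketch of that relationship and not part of the report's pipeline. A negative value therefore means that the model predicts worse than simply using the mean diversity of the fields.

```python
def pct_var_explained(mse, outcome_variance):
    """Percentage of variance explained: 100 * (1 - MSE / Var(y))."""
    return 100.0 * (1.0 - mse / outcome_variance)

def implied_variance(mse, pct_explained):
    """Outcome variance implied by a reported MSE and % Var explained."""
    return mse / (1.0 - pct_explained / 100.0)

# Values copied from the two fungal Shannon model printouts in this section.
variance_model_1 = implied_variance(0.08099892, -34.61)
variance_model_2 = implied_variance(0.06744636, -12.09)
print(round(variance_model_1, 4), round(variance_model_2, 4))

# Sanity check of the formula with round numbers.
print(pct_var_explained(0.5, 1.0), pct_var_explained(2.0, 1.0))
```

Both fungal models imply the same outcome variance (about 0.060), which is consistent with the formula since both models predict the same Shannon values.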

2) Using all variables (excluding 2 with many missing values).

The higher the value of the two importance scores below (the regression counterparts of mean decrease accuracy and mean decrease Gini), the higher the importance of the variable in the model.

  • %IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.

  • IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.

Prediction of diversity

We now assess whether the variables can predict the outcome measure, expressed as the percentage of variance explained by the tested variables.

## 
## Call:
##  randomForest(formula = Shannon ~ ., data = meta_asv_count_sub_clean,      ntree = 500, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 7
## 
##           Mean of squared residuals: 0.06744636
##                     % Var explained: -12.09

Bacterial Shannon diversity

We perform two Random Forest (RF) analyses for each diversity measure to assess the importance of the predictors.

  1. Using main variables of interest (“Biodynamic_farm”, “Years_since_turning_organic”, “JB_value”).
  2. Using all variables (excluding any with many missing values)

1) Using main variables of interest.

The higher the value of the two importance scores below (the regression counterparts of mean decrease accuracy and mean decrease Gini), the higher the importance of the variable in the model.

  • %IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.

  • IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.

Prediction of diversity

We now assess whether the variables can predict the outcome measure, expressed as the percentage of variance explained by the tested variables.

## 
## Call:
##  randomForest(formula = Shannon ~ ., data = meta_asv_count_sub,      ntree = 500, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 1
## 
##           Mean of squared residuals: 0.03872921
##                     % Var explained: 11.06

2) Using all variables (excluding 2 with many missing values).

The higher the value of the two importance scores below (the regression counterparts of mean decrease accuracy and mean decrease Gini), the higher the importance of the variable in the model.

  • %IncMSE (Percent Increase in Mean Squared Error): Indicates how much the model’s prediction error increases when the values of a variable are randomly permuted. Higher values mean the variable is more important for accurate predictions.

  • IncNodePurity (Increase in Node Purity): Measures how much a variable contributes to reducing impurity (typically via the Gini index for classification or variance for regression) when it is used to split nodes in the forest. Higher values suggest more informative splits and thus greater importance in tree construction.

Prediction of diversity

We now assess whether the variables can predict the outcome measure, expressed as the percentage of variance explained by the tested variables.

## 
## Call:
##  randomForest(formula = Shannon ~ ., data = meta_asv_count_sub_clean,      ntree = 500, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 7
## 
##           Mean of squared residuals: 0.03301169
##                     % Var explained: 24.19

Version information

Table 11: List of the software used, including the packages of the R programming environment.

Package Version Package Version
OS Ubuntu 20.04.4 LTS colorspace 2.1-0
R 4.3.3 jpeg 0.1-10
countrycode 1.6.0 utf8 1.2.4
splines 4.3.3 generics 0.1.3
bitops 1.0-7 robustbase 0.99-3
lifecycle 1.0.4 class 7.3-22
rstatix 0.7.2 S4Arrays 1.2.1
sf 1.0-16 pkgconfig 2.0.3
MASS 7.3-60.0.1 gtable 0.3.5
insight 0.20.2 hwriter 1.3.2.1
backports 1.5.0 pcaPP 2.0-4
magrittr 2.0.3 htmltools 0.5.8.1
sass 0.4.9 carData 3.0-5
rmarkdown 2.27 biomformat 1.30.0
jquerylib 0.1.4 png 0.1-8
yaml 2.3.9 rstudioapi 0.16.0
zip 2.3.1 tzdb 0.4.0
cowplot 1.1.3 reshape2 1.4.4
DBI 1.2.3 coda 0.19-4.1
minqa 1.2.7 nlme 3.1-165
ade4 1.7-22 curl 5.2.1
multcomp 1.4-26 nloptr 2.1.1
abind 1.4-5 proxy 0.4-27
zlibbioc 1.48.2 cachem 1.1.0
Rtsne 0.17 zoo 1.8-12
RCurl 1.98-1.16 rhdf5 2.46.1
TH.data 1.1-2 sjlabelled 1.2.0
sandwich 3.1-0 KernSmooth 2.23-24
GenomeInfoDbData 1.2.11 parallel 4.3.3
units 0.8-5 s2 1.1.7
svglite 2.1.3 pillar 1.9.0
codetools 0.2-20 vctrs 0.6.5
DelayedArray 0.28.0 ggpubr 0.6.0
xml2 1.3.6 car 3.1-2
tidyselect 1.2.1 xtable 1.8-4
farver 2.1.2 cluster 2.1.6
geojsonsf 2.0.3 evaluate 0.24.0
multtest 2.58.0 mvtnorm 1.2-5
e1071 1.7-14 cli 3.6.3
survival 3.7-0 compiler 4.3.3
iterators 1.0.14 rlang 1.1.4
systemfonts 1.1.0 crayon 1.5.3
foreach 1.5.2 ggsignif 0.6.4
tools 4.3.3 rrcov 1.7-5
ragg 1.3.2 labeling 0.4.3
glue 1.8.0 interp 1.1-6
mnormt 2.1.1 classInt 0.4-10
SparseArray 1.2.4 plyr 1.8.9
xfun 0.46 stringi 1.8.4
mgcv 1.9-1 viridisLite 0.4.2
withr 3.0.0 deldir 2.0-4
fastmap 1.2.0 munsell 0.5.1
latticeExtra 0.6-30 V8 4.4.2
boot 1.3-30 hms 1.1.3
rhdf5filters 1.14.1 Rhdf5lib 1.24.2
fansi 1.0.6 highr 0.11
digest 0.6.36 broom 1.0.6
timechange 0.3.0 igraph 2.0.3
R6 2.5.1 RcppParallel 5.1.8
estimability 1.5.1 bslib 0.7.0
textshaping 0.4.0 DEoptimR 1.1-3
wk 0.9.2 ape 5.8