We study how well the two-point clustering of galaxies on linear scales can be reconstructed, as a function of mass and luminosity, using the halo occupation distribution (HOD) in several semi-analytical models (SAMs) of galaxy formation from the Millennium Simulation. We find that the HOD built on Friends-of-Friends groups reproduces galaxy clustering better than the HOD built on gravitationally bound haloes. This indicates that Friends-of-Friends groups are more directly related to the clustering of these regions than the bound particles of the overdensities. In general, we find that the reconstruction works at best to ≃5 per cent accuracy: it underestimates the bias for bright galaxies. This translates to an overestimation of 50 per cent in the halo mass when clustering is used to calibrate mass. We also find a degeneracy in the mass prediction from the clustering amplitude that affects all masses. This effect is due to the dependence of clustering on host halo substructure, an indication of assembly bias. We show that the clustering of haloes of a given mass increases with the number of subhaloes, a result that depends only on the underlying matter distribution. Because the number of galaxies increases with the number of subhaloes in SAMs, the HOD reconstruction yields a bias that is too low. We expect this effect to apply to other models of galaxy formation, including the real Universe, as long as the number of galaxies increases with the number of subhaloes. We also find that the reconstruction of galaxy bias from the HOD model fails for low-mass haloes with M ≲ (3–5) × 10¹¹ h⁻¹ M⊙. This is because galaxy clustering is more strongly affected by assembly bias at these low masses.
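As a rough illustration of what an HOD bias reconstruction involves, the sketch below assumes the standard expression in which the large-scale galaxy bias is the halo bias averaged over halo mass bins, weighted by halo abundance times mean galaxy occupation. The function name, the binning, and all numerical values are hypothetical and are not taken from the Millennium Simulation or the SAMs.

```python
def reconstruct_galaxy_bias(halo_bias, halo_abundance, mean_occupation):
    """Occupation-weighted mean of the halo bias over mass bins.

    Hypothetical helper: each argument is a sequence with one entry per
    halo mass bin (bias b_h, number density n, mean galaxies per halo <N>).
    """
    # Weight of each mass bin = number density of galaxies hosted in it.
    weights = [n * occ for n, occ in zip(halo_abundance, mean_occupation)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("the sample contains no galaxies")
    return sum(b * w for b, w in zip(halo_bias, weights)) / total


# Made-up values for three halo mass bins, for illustration only.
halo_bias = [0.8, 1.2, 2.0]
halo_abundance = [1e-2, 1e-3, 1e-4]
mean_occupation = [0.5, 2.0, 10.0]
b_rec = reconstruct_galaxy_bias(halo_bias, halo_abundance, mean_occupation)
```

In this form the reconstruction depends on halo mass alone, so any dependence of occupation and clustering on substructure at fixed mass is not captured.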