If you took a shot with the D5 and the D7100 at the same ISO, shutter speed and aperture setting, the two shots should have the same exposure, provided you used the same lens and each camera's metering system was calibrated equally. It's no secret that camera manufacturers fudge their ISO ratings to make a camera's high-ISO results look better than they are. Given that the D7100 is not a cheap camera, this probably isn't an issue, though it wouldn't surprise me since it's Nikon.
Not all f/stops are equal. If you used two different lenses, say a 50mm prime and an 18-55mm kit lens, there would likely be a very small difference due to the actual T-stop of each lens. While an f/stop is a geometric number (the focal length divided by the diameter of the aperture), a T-stop reflects the actual amount of light coming through the lens after losses in the glass. So while a lens may have an f/stop of f/2.8, its actual T-stop could be a bit slower, say T/3.
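To put a number on that, here is a minimal sketch of the usual relationship between f/stop and T-stop: the T-stop is the f-number divided by the square root of the lens's transmittance. The function names and the 87% transmittance figure are my own illustrative assumptions, not measured values for any particular lens.

```python
import math


def t_stop(f_number: float, transmittance: float) -> float:
    """Return the T-stop for a given f-number and light transmittance.

    transmittance is the fraction of light that survives the glass
    (1.0 would be a perfect, lossless lens).
    """
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must be in the range (0, 1]")
    return f_number / math.sqrt(transmittance)


def light_loss_stops(transmittance: float) -> float:
    """Return how many stops of light the lens loses to its glass."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must be in the range (0, 1]")
    return -math.log2(transmittance)


# Assumed example: an f/2.8 lens that passes 87% of the light.
print(f"T-stop: T/{t_stop(2.8, 0.87):.2f}")
print(f"Light lost: {light_loss_stops(0.87):.2f} stops")
```

With those assumed numbers the f/2.8 lens behaves like roughly T/3, a difference of about a fifth of a stop, which is why two lenses set to the same f/stop can give slightly different exposures.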
The size of the pixels won't change the exposure. Pixel size determines how many photons strike the pixel during the exposure and how much signal it can store. The amount of charge a pixel can store (its full-well capacity) is related to the physical size of the pixel itself. Apple calls this "deeper pixels." Say you are taking a shot with a very bright area such as white clouds. With small pixels, it's quite easy for those super-bright areas to produce a signal stronger than the pixel can hold. If so, those white areas are clipped and show up as pure white without detail.
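The clipping described above can be sketched in a few lines. This is a simplified toy model, not how any real sensor is specified: the capacities and cloud tones are made-up numbers, and the same signal is fed to both pixels purely to isolate the effect of storage capacity.

```python
def stored_signal(generated_electrons: float, full_well_capacity: float) -> float:
    """Return the charge a pixel actually keeps.

    Anything above the pixel's capacity is lost, i.e. clipped.
    """
    return min(generated_electrons, full_well_capacity)


# Hypothetical full-well capacities in electrons (illustrative only).
SMALL_PIXEL_CAPACITY = 5_000
LARGE_PIXEL_CAPACITY = 50_000

# Three slightly different tones inside a bright white cloud (assumed values).
cloud_tones = [4_000, 6_000, 8_000]

small = [stored_signal(e, SMALL_PIXEL_CAPACITY) for e in cloud_tones]
large = [stored_signal(e, LARGE_PIXEL_CAPACITY) for e in cloud_tones]

print("small pixel:", small)  # [4000, 5000, 5000] -> two brightest tones merge
print("large pixel:", large)  # [4000, 6000, 8000] -> all three tones survive
```

Once two tones both hit the ceiling they record the same value, and no amount of editing afterwards can separate them again. That is the "pure white without detail" result.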
Since a larger pixel is struck by more photons during any given exposure, larger pixels collect more light. This is massively important when you have deep shadows. Going back to the scene with the bright, white clouds, say you also have deep shadows. Small pixels capture less light there, so those pixels generate a very weak signal. A weak signal has a poorer signal-to-noise ratio, so the shadows come out noisy with little to no detail, depending on just how dark they are relative to the exposure. This is how a large pixel produces less noise and more dynamic range than a smaller one, and why a 12MP camera like the Sony A7S II produces far better image quality than a 41MP Nokia phone.
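Here is a rough sketch of why the weaker signal looks noisier, assuming the standard photon shot-noise model (noise is the square root of the signal) plus a fixed read noise. The pixel pitches, photon density and helper names are illustrative assumptions, not specifications of the cameras mentioned above.

```python
import math


def photons_collected(photons_per_um2: float, pixel_pitch_um: float) -> float:
    """Photons landing on a square pixel: light density times pixel area."""
    return photons_per_um2 * pixel_pitch_um ** 2


def snr(signal_e: float, read_noise_e: float = 0.0) -> float:
    """Signal-to-noise ratio with shot noise and read noise combined."""
    if signal_e <= 0:
        return 0.0
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)


def dynamic_range_stops(full_well_e: float, read_noise_e: float) -> float:
    """Dynamic range in stops: largest storable signal over the noise floor."""
    return math.log2(full_well_e / read_noise_e)


# Same deep-shadow light level on both sensors (assumed: 100 photons per square micron).
shadow_density = 100.0

small_signal = photons_collected(shadow_density, 1.0)  # hypothetical 1 micron pixel
large_signal = photons_collected(shadow_density, 8.0)  # hypothetical 8 micron pixel

print(snr(small_signal))  # 10.0
print(snr(large_signal))  # 80.0
```

Under these assumptions the larger pixel collects 64 times the light and ends up with 8 times the signal-to-noise ratio in the same shadow. The dynamic_range_stops helper shows the other half of the argument: a bigger full-well capacity over the same noise floor means more stops between clipped highlights and noisy shadows.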