First-person shooters are about two things: (1) moving and (2) looking.
Moving is meaningful in almost every context: you dodge bullets and rockets, enemies react to your new position, and games like Thief 3 and Mirror's Edge attempt a "body awareness" system to contextualize movement.
But looking? Comparatively, not meaningful at all.
Stare at someone in real life for a few seconds and they'll get uneasy. Stare at someone's cleavage in a game? They either stare back or don't care. People stare at each other in-game all the time; it's nothing special. Few NPCs will ask why you're staring at their daughter like that.
So, how do you make the "act of looking" meaningful? Well, you can take whatever you look at and make it explode.
And thus, the gun...
But as the first-person shooter genre matured over the years, the on-screen gun, or "view model," changed and took on a number of secondary functions.
For one thing, it's no longer centered on the screen (centered, you couldn't see the side detail of the view model, and weapon animations read too ambiguously), even at the risk of starting a wonderfully pointless debate between right-handed players and the filthy, filthy left-handed cesspool of gamer communities.
The gun functions partly as an extension of your player character; while you might think you're a highly trained CIA ninja, you're actually a box with a vector as far as the game engine is concerned.
(For more on this, see "The Refrigerator Box" at Gausswerks.)
Since Doom 2 (ignoring Quake 3), most FPS games have done away with "Your Face" embedded in the HUD at the bottom of the screen. That face used to be how you knew who you were; the theory was that if you couldn't see your face, how could you get a sense of your player character? Now we don't use the face. Given our anti-HUD tendencies, it's a fairly crass technique these days.
Now, I argue, we've combined the face and the gun.
In No One Lives Forever, when you take out your pocket-mirror decoder scanner gadget, it reinforces the fact that you are, indeed, a (fashionable) spy. Look at her color-coordinated gloves with the fur trim! Then the way Cate Archer deftly wields a submachine gun reinforces that, yes, she can hold her own. The view model is part of your character; the different weapons reveal facets of her personality.
Who wears white gloves? You do. Because you're Cate Archer, badass chic spy.
In this sense, the view model also grounds your place in the world.
- It's winter in Russia, so it's cold. Your view model wears a fur coat and gloves. That's diegetic grounding. (If this were still the Doom era, "Your Face" would be chattering its teeth.)
- When you stand in sunlight, your view model is lit brightly; when you stand in shadow, it is dark. In nearly every modern FPS, the view model serves as a coarse but intuitive version of the Thief series' "light gem" HUD element. (How else would you know that you're in shadow? You can't see your legs!)
All of this helps you ignore the fact that you control like a refrigerator box.