We're going to be talking about interfaces, not technology, with the exception of sensors. A gesture is any physical movement that can be responded to without the presence of a traditional interface device. While the technology has been around since the 1970s, it's only lately that it has started to take off, but we've been training a significant portion of the population to interact with these devices for the last twenty years.
There are really two types of gestural interfaces: touchscreens (either single- or multi-touch) and free-form gesture-based interfaces (like whistling to find your car keys). But the first question is whether you should even have a gesture-based interface at all. It's not good for heavy data input, it relies on the visual and the physical form, and it is inappropriate for some contexts.
However, if you're going to build one, the starting point is sensors. They're the secret sauce: the sensors you have totally dictate what interfaces are available. The most common sensors are pressure, light, proximity, acoustic, tilt, motion and orientation.
The attributes of gestures are: presence, duration, pressure, width, height, depth, orientation, number of touch points, and the sequence of gestures.
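The attribute list above can be sketched as a simple data structure. This is only an illustration of the vocabulary, not any real framework's API; all names and units are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TouchPoint:
    x: float          # position in mm from the screen origin
    y: float
    pressure: float   # normalised 0.0-1.0, if the sensor reports it

@dataclass
class GestureEvent:
    """One sampled moment of a gesture, covering the attributes above."""
    present: bool                   # presence: is anything touching?
    duration_ms: int                # how long the gesture has lasted
    touch_points: List[TouchPoint]  # number of touch points = len(...)
    width_mm: float                 # bounding box of the contact area
    height_mm: float
    depth_mm: float                 # distance, for proximity sensors
    orientation_deg: float          # rotation of the contact/hand

# A sequence of gestures is then just an ordered list of events:
swipe = [
    GestureEvent(True, 0,   [TouchPoint(10, 50, 0.6)], 12, 14, 0, 0),
    GestureEvent(True, 120, [TouchPoint(40, 50, 0.5)], 12, 14, 0, 0),
]
```

A recogniser would consume such a sequence and decide which gesture, if any, the user performed.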
You also have to think about the limits of the human body. The more complicated the gesture, the fewer people will be able to perform it. The typical dimensions of fingers (width 16-20mm, tips 8-10mm, pads 10-14mm) are fundamental to designing gesture-based interfaces. Your fingers just aren't as accurate as the cursors we're used to designing for...
Touch targets are the areas of the screen where a touch triggers an event. We need to pay attention to Fitts's law: the time it takes to reach a target grows with the distance to the target and shrinks as the target gets bigger. You generally need touch targets around the 10mm size; interestingly, the iPhone keyboard has targets roughly half this size.
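A common formulation of Fitts's law (the Shannon form) predicts movement time as MT = a + b·log2(D/W + 1), where D is the distance to the target and W its width. A minimal sketch; the constants a and b are device- and user-specific, and the values here are illustrative, not measured:

```python
import math

def fitts_movement_time(distance_mm: float, width_mm: float,
                        a: float = 0.05, b: float = 0.15) -> float:
    """Predicted seconds to acquire a target (Shannon formulation).

    a and b are empirical constants fitted per device/user; these
    defaults are placeholders for illustration only.
    """
    index_of_difficulty = math.log2(distance_mm / width_mm + 1)
    return a + b * index_of_difficulty

# Halving the target width at the same distance raises the index of
# difficulty, so the predicted acquisition time goes up:
t_10mm = fitts_movement_time(distance_mm=80, width_mm=10)
t_5mm = fitts_movement_time(distance_mm=80, width_mm=5)
```

This is why shrinking a touch target below fingertip size, as the iPhone keyboard does, needs compensating tricks.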
However, it uses tricks to work around this: iceberg tips, where the touch target is larger than the visual element representing it, and adaptive targets, where the interface predicts the next element the user will touch and enlarges its target.
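Both tricks can be sketched with a simple rectangle hit test. This is a toy model under assumed names and sizes, not Apple's implementation: each key's invisible touch target extends a few millimetres beyond its visuals (the iceberg tip), and keys predicted to be next get an extra margin (adaptive targets).

```python
from dataclasses import dataclass

@dataclass
class Key:
    label: str
    x: float   # top-left corner of the visual element, in mm
    y: float
    w: float   # visual width and height
    h: float

def hit_test(keys, tx, ty, slop_mm=3.0, boosted=None, boost_mm=2.0):
    """Return the first key whose invisible touch target contains (tx, ty).

    slop_mm is the iceberg tip: how far the target extends past the
    visuals. Keys whose labels are in `boosted` (e.g. likely next
    letters) get boost_mm extra. All values are illustrative.
    """
    boosted = boosted or set()
    for key in keys:
        margin = slop_mm + (boost_mm if key.label in boosted else 0)
        if (key.x - margin <= tx <= key.x + key.w + margin and
                key.y - margin <= ty <= key.y + key.h + margin):
            return key
    return None

keys = [Key("q", 0, 0, 4, 6), Key("w", 6, 0, 4, 6)]
# A touch at x=4.5 misses 'q' visually but lands in its iceberg tip:
hit = hit_test(keys, 4.5, 3.0)
```

A real keyboard would resolve overlapping targets by picking the nearest centre rather than the first match, but the principle is the same.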
For the most part people aren't dragging their fingers across screens, so a persistent cursor generally doesn't make sense. And since there is no persistent cursor, hovers and mouseovers don't make a lot of sense either. Multi-select is limited by the number of fingers, and right clicks, drop-down menus, double clicks and cut-and-paste are all just plain difficult.
So how do you figure out the appropriate gesture? First, determine what sensors you have available; then the task that needs to be performed; then consider the physiology of the human body. This can actually be a fairly straightforward set of questions, and the complexity of the gesture should match the complexity of the task at hand.
The best designs are those that "dissolve into behaviour" (Naoto Fukasawa): the behaviour becomes unconscious, which is the promise of interactive gestures in general. The best designs match the behaviour of the system to the gestures humans might already make to enable that behaviour.
...and we're done.