As I see it, there are two main parts to this.
1) An understanding of programs.
To some degree "processes", plus the basics of "windows", including window controls like buttons, etc.
2) An understanding of a script language and engine.
By "engine" I mean that a lot of script languages in themselves might not have the facilities to do anything with external windows and processes.
MacroMonkey does, as well as AutoIt, ACTOOL, etc.
Depending on what you want to do, you might even get by with just a record-and-playback kind of macro program.
This is where you just record mouse clicks and key presses, etc., just a sequence that you repeat over and over.
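But then this kind of setup is pretty "dumb". It doesn't have any feedback to know where it's at.
And because of this, these tend to run slower, since you usually have to pad the sequence with fixed waits instead of reacting as soon as the program is ready.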
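To make the difference concrete, here is a rough sketch in Python (not MacroMonkey code, just an illustration). Everything in it is made up for the example: "FakeApp" simulates a program whose button only becomes clickable after a few ticks, "replay_blind" is the record-and-playback style, and "click_when_ready" is the feedback style.

```python
class FakeApp:
    """Simulated program: its button only accepts clicks after `ready_after` ticks."""

    def __init__(self, ready_after):
        self.ready_after = ready_after
        self.ticks = 0
        self.clicks_registered = 0

    def tick(self):
        # One unit of waiting time passes.
        self.ticks += 1

    def button_ready(self):
        return self.ticks >= self.ready_after

    def click(self):
        # A click on a button that is not ready is simply lost.
        if self.button_ready():
            self.clicks_registered += 1


def replay_blind(app, fixed_wait):
    """Record/playback style: wait a fixed time, then click no matter what."""
    for _ in range(fixed_wait):
        app.tick()
    app.click()
    return app.ticks


def click_when_ready(app, max_ticks=100):
    """Feedback style: poll the program and click as soon as the button is ready."""
    while not app.button_ready() and app.ticks < max_ticks:
        app.tick()
    app.click()
    return app.ticks
```

With a button that needs 3 ticks, the blind version with a "safe" wait of 10 burns all 10 ticks, while the feedback version clicks after 3. Make the blind wait too short and the click just gets lost.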
I can see how you could manipulate a Firefox window this way, and go on to something more sophisticated if it was a browser game running in it or something.
Using MacroMonkey:
First you'd want to find your browser window using a facility here, like "win.Find()":
http://www.macromonkey.com/windows.html
Then to read the status of something on the screen, you could use "win.GetPixel()" to read pixels off the screen.
Then to click on things you can use the "user input" library, like "input.MouseClick()":
http://www.macromonkey.com/userinput.html
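The find, read, click flow could be sketched roughly like this. Note this is Python with stand-in functions, not the real MacroMonkey API: "find_window", "get_pixel" and "mouse_click" are hypothetical placeholders for "win.Find()", "win.GetPixel()" and "input.MouseClick()", whose actual Lua signatures you'd need to check in the docs linked above. The colors and coordinates are assumptions too.

```python
# Stand-ins only: none of this is the MacroMonkey API.
READY_COLOR = (0, 200, 0)      # assumed color of the button when it can be clicked
BUSY_COLOR = (120, 120, 120)   # assumed color while it is greyed out


class FakeWindow:
    """Simulated window with a title, a few known pixels, and a click log."""

    def __init__(self, title):
        self.title = title
        self.pixels = {(50, 40): BUSY_COLOR}
        self.clicks = []


def find_window(windows, title_part):
    """Return the first window whose title contains title_part, or None."""
    for window in windows:
        if title_part in window.title:
            return window
    return None


def get_pixel(window, x, y):
    """Return the (r, g, b) color at (x, y), or None if unknown."""
    return window.pixels.get((x, y))


def mouse_click(window, x, y):
    """Record a click at (x, y) in the window."""
    window.clicks.append((x, y))


def click_if_ready(windows, title_part, x, y):
    """Find the window, check the pixel, and click only if it shows READY_COLOR."""
    window = find_window(windows, title_part)
    if window is None:
        return False
    if get_pixel(window, x, y) != READY_COLOR:
        return False
    mouse_click(window, x, y)
    return True
```

The point is just the order of operations: locate the window first, read the screen to decide, and only then send input. In a real script you'd run something like "click_if_ready" in a loop with a short sleep between checks.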
To use MM you will need to learn at least some basic Lua scripting.
See here for an introduction and links to info about Lua:
http://www.macromonkey.com/introduction.html ("Scripting")