Every once in a while, I need to step outside the command line. Sometimes I'm even forced to interact with deaf graphical programs, those that do not listen to standard input or a meager HTTP port. In those desperate times, were it not for tools such as XAUT or xdotool, I would have to type and click outside of VIM, like cavemen probably did.
Those two little programs are enough to make me happy when confronted with an X11 server. However, my computer, of a more whimsical nature, is reluctant to execute binaries other than a Python interpreter (like any other well-meaning general-purpose device assembled during the 21st century, really). That is why I have decided to write the simplest Python library I could think of that is able to:
- Find out the position of the mouse pointer
- Move the mouse pointer around the screen
- Press and release mouse buttons
- Press and release keys on the keyboard
- Capture the screen
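Under X11, the first two items come down to two core protocol requests on the root window: QueryPointer and WarpPointer. Here is a minimal sketch of both using python-xlib directly. It is not xrobot's actual code, and the function names are only illustrative. It assumes an already opened python-xlib display object. Pressing buttons and keys additionally needs the XTEST extension (`Xlib.ext.xtest.fake_input`), which is left out here.

```python
def mouse_pos(display):
    """Return the pointer position (x, y) in root-window coordinates."""
    root = display.screen().root
    pointer = root.query_pointer()  # QueryPointer reply carries root_x / root_y
    return pointer.root_x, pointer.root_y


def move(display, x, y):
    """Move the pointer to the absolute root-window coordinates (x, y)."""
    root = display.screen().root
    root.warp_pointer(x, y)  # WarpPointer with the root window as destination
    display.sync()           # flush the request so the pointer moves right away
```

In real use, the `display` argument would be created with `Xlib.display.Display()` (after `import Xlib.display`). Passing it in, rather than opening a connection inside each function, lets a single connection be reused for many calls.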
The xrobot library is lean, simple and compatible with both Python 2 and Python 3. It is just a thin wrapper around functions from python-xlib. Since screen capture through Xlib is painfully slow, the python-gtk bindings are used for it instead when they are installed. For my own convenience, images are returned as numpy arrays; if you find that dependency unbearable, it is easy to root out of the code.
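Xlib hands back a screenshot as a flat byte string, so turning it into a numpy array is mostly a matter of reshaping. The following is a minimal sketch, not xrobot's code, and both function names are hypothetical. It assumes a 24- or 32-bit TrueColor visual on a little-endian machine, where `ZPixmap` data typically arrives as 4 bytes per pixel in blue, green, red, padding order. Other depths or byte orders would need different handling.

```python
import numpy as np


def bgrx_to_rgb(data, width, height):
    """Convert raw 32-bit BGRX pixel bytes into a (height, width, 3) RGB array."""
    pixels = np.frombuffer(data, dtype=np.uint8).reshape(height, width, 4)
    # Take channels 2, 1, 0 (R, G, B) and drop the padding byte; copy() gives
    # a contiguous, writable array instead of a read-only view of the bytes.
    return pixels[:, :, 2::-1].copy()


def capture_screen_xlib(display):
    """Grab the whole root window through Xlib (slow, but needs no GTK)."""
    from Xlib import X  # lazy import: bgrx_to_rgb works without python-xlib

    root = display.screen().root
    geometry = root.get_geometry()
    raw = root.get_image(0, 0, geometry.width, geometry.height,
                         X.ZPixmap, 0xFFFFFFFF)
    return bgrx_to_rgb(raw.data, geometry.width, geometry.height)
```

Keeping the byte shuffling in a separate pure function makes it easy to check without an X server. A single `get_image` call for the full screen is also the slow path mentioned above, which is why a faster backend is worth having when available.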
I leave you with a link to the xrobot GitHub repository and some sample code:
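```python
from __future__ import print_function  # same print() output on Python 2 and 3

import matplotlib.pyplot as plt  # only needed to display the screenshot
import xrobot

robot = xrobot.XRobot()

# Query and print the current pointer position
x, y = robot.mouse_pos()
print('Current mouse position: x =', x, 'y =', y)

# Move the pointer to (10, 10) and click button 1 (normally the left button)
robot.move(10, 10)
robot.click(1)

# Keyboard: a full key stroke, then separate press and release events
robot.key('a')           # Press and release 'a'
robot.key_down('comma')  # Press ','
robot.key_up('comma')    # Release ','

# Screen size, then a screenshot returned as a numpy array
width, height = robot.screen_resolution()
print('Screen width:', width, 'Screen height:', height)
img = robot.capture_screen()

plt.imshow(img)
plt.show()
```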