Text-based Linux and Unix systems are easy to manipulate. The way the Unix I/O system works you can always fake keyboard input to another program and intercept its output. The whole system is made to work that way. Graphical X11 programs are another matter, though. Is there a way to control X11 programs like you control text programs? The answer to that question depends on exactly what you want to do, but the general answer is yes.
As usual for Linux and Unix, though, there are many ways to get to that answer. If you really want fine-grained control over programs, some programs offer control via a special mechanism known as D-Bus. This allows programs to expose data and methods that other programs can use. In a perfect world your target program will use D-Bus, but that is not always the case. So today we’ll look more at control of arbitrary programs.
There are several programs that can control X windows in some way or another. There’s a tool called xdo that you don’t hear much about. More common is xdotool and I’ll show you an example of that. Also, wmctrl can perform some similar functions. There’s also autokey, which covers some of the same ground as the popular Windows program AutoHotkey.
About xdotool
The xdotool program is probably the most useful of the commands when you need to take over GUI programs. It is sort of a Swiss Army knife of X manipulation. However, the command line syntax is a bit difficult, and that’s likely because the tool can do lots of different things. Most of the time I am interested in its ability to move and resize windows. But it can also send fake keyboard and mouse input, and it can bind actions to things like mouse motion and window events.
Although you can make the tool read from a file, you most often see the arguments right on the command line. The idea is to find a window and then apply things to it. You can find windows by name or use other means such as letting the user click on the desired window.
For example, consider this:
echo Pick Window; xdotool selectwindow type "Hackaday"
If you enter this at a shell prompt, you can click on a window and see the given string appear as if it were typed there by the user. The tool is also capable of sending mouse events and performing a multitude of window operations like changing window focus, changing which desktop is shown, etc.
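Finding the window by name instead of clicking on it works in much the same way. Here is a minimal sketch, not a recipe for every desktop: the helper name type_into is made up for illustration, and the title pattern and text are whatever you pass in.

```shell
# type_into PATTERN TEXT
# Hypothetical helper: find the first visible window whose title matches
# PATTERN (a regular expression), activate it, and type TEXT into it.
type_into() {
    # --limit 1 stops the search at the first match; windowactivate --sync
    # waits for the window to become active before the typing starts.
    xdotool search --onlyvisible --limit 1 --name "$1" \
        windowactivate --sync type "$2"
}

# Example (needs a running X session and a matching window):
#   type_into 'Kate' 'Hackaday'
```

Activating the window first matters because, without a window argument, type sends its keystrokes to whatever window currently has focus.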
By the way, some of xdotool’s features require the XTest extension to your X server. I’ve always found this turned on, but if things aren’t working, you’d want to check your X server log to see if that extension is loaded.
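If you would rather not dig through the server log, xdpyinfo lists the extensions the running server offers. A small sketch, with has_xtest being a made-up helper name:

```shell
# has_xtest reads an extension listing on stdin and succeeds if XTEST
# appears in it as a whole word.
has_xtest() {
    grep -qw 'XTEST'
}

# Example (needs a running X session):
#   xdpyinfo | has_xtest && echo "XTEST is available"
```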
What About wmctrl?
The wmctrl program has a lot of similar functions but mostly interacts with your window manager. The only problem is, it uses a standard interface to your window manager and not all window managers support all features. This is one of those things that makes distributing programs for Linux so exciting. No two systems are alike and some aren’t even close!
The wmctrl program shines when you want to do things like switch desktops, maximize windows, and related tasks. However, it can do many of the tasks that xdotool can do, as well.
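As a rough illustration of those tasks, here are a few thin wrappers. The function names are invented for this sketch; the wmctrl options themselves are standard, but whether they do anything depends on your window manager.

```shell
# Switch to desktop N (wmctrl numbers desktops starting at 0).
goto_desktop() {
    wmctrl -s "$1"
}

# Toggle full maximization of the currently active window.
toggle_maximize() {
    wmctrl -r :ACTIVE: -b toggle,maximized_vert,maximized_horz
}

# List the managed windows: ID, desktop number, host, and title.
list_windows() {
    wmctrl -l
}

# Show what the window manager reports about itself, a useful first
# check when a feature seems to be ignored.
wm_info() {
    wmctrl -m
}
```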
Using a Big Monitor
I recently switched out my three-monitor setup for a very large 4K monitor. The 43-inch behemoth has a resolution of 3840×2160. That’s great, but I did miss being able to put one program on one monitor and a second (or third) program on another monitor.
The answer was to get the windows to slide into certain positions on the screen. You could use a tiled window manager, but I use KDE (which no longer has a tiling option). It will snap windows to certain positions if you drag them to just the right spot, but that’s not very fast. Also, the snap areas were not all where I wanted them.
My original thought was just to use xdotool and map some keys using KDE’s shortcuts. Control+Alt+1 could snap the current window to the top left of the screen and Control+Alt+0 could maximize. Control+Alt+6 would eat up the right half of the screen and Control+Alt+8 would take up the top half.
My first attempt at creating the Control+Alt+1 shortcut looked like this:
xdotool getwindowfocus windowmove 0 0 windowsize 1920 1080
The idea is to find the current window, move it to 0,0 and then make it take up a quarter of the screen. Sure, the hardcoded numbers aren’t great, but it works for a single machine setup. You can set the size to 50% 50%, if you prefer. That makes sense for this one, but for the other macros where the position isn’t 0,0 you have to use a hardcoded number anyway.
The first problem was that — in some cases — the moving wasn’t working every time. Reversing the size and the move took care of that.
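With the size set before the move, the right-half and top-half shortcuts might look something like the sketch below. The function names are invented here, and the screen size is simply assumed to be the 3840×2160 mentioned earlier.

```shell
# Assumed screen size in pixels; change to suit your monitor.
SCREENX=3840
SCREENY=2160

# Right half of the screen (the Control+Alt+6 idea): size first, then move.
snap_right_half() {
    xdotool getwindowfocus windowsize $(( SCREENX / 2 )) "$SCREENY" \
        windowmove $(( SCREENX / 2 )) 0
}

# Top half of the screen (the Control+Alt+8 idea).
snap_top_half() {
    xdotool getwindowfocus windowsize "$SCREENX" $(( SCREENY / 2 )) \
        windowmove 0 0
}
```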
However, there was still a problem. If you maximize a window, the window size and position values don’t do anything. Here’s where wmctrl can help:
wmctrl -r :ACTIVE: -b remove,maximized_horz,maximized_vert ; xdotool getwindowfocus windowsize 1920 1080 windowmove 0 0
This removes the maximized property from the active window and then applies the xdotool command. However, if you are going to use wmctrl, you might as well say:
wmctrl -r :ACTIVE: -b remove,maximized_horz,maximized_vert -e 0,0,0,1920,1080
The -e option moves the window. The first zero isn’t a typo. It sets the “gravity” of the window and is usually zero. The next four numbers are the corner coordinate and the size. However, you will notice that while xdotool moves the top left corner of the window, wmctrl moves the top left corner of the interior of the window (that is, not including the window decorations). So the result is slightly different.
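If that difference bothers you, one possible workaround is to ask the window manager how thick the decorations are and compensate. This is only a sketch under assumptions: it relies on the window manager setting the _NET_FRAME_EXTENTS property (values in the order left, right, top, bottom), it assumes wmctrl positions the interior as described above, and both function names are made up.

```shell
# parse_frame_extents reads the output of
#   xprop -id WINDOW_ID _NET_FRAME_EXTENTS
# on stdin and prints "left right top bottom" in pixels.
# It prints nothing if the property is not set.
parse_frame_extents() {
    sed -n 's/.*= *\([0-9]*\), *\([0-9]*\), *\([0-9]*\), *\([0-9]*\).*/\1 \2 \3 \4/p'
}

# wmctrl_geometry X Y W H [LEFT RIGHT TOP BOTTOM]
# Prints a -e argument (gravity 0) that shifts and shrinks the interior
# so the decorated window, frame included, covers X,Y,W,H.
# Missing extents default to 0.
wmctrl_geometry() {
    local x=$1 y=$2 w=$3 h=$4
    local left=${5:-0} right=${6:-0} top=${7:-0} bottom=${8:-0}
    echo "0,$(( x + left )),$(( y + top )),$(( w - left - right )),$(( h - top - bottom ))"
}

# Usage sketch (needs a running X session):
#   set -- $(xprop -id "$(xdotool getactivewindow)" _NET_FRAME_EXTENTS | parse_frame_extents)
#   wmctrl -r :ACTIVE: -e "$(wmctrl_geometry 0 0 1920 1080 "$@")"
```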
Of course, you could write a simple bash script to manage all this and work out the math so if your screen size changes you don’t have to change each macro. You could even adjust to make the windows not overlap or do other special effects. For example:
#!/bin/bash
# Change to suit or read them from xrandr
#SCREENX=3840
#SCREENY=2160
# If you don't have xrandr, awk, or yours doesn't put the right
# format out, just hardcode up top
SCREENX=`xrandr -q | awk -F'[ ,]+' '/current/ { print $8 }'`
SCREENY=`xrandr -q | awk -F'[ ,]+' '/current/ { print $10 }'`
# Could adjust the actual locations here if you wanted
HALFX=$(( SCREENX/2 ))
HALFY=$(( SCREENY/2 ))
if [ $# -ne 1 ]
then
   ARG="?"
else
   ARG="$1"
fi
case "$ARG" in
   nw)
      TOP=0 LEFT=0 W=$HALFX H=$HALFY
      ;;
   n)
      TOP=0 LEFT=0 W=$SCREENX H=$HALFY
      ;;
   ne)
      TOP=0 LEFT=$HALFX W=$HALFX H=$HALFY
      ;;
   w)
      TOP=0 LEFT=0 W=$HALFX H=$SCREENY
      ;;
   center)
      TOP=$(( $SCREENY/4 )) LEFT=$(( $SCREENX/4 )) W=$HALFX H=$HALFY
      ;;
   e)
      TOP=0 LEFT=$HALFX W=$HALFX H=$SCREENY
      ;;
   sw)
      TOP=$HALFY LEFT=0 W=$HALFX H=$HALFY
      ;;
   s)
      TOP=$HALFY LEFT=0 W=$SCREENX H=$HALFY
      ;;
   se)
      TOP=$HALFY LEFT=$HALFX W=$HALFX H=$HALFY
      ;;
   *)
      echo "Usage: winpos (nw, n, ne, w, center, e, sw, s, se)"
      exit 1
      ;;
esac
# do it
# wmctrl -r :ACTIVE: -b remove,maximized_horz,maximized_vert -e 0,$LEFT,$TOP,$W,$H
# or here's another way (note, this will show title bars without adjustment
# the above method will cut them off at the top part of screen
wmctrl -r :ACTIVE: -b remove,maximized_horz,maximized_vert
xdotool getwindowfocus windowsize $W $H windowmove $LEFT $TOP
exit 0
With this script, your keyboard macros can just call the script with a tag like “ne” (Northeast) or “center” to control the window position. Any changes are easy to manage in the script instead of spread over multiple macros.
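The script leans on the position of fields in the xrandr output to find the screen size. A variation that keys on the word "current" instead is sketched below. It assumes the usual "Screen 0: minimum 320 x 200, current 3840 x 2160, maximum ..." line, and parse_screen_size is a made-up name.

```shell
# parse_screen_size reads `xrandr -q` output on stdin and prints
# "WIDTH HEIGHT" taken from the "current W x H" part of the Screen line.
parse_screen_size() {
    sed -n 's/.*current \([0-9][0-9]*\) x \([0-9][0-9]*\).*/\1 \2/p' | head -n 1
}

# Usage sketch (needs a running X session):
#   set -- $(xrandr -q | parse_screen_size)
#   SCREENX=$1
#   SCREENY=$2
```

This runs xrandr once rather than twice and prints nothing if the line is missing, so a script can fall back to the hardcoded values when the result is empty.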
Summary
There’s a line in one of the Star Trek movies (the real ones, with William Shatner) where Kirk tells someone that you have to learn how things work on a starship. Linux is much the same. There’s not much you can’t do if you can only figure out how and wade through the myriad tools that might be what you want. Sometimes it takes a combination of tools and dealing with the infinite variety of configurations is tough, but you can usually make things happen if you try.
I’m sure Wayland will have all these features right? I mean the developers have made it so clear that they intend to keep all that cool functionality that brings some of us to choose Linux/Unix in the first place and not just turn it into yet another boring Windows clone. They are only eliminating stuff that nobody at all is using because… well.. I guess games will run better after that. Right?!?!
There is no Wayland. Well, there is, but it is not a program to run. It is a protocol and a library which implements it. Most development/design efforts went into securing Wayland. The result is the key difference between X and Wayland: in the case of the latter, the display server and window manager are integrated in a single process.
When everybody and their brother is writing their own competing Wayland compositors I will concede that point. I probably won’t care either because I’ll probably be running Windows or maybe even OSX as everything better about Linux/Unix will finally be gone.. Until then Wayland = Weston.
If you actually understood open source you’d realise why there is a world where both X and Wayland are important. You see features, I see gaping security nightmares. The only thing worse than Wayland, is having ONLY X. The real benefit is to move away from the monoculture that has developed around a 25 year old patchwork of ugly code.
And so we got firejail (or similar jail method) and Xpra. Bonus: firejail protects at the filesystem and memory levels too. No compromised app can drop a nasty thing in your shell rc files. Or read your ssh keys. While Xpra can take care of clipboard attacks.
And proof that there is nothing in the X11 protocol that forces the system to give total access. We could get a secure X11 like we already are getting in terms of dropping root caps if there was the will to find and solve.
But Linux has been in NIH mode for the past 10 years, kicking out features instead of finishing and fixing what already exists. CADT as jwz (of Netscape fame) said.
+1 firejail
“The only thing worse than Wayland, is having ONLY X.”
Great! Someone advocating for more fragmentation. As someone who actually uses Linux as a desktop I may have given up the hope that everyone else will too some day but I have not given up hope that at least EVERY application I want or need to run will one day run on Linux. I just can’t wait for the day when the application I want to run only runs on Wayland but I still prefer X because Wayland is a jail and X gives me features and choices.
Close. “You have to learn why things work on a starship.” is the actual quote.
Autohotkey can do some similar things in a Windows environment.
devilspie is something one could use for controlling windows.
you can do the same thing in windows without any additional software using the .Net Wscript.Shell
You can even do it through powershell by setting up a com-object.
https://github.com/nathanaelries/Powershell-Caffeine/blob/master/Caffeine.ps1
Yep, that’s why the top 500 supercomputers in the world run Linux and NONE of them run anything Microsoft and have not done for many years now. I did think about posting all the things that you can do with Linux that you cannot do with Windows but I fear it would hit the post limit for a comment. Windows is closed source where as Linux is open source and is extensible so guess what, people are extending what it can do without having to wait for Microsoft to release an upgrade or for a 3rd party to sell you a piece of software to make Windows do something new.
And those who would find the comments above couldn’t care less about what other things linux can do which windows can’t.
The point is that those who read the article and thought “neat! Now how could I do that in my windows env?” will find it quite useful. In contrast to another linux vs windows debate unrelated to the article. No one cares… stop it.
X11 was great for its time but now it should be considered an insecure windowing system that uses hacked in hardware acceleration. Wayland is better but not great, so there currently isn’t a really good windowing system on Linux. :(
I’m curious: what do you mean by “Wayland is better but not great”? What do you think it should do better?
Everything is the answer to that. X may sound great from a feature perspective, but the incredible feature set is built from 25 years of hacks. X under the hood is ugly. It is insecure. Its hacked together design allows all sorts of security issues, e.g. one application taking over the window of another. That may not sound serious until you try to e.g. lock the screen, or prevent a phishing window from pretending to be your login screen.
Wayland is an alternative that was designed from the ground up as a modern system. Is it better feature wise? No. Is it better as a fundamentally simple window manager suited to a fast secure desktop environment? Hell yeah.
Secure, sure. Less code running with elevated privileges? Absolutely. Desktop environment? I guess, but I don’t like GNOME.
But fast? Don’t make me laugh. Ever since people started porting things to GTK3 all I started noticing was an extra second of black screen after the window was mapped and before the program started putting data in its own buffer, on top of a general argument that I am Using My Computer Wrong.
Performance in terms of what I see is definitely worse than using Xfree86 with its now-removed XAA.
Remote display.
Yes, supposedly it does that. But it doesn’t. The claim that it does is only because some coder managed to insert it as a preliminary hack that has never been properly documented and nobody who isn’t fully versed in low-level display server programming has any hope of reproducing.
Then there’s the Wayland supporters that say remote support should be re-implemented by every application. The genius behind X was that every application can be remote-displayed regardless of if the author intended it or not. Nobody gets to tell the user what they can and cannot do. Also the procedure to remote display one application under X is exactly the same as the procedure for another one.
Using Wayland on a desktop is like putting a cell-phone carrier in charge of administering it. Somebody else is telling you what you can and cannot do with your own device. It is a movement in the wrong direction.
There are so called tiling window managers. Awesome is my favorite.
Yes, there are! I’m a StumpWM man myself.
I love xdotool! Every semester that I’m forced to license an ebook, I use xdotool in a bash script to turn the page while my packet sniffer captures the page images as they load. About an hour for 500 pages and some carefully tweaked compression gets me a somewhat reasonably sized PDF. I was thinking of making a graphical front end so I don’t have to tweak the script every time.
If you’ve got a pile of images, it might make more sense to store it in a pile-of-images format like cbz instead of wrapping it in a pdf. Just a thought.
Now that’s a nice hack since pay view online ebooks are a massive scam.
Well… today in GMT timezone… we’ve been seeing hacks all day.
If you want to take “Hackaday” literally, then the staff around here could register:
HackadaySpecificallyForTheNeedsOfThatSBRKtroll.com
With exactly one hack within exactly 24 hours of each blog post.
These tutorials are here to provoke thoughts and ideas in order to encourage mods, bodges, hacks, makeshifts, जुगाड़, etc…
It seems to work occasionally by the looks of things.
More of these tagged as tutorial is very welcome… Especially if someone is trying to configure an embedded system and requires tips from the likes of these to complete their project… Could be for another take on those “Energy monitoring boards” from some major supermarkets.
Think of the tutorial as more of an investment to educate people. This article may have just inspired tomorrow’s hack.
I use FVWM… with that I am able to set up a menu that can tile windows in the configurations I need:
With that, I hit the logo key (Meta, Command, “Windows”… whatever you call it). If the cursor is in a window, I have the ability to re-size it to occupy the full screen, ½ screen or ¼ screen, and the ability to switch to a different window, switch desktops or pages, move windows between desktops/pages, close windows or launch new windows.
Tweaking it a little, it is possible to make that same environment work on a touchscreen. FVWM might be ancient, but so far I’ve found it a very flexible window manager, and it’s an environment I keep coming back to having tried Gnome, KDE, XFCE, CDE, WindowMaker, CTWM, AfterStep and numerous other environments.
You should try a tiling window manager like dwm, i3, awesome, etc.
Yep, tried awesome… and it wasn’t.
Openbox can do the same thing, in basically the same way. Works well.
Go figure, There are at least three of us, then :-)
Seriously. After TWM it was FVWM for me. (the old one). Then Gnome. And since the tale of boiling frogs is obviously false, I, at some point, ran screaming from it when the water became too hot. After some experimenting (Awesome, Xfce) it was back to… FVWM.
Never looked back.
Choice. That is why Linux is better than Windows. And being free helps too.
Wrath of Khan, still the best Star Trek movie… Kirk speaking to Saavik as they are about to transmit a command to lower the Reliant’s shields. Talk about your gaping security holes.
Security hole? As it was their own ship, sounds like the cloud authenticated single sign-on works as intended and much better in the future than ours does now :P
ho hum. Nothing here. Back to hacking code.
Don’t ruin HackADay.
Attempts to ruin the ‘net continue.
Let SOMEthing, be and shout from
the rooftop (antennii,) FREE!
Let something stick in the craw of capitalism
and bespeak of the modern Colt 45,
PeaceMaker. Failing that, Equalizer.
I am old from SillyComV, but new here.
You have reinspired the last 50-60 years
of my subtle hacking.
Carry-On, as Frank Lloyd Wright, when
he gifted us with the cinderblock.
Your chatterbot needs some work, man.
At least it wasn’t Microsoft’s Tay AI Bot…
Otherwise, for being human and not metal, we’d already have been off to the Skynet “Human Holiday camps” to be given Skynet certified “Showers” before we had realized too late what happened!
“I recently switched out my three-monitor setup for a very large 4K monitor.”
Why not a 3-monitor setup with 3 very large 4K monitors? (c:
And here was I, worrying about Al’s horrible indentation.
I really tried to report this typo privately, but couldn’t figure out how.
“In a perfect world your target program will use D-Bus but that is now always the case. ”
…should be
“In a perfect world your target program will use D-Bus but that is not always the case. ”
perfect worlds have perfect grammar ;-).
… But a great article regardless. Thanks.
If you want people to use your script, don’t forget to add a free license, otherwise your script is (c) you.
Unless you’re presenting something in tutorial form, then it is essentially public domain….
That is unless your local laws enforce a default of copyright on everything including sub-contents of a tutorial, thus essentially rendering said tutorial completely useless as learning from said tutorial before putting it into practice would infringe on copyright in said barbaric highly restricting country.