Selenium anatomy
INFO. Selenium version 3.141.59 is used for the class diagrams.
INFO. ChromeDriver is used in all explanations and examples.
Below is a simple Selenium test which opens a web page and performs a login operation. Once ChromeDriver is installed on the local machine, the test will succeed (installation steps are outside the scope of this article).
The goal of the article is to consider each component of the test separately and to understand how the components communicate with each other.
Let's start with the general picture:
WebDriver is an HTTP-based protocol whose specification can be found here.
ChromeDriver is an implementation of the WebDriver protocol.
ChromeDriver communicates with the browser through the DevTools remote debugging interface, which is a WebSocket interface described here.
The Chrome DevTools Protocol allows tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers.
Selenium clients communicate with ChromeDriver by sending HTTP requests.
In this section I want to concentrate on the implementation details of the Selenium libraries (Java): consider the class hierarchies and the relations between the most important parts and entities, and get a sense of which classes reside in which jars.
The usual sequence of steps for a UI test is as follows:
Start a driver (e.g. ChromeDriver) from code
Establish a new session with ChromeDriver
Send WebDriver commands within the session
Close the session with ChromeDriver
Below is an overview of the class relations between the important parts of the Selenium library:
Where:
org.openqa.selenium.remote.service.DriverService "manages the life and death of a native executable driver server";
org.openqa.selenium.remote.HttpCommandExecutor is responsible for performing HTTP calls to WebDriver;
org.openqa.selenium.remote.RemoteWebDriver provides the API to use in tests.
It is useful to look at the class hierarchies of the high-level classes mentioned above in more detail:
Capabilities are key-value properties which describe the features a user requests for the session. More details can be found here.
Drivers (Chrome, Firefox) differ from each other in their implementations of:
DriverCommandExecutor
DriverService
Capabilities
As mentioned above, DriverService "manages the life and death of a native executable driver server."
List of ChromeDriver commands: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/client/command_executor.py
HttpCommandExecutor, as an implementation of CommandExecutor, takes care of all HTTP requests to the WebDriver.
Below is a sequence diagram which shows the communication between the classes involved in creating a new session. This flow happens when the following code is executed in the test:
WebDriver driver = new ChromeDriver();
Below is a sequence diagram which illustrates the general flow for calling WebDriver commands; all of the code below follows this pattern.
In particular, driver.get("https://www.saucedemo.com"); is shown on the sequence diagram (where "https://www.saucedemo.com" is represented simply as "URL").
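To make the "command name in, HTTP request out" idea concrete, here is a minimal sketch of such a mapping. It is a simplified model and not Selenium's actual code: the class name CommandMappingSketch and its helper are hypothetical, the command names only loosely mirror the ones used by the Selenium Java client, and the paths are the ones defined by the WebDriver specification.

```java
import java.util.Map;

/** Simplified, hypothetical model of mapping a named command to an HTTP call. */
final class CommandMappingSketch {

    /** HTTP method plus a path template with ":name" placeholders. */
    static final class Endpoint {
        final String method;
        final String pathTemplate;

        Endpoint(String method, String pathTemplate) {
            this.method = method;
            this.pathTemplate = pathTemplate;
        }
    }

    // Command name -> endpoint. Only a handful of commands are listed.
    static final Map<String, Endpoint> ENDPOINTS = Map.of(
            "newSession", new Endpoint("POST", "/session"),
            "get", new Endpoint("POST", "/session/:sessionId/url"),
            "findElement", new Endpoint("POST", "/session/:sessionId/element"),
            "sendKeysToElement",
                    new Endpoint("POST", "/session/:sessionId/element/:elementId/value"),
            "quit", new Endpoint("DELETE", "/session/:sessionId"));

    /** Fills the placeholders and returns "METHOD /resolved/path". */
    static String resolve(String command, Map<String, String> pathParams) {
        Endpoint endpoint = ENDPOINTS.get(command);
        if (endpoint == null) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        String path = endpoint.pathTemplate;
        for (Map.Entry<String, String> param : pathParams.entrySet()) {
            path = path.replace(":" + param.getKey(), param.getValue());
        }
        return endpoint.method + " " + path;
    }

    public static void main(String[] args) {
        // prints "POST /session/aca0212be4a400ee65935221a6ea5e3f/url"
        System.out.println(resolve("get",
                Map.of("sessionId", "aca0212be4a400ee65935221a6ea5e3f")));
    }
}
```

The command arguments (for example the URL to open) travel in the JSON body of the request, while identifiers such as the session id become part of the path.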
So, once again, ChromeDriver is an implementation of the WebDriver protocol, and the WebDriver protocol specification can be found here. Some code pointers for the ChromeDriver implementation:
The ChromeDriver sources are in the Chromium tree and can be checked out by following these instructions.
Most of the code is under the src/chrome/test/chromedriver directory.
ChromeDriver is basically an HTTP server that responds to HTTP requests. The main function is in chromedriver_server.cc.
To follow the code for each WebDriver command, start at http_handler.cc, which contains a mapping from each WebDriver command to the C++ function that implements it.
For example, let's track how opening a URL happens.
The mapping between the WebDriver HTTP path and the subsequent DevTools protocol call can be found below (and via the link).
The WebDriver HTTP path for this command is taken from the protocol specification (link).
The DevTools command is taken from the protocol specification (link).
ChromeDriver will log the following as a result of navigating to the URL:
As a small demo, let's work with ChromeDriver without the Selenium libraries, performing HTTP calls against the driver directly. The demo will emulate the following Java code:
First, start the chromedriver binary manually (in Selenium this step is triggered by org.openqa.selenium.remote.service.DriverCommandExecutor, which starts the DriverService when a new session is requested).
It is necessary to specify the --whitelisted-ips= parameter due to recent changes (link).
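For readers who prefer to stay in Java, the manual start can be sketched with ProcessBuilder. This is only an illustration under a few assumptions: a chromedriver executable is on the PATH, port 9515 (the port used in the URLs of this demo) is free, and the class name StartChromeDriverSketch is hypothetical.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch: launch the chromedriver binary as a child process. */
final class StartChromeDriverSketch {

    /** Builds the command line; "chromedriver" is assumed to be on the PATH. */
    static List<String> commandLine(int port) {
        return List.of("chromedriver", "--port=" + port, "--whitelisted-ips=");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process driver = new ProcessBuilder(commandLine(9515))
                .inheritIO() // show the driver's own log output in this console
                .start();
        // Stop the driver when this JVM exits, so no orphan process is left behind.
        Runtime.getRuntime().addShutdownHook(new Thread(driver::destroy));
        driver.waitFor(); // keep running until the driver process ends
    }
}
```

This is roughly the job DriverService does inside Selenium: it owns the native process and stops it when it is no longer needed.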
The next step is to create a session (in Selenium this step is done by org.openqa.selenium.remote.ProtocolHandshake#createSession()).
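As an illustration, the session can be created from plain Java (11 or newer) with java.net.http.HttpClient. The sketch assumes ChromeDriver is listening on http://localhost:9515. The payload carries both the legacy "desiredCapabilities" shape and the W3C "capabilities" shape so that older and newer drivers can each pick the one they understand. The class name NewSessionSketch and the regex-based parsing are assumptions made to keep the example dependency-free.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: create a WebDriver session with a raw HTTP call. */
final class NewSessionSketch {

    // Legacy and W3C capability shapes side by side (an assumption for compatibility).
    static final String PAYLOAD =
            "{\"desiredCapabilities\":{\"browserName\":\"chrome\"},"
            + "\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"chrome\"}}}";

    static HttpRequest newSessionRequest(String driverUrl) {
        return HttpRequest.newBuilder(URI.create(driverUrl + "/session"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(PAYLOAD))
                .build();
    }

    /** Pulls the first "sessionId" value out of the response with a regex. */
    static String extractSessionId(String json) {
        Matcher m = Pattern.compile("\"sessionId\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("No sessionId in response: " + json);
        }
        return m.group(1);
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                newSessionRequest("http://localhost:9515"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        System.out.println("session id: " + extractSessionId(response.body()));
    }
}
```

A real client would parse the response with a JSON library; the regex is enough here because only the session id is needed for the following calls.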
The response is:
The most important part is the session id aca0212be4a400ee65935221a6ea5e3f (which is, of course, random).
The next step is to open the https://www.saucedemo.com URL within the session.
The WebDriver URL pattern for this command is http://localhost:9515/session/:sessionId/url (link).
In our case, with the session id obtained above, the command looks like this:
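The same call can be sketched in Java. The example assumes the driver runs on http://localhost:9515 and that the session id from the previous step is passed as the first program argument; the class name NavigateSketch is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: ask the driver to open a URL within an existing session. */
final class NavigateSketch {

    static HttpRequest navigateRequest(String driverUrl, String sessionId, String pageUrl) {
        // No JSON escaping is done: fine for a plain URL, not for arbitrary input.
        String body = "{\"url\":\"" + pageUrl + "\"}";
        return HttpRequest.newBuilder(URI.create(driverUrl + "/session/" + sessionId + "/url"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String sessionId = args[0]; // taken from the "create session" response
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                navigateRequest("http://localhost:9515", sessionId, "https://www.saucedemo.com"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```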
Next, let's find the element with the id user-name and enter the value standard_user into this field.
The WebDriver URL pattern for finding an element is http://localhost:9515/session/:sessionId/element (link).
Within our session the HTTP call will be
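A Java sketch of this lookup follows. It makes two assumptions worth stating. First, it uses the "css selector" strategy with "#user-name", because the W3C specification does not define an "id" strategy. Second, it accepts the element id under either the legacy "ELEMENT" key or the W3C key "element-6066-11e4-a52e-4f735466cecf", since the key depends on the driver's protocol mode. The class name FindElementSketch is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: locate an element and read its id from the response. */
final class FindElementSketch {

    static final String PAYLOAD = "{\"using\":\"css selector\",\"value\":\"#user-name\"}";

    // Legacy drivers answer with "ELEMENT", W3C drivers with the long constant key.
    static final Pattern ELEMENT_ID = Pattern.compile(
            "\"(?:ELEMENT|element-6066-11e4-a52e-4f735466cecf)\"\\s*:\\s*\"([^\"]+)\"");

    static String extractElementId(String json) {
        Matcher m = ELEMENT_ID.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("No element id in response: " + json);
        }
        return m.group(1);
    }

    public static void main(String[] args) throws Exception {
        String sessionId = args[0]; // taken from the "create session" response
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:9515/session/" + sessionId + "/element"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(PAYLOAD))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("element id: " + extractElementId(response.body()));
    }
}
```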
The important part of the result is the element id 0.44000226938784204-1.
Now that we have the element id, we can send a value to the element.
The URL pattern is http://localhost:9515/session/:sessionId/element/:elementId/value (link).
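The call that types into the field can be sketched in Java as well. As an assumption for compatibility, the payload contains both the W3C "text" field and the legacy "value" array, because which one the driver reads depends on its protocol mode. The session id and element id are expected as program arguments, and the class name SendKeysSketch is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: send text to an element found earlier. */
final class SendKeysSketch {

    /** "text" is the W3C field, "value" the legacy one; no JSON escaping is done. */
    static String payload(String text) {
        return "{\"text\":\"" + text + "\",\"value\":[\"" + text + "\"]}";
    }

    static URI endpoint(String driverUrl, String sessionId, String elementId) {
        return URI.create(driverUrl + "/session/" + sessionId
                + "/element/" + elementId + "/value");
    }

    public static void main(String[] args) throws Exception {
        String sessionId = args[0]; // from the "create session" response
        String elementId = args[1]; // from the "find element" response
        HttpRequest request = HttpRequest.newBuilder(
                        endpoint("http://localhost:9515", sessionId, elementId))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(payload("standard_user")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```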
As a result of the demo, the value standard_user was entered into the login field. We have just walked through the equivalent of the following Java steps:
The remaining component in the Selenium communication chain is the Chrome DevTools Protocol, which the driver uses to communicate with a particular browser (in our case ChromeDriver communicates with the Chrome browser). As a demo, let's repeat the steps from the previous demos, which can be expressed as Java code,
but this time we will communicate with the browser directly (we will emulate the driver ourselves).
Start Chrome on macOS with the remote debugging port open:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')
As a result, a browser session is started with a port open for debugging.
Now we need a client which will send commands over the DevTools protocol.
In the Selenium case this client is ChromeDriver.
For our experiment, a client with a front-end is more convenient,
so open another Chrome window and go to http://localhost:9222.
This means that we have connected to the remote debugging port of the other browser.
The client shows a list of inspectable pages.
After an inspectable page is chosen, the client fetches the HTML, JavaScript and CSS files of the DevTools front-end over HTTP.
Once loaded, Developer Tools establishes a WebSocket connection to its host and starts exchanging JSON messages with it.
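The list of inspectable pages is also available as JSON over HTTP, which is what a programmatic client would use. Here is a minimal Java sketch, assuming Chrome was started with --remote-debugging-port=9222 as above; the class name ListTargetsSketch and the regex-based extraction are assumptions made to avoid a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: list inspectable targets of a remotely debugged Chrome. */
final class ListTargetsSketch {

    /** Collects every "webSocketDebuggerUrl" value found in the JSON text. */
    static List<String> webSocketUrls(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile("\"webSocketDebuggerUrl\"\\s*:\\s*\"([^\"]+)\"")
                .matcher(json);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // The /json endpoint describes each target, including its WebSocket URL.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:9222/json"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        webSocketUrls(response.body()).forEach(System.out::println);
    }
}
```

Each printed URL is the WebSocket endpoint of one target; connecting to it is what the DevTools front-end does after a page is chosen.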
(Optional) In order to monitor communication over the DevTools protocol:
Enable DevTools experiments (link).
Click the ⋮ menu icon in the top-right corner of DevTools and select Settings.
Select Experiments on the left of the settings.
Turn on "Protocol Monitor", then close and reopen DevTools.
Now click the ⋮ menu icon again, choose More Tools and then select Protocol monitor.
It is also possible to issue commands:
Open DevTools on DevTools (how-to).
In our case this means opening DevTools for the browser window which is connected to the remotely debugged Chrome.
Then, within the inner DevTools window, call different commands in the console, e.g.
So let's call the commands needed for our demo. After we have connected to an inspectable page via http://localhost:9222, we can call a command to open a URL:
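Outside of the DevTools console, the same navigation can be sketched as a raw WebSocket client in Java (11 or newer). The example sends the DevTools command Page.navigate as a JSON message. It assumes the page's WebSocket debugger URL is passed as the first program argument; the class name PageNavigateSketch and the message id 1 are arbitrary choices for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: send Page.navigate to Chrome over the DevTools WebSocket. */
final class PageNavigateSketch {

    /** Builds the JSON command; no escaping is done, so pass a plain URL. */
    static String navigateMessage(int id, String url) {
        return "{\"id\":" + id + ",\"method\":\"Page.navigate\","
                + "\"params\":{\"url\":\"" + url + "\"}}";
    }

    public static void main(String[] args) throws Exception {
        URI pageSocket = URI.create(args[0]); // the page's webSocketDebuggerUrl
        CountDownLatch replied = new CountDownLatch(1);

        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                // Long messages may arrive in several fragments; each one is printed.
                System.out.println("<- " + data);
                replied.countDown();
                ws.request(1); // ask for the next message
                return null;
            }
        };

        WebSocket socket = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(pageSocket, listener)
                .join();
        socket.sendText(navigateMessage(1, "https://www.saucedemo.com"), true).join();
        replied.await(10, TimeUnit.SECONDS); // wait briefly for the first reply
        socket.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}
```

This is, in miniature, the role ChromeDriver plays: it translates a WebDriver HTTP request into one or more DevTools messages such as this one.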
A couple of examples of other commands:
In this article we went through each component that makes up a Selenium-based UI test environment. We looked closely at the Java implementation details of the Selenium library and experimented directly with ChromeDriver and with Chrome's DevTools remote debugging interface. I hope that after reading this article you have a solid understanding of this ecosystem and can draw well-founded conclusions while writing and debugging Selenium tests.