Selenium anatomy

Simple Selenium test example

INFO. Selenium version 3.141.59 is used for class diagrams.

INFO. ChromeDriver is used in all explanations and examples.

Below is a simple Selenium test that opens a web page and performs a login. The test succeeds once ChromeDriver is installed on the local machine (installation steps are outside the scope of this article).

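import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();

driver.get("https://www.saucedemo.com");
driver.findElement(By.id("user-name")).sendKeys("standard_user");
driver.findElement(By.id("password")).sendKeys("secret_sauce");
driver.findElement(By.cssSelector(".btn_action")).submit();

// quit() ends the session and stops ChromeDriver; close() would only close the current window
driver.quit();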

The goal of this article is to look at each component involved in the test separately and to understand how the components communicate with each other.

General overview of components involved in the test

Let's start with a general picture.

general picture
  • WebDriver is an HTTP-based protocol whose specification can be found here

  • ChromeDriver is an implementation of WebDriver protocol

  • ChromeDriver communicates with the browser through the DevTools remote debugging interface, which is a WebSocket interface described here.

    • The Chrome DevTools Protocol allows for tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers.

  • Selenium clients communicate with ChromeDriver by sending HTTP requests

Selenium client detailed notes

In this section I want to concentrate on the implementation details of the Selenium libraries (Java): the class hierarchies, the relations between the most important parts and entities, and which classes reside in which jars.

The usual sequence of steps for a UI test is the following:

  1. Start a driver, e.g. ChromeDriver, from code

  2. Establish a new session with ChromeDriver

  3. Send WebDriver commands within the session

  4. Close the session with ChromeDriver

Below is an overview of the class relations between the important parts of the Selenium library:

class overview

Where:

  • org.openqa.selenium.remote.service.DriverService "Manages the life and death of a native executable driver server."

  • org.openqa.selenium.remote.HttpCommandExecutor is responsible for performing HTTP calls to WebDriver

  • org.openqa.selenium.remote.RemoteWebDriver provides the API to use in tests

It is useful to look at the class hierarchies of the high-level classes mentioned above in detail:

Desired capabilities

capabilities

Capabilities are key-value properties that describe which features a user requests for the session. More details can be found here
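To make the key-value nature concrete, here is a minimal sketch in plain Java that models requested capabilities as a map and renders them as a JSON object. It does not use Selenium classes: the class name CapabilitiesSketch and the helper toJson are hypothetical, and Selenium's own Capabilities implementations are considerably richer. The keys are the ones used in the session request later in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration: capabilities modelled as a plain key-value map.
final class CapabilitiesSketch {

    // Builds a capability set with keys taken from the curl demo in this article.
    static Map<String, Object> chromeCapabilities() {
        Map<String, Object> caps = new LinkedHashMap<>(); // keeps insertion order
        caps.put("browserName", "chrome");
        caps.put("platform", "ANY");
        caps.put("nativeEvents", false);
        return caps;
    }

    // Renders a flat map of strings and booleans as a JSON object.
    // Assumes keys and values need no escaping; a real client would use a JSON library.
    static String toJson(Map<String, Object> caps) {
        return caps.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Strings are quoted, everything else (here: booleans) is printed as-is.
    private static String render(Object value) {
        return (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(chromeCapabilities()));
    }
}
```

A map is enough for the sketch because the driver only sees the serialized key-value pairs; typed wrappers on the client side are a convenience on top of that.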

Driver classes

drivers

Drivers (Chrome, Firefox) differ from each other in their implementations of

  • DriverCommandExecutor

  • DriverService

  • Capabilities

DriverService classes

DriverService classes

As mentioned above, DriverService "manages the life and death of a native executable driver server."

CommandExecutor classes

CommandExecutor classes

List of ChromeDriver commands: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/client/command_executor.py

HttpCommandExecutor, as an implementation of CommandExecutor, takes care of all HTTP requests to the WebDriver.
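The idea behind turning a command into an HTTP request can be sketched as a lookup table from a command name to an HTTP method and a path template. The sketch below is an illustration only, not Selenium's actual code: the class CommandTableSketch, its Endpoint type and the command names are assumptions made for the example, while the methods and paths follow the W3C WebDriver specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a command name is looked up and resolved to "METHOD /path".
final class CommandTableSketch {

    // HTTP method plus path template for one WebDriver command.
    static final class Endpoint {
        final String method;
        final String pathTemplate;

        Endpoint(String method, String pathTemplate) {
            this.method = method;
            this.pathTemplate = pathTemplate;
        }
    }

    // Command names are illustrative; paths are the W3C WebDriver endpoints.
    static final Map<String, Endpoint> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("newSession", new Endpoint("POST", "/session"));
        COMMANDS.put("get", new Endpoint("POST", "/session/:sessionId/url"));
        COMMANDS.put("findElement", new Endpoint("POST", "/session/:sessionId/element"));
        COMMANDS.put("sendKeysToElement",
                new Endpoint("POST", "/session/:sessionId/element/:elementId/value"));
        COMMANDS.put("quit", new Endpoint("DELETE", "/session/:sessionId"));
    }

    // Replaces ":name" placeholders in the template with concrete values.
    static String resolve(String command, Map<String, String> params) {
        Endpoint endpoint = COMMANDS.get(command);
        if (endpoint == null) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        String path = endpoint.pathTemplate;
        for (Map.Entry<String, String> p : params.entrySet()) {
            path = path.replace(":" + p.getKey(), p.getValue());
        }
        return endpoint.method + " " + path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("get", Map.of("sessionId", "abc")));
    }
}
```

Keeping the table separate from the code that actually sends the request is what lets different drivers share one executor while differing only in a few extra commands.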

CommandCodec classes

CommandCodec classes

Sequence diagram of creating a new session

Below is a sequence diagram that shows the communication between the classes involved in creating a new session. This flow happens when the following code is executed in the test:

WebDriver driver = new ChromeDriver();

Sequence diagram of calling a regular WebDriver method

Below is a sequence diagram that illustrates the general flow for calling WebDriver commands. All of the code below follows this pattern.

driver.get("https://www.saucedemo.com");
driver.findElement(By.id("user-name")).sendKeys("standard_user");
driver.findElement(By.id("password")).sendKeys("secret_sauce");
driver.findElement(By.cssSelector(".btn_action")).submit();

In particular, driver.get("https://www.saucedemo.com"); is shown on the sequence diagram (where "https://www.saucedemo.com" is represented simply as "URL").

ChromeDriver detailed notes

So, once again, ChromeDriver is an implementation of the WebDriver protocol, whose specification can be found here. Some code pointers for the ChromeDriver implementation:

  • The ChromeDriver sources are in the Chromium tree, and can be checked out by following these instructions

  • Most of the code is under the src/chrome/test/chromedriver directory

So ChromeDriver is basically an HTTP server that responds to HTTP requests. The main function is in chromedriver_server.cc

  • the default port of the server is hardcoded to 9515 (link), but it can be overridden with the --port parameter

  • To follow the code for a WebDriver command, start at http_handler.cc, which contains a mapping from each WebDriver command to the C++ function that implements it.

    • e.g. let's track how opening a URL happens

      • The mapping between the WebDriver HTTP path and the function that goes on to call the DevTools protocol is shown below (link)

        CommandMapping(
               kPost, "session/:sessionId/url",
               WrapToCommand("Navigate", base::BindRepeating(&ExecuteGet))),
      • The WebDriver HTTP path for this command is taken from the protocol specification (link)

      • The DevTools command is taken from the protocol specification (link)

      • ChromeDriver logs the following as a result of navigating to the URL:

        ...
          [1574177057.990][INFO]: [1d778fc2fcf507ea962ab77b87d347b9] COMMAND Navigate {
           "url": "https://www.saucedemo.com"
          }
        ...
          [1574177057.994][DEBUG]: DevTools WebSocket Command: Page.navigate (id=13) 525924E9AAE045751E097D69F1E6C1E1 {
            "url": "https://www.saucedemo.com"
          }
          [1574177058.227][DEBUG]: DevTools WebSocket Response: Page.navigate (id=13) 525924E9AAE045751E097D69F1E6C1E1 {
            "frameId": "525924E9AAE045751E097D69F1E6C1E1",
            "loaderId": "5978A67DB58AE2738DCD871588846FAE"
          } 
        ...
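The "DevTools WebSocket Command" line in the log corresponds to a JSON text frame sent over the WebSocket. As a rough sketch, a DevTools request carries a client-chosen id, a method name and a params object. The class DevToolsMessageSketch below is hypothetical and only builds that text; any additional fields ChromeDriver may add (such as a session identifier) are left out.

```java
// Hypothetical sketch: builds the JSON text of one DevTools protocol request.
final class DevToolsMessageSketch {

    // Escapes backslashes and double quotes so the value is a valid JSON string.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // A request consists of an id (echoed back in the response),
    // the method name and the method parameters.
    static String navigate(int id, String url) {
        return "{\"id\":" + id
                + ",\"method\":\"Page.navigate\""
                + ",\"params\":{\"url\":" + quote(url) + "}}";
    }

    public static void main(String[] args) {
        // Mirrors the logged command: Page.navigate (id=13)
        System.out.println(navigate(13, "https://www.saucedemo.com"));
    }
}
```

The id is what allows the client to match the asynchronous "DevTools WebSocket Response" line to the command that caused it.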

Demo with ChromeDriver

As a small demo, let's work with ChromeDriver without the Selenium libraries, performing HTTP calls against the driver directly. The demo emulates the following Java code:

WebDriver driver = new ChromeDriver();
driver.get("https://www.saucedemo.com");
driver.findElement(By.id("user-name")).sendKeys("standard_user");
  • First, start the chromedriver binary manually (in Selenium this step is done by org.openqa.selenium.remote.service.DriverCommandExecutor)

    ➜  ~ chromedriver --whitelisted-ips=
    Starting ChromeDriver 78.0.3904.70 (edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800}) on port 9515

    It is necessary to specify the --whitelisted-ips= parameter due to recent changes (link)

  • The next step is to create a session (this step is done in Selenium by org.openqa.selenium.remote.ProtocolHandshake#createSession())

    curl -X POST \
    http://127.0.0.1:9515/session \
    -H 'content-type: application/json' \
    -d '{
      "desiredCapabilities": {
          "caps": {
              "nativeEvents": false,
              "browserName": "chrome",
              "version": "",
              "platform": "ANY"
          }
      }
    }'

The response is:

{"sessionId":"aca0212be4a400ee65935221a6ea5e3f","status":0,"value":{"acceptInsecureCerts":false,"acceptSslCerts":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"browserName":"chrome","chrome":{"chromedriverVersion":"78.0.3904.70 (edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800})","userDataDir":"/var/folders/32/vl36v3_n0_z80shcn1wcdynw0000gp/T/.com.google.Chrome.xVvLRQ"},"cssSelectorsEnabled":true,"databaseEnabled":false,"goog:chromeOptions":{"debuggerAddress":"localhost:64933"},"handlesAlerts":true,"hasTouchScreen":false,"javascriptEnabled":true,"locationContextEnabled":true,"mobileEmulationEnabled":false,"nativeEvents":true,"networkConnectionEnabled":false,"pageLoadStrategy":"normal","platform":"Mac OS X","proxy":{},"rotatable":false,"setWindowRect":true,"strictFileInteractability":false,"takesHeapSnapshot":true,"takesScreenshot":true,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"unexpectedAlertBehaviour":"ignore","version":"78.0.3904.108","webStorageEnabled":true}}

The most important part is the session id aca0212be4a400ee65935221a6ea5e3f (which is random, of course).

  • As the next step we need to open the https://www.saucedemo.com URL within the session

    The pattern of the WebDriver URL for this command is http://localhost:9515/session/:sessionId/url (link)

    In our case, with the concrete session id, the command looks like

    curl -X POST \
    http://localhost:9515/session/aca0212be4a400ee65935221a6ea5e3f/url \
    -H 'content-type: application/json' \
    -d '{"url":"https://www.saucedemo.com"}'
  • Next, let's find the element with id user-name and enter the value standard_user into this field

    The pattern of the WebDriver URL for finding an element is http://localhost:9515/session/:sessionId/element (link)

    Within our session the HTTP call will be

    curl -X POST \
    http://localhost:9515/session/aca0212be4a400ee65935221a6ea5e3f/element \
    -H 'content-type: application/json' \
    -d '{"using":"id","value":"user-name"}'

    The important part of the result is the element id 0.44000226938784204-1

    {
      "sessionId": "aca0212be4a400ee65935221a6ea5e3f",
      "status": 0,
      "value": {
          "ELEMENT": "0.44000226938784204-1"
      }
    }

    Now, having the element id, we can send a value to the element.

    The URL pattern is http://localhost:9515/session/:sessionId/element/:elementId/value (link)

    curl -X POST \
    http://localhost:9515/session/aca0212be4a400ee65935221a6ea5e3f/element/0.44000226938784204-1/value \
    -H 'content-type: application/json' \
    -d '{"value":["standard_user"]}'

    As a result of the demo, the value standard_user was set in the login field. We have just walked through the equivalent of these Java steps:

    WebDriver driver = new ChromeDriver();
    driver.get("https://www.saucedemo.com");
    driver.findElement(By.id("user-name")).sendKeys("standard_user");
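The same calls can be prepared from plain Java with java.net.http instead of curl. The sketch below is a hedged illustration, not how Selenium does it: the class DriverDemoSketch and its helpers are hypothetical, ids are pulled out of the responses with a regular expression for brevity (a real client would use a JSON parser), and the text passed to sendKeys is assumed to need no JSON escaping. It handles both the legacy "ELEMENT" key seen in the response above and the "element-6066-11e4-a52e-4f735466cecf" key used by W3C-mode responses.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse ids from driver responses and build the next request.
final class DriverDemoSketch {

    // Address of the manually started ChromeDriver from the demo.
    static final String BASE = "http://localhost:9515";

    // Key that W3C-mode responses use for an element reference.
    static final String W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";

    // Returns the first string value stored under the given key in a JSON text.
    static String stringField(String json, String key) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No string field: " + key);
        }
        return m.group(1);
    }

    // Reads the element id from either the W3C key or the legacy "ELEMENT" key.
    static String elementId(String json) {
        return json.contains("\"" + W3C_ELEMENT_KEY + "\"")
                ? stringField(json, W3C_ELEMENT_KEY)
                : stringField(json, "ELEMENT");
    }

    // Builds POST /session/:sessionId/element/:elementId/value with the demo payload shape.
    static HttpRequest sendKeys(String sessionId, String elementId, String text) {
        String body = "{\"value\":[\"" + text + "\"]}";
        URI uri = URI.create(BASE + "/session/" + sessionId + "/element/" + elementId + "/value");
        return HttpRequest.newBuilder(uri)
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

To actually send a request built this way, it can be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while chromedriver is running on port 9515.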

Demo with Chrome DevTools protocol

The remaining component in the Selenium communication chain is the Chrome DevTools protocol, which a driver uses to communicate with its browser (in our case ChromeDriver communicates with the Chrome browser). As a demo, let's repeat the steps from the previous demos, which can be expressed in Java as

WebDriver driver = new ChromeDriver();
driver.get("https://www.saucedemo.com");

but this time we will communicate with the browser directly (we will emulate WebDriver ourselves).

  • Start Chrome on macOS with the remote debugging port open

    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')

    • as a result, a browser session is started with a port open for debugging

  • Now we need a client that will send commands over the DevTools protocol.

    • In the Selenium case this client is ChromeDriver

    • For our experiment this can be a client with a front-end, for convenience

      • so open another Chrome window and open the URL http://localhost:9222

      • this means that we have connected to the remote debugging port of the other browser

  • The client gives a list of inspectable pages

  • After an inspectable page is chosen, the client fetches HTML, JavaScript and CSS files over HTTP from that page

  • Once loaded, Developer Tools establishes a Web Socket connection to its host and starts exchanging JSON messages with it.

  • (Optional) In order to monitor communication over the DevTools protocol:

    • enable DevTools experiments (link)

    • click the ⋮ menu icon in the top-right of the DevTools, and select Settings

    • Select Experiments on the left of settings

    • Turn on "Protocol Monitor", then close and reopen DevTools

    • Now click the ⋮ menu icon again, choose More Tools and then select Protocol monitor.

  • It is also possible to issue commands

    • Open DevTools on DevTools (how to)

      • in our case this means opening DevTools for the browser that is connected to the remotely debugged Chrome

    • Then, within the inner DevTools window, call commands in the console

  • So let's call the commands needed for our demo. After we have connected to the inspectable page at http://localhost:9222, we can call a command to open a URL

    await Main.sendOverProtocol('Page.navigate', {url: "https://www.saucedemo.com"});

A couple of examples of other commands:

await Main.sendOverProtocol('Emulation.setDeviceMetricsOverride', {
  mobile: true,
  width: 412,
  height: 732,
  deviceScaleFactor: 2.625,
});

const data = await Main.sendOverProtocol("Page.captureScreenshot");
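A client without a front-end, such as ChromeDriver, needs the WebSocket address of a page before it can send commands like the ones above. Chrome started with --remote-debugging-port=9222 exposes the list of inspectable targets as JSON at http://localhost:9222/json/list, and each target carries a webSocketDebuggerUrl field. The sketch below only shows how such a response could be read: the class TargetListSketch is hypothetical, the sample JSON and its target id are made up, and the regular expression stands in for a proper JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect WebSocket debugger URLs from a target list response.
final class TargetListSketch {

    private static final Pattern WS_URL =
            Pattern.compile("\"webSocketDebuggerUrl\"\\s*:\\s*\"([^\"]+)\"");

    // Returns every webSocketDebuggerUrl found in the JSON text, in order.
    static List<String> webSocketUrls(String targetsJson) {
        List<String> urls = new ArrayList<>();
        Matcher m = WS_URL.matcher(targetsJson);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample; a real response contains more fields per target.
        String sample = "[{\"type\":\"page\",\"url\":\"https://www.saucedemo.com/\","
                + "\"webSocketDebuggerUrl\":\"ws://localhost:9222/devtools/page/ABC123\"}]";
        System.out.println(webSocketUrls(sample));
    }
}
```

The returned URL is the endpoint a WebSocket client would connect to in order to exchange the JSON messages, which is the role ChromeDriver plays in the Selenium chain.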

Conclusion

In this article we went through each component that makes up a Selenium-based UI test environment. We looked closely at the Java implementation details of the Selenium library, and experimented directly with ChromeDriver and with the DevTools remote debugging interface of Chrome. I hope that after reading this article you have a solid understanding of this ecosystem and can draw sound conclusions while writing and debugging Selenium tests.
