navigator.webdriver
Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.
So, the bottom line is:
Selenium identifies itself
However some generic approaches to avoid getting detected while web-scraping are as follows:
The first and foremost attribute a website can determine your script/program is through your monitor size. So it is recommended not to use the conventional Viewport.
If you need to send multiple requests to a website, you need to keep on changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing time.sleep(secs)
. Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
–
–
As per the current WebDriver W3C Editor's Draft specification:
The webdriver-active flag is set to true when the user agent is under remote control. It is initially false.
Hence, the readonly boolean attribute webdriver returns true if webdriver-active flag is set, false otherwise.
Further the specification further clarifies:
navigator.webdriver Defines a standard way for co-operating user
agents to inform the document that it is controlled by WebDriver, for
example so that alternate code paths can be triggered during
automation.
There had been tons and millions of discussions demanding Feature: option to disable navigator.webdriver == true ? and @whimboo
in his comment concluded that:
that is because the WebDriver spec defines that property on the
Navigator object, which has to be set to true when tests are running
with webdriver enabled:
https://w3c.github.io/webdriver/#interface
Implementations have to be conformant to this requirement. As such we
will not provide a way to circumvent that.
Generic Conclusion
From the above discussions it can be concluded that:
Selenium identifies itself
and there is no way to conceal the fact that the browser is WebDriver driven.
Recommendations
However some users have suggested approaches which can conceal the fact that the Mozilla Firefox browser is WebDriver controled through the usage of Firefox Profiles and Proxies as follows:
selenium4 compatible python code
from selenium.webdriver import Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
profile_path = r'C:\Users\Admin\AppData\Roaming\Mozilla\Firefox\Profiles\s8543x41.default-release'
options=Options()
options.set_preference('profile', profile_path)
options.set_preference('network.proxy.type', 1)
options.set_preference('network.proxy.socks', '127.0.0.1')
options.set_preference('network.proxy.socks_port', 9050)
options.set_preference('network.proxy.socks_remote_dns', False)
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = Firefox(service=service, options=options)
driver.get("https://www.google.com")
driver.quit()
Other Alternatives
It is observed that in some specific os variants a couple of diverse settings/configuration can bypass the bot detectation which are as follows:
selenium4 compatible code block
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.chrome.service import Service
options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
s = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = webdriver.Chrome(service=s, options=options)
Potential Solution
A potential solution would be to use the tor browser as follows:
selenium4 compatible python code
from selenium.webdriver import Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
import os
torexe = os.popen(r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe')
profile_path = r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default'
firefox_options=Options()
firefox_options.set_preference('profile', profile_path)
firefox_options.set_preference('network.proxy.type', 1)
firefox_options.set_preference('network.proxy.socks', '127.0.0.1')
firefox_options.set_preference('network.proxy.socks_port', 9050)
firefox_options.set_preference("network.proxy.socks_remote_dns", False)
firefox_options.binary_location = r'C:\Users\username\Desktop\Tor Browser\Browser\firefox.exe'
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = webdriver.Firefox(service=service, options=firefox_options)
driver.get("https://www.tiktok.com/")
–
–
–
–
–
It may sound simple, but if you look how the website detects selenium (or bots) is by tracking the movements, so if you can make your program slightly towards like a human is browsing the website you can get less captcha, such as add cursor/page scroll movements in between your operations, and other actions which mimics the browsing. So between two operations try to add some other actions, Add some delay etc. This will make your bot slower and could get undetected.
Thanks
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.