I was reading the blog post about bot detection with browsers, where the first layer is the IP address of the browser.
One rather unique scenario I've been trying to work out for a scraper is eliminating network latency. My use of the site is enhanced by the request from the browser having the lowest possible RTT to the web server. This means being in the same cloud provider.
To do this right now I manually navigate to the site and have a browser extension that clicks at just the right time.
I'd really like to eliminate that manual navigation but every time I've tried adding browser automation outside of the single click from the extension, I'm immediately met with bot detection.
Obviously adding a residential proxy step completely defeats the purpose of the RTT latency optimization.
Do modified browsers drive the overall bot detection heuristic low enough that the cloud IP address itself isn't a red flag? I've seen Camoufox and will try it at some point. What other options are available to drive down the overall "score" so I can still automate the browser but keep the latency low?
Hi Bronco, Omar here, a Lead Platform Engineer at Intuned.
> I've tried adding browser automation outside of the single click from the extension, and I'm immediately met with bot detection.
Can you explain how you write your automation? How do you do the click from the extension? Do you use CDP input commands to perform the click (`Input.dispatchMouseEvent`), or do you execute JS code and click the button using `element.click()`?
Using CDP will give you a much better score than executing JS to click a captcha button.
JS execution can be easily detected by any bot detection provider, and a scripted `element.click()` produces an event with `isTrusted` set to `false`. Using CDP to click, on the other hand, can mimic actual mouse movement and makes it much harder to detect abnormalities. The resulting click event will have `isTrusted` set to `true`, and it also sidesteps detection methods that monitor JS execution on the page.
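To make the CDP route concrete, here is a minimal Python sketch of the messages a client would send for one left click. It only builds the JSON payloads for `Input.dispatchMouseEvent`; the helper name `build_click_messages`, the coordinates, and the message ids are illustrative assumptions, and actually sending them over the browser's DevTools WebSocket is left out.

```python
import json
from typing import Dict, List


def build_click_messages(x: float, y: float, start_id: int = 1) -> List[Dict]:
    """Build the CDP messages for a single left click at viewport point (x, y).

    Hypothetical helper: it only constructs payloads, it does not send them.
    """
    steps = [
        # Move the pointer to the target first, as a real mouse would.
        {"type": "mouseMoved", "x": x, "y": y, "button": "none", "buttons": 0},
        # Press and release the left button at the same point.
        {"type": "mousePressed", "x": x, "y": y, "button": "left",
         "buttons": 1, "clickCount": 1},
        {"type": "mouseReleased", "x": x, "y": y, "button": "left",
         "buttons": 0, "clickCount": 1},
    ]
    return [
        {"id": start_id + i, "method": "Input.dispatchMouseEvent", "params": params}
        for i, params in enumerate(steps)
    ]


if __name__ == "__main__":
    # Assumed coordinates of the button; each line is one WebSocket frame.
    for message in build_click_messages(120, 340):
        print(json.dumps(message))
```

A more realistic version would likely emit several `mouseMoved` events along a path and add small delays between frames, since a single jump straight to the target is itself a signal.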
> Do modified browsers drive the overall bot detection heuristic low enough that the cloud IP address itself isn't a red flag? I've seen Camoufox and will try it at some point. What other options are available to drive down the overall "score" so I can still automate the browser but keep the latency low?
Modified browsers reduce your bot score a lot, and Camoufox is a great option to test out. Will it work? It depends on how the website has set up its bot detection. Using a modified browser is a must for your use case.
At Intuned, we use our own internal forked Chrome to hide the most popular signals, and a lot of the time the browser alone, without a residential proxy, gets us past most websites, but not all of them (IP reputation carries a very high heuristic weight).
I can't give an exact recommendation on what will work for you, since each website has its own ways of handling bot detection. One recommendation I can give is to try packages like patchright; they can help a lot and hide a lot of popular signals.
Another recommendation is to try the Intuned agent and ask it to help you find a way to bypass bot detection on that website. If the problem can be handled with network interception or some other scraping technique, the agent is really good in these cases and knows most of the commonly used scraping techniques.
While I understand that not every business wants automation on their site, I know some businesses are totally open to it. But from a technical perspective, it's very difficult to allow well-behaved browser automation while still blocking abusive bots. Web Bot Auth gives website owners / security vendors a lightweight way to allow providers like Intuned.
Biggest question I have is how this will overcome sites that implement aggressive anti-automation security. I can easily automate websites with existing tools until I slam into that wall.
This never made it into prod since the scale was small, but one of my favorite leaks I found when working on bot detection was browsers that generated the same random numbers, presumably because they were being initialized from the same VM snapshot and therefore the same random number state.
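A rough server-side sketch of that idea in Python, assuming each session reports the first few values it drew from `Math.random()`. The session ids, the sample format, and the function name are hypothetical; this is not the commenter's actual system.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_shared_rng_sessions(samples: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Group session ids whose reported random sequences are identical.

    Independent browsers should essentially never produce the same sequence,
    so any group with more than one session is suspicious.
    """
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for session_id, values in samples.items():
        groups[tuple(values)].append(session_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up samples: two sessions restored from the same snapshot, one not.
    reported = {
        "session-a": [0.4172, 0.9031, 0.1188],
        "session-b": [0.4172, 0.9031, 0.1188],
        "session-c": [0.7719, 0.0264, 0.5540],
    }
    print(find_shared_rng_sessions(reported))
```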
That is clever! I wish we could use tricks like that but we've never used client side JS for such purposes.
p.s. I've added this comment to https://news.ycombinator.com/highlights. I mention this so more people might learn that it exists and hopefully send us nominations!
So far it's cost me $2.27 to submit a contact form 3 times - why is this better than a captcha solver with human solves at 1000 per $2?
On your automation, your tool fed back to me as follows after 3 submissions:
> The CAPTCHA is persistently blocking now — Prosopo's widget appears to have flagged the session/IP due to the repeated submissions. The checkbox won't reset this time. This is expected behavior from their bot protection product. To submit again, you'd likely need to wait a while for the rate limit to clear, or submit manually from your own browser.
The cost is the AI cost of using the agent, not the captcha cost. Usually you would write the project once and then call it via API, instead of asking the agent to perform the action more than once. Consider using the web task API for this use case.
I'm always genuinely curious about how startups navigate the founder maze, as it helps to break the myth of the overnight success story.
Based on your YC page, you went through a couple of pivots over the last years:
- 4 years ago: Intuned - The data assistant for engineering leaders [0]
- 2 years ago: Intuned - The browser automation platform for developers and product teams [1]
- 1 year ago: Intuned Auth Sessions - Build authenticated scrapers and RPA [2]
What was kind of the evolution from YC S22 4 years ago till you arrived at today's launch? How did you find your differentiation in a highly commoditized space? Even within YC, there are many competitors like Firecrawl, Reworkd, BrowserUse, NotteLabs, Browserbase, etc.
Another thing that might interest HN: AI crawlers come with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported here on HN (and experienced myself).
Does Intuned respect robots.txt directives and do you disclose the identity of your crawlers via user-agent header?
We actually went through only 1 hard pivot; the rest is more about reframing the problem as we dug deeper and understood the customers and the issue better. As an example, "Intuned Auth Sessions" is just a feature that we still support today!
But you are right, being a founder is not easy, and the hardest part is figuring out what to build and whether it makes sense to keep going or to stop - those are questions I still struggle with to this day.
For your question about how this is different - I think if you dig into those products you will see that our focus is different. Many of the companies mentioned are focused on powering agents via APIs, and some are focused on enabling users to use AI at runtime. We do feel that our product is somewhat differentiated - the closest one is possibly Reworkd, and I would still say the product is somewhat different. Now, the hardest part is actually communicating this to customers and the market in general - and there, we have a lot to figure out!
On the robots.txt and user-agent question, we think of ourselves as providing infrastructure and flexibility for our customers to do what they want - we do encourage them in our docs to respect robots.txt, but we don't enforce it at the platform level.
Appreciate you taking the time to leave this comment - very thoughtful
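Since robots.txt is not enforced at the platform level, a customer who wants to honor it could check in their own code before fetching. A minimal sketch using Python's standard `urllib.robotparser`; the user-agent string, the example rules, and the `is_allowed` helper are assumptions for illustration, not part of Intuned.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content; in practice it would be downloaded
# from the target site's /robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.com/public/page", "https://example.com/private/data"):
        print(url, is_allowed(EXAMPLE_ROBOTS, "example-scraper", url))
```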
Congrats on the launch. I experienced these issues first hand with `Open Finance` a few years ago.
I feel that you'll end up being an automation agency (you mentioned UiPath). Companies that have the skills and capacity to build will not need your service, but for those who want the full service, you might fill a gap.
Yes, when we did our YC and social launches, we had a few companies sign up and we have been building with them since. For some of them, we have enabled them to run 1000+ scrapers, which would have been very hard without Intuned. Some of these have been with us for 2+ years!
What our users love is the agent and the idea that we are not using AI at runtime which improves reliability.
This is not meant as a personal assistant to do a task like this. The idea is that if you want to build a travel assistant and integrate with a travel booking service using browser automation, you can use Intuned to build that integration and expose one API to search, another to book, etc.
Does this make sense?
Well, auth depends on the exact case. For example, for some 2FA flows we handle it in code and generate the OTP on demand; in other cases we take different approaches. One thing worth mentioning is that we allow you to attach a proxy to an auth session, so the same IP is always used with that session, which reduces issues.
One more thing: we have something called a recorder auth session, which basically lets you authenticate in a remote browser, and then we just save that session.
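For the "generate the OTP on demand" case, the following is a minimal sketch of a standard time-based OTP (RFC 6238, HMAC-SHA1) using only the Python standard library. It assumes the site uses ordinary TOTP with a base32 shared secret; how Intuned implements this internally is not stated above, and the function name is illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code for a base32 secret at Unix time `at`."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Secret from the RFC 6238 test vectors, base32-encoded for this function.
    demo_secret = base64.b32encode(b"12345678901234567890").decode()
    print(totp(demo_secret))
```

In an automation script the code would be generated right before the 2FA field is filled, so it falls inside the current 30-second window.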
If you think about "price", "speed" and "accuracy (reliability/quality)", our bet is that models won't hit those 3 together. So you won't get a model that is very fast, very cheap and very accurate anytime soon.
Also, imagine that you have a case where you want to scrape 10,000 records from a website. Why have AI navigate to every page to do this? Why not write the code, run it, and get consistent and fast results? It's also predictable: if it messes up, you know what happened and you can trace it to the exact line of code.
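As a rough illustration of the "write the code and run it" approach, here is a small Python sketch of a deterministic pagination loop. The `fetch_page` callable stands in for whatever actually loads a page (a Playwright script, an HTTP call); it and the fake data source are hypothetical, so the example runs without a browser.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, int]


def scrape_all(fetch_page: Callable[[int], List[Record]], page_size: int) -> Iterator[Record]:
    """Yield every record by walking pages in order until a short page appears."""
    page = 0
    while True:
        rows = fetch_page(page)
        yield from rows
        # A page with fewer rows than page_size is the last one.
        if len(rows) < page_size:
            return
        page += 1


def make_fake_source(total: int, page_size: int) -> Callable[[int], List[Record]]:
    """Build a stand-in data source that serves `total` records page by page."""
    def fetch_page(page: int) -> List[Record]:
        start = page * page_size
        return [{"id": i} for i in range(start, min(start + page_size, total))]
    return fetch_page


if __name__ == "__main__":
    records = list(scrape_all(make_fake_source(10_000, 50), page_size=50))
    print(len(records))
```

If a page fails, the traceback points at the exact call and page number, which is the traceability argument made above.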
I’d like to see less focus on Playwright and more focus on giving the agent more than just an MCP to browser automation. Make it multi-modal, figure out how to optimize when to send screenshots to which model, etc… current coding harnesses are awful at any UI automation because they’re just automating DevTools and occasionally screenshotting. It’s obviously robotic, it’s slow, it’s ineffective and makes it difficult for the agent to validate success of code changes.
Generalized computer use is what will ultimately solve this, but I think there’s real intermediate value in optimizing browser workflows specifically, as a medley of remote browser automation and multi-modal browser use.
Our product is infra + agent. You can use Codex or another agent to generate the code. We actually have a CLI that lets you deploy projects to our infrastructure.
We are actually working on open sourcing a plugin that you can use with any coding harness!
We actually do support CDP directly as well. As for why Playwright: it provides a lot of benefits that we like when automation is written as code, such as auto-waiting and good syntax for a lot of operations. Agents can use Playwright pretty well, and it is generally liked by the community we target.
If a customer doesn't want to use Playwright, they don't have to, given the CDP support, but most of our templates use Playwright.
This definitely came up multiple times. Next week we are releasing our Codex/Claude plugin, so you will be able to use Codex to create projects and deploy them to Intuned.
Intuned as a platform to deploy browser automation adds a lot - anti-bot detection, jobs, observability and more.
Other than the stuff I mentioned, it’s the deep integration between the agent and the platform. Because we have observability, you can open a failed run and fix it with the click of a button.
You can also enable self-healing on a project, which puts it on autopilot. We have a lot to improve on, but if you have 10+ active scrapes, Intuned as a package saves users time and pain.
(I work on the Web Bot Auth implementation for Stytch, now a part of Twilio: https://stytch.com/blog/stytch-supports-web-bot-auth/ )
Also, one of our engineers did a write up on bot detection systems and how they work - https://intunedhq.com/blog/how-bot-detection-works
[0] https://www.ycombinator.com/launches/Gqr-intuned-the-data-as...
[1] https://www.ycombinator.com/launches/LGE-intuned-the-browser...
[2] https://www.ycombinator.com/launches/Lpq-intuned-auth-sessio...
I wish you all the best.
For jobs/durability/observability I have SQLite and had Codex generate an ugly but functional dashboard.
I'm just curious to know what Intuned does that is different.
that sounds interesting
I am happy to give you a demo over a call as well