Almost half (46 per cent) of all web traffic is generated by web scrapers – a.k.a. ‘bots’ – and a significant portion of them (38 per cent) are out to copy, or steal, publishers’ content, with some available for hire for a little over $3 per hour, according to a recent report.
Distil Networks this week published a report named The Economics of Web Scraping, claiming that publishers are second only to companies in the retail sector when it comes to the potential negative financial impact of web scraping.
Web scraping is performed by software that extracts information from websites and then repurposes it for third parties. Scrapers can be operated by legitimate parties, such as Google (for the purposes of indexing webpages), price comparison sites, etc.
Such software is crucial to the online strategy of many companies, with publishers especially relying on search engines to trawl their websites as a means of maintaining SEO.
However, such software is also employed for more nefarious purposes, such as stealing content from publishers, or by hackers looking to compromise websites. Malicious parties siphon off anywhere up to two per cent of revenue from legitimate businesses, according to the report, which was conducted by 451 Research.
This potentially threatens publishers’ business models. Other verticals listed as potentially compromised by the ubiquity of the software include online real estate agents, travel, online directories, e-commerce sites and online marketplaces.
Software from companies such as Distil Networks – which raised $21m in funding last month – and The Media Trust can be used to distinguish between malicious and non-malicious traffic (and ultimately block the former), but both companies are quick to point out that the increasing sophistication of malicious actors can make it difficult to protect data and content.
Rami Essaid, CEO and co-founder of Distil Networks, said: “Not only does web scraping pose a critical challenge to a website’s brand, it can threaten sales and conversions, lower SEO rankings, or undermine the integrity of content that took considerable time and resources to produce.
“Understanding the pervasive nature of today’s web scraping economy not only raises awareness about this growing challenge, it also allows website owners to take action in the protection of their proprietary information.”
An earlier research study from The Media Trust explored the potential dangers posed by ‘malvertising’ and found that 76 per cent of those surveyed felt that such attacks were on the rise; EMEA managing director Matt O’Neill testified that up to 85 per cent of code on an average publisher’s website is beyond their control.
A full copy of the Distil Networks report can be viewed here.