PT0-002 Question #100: Real Exam Question with Answer & Explanation
The correct answer is A: Website scraping. The provided output, a robots.txt file, is typically consulted by web crawlers or scrapers to understand which parts of a website should not be accessed, making it most relevant to website scraping.
Question
Given the following output:

User-agent:*
Disallow: /author/
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /page/

During which of the following activities was this output MOST likely obtained?
Options
- A. Website scraping
- B. Website cloning
- C. Domain enumeration
- D. URL enumeration
Explanation
The output is the content of a robots.txt file, which tells web crawlers and scrapers which paths on a site they should not request. Automated crawling tools typically fetch this file before retrieving any pages, so the output was most likely obtained during website scraping.
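As an illustration of how a scraper might consult these rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The user agent string `example-scraper`, the helper name `check_paths`, and the sample paths are assumptions made for this example; they are not part of the exam question.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the question, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /page/
"""


def check_paths(robots_txt, paths, user_agent="example-scraper"):
    """Return a dict mapping each path to True if the rules allow fetching it."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical paths a scraper might want to request.
results = check_paths(
    ROBOTS_TXT,
    ["/", "/blog/post-1", "/wp-admin/options.php", "/author/alice"],
)
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

A well-behaved scraper would skip every path reported as disallowed. Against a live site, the same parser can load the file itself with `set_url()` and `read()` instead of `parse()`.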
Common mistakes.
- B. Website cloning involves creating an exact copy of a website; while a cloner might encounter robots.txt, its primary goal isn't to read and adhere to these directives for content extraction, but to replicate the site structure and content.
- C. Domain enumeration focuses on discovering subdomains or related domains for a target, not on reading specific files within a known domain for access restrictions.
- D. URL enumeration involves discovering valid URLs or directories on a website, but the robots.txt output specifically defines disallowed paths for robots, which is more directly relevant to automated content extraction (scraping) than just finding valid URLs.
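To make the "disallowed paths for robots" point concrete, the sketch below pulls the `Disallow` values out of the output exactly as it appears in the question. It is a simplified hand-rolled parser, not a full robots.txt implementation: the function name `disallowed_paths` is hypothetical, and grouping by user agent is deliberately ignored.

```python
# The output as given in the question (note "User-agent:*" has no space).
OUTPUT = """\
User-agent:*
Disallow: /author/
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /page/
"""


def disallowed_paths(robots_txt):
    """Collect the value of every non-empty Disallow directive, in order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are matched case-insensitively.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


paths = disallowed_paths(OUTPUT)
print(paths)
```

Every entry in the result is an instruction aimed at automated clients, which is why the output points to scraping rather than to a general search for valid URLs.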
Concept tested. robots.txt file purpose, website reconnaissance
Reference. https://developers.google.com/search/docs/crawling-indexing/robots/intro