nerdexam
CompTIA

PT0-002 Question #100: Real Exam Question with Answer & Explanation

The correct answer is A: Website scraping.

Reconnaissance and enumeration

Question

Given the following output:

    User-agent:*
    Disallow: /author/
    Disallow: /xmlrpc.php
    Disallow: /wp-admin
    Disallow: /page/

During which of the following activities was this output MOST likely obtained?

Options

  • A. Website scraping
  • B. Website cloning
  • C. Domain enumeration
  • D. URL enumeration

Explanation

The output is the content of a robots.txt file. Web crawlers and scrapers consult this file to learn which paths the site owner does not want automated clients to request, so the output was most likely retrieved during website scraping.
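As a rough illustration of how a scraper might consult these rules before fetching a page, the sketch below feeds the output from the question into Python's standard-library `urllib.robotparser`. The agent name `example-scraper` and the sample paths are assumptions made up for the example; they do not come from the question.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content shown in the question, one directive per line.
ROBOTS_TXT = """\
User-agent:*
Disallow: /author/
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /page/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def allowed(path: str, agent: str = "example-scraper") -> bool:
    """Return True if the rules permit `agent` to request `path`.

    `example-scraper` is a hypothetical agent name; it matches the
    wildcard `User-agent:*` group in the rules above.
    """
    return PARSER.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths a scraper might be about to request.
    for path in ["/blog/post-1", "/wp-admin/options.php", "/author/alice"]:
        verdict = "allowed" if allowed(path) else "disallowed"
        print(f"{path}: {verdict}")
```

With these rules, `/wp-admin/options.php` and `/author/alice` are reported as disallowed because they begin with a listed prefix, while `/blog/post-1` is allowed.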

Common mistakes.

  • B. Website cloning involves creating an exact copy of a website; while a cloner might encounter robots.txt, its primary goal isn't to read and adhere to these directives for content extraction, but to replicate the site structure and content.
  • C. Domain enumeration focuses on discovering subdomains or related domains for a target, not on reading specific files within a known domain for access restrictions.
  • D. URL enumeration involves discovering valid URLs or directories on a website, but the robots.txt output specifically defines disallowed paths for robots, which is more directly relevant to automated content extraction (scraping) than just finding valid URLs.
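To make the distinction in D concrete, the following minimal sketch pulls the `Disallow` values out of the output as plain strings. The helper name `disallowed_paths` is hypothetical. The result is only the list of paths the site asks robots to avoid; the code makes no requests and does not confirm that any of those paths exist, which is what URL enumeration would involve.

```python
# The robots.txt content shown in the question.
ROBOTS_TXT = """\
User-agent:*
Disallow: /author/
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /page/
"""


def disallowed_paths(robots_text: str) -> list[str]:
    """Collect the value of every non-empty Disallow directive.

    Hypothetical helper for illustration: it ignores User-agent
    grouping and simply lists the paths in file order.
    """
    paths = []
    for raw in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(ROBOTS_TXT):
        print(path)
```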

Concept tested. robots.txt file purpose, website reconnaissance

Reference. https://developers.google.com/search/docs/crawling-indexing/robots/intro

Topics

#Robots.txt #Website scraping #Web reconnaissance #Information gathering
