htmldf: Simple Scraping and Tidy Webpage Summaries

Simple tools for scraping webpages, extracting common HTML tags and parsing their contents into a tidy, tabular format. The tools extract page titles, links, images, RSS feeds, social media handles and page metadata.
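
The kind of tidy page summary described above can be illustrated with a minimal sketch. This is not htmldf's own interface (see the reference manual for that). It uses rvest, xml2 and tibble, three of the packages listed under Imports, and the inline HTML, the example.com addresses and the name summary_tbl are hypothetical, chosen so the example runs without network access.

```r
library(xml2)
library(rvest)
library(tibble)

# Hypothetical inline page (an assumption for illustration), so that
# nothing is downloaded when the example runs
html_txt <- '<html><head><title>Example page</title>
<link rel="alternate" type="application/rss+xml" href="https://example.com/feed.xml">
</head><body>
<a href="https://example.com/about">About</a>
<a href="https://twitter.com/example">Twitter</a>
<img src="https://example.com/logo.png" alt="logo">
</body></html>'

# Parse the literal HTML string into a document
page <- read_html(html_txt)

# One row per page; multi-valued fields are stored as list-columns
summary_tbl <- tibble(
  title  = html_text(html_element(page, "title")),
  links  = list(html_attr(html_elements(page, "a"), "href")),
  images = list(html_attr(html_elements(page, "img"), "src")),
  rss    = list(html_attr(
    html_elements(page, "link[type='application/rss+xml']"), "href"
  ))
)

print(summary_tbl)
```

Keeping one row per page and placing the variable-length fields (links, images, feeds) in list-columns is one way to keep the result tabular; tidyr::unnest() can expand any of those columns into one row per item when needed.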

Version: 0.6.0
Depends: R (≥ 3.5.0)
Imports: cld3, dplyr, httr, lubridate, magrittr, processx, progress, R.utils, ranger, rvest, stringr, tibble, tidyr, tools, urltools, xml2
Suggests: testthat
Published: 2022-07-09
DOI: 10.32614/CRAN.package.htmldf
Author: Alastair Rushworth
Maintainer: Alastair Rushworth <alastairmrushworth at gmail.com>
BugReports: https://github.com/alastairrushworth/htmldf/issues
License: GPL-2
URL: https://github.com/alastairrushworth/htmldf/
NeedsCompilation: no
Language: en_GB
Materials: README NEWS
CRAN checks: htmldf results [issues need fixing before 2025-01-10]

Documentation:

Reference manual: htmldf.pdf

Downloads:

Package source: htmldf_0.6.0.tar.gz
Windows binaries: r-devel: htmldf_0.6.0.zip, r-release: htmldf_0.6.0.zip, r-oldrel: htmldf_0.6.0.zip
macOS binaries: r-release (arm64): htmldf_0.6.0.tgz, r-oldrel (arm64): htmldf_0.6.0.tgz, r-release (x86_64): htmldf_0.6.0.tgz, r-oldrel (x86_64): htmldf_0.6.0.tgz
Old sources: htmldf archive

Linking:

Please use the canonical form https://CRAN.R-project.org/package=htmldf to link to this page.