Goal: request a URL over HTTP, read the response body, and convert it to Markdown output.
Creating the project
# Create the project
cargo new scrape-url
Adding dependencies
In the Cargo.toml file in the project directory, add the following under [dependencies]:
# HTTP client
reqwest = { version = "0.11", features = ["blocking"] }
# HTML to Markdown conversion
html2md = "0.2"
The complete Cargo.toml file
[package]
name = "scrape-url"
version = "0.1.0"
authors = ["yangxuan_321 <yangxuan_321@163.com>"]
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
# HTTP client
reqwest = { version = "0.11", features = ["blocking"] }
# HTML to Markdown conversion
html2md = "0.2"
Writing the code (src/main.rs)
use std::fs;

fn main() {
    let url = "https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html";
    let output = "358570f6bf07f08f4624fc3e.md";

    // Fetch the page with a blocking GET and read the body as text.
    println!("Fetching url: {}", url);
    let body = reqwest::blocking::get(url).unwrap().text().unwrap();

    // Convert the HTML to Markdown.
    println!("Converting html to markdown...");
    let md = html2md::parse_html(&body);

    // Save the Markdown to the output file.
    fs::write(output, md.as_bytes()).unwrap();
    println!("Converted markdown has been saved in {}.", output);
}
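The program above hardcodes both the URL and the output file name, and the file name happens to match the last path segment of the URL. As a minimal sketch, assuming you would rather derive the name than type it by hand, the helper below does that with the standard library only. The function name `output_name` and the `output.md` fallback are hypothetical and are not part of the original code.

```rust
/// Derive a Markdown file name from the last path segment of a URL.
/// Hypothetical helper for illustration; not part of the original program.
fn output_name(url: &str) -> String {
    // Drop any query string or fragment before looking at the path.
    let path = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or(url);
    // Take the text after the final '/', ignoring a trailing slash.
    let last = path.trim_end_matches('/').rsplit('/').next().unwrap_or("");
    // Strip a trailing ".html" or ".htm" extension if present.
    let stem = last
        .strip_suffix(".html")
        .or_else(|| last.strip_suffix(".htm"))
        .unwrap_or(last);
    if stem.is_empty() {
        // Assumed fallback name when nothing usable is left.
        "output.md".to_string()
    } else {
        format!("{}.md", stem)
    }
}

fn main() {
    // Read the URL from the first command-line argument, or use the article URL.
    let url = std::env::args().nth(1).unwrap_or_else(|| {
        "https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html".to_string()
    });
    println!("{}", output_name(&url));
}
```

To use it in the program above, one option is `let output = output_name(url);` followed by `fs::write(&output, md.as_bytes())`, borrowing `output` so it can still be printed afterwards.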
Running
cargo run
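If the build fails with a Cargo version error or a package download failure, see the error fixes at the end of this post. On success the program prints:
Fetching url: https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html
Converting html to markdown...
Converted markdown has been saved in 358570f6bf07f08f4624fc3e.md.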
Error fixes
- this version of Cargo is older than the `2021` edition, and only supports `2015` and `2018` editions.
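rustup default nightly && rustup update
The 2021 edition has been available on the stable toolchain since Rust 1.56, so staying on stable and running rustup update stable also resolves this error.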
- error: failed to run custom build command for `openssl-sys v0.9.75`
sudo apt install libssl-dev
