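feat(readme): adjust the formatting of some text, including spaces around amounts and numbers, clearer API parameter descriptions, and aligned heading levels, to improve readability.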
2025-12-15 10:36:18 +08:00
parent 745faa0ecc
commit b044e918aa
9 changed files with 949 additions and 80 deletions

33
.gitignore vendored Normal file

@@ -0,0 +1,33 @@
# Dependency directories
node_modules/
# Log files
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
# Environment variable files
.env
.env.local
.env.*.local
# Editor directories and files
.vscode/
.idea/
*.swp
*.swo
*~
# Operating system files
.DS_Store
Thumbs.db
# Build output
dist/
build/
*.log
# Temporary files
*.tmp
.cache/


@@ -1,12 +1,12 @@
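# 南京公共工程建设中心 - Announcement Scraping Tool # 南京公共工程建设中心 - Announcement Collection Tool
A web-based visual tool for scraping announcements from 南京公共工程建设中心. A web-based visual tool for collecting announcements from 南京公共工程建设中心.
## Features ## Features
- ✅ Scrape the announcement list (with pagination) - ✅ Collect the announcement list (with pagination)
- ✅ Smart scraping by date range - ✅ Smart collection by date range
- ✅ Scrape announcement detail content - ✅ Collect announcement detail content
- ✅ Smart extraction of budget amounts - ✅ Smart extraction of budget amounts
- ✅ Generate statistical reports - ✅ Generate statistical reports
- ✅ Web visual interface - ✅ Web visual interface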
@@ -34,17 +34,20 @@ npm start
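### 3. Feature Overview ### 3. Feature Overview
**Announcement List tab** **Announcement List tab**
- Quickly view all announcements - Quickly view all announcements
- Paginated browsing - Paginated browsing
- Fetch the latest announcement list in one click - Fetch the latest announcement list in one click
**Detail Scraping tab** **Detail Collection tab**
- Batch-scrape announcement details
- Scrape by date range - Batch-collect announcement details
- Collect by date range
- Automatically extract budget amounts - Automatically extract budget amounts
- Customizable scrape count - Customizable collection count
**Generate Report tab** **Generate Report tab**
- Generate reports by date range - Generate reports by date range
- Filter projects by an amount threshold - Filter projects by an amount threshold
- Real-time project statistics - Real-time project statistics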
@@ -78,14 +81,18 @@ npm start
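Once started, the server exposes the following RESTful API endpoints: Once started, the server exposes the following RESTful API endpoints:
### 1. Get the announcement list ### 1. Get the announcement list
``` ```
GET /api/list?url=<list page URL>&page=<page number> GET /api/list?url=<list page URL>&page=<page number>
``` ```
Parameters: Parameters:
- `url` (optional): list page URL; defaults to the official site's home page - `url` (optional): list page URL; defaults to the official site's home page
- `page` (optional): page number; defaults to 1 - `page` (optional): page number; defaults to 1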
### 2. Get the list by date range ### 2. Get the list by date range
``` ```
POST /api/list-daterange POST /api/list-daterange
Content-Type: application/json Content-Type: application/json
@@ -98,6 +105,7 @@ Content-Type: application/json
``` ```
### 3. Fetch details in batch ### 3. Fetch details in batch
``` ```
POST /api/details POST /api/details
Content-Type: application/json Content-Type: application/json
@@ -109,6 +117,7 @@ Content-Type: application/json
``` ```
### 4. Generate a report ### 4. Generate a report
``` ```
POST /api/report POST /api/report
Content-Type: application/json Content-Type: application/json
@@ -121,6 +130,7 @@ Content-Type: application/json
``` ```
### 5. Generate a report by date range ### 5. Generate a report by date range
``` ```
POST /api/report-daterange POST /api/report-daterange
Content-Type: application/json Content-Type: application/json
@@ -156,26 +166,28 @@ Content-Type: application/json
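## Notes ## Notes
1. Scraping is throttled with a 500ms-1s delay per item to avoid sending requests too quickly 1. Collection is throttled with a 500ms-1s delay per item to avoid sending requests too quickly
2. Only detail pages on the gjzx.nanjing.gov.cn domain can be parsed 2. Only detail pages on the gjzx.nanjing.gov.cn domain can be parsed
3. Amount extraction is regex-based and supports several formats (budget amount, maximum price limit, etc.) 3. Amount extraction is regex-based and supports several formats (budget amount, maximum price limit, etc.)
4. The web server listens on port 3000 by default; this can be changed in server.js 4. The web server listens on port 3000 by default; this can be changed in server.js
5. Date-range scraping stops automatically once all announcements are found to be earlier than the start date 5. Date-range collection stops automatically once all announcements are found to be earlier than the start date
6. Encoding is detected automatically; GBK and UTF-8 pages are supported 6. Encoding is detected automatically; GBK and UTF-8 pages are supported
## Core Features ## Core Features
### Date-range scraping logic ### Date-range collection logic
When scraping by date range, the program: When collecting by date range, the program:
1. Scrapes pages sequentially, starting from the first page
1. Collects pages sequentially, starting from the first page
2. Checks whether the announcement dates on each page fall within the specified range 2. Checks whether the announcement dates on each page fall within the specified range
3. Stops scraping automatically if every announcement on a page is earlier than the start date 3. Stops collecting automatically if every announcement on a page is earlier than the start date
4. Supports a maximum page limit to avoid over-scraping 4. Supports a maximum page limit to avoid over-collection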
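
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `fetchPage` is a hypothetical async function returning an array of `{ title, date }` objects for a page number, dates are assumed to be ISO `YYYY-MM-DD` strings (so plain string comparison orders them), and the default `maxPages` of 10 is arbitrary.

```javascript
// Step 2: is a date inside the (optionally open-ended) range?
function inRange(date, startDate, endDate) {
  return (!startDate || date >= startDate) && (!endDate || date <= endDate);
}

// Step 3: true when every announcement on a non-empty page predates the start date.
function allBeforeStart(items, startDate) {
  return Boolean(startDate) && items.length > 0 &&
    items.every((item) => item.date < startDate);
}

// fetchPage is a hypothetical async function: (page) => [{ title, date }, ...]
async function collectByDateRange(fetchPage, { startDate, endDate, maxPages = 10 } = {}) {
  const collected = [];
  for (let page = 1; page <= maxPages; page++) {      // steps 1 and 4
    const items = await fetchPage(page);
    if (items.length === 0) break;                     // no more pages
    collected.push(...items.filter((item) => inRange(item.date, startDate, endDate)));
    if (allBeforeStart(items, startDate)) break;       // step 3: early stop
  }
  return collected;
}
```

The early stop relies on the list being ordered newest first; if the site ever interleaves older pinned items, only the page limit would bound the loop.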
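### Amount extraction rules ### Amount extraction rules
The following formats are recognized (万元 = 10,000 yuan): The following formats are recognized (万元 = 10,000 yuan):
- 预算金额 (budget amount): XX 万元 - 预算金额 (budget amount): XX 万元
- 最高限价 (maximum price limit): XX 万元 - 最高限价 (maximum price limit): XX 万元
- 预算 (budget): XX 万元 - 预算 (budget): XX 万元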
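
For the formats visible in this hunk, a regex-based extractor might look like the sketch below. The pattern, the `extractBudget` name, the returned object shape, and the acceptance of a bare `元` unit are all assumptions made for illustration; the project's real regular expressions are not shown in this diff.

```javascript
// Assumed pattern: label, half- or full-width colon, number, unit.
// Longer labels come first so that 预算金额 is not matched as just 预算.
const BUDGET_PATTERN = /(预算金额|最高限价|预算)\s*[::]\s*(\d[\d,]*(?:\.\d+)?)\s*(万元|元)/;

// Hypothetical helper: returns the first amount found, or null.
function extractBudget(text) {
  const match = BUDGET_PATTERN.exec(text);
  if (!match) return null;
  return {
    label: match[1],
    amount: Number(match[2].replace(/,/g, '')), // strip thousands separators
    unit: match[3],
    text: match[0],                             // the matched source fragment
  };
}
```

Keeping the matched fragment alongside the parsed number makes it easier to spot-check extraction results against the original page.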
@@ -185,29 +197,35 @@ Content-Type: application/json
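### Encoding handling ### Encoding handling
Page encoding is detected automatically: Page encoding is detected automatically:
- The charset in Content-Type is read first - The charset in Content-Type is read first
- GBK and GB2312 encodings are handled automatically - GBK and GB2312 encodings are handled automatically
- UTF-8 is used by default - UTF-8 is used by default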
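
The detection order above could be sketched like this. Note the swap: the project lists `iconv-lite` as a dependency, whereas this sketch uses the built-in `TextDecoder` to stay self-contained, which assumes a Node.js build with full ICU for the `gbk` label. The function names `detectCharset` and `decodeBody` are hypothetical.

```javascript
// Read the charset from a Content-Type header value, falling back to UTF-8.
function detectCharset(contentType) {
  const match = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  const charset = match ? match[1].toLowerCase() : 'utf-8';
  // GBK is a superset of GB2312, so decode both with the GBK decoder.
  return charset === 'gb2312' ? 'gbk' : charset;
}

// Decode raw response bytes using the detected charset.
// Assumption: the runtime's TextDecoder supports the detected label.
function decodeBody(bytes, contentType) {
  return new TextDecoder(detectCharset(contentType)).decode(bytes);
}
```

A fuller version might also inspect a `<meta charset>` tag when the header carries no charset, but the README only mentions the header.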
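## FAQ ## FAQ
### Q: Why is scraping relatively slow? ### Q: Why is collection relatively slow?
A: To avoid putting excessive load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design. A: To avoid putting excessive load on the server, the program limits the request rate (a 500ms-1s delay per item). This is responsible crawler design.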
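
The throttling described in this answer could look roughly like the sketch below: a random pause of 500 to 1000 ms after each request. The names `randomDelayMs`, `sleep`, and `fetchSequentially` are hypothetical, and `fetchOne` stands in for whatever function fetches a single detail page.

```javascript
// Random integer delay in [minMs, maxMs], matching the 500ms-1s window above.
function randomDelayMs(minMs = 500, maxMs = 1000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchOne is a hypothetical async function: (url) => detail result.
// Requests run strictly one at a time, with a pause after each.
async function fetchSequentially(urls, fetchOne) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchOne(url));
    await sleep(randomDelayMs());
  }
  return results;
}
```

With this shape, 50 items take at least 25 seconds, which is the cost of staying polite to the target server.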
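### Q: How do I scrape announcements within a given date range? ### Q: How do I collect announcements within a given date range?
A: In the web interface, check "Scrape by date range" on the "Detail Scraping" and "Generate Report" tabs, then enter the start and end dates.
A: In the web interface, check "Collect by date range" on the "Detail Collection" and "Generate Report" tabs, then enter the start and end dates.
### Q: Where are exported reports saved? ### Q: Where are exported reports saved?
A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory. A: After you click "Export Word" or "Export Markdown", the file is downloaded automatically to the browser's default download directory.
### Q: Can it scrape other websites? ### Q: Can it collect from other websites?
A: You need to modify BASE_URL in server.js and the corresponding parsing functions, because different websites have different HTML structures. A: You need to modify BASE_URL in server.js and the corresponding parsing functions, because different websites have different HTML structures.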
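## Changelog ## Changelog
### v1.0.0 (2025-12-12) ### v1.0.0 (2025-12-12)
- Web visual interface - Web visual interface
- Scraping by date range - Collection by date range
- Paginated browsing - Paginated browsing
- Export reports as Word/Markdown - Export reports as Word/Markdown
- RESTful API endpoints - RESTful API endpoints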

81
node_modules/.package-lock.json generated vendored

@@ -4,6 +4,46 @@
"lockfileVersion": 3, "lockfileVersion": 3,
"requires": true, "requires": true,
"packages": { "packages": {
"node_modules/@napi-rs/canvas": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
"license": "MIT",
"workspaces": [
"e2e/*"
],
"engines": {
"node": ">= 10"
},
"optionalDependencies": {
"@napi-rs/canvas-android-arm64": "0.1.80",
"@napi-rs/canvas-darwin-arm64": "0.1.80",
"@napi-rs/canvas-darwin-x64": "0.1.80",
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
}
},
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@types/node": { "node_modules/@types/node": {
"version": "24.10.3", "version": "24.10.3",
"resolved": "https://registry.npmmirror.com/@types/node/-/node-24.10.3.tgz", "resolved": "https://registry.npmmirror.com/@types/node/-/node-24.10.3.tgz",
@@ -971,6 +1011,15 @@
"node": ">= 0.6" "node": ">= 0.6"
} }
}, },
"node_modules/nodemailer": {
"version": "7.0.11",
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
"license": "MIT-0",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/nth-check": { "node_modules/nth-check": {
"version": "2.1.1", "version": "2.1.1",
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz", "resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
@@ -1099,6 +1148,38 @@
"url": "https://opencollective.com/express" "url": "https://opencollective.com/express"
} }
}, },
"node_modules/pdf-parse": {
"version": "2.4.5",
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
"license": "Apache-2.0",
"dependencies": {
"@napi-rs/canvas": "0.1.80",
"pdfjs-dist": "5.4.296"
},
"bin": {
"pdf-parse": "bin/cli.mjs"
},
"engines": {
"node": ">=20.16.0 <21 || >=22.3.0"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/mehmet-kozan"
}
},
"node_modules/pdfjs-dist": {
"version": "5.4.296",
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
"license": "Apache-2.0",
"engines": {
"node": ">=20.16.0 || >=22.3.0"
},
"optionalDependencies": {
"@napi-rs/canvas": "^0.1.80"
}
},
"node_modules/process-nextick-args": { "node_modules/process-nextick-args": {
"version": "2.0.1", "version": "2.0.1",
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz", "resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",

229
package-lock.json generated

@@ -13,7 +13,193 @@
"cors": "^2.8.5", "cors": "^2.8.5",
"docx": "^9.5.1", "docx": "^9.5.1",
"express": "^5.2.1", "express": "^5.2.1",
"iconv-lite": "^0.6.3" "iconv-lite": "^0.6.3",
"nodemailer": "^7.0.11",
"pdf-parse": "^2.4.5"
}
},
"node_modules/@napi-rs/canvas": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas/-/canvas-0.1.80.tgz",
"integrity": "sha512-DxuT1ClnIPts1kQx8FBmkk4BQDTfI5kIzywAaMjQSXfNnra5UFU9PwurXrl+Je3bJ6BGsp/zmshVVFbCmyI+ww==",
"license": "MIT",
"workspaces": [
"e2e/*"
],
"engines": {
"node": ">= 10"
},
"optionalDependencies": {
"@napi-rs/canvas-android-arm64": "0.1.80",
"@napi-rs/canvas-darwin-arm64": "0.1.80",
"@napi-rs/canvas-darwin-x64": "0.1.80",
"@napi-rs/canvas-linux-arm-gnueabihf": "0.1.80",
"@napi-rs/canvas-linux-arm64-gnu": "0.1.80",
"@napi-rs/canvas-linux-arm64-musl": "0.1.80",
"@napi-rs/canvas-linux-riscv64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-gnu": "0.1.80",
"@napi-rs/canvas-linux-x64-musl": "0.1.80",
"@napi-rs/canvas-win32-x64-msvc": "0.1.80"
}
},
"node_modules/@napi-rs/canvas-android-arm64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-android-arm64/-/canvas-android-arm64-0.1.80.tgz",
"integrity": "sha512-sk7xhN/MoXeuExlggf91pNziBxLPVUqF2CAVnB57KLG/pz7+U5TKG8eXdc3pm0d7Od0WreB6ZKLj37sX9muGOQ==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"android"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-darwin-arm64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-arm64/-/canvas-darwin-arm64-0.1.80.tgz",
"integrity": "sha512-O64APRTXRUiAz0P8gErkfEr3lipLJgM6pjATwavZ22ebhjYl/SUbpgM0xcWPQBNMP1n29afAC/Us5PX1vg+JNQ==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-darwin-x64": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-darwin-x64/-/canvas-darwin-x64-0.1.80.tgz",
"integrity": "sha512-FqqSU7qFce0Cp3pwnTjVkKjjOtxMqRe6lmINxpIZYaZNnVI0H5FtsaraZJ36SiTHNjZlUB69/HhxNDT1Aaa9vA==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"darwin"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm-gnueabihf": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm-gnueabihf/-/canvas-linux-arm-gnueabihf-0.1.80.tgz",
"integrity": "sha512-eyWz0ddBDQc7/JbAtY4OtZ5SpK8tR4JsCYEZjCE3dI8pqoWUC8oMwYSBGCYfsx2w47cQgQCgMVRVTFiiO38hHQ==",
"cpu": [
"arm"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-gnu/-/canvas-linux-arm64-gnu-0.1.80.tgz",
"integrity": "sha512-qwA63t8A86bnxhuA/GwOkK3jvb+XTQaTiVML0vAWoHyoZYTjNs7BzoOONDgTnNtr8/yHrq64XXzUoLqDzU+Uuw==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-arm64-musl": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-arm64-musl/-/canvas-linux-arm64-musl-0.1.80.tgz",
"integrity": "sha512-1XbCOz/ymhj24lFaIXtWnwv/6eFHXDrjP0jYkc6iHQ9q8oXKzUX1Lc6bu+wuGiLhGh2GS/2JlfORC5ZcXimRcg==",
"cpu": [
"arm64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-riscv64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-riscv64-gnu/-/canvas-linux-riscv64-gnu-0.1.80.tgz",
"integrity": "sha512-XTzR125w5ZMs0lJcxRlS1K3P5RaZ9RmUsPtd1uGt+EfDyYMu4c6SEROYsxyatbbu/2+lPe7MPHOO/0a0x7L/gw==",
"cpu": [
"riscv64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-x64-gnu": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-gnu/-/canvas-linux-x64-gnu-0.1.80.tgz",
"integrity": "sha512-BeXAmhKg1kX3UCrJsYbdQd3hIMDH/K6HnP/pG2LuITaXhXBiNdh//TVVVVCBbJzVQaV5gK/4ZOCMrQW9mvuTqA==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-linux-x64-musl": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-linux-x64-musl/-/canvas-linux-x64-musl-0.1.80.tgz",
"integrity": "sha512-x0XvZWdHbkgdgucJsRxprX/4o4sEed7qo9rCQA9ugiS9qE2QvP0RIiEugtZhfLH3cyI+jIRFJHV4Fuz+1BHHMg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"linux"
],
"engines": {
"node": ">= 10"
}
},
"node_modules/@napi-rs/canvas-win32-x64-msvc": {
"version": "0.1.80",
"resolved": "https://registry.npmmirror.com/@napi-rs/canvas-win32-x64-msvc/-/canvas-win32-x64-msvc-0.1.80.tgz",
"integrity": "sha512-Z8jPsM6df5V8B1HrCHB05+bDiCxjE9QA//3YrkKIdVDEwn5RKaqOxCJDRJkl48cJbylcrJbW4HxZbTte8juuPg==",
"cpu": [
"x64"
],
"license": "MIT",
"optional": true,
"os": [
"win32"
],
"engines": {
"node": ">= 10"
} }
}, },
"node_modules/@types/node": { "node_modules/@types/node": {
@@ -983,6 +1169,15 @@
"node": ">= 0.6" "node": ">= 0.6"
} }
}, },
"node_modules/nodemailer": {
"version": "7.0.11",
"resolved": "https://registry.npmmirror.com/nodemailer/-/nodemailer-7.0.11.tgz",
"integrity": "sha512-gnXhNRE0FNhD7wPSCGhdNh46Hs6nm+uTyg+Kq0cZukNQiYdnCsoQjodNP9BQVG9XrcK/v6/MgpAPBUFyzh9pvw==",
"license": "MIT-0",
"engines": {
"node": ">=6.0.0"
}
},
"node_modules/nth-check": { "node_modules/nth-check": {
"version": "2.1.1", "version": "2.1.1",
"resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz", "resolved": "https://registry.npmmirror.com/nth-check/-/nth-check-2.1.1.tgz",
@@ -1111,6 +1306,38 @@
"url": "https://opencollective.com/express" "url": "https://opencollective.com/express"
} }
}, },
"node_modules/pdf-parse": {
"version": "2.4.5",
"resolved": "https://registry.npmmirror.com/pdf-parse/-/pdf-parse-2.4.5.tgz",
"integrity": "sha512-mHU89HGh7v+4u2ubfnevJ03lmPgQ5WU4CxAVmTSh/sxVTEDYd1er/dKS/A6vg77NX47KTEoihq8jZBLr8Cxuwg==",
"license": "Apache-2.0",
"dependencies": {
"@napi-rs/canvas": "0.1.80",
"pdfjs-dist": "5.4.296"
},
"bin": {
"pdf-parse": "bin/cli.mjs"
},
"engines": {
"node": ">=20.16.0 <21 || >=22.3.0"
},
"funding": {
"type": "github",
"url": "https://github.com/sponsors/mehmet-kozan"
}
},
"node_modules/pdfjs-dist": {
"version": "5.4.296",
"resolved": "https://registry.npmmirror.com/pdfjs-dist/-/pdfjs-dist-5.4.296.tgz",
"integrity": "sha512-DlOzet0HO7OEnmUmB6wWGJrrdvbyJKftI1bhMitK7O2N8W2gc757yyYBbINy9IDafXAV9wmKr9t7xsTaNKRG5Q==",
"license": "Apache-2.0",
"engines": {
"node": ">=20.16.0 || >=22.3.0"
},
"optionalDependencies": {
"@napi-rs/canvas": "^0.1.80"
}
},
"node_modules/process-nextick-args": { "node_modules/process-nextick-args": {
"version": "2.0.1", "version": "2.0.1",
"resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz", "resolved": "https://registry.npmmirror.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz",


@@ -2,7 +2,7 @@
"name": "gjzx-scraper", "name": "gjzx-scraper",
"version": "1.0.0", "version": "1.0.0",
"type": "module", "type": "module",
"description": "工具:抓取 https://gjzx.nanjing.gov.cn/gggs/ 公示列表信息及详情", "description": "工具:采集 https://gjzx.nanjing.gov.cn/gggs/ 公示列表信息及详情",
"main": "src/server.js", "main": "src/server.js",
"scripts": { "scripts": {
"start": "node src/server.js" "start": "node src/server.js"
@@ -13,6 +13,7 @@
"cors": "^2.8.5", "cors": "^2.8.5",
"docx": "^9.5.1", "docx": "^9.5.1",
"express": "^5.2.1", "express": "^5.2.1",
"iconv-lite": "^0.6.3" "iconv-lite": "^0.6.3",
"nodemailer": "^7.0.11"
} }
} }


@@ -134,11 +134,11 @@ async function fetchDetails() {
listData = await dateRangeResponse.json(); listData = await dateRangeResponse.json();
} else { } else {
// Normal mode: scrape multiple pages by item count // Normal mode: collect multiple pages by item count
const url = document.getElementById('detailUrl').value; const url = document.getElementById('detailUrl').value;
const limit = parseInt(document.getElementById('detailLimit').value); const limit = parseInt(document.getElementById('detailLimit').value);
// Scrape pages until enough items are gathered // Collect pages until enough items are gathered
const allItems = []; const allItems = [];
let page = 1; let page = 1;
const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assumes roughly 10 items per page const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assumes roughly 10 items per page
@@ -177,7 +177,7 @@ async function fetchDetails() {
return; return;
} }
// Scrape details // Collect details
const limit = useDetailDateRange ? listData.data.length : parseInt(document.getElementById('detailLimit').value); const limit = useDetailDateRange ? listData.data.length : parseInt(document.getElementById('detailLimit').value);
const detailResponse = await fetch(`${API_BASE}/details`, { const detailResponse = await fetch(`${API_BASE}/details`, {
method: 'POST', method: 'POST',
@@ -202,7 +202,7 @@ async function fetchDetails() {
function displayDetails(items, container) { function displayDetails(items, container) {
const html = ` const html = `
<div style="max-height: 380px; overflow-y: auto;"> <div style="max-height: 380px; overflow-y: auto;">
<h3 style="margin-bottom: 15px;">抓取${items.length} 条详情</h3> <h3 style="margin-bottom: 15px;">采集${items.length} 条详情</h3>
${items.map((item, index) => ` ${items.map((item, index) => `
<div class="list-item"> <div class="list-item">
<h3>${index + 1}. ${item.title}</h3> <h3>${index + 1}. ${item.title}</h3>
@@ -212,7 +212,7 @@ function displayDetails(items, container) {
${item.detail.budget ? ` ${item.detail.budget ? `
<span class="budget">${item.detail.budget.amount}${item.detail.budget.unit}</span> <span class="budget">${item.detail.budget.amount}${item.detail.budget.unit}</span>
` : '<div class="meta">未找到预算信息</div>'} ` : '<div class="meta">未找到预算信息</div>'}
` : '<div class="error">抓取失败</div>'} ` : '<div class="error">采集失败</div>'}
<br><a href="${item.href}" target="_blank">查看原文 →</a> <br><a href="${item.href}" target="_blank">查看原文 →</a>
</div> </div>
`).join('')} `).join('')}
@@ -271,6 +271,7 @@ async function generateReport() {
currentReport = data.data; currentReport = data.data;
displayReport(data.data, results); displayReport(data.data, results);
exportBtn.style.display = 'inline-block'; exportBtn.style.display = 'inline-block';
document.getElementById('sendEmailBtn').style.display = 'inline-block';
} else { } else {
results.innerHTML = `<div class="error">错误: ${data.error}</div>`; results.innerHTML = `<div class="error">错误: ${data.error}</div>`;
} }
@@ -475,3 +476,197 @@ async function exportReport() {
document.body.removeChild(a); document.body.removeChild(a);
URL.revokeObjectURL(url); URL.revokeObjectURL(url);
} }
// ========== Email features ==========
// Load the email configuration when the page loads
document.addEventListener('DOMContentLoaded', function() {
loadEmailConfig();
});
// Save the email configuration to localStorage
function saveEmailConfig() {
const config = {
smtpHost: document.getElementById('smtpHost').value,
smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
smtpUser: document.getElementById('smtpUser').value,
smtpPass: document.getElementById('smtpPass').value,
recipients: document.getElementById('recipients').value
};
// Validate the configuration
if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
showEmailStatus('请填写所有必填项', 'error');
return;
}
// Persist to localStorage
localStorage.setItem('emailConfig', JSON.stringify(config));
showEmailStatus('邮件配置已保存', 'success');
}
// Load the email configuration from localStorage
function loadEmailConfig() {
const configStr = localStorage.getItem('emailConfig');
if (configStr) {
try {
const config = JSON.parse(configStr);
document.getElementById('smtpHost').value = config.smtpHost || '';
document.getElementById('smtpPort').value = config.smtpPort || 587;
document.getElementById('smtpUser').value = config.smtpUser || '';
document.getElementById('smtpPass').value = config.smtpPass || '';
document.getElementById('recipients').value = config.recipients || '';
} catch (e) {
console.error('加载邮件配置失败:', e);
}
}
}
// Test the email configuration
async function testEmailConfig() {
const config = {
smtpHost: document.getElementById('smtpHost').value,
smtpPort: parseInt(document.getElementById('smtpPort').value) || 587,
smtpUser: document.getElementById('smtpUser').value,
smtpPass: document.getElementById('smtpPass').value,
recipients: document.getElementById('recipients').value
};
// Validate the configuration
if (!config.smtpHost || !config.smtpUser || !config.smtpPass || !config.recipients) {
showEmailStatus('请填写所有必填项', 'error');
return;
}
// Build a test report
const testReport = {
summary: {
total_count: 1,
filtered_count: 1,
threshold: '50万元',
total_amount: '100.00万元',
generated_at: new Date().toISOString()
},
projects: [{
title: '这是一封测试邮件',
date: new Date().toLocaleDateString('zh-CN'),
publish_time: new Date().toLocaleString('zh-CN'),
budget: {
amount: 100,
unit: '万元',
text: '测试金额',
originalUnit: '万元'
},
url: 'https://gjzx.nanjing.gov.cn'
}]
};
showEmailStatus('正在发送测试邮件...', 'info');
try {
const response = await fetch(`${API_BASE}/send-email`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
emailConfig: config,
report: testReport
})
});
const data = await response.json();
if (data.success) {
showEmailStatus('测试邮件发送成功!请检查收件箱', 'success');
} else {
showEmailStatus(`发送失败: ${data.error}`, 'error');
}
} catch (error) {
showEmailStatus(`请求失败: ${error.message}`, 'error');
}
}
// Send the report by email
async function sendReportByEmail() {
if (!currentReport) {
alert('请先生成报告');
return;
}
// Load the email configuration from localStorage
const configStr = localStorage.getItem('emailConfig');
if (!configStr) {
alert('请先在"邮件配置"标签页配置邮件服务器');
return;
}
let emailConfig;
try {
emailConfig = JSON.parse(configStr);
} catch (e) {
alert('邮件配置格式错误,请重新配置');
return;
}
// Validate the configuration
if (!emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass || !emailConfig.recipients) {
alert('邮件配置不完整,请在"邮件配置"标签页检查配置');
return;
}
const sendBtn = document.getElementById('sendEmailBtn');
const originalText = sendBtn.textContent;
sendBtn.disabled = true;
sendBtn.textContent = '正在发送...';
try {
const response = await fetch(`${API_BASE}/send-email`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
emailConfig: emailConfig,
report: currentReport
})
});
const data = await response.json();
if (data.success) {
alert('报告已成功发送到邮箱!');
} else {
alert(`发送失败: ${data.error}`);
}
} catch (error) {
alert(`请求失败: ${error.message}`);
} finally {
sendBtn.disabled = false;
sendBtn.textContent = originalText;
}
}
// Show the email configuration status
function showEmailStatus(message, type) {
const statusDiv = document.getElementById('emailConfigStatus');
const bgColors = {
success: '#d4edda',
error: '#f8d7da',
info: '#d1ecf1'
};
const textColors = {
success: '#155724',
error: '#721c24',
info: '#0c5460'
};
statusDiv.innerHTML = `
<div style="background: ${bgColors[type]}; color: ${textColors[type]}; padding: 15px; border-radius: 8px;">
${message}
</div>
`;
// Hide success messages automatically after 3 seconds
if (type === 'success') {
setTimeout(() => {
statusDiv.innerHTML = '';
}, 3000);
}
}


@@ -3,7 +3,7 @@
<head> <head>
<meta charset="UTF-8"> <meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>南京公共工程建设中心 - 公告抓取工具</title> <title>南京公共工程建设中心 - 公告采集工具</title>
<style> <style>
* { * {
margin: 0; margin: 0;
@@ -335,13 +335,14 @@
<div class="container"> <div class="container">
<div class="header"> <div class="header">
<h1>南京公共工程建设中心</h1> <h1>南京公共工程建设中心</h1>
<p>公告抓取与分析工具</p> <p>公告采集与分析工具</p>
</div> </div>
<div class="tabs"> <div class="tabs">
<button class="tab active" onclick="switchTab('list')">公告列表</button> <button class="tab active" onclick="switchTab('list')">公告列表</button>
<button class="tab" onclick="switchTab('detail')">详情抓取</button> <button class="tab" onclick="switchTab('detail')">详情采集</button>
<button class="tab" onclick="switchTab('report')">生成报告</button> <button class="tab" onclick="switchTab('report')">生成报告</button>
<button class="tab" onclick="switchTab('email')">邮件配置</button>
</div> </div>
<div class="content"> <div class="content">
@@ -359,7 +360,7 @@
<div id="listLoading" class="loading"> <div id="listLoading" class="loading">
<div class="spinner"></div> <div class="spinner"></div>
<p>正在抓取...</p> <p>正在采集...</p>
</div> </div>
<div id="listResults" class="results"></div> <div id="listResults" class="results"></div>
@@ -372,12 +373,12 @@
</div> </div>
</div> </div>
<!-- 详情抓取 --> <!-- 详情采集 -->
<div id="detail" class="tab-content"> <div id="detail" class="tab-content">
<div class="form-group"> <div class="form-group">
<div class="checkbox-wrapper" onclick="document.getElementById('useDetailDateRange').click();"> <div class="checkbox-wrapper" onclick="document.getElementById('useDetailDateRange').click();">
<input type="checkbox" id="useDetailDateRange" onchange="toggleDetailDateRange()" onclick="event.stopPropagation();"> <input type="checkbox" id="useDetailDateRange" onchange="toggleDetailDateRange()" onclick="event.stopPropagation();">
<label for="useDetailDateRange">按时间范围抓取</label> <label for="useDetailDateRange">按时间范围采集</label>
</div> </div>
</div> </div>
@@ -391,7 +392,7 @@
<input type="date" id="detailEndDate"> <input type="date" id="detailEndDate">
</div> </div>
<div class="form-group"> <div class="form-group">
<label>最大抓取页数</label> <label>最大采集页数</label>
<input type="number" id="detailMaxPages" value="1" min="1"> <input type="number" id="detailMaxPages" value="1" min="1">
</div> </div>
</div> </div>
@@ -402,16 +403,16 @@
<input type="text" id="detailUrl" placeholder="默认: https://gjzx.nanjing.gov.cn/gggs/"> <input type="text" id="detailUrl" placeholder="默认: https://gjzx.nanjing.gov.cn/gggs/">
</div> </div>
<div class="form-group"> <div class="form-group">
<label>抓取数量</label> <label>采集数量</label>
<input type="number" id="detailLimit" value="5" min="1" max="50"> <input type="number" id="detailLimit" value="5" min="1" max="50">
</div> </div>
</div> </div>
<button class="btn" onclick="fetchDetails()">开始抓取</button> <button class="btn" onclick="fetchDetails()">开始采集</button>
<div id="detailLoading" class="loading"> <div id="detailLoading" class="loading">
<div class="spinner"></div> <div class="spinner"></div>
<p>正在抓取详情...</p> <p>正在采集详情...</p>
</div> </div>
<div id="detailResults" class="results"></div> <div id="detailResults" class="results"></div>
@@ -422,7 +423,7 @@
<div class="form-group"> <div class="form-group">
<div class="checkbox-wrapper" onclick="document.getElementById('useDateRange').click();"> <div class="checkbox-wrapper" onclick="document.getElementById('useDateRange').click();">
<input type="checkbox" id="useDateRange" onchange="toggleDateRange()" onclick="event.stopPropagation();"> <input type="checkbox" id="useDateRange" onchange="toggleDateRange()" onclick="event.stopPropagation();">
<label for="useDateRange">按时间范围抓取</label> <label for="useDateRange">按时间范围采集</label>
</div> </div>
</div> </div>
@@ -436,7 +437,7 @@
<input type="date" id="endDate"> <input type="date" id="endDate">
</div> </div>
<div class="form-group"> <div class="form-group">
<label>最大抓取页数</label> <label>最大采集页数</label>
<input type="number" id="maxPages" value="1" min="1" > <input type="number" id="maxPages" value="1" min="1" >
</div> </div>
</div> </div>
@@ -447,7 +448,7 @@
<input type="text" id="reportUrl" placeholder="默认: https://gjzx.nanjing.gov.cn/gggs/"> <input type="text" id="reportUrl" placeholder="默认: https://gjzx.nanjing.gov.cn/gggs/">
</div> </div>
<div class="form-group"> <div class="form-group">
<label>抓取数量</label> <label>采集数量</label>
<input type="number" id="reportLimit" value="15" min="1" max="50"> <input type="number" id="reportLimit" value="15" min="1" max="50">
</div> </div>
</div> </div>
@@ -459,6 +460,7 @@
<button class="btn" onclick="generateReport()">生成报告</button> <button class="btn" onclick="generateReport()">生成报告</button>
<button class="btn export-btn" onclick="exportReport()" id="exportBtn" style="display:none;">导出Word</button> <button class="btn export-btn" onclick="exportReport()" id="exportBtn" style="display:none;">导出Word</button>
<button class="btn" onclick="sendReportByEmail()" id="sendEmailBtn" style="display:none; background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">发送邮件</button>
<div id="reportLoading" class="loading"> <div id="reportLoading" class="loading">
<div class="spinner"></div> <div class="spinner"></div>
@@ -467,6 +469,56 @@
<div id="reportResults" class="results"></div> <div id="reportResults" class="results"></div>
</div> </div>
<!-- 邮件配置 -->
<div id="email" class="tab-content">
<h2 style="margin-bottom: 20px; color: #667eea;">邮件配置</h2>
<p style="color: #666; margin-bottom: 20px;">配置SMTP邮件服务器信息,用于发送报告到指定邮箱</p>
<div class="form-group">
<label>SMTP服务器地址 *</label>
<input type="text" id="smtpHost" placeholder="例如: smtp.qq.com, smtp.163.com, smtp.gmail.com">
</div>
<div class="form-group">
<label>SMTP端口 *</label>
<input type="number" id="smtpPort" value="587" placeholder="通常为 587 (TLS) 或 465 (SSL)">
</div>
<div class="form-group">
<label>发件人邮箱 (SMTP用户名) *</label>
<input type="email" id="smtpUser" placeholder="your-email@example.com">
</div>
<div class="form-group">
<label>SMTP密码/授权码 *</label>
<input type="password" id="smtpPass" placeholder="邮箱密码或授权码">
</div>
<div class="form-group">
<label>收件人邮箱 (多个用逗号分隔) *</label>
<input type="text" id="recipients" placeholder="email1@example.com, email2@example.com">
</div>
<button class="btn" onclick="saveEmailConfig()">保存配置</button>
<button class="btn" onclick="testEmailConfig()" style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);">测试连接</button>
<div id="emailConfigStatus" style="margin-top: 20px;"></div>
<div style="margin-top: 30px; padding: 20px; background: #f0f8ff; border-radius: 8px; border-left: 4px solid #667eea;">
<h3 style="margin-top: 0; color: #667eea;">常用邮箱配置参考</h3>
<ul style="line-height: 1.8; color: #666;">
<li><strong>QQ邮箱:</strong> smtp.qq.com, 端口 587 或 465, 需要使用授权码</li>
<li><strong>163邮箱:</strong> smtp.163.com, 端口 465 或 25, 需要使用授权码</li>
<li><strong>Gmail:</strong> smtp.gmail.com, 端口 587 或 465, 需要开启"允许不够安全的应用"</li>
<li><strong>Outlook:</strong> smtp-mail.outlook.com, 端口 587</li>
<li><strong>企业邮箱:</strong> 请咨询您的IT管理员获取SMTP配置</li>
</ul>
<p style="margin: 10px 0 0 0; color: #999; font-size: 13px;">
提示: QQ和163邮箱需要在邮箱设置中开启SMTP服务并生成授权码,授权码不是邮箱密码。
</p>
</div>
</div>
</div> </div>
</div> </div>

213
src/emailService.js Normal file

@@ -0,0 +1,213 @@
import nodemailer from 'nodemailer';
// Email sending service
export async function sendReportEmail(emailConfig, report) {
try {
// Create the SMTP transporter
const transporter = nodemailer.createTransport({
host: emailConfig.smtpHost,
port: emailConfig.smtpPort || 587,
secure: emailConfig.smtpPort === 465, // true for 465, false for other ports
auth: {
user: emailConfig.smtpUser,
pass: emailConfig.smtpPass,
},
});
// Generate the report content as HTML
const htmlContent = generateReportHtml(report);
// Send the email
const info = await transporter.sendMail({
from: `"公告采集系统" <${emailConfig.smtpUser}>`,
to: emailConfig.recipients,
subject: `采购公告分析报告 - ${new Date().toLocaleDateString('zh-CN')}`,
html: htmlContent,
});
return {
success: true,
messageId: info.messageId,
};
} catch (error) {
console.error('发送邮件失败:', error);
throw new Error(`邮件发送失败: ${error.message}`);
}
}
// Generate the report in HTML format
function generateReportHtml(report) {
const { summary, projects } = report;
return `
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>采购公告分析报告</title>
<style>
body {
font-family: 'PingFang SC', 'Microsoft YaHei', Arial, sans-serif;
line-height: 1.6;
color: #333;
max-width: 900px;
margin: 0 auto;
padding: 20px;
background-color: #f5f5f5;
}
.container {
background: white;
border-radius: 8px;
padding: 30px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
h1 {
color: #667eea;
border-bottom: 3px solid #667eea;
padding-bottom: 10px;
margin-bottom: 20px;
}
.summary {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 20px;
border-radius: 8px;
margin-bottom: 30px;
}
.summary h2 {
margin-top: 0;
margin-bottom: 15px;
font-size: 18px;
}
.stat-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 15px;
}
.stat {
background: rgba(255,255,255,0.15);
padding: 12px;
border-radius: 6px;
}
.stat-label {
font-size: 13px;
opacity: 0.9;
margin-bottom: 5px;
}
.stat-value {
font-size: 22px;
font-weight: bold;
}
.project-list {
margin-top: 20px;
}
.project-item {
background: #f9f9f9;
border-left: 4px solid #667eea;
padding: 15px;
margin-bottom: 15px;
border-radius: 4px;
}
.project-item h3 {
color: #333;
margin: 0 0 10px 0;
font-size: 16px;
}
.project-meta {
color: #666;
font-size: 14px;
margin: 5px 0;
}
.budget {
display: inline-block;
background: #667eea;
color: white;
padding: 4px 12px;
border-radius: 4px;
font-weight: bold;
margin-top: 8px;
font-size: 14px;
}
.project-link {
color: #667eea;
text-decoration: none;
font-size: 13px;
word-break: break-all;
}
.footer {
margin-top: 30px;
padding-top: 20px;
border-top: 1px solid #e0e0e0;
color: #999;
font-size: 12px;
text-align: center;
}
</style>
</head>
<body>
<div class="container">
<h1>南京公共工程建设中心 - Procurement Announcement Analysis Report</h1>
<div class="summary">
<h2>Report Summary</h2>
<div class="stat-grid">
<div class="stat">
<div class="stat-label">Total announcements</div>
<div class="stat-value">${summary.total_count}</div>
</div>
<div class="stat">
<div class="stat-label">Matching the filter</div>
<div class="stat-value">${summary.filtered_count}</div>
</div>
<div class="stat">
<div class="stat-label">Amount threshold</div>
<div class="stat-value">${summary.threshold}</div>
</div>
<div class="stat">
<div class="stat-label">Total amount</div>
<div class="stat-value">${summary.total_amount}</div>
</div>
</div>
${summary.date_range ? `
<div style="margin-top: 15px; padding-top: 15px; border-top: 1px solid rgba(255,255,255,0.2);">
<div class="stat-label">Date range</div>
<div style="font-size: 14px; margin-top: 5px;">
${summary.date_range.startDate || 'Any'} to ${summary.date_range.endDate || 'Any'}
</div>
</div>
` : ''}
</div>
<h2>Project Details</h2>
<div class="project-list">
${projects.length === 0 ? '<p style="color: #999; text-align: center; padding: 20px;">No projects match the filter</p>' : ''}
${projects.map((project, index) => `
<div class="project-item">
<h3>${index + 1}. ${project.title}</h3>
<div class="project-meta">
<strong>Publish date:</strong> ${project.date}
${project.publish_time ? ` | <strong>Publish time:</strong> ${project.publish_time}` : ''}
</div>
${project.budget ? `
<div class="budget">
Budget: ${project.budget.amount.toFixed(2)} ${project.budget.unit}
${project.budget.originalUnit !== project.budget.unit ? ` (original unit: ${project.budget.originalUnit})` : ''}
</div>
` : ''}
<div style="margin-top: 10px;">
<a href="${project.url}" class="project-link" target="_blank">${project.url}</a>
</div>
</div>
`).join('')}
</div>
<div class="footer">
<p>Report generated at: ${new Date(summary.generated_at).toLocaleString('zh-CN')}</p>
<p>This report was generated automatically by the Announcement Collection System</p>
</div>
</div>
</body>
</html>
`;
}
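The template interpolates `project.title` and `project.url` straight into the markup. Those values come from scraped pages, so a title containing `<` or `&` could break the layout of the email. A minimal sketch of one way to guard against that follows; `escapeHtml` is a hypothetical helper that does not exist in this commit, and the sample project is made up.

```javascript
// Hypothetical helper (not in the commit): escapes the five characters that
// are significant in HTML text and in double- or single-quoted attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Made-up sample data standing in for a scraped announcement.
const project = {
  title: 'Road <b>repair</b> & "lighting"',
  url: 'https://example.com/?a=1&b=2',
};

const itemHtml = `<h3>${escapeHtml(project.title)}</h3>
<a href="${escapeHtml(project.url)}">${escapeHtml(project.url)}</a>`;
console.log(itemHtml);
```

If this approach were adopted, each `${project.…}` interpolation in the template would be wrapped the same way; numeric fields such as the budget amount would not need it.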


@@ -3,6 +3,7 @@ import cors from 'cors';
 import axios from 'axios';
 import * as cheerio from 'cheerio';
 import iconv from 'iconv-lite';
+import { sendReportEmail } from './emailService.js';
 const app = express();
 const PORT = 3000;
@@ -33,24 +34,24 @@ function isDateInRange(dateStr, startDate, endDate) {
 return true;
 }
-// Scrape the multi-page list by date range
+// Collect the multi-page list by date range
 async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
 const allItems = [];
 let shouldContinue = true;
 let pageIndex = 0;
-console.log(`Starting date-range scrape: ${startDate || 'any'} to ${endDate || 'any'}`);
+console.log(`Starting date-range collection: ${startDate || 'any'} to ${endDate || 'any'}`);
 while (shouldContinue && pageIndex < maxPages) {
 const pageUrl = getPageUrl(pageIndex);
-console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
+console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);
 try {
 const html = await fetchHtml(pageUrl);
 const items = parseList(html);
 if (items.length === 0) {
-console.log(`Page ${pageIndex + 1} has no data; stopping the scrape`);
+console.log(`Page ${pageIndex + 1} has no data; stopping collection`);
 break;
 }
@@ -70,7 +71,7 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
 }
 if (allItemsBeforeRange && startDate) {
-console.log(`All items on page ${pageIndex + 1} are earlier than the start date; stopping the scrape`);
+console.log(`All items on page ${pageIndex + 1} are earlier than the start date; stopping collection`);
 shouldContinue = false;
 }
@@ -82,12 +83,12 @@ async function fetchListByDateRange(startDate, endDate, maxPages = 23) {
 await new Promise(resolve => setTimeout(resolve, 500));
 }
 } catch (err) {
-console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
+console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
 break;
 }
 }
-console.log(`Scraped ${pageIndex} pages in total and found ${allItems.length} matching announcements`);
+console.log(`Collected ${pageIndex} pages in total and found ${allItems.length} matching announcements`);
 return allItems;
 }
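The early-stop rule in `fetchListByDateRange` (stop once every item on a page predates the start date) can be tried out without any network access. The sketch below is a simplified, synchronous stand-in and not the commit's code. It assumes the listing is ordered newest first, that dates are `YYYY-MM-DD` strings, and that pages come from a hypothetical `getPage(index)` callback in place of `fetchHtml` plus `parseList`.

```javascript
// Hypothetical range check: ISO 'YYYY-MM-DD' strings sort correctly as plain strings.
function inRange(date, startDate, endDate) {
  if (startDate && date < startDate) return false;
  if (endDate && date > endDate) return false;
  return true;
}

// getPage(index) is an assumed callback returning an array of { title, date } items.
function collectByDateRange(getPage, startDate, endDate, maxPages = 23) {
  const collected = [];
  for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
    const items = getPage(pageIndex);
    if (items.length === 0) break; // no more pages
    collected.push(...items.filter(item => inRange(item.date, startDate, endDate)));
    // With newest-first ordering, a page that lies entirely before startDate
    // means no later page can contain a match.
    if (startDate && items.every(item => item.date < startDate)) break;
  }
  return collected;
}

// Made-up pages, newest first.
const pages = [
  [{ title: 'A', date: '2025-12-10' }, { title: 'B', date: '2025-12-05' }],
  [{ title: 'C', date: '2025-12-01' }, { title: 'D', date: '2025-11-28' }],
  [{ title: 'E', date: '2025-11-20' }, { title: 'F', date: '2025-11-15' }],
  [{ title: 'G', date: '2025-11-01' }],
];
let pagesRequested = 0;
const result = collectByDateRange(
  index => { pagesRequested += 1; return pages[index] || []; },
  '2025-11-25',
  '2025-12-08'
);
console.log(result.map(item => item.title), pagesRequested);
```

In this sample the fourth page is never requested, because the third page already lies entirely before the start date. The rule depends on the newest-first ordering; if the site ever pins older announcements to the top of a page, stopping early could miss items.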
@@ -207,6 +208,10 @@ function parseDetail(html) {
 }
 function extractBudget(content) {
+// Preprocess the content: strip line breaks (and the whitespace around them) that fall between digits,
+// so that numbers split across lines can be matched, e.g. "1\n1\n0\n9\n0\n0" -> "110900"
+let cleanedContent = content.replace(/(\d)\s*[\n\r]\s*(?=\d)/g, '$1');
 // Amount-matching patterns, defined inline (from highest to lowest priority)
 const patterns = [
 // Priority 1: amounts in 万元 (ten thousand yuan) with a currency symbol
@@ -230,7 +235,7 @@ function extractBudget(content) {
 // Iterate over all patterns and keep the highest-priority match
 for (const pattern of patterns) {
-const match = content.match(pattern.regex);
+const match = cleanedContent.match(pattern.regex);
 if (match && pattern.priority < bestPriority) {
 // Strip commas from the number before converting it
 const numberStr = match[1].replace(/[,]/g, '');
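The preprocessing regex added to `extractBudget` is compact enough to be worth trying on its own. The sketch below wraps the same expression in a hypothetical function named `joinSplitDigits` (the commit applies it inline) and runs it on made-up input.

```javascript
// Same regex as in the commit, wrapped in a hypothetical function for illustration.
// (\d)        a digit, captured so it can be written back
// \s*[\n\r]\s* optional whitespace that contains at least one line break
// (?=\d)      only when another digit follows (lookahead, so that digit is not consumed)
function joinSplitDigits(text) {
  return text.replace(/(\d)\s*[\n\r]\s*(?=\d)/g, '$1');
}

// A number whose digits were extracted one per line.
console.log(joinSplitDigits('Budget: 1\n1\n0\n9\n0\n0 yuan'));

// Digits separated by spaces only are left alone, because no line break is present.
console.log(joinSplitDigits('12 34'));

// Caveat: two separate numbers on adjacent lines are merged as well.
console.log(JSON.stringify(joinSplitDigits('2025\n110900')));
```

The last case is the trade-off of this approach: any digit followed by a line break and another digit is joined, so a year or item number at the end of one line can fuse with an amount at the start of the next.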
@@ -329,21 +334,21 @@ app.post('/api/report', async (req, res) => {
 const { limit = 15, threshold = 50, url } = req.body;
 const targetUrl = url && url.trim() !== '' ? url : BASE_URL;
-// Scrape as many pages as needed to get enough data
+// Collect as many pages as needed to get enough data
 const items = [];
 let pageIndex = 0;
 const maxPagesToFetch = Math.ceil(limit / 10) + 1; // assume about 10 items per page; fetch one extra page to be safe
 while (items.length < limit && pageIndex < maxPagesToFetch) {
 const pageUrl = getPageUrl(pageIndex, targetUrl);
-console.log(`Scraping page ${pageIndex + 1}: ${pageUrl}`);
+console.log(`Collecting page ${pageIndex + 1}: ${pageUrl}`);
 try {
 const html = await fetchHtml(pageUrl);
 const pageItems = parseList(html);
 if (pageItems.length === 0) {
-console.log(`Page ${pageIndex + 1} has no data; stopping the scrape`);
+console.log(`Page ${pageIndex + 1} has no data; stopping collection`);
 break;
 }
@@ -354,7 +359,7 @@ app.post('/api/report', async (req, res) => {
 await new Promise(resolve => setTimeout(resolve, 500));
 }
 } catch (err) {
-console.error(`Failed to scrape page ${pageIndex + 1}: ${err.message}`);
+console.error(`Failed to collect page ${pageIndex + 1}: ${err.message}`);
 break;
 }
 }
@@ -417,7 +422,7 @@ app.post('/api/report-daterange', async (req, res) => {
 try {
 const { startDate, endDate, threshold = 50, maxPages = 23 } = req.body;
-// Scrape the list by date range
+// Collect the list by date range
 const items = await fetchListByDateRange(startDate, endDate, maxPages);
 if (items.length === 0) {
@@ -437,7 +442,7 @@ app.post('/api/report-daterange', async (req, res) => {
 });
 }
-// Scrape details
+// Collect details
 const results = [];
 for (const item of items) {
 try {
@@ -491,6 +496,50 @@ app.post('/api/report-daterange', async (req, res) => {
 }
 });
+// Send the report by email
+app.post('/api/send-email', async (req, res) => {
+try {
+const { emailConfig, report } = req.body;
+// Validate the required configuration parameters
+if (!emailConfig || !emailConfig.smtpHost || !emailConfig.smtpUser || !emailConfig.smtpPass) {
+return res.status(400).json({
+success: false,
+error: 'Incomplete email configuration: please provide the SMTP host, username, and password',
+});
+}
+if (!emailConfig.recipients || emailConfig.recipients.trim() === '') {
+return res.status(400).json({
+success: false,
+error: 'Please specify at least one recipient',
+});
+}
+if (!report) {
+return res.status(400).json({
+success: false,
+error: 'No report data to send',
+});
+}
+// Send the email
+const result = await sendReportEmail(emailConfig, report);
+res.json({
+success: true,
+message: 'Email sent successfully',
+messageId: result.messageId,
+});
+} catch (error) {
+console.error('Send-email API error:', error);
+res.status(500).json({
+success: false,
+error: error.message,
+});
+}
+});
 app.listen(PORT, () => {
 console.log(`Server running at http://localhost:${PORT}`);
 });
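The `/api/send-email` handler calls `.trim()` on `emailConfig.recipients`, which assumes the client always sends a string. If a client sent an array instead, that call would throw and the request would fall into the 500 branch rather than the 400 one. A minimal sketch of a more forgiving normalization step follows; `normalizeRecipients` is a hypothetical helper, not part of this commit, and splitting on commas and semicolons is an assumption about how users type address lists.

```javascript
// Hypothetical helper (not in the commit): accepts a string or an array and
// returns a clean array of non-empty, trimmed addresses.
function normalizeRecipients(recipients) {
  const list = Array.isArray(recipients)
    ? recipients
    : String(recipients ?? '').split(/[,;]/); // null/undefined become an empty string
  return list
    .map(entry => String(entry).trim())
    .filter(entry => entry !== '');
}

console.log(normalizeRecipients(' a@example.com , b@example.com '));
console.log(normalizeRecipients(['a@example.com', '  ']));
console.log(normalizeRecipients(undefined));
```

With a helper like this, the handler could reject the request when the returned array is empty and pass the array on as the `to` field, which Nodemailer accepts either as a comma-separated string or as an array of addresses.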