需求分析
产品经理召集开会,文章审核功能已经交付了,文章也能正常发布审核。对于上次提出的自管理敏感词也很满意,这次会议核心的内容如下:
- 文章中包含的图片要识别文字,过滤掉图片文字的敏感词
8.2)图片文字识别
什么是OCR?
OCR (Optical Character Recognition,光学字符识别)是指电子设备(例如扫描仪或数码相机)检查纸上打印的字符,通过检测暗、亮的模式确定其形状,然后用字符识别方法将形状翻译成计算机文字的过程
| 方案 | 说明 |
|---|---|
| 百度OCR | 收费 |
| Tesseract-OCR | Google维护的开源OCR引擎,支持Java,Python等语言调用 |
| Tess4J | 封装了Tesseract-OCR ,支持Java调用 |
8.3)Tess4j案例
①:创建项目导入tess4j对应的依赖
<dependency><groupId>net.sourceforge.tess4j</groupId><artifactId>tess4j</artifactId><version>4.1.1</version></dependency>
②:导入中文字体库, 把资料中的tessdata文件夹拷贝到自己的工作空间下

③:编写测试类进行测试
package com.heima.tess4j;import net.sourceforge.tess4j.ITesseract;import net.sourceforge.tess4j.Tesseract;import java.io.File;public class Application {public static void main(String[] args) {try {//获取本地图片File file = new File("D:\\26.png");//创建Tesseract对象ITesseract tesseract = new Tesseract();//设置字体库路径tesseract.setDatapath("D:\\workspace\\tessdata");//中文识别tesseract.setLanguage("chi_sim");//执行ocr识别String result = tesseract.doOCR(file);//替换回车和tal键 使结果为一行result = result.replaceAll("\\r|\\n","-").replaceAll(" ","");System.out.println("识别的结果为:"+result);} catch (Exception e) {e.printStackTrace();}}}
8.4)管理敏感词和图片文字识别集成到文章审核
①:在heima-leadnews-common中创建工具类,简单封装一下tess4j
需要先导入pom
<dependency><groupId>net.sourceforge.tess4j</groupId><artifactId>tess4j</artifactId><version>4.1.1</version></dependency>
工具类
package com.heima.common.tess4j;import lombok.Getter;import lombok.Setter;import net.sourceforge.tess4j.ITesseract;import net.sourceforge.tess4j.Tesseract;import net.sourceforge.tess4j.TesseractException;import org.springframework.boot.context.properties.ConfigurationProperties;import org.springframework.stereotype.Component;import java.awt.image.BufferedImage;@Getter@Setter@Component@ConfigurationProperties(prefix = "tess4j")public class Tess4jClient {private String dataPath;private String language;public String doOCR(BufferedImage image) throws TesseractException {//创建Tesseract对象ITesseract tesseract = new Tesseract();//设置字体库路径tesseract.setDatapath(dataPath);//中文识别tesseract.setLanguage(language);//执行ocr识别String result = tesseract.doOCR(image);//替换回车和tal键 使结果为一行result = result.replaceAll("\\r|\\n", "-").replaceAll(" ", "");return result;}}
在spring.factories配置中添加该类,完整如下:
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\com.heima.common.exception.ExceptionCatch,\com.heima.common.swagger.SwaggerConfiguration,\com.heima.common.swagger.Swagger2Configuration,\com.heima.common.aliyun.GreenTextScan,\com.heima.common.aliyun.GreenImageScan,\com.heima.common.tess4j.Tess4jClient
②:在heima-leadnews-wemedia中的配置中添加两个属性
tess4j:data-path: D:\workspace\tessdatalanguage: chi_sim
③:在WmNewsAutoScanServiceImpl中的handleImageScan方法上添加如下代码
try {for (String image : images) {byte[] bytes = fileStorageService.downLoadFile(image);//图片识别文字审核---begin-----//从byte[]转换为butteredImageByteArrayInputStream in = new ByteArrayInputStream(bytes);BufferedImage imageFile = ImageIO.read(in);//识别图片的文字String result = tess4jClient.doOCR(imageFile);//审核是否包含自管理的敏感词boolean isSensitive = handleSensitiveScan(result, wmNews);if(!isSensitive){return isSensitive;}//图片识别文字审核---end-----imageList.add(bytes);}}catch (Exception e){e.printStackTrace();}
最后附上文章审核的完整代码如下:
package com.heima.wemedia.service.impl;import com.alibaba.fastjson.JSONArray;import com.baomidou.mybatisplus.core.toolkit.Wrappers;import com.heima.apis.article.IArticleClient;import com.heima.common.aliyun.GreenImageScan;import com.heima.common.aliyun.GreenTextScan;import com.heima.common.tess4j.Tess4jClient;import com.heima.file.service.FileStorageService;import com.heima.model.article.dtos.ArticleDto;import com.heima.model.common.dtos.ResponseResult;import com.heima.model.wemedia.pojos.WmChannel;import com.heima.model.wemedia.pojos.WmNews;import com.heima.model.wemedia.pojos.WmSensitive;import com.heima.model.wemedia.pojos.WmUser;import com.heima.utils.common.SensitiveWordUtil;import com.heima.wemedia.mapper.WmChannelMapper;import com.heima.wemedia.mapper.WmNewsMapper;import com.heima.wemedia.mapper.WmSensitiveMapper;import com.heima.wemedia.mapper.WmUserMapper;import com.heima.wemedia.service.WmNewsAutoScanService;import lombok.extern.slf4j.Slf4j;import org.apache.commons.lang3.StringUtils;import org.springframework.beans.BeanUtils;import org.springframework.beans.factory.annotation.Autowired;import org.springframework.scheduling.annotation.Async;import org.springframework.stereotype.Service;import org.springframework.transaction.annotation.Transactional;import javax.imageio.ImageIO;import java.awt.image.BufferedImage;import java.io.ByteArrayInputStream;import java.util.*;import java.util.stream.Collectors;@Service@Slf4j@Transactionalpublic class WmNewsAutoScanServiceImpl implements WmNewsAutoScanService {@Autowiredprivate WmNewsMapper wmNewsMapper;/*** 自媒体文章审核** @param id 自媒体文章id*/@Override@Async //标明当前方法是一个异步方法public void autoScanWmNews(Integer id) {// int a = 1/0;//1.查询自媒体文章WmNews wmNews = wmNewsMapper.selectById(id);if (wmNews == null) {throw new RuntimeException("WmNewsAutoScanServiceImpl-文章不存在");}if (wmNews.getStatus().equals(WmNews.Status.SUBMIT.getCode())) {//从内容中提取纯文本内容和图片Map<String, Object> textAndImages = handleTextAndImages(wmNews);//自管理的敏感词过滤boolean isSensitive = handleSensitiveScan((String) textAndImages.get("content"), wmNews);if(!isSensitive) return;//2.审核文本内容 阿里云接口boolean isTextScan = handleTextScan((String) textAndImages.get("content"), wmNews);if (!isTextScan) return;//3.审核图片 阿里云接口boolean isImageScan = handleImageScan((List<String>) textAndImages.get("images"), wmNews);if (!isImageScan) return;//4.审核成功,保存app端的相关的文章数据ResponseResult responseResult = saveAppArticle(wmNews);if (!responseResult.getCode().equals(200)) {throw new RuntimeException("WmNewsAutoScanServiceImpl-文章审核,保存app端相关文章数据失败");}//回填article_idwmNews.setArticleId((Long) responseResult.getData());updateWmNews(wmNews, (short) 9, "审核成功");}}@Autowiredprivate WmSensitiveMapper wmSensitiveMapper;/*** 自管理的敏感词审核* @param content* @param wmNews* @return*/private boolean handleSensitiveScan(String content, WmNews wmNews) {boolean flag = true;//获取所有的敏感词List<WmSensitive> wmSensitives = wmSensitiveMapper.selectList(Wrappers.<WmSensitive>lambdaQuery().select(WmSensitive::getSensitives));List<String> sensitiveList = wmSensitives.stream().map(WmSensitive::getSensitives).collect(Collectors.toList());//初始化敏感词库SensitiveWordUtil.initMap(sensitiveList);//查看文章中是否包含敏感词Map<String, Integer> map = SensitiveWordUtil.matchWords(content);if(map.size() >0){updateWmNews(wmNews,(short) 2,"当前文章中存在违规内容"+map);flag = false;}return flag;}@Autowiredprivate IArticleClient articleClient;@Autowiredprivate WmChannelMapper wmChannelMapper;@Autowiredprivate WmUserMapper wmUserMapper;/*** 保存app端相关的文章数据** @param wmNews*/private ResponseResult saveAppArticle(WmNews wmNews) {ArticleDto dto = new ArticleDto();//属性的拷贝BeanUtils.copyProperties(wmNews, dto);//文章的布局dto.setLayout(wmNews.getType());//频道WmChannel wmChannel = wmChannelMapper.selectById(wmNews.getChannelId());if (wmChannel != null) {dto.setChannelName(wmChannel.getName());}//作者dto.setAuthorId(wmNews.getUserId().longValue());WmUser wmUser = wmUserMapper.selectById(wmNews.getUserId());if (wmUser != null) {dto.setAuthorName(wmUser.getName());}//设置文章idif (wmNews.getArticleId() != null) {dto.setId(wmNews.getArticleId());}dto.setCreatedTime(new Date());ResponseResult responseResult = articleClient.saveArticle(dto);return responseResult;}@Autowiredprivate FileStorageService fileStorageService;@Autowiredprivate GreenImageScan greenImageScan;@Autowiredprivate Tess4jClient tess4jClient;/*** 审核图片** @param images* @param wmNews* @return*/private boolean handleImageScan(List<String> images, WmNews wmNews) {boolean flag = true;if (images == null || images.size() == 0) {return flag;}//下载图片 minIO//图片去重images = images.stream().distinct().collect(Collectors.toList());List<byte[]> imageList = new ArrayList<>();try {for (String image : images) {byte[] bytes = fileStorageService.downLoadFile(image);//图片识别文字审核---begin-----//从byte[]转换为butteredImageByteArrayInputStream in = new ByteArrayInputStream(bytes);BufferedImage imageFile = ImageIO.read(in);//识别图片的文字String result = tess4jClient.doOCR(imageFile);//审核是否包含自管理的敏感词boolean isSensitive = handleSensitiveScan(result, wmNews);if(!isSensitive){return isSensitive;}//图片识别文字审核---end-----imageList.add(bytes);}}catch (Exception e){e.printStackTrace();}//审核图片try {Map map = greenImageScan.imageScan(imageList);if (map != null) {//审核失败if (map.get("suggestion").equals("block")) {flag = false;updateWmNews(wmNews, (short) 2, "当前文章中存在违规内容");}//不确定信息 需要人工审核if (map.get("suggestion").equals("review")) {flag = false;updateWmNews(wmNews, (short) 3, "当前文章中存在不确定内容");}}} catch (Exception e) {flag = false;e.printStackTrace();}return flag;}@Autowiredprivate GreenTextScan greenTextScan;/*** 审核纯文本内容** @param content* @param wmNews* @return*/private boolean handleTextScan(String content, WmNews wmNews) {boolean flag = true;if ((wmNews.getTitle() + "-" + content).length() == 0) {return flag;}try {Map map = greenTextScan.greeTextScan((wmNews.getTitle() + "-" + content));if (map != null) {//审核失败if (map.get("suggestion").equals("block")) {flag = false;updateWmNews(wmNews, (short) 2, "当前文章中存在违规内容");}//不确定信息 需要人工审核if (map.get("suggestion").equals("review")) {flag = false;updateWmNews(wmNews, (short) 3, "当前文章中存在不确定内容");}}} catch (Exception e) {flag = false;e.printStackTrace();}return flag;}/*** 修改文章内容** @param wmNews* @param status* @param reason*/private void updateWmNews(WmNews wmNews, short status, String reason) {wmNews.setStatus(status);wmNews.setReason(reason);wmNewsMapper.updateById(wmNews);}/*** 1。从自媒体文章的内容中提取文本和图片* 2.提取文章的封面图片** @param wmNews* @return*/private Map<String, Object> handleTextAndImages(WmNews wmNews) {//存储纯文本内容StringBuilder stringBuilder = new StringBuilder();List<String> images = new ArrayList<>();//1。从自媒体文章的内容中提取文本和图片if (StringUtils.isNotBlank(wmNews.getContent())) {List<Map> maps = JSONArray.parseArray(wmNews.getContent(), Map.class);for (Map map : maps) {if (map.get("type").equals("text")) {stringBuilder.append(map.get("value"));}if (map.get("type").equals("image")) {images.add((String) map.get("value"));}}}//2.提取文章的封面图片if (StringUtils.isNotBlank(wmNews.getImages())) {String[] split = wmNews.getImages().split(",");images.addAll(Arrays.asList(split));}Map<String, Object> resultMap = new HashMap<>();resultMap.put("content", stringBuilder.toString());resultMap.put("images", images);return resultMap;}}
9)文章详情-静态文件生成
9.1)思路分析
文章端创建app相关文章时,生成文章详情静态页上传到MinIO中

9.2)实现步骤
1.新建ArticleFreemarkerService创建静态文件并上传到minIO中
package com.heima.article.service;import com.heima.model.article.pojos.ApArticle;public interface ArticleFreemarkerService {/*** 生成静态文件上传到minIO中* @param apArticle* @param content*/public void buildArticleToMinIO(ApArticle apArticle,String content);}
实现
package com.heima.article.service.impl;import com.alibaba.fastjson.JSON;import com.alibaba.fastjson.JSONArray;import com.baomidou.mybatisplus.core.toolkit.Wrappers;import com.heima.article.mapper.ApArticleContentMapper;import com.heima.article.service.ApArticleService;import com.heima.article.service.ArticleFreemarkerService;import com.heima.file.service.FileStorageService;import com.heima.model.article.pojos.ApArticle;import freemarker.template.Configuration;import freemarker.template.Template;import lombok.extern.slf4j.Slf4j;import org.apache.commons.lang3.StringUtils;import org.springframework.beans.BeanUtils;import org.springframework.beans.factory.annotation.Autowired;import org.springframework.scheduling.annotation.Async;import org.springframework.stereotype.Service;import org.springframework.transaction.annotation.Transactional;import java.io.ByteArrayInputStream;import java.io.InputStream;import java.io.StringWriter;import java.util.HashMap;import java.util.Map;@Service@Slf4j@Transactionalpublic class ArticleFreemarkerServiceImpl implements ArticleFreemarkerService {@Autowiredprivate ApArticleContentMapper apArticleContentMapper;@Autowiredprivate Configuration configuration;@Autowiredprivate FileStorageService fileStorageService;@Autowiredprivate ApArticleService apArticleService;/*** 生成静态文件上传到minIO中* @param apArticle* @param content*/@Async@Overridepublic void buildArticleToMinIO(ApArticle apArticle, String content) {//已知文章的id//4.1 获取文章内容if(StringUtils.isNotBlank(content)){//4.2 文章内容通过freemarker生成html文件Template template = null;StringWriter out = new StringWriter();try {template = configuration.getTemplate("article.ftl");//数据模型Map<String,Object> contentDataModel = new HashMap<>();contentDataModel.put("content", JSONArray.parseArray(content));//合成template.process(contentDataModel,out);} catch (Exception e) {e.printStackTrace();}//4.3 把html文件上传到minio中InputStream in = new ByteArrayInputStream(out.toString().getBytes());String path = fileStorageService.uploadHtmlFile("", apArticle.getId() + ".html", in);//4.4 修改ap_article表,保存static_url字段apArticleService.update(Wrappers.<ApArticle>lambdaUpdate().eq(ApArticle::getId,apArticle.getId()).set(ApArticle::getStaticUrl,path));}}}
2.在ApArticleService的saveArticle实现方法中添加调用生成文件的方法
/*** 保存app端相关文章* @param dto* @return*/@Overridepublic ResponseResult saveArticle(ArticleDto dto) {// try {// Thread.sleep(3000);// } catch (InterruptedException e) {// e.printStackTrace();// }//1.检查参数if(dto == null){return ResponseResult.errorResult(AppHttpCodeEnum.PARAM_INVALID);}ApArticle apArticle = new ApArticle();BeanUtils.copyProperties(dto,apArticle);//2.判断是否存在idif(dto.getId() == null){//2.1 不存在id 保存 文章 文章配置 文章内容//保存文章save(apArticle);//保存配置ApArticleConfig apArticleConfig = new ApArticleConfig(apArticle.getId());apArticleConfigMapper.insert(apArticleConfig);//保存 文章内容ApArticleContent apArticleContent = new ApArticleContent();apArticleContent.setArticleId(apArticle.getId());apArticleContent.setContent(dto.getContent());apArticleContentMapper.insert(apArticleContent);}else {//2.2 存在id 修改 文章 文章内容//修改 文章updateById(apArticle);//修改文章内容ApArticleContent apArticleContent = apArticleContentMapper.selectOne(Wrappers.<ApArticleContent>lambdaQuery().eq(ApArticleContent::getArticleId, dto.getId()));apArticleContent.setContent(dto.getContent());apArticleContentMapper.updateById(apArticleContent);}//异步调用 生成静态文件上传到minio中articleFreemarkerService.buildArticleToMinIO(apArticle,dto.getContent());//3.结果返回 文章的idreturn ResponseResult.okResult(apArticle.getId());}
3.文章微服务开启异步调用

