Simple parsing of WeChat Official Account articles

32F  2018/12/19 php



Here are the basic steps for parsing a WeChat Official Account article:

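Fetch the HTML
The element with id=js_content holds the article body
Extract the images and replace them with local copies (some are img tags, others are background-image styles, so two passes are enough); the code that uploads images locally is omitted here, handle it yourself
The cover image and description (needed when sharing to Moments) are in msg_cdn_url and msg_desc respectively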
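
The upload step is left out of this post. As a rough sketch of what such a helper might look like, assuming images are simply saved under a local directory: wx_image_ext and save_remote_image are hypothetical names, not the author's code, and the md5-based file name is an assumption as well.

```php
<?php
// Hypothetical helpers: names, return values and storage layout are assumptions.

/**
 * Guess a file extension from the wx_fmt query parameter of an image URL.
 * Falls back to $default when the parameter is missing or looks unsafe.
 */
function wx_image_ext(string $url, string $default = 'jpg'): string
{
    // URLs scraped out of HTML may still contain &amp; entities.
    $url = html_entity_decode($url);
    $query = (string) parse_url($url, PHP_URL_QUERY);
    parse_str($query, $params);

    $fmt = $params['wx_fmt'] ?? '';
    $fmt = is_string($fmt) ? strtolower($fmt) : '';
    if (!preg_match('/^[a-z0-9]+$/', $fmt)) {
        return $default;
    }
    return $fmt === 'jpeg' ? 'jpg' : $fmt;
}

/**
 * Download a remote image into $dir and return the stored file name,
 * or null when the download or the write fails.
 */
function save_remote_image(string $url, string $dir): ?string
{
    $data = @file_get_contents(html_entity_decode($url));
    if ($data === false) {
        return null;
    }
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return null;
    }
    $name = md5($url) . '.' . wx_image_ext($url);
    return file_put_contents($dir . '/' . $name, $data) === false ? null : $name;
}
```

The upload method called in the code in this post takes the URL and the extension and returns the new image address, so a real implementation would still need to turn the stored file name into a public URL.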

Here is the code:
            // $link is the URL of an Official Account article
            $html = file_get_contents($link);

            // Title
            preg_match_all("/id=\"activity-name\">(.*)<\/h2/iUs",$html,$title,PREG_PATTERN_ORDER);
            $title = trim($title[1][0] ?? '');

            // Body: everything from id="js_content" up to the next <script
            preg_match_all("/id=\"js_content\">(.*)<script/iUs",$html,$content,PREG_PATTERN_ORDER);
            $content = "<p id='js_content'>".($content[1][0] ?? '');
            $content = str_replace("preview.html","player.html",$content);
            $content = str_replace("  ","",$content);

            // Process all images
            // 1. This pass mainly handles background images on section elements
            preg_match_all("/background-image: url[\(][\"](.*)[\"][\)]/iUs",$content,$url,PREG_PATTERN_ORDER);

            $url = $url[1];
            if($url){
                foreach ($url as $u){
                    $oldU = $u;
                    // The extension comes from the wx_fmt query parameter (letters and digits only)
                    preg_match_all("/wx_fmt=([a-z0-9]+)/i",$oldU,$oldUe,PREG_PATTERN_ORDER);
                    $newUe = $oldUe[1][0] ?? '';
                    if($oldU && $newUe){
                        $newU = $this->upload($oldU,$newUe);
                        $content = str_replace($oldU,$newU,$content);
                    }
                }
            }
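            // 2. Most images are regular img tags
            preg_match_all("/<img (.*)\>/iUs",$content,$images,PREG_PATTERN_ORDER);
            $images = $images[1];
            if($images){
                foreach ($images as $i){
                    // The image URL is read from the data-src attribute
                    preg_match_all("/data-src=[\"](.*)[\"]/iUs",$i,$src,PREG_PATTERN_ORDER);
                    if(empty($src[1][0])){
                        continue; // no data-src on this tag, leave it untouched
                    }

                    $oldSrc = $src[1][0];
                    // wx_fmt sits in the query string, so the pattern must not be anchored with ^
                    preg_match_all("/wx_fmt=([a-z0-9]+)/i",$oldSrc,$oldExt,PREG_PATTERN_ORDER);
                    $newExt = $oldExt[1][0] ?? '';

                    $newSrc = $this->upload($oldSrc,$newExt);
                    $oldImg = "<img ".$i.">";
                    $newImg = "<img src='{$newSrc}' />";

                    $content = str_replace($oldImg,$newImg,$content);
                }
            }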

            // Cover image
            preg_match_all("/var msg_cdn_url = [\"](.*)[\"];/iUs",$html,$image,PREG_PATTERN_ORDER);
            $image = (string)($image[1][0] ?? '');
            preg_match_all("/wx_fmt=([a-z0-9]+)/i",$image,$ext,PREG_PATTERN_ORDER);
            $ext = $ext[1][0] ?? '';
            $url = $this->upload($image,$ext);
            // Description
            preg_match_all("/var msg_desc = [\"](.*)[\"];/iUs",$html,$desc,PREG_PATTERN_ORDER);
            $desc = $desc[1][0] ?? '';


            if($html){
                $gzh = [
                    'type'=>1,
                    'title'=>$title,
                    'openid'=>$_W['openid'],
                    'html'=>$content,
                    'desc'=>$desc,
                    'image'=>$url
                ];

                pdo_insert('article',$gzh);
            }
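
The regular expressions above depend on exact attribute order and quoting. As an alternative sketch, assuming PHP's DOM extension is available, both kinds of image URL could be collected with DOMDocument instead. collect_image_urls is a hypothetical helper, not part of the original code, and it only gathers the URLs; replacing them in the content is still up to you.

```php
<?php
/**
 * Collect image URLs from an article body fragment:
 * data-src (or src) of img tags, plus url(...) values in inline styles.
 * Hypothetical helper for illustration only.
 */
function collect_image_urls(string $fragment): array
{
    $doc = new DOMDocument();
    // The XML encoding hint makes libxml read the input as UTF-8;
    // @ hides warnings about HTML5 tags that libxml does not know.
    @$doc->loadHTML('<?xml encoding="UTF-8">' . $fragment);
    $xpath = new DOMXPath($doc);
    $urls = [];

    // Pass 1: img tags, preferring data-src over src.
    foreach ($xpath->query('//img') as $img) {
        $src = $img->getAttribute('data-src') ?: $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    // Pass 2: any element whose inline style contains url(...).
    $pattern = '/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i';
    foreach ($xpath->query('//*[contains(@style, "url(")]') as $node) {
        if (preg_match_all($pattern, $node->getAttribute('style'), $m)) {
            $urls = array_merge($urls, $m[1]);
        }
    }

    // Drop duplicates and reindex.
    return array_values(array_unique($urls));
}
```

Each returned URL can then be passed to the upload step and swapped into the content with str_replace, as in the loops above.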
