Home Made WordPress OpenGraph hacking

Nowadays, with the HalfLife of Facebook and Google+ taking over our reality, Open Graph tags in our blog posts are becoming a necessity. Several WordPress plugins try to help by automatically populating these tags from the content of the post. I tried several of them, mostly SEO by Yoast and wpSSO, but also some others. All of them suck. They suck for the simple reason that they try to work for each and everyone, and thus just barely work. But there is a simpler way: just add the Open Graph tags yourself. With a bit of code and a bit of help from other blog posts, I have finally put together a simpler way to have OG tags in my posts, without using those bloated monsters of SEO optimization stuff.

OpenGraph-homemade

Things that simply didn’t work out in the case of my blog are:

  • Excerpt generation – broken in most SEO plugins with respect to Japanese text, and shortcode handling
  • Image extraction – broken due to the usage of PiwigoPress shortcodes

These were the things I wanted to change, and digging through the internet I found this useful blog. I didn’t follow all of the steps there, and changed quite a few things, but the basic idea was taken from there.

Since I am already using a child theme, it was an easy thing to do. Here is an outline of the changes:

  • change header.php to add the various og:... tags
  • add some functions to extract the excerpt and images to functions.php

The first part is easy, namely adding stuff to the header. I copied the original misty-lake header.php to my child theme, and added the following code after the wp_head() call:

<?php if (have_posts()):while(have_posts()):the_post(); endwhile; endif;?>  
<!-- if page is content page -->  
<?php if (is_single()) { ?>  
<meta property="og:url" content="<?php 
  $permalink = get_permalink() ; echo str_ireplace('https://','http://', $permalink) ?>" />  
<meta property="og:title" content="<?php echo esc_attr(single_post_title('', false)); ?>" />
<meta property="og:description" content="<?php echo htmlspecialchars(strip_tags(get_the_excerpt($post->ID))); ?>" />  
<meta property="og:type" content="article" />  
<?php if (function_exists('catch_all_images')) {
  $imgs = catch_all_images();
  foreach ($imgs as $gotone) {
    echo '<meta property="og:image" content="', esc_url($gotone), '" />', "\n";
  }
} ?>
 
<!-- if page is others -->  
<?php } else { ?>  
<meta property="og:url" content="http://www.preining.info/blog/" />  
<meta property="og:title" content="<?php bloginfo('name'); ?>" />  
<meta property="og:site_name" content="<?php bloginfo('name'); ?>" />  
<meta property="og:description" content="<?php bloginfo('description'); ?>" />  
<meta property="og:type" content="website" />  
<meta property="og:image" content="http://www.preining.info/front-page.png" />
<meta property="og:image:secure_url" content="https://www.preining.info/front-page.png" />
<?php } ?>

Let’s go through it line by line: Line 1 is some magic I don’t fully understand. As far as I can tell, it runs the mysterious “The Loop” once so that the global $post is set up before we call functions like get_the_excerpt outside of the regular loop, but don’t ask me for details.

Lines 3-14 concern single post views, while lines 18-24 cover all other views, that is archives, the front page, etc.

For single posts we add the URL in lines 4-5. Here I work around the https/http duality (I only edit over https) by replacing the https part of the URL with http.

Line 6 adds the og:title tag by using the WordPress function single_post_title.

Line 7 adds the excerpt as the og:description. Here we strip tags and escape HTML characters since it is expected to be plain text. Note that further down we will define some filters on get_the_excerpt in functions.php.

Line 8 adds the og:type tag.

Lines 9-14 scrape all available images from the post using the self-written function catch_all_images (see below) and add all of them as og:image tags.

If the page is not a single post view, then we add the various og:... tags in a straightforward way in lines 18-24.
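Since every tag in the header is printed by hand, each value has to be escaped on its own. Here is a minimal sketch of how that could be centralized in plain PHP. The helper name og_meta_tag is made up for this illustration; it is neither a WordPress function nor part of my theme.

```php
<?php
// Hypothetical helper (not a WordPress function): build one Open Graph
// meta tag with the content escaped for use inside an HTML attribute.
function og_meta_tag(string $property, string $content): string {
    // Drop any markup and surrounding whitespace first ...
    $clean = trim(strip_tags($content));
    // ... then escape quotes, ampersands and angle brackets.
    return sprintf(
        '<meta property="%s" content="%s" />',
        htmlspecialchars($property, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($clean, ENT_QUOTES, 'UTF-8')
    );
}

echo og_meta_tag('og:title', 'Tom & "Jerry"'), "\n";
// prints: <meta property="og:title" content="Tom &amp; &quot;Jerry&quot;" />
```

In the header one could then write something like echo og_meta_tag('og:type', 'article'); instead of typing the tag out each time.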

Now for the two changes mentioned above, namely filters for the excerpt and the scraping of images from the post. These are added in functions.php; here is the relevant part, first for the excerpt handling. This is based on another blog post I found. The problem with the normal excerpt handling is that everything belonging to a shortcode of the form [foo]text[/foo] is removed completely, including the text between the tags. Since I use the page-to-page shortcode plugin to link between my posts, this is not optimal, as the text between the p2p shortcodes is part of the sentences. So I want to keep that. On the other hand, I want the shortcodes of PiwigoPress (included images) completely stripped.

// to deal with short codes in the excerpt
// https://wordpress.org/support/topic/stripping-shortcodes-keeping-the-content
function custom_excerpt($text = '') {
  $raw_excerpt = $text;
  if ( '' == $text ) {
    $text = get_the_content('');
    //$text = apply_filters('the_content', $text);
    // strip shortcodes but leave content for all but PiwigoPress
    $exclude_codes = 'PiwigoPress';
    $text = preg_replace("~(?:\[/?)(?!(?:$exclude_codes))[^/\]]+/?\]~s", '', $text);
    // throw away remaining PiwigoPress short codes
    $text = strip_shortcodes( $text );      
    $excerpt_length = apply_filters('excerpt_length', 55);
    $excerpt_more = apply_filters('excerpt_more', ' ' . '[...]');
    $text = wp_trim_words( $text, $excerpt_length, $excerpt_more );
  }
  return apply_filters('wp_trim_excerpt', $text, $raw_excerpt);
}
remove_filter( 'get_the_excerpt', 'wp_trim_excerpt'  );
add_filter( 'get_the_excerpt', 'custom_excerpt'  );

Here lines 9-10 handle all shortcodes except PiwigoPress: they remove only the shortcode tags but keep the text between them. Line 12 then completely strips the remaining shortcodes, that is, only the PiwigoPress ones.
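To see what the regular expression does on its own, it can be tried outside WordPress. The following is only a sketch of the first stripping step: the function name and the sample text are made up for the illustration, and strip_shortcodes, being a WordPress function, is not reproduced.

```php
<?php
// Sketch of the first step only: remove the tags of every shortcode except
// the excluded one, keeping the text between opening and closing tags.
// The function name is hypothetical; the pattern is the one from custom_excerpt.
function strip_shortcode_tags_keep_content(string $text, string $exclude_codes): string {
    $pattern = "~(?:\[/?)(?!(?:$exclude_codes))[^/\]]+/?\]~s";
    $result = preg_replace($pattern, '', $text);
    // preg_replace returns null on a regex error; fall back to the input.
    return $result === null ? $text : $result;
}

$sample = 'See [p2p type="slug" value="foo"]my earlier post[/p2p] and [PiwigoPress id=12].';
echo strip_shortcode_tags_keep_content($sample, 'PiwigoPress'), "\n";
// prints: See my earlier post and [PiwigoPress id=12].
```

One thing to keep in mind: because of the [^/\]]+ part, a shortcode tag whose attributes contain a slash (a URL, for example) is not matched by this pattern and is left for strip_shortcodes to deal with.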

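In lines 13-15 we shorten the text to 55 words (wp_trim_words counts words, not characters) and add the more marker. The tag stripping that reduces everything to plain text already happens inside wp_trim_words. Finally, in line 17 we pass the result through the wp_trim_excerpt filter hook, as the stock excerpt function does, so that anything else hooked in there still runs.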

The important part is that simply adding the above function as a filter would not help: by the time it runs, WordPress has already generated the excerpt with wp_trim_excerpt. We thus remove that filter and add our own in lines 19-20.

Finally, the image scraping. This function is again taken from the first blog post mentioned, but adapted to my own needs. It pulls out all img tags and puts the respective URLs into an array. That of course does not work for PiwigoPress shortcodes, as there is no img tag. I have added, but commented out, code that expands the shortcodes in the post first and then does the image scraping. The good thing about the current approach is that if there are only PiwigoPress images, as is often the case in my blog posts, no og:image tag is added at all, and Facebook and Google+ just do their own scraping, which often works better and does not have the shortcode problem. Maybe I will drop the og:image property completely at some point. Here now the code:

function catch_all_images() {
  global $post, $posts;
  $imgarray = array();  // always return an array, even when no img tag is found
  ob_start();
  ob_end_clean();
  $output = preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $post->post_content, $matches);
  // if I want to get piwigo images also extracted, I need to 
  // apply the filters for the_content here!
  //$output = preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', 
  //  apply_filters( 'the_content', $post->post_content), $matches);
  foreach ($matches[1] as $img) {
    if (substr($img,0,2)=='//') { $img = "http:$img"; }
    if (substr($img,0,6)=='https:') {
      $img = substr_replace($img, 'http:', 0, 6);
    }
    $imgarray[] = $img;
  }
  return $imgarray;
}

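Again, don’t ask me about lines 2-5; they came with the original snippet and seem to be necessary. As far as I can tell, though, the ob_start()/ob_end_clean() pair in lines 4-5 just opens and discards an empty output buffer. Line 6 does the whole work of pulling out the img URLs. Lines 7-10 would do the same, but expand shortcodes first. As speed is always an issue, I don’t do this for now.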
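The extraction step can also be tried outside WordPress. This is only a sketch: extract_img_urls is a made-up name, the sample HTML is invented, and the pattern uses [^>] inside the tag so that two images on the same line are reported separately.

```php
<?php
// Sketch: collect the src attribute of every <img> tag in an HTML fragment.
// extract_img_urls is a hypothetical name, not a WordPress function.
function extract_img_urls(string $html): array {
    $urls = array();
    // [^>] keeps each match inside a single tag.
    if (preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $html, $matches)) {
        $urls = $matches[1];
    }
    return $urls;
}

$html = '<p><img src="//example.org/a.png" alt="a"> text '
      . '<img class="x" src="https://example.org/b.jpg"></p>';
print_r(extract_img_urls($html));
```

For the sample above this yields the two URLs //example.org/a.png and https://example.org/b.jpg, and an empty array when the text contains no img tag at all.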

Line 12 is another trick I need to do: I am using links of the form //www.preining.info/... instead of specifying the protocol (http or https). The reason is that if I always specify http for images, current browsers complain, when the page is loaded over https, that some content is unsafe. And if I always load over https, then caching is disabled, which is also not good for clients. (BTW, why is that not the default in WordPress? Really stupid). So here I add the http: prefix if the URL starts with a literal //.

Finally, as I am serving everything both via http and https, I replace https links with http. This of course would be wrong if I include images from a different server that only serves via https, but since it is my page, I know what I am doing 😉
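The two URL tricks can be put into one small function and tested in isolation. A minimal sketch, where force_http is a made-up name and example.org stands in for a real host:

```php
<?php
// Sketch of the URL normalization described above; force_http is a
// hypothetical name. Protocol-relative and https URLs both become http.
function force_http(string $url): string {
    if (substr($url, 0, 2) === '//') {
        // Protocol-relative URL: prepend the scheme.
        return 'http:' . $url;
    }
    if (stripos($url, 'https://') === 0) {
        // 'https://' is 8 characters long; swap it for 'http://'.
        return 'http://' . substr($url, 8);
    }
    return $url;
}

echo force_http('//example.org/img/a.png'), "\n";       // http://example.org/img/a.png
echo force_http('https://example.org/img/b.png'), "\n"; // http://example.org/img/b.png
```

As said, this is only safe when every image host also answers over plain http.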

That concludes the whole code. Easy to adapt, easy to change, nothing fancy. And in total a few lines of code, not a bloated super-fat plugin. And the header now looks as shown in the image at the top. Simple!

Enjoy, and let me know if you have any improvements or explanations!


4 Responses

  1. Frans says:

    That sounds potentially somewhat interesting, but for the most part also quite redundant. I already have rel=canonical, rel=shortlink as standard WP features, and meta name=description as something I added myself because it’s nice for bookmarks and other stuff. Anyway, you could add these weird OG tags to more places by adapting this (see here):

    $desc = '';
    if ( is_single() ) {
        $desc = get_the_excerpt();
    }
    elseif ( is_page() ) {
        $desc = get_the_excerpt();
    }
    elseif ( is_category() ) {
        $desc = category_description();
    }
    elseif ( is_home() ) {
        $desc = get_bloginfo('description');
    }
    $desc = htmlspecialchars(trim(strip_tags($desc)));

    • Hi Frans,
      thanks, interesting. I actually have no idea what the whole meta stuff is, so I never bothered adding it… who is evaluating it? I guess Google and some other bots, right?

      Thanks for the code, I will adapt my stuff to add also meta information.

      Norbert

      • Frans says:

        Search engines (and also the likes of Facebook) may or may not use the meta information, but I primarily like it for bookmarking stuff. It prefills the “description” field in bookmarks so you don’t have to, which is quite useful if the description is even half decent. It shows up on page info, too. There’s also meta name=title, btw. Another potentially useful field is revised, which for some reason seems to be missing from OG.
