AWS CloudFront duplicate content issue Solution

AWS Cloudfront is like proxy service, e.g. A request to Cloudfront will be served with content hosted on origin server. You chose your origin server, in the process Cloudfront will cache any files it has already served and will return the cached version for future requests.

AWS Cloudfront process:

Request -> CloudFront -> Origin Server

Origin Server -> CloudFront -> Response

 

Upon searching on internet other solutions suggest to use robot.txt, but the issue with robot.txt is that you have to make changes to your site plus it will block access to CSS and JS files too. As now Google bot are like modern browsers they need access to CSS files (to detect if your site is responsive/mobile friendly).

This solution assumes that you want to only serve static files like CSS, JS and image files.

You can use AWS Cloudfront service to cache a complete domain of your choosing, but this would create duplicate for all the content served by origin server.

So for this example we suppose that you have domain static.example.com and you are serving all its content from XXXX.cloudfront.net, this is not good SEO as it would cause duplicate content issue with search engines.

To get around the issue: once you have created your CloudFront distribution, go to “Origins” tab and add a new origin to domain say “non-existent.example.com

Once you have added a new origin to a domain that doesn’t exists any request to domain won’t get served by CloudFront.

Now go to “Behaviors” tab edit the default behavior and set origin to non existent, after this all requests to XXXX.cloudfront.net should give error(given that it is still not cached by edge locations).

Now create a new behavior with “Path Pattern” set to something like *.css for CSS files and set the origin to static.example.com, repeat this step for all the path patterns that you actually want to resolve to a successful request.

The above setup will ensure that your distribution only serve the paths patterns that you have included.

 

How to clear EhCache OnDemand

I found the following solution on Spring framework forum.

You can add following controller to your admin app, make sure the below URL is only available to you and not exposed to public.

In below code we are injecting CacheManager object, the I name used (ehCacheManager) might be different for your code.

After deploying below code visit http://localhost:8080/list-ehcache-objects and click on the cache names to clear them.


package com.mycompany.controller;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;

import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import net.sf.ehcache.CacheManager;


import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.mvc.ParameterizableViewController;

@Controller
public class EHCacheController  {
	
	@Resource(name="ehCacheManager")
	private CacheManager cacheManager;

    /* (non-Javadoc)
     * @see org.springframework.web.servlet.mvc.ParameterizableViewController#handleRequestInternal(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse)
     */
    @SuppressWarnings("unchecked")
    @RequestMapping( value = "/list-ehcache-objects")
	protected String clear(Model viewModel, HttpServletRequest request, HttpServletResponse response) throws Exception {
	        HashMap model = new HashMap();
	        //Get all the active caches
	        List caches = new ArrayList(cacheManager.getCacheNames().length);
	        ArrayList cacheNamesList = new ArrayList();
	        String[] cacheNames = cacheManager.getCacheNames();
	        Iterator iter = Arrays.asList(cacheNames).iterator();
	        String cacheName = request.getParameter("cacheName");
	        while (iter.hasNext()){
	        	
	            // If the cache name has been passed from the request then flush it //
	            String cacheNameTest = (String) iter.next();
	            if (cacheNameTest.equalsIgnoreCase(cacheName)){
	                cacheManager.getCache(cacheNameTest).removeAll();
	            }
	            caches.add(cacheManager.getCache(cacheNameTest));
	            cacheNamesList.add(cacheNameTest);
	        }
	        //Stick the caches in the page model
	        model.put("caches", caches);
	        model.put("cacheNames", cacheNamesList);
	        viewModel.addAllAttributes(model);
	        return "layout/clearehcache";
	    }

    
	/**
	 * Setter for the EHCacheManager
	 * @param cacheManager
	 */
	public void setCacheManager(CacheManager cacheManager) {
		this.cacheManager = cacheManager;
	}
	
}

For my controller view I used the following, since I am using Thymeleaf the syntax will be different to that of JSP files.

Error connecting to self generated SSL certifiate

Even now & then I run into this issue, so adding here for my own reference and everyone else.

If you are getting following error:

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException

This means the server you are trying to connect to use self generated certificate, to solve this issue you need to tell JRE/JDK to trust the certificate.
To do that you need to import the SSL certificate into your JRE.

In my case I am trying to connect to SMTP over SSL, a quick service for “download smtp certificate” gave me http://notepad2.blogspot.co.uk/2012/04/import-gmail-certificate-into-java.html

To download certificate for HTTP, you can use Firefox and Internet explorer, by clicking the secure icon in address bar.

so to download the certificate for SMTP run following in console:

openssl s_client -connect smtp.gmail.com:465

Although I downloaded the certificate using port 465 but my java configuration only works on port 587 for SMTP with TLS enabled.

which outputs the certificate, e.g.

-----BEGIN CERTIFICATE----- .......... -----END CERTIFICATE-----

you can save the certificate into a file e.g.

nano smtp.gmail.com.cert

now import the certificate using:

keytool -import -trustcacerts -alias smtp.gmail.com -file /path/to/smtp.gmail.com.cert

Say “yes” to the command prompt, now you should be able to connect.

This will import the certificate to the default keystore, for me it under the home directory. As in the above command I didn’t specify a keystore, I use the same keystore e.g “~/.tomcat” for my tomcat configurations.

Also import the certificate to your JRE/JDK keystore using:

keytool -import -trustcacerts -alias smtp.gmail.com -file /path/to/smtp.gmail.com.cert -keystore $JAVA_HOME/jre/lib/security/cacerts