Sometimes we need to update the last modified time of a file in HDFS. We can do this with the Hadoop FileSystem API in Java.
Step 1:
Create a simple Maven project in Eclipse.
Step 2:
Add the following dependencies to the pom.xml file.
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.1</version>
    </dependency>
</dependencies>
Step 3:
Create a Java class “UpdateHDFSFileModifiedTime” in the package “org.sample”.
package org.sample;

import java.io.IOException;
import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UpdateHDFSFileModifiedTime {

    public static void main(String[] args) {
        // Both arguments are required, so stop here if either one is missing.
        if (args.length < 2) {
            System.out.println("Parameters are compulsory. Please see below the usage:");
            System.out.println("UpdateHDFSFileModifiedTime <HDFS File Path> <Number of days older than today>");
            return;
        }

        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        int noOfDays = Integer.parseInt(args[1]);

        // Move the current date and time back by the requested number of days.
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DAY_OF_MONTH, -noOfDays);
        long timestamp = cal.getTimeInMillis();

        try {
            FileSystem fileSystem = FileSystem.get(conf);
            // setTimes(path, modificationTime, accessTime): both are set to the same value.
            fileSystem.setTimes(path, timestamp, timestamp);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
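The class above passes the second argument straight to Integer.parseInt, so a non-numeric value ends the program with a NumberFormatException, and a negative value would move the timestamp into the future. As a minimal sketch, the parsing and the timestamp arithmetic could be pulled into small helpers that can be tried without a Hadoop cluster. The class name DaysAgoSketch and the methods parseDays and millisDaysAgo are hypothetical and not part of the original project, and java.time is used here in place of Calendar.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper class for illustration; not part of the original project.
public class DaysAgoSketch {

    // Parses the "number of days" argument, rejecting non-numeric or negative values.
    public static int parseDays(String arg) {
        int days;
        try {
            days = Integer.parseInt(arg.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Number of days must be an integer: " + arg, e);
        }
        if (days < 0) {
            throw new IllegalArgumentException("Number of days must not be negative: " + arg);
        }
        return days;
    }

    // Returns epoch milliseconds for the given moment moved back by 'days' calendar days.
    public static long millisDaysAgo(ZonedDateTime now, int days) {
        return now.minusDays(days).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Falls back to 10 days when no argument is given (an arbitrary default for this sketch).
        int days = parseDays(args.length > 0 ? args[0] : "10");
        ZonedDateTime now = ZonedDateTime.now(ZoneId.systemDefault());
        // This is the kind of value that FileSystem.setTimes expects for its time arguments.
        System.out.println(millisDaysAgo(now, days));
    }
}
```

The result of millisDaysAgo could be passed to fileSystem.setTimes in place of cal.getTimeInMillis().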
Step 4:
Run “Maven Install” to build the jar.
Step 5:
On a Cloudera VM or a UNIX machine where Hadoop and HDFS are installed, open a terminal and run the command below:
$ hadoop jar /user/workspace/clearurdoubt/lib/UpdateHDFSFileModifiedTime-0.0.1-SNAPSHOT.jar org.sample.UpdateHDFSFileModifiedTime <HDFS File Path> <No of Days>
This command sets the last modified time of the given HDFS file to the current date and time minus the specified <No of Days>. The access time is set to the same value.
Eg:
[admin@myworkstation ~]$ hadoop fs -ls /user/workspace/clearurdoubt/test_folder1/
Found 1 items
-rw-r--r-- 1 admin admin 62 2018-07-25 12:01 /user/workspace/clearurdoubt/test_folder1/testfile_1.txt
[admin@myworkstation ~]$
[admin@myworkstation ~]$ hadoop jar /user/workspace/clearurdoubt/lib/UpdateHDFSFileModifiedTime-0.0.1-SNAPSHOT.jar org.sample.UpdateHDFSFileModifiedTime /user/workspace/clearurdoubt/test_folder1/testfile_1.txt 10
[admin@myworkstation ~]$
[admin@myworkstation ~]$ hadoop fs -ls /user/workspace/clearurdoubt/test_folder1/
Found 1 items
-rw-r--r-- 1 admin admin 62 2018-07-15 12:02 /user/workspace/clearurdoubt/test_folder1/testfile_1.txt
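The new timestamp in the listing can be checked by hand. The second listing shows 2018-07-15 12:02, which suggests the job ran at about 12:02 on 2018-07-25 (an inference, since the run time itself is not shown), and 10 days before that is 2018-07-15 12:02. A small sketch of that arithmetic follows. The class name ListingDateCheck and the method shiftBack are hypothetical, and the yyyy-MM-dd HH:mm pattern is chosen to match the way the listing above prints dates.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical class for illustration; it only reproduces the date arithmetic.
public class ListingDateCheck {

    // Moves the given date and time back by 'days' and formats it like the listing above.
    public static String shiftBack(LocalDateTime runTime, int days) {
        DateTimeFormatter listingFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
        return runTime.minusDays(days).format(listingFormat);
    }

    public static void main(String[] args) {
        // Assumed run time, inferred from the second listing in the example.
        LocalDateTime runTime = LocalDateTime.of(2018, 7, 25, 12, 2);
        System.out.println(shiftBack(runTime, 10)); // prints 2018-07-15 12:02
    }
}
```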
Please comment below in case of any queries.