We can use the Sqoop incremental import command with the --merge-key option to update records in a Hive table that has already been imported.
--incremental lastmodified imports the new and updated records from the RDBMS (MySQL) database, based on the latest emp_timestamp value already present in Hive.
--merge-key employee_id will "flatten" the two datasets into one, taking the newest available record for each primary key.
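To make the two options concrete, here is a minimal Python sketch of the semantics described above. It is an illustration only, not Sqoop's implementation: the helper names select_incremental and merge_on_key and the sample rows are hypothetical, and only the employee_id and emp_timestamp column names come from the text.

```python
from datetime import datetime


def select_incremental(source_rows, check_column, last_value):
    """Mimic --incremental lastmodified: keep rows modified after last_value."""
    return [row for row in source_rows if row[check_column] > last_value]


def merge_on_key(existing_rows, new_rows, merge_key, check_column):
    """Mimic --merge-key: keep one row per key, preferring the newest timestamp."""
    merged = {}
    for row in list(existing_rows) + list(new_rows):
        current = merged.get(row[merge_key])
        # On a timestamp tie, the later-listed (incoming) row wins.
        if current is None or row[check_column] >= current[check_column]:
            merged[row[merge_key]] = row
    return sorted(merged.values(), key=lambda r: r[merge_key])


# Hypothetical sample data standing in for the Hive table and the MySQL source.
hive_rows = [
    {"employee_id": 1, "name": "Ana", "emp_timestamp": datetime(2017, 1, 1)},
    {"employee_id": 2, "name": "Raj", "emp_timestamp": datetime(2017, 1, 2)},
]
mysql_rows = [
    {"employee_id": 1, "name": "Ana", "emp_timestamp": datetime(2017, 1, 1)},
    {"employee_id": 2, "name": "Rajesh", "emp_timestamp": datetime(2017, 2, 1)},
    {"employee_id": 3, "name": "Li", "emp_timestamp": datetime(2017, 2, 5)},
]

# The latest timestamp already in Hive plays the role of the last imported value.
last_value = max(row["emp_timestamp"] for row in hive_rows)
delta = select_incremental(mysql_rows, "emp_timestamp", last_value)
result = merge_on_key(hive_rows, delta, "employee_id", "emp_timestamp")

for row in result:
    print(row["employee_id"], row["name"])
```

In this sketch, employee 2 is updated in place and employee 3 is appended, while the unchanged employee 1 is never re-read from the source.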
When it comes to loading data into Apache Hadoop™, the de facto choice for bulk loads of data from leading relational databases is Apache Sqoop™. After initially entering Apache Incubator status in 2011, it quickly saw widespread adoption and development, eventually graduating to a Top-Level Project (TLP) in 2012.
In StreamSets Data Collector (SDC) 2.7, we added capabilities that enable SDC to behave in a manner almost identical to Sqoop. Customers can now use SDC to modernize Sqoop-like workloads, performing the same load functions while getting the ease of use and flexibility that SDC delivers.
In addition to the Sqoop-like capabilities, we’ve also added an importer tool that can automatically convert existing Sqoop commands into an equivalent pipeline within SDC. Check out this short video for a demonstration. The tool can be installed by running the following pip command:
pip3 install StreamSets
SDC can be used to modernize Sqoop workloads in many ways, namely:
Because Sqoop has been around longer than StreamSets, its functionality in some cases differs enough from SDC's that we recommend you reach out and ask us for best practices, specifically:
In summary, anyone loading data into Hadoop with StreamSets no longer needs a separate tool for bulk data transfer unless they want one. Not only does StreamSets give you the same functionality, it also enables drift detection and integration with StreamSets Dataflow Performance Manager, so it’s possible to run complex pipelines at scale with operational oversight and confidence. Check out this video and tell us what you think.