Data Loading performance considerations with Clustered Columnstore indexes. In it they talk of loading data into partitioned tables where the load is aligned. Interestingly, their recommended process is to load into a staging table with the clustered columnstore index created first.

This is still based on the fact that you can hit the magic number of 1,048,576, or at the very least 102,400, per bulk insert, to create compressed segments. If you cannot, then I still feel that the method of loading into a heap, and then creating the clustered columnstore index, is preferential. This is because if you trickle insert into a CCI, you will end up with a mixture of open/closed delta stores and come compressed segments. You could leave it to the tuple mover to compress these, and you would still be left with some open delta stores. However, as Remus Rusanu points out, this is not recommended. To complete the process quickly, and to remove any open delta stores, you would need to rebuild the partition.